You are operating a Google Kubernetes Engine (GKE) cluster for your company where different teams can run non-production workloads. Your Machine Learning (ML) team needs access to Nvidia Tesla P100 GPUs to train their models. You want to minimize effort and cost.

What should you do?
A. Ask your ML team to add the “accelerator: gpu” annotation to their Pod specification.
B. Recreate all the nodes of the GKE cluster to enable GPUs on all of them.
C. Create your own Kubernetes cluster on top of Compute Engine with nodes that have GPUs. Dedicate this cluster to your ML team.
D. Add a new, GPU-enabled node pool to the GKE cluster. Ask your ML team to add the cloud.google.com/gke-accelerator: nvidia-tesla-p100 nodeSelector to their Pod specification.

Answer: D

Explanation:

This is the optimal solution. Rather than recreating all nodes, you create a new node pool with GPUs enabled. The ML team then targets the desired GPU type by adding a nodeSelector to their workloads' Pod specifications. You still have a single cluster, so you pay the Kubernetes cluster management fee for just one cluster, minimizing both cost and effort.

Ref: https://cloud.google.com/kubernetes-engine/docs/how-to/gpus

Ref: https://cloud.google.com/kubernetes-engine/pricing
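
As a rough sketch, the GPU-enabled node pool could be created with a gcloud command along these lines (the cluster, pool, zone, and machine-type names are placeholders, not values from the question):

gcloud container node-pools create gpu-pool \
    --cluster=my-cluster \
    --zone=us-central1-a \
    --machine-type=n1-standard-4 \
    --accelerator=type=nvidia-tesla-p100,count=1 \
    --num-nodes=1

The Pod specification example below then uses the cloud.google.com/gke-accelerator nodeSelector so the workload lands on those nodes.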

Example:

apiVersion: v1
kind: Pod
metadata:
  name: my-gpu-pod
spec:
  containers:
  - name: my-gpu-container
    image: nvidia/cuda:10.0-runtime-ubuntu18.04
    command: ["/bin/bash"]
    resources:
      limits:
        nvidia.com/gpu: 2
  nodeSelector:
    cloud.google.com/gke-accelerator: nvidia-tesla-k80 # or nvidia-tesla-p100 or nvidia-tesla-p4 or nvidia-tesla-v100 or nvidia-tesla-t4
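
Note: per the GKE GPU documentation linked above, the NVIDIA device drivers must also be installed on the GPU nodes before Pods can use them. On Container-Optimized OS nodes this is typically done by applying Google's driver-installer DaemonSet, roughly as follows:

kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded.yaml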
