You are training an object detection machine learning model on a dataset that consists of three million X-ray images, each roughly 2 GB in size. You are using Vertex AI Training to run a custom training application on a Compute Engine instance with 32 cores, 128 GB of RAM, and one NVIDIA P100 GPU. You notice that model training is taking a very long time. You want to decrease training time without sacrificing model performance.

What should you do?
A. Increase the instance memory to 512 GB and increase the batch size.
B. Replace the NVIDIA P100 GPU with a v3-32 TPU in the training job.
C. Enable early stopping in your Vertex AI Training job.
D. Use the tf.distribute.Strategy API and run a distributed training job.

Answer: C
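
For a sense of scale, the figures in the scenario can be worked through with some quick arithmetic. The sketch below is illustrative only: the helper names are hypothetical, decimal units (1 PB = 1,000,000 GB) are assumed, and it considers raw image size alone, ignoring any decoding, compression, or framework overhead.

```python
# Figures taken from the scenario above.
NUM_IMAGES = 3_000_000
IMAGE_SIZE_GB = 2
RAM_OPTIONS_GB = (128, 512)  # current instance RAM and the RAM proposed in option A


def dataset_size_pb(num_images, image_size_gb):
    """Total dataset size in petabytes (assumes 1 PB = 1,000,000 GB)."""
    return num_images * image_size_gb / 1_000_000


def images_fitting_in_ram(ram_gb, image_size_gb):
    """Upper bound on raw images held in RAM at once, ignoring all overhead."""
    return ram_gb // image_size_gb


total_pb = dataset_size_pb(NUM_IMAGES, IMAGE_SIZE_GB)
print(f"Total dataset size: {total_pb:g} PB")
for ram_gb in RAM_OPTIONS_GB:
    count = images_fitting_in_ram(ram_gb, IMAGE_SIZE_GB)
    print(f"{ram_gb} GB of RAM holds at most {count} raw images")
```

Under these assumptions the dataset comes to about 6 PB, while even 512 GB of RAM holds only a few hundred raw images at a time, so the bulk of the data has to be streamed from storage during training whichever option is chosen.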
