Google Professional Machine Learning Engineer Online Training

Question #1

As the lead ML Engineer for your company, you are responsible for building ML models to digitize scanned customer forms. You have developed a TensorFlow model that converts the scanned images into text and stores them in Cloud Storage. You need to use your ML model on the aggregated data collected at the end of each day with minimal manual intervention.

What should you do?

  • A . Use the batch prediction functionality of AI Platform
  • B . Create a serving pipeline in Compute Engine for prediction
  • C . Use Cloud Functions for prediction each time a new data point is ingested
  • D . Deploy the model on AI Platform and create a version of it for online inference.

Correct Answer: A

Explanation:

https://cloud.google.com/ai-platform/prediction/docs/batch-predict

Question #2

You work for a global footwear retailer and need to predict when an item will be out of stock based on historical inventory data. Customer behavior is highly dynamic since footwear demand is influenced by many different factors. You want to serve models that are trained on all available data, but track your performance on specific subsets of data before pushing to production.

What is the most streamlined and reliable way to perform this validation?

  • A . Use the TFX Model Validator tools to specify performance metrics for production readiness
  • B . Use k-fold cross-validation as a validation strategy to ensure that your model is ready for production.
  • C . Use the last relevant week of data as a validation set to ensure that your model is performing accurately on current data
  • D . Use the entire dataset and treat the area under the receiver operating characteristics curve (AUC ROC) as the main metric.

Correct Answer: A

Explanation:

https://www.tensorflow.org/tfx/guide/evaluator

Question #3

You work on a growing team of more than 50 data scientists who all use AI Platform. You are designing a strategy to organize your jobs, models, and versions in a clean and scalable way.

Which strategy should you choose?

  • A . Set up restrictive IAM permissions on the AI Platform notebooks so that only a single user or group can access a given instance.
  • B . Separate each data scientist’s work into a different project to ensure that the jobs, models, and versions created by each data scientist are accessible only to that user.
  • C . Use labels to organize resources into descriptive categories. Apply a label to each created resource so that users can filter the results by label when viewing or monitoring the resources.
  • D . Set up a BigQuery sink for Cloud Logging logs that is appropriately filtered to capture information about AI Platform resource usage. In BigQuery, create a SQL view that maps users to the resources they are using.

Correct Answer: C

Explanation:

https://cloud.google.com/ai-platform/prediction/docs/resource-labels#overview_of_labels

You can add labels to your AI Platform Prediction jobs, models, and model versions, then use those labels to organize resources into categories when viewing or monitoring the resources. For example, you can label jobs by team (such as engineering or research) and development phase (prod or test), then filter the jobs based on the team and phase. Labels are also available on operations, but these labels are derived from the resource to which the operation applies. You cannot add or update labels on an operation.

https://cloud.google.com/ai-platform/prediction/docs/sharing-models.

Question #4

During batch training of a neural network, you notice that there is an oscillation in the loss.

How should you adjust your model to ensure that it converges?

  • A . Increase the size of the training batch
  • B . Decrease the size of the training batch
  • C . Increase the learning rate hyperparameter
  • D . Decrease the learning rate hyperparameter

Correct Answer: D

Explanation:

https://developers.google.com/machine-learning/crash-course/introduction-to-neural-networks/playground-exercises
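
As a quick illustration (a minimal plain-Python sketch, not part of the original question), gradient descent on a simple quadratic loss oscillates and diverges when the learning rate is too high, and converges smoothly once the learning rate is decreased:

```python
def gradient_descent(lr, steps=10, w=1.0):
    """Minimize f(w) = w^2 (gradient 2w) and return the trajectory of w."""
    history = []
    for _ in range(steps):
        w -= lr * 2 * w  # gradient update step
        history.append(round(w, 4))
    return history

# Learning rate too high: w flips sign every step and grows in magnitude (oscillating loss).
print(gradient_descent(lr=1.1))
# Learning rate decreased: w shrinks steadily toward the minimum at 0 (loss converges).
print(gradient_descent(lr=0.1))
```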

Question #5

You are building a linear model with over 100 input features, all with values between -1 and 1. You suspect that many features are non-informative. You want to remove the non-informative features from your model while keeping the informative ones in their original form.

Which technique should you use?

  • A . Use Principal Component Analysis to eliminate the least informative features.
  • B . Use L1 regularization to reduce the coefficients of uninformative features to 0.
  • C . After building your model, use Shapley values to determine which features are the most informative.
  • D . Use an iterative dropout technique to identify which features do not degrade the model when removed.

Correct Answer: B

Explanation:

https://cloud.google.com/ai-platform/prediction/docs/ai-explanations/overview#sampled-shapley
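
As an illustration of answer B, here is a minimal sketch using scikit-learn's Lasso (L1-regularized linear regression); the synthetic data and penalty strength are assumptions for the example only:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(1000, 100))                      # 100 features, all in [-1, 1]
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.1 * rng.normal(size=1000)   # only 2 informative features

model = Lasso(alpha=0.05).fit(X, y)  # the L1 penalty drives uninformative coefficients to exactly 0
print("non-zero coefficients:", int(np.sum(model.coef_ != 0)))
```

The informative features keep their original form; the L1 penalty simply zeroes out the rest.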

Question #6

Your team has been tasked with creating an ML solution in Google Cloud to classify support requests for one of your platforms. You analyzed the requirements and decided to use TensorFlow to build the classifier so that you have full control of the model’s code, serving, and deployment. You will use Kubeflow pipelines for the ML platform. To save time, you want to build on existing resources and use managed services instead of building a completely new model.

How should you build the classifier?

  • A . Use the Natural Language API to classify support requests
  • B . Use AutoML Natural Language to build the support requests classifier
  • C . Use an established text classification model on AI Platform to perform transfer learning
  • D . Use an established text classification model on AI Platform as-is to classify support requests

Correct Answer: C

Explanation:

The model cannot work as-is because the classes to predict will likely not be the same; you need to use transfer learning to retrain the last layer and adapt it to the classes you need.

Question #7

Your team is working on an NLP research project to predict political affiliation of authors based on articles they have written.

You have a large training dataset that is structured like this:

You followed the standard 80%-10%-10% data distribution across the training, testing, and evaluation subsets.

How should you distribute the training examples across the train-test-eval subsets while maintaining the 80-10-10 proportion?

A)

B)

C)

D)

  • A . Option A
  • B . Option B
  • C . Option C
  • D . Option D

Correct Answer: B

Explanation:

If we randomly assign individual texts, paragraphs, or sentences to the training, validation, and test sets, the model can learn author-specific qualities of language that leak across the splits, and it will mix up different authors’ opinions. If we instead divide the data at the author level, so that a given author appears only in the training set, only in the validation set, or only in the test set, the model cannot rely on memorized author-specific cues and will find it harder to score artificially high accuracy on the validation and test sets, which is the correct and more meaningful behavior. The model then has to generalize across authors rather than infer a single political affiliation from a mix of articles by different authors. https://developers.google.com/machine-learning/crash-course/18th-century-literature

For example, suppose you are training a model with purchase data from a number of stores. You know, however, that the model will be used primarily to make predictions for stores that are not in the training data. To ensure that the model can generalize to unseen stores, you should segregate your data sets by stores. In other words, your test set should include only stores different from the evaluation set, and the evaluation set should include only stores different from the training set. https://cloud.google.com/automl-tables/docs/prepare#ml-use
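
A minimal pandas sketch of such an author-level split (the dataframe and its author column are hypothetical; the proportions follow the 80-10-10 split in the question):

```python
import numpy as np
import pandas as pd

def split_by_author(df, train_frac=0.8, val_frac=0.1, seed=42):
    """Assign whole authors, not individual articles, to the train/validation/test subsets."""
    authors = df["author"].unique()
    rng = np.random.default_rng(seed)
    rng.shuffle(authors)
    n_train = int(len(authors) * train_frac)
    n_val = int(len(authors) * val_frac)
    train_authors = set(authors[:n_train])
    val_authors = set(authors[n_train:n_train + n_val])
    train = df[df["author"].isin(train_authors)]
    val = df[df["author"].isin(val_authors)]
    test = df[~df["author"].isin(train_authors | val_authors)]
    return train, val, test
```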

Question #8

Your data science team needs to rapidly experiment with various features, model architectures, and hyperparameters. They need to track the accuracy metrics for various experiments and use an API to query the metrics over time.

What should they use to track and report their experiments while minimizing manual effort?

  • A . Use Kubeflow Pipelines to execute the experiments. Export the metrics file, and query the results using the Kubeflow Pipelines API.
  • B . Use AI Platform Training to execute the experiments. Write the accuracy metrics to BigQuery, and query the results using the BigQuery API.
  • C . Use AI Platform Training to execute the experiments. Write the accuracy metrics to Cloud Monitoring, and query the results using the Monitoring API.
  • D . Use AI Platform Notebooks to execute the experiments. Collect the results in a shared Google Sheets file, and query the results using the Google Sheets API.

Correct Answer: A

Explanation:

https://codelabs.developers.google.com/codelabs/cloud-kubeflow-pipelines-gis

Kubeflow Pipelines (KFP) helps solve these issues by providing a way to deploy robust, repeatable machine learning pipelines along with monitoring, auditing, version tracking, and reproducibility. Cloud AI Pipelines makes it easy to set up a KFP installation.

https://www.kubeflow.org/docs/components/pipelines/introduction/#what-is-kubeflow-pipelines

"Kubeflow Pipelines supports the export of scalar metrics. You can write a list of metrics to a local file to describe the performance of the model. The pipeline agent uploads the local file as your run-time metrics. You can view the uploaded metrics as a visualization in the Runs page for a particular experiment in the Kubeflow Pipelines UI."

https://www.kubeflow.org/docs/components/pipelines/sdk/pipelines-metrics/

Question #9

You are an ML engineer at a bank that has a mobile application. Management has asked you to build an ML-based biometric authentication for the app that verifies a customer’s identity based on their fingerprint. Fingerprints are considered highly sensitive personal information and cannot be downloaded and stored into the bank databases.

Which learning strategy should you recommend to train and deploy this ML model?

  • A . Differential privacy
  • B . Federated learning
  • C . MD5 to encrypt data
  • D . Data Loss Prevention API

Correct Answer: B
Question #10

You are building a linear regression model on BigQuery ML to predict a customer’s likelihood of purchasing your company’s products. Your model uses a city name variable as a key predictive component. In order to train and serve the model, your data must be organized in columns. You want to prepare your data using the least amount of coding while maintaining the predictable variables.

What should you do?

  • A . Create a new view with BigQuery that does not include a column with city information
  • B . Use Dataprep to transform the state column using a one-hot encoding method, and make each city a column with binary values.
  • C . Use Cloud Data Fusion to assign each city to a region labeled as 1, 2, 3, 4, or 5, and then use that number to represent the city in the model.
  • D . Use TensorFlow to create a categorical variable with a vocabulary list. Create the vocabulary file, and upload it as part of your model to BigQuery ML.

Correct Answer: C

Question #11

You work for a toy manufacturer that has been experiencing a large increase in demand. You need to build an ML model to reduce the amount of time spent by quality control inspectors checking for product defects. Faster defect detection is a priority. The factory does not have reliable Wi-Fi. Your company wants to implement the new ML model as soon as possible.

Which model should you use?

  • A . AutoML Vision model
  • B . AutoML Vision Edge mobile-versatile-1 model
  • C . AutoML Vision Edge mobile-low-latency-1 model
  • D . AutoML Vision Edge mobile-high-accuracy-1 model

Correct Answer: A
Question #12

You are going to train a DNN regression model with Keras APIs using this code:

How many trainable weights does your model have? (The arithmetic below is correct.)

  • A . 501*256 + 257*128 + 2 = 161154
  • B . 500*256 + 256*128 + 128*2 = 161024
  • C . 501*256 + 257*128 + 128*2 = 161408
  • D . 500*256*0.25 + 256*128*0.25 + 128*2 = 40448

Correct Answer: C
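
For reference, a minimal Keras sketch that reproduces answer C's parameter count. The original code snippet is an image that is not reproduced here, so the 500-feature input, the 256- and 128-unit hidden layers, and the bias-free 2-unit output layer are assumptions inferred from the arithmetic:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(500,)),
    tf.keras.layers.Dense(256, activation="relu"),  # (500 + 1) * 256 = 128256 weights
    tf.keras.layers.Dense(128, activation="relu"),  # (256 + 1) * 128 = 32896 weights
    tf.keras.layers.Dense(2, use_bias=False),       # 128 * 2 = 256 weights
])

print(model.count_params())  # 128256 + 32896 + 256 = 161408
```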
Question #13

You recently joined a machine learning team that will soon release a new project. As a lead on the project, you are asked to determine the production readiness of the ML components. The team has already tested features and data, model development, and infrastructure.

Which additional readiness check should you recommend to the team?

  • A . Ensure that training is reproducible
  • B . Ensure that all hyperparameters are tuned
  • C . Ensure that model performance is monitored
  • D . Ensure that feature expectations are captured in the schema

Correct Answer: C

Explanation:

Features and data, model development, and infrastructure have already been tested, but the model must also be watched after release. Monitoring model performance in production ensures the model keeps working as expected, identifies issues that arise over time, and shows the team what changes are needed to keep the model performing optimally.

Question #14

You recently designed and built a custom neural network that uses critical dependencies specific to your organization’s framework. You need to train the model using a managed training service on Google Cloud. However, the ML framework and related dependencies are not supported by AI Platform Training. Also, both your model and your data are too large to fit in memory on a single machine. Your ML framework of choice uses the scheduler, workers, and servers distribution structure.

What should you do?

  • A . Use a built-in model available on AI Platform Training
  • B . Build your custom container to run jobs on AI Platform Training
  • C . Build your custom containers to run distributed training jobs on AI Platform Training
  • D . Reconfigure your code to an ML framework with dependencies that are supported by AI Platform Training

Correct Answer: C

Explanation:

"ML framework and related dependencies are not supported by Al Platform Training" use custom containers "your model and your data are too large to fit in memory on a single machine " use distributed learning techniques

Question #15

You are an ML engineer in the contact center of a large enterprise. You need to build a sentiment analysis tool that predicts customer sentiment from recorded phone conversations. You need to identify the best approach to building a model while ensuring that the gender, age, and cultural differences of the customers who called the contact center do not impact any stage of the model development pipeline and results.

What should you do?

  • A . Extract sentiment directly from the voice recordings
  • B . Convert the speech to text and build a model based on the words
  • C . Convert the speech to text and extract sentiments based on the sentences
  • D . Convert the speech to text and extract sentiment using syntactical analysis

Correct Answer: D

Explanation:

To ensure that gender, age, and cultural differences of the customers who called the contact center do not impact any stage of the model development pipeline and results, it is important to focus on the meaning and context of the conversation, rather than the characteristics of the speaker.

Converting the speech to text and then using syntactical analysis to extract sentiment will allow you to focus on the meaning and context of the conversation, rather than characteristics of the speaker. This approach will also give you more data to work with, as you can analyze the entire conversation, rather than just the voice recordings.

Reference: https://cloud.google.com/natural-language/docs/sentiment-tutorial

Question #16

You work for an advertising company and want to understand the effectiveness of your company’s latest advertising campaign. You have streamed 500 MB of campaign data into BigQuery. You want to query the table, and then manipulate the results of that query with a pandas dataframe in an AI Platform notebook.

What should you do?

  • A . Use AI Platform Notebooks’ BigQuery cell magic to query the data, and ingest the results as a pandas dataframe
  • B . Export your table as a CSV file from BigQuery to Google Drive, and use the Google Drive API to ingest the file into your notebook instance
  • C . Download your table from BigQuery as a local CSV file, and upload it to your AI Platform notebook instance. Use pandas.read_csv to ingest the file as a pandas dataframe
  • D . From a bash cell in your AI Platform notebook, use the bq extract command to export the table as a CSV file to Cloud Storage, and then use gsutil cp to copy the data into the notebook. Use pandas.read_csv to ingest the file as a pandas dataframe

Correct Answer: A

Explanation:

Refer to this link for details: https://cloud.google.com/bigquery/docs/bigquery-storage-python-pandas

The first two points are about querying the data.

Download query results to a pandas DataFrame by using the BigQuery Storage API from the IPython magics for BigQuery in a Jupyter notebook.

Download query results to a pandas DataFrame by using the BigQuery client library for Python. Download BigQuery table data to a pandas DataFrame by using the BigQuery client library for Python.

Download BigQuery table data to a pandas DataFrame by using the BigQuery Storage API client library for Python.

https://googleapis.dev/python/bigquery/latest/magics.html#ipython-magics-for-bigquery

https://cloud.google.com/bigquery/docs/bigquery-storage-python-pandas
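
A minimal notebook sketch of answer A; the BigQuery cell magic runs the query and returns the result directly as a pandas DataFrame (the project, table, and column names are illustrative):

```python
# Notebook cell 1: load the BigQuery magics (pre-installed on AI Platform Notebooks).
%load_ext google.cloud.bigquery
```

```python
%%bigquery campaign_df
-- Notebook cell 2: the query results are returned as a pandas DataFrame named campaign_df.
SELECT channel, SUM(clicks) AS total_clicks
FROM `my_project.marketing.campaign_events`  -- illustrative table name
GROUP BY channel
```

```python
# Notebook cell 3: manipulate the results with pandas as usual.
campaign_df.sort_values("total_clicks", ascending=False).head()
```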

Question #17

You have trained a model on a dataset that required computationally expensive preprocessing operations. You need to execute the same preprocessing at prediction time. You deployed the model on AI Platform for high-throughput online prediction.

Which architecture should you use?

  • A . • Validate the accuracy of the model that you trained on preprocessed data
    • Create a new model that uses the raw data and is available in real time
    • Deploy the new model onto AI Platform for online prediction
  • B . • Send incoming prediction requests to a Pub/Sub topic
    • Transform the incoming data using a Dataflow job
    • Submit a prediction request to AI Platform using the transformed data
    • Write the predictions to an outbound Pub/Sub queue
  • C . • Stream incoming prediction request data into Cloud Spanner
    • Create a view to abstract your preprocessing logic.
    • Query the view every second for new records
    • Submit a prediction request to AI Platform using the transformed data
    • Write the predictions to an outbound Pub/Sub queue.
  • D . • Send incoming prediction requests to a Pub/Sub topic
    • Set up a Cloud Function that is triggered when messages are published to the Pub/Sub topic.
    • Implement your preprocessing logic in the Cloud Function
    • Submit a prediction request to AI Platform using the transformed data
    • Write the predictions to an outbound Pub/Sub queue

Correct Answer: B

Explanation:

https://cloud.google.com/architecture/data-preprocessing-for-ml-with-tf-transform-pt1#where_to_do_preprocessing

Question #18

You are building a model to predict daily temperatures. You split the data randomly and then transformed the training and test datasets. Temperature data for model training is uploaded hourly. During testing, your model performed with 97% accuracy; however, after deploying to production, the model’s accuracy dropped to 66%.

How can you make your production model more accurate?

  • A . Normalize the data for the training, and test datasets as two separate steps.
  • B . Split the training and test data based on time rather than a random split to avoid leakage
  • C . Add more data to your test set to ensure that you have a fair distribution and sample for testing
  • D . Apply data transformations before splitting, and cross-validate to make sure that the transformations are applied to both the training and test sets.

Correct Answer: B

Explanation:

https://community.rapidminer.com/discussion/32592/normalising-data-before-data-split-or-after
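
A minimal pandas sketch of a time-based split (column names are illustrative): every test row comes strictly after every training row, so no future information leaks into training or into the fitted transformations:

```python
import pandas as pd

def time_based_split(df, timestamp_col="measured_at", test_frac=0.2):
    """Sort by time and hold out the most recent slice as the test set."""
    df = df.sort_values(timestamp_col)
    cutoff = int(len(df) * (1 - test_frac))
    return df.iloc[:cutoff], df.iloc[cutoff:]
```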

Question #19

You have a demand forecasting pipeline in production that uses Dataflow to preprocess raw data prior to model training and prediction. During preprocessing, you employ Z-score normalization on data stored in BigQuery and write it back to BigQuery. New training data is added every week. You want to make the process more efficient by minimizing computation time and manual intervention.

What should you do?

  • A . Normalize the data using Google Kubernetes Engine
  • B . Translate the normalization algorithm into SQL for use with BigQuery
  • C . Use the normalizer_fn argument in TensorFlow’s Feature Column API
  • D . Normalize the data with Apache Spark using the Dataproc connector for BigQuery

Correct Answer: B
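
A minimal sketch of answer B, expressing the Z-score normalization directly in BigQuery SQL with analytic functions so no data leaves BigQuery; the project, dataset, table, and column names are illustrative, and the Python client is used here only to submit the statement:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Z-score normalization expressed in SQL, so the computation stays inside BigQuery.
query = """
CREATE OR REPLACE TABLE `my_project.demand.features_normalized` AS
SELECT
  *,
  (units_sold - AVG(units_sold) OVER ()) / STDDEV(units_sold) OVER () AS units_sold_z
FROM `my_project.demand.features_raw`
"""
client.query(query).result()  # blocks until the job finishes
```

A scheduled query can rerun this statement each week when new training data arrives.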
Question #20

You were asked to investigate failures of a production line component based on sensor readings. After receiving the dataset, you discover that less than 1% of the readings are positive examples representing failure incidents. You have tried to train several classification models, but none of them converge.

How should you resolve the class imbalance problem?

  • A . Use the class distribution to generate 10% positive examples
  • B . Use a convolutional neural network with max pooling and softmax activation
  • C . Downsample the data with upweighting to create a sample with 10% positive examples
  • D . Remove negative examples until the numbers of positive and negative examples are equal

Correct Answer: C

Explanation:

https://developers.google.com/machine-learning/data-prep/construct/sampling-splitting/imbalanced-data#downsampling-and-upweighting

https://developers.google.com/machine-learning/data-prep/construct/sampling-splitting/imbalanced-data
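
A minimal pandas sketch of answer C, downsampling the majority (negative) class and upweighting the examples that are kept; the label column name and the 10% target are illustrative:

```python
import pandas as pd

def downsample_with_upweight(df, label_col="failure", target_pos_frac=0.10, seed=7):
    """Downsample negatives so positives make up ~10% of the data; upweight kept negatives."""
    pos = df[df[label_col] == 1]
    neg = df[df[label_col] == 0]
    n_neg_keep = int(len(pos) * (1 - target_pos_frac) / target_pos_frac)
    downsample_factor = len(neg) / n_neg_keep
    neg_kept = neg.sample(n=n_neg_keep, random_state=seed)
    out = pd.concat([pos, neg_kept]).sample(frac=1, random_state=seed)  # shuffle
    # Upweight the downsampled class by the same factor it was downsampled by.
    out["example_weight"] = out[label_col].map({1: 1.0, 0: downsample_factor})
    return out
```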

Question #21

You need to design a customized deep neural network in Keras that will predict customer purchases based on their purchase history. You want to explore model performance using multiple model architectures, store training data, and be able to compare the evaluation metrics in the same dashboard.

What should you do?

  • A . Create multiple models using AutoML Tables
  • B . Automate multiple training runs using Cloud Composer
  • C . Run multiple training jobs on AI Platform with similar job names
  • D . Create an experiment in Kubeflow Pipelines to organize multiple runs

Correct Answer: D

Explanation:

https://www.kubeflow.org/docs/components/pipelines/concepts/experiment/

https://www.kubeflow.org/docs/components/pipelines/concepts/run/

Question #22

Your team needs to build a model that predicts whether images contain a driver’s license, passport, or credit card. The data engineering team already built the pipeline and generated a dataset composed of 10,000 images with driver’s licenses, 1,000 images with passports, and 1,000 images with credit cards. You now have to train a model with the following label map: [‘drivers_license’, ‘passport’, ‘credit_card’].

Which loss function should you use?

  • A . Categorical hinge
  • B . Binary cross-entropy
  • C . Categorical cross-entropy
  • D . Sparse categorical cross-entropy

Correct Answer: C

Explanation:

Categorical cross-entropy is better to use when you want to prevent the model from giving more importance to a certain class, or when the classes are very unbalanced, and it expects one-hot encoded labels.

Sparse categorical cross-entropy is the better choice when you have a huge number of classes, because it takes integer labels instead of one-hot vectors and therefore uses less memory. https://stats.stackexchange.com/questions/326065/cross-entropy-vs-sparse-cross-entropy-when-to-use-one-over-the-other
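
A minimal Keras sketch of the distinction: with one-hot labels, as implied by the three-class label map, compile with categorical cross-entropy; integer labels would call for the sparse variant instead. The model body is only a placeholder:

```python
import tensorflow as tf

num_classes = 3  # ['drivers_license', 'passport', 'credit_card']

model = tf.keras.Sequential([
    tf.keras.Input(shape=(224, 224, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),          # placeholder feature extractor
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])

# One-hot labels, e.g. [0, 1, 0]  -> categorical cross-entropy
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# Integer labels, e.g. 1          -> sparse categorical cross-entropy instead
# model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```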

Question #23

You are an ML engineer at a global car manufacturer. You need to build an ML model to predict car sales in different cities around the world.

Which features or feature crosses should you use to train city-specific relationships between car type and number of sales?

  • A . Three individual features binned latitude, binned longitude, and one-hot encoded car type
  • B . One feature obtained as an element-wise product between latitude, longitude, and car type
  • C . One feature obtained as an element-wise product between binned latitude, binned longitude, and one-hot encoded car type
  • D . Two feature crosses as an element-wise product: the first between binned latitude and one-hot encoded car type, and the second between binned longitude and one-hot encoded car type

Correct Answer: C

Explanation:

https://developers.google.com/machine-learning/crash-course/feature-crosses/check-your-understanding

https://developers.google.com/machine-learning/crash-course/feature-crosses/video-lecture

https://developers.google.com/machine-learning/crash-course/feature-crosses/check-your-understanding

Question #24

You trained a text classification model.

You have the following SignatureDefs:

What is the correct way to write the predict request?

  • A . data = json.dumps({"signature_name": "serving_default", "instances": [['ab', 'bc', 'cd']]})
  • B . data = json.dumps({"signature_name": "serving_default", "instances": [['a', 'b', 'c', 'd', 'e', 'f']]})
  • C . data = json.dumps({"signature_name": "serving_default", "instances": [['a', 'b', 'c'], ['d', 'e', 'f']]})
  • D . data = json.dumps({"signature_name": "serving_default", "instances": [['a', 'b'], ['c', 'd'], ['e', 'f']]})

Correct Answer: D

Explanation:

https://stackoverflow.com/questions/37956197/what-is-the-negative-index-in-shape-arrays-used-for-tensorflow
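
For reference, a minimal sketch of building the request in answer D and sending it to a TensorFlow Serving REST endpoint; the host, port, and model name are illustrative:

```python
import json
import requests

# Three instances, each a sequence of two tokens, matching answer D's shape.
data = json.dumps({
    "signature_name": "serving_default",
    "instances": [["a", "b"], ["c", "d"], ["e", "f"]],
})

# Illustrative TensorFlow Serving REST endpoint.
url = "http://localhost:8501/v1/models/text_classifier:predict"
response = requests.post(url, data=data, headers={"content-type": "application/json"})
print(response.json())
```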

Question #25

You work for a social media company. You need to detect whether posted images contain cars. Each training example is a member of exactly one class. You have trained an object detection neural network and deployed the model version to AI Platform Prediction for evaluation. Before deployment, you created an evaluation job and attached it to the AI Platform Prediction model version. You notice that the precision is lower than your business requirements allow.

How should you adjust the model’s final layer softmax threshold to increase precision?

  • A . Increase the recall
  • B . Decrease the recall.
  • C . Increase the number of false positives
  • D . Decrease the number of false negatives

Correct Answer: D
Question #26

You developed an ML model with AI Platform, and you want to move it to production. You serve a few thousand queries per second and are experiencing latency issues. Incoming requests are served by a load balancer that distributes them across multiple Kubeflow CPU-only pods running on Google Kubernetes Engine (GKE). Your goal is to improve the serving latency without changing the underlying infrastructure.

What should you do?

  • A . Significantly increase the max_batch_size TensorFlow Serving parameter
  • B . Switch to the tensorflow-model-server-universal version of TensorFlow Serving
  • C . Significantly increase the max_enqueued_batches TensorFlow Serving parameter
  • D . Recompile TensorFlow Serving using the source to support CPU-specific optimizations. Instruct GKE to choose an appropriate baseline minimum CPU platform for serving nodes

Correct Answer: D

Explanation:

https://www.tensorflow.org/tfx/serving/performance

Question #27

You built and manage a production system that is responsible for predicting sales numbers. Model accuracy is crucial, because the production model is required to keep up with market changes. Since being deployed to production, the model hasn’t changed; however, the accuracy of the model has steadily deteriorated.

What issue is most likely causing the steady decline in model accuracy?

  • A . Poor data quality
  • B . Lack of model retraining
  • C . Too few layers in the model for capturing information
  • D . Incorrect data split ratio during model training, evaluation, validation, and test

Correct Answer: B

Explanation:

Retraining is needed because the market is changing; retraining is how the model stays up to date and keeps its predictions accurate.

Question #28

You are an ML engineer at a large grocery retailer with stores in multiple regions. You have been asked to create an inventory prediction model. Your model's features include region, location, historical demand, and seasonal popularity. You want the algorithm to learn from new inventory data on a daily basis.

Which algorithms should you use to build the model?

  • A . Classification
  • B . Reinforcement Learning
  • C . Recurrent Neural Networks (RNN)
  • D . Convolutional Neural Networks (CNN)

Correct Answer: C

Explanation:

"algorithm to learn from new inventory data on a daily basis" = time series model, best option to deal with time series is forsure RNN

https://builtin.com/data-science/recurrent-neural-networks-and-lstm

Question #29

You need to train a computer vision model that predicts the type of government ID present in a given image using a GPU-powered virtual machine on Compute Engine.

You use the following parameters:

• Optimizer: SGD

• Image shape = 224×224

• Batch size = 64

• Epochs = 10

• Verbose = 2

During training you encounter the following error: ResourceExhaustedError: Out of memory (OOM) when allocating tensor.

What should you do?

  • A . Change the optimizer
  • B . Reduce the batch size
  • C . Change the learning rate
  • D . Reduce the image shape

Correct Answer: B

Explanation:

Reference:

https://github.com/tensorflow/tensorflow/issues/136

https://stackoverflow.com/questions/59394947/how-to-fix-resourceexhaustederror-oom-when-allocating-tensor/59395251#:~:text=OOM%20stands%20for%20%22out%20of,in%20your%20Dense%20%2C%20Conv2D%20layers

Question #30

You have been asked to develop an input pipeline for an ML training model that processes images from disparate sources at a low latency. You discover that your input data does not fit in memory.

How should you create a dataset following Google-recommended best practices?

  • A . Create a tf.data.Dataset.prefetch transformation
  • B . Convert the images to tf.Tensor objects, and then run Dataset.from_tensor_slices().
  • C . Convert the images to tf.Tensor objects, and then run tf.data.Dataset.from_tensors().
  • D . Convert the images into TFRecords, store the images in Cloud Storage, and then use the tf.data API to read the images for training

Correct Answer: D

Explanation:

Quoting the Google documentation: to construct a Dataset from data in memory, use tf.data.Dataset.from_tensors() or tf.data.Dataset.from_tensor_slices(). When input data is stored in a file (not in memory) in the recommended TFRecord format, you can use tf.data.TFRecordDataset(). In short, tf.data.Dataset.from_tensors/from_tensor_slices is for data in memory, and tf.data.TFRecordDataset is for data in non-memory storage. https://cloud.google.com/architecture/ml-on-gcp-best-practices#store-image-video-audio-and-unstructured-data-on-cloud-storage

" Store image, video, audio and unstructured data on Cloud Storage Store these data in large container formats on Cloud Storage. This applies to sharded TFRecord files if you’re using TensorFlow, or Avro files if you’re using any other framework. Combine many individual images, videos, or audio clips into large files, as this will improve your read and write throughput to Cloud Storage. Aim for files of at least 100mb, and between 100 and 10,000 shards. To enable data management, use Cloud Storage buckets and directories to group the shards. "

Question #31

You are building an ML model to detect anomalies in real-time sensor data. You will use Pub/Sub to handle incoming requests.

You want to store the results for analytics and visualization.

How should you configure the pipeline?

  • A . 1 = Dataflow, 2 = AI Platform, 3 = BigQuery
  • B . 1 = Dataproc, 2 = AutoML, 3 = Cloud Bigtable
  • C . 1 = BigQuery, 2 = AutoML, 3 = Cloud Functions
  • D . 1 = BigQuery, 2 = AI Platform, 3 = Cloud Storage

Correct Answer: A

Explanation:

Reference: https://cloud.google.com/solutions/building-anomaly-detection-dataflow-bigqueryml-dlp

Question #32

You have a functioning end-to-end ML pipeline that involves tuning the hyperparameters of your ML model using AI Platform, and then using the best-tuned parameters for training. Hypertuning is taking longer than expected and is delaying the downstream processes. You want to speed up the tuning job without significantly compromising its effectiveness.

Which actions should you take? Choose 2 answers

  • A . Decrease the number of parallel trials
  • B . Decrease the range of floating-point values
  • C . Set the early stopping parameter to TRUE
  • D . Change the search algorithm from Bayesian search to random search.
  • E . Decrease the maximum number of trials during subsequent training phases.

Correct Answer: CE

Explanation:

Reference:

https://cloud.google.com/ai-platform/training/docs/hyperparameter-tuning-overview

https://cloud.google.com/ai-platform/training/docs/using-hyperparameter-tuning#early-stopping

Question #33

You have written unit tests for a Kubeflow Pipeline that require custom libraries. You want to automate the execution of unit tests with each new push to your development branch in Cloud Source Repositories.

What should you do?

  • A . Write a script that sequentially performs the push to your development branch and executes the unit tests on Cloud Run
  • B . Using Cloud Build, set an automated trigger to execute the unit tests when changes are pushed to your development branch.
  • C . Set up a Cloud Logging sink to a Pub/Sub topic that captures interactions with Cloud Source Repositories. Configure a Pub/Sub trigger for Cloud Run, and execute the unit tests on Cloud Run.
  • D . Set up a Cloud Logging sink to a Pub/Sub topic that captures interactions with Cloud Source Repositories. Execute the unit tests using a Cloud Function that is triggered when messages are sent to the Pub/Sub topic

Correct Answer: B

Explanation:

https://cloud.google.com/architecture/architecture-for-mlops-using-tfx-kubeflow-pipelines-and-cloud-build#cicd_architecture

Question #34

You have trained a deep neural network model on Google Cloud. The model has low loss on the training data, but is performing worse on the validation data. You want the model to be resilient to overfitting.

Which strategy should you use when retraining the model?

  • A . Apply a dropout parameter of 0.2, and decrease the learning rate by a factor of 10
  • B . Apply an L2 regularization parameter of 0.4, and decrease the learning rate by a factor of 10.
  • C . Run a hyperparameter tuning job on AI Platform to optimize for the L2 regularization and dropout parameters
  • D . Run a hyperparameter tuning job on AI Platform to optimize for the learning rate, and increase the number of neurons by a factor of 2.

Correct Answer: B

Explanation:

Applying an L2 regularization parameter of 0.4 and decreasing the learning rate by a factor of 10 can help to reduce overfitting and make the model more resilient. Source: Google Cloud
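
A minimal Keras sketch of answer B; the layer sizes are illustrative, while the L2 factor of 0.4 and the tenfold learning-rate reduction (for example 0.01 to 0.001) follow the answer:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(100,)),  # illustrative input width
    tf.keras.layers.Dense(
        64, activation="relu",
        kernel_regularizer=tf.keras.regularizers.l2(0.4)),  # L2 parameter from answer B
    tf.keras.layers.Dense(1),
])

# Learning rate decreased by a factor of 10 (e.g. 0.01 -> 0.001).
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), loss="mse")
```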

Question #35

You are training a ResNet model on AI Platform using TPUs to visually categorize types of defects in automobile engines. You capture the training profile using the Cloud TPU profiler plugin and observe that it is highly input-bound. You want to reduce the bottleneck and speed up your model training process.

Which modifications should you make to the tf .data dataset? Choose 2 answers

  • A . Use the interleave option for reading data
  • B . Reduce the value of the repeat parameter
  • C . Increase the buffer size for the shuffle option.
  • D . Set the prefetch option equal to the training batch size
  • E . Decrease the batch size argument in your transformation

Correct Answer: DE

Explanation:

https://towardsdatascience.com/overcoming-data-preprocessing-bottlenecks-with-tensorflow-data-service-nvidia-dali-and-other-d6321917f851
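
A minimal tf.data sketch of the chosen modifications; the file path and the new batch size are illustrative:

```python
import tensorflow as tf

batch_size = 32  # answer E: pass a smaller batch size argument than before

files = tf.io.gfile.glob("gs://my-bucket/defects/train-*.tfrecord")  # illustrative path
dataset = (
    tf.data.TFRecordDataset(files)
      .shuffle(10_000)
      .batch(batch_size)
      .prefetch(batch_size)  # answer D: prefetch option set equal to the training batch size
)
```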

Question #36

You work for a public transportation company and need to build a model to estimate delay times for multiple transportation routes. Predictions are served directly to users in an app in real time. Because different seasons and population increases impact the data relevance, you will retrain the model every month. You want to follow Google-recommended best practices.

How should you configure the end-to-end architecture of the predictive model?

  • A . Configure Kubeflow Pipelines to schedule your multi-step workflow from training to deploying your model.
  • B . Use a model trained and deployed on BigQuery ML and trigger retraining with the scheduled query feature in BigQuery
  • C . Write a Cloud Functions script that launches a training and deploying job on AI Platform that is triggered by Cloud Scheduler
  • D . Use Cloud Composer to programmatically schedule a Dataflow job that executes the workflow from training to deploying your model

Correct Answer: A

Explanation:

https://www.kubeflow.org/docs/components/pipelines/overview/pipelines-overview/

https://medium.com/google-cloud/how-to-build-an-end-to-end-propensity-to-purchase-solution-using-bigquery-ml-and-kubeflow-pipelines-cd4161f734d9#75c7

Question #37

You are an ML engineer at a global shoe store. You manage the ML models for the company’s website. You are asked to build a model that will recommend new products to the user based on their purchase behavior and similarity with other users.

What should you do?

  • A . Build a classification model
  • B . Build a knowledge-based filtering model
  • C . Build a collaborative-based filtering model
  • D . Build a regression model using the features as predictors

Correct Answer: C

Explanation:

Reference:

https://cloud.google.com/solutions/recommendations-using-machine-learning-on-compute-engine

https://developers.google.com/machine-learning/recommendation/collaborative/basics

https://cloud.google.com/architecture/recommendations-using-machine-learning-on-compute-engine#filtering_the_data

Question #38

You are training an LSTM-based model on AI Platform to summarize text using the following job submission script:

You want to ensure that training time is minimized without significantly compromising the accuracy of your model.

What should you do?

  • A . Modify the ‘epochs’ parameter
  • B . Modify the ‘scale-tier’ parameter
  • C . Modify the ‘batch size’ parameter
  • D . Modify the ‘learning rate’ parameter

Correct Answer: B

Explanation:

https://cloud.google.com/ai-platform/training/docs/machine-types#scale_tiers

Google may optimize the configuration of the scale tiers for different jobs over time, based on customer feedback and the availability of cloud resources. Each scale tier is defined in terms of its suitability for certain types of jobs. Generally, the more advanced the tier, the more machines are allocated to the cluster, and the more powerful the specifications of each virtual machine. As you increase the complexity of the scale tier, the hourly cost of training jobs, measured in training units, also increases. See the pricing page to calculate the cost of your job.

Question #39

You are training a TensorFlow model on a structured data set with 100 billion records stored in several CSV files. You need to improve the input/output execution performance.

What should you do?

  • A . Load the data into BigQuery and read the data from BigQuery.
  • B . Load the data into Cloud Bigtable, and read the data from Bigtable
  • C . Convert the CSV files into shards of TFRecords, and store the data in Cloud Storage
  • D . Convert the CSV files into shards of TFRecords, and store the data in the Hadoop Distributed File System (HDFS)

Correct Answer: C

Explanation:

Reference: https://cloud.google.com/dataflow/docs/guides/templates/provided-batch
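
A minimal sketch of answer C: converting CSV rows into sharded TFRecord files written to Cloud Storage. The paths, shard count, and column layout are illustrative:

```python
import csv
import tensorflow as tf

NUM_SHARDS = 256  # illustrative shard count

def row_to_example(row):
    """Encode one CSV row (features..., label) as a tf.train.Example."""
    features = [float(v) for v in row[:-1]]
    label = int(row[-1])
    return tf.train.Example(features=tf.train.Features(feature={
        "features": tf.train.Feature(float_list=tf.train.FloatList(value=features)),
        "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
    }))

writers = [
    tf.io.TFRecordWriter(f"gs://my-bucket/tfrecords/train-{i:05d}-of-{NUM_SHARDS:05d}.tfrecord")
    for i in range(NUM_SHARDS)
]
with tf.io.gfile.GFile("gs://my-bucket/csv/train.csv", "r") as f:
    reader = csv.reader(f)
    next(reader, None)  # skip the header row if present
    for i, row in enumerate(reader):
        writers[i % NUM_SHARDS].write(row_to_example(row).SerializeToString())
for w in writers:
    w.close()
```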

Question #40

You have deployed multiple versions of an image classification model on AI Platform. You want to monitor the performance of the model versions over time.

How should you perform this comparison?

  • A . Compare the loss performance for each model on a held-out dataset.
  • B . Compare the loss performance for each model on the validation data
  • C . Compare the receiver operating characteristic (ROC) curve for each model using the What-If Tool
  • D . Compare the mean average precision across the models using the Continuous Evaluation feature

Correct Answer: D

Explanation:

https://cloud.google.com/ai-platform/prediction/docs/continuous-evaluation/view-metrics

Question #41

Your team trained and tested a DNN regression model with good results. Six months after deployment, the model is performing poorly due to a change in the distribution of the input data.

How should you address the input differences in production?

  • A . Create alerts to monitor for skew, and retrain the model.
  • B . Perform feature selection on the model, and retrain the model with fewer features
  • C . Retrain the model, and select an L2 regularization parameter with a hyperparameter tuning service
  • D . Perform feature selection on the model, and retrain the model on a monthly basis with fewer features

Correct Answer: A

Explanation:

Data drift doesn’t necessarily require feature reselection or added regularization (for example, L2). https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning#challenges

Data values skews: These skews are significant changes in the statistical properties of data, which means that data patterns are changing, and you need to trigger a retraining of the model to capture these changes. https://developers.google.com/machine-learning/guides/rules-of-ml/#rule_37_measure_trainingserving_skew

Question #42

You manage a team of data scientists who use a cloud-based backend system to submit training jobs.

This system has become very difficult to administer, and you want to use a managed service instead.

The data scientists you work with use many different frameworks, including Keras, PyTorch, Theano, scikit-learn, and custom libraries.

What should you do?

  • A . Use the AI Platform custom containers feature to receive training jobs using any framework
  • B . Configure Kubeflow to run on Google Kubernetes Engine and receive training jobs through TFJob
  • C . Create a library of VM images on Compute Engine, and publish these images on a centralized repository
  • D . Set up Slurm workload manager to receive jobs that can be scheduled to run on your cloud infrastructure.

Correct Answer: A

Explanation:

AI Platform supports all the frameworks mentioned, and Kubeflow is not a managed service in GCP. https://cloud.google.com/ai-platform/training/docs/getting-started-pytorch

https://cloud.google.com/ai-platform/training/docs/containers-overview#advantages_of_custom_containers

Use the ML framework of your choice. If you can’t find an AI Platform Training runtime version that supports the ML framework you want to use, then you can build a custom container that installs your chosen framework and use it to run jobs on AI Platform Training.

Question #43

You are developing a Kubeflow pipeline on Google Kubernetes Engine. The first step in the pipeline is to issue a query against BigQuery. You plan to use the results of that query as the input to the next step in your pipeline. You want to achieve this in the easiest way possible.

What should you do?

  • A . Use the BigQuery console to execute your query, and then save the query results into a new BigQuery table.
  • B . Write a Python script that uses the BigQuery API to execute queries against BigQuery. Execute this script as the first step in your Kubeflow pipeline
  • C . Use the Kubeflow Pipelines domain-specific language to create a custom component that uses the Python BigQuery client library to execute queries
  • D . Locate the Kubeflow Pipelines repository on GitHub. Find the BigQuery Query Component, copy that component’s URL, and use it to load the component into your pipeline. Use the component to execute queries against BigQuery

Correct Answer: D

Explanation:

https://linuxtut.com/en/f4771efee37658c083cc/

https://github.com/kubeflow/pipelines/blob/master/components/gcp/bigquery/query/sample.ipynb

https://v0-5.kubeflow.org/docs/pipelines/reusable-components/
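
A minimal Kubeflow Pipelines (v1 SDK) sketch of answer D. The component URL follows the kubeflow/pipelines repository layout referenced above, and both it and the component's parameter names should be treated as assumptions to verify against the repository; the query and output path are illustrative:

```python
import kfp
from kfp import components, dsl

# Reusable BigQuery query component from the kubeflow/pipelines repo (assumed URL).
bigquery_query_op = components.load_component_from_url(
    "https://raw.githubusercontent.com/kubeflow/pipelines/master/"
    "components/gcp/bigquery/query/component.yaml")

@dsl.pipeline(name="bq-first-step", description="Query BigQuery, then consume the results.")
def pipeline(project_id: str = "my-project"):
    query_task = bigquery_query_op(
        query="SELECT * FROM `my-project.analytics.events` LIMIT 1000",  # illustrative query
        project_id=project_id,
        output_gcs_path="gs://my-bucket/kfp/query-results.csv")          # illustrative path
    # Downstream steps consume query_task.outputs as their input.

if __name__ == "__main__":
    kfp.compiler.Compiler().compile(pipeline, "pipeline.yaml")
```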

Question #44

You are developing ML models with Al Platform for image segmentation on CT scans. You frequently update your model architectures based on the newest available research papers, and have to rerun training on the same dataset to benchmark their performance. You want to minimize computation costs and manual intervention while having version control for your code.

What should you do?

  • A . Use Cloud Functions to identify changes to your code in Cloud Storage and trigger a retraining job
  • B . Use the gcloud command-line tool to submit training jobs on AI Platform when you update your code
  • C . Use Cloud Build linked with Cloud Source Repositories to trigger retraining when new code is pushed to the repository
  • D . Create an automated workflow in Cloud Composer that runs daily and looks for changes in code in Cloud Storage using a sensor.

Correct Answer: C

Explanation:

This describes CI/CD for Kubeflow pipelines. At the heart of this architecture is Cloud Build. Cloud Build can import source from Cloud Source Repositories, GitHub, or Bitbucket, then execute a build to your specifications and produce artifacts such as Docker containers or Python tar files. https://cloud.google.com/architecture/architecture-for-mlops-using-tfx-kubeflow-pipelines-and-cloud-build#cicd_architecture

Question #45

Your organization’s call center has asked you to develop a model that analyzes customer sentiments in each call. The call center receives over one million calls daily, and data is stored in Cloud Storage. The data collected must not leave the region in which the call originated, and no Personally Identifiable Information (PII) can be stored or analyzed. The data science team has a third-party tool for visualization and access which requires a SQL ANSI-2011 compliant interface. You need to select components for data processing and for analytics.

How should the data pipeline be designed?

  • A . 1 = Dataflow, 2 = BigQuery
  • B . 1 = Pub/Sub, 2 = Datastore
  • C . 1 = Dataflow, 2 = Cloud SQL
  • D . 1 = Cloud Function, 2 = Cloud SQL

Correct Answer: A

Explanation:

https://github.com/GoogleCloudPlatform/dataflow-contact-center-speech-analysis

Question #46

You work for an online retail company that is creating a visual search engine. You have set up an end-to-end ML pipeline on Google Cloud to classify whether an image contains your company’s product. Expecting the release of new products in the near future, you configured a retraining functionality in the pipeline so that new data can be fed into your ML models. You also want to use AI Platform’s continuous evaluation service to ensure that the models have high accuracy on your test data set.

What should you do?

  • A . Keep the original test dataset unchanged even if newer products are incorporated into retraining
  • B . Extend your test dataset with images of the newer products when they are introduced to retraining
  • C . Replace your test dataset with images of the newer products when they are introduced to retraining.
  • D . Update your test dataset with images of the newer products when your evaluation metrics drop below a pre-decided threshold.

Correct Answer: B
Question #47

You are responsible for building a unified analytics environment across a variety of on-premises data marts. Your company is experiencing data quality and security challenges when integrating data across the servers, caused by the use of a wide range of disconnected tools and temporary solutions. You need a fully managed, cloud-native data integration service that will lower the total cost of work and reduce repetitive work. Some members of your team prefer a codeless interface for building Extract, Transform, Load (ETL) processes.

Which service should you use?

  • A . Dataflow
  • B . Dataprep
  • C . Apache Flink
  • D . Cloud Data Fusion

Correct Answer: D

Explanation:

https://cloud.google.com/data-fusion/docs/concepts/overview#using_the_code-free_web_ui

Question #48

You want to rebuild your ML pipeline for structured data on Google Cloud. You are using PySpark to conduct data transformations at scale, but your pipelines are taking over 12 hours to run. To speed up development and pipeline run time, you want to use a serverless tool and SQL syntax. You have already moved your raw data into Cloud Storage.

How should you build the pipeline on Google Cloud while meeting the speed and processing requirements?

  • A . Use Data Fusion’s GUI to build the transformation pipelines, and then write the data into BigQuery
  • B . Convert your PySpark into SparkSQL queries to transform the data and then run your pipeline on Dataproc to write the data into BigQuery.
  • C . Ingest your data into Cloud SQL, convert your PySpark commands into SQL queries to transform the data, and then use federated queries from BigQuery for machine learning
  • D . Ingest your data into BigQuery using BigQuery Load, convert your PySpark commands into BigQuery SQL queries to transform the data, and then write the transformations to a new table

Correct Answer: D

Explanation:

Cloud Data Fusion is acquired software, and support for the tool is weaker. SQL can be used in Data Fusion pipelines too, but it is preferable to use a single tool like BigQuery to both transform and store the data.
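
A minimal sketch of answer D with the BigQuery Python client: load the raw CSVs from Cloud Storage with a load job, then materialize the transformation as a new table with SQL. The project, dataset, and query are illustrative:

```python
from google.cloud import bigquery

client = bigquery.Client()

# 1. BigQuery Load: ingest raw CSVs from Cloud Storage into a staging table.
load_job = client.load_table_from_uri(
    "gs://my-bucket/raw/*.csv",
    "my_project.staging.events_raw",
    job_config=bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        autodetect=True,
        skip_leading_rows=1,
    ),
)
load_job.result()

# 2. Transform with SQL (replacing the PySpark logic) and write to a new table.
client.query("""
CREATE OR REPLACE TABLE `my_project.analytics.events_features` AS
SELECT user_id, COUNT(*) AS event_count, MAX(event_ts) AS last_event_ts
FROM `my_project.staging.events_raw`
GROUP BY user_id
""").result()
```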

Question #49

You are building a real-time prediction engine that streams files which may contain Personally Identifiable Information (PII) to Google Cloud. You want to use the Cloud Data Loss Prevention (DLP) API to scan the files.

How should you ensure that the PII is not accessible by unauthorized individuals?

  • A . Stream all files to Google Cloud, and then write the data to BigQuery. Periodically conduct a bulk scan of the table using the DLP API.
  • B . Stream all files to Google Cloud, and write batches of the data to BigQuery. While the data is being written to BigQuery, conduct a bulk scan of the data using the DLP API.
  • C . Create two buckets of data: Sensitive and Non-sensitive. Write all data to the Non-sensitive bucket. Periodically conduct a bulk scan of that bucket using the DLP API, and move the sensitive data to the Sensitive bucket.
  • D . Create three buckets of data: Quarantine, Sensitive, and Non-sensitive. Write all data to the Quarantine bucket. Periodically conduct a bulk scan of that bucket using the DLP API, and move the data to either the Sensitive or Non-sensitive bucket.

Correct Answer: A
Question #50

You are designing an ML recommendation model for shoppers on your company’s ecommerce website. You will use Recommendations AI to build, test, and deploy your system.

How should you develop recommendations that increase revenue while following best practices?

  • A . Use the "Other Products You May Like" recommendation type to increase the click-through rate
  • B . Use the "Frequently Bought Together’ recommendation type to increase the shopping cart size for each order.
  • C . Import your user events and then your product catalog to make sure you have the highest quality event stream
  • D . Because it will take time to collect and record product data, use placeholder values for the product catalog to test the viability of the model.

Correct Answer: B

Explanation:

"Frequently bought together" recommendations aim to up-sell and cross-sell customers by suggesting products that complement what is already in the cart, which increases the shopping cart size for each order.

Reference: https://rejoiner.com/resources/amazon-recommendations-secret-selling-online/

Question #51

You are designing an architecture with a serverless ML system to enrich customer support tickets with informative metadata before they are routed to a support agent. You need a set of models to predict ticket priority, predict ticket resolution time, and perform sentiment analysis to help agents make strategic decisions when they process support requests. Tickets are not expected to have any domain-specific terms or jargon.

The proposed architecture has the following flow:

Which endpoints should the Enrichment Cloud Functions call?

  • A . 1 = Vertex AI, 2 = Vertex AI, 3 = AutoML Natural Language
  • B . 1 = Vertex AI, 2 = Vertex AI, 3 = Cloud Natural Language API
  • C . 1 = Vertex AI, 2 = Vertex AI, 3 = AutoML Vision
  • D . 1 = Cloud Natural Language API, 2 = Vertex AI, 3 = Cloud Vision API

Correct Answer: B

Explanation:

https://cloud.google.com/architecture/architecture-of-a-serverless-ml-model#architecture

The architecture has the following flow:

  • A user writes a ticket to Firebase, which triggers a Cloud Function.
  • The Cloud Function calls 3 different endpoints to enrich the ticket:
    • An AI Platform endpoint, where the function can predict the priority.
    • An AI Platform endpoint, where the function can predict the resolution time.
    • The Natural Language API to do sentiment analysis and word salience.
  • For each reply, the Cloud Function updates the Firebase real-time database.
  • The Cloud Function then creates a ticket in the helpdesk platform using the RESTful API.
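
A minimal sketch of calling endpoint 3, the Cloud Natural Language API, for sentiment from the enrichment Cloud Function; the ticket text is illustrative:

```python
from google.cloud import language_v1

def analyze_ticket_sentiment(text: str):
    """Return the document-level sentiment score and magnitude for a ticket."""
    client = language_v1.LanguageServiceClient()
    document = language_v1.Document(
        content=text, type_=language_v1.Document.Type.PLAIN_TEXT)
    sentiment = client.analyze_sentiment(request={"document": document}).document_sentiment
    return sentiment.score, sentiment.magnitude

print(analyze_ticket_sentiment("My order arrived late and the box was damaged."))
```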

Question #52

You work with a data engineering team that has developed a pipeline to clean your dataset and save it in a Cloud Storage bucket. You have created an ML model and want to use the data to refresh your model as soon as new data is available. As part of your CI/CD workflow, you want to automatically run a Kubeflow Pipelines training job on Google Kubernetes Engine (GKE).

How should you architect this workflow?

  • A . Configure your pipeline with Dataflow, which saves the files in Cloud Storage. After the file is saved, start the training job on a GKE cluster
  • B . Use App Engine to create a lightweight python client that continuously polls Cloud Storage for new files. As soon as a file arrives, initiate the training job
  • C . Configure a Cloud Storage trigger to send a message to a Pub/Sub topic when a new file is available in a storage bucket. Use a Pub/Sub-triggered Cloud Function to start the training job on a GKE cluster
  • D . Use Cloud Scheduler to schedule jobs at a regular interval. For the first step of the job, check the timestamp of objects in your Cloud Storage bucket. If there are no new files since the last run, abort the job.

Correct Answer: C

Explanation:

https://cloud.google.com/architecture/architecture-for-mlops-using-tfx-kubeflow-pipelines-and-cloud-build#triggering-and-scheduling-kubeflow-pipelines

Question #53

You are developing models to classify customer support emails. You created models with TensorFlow Estimators using small datasets on your on-premises system, but you now need to train the models using large datasets to ensure high performance. You will port your models to Google Cloud and want to minimize code refactoring and infrastructure overhead for easier migration from on-prem to cloud.

What should you do?

  • A . Use Vertex AI Platform for distributed training
  • B . Create a cluster on Dataproc for training
  • C . Create a Managed Instance Group with autoscaling
  • D . Use Kubeflow Pipelines to train on a Google Kubernetes Engine cluster.

Reveal Solution Hide Solution

Correct Answer: A
A

Explanation:

AI Platform also contains Kubeflow Pipelines; you don't need to set up infrastructure to use it. For option D you would need to set up a Google Kubernetes Engine cluster yourself, and the question asks us to minimize infrastructure overhead.

Question #54

You work for a large technology company that wants to modernize their contact center. You have been asked to develop a solution to classify incoming calls by product so that requests can be more quickly routed to the correct support team. You have already transcribed the calls using the Speech-to-Text API. You want to minimize data preprocessing and development time.

How should you build the model?

  • A . Use the AI Platform Training built-in algorithms to create a custom model
  • B . Use AutoML Natural Language to extract custom entities for classification
  • C . Use the Cloud Natural Language API to extract custom entities for classification
  • D . Build a custom model to identify the product keywords from the transcribed calls, and then run the keywords through a classification algorithm

Reveal Solution Hide Solution

Correct Answer: B
Question #55

You are an ML engineer at a regulated insurance company. You are asked to develop an insurance approval model that accepts or rejects insurance applications from potential customers.

What factors should you consider before building the model?

  • A . Redaction, reproducibility, and explainability
  • B . Traceability, reproducibility, and explainability
  • C . Federated learning, reproducibility, and explainability
  • D . Differential privacy, federated learning, and explainability

Reveal Solution Hide Solution

Correct Answer: B
B

Explanation:

https://www.oecd.org/finance/Impact-Big-Data-AI-in-the-Insurance-Sector.pdf

https://medium.com/artefact-engineering-and-data-science/including-ethics-best-practices-in-your-data-science-project-from-day-one-c15b26c2bf99

Question #56

You work for a large hotel chain and have been asked to assist the marketing team in gathering predictions for a targeted marketing strategy. You need to make predictions about user lifetime value (LTV) over the next 30 days so that marketing can be adjusted accordingly. The customer dataset is in BigQuery, and you are preparing the tabular data for training with AutoML Tables. This data has a time signal that is spread across multiple columns.

How should you ensure that AutoML fits the best model to your data?

  • A . Manually combine all columns that contain a time signal into an array. Allow AutoML to interpret this array appropriately. Choose an automatic data split across the training, validation, and testing sets.
  • B . Submit the data for training without performing any manual transformations. Allow AutoML to handle the appropriate transformations. Choose an automatic data split across the training, validation, and testing sets.
  • C . Submit the data for training without performing any manual transformations, and indicate an appropriate column as the Time column. Allow AutoML to split your data based on the time signal provided, and reserve the more recent data for the validation and testing sets.
  • D . Submit the data for training without performing any manual transformations. Use the columns that have a time signal to manually split your data. Ensure that the data in your validation set is from 30 days after the data in your training set and that the data in your testing set is from 30 days after your validation set.

Reveal Solution Hide Solution

Correct Answer: D
D

Explanation:

https://cloud.google.com/automl-tables/docs/data-best-practices#time
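
As a rough illustration of the manual split in option D (assuming the BigQuery table has been exported to a DataFrame and has a hypothetical event_date column carrying the time signal):

import pandas as pd

df = pd.read_csv("customer_ltv.csv", parse_dates=["event_date"])  # hypothetical export of the BigQuery table

# Hold out the last 60 days: the 30 days after the training cutoff become the
# validation set, and the 30 days after that become the test set.
train_end = df["event_date"].max() - pd.Timedelta(days=60)
val_end = train_end + pd.Timedelta(days=30)

train_df = df[df["event_date"] <= train_end]
val_df = df[(df["event_date"] > train_end) & (df["event_date"] <= val_end)]
test_df = df[df["event_date"] > val_end]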

Question #60

Register each user with a user ID on the Firebase Cloud Messaging server, which sends a notification when your model predicts that a user’s account balance will drop below the $25 threshold

Reveal Solution Hide Solution

Correct Answer: D

Explanation:

Firebase is designed for exactly this sort of scenario. Also, it would not be possible to create millions of Pub/Sub topics due to GCP quotas. See https://cloud.google.com/pubsub/quotas#quotas and https://firebase.google.com/docs/cloud-messaging

Question #61

You have trained a text classification model in TensorFlow using AI Platform. You want to use the trained model for batch predictions on text data stored in BigQuery while minimizing computational overhead.

What should you do?

  • A . Export the model to BigQuery ML.
  • B . Deploy and version the model on AI Platform.
  • C . Use Dataflow with the SavedModel to read the data from BigQuery.
  • D . Submit a batch prediction job on AI Platform that points to the model location in Cloud Storage.

Reveal Solution Hide Solution

Correct Answer: A
A

Explanation:

https://cloud.google.com/bigquery-ml/docs/making-predictions-with-imported-tensorflow-models

https://cloud.google.com/bigquery-ml/docs/making-predictions-with-imported-tensorflow-models#importing_models

https://cloud.google.com/bigquery-ml/docs/making-predictions-with-imported-tensorflow-models#bq

CREATE OR REPLACE MODEL example_dataset.imported_tf_model OPTIONS (MODEL_TYPE=’TENSORFLOW’, MODEL_PATH=’gs://cloud-training-demos/txtclass/export/exporter/1549825580/*’)
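
After the import, batch prediction stays entirely inside BigQuery; a minimal sketch using the BigQuery Python client (dataset, table, and column names are hypothetical):

from google.cloud import bigquery

client = bigquery.Client()

# ML.PREDICT runs the imported TensorFlow model inside BigQuery, so no
# separate serving infrastructure is needed.
query = """
SELECT *
FROM ML.PREDICT(
  MODEL `example_dataset.imported_tf_model`,
  (SELECT review_text AS input FROM `example_dataset.reviews`)  -- hypothetical input table and column
)
"""
for row in client.query(query).result():
    print(dict(row))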

Question #65

Dispatch an appropriately sized shuttle and provide the map with the required stops based on the simulated outcome.

Reveal Solution Hide Solution

Correct Answer: C

Explanation:

This is a case where machine learning would be a poor fit: it would not be 100% accurate, and some passengers would not get picked up. A simple algorithm works better here, and the question confirms that customers will indicate when they are at the stop, so no ML is required.

Question #66

You need to build classification workflows over several structured datasets currently stored in BigQuery. Because you will be performing the classification several times, you want to complete the following steps without writing code: exploratory data analysis, feature selection, model building, training, and hyperparameter tuning and serving.

What should you do?

  • A . Configure AutoML Tables to perform the classification task
  • B . Run a BigQuery ML task to perform logistic regression for the classification
  • C . Use AI Platform Notebooks to run the classification model with the pandas library
  • D . Use AI Platform to run the classification model job configured for hyperparameter tuning

Reveal Solution Hide Solution

Correct Answer: A
A

Explanation:

https://cloud.google.com/automl-tables/docs/beginners-guide

Question #67

You recently joined an enterprise-scale company that has thousands of datasets. You know that there are accurate descriptions for each table in BigQuery, and you are searching for the proper BigQuery table to use for a model you are building on AI Platform.

How should you find the data that you need?

  • A . Use Data Catalog to search the BigQuery datasets by using keywords in the table description.
  • B . Tag each of your model and version resources on AI Platform with the name of the BigQuery table that was used for training.
  • C . Maintain a lookup table in BigQuery that maps the table descriptions to the table ID. Query the lookup table to find the correct table ID for the data that you need.
  • D . Execute a query in BigQuery to retrieve all the existing table names in your project using the INFORMATION_SCHEMA metadata tables that are native to BigQuery. Use the result to find the table that you need.

Reveal Solution Hide Solution

Correct Answer: A
A

Explanation:

Option A is the way to go for a large number of datasets. Querying table metadata directly is also possible, but it is the legacy way of checking: INFORMATION_SCHEMA contains these views for table metadata: TABLES and TABLE_OPTIONS for metadata about tables; COLUMNS and COLUMN_FIELD_PATHS for metadata about columns and fields; PARTITIONS for metadata about table partitions (Preview).
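
For reference, the legacy metadata check mentioned above can also be scripted; a hedged sketch that searches table descriptions by keyword through INFORMATION_SCHEMA (project, dataset, and keyword are hypothetical):

from google.cloud import bigquery

client = bigquery.Client()

# TABLE_OPTIONS exposes each table's description under option_name = 'description'.
query = """
SELECT table_name, option_value AS description
FROM `my_project.my_dataset.INFORMATION_SCHEMA.TABLE_OPTIONS`
WHERE option_name = 'description'
  AND LOWER(option_value) LIKE '%customer churn%'   -- hypothetical keyword
"""
for row in client.query(query).result():
    print(row.table_name, row.description)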

Question #68

You are working on a classification problem with time series data and achieved an area under the receiver operating characteristic curve (AUC ROC) value of 99% for training data after just a few experiments. You haven’t explored using any sophisticated algorithms or spent any time on hyperparameter tuning.

What should your next step be to identify and fix the problem?

  • A . Address the model overfitting by using a less complex algorithm.
  • B . Address data leakage by applying nested cross-validation during model training.
  • C . Address data leakage by removing features highly correlated with the target value.
  • D . Address the model overfitting by tuning the hyperparameters to reduce the AUC ROC value.

Reveal Solution Hide Solution

Correct Answer: B
B

Explanation:

https://towardsdatascience.com/time-series-nested-cross-validation-76adba623eb9
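
As a simplified, hedged stand-in for full nested cross-validation, scikit-learn's TimeSeriesSplit keeps every validation fold strictly after its training fold, which is what exposes leakage in time series data (placeholder data below):

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import TimeSeriesSplit

# X, y are assumed to be ordered chronologically (oldest rows first); placeholder data here.
X = np.random.rand(1000, 10)
y = np.random.randint(0, 2, 1000)

aucs = []
for train_idx, val_idx in TimeSeriesSplit(n_splits=5).split(X):
    model = RandomForestClassifier(random_state=0).fit(X[train_idx], y[train_idx])
    aucs.append(roc_auc_score(y[val_idx], model.predict_proba(X[val_idx])[:, 1]))

print("Mean AUC on forward-chained folds:", np.mean(aucs))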

Question #69

You work for an online travel agency that also sells advertising placements on its website to other companies.

You have been asked to predict the most relevant web banner that a user should see next. Security is important to your company. The model latency requirements are 300ms@p99, the inventory is thousands of web banners, and your exploratory analysis has shown that navigation context is a good predictor. You want to implement the simplest solution.

How should you configure the prediction pipeline?

  • A . Embed the client on the website, and then deploy the model on AI Platform Prediction.
  • B . Embed the client on the website, deploy the gateway on App Engine, and then deploy the model on AI Platform Prediction.
  • C . Embed the client on the website, deploy the gateway on App Engine, deploy the database on Cloud
    Bigtable for writing and for reading the user’s navigation context, and then deploy the model on AI Platform Prediction.
  • D . Embed the client on the website, deploy the gateway on App Engine, deploy the database on Memorystore for writing and for reading the user’s navigation context, and then deploy the model on Google Kubernetes Engine.

Reveal Solution Hide Solution

Correct Answer: C
C

Explanation:

https://medium.com/google-cloud/secure-cloud-run-cloud-functions-and-app-engine-with-api-key-73c57bededd1

Question #70

Your team is building a convolutional neural network (CNN)-based architecture from scratch. The preliminary experiments running on your on-premises CPU-only infrastructure were encouraging, but have slow convergence. You have been asked to speed up model training to reduce time-to-market. You want to experiment with virtual machines (VMs) on Google Cloud to leverage more powerful hardware. Your code does not include any manual device placement and has not been wrapped in Estimator model-level abstraction.

Which environment should you train your model on?

  • A . A VM on Compute Engine and 1 TPU with all dependencies installed manually.
  • B . A VM on Compute Engine and 8 GPUs with all dependencies installed manually.
  • C . A Deep Learning VM with an n1-standard-2 machine and 1 GPU with all libraries pre-installed.
  • D . A Deep Learning VM with more powerful CPU e2-highcpu-16 machines with all libraries pre-installed.

Reveal Solution Hide Solution

Correct Answer: C
C

Explanation:

https://cloud.google.com/deep-learning-vm/docs/cli#creating_an_instance_with_one_or_more_gpus

https://cloud.google.com/deep-learning-vm/docs/introduction#pre-installed_packages

"speed up model training" will make us biased towards GPU,TPU options by options eliminations we may need to stay away of any manual installations , so using preconfigered deep learning will speed up time to market

Question #71

You work on a growing team of more than 50 data scientists who all use AI Platform. You are designing a strategy to organize your jobs, models, and versions in a clean and scalable way.

Which strategy should you choose?

  • A . Set up restrictive IAM permissions on the AI Platform notebooks so that only a single user or group can access a given instance.
  • B . Separate each data scientist’s work into a different project to ensure that the jobs, models, and versions created by each data scientist are accessible only to that user.
  • C . Use labels to organize resources into descriptive categories. Apply a label to each created resource so that users can filter the results by label when viewing or monitoring the resources.
  • D . Set up a BigQuery sink for Cloud Logging logs that is appropriately filtered to capture information about AI Platform resource usage. In BigQuery, create a SQL view that maps users to the resources they are using

Reveal Solution Hide Solution

Correct Answer: C
C

Explanation:

https://cloud.google.com/ai-platform/prediction/docs/resource-labels#overview_of_labels You can add labels to your AI Platform Prediction jobs, models, and model versions, then use those labels to organize resources into categories when viewing or monitoring the resources.

Question #72

You work for a credit card company and have been asked to create a custom fraud detection model based on historical data using AutoML Tables. You need to prioritize detection of fraudulent transactions while minimizing false positives.

Which optimization objective should you use when training the model?

  • A . An optimization objective that minimizes Log loss
  • B . An optimization objective that maximizes the Precision at a Recall value of 0.50
  • C . An optimization objective that maximizes the area under the precision-recall curve (AUC PR) value
  • D . An optimization objective that maximizes the area under the receiver operating characteristic curve (AUC ROC) value

Reveal Solution Hide Solution

Correct Answer: C
C

Explanation:

https://stats.stackexchange.com/questions/262616/roc-vs-precision-recall-curves-on-imbalanced-dataset

https://neptune.ai/blog/f1-score-accuracy-roc-auc-pr-auc

https://icaiit.org/proceedings/6th_ICAIIT/1_3Fayzrakhmanov.pdf Fraudulent transaction detection is an imbalanced classification problem (most transactions are not fraudulent), so you want to maximize both precision and recall; that is, the area under the PR curve. The question asks you to focus on detecting fraudulent transactions (maximize the true positive rate, i.e. recall) while minimizing false positives (i.e. maximizing precision). Another way to see it: for imbalanced problems like this one you will get a lot of true negatives even from a bad model (it is easy to guess a transaction as "non-fraudulent" because most of them are), and with a high count of true negatives the ROC curve rises quickly, which can be misleading. So you want to avoid relying on true negatives in your evaluation, which is precisely what the PR curve allows you to do.
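
To see the difference concretely, a short sketch on purely synthetic data with a roughly 1% positive rate; the exact numbers are illustrative only:

import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

rng = np.random.default_rng(0)

# Roughly 1% positives, mimicking rare fraudulent transactions.
y_true = (rng.random(100_000) < 0.01).astype(int)
# A noisy scorer: positives get a modest boost but overlap heavily with negatives.
y_score = 0.3 * y_true + rng.random(100_000)

print("ROC AUC:", roc_auc_score(y_true, y_score))            # looks reasonably high for a weak scorer
print("PR AUC :", average_precision_score(y_true, y_score))  # much lower, reflecting poor precision on the rare class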

Question #73

Your company manages a video sharing website where users can watch and upload videos. You need to create an ML model to predict which newly uploaded videos will be the most popular so that those videos can be prioritized on your company’s website.

Which result should you use to determine whether the model is successful?

  • A . The model predicts videos as popular if the user who uploads them has over 10,000 likes.
  • B . The model predicts 97.5% of the most popular clickbait videos measured by number of clicks.
  • C . The model predicts 95% of the most popular videos measured by watch time within 30 days of being uploaded.
  • D . The Pearson correlation coefficient between the log-transformed number of views after 7 days and 30 days after publication is equal to 0.

Reveal Solution Hide Solution

Correct Answer: C
C

Explanation:

https://developers.google.com/machine-learning/problem-framing/framing#quantify-it

Question #74

You are working on a Neural Network-based project. The dataset provided to you has columns with different ranges. While preparing the data for model training, you discover that gradient optimization is having difficulty moving weights to a good solution.

What should you do?

  • A . Use feature construction to combine the strongest features.
  • B . Use the representation transformation (normalization) technique.
  • C . Improve the data cleaning step by removing features with missing values.
  • D . Change the partitioning step to reduce the dimension of the test set and have a larger training set.

Reveal Solution Hide Solution

Correct Answer: B
B

Explanation:

https://developers.google.com/machine-learning/data-prep/transform/transform-numeric

– NN models need features with close ranges

– SGD converges well using features on a [0, 1] scale

– The question specifically mentions "different ranges"

Documentation – https://developers.google.com/machine-learning/data-prep/transform/transform-numeric
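
A minimal Keras sketch of this transformation (features and data are placeholders): the Normalization layer learns each column's mean and variance from the training data and rescales inputs before they reach the dense layers.

import numpy as np
import tensorflow as tf

# Placeholder training features with very different ranges per column.
x_train = np.stack([
    np.random.uniform(0, 1, 1000),        # small-range feature
    np.random.uniform(0, 100_000, 1000),  # large-range feature
], axis=1).astype("float32")
y_train = np.random.randint(0, 2, 1000)

normalizer = tf.keras.layers.Normalization(axis=-1)
normalizer.adapt(x_train)  # learns per-feature mean and variance

model = tf.keras.Sequential([
    normalizer,
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(x_train, y_train, epochs=2, verbose=0)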

Question #75

You work for a bank and are building a random forest model for fraud detection. You have a dataset that includes transactions, of which 1% are identified as fraudulent.

Which data transformation strategy would likely improve the performance of your classifier?

  • A . Write your data in TFRecords.
  • B . Z-normalize all the numeric features.
  • C . Oversample the fraudulent transactions 10 times.
  • D . Use one-hot encoding on all categorical features.

Reveal Solution Hide Solution

Correct Answer: C
C

Explanation:

Reference: https://towardsdatascience.com/how-to-build-a-machine-learning-model-to-identify-credit-card-fraud-in-5-stepsa-hands-on-modeling-5140b3bd19f1
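
A small pandas sketch of the oversampling step (file and column names are hypothetical): the roughly 1% of fraudulent rows are replicated about 10x, with replacement, before the random forest is trained.

import pandas as pd

# df is assumed to hold the transactions, with a hypothetical 'is_fraud' label column.
df = pd.read_csv("transactions.csv")

fraud = df[df["is_fraud"] == 1]
non_fraud = df[df["is_fraud"] == 0]

# Repeat the minority class 10 times (sampling with replacement).
oversampled_fraud = fraud.sample(n=len(fraud) * 10, replace=True, random_state=0)
train_df = pd.concat([non_fraud, oversampled_fraud]).sample(frac=1, random_state=0)  # shuffle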

Question #76

You are developing an ML model intended to classify whether X-ray images indicate bone fracture risk. You have trained a ResNet architecture on Vertex AI using a TPU as an accelerator; however, you are unsatisfied with the training time and memory usage. You want to quickly iterate your training code but make minimal changes to the code. You also want to minimize impact on the model's accuracy.

What should you do?

  • A . Configure your model to use bfloat16 instead of float32
  • B . Reduce the global batch size from 1024 to 256
  • C . Reduce the number of layers in the model architecture
  • D . Reduce the dimensions of the images used in the model

Reveal Solution Hide Solution

Correct Answer: B
Question #77

Your task is to classify whether a company logo is present in an image. You found out that 96% of the data does not include a logo, so you are dealing with a data imbalance problem.

Which metric do you use to evaluate the model?

  • A . F1 Score
  • B . RMSE
  • C . F Score with higher precision weighting than recall
  • D . F Score with higher recall weighting than precision

Reveal Solution Hide Solution

Correct Answer: D
Question #78

You need to train a regression model based on a dataset containing 50,000 records that is stored in BigQuery. The data includes a total of 20 categorical and numerical features with a target variable that can include negative values. You need to minimize effort and training time while maximizing model performance.

What approach should you take to train this regression model?

  • A . Create a custom TensorFlow DNN model.
  • B . Use BQML XGBoost regression to train the model
  • C . Use AutoML Tables to train the model without early stopping.
  • D . Use AutoML Tables to train the model with RMSLE as the optimization objective

Reveal Solution Hide Solution

Correct Answer: B
B

Explanation:

https://cloud.google.com/bigquery-ml/docs/introduction
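
A hedged sketch of option B with the BigQuery Python client (dataset, table, and column names are hypothetical); BOOSTED_TREE_REGRESSOR is BigQuery ML's XGBoost-based regressor, and it accepts negative targets and mixed categorical/numerical features without manual preprocessing.

from google.cloud import bigquery

client = bigquery.Client()

# Train the XGBoost-based regressor where the data already lives.
client.query("""
CREATE OR REPLACE MODEL `my_dataset.sales_regressor`
OPTIONS (model_type = 'BOOSTED_TREE_REGRESSOR',
         input_label_cols = ['target']) AS
SELECT * FROM `my_dataset.training_records`   -- hypothetical table: ~50,000 rows, 20 features + target
""").result()

# Score new rows with the trained model.
for row in client.query("""
SELECT * FROM ML.PREDICT(MODEL `my_dataset.sales_regressor`,
                         (SELECT * FROM `my_dataset.new_records`))
""").result():
    print(dict(row))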

Question #79

Your data science team has requested a system that supports scheduled model retraining, Docker containers, and a service that supports autoscaling and monitoring for online prediction requests.

Which platform components should you choose for this system?

  • A . Vertex AI Pipelines and App Engine
  • B . Vertex AI Pipelines and AI Platform Prediction
  • C . Cloud Composer, BigQuery ML, and AI Platform Prediction
  • D . Cloud Composer, AI Platform Training with custom containers, and App Engine

Reveal Solution Hide Solution

Correct Answer: B
Question #80

While monitoring your model training’s GPU utilization, you discover that you have a native synchronous implementation. The training data is split into multiple files. You want to reduce the execution time of your input pipeline.

What should you do?

  • A . Increase the CPU load
  • B . Add caching to the pipeline
  • C . Increase the network bandwidth
  • D . Add parallel interleave to the pipeline

Reveal Solution Hide Solution

Correct Answer: A

Question #81

Your data science team is training a PyTorch model for image classification based on a pre-trained ResNet model. You need to perform hyperparameter tuning to optimize for several parameters.

What should you do?

  • A . Convert the model to a Keras model, and run a Keras Tuner job.
  • B . Run a hyperparameter tuning job on AI Platform using custom containers.
  • C . Create a Kubeflow Pipelines instance, and run a hyperparameter tuning job on Katib.
  • D . Convert the model to a TensorFlow model, and run a hyperparameter tuning job on AI Platform.

Reveal Solution Hide Solution

Correct Answer: C
Question #82

You have a large corpus of written support cases that can be classified into 3 separate categories: Technical Support, Billing Support, or Other Issues. You need to quickly build, test, and deploy a service that will automatically classify future written requests into one of the categories.

How should you configure the pipeline?

  • A . Use the Cloud Natural Language API to obtain metadata to classify the incoming cases.
  • B . Use AutoML Natural Language to build and test a classifier. Deploy the model as a REST API.
  • C . Use BigQuery ML to build and test a logistic regression model to classify incoming requests. Use BigQuery ML to perform inference.
  • D . Create a TensorFlow model using Google’s BERT pre-trained model. Build and test a classifier, and
    deploy the model using Vertex AI.

Reveal Solution Hide Solution

Correct Answer: B
B

Explanation:

AutoML Natural Language is a service that allows you to quickly build, test and deploy natural language processing (NLP) models without needing to have expertise in NLP or machine learning. You can use it to train a classifier on your corpus of written support cases, and then use the AutoML API to perform classification on new requests. Once the model is trained, it can be deployed as a REST API. This allows the classifier to be integrated into your pipeline and be easily consumed by other systems.
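
Once deployed, the classifier can be called over REST from the pipeline; a hedged sketch in which the project, location, and model IDs are placeholders and the request shape follows the AutoML v1 predict method:

import requests
import google.auth
import google.auth.transport.requests

# Obtain an access token with Application Default Credentials.
credentials, _ = google.auth.default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())

# Placeholder resource name of the deployed AutoML Natural Language model.
model_name = "projects/my-project/locations/us-central1/models/TCN1234567890"

response = requests.post(
    f"https://automl.googleapis.com/v1/{model_name}:predict",
    headers={"Authorization": f"Bearer {credentials.token}"},
    json={"payload": {"textSnippet": {"content": "My invoice is wrong", "mimeType": "text/plain"}}},
)
print(response.json())  # predicted category with confidence scores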

Question #83

You need to quickly build and train a model to predict the sentiment of customer reviews with custom categories without writing code. You do not have enough data to train a model from scratch. The resulting model should have high predictive performance.

Which service should you use?

  • A . AutoML Natural Language
  • B . Cloud Natural Language API
  • C . AI Hub pre-made Jupyter Notebooks
  • D . AI Platform Training built-in algorithms

Reveal Solution Hide Solution

Correct Answer: A
Question #84

You need to build an ML model for a social media application to predict whether a user’s submitted profile photo meets the requirements. The application will inform the user if the picture meets the requirements.

How should you build a model to ensure that the application does not falsely accept a non-compliant picture?

  • A . Use AutoML to optimize the model’s recall in order to minimize false negatives.
  • B . Use AutoML to optimize the model’s F1 score in order to balance the accuracy of false positives and false negatives.
  • C . Use Vertex AI Workbench user-managed notebooks to build a custom model that has three times as many examples of pictures that meet the profile photo requirements.
  • D . Use Vertex AI Workbench user-managed notebooks to build a custom model that has three times as many examples of pictures that do not meet the profile photo requirements.

Reveal Solution Hide Solution

Correct Answer: C
Question #85

You lead a data science team at a large international corporation. Most of the models your team trains are large-scale models using high-level TensorFlow APIs on AI Platform with GPUs. Your team usually takes a few weeks or months to iterate on a new version of a model. You were recently asked to review your team’s spending.

How should you reduce your Google Cloud compute costs without impacting the model’s performance?

  • A . Use AI Platform to run distributed training jobs with checkpoints.
  • B . Use AI Platform to run distributed training jobs without checkpoints.
  • C . Migrate to training with Kubeflow on Google Kubernetes Engine, and use preemptible VMs with checkpoints.
  • D . Migrate to training with Kubeflow on Google Kubernetes Engine, and use preemptible VMs without checkpoints.

Reveal Solution Hide Solution

Correct Answer: D
Question #86

You have deployed a model on Vertex AI for real-time inference. During an online prediction request, you get an “Out of Memory” error.

What should you do?

  • A . Use batch prediction mode instead of online mode.
  • B . Send the request again with a smaller batch of instances.
  • C . Use base64 to encode your data before using it for prediction.
  • D . Apply for a quota increase for the number of prediction requests.

Reveal Solution Hide Solution

Correct Answer: C
Question #87

You work at a subscription-based company. You have trained an ensemble of trees and neural networks to predict customer churn, which is the likelihood that customers will not renew their yearly subscription. The average prediction is a 15% churn rate, but for a particular customer the model predicts that they are 70% likely to churn. The customer has a product usage history of 30%, is located in New York City, and became a customer in 1997. You need to explain the difference between the actual prediction, a 70% churn rate, and the average prediction. You want to use Vertex Explainable AI.

What should you do?

  • A . Train local surrogate models to explain individual predictions.
  • B . Configure sampled Shapley explanations on Vertex Explainable AI.
  • C . Configure integrated gradients explanations on Vertex Explainable AI.
  • D . Measure the effect of each feature as the weight of the feature multiplied by the feature value.

Reveal Solution Hide Solution

Correct Answer: A
Question #88

You need to execute a batch prediction on 100 million records in a BigQuery table with a custom TensorFlow DNN regressor model, and then store the predicted results in a BigQuery table. You want to minimize the effort required to build this inference pipeline.

What should you do?

  • A . Import the TensorFlow model with BigQuery ML, and run the ml.predict function.
  • B . Use the TensorFlow BigQuery reader to load the data, and use the BigQuery API to write the results to BigQuery.
  • C . Create a Dataflow pipeline to convert the data in BigQuery to TFRecords. Run a batch inference on Vertex AI Prediction, and write the results to BigQuery.
  • D . Load the TensorFlow SavedModel in a Dataflow pipeline. Use the BigQuery I/O connector with a custom function to perform the inference within the pipeline, and write the results to BigQuery.

Reveal Solution Hide Solution

Correct Answer: A
Question #89

You are creating a deep neural network classification model using a dataset with categorical input values. Certain columns have a cardinality greater than 10,000 unique values.

How should you encode these categorical values as input into the model?

  • A . Convert each categorical value into an integer value.
  • B . Convert the categorical string data to one-hot hash buckets.
  • C . Map the categorical variables into a vector of boolean values.
  • D . Convert each categorical value into a run-length encoded string.

Reveal Solution Hide Solution

Correct Answer: C
Question #90

You need to train a natural language model to perform text classification on product descriptions that contain millions of examples and 100,000 unique words. You want to preprocess the words individually so that they can be fed into a recurrent neural network.

What should you do?

  • A . Create a one-hot encoding of words, and feed the encodings into your model.
  • B . Identify word embeddings from a pre-trained model, and use the embeddings in your model.
  • C . Sort the words by frequency of occurrence, and use the frequencies as the encodings in your model.
  • D . Assign a numerical value to each word from 1 to 100,000 and feed the values as inputs in your model.

Reveal Solution Hide Solution

Correct Answer: B
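
A hedged Keras sketch of option B (vocabulary size, embedding dimension, and task head are placeholders): pre-trained vectors are loaded into a frozen Embedding layer that feeds the recurrent layer, so the 100,000-word vocabulary does not have to be learned from scratch.

import numpy as np
import tensorflow as tf

vocab_size, embed_dim = 100_000, 300

# Placeholder: in practice each row would come from a pre-trained source
# (e.g. word2vec/GloVe vectors looked up by vocabulary index).
embedding_matrix = np.random.rand(vocab_size, embed_dim).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(
        input_dim=vocab_size,
        output_dim=embed_dim,
        embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix),
        trainable=False,  # keep the pre-trained vectors frozen
    ),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # hypothetical binary classification head
])
model.compile(optimizer="adam", loss="binary_crossentropy")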

Question #91

Your data science team has requested a system that supports scheduled model retraining, Docker containers, and a service that supports autoscaling and monitoring for online prediction requests.

Which platform components should you choose for this system?

  • A . Vertex AI Pipelines and App Engine
  • B . Vertex AI Pipelines, Vertex AI Prediction, and Vertex AI Model Monitoring
  • C . Cloud Composer, BigQuery ML, and Vertex AI Prediction
  • D . Cloud Composer, Vertex AI Training with custom containers, and App Engine

Reveal Solution Hide Solution

Correct Answer: A
Question #92

You are profiling the performance of your TensorFlow model training time and notice a performance issue caused by inefficiencies in the input data pipeline for a single 5 terabyte CSV file dataset on Cloud Storage. You need to optimize the input pipeline performance.

Which action should you try first to increase the efficiency of your pipeline?

  • A . Preprocess the input CSV file into a TFRecord file.
  • B . Randomly select a 10 gigabyte subset of the data to train your model.
  • C . Split into multiple CSV files and use a parallel interleave transformation.
  • D . Set the reshuffle_each_iteration parameter to true in the tf.data.Dataset.shuffle method.

Reveal Solution Hide Solution

Correct Answer: D

Question #101

Export the batch prediction job outputs from Cloud Storage and import them into BigQuery.

Reveal Solution Hide Solution

Correct Answer: C
Question #102

Your company manages an application that aggregates news articles from many different online sources and sends them to users. You need to build a recommendation model that will suggest articles to readers that are similar to the articles they are currently reading.

Which approach should you use?

  • A . Create a collaborative filtering system that recommends articles to a user based on the user’s past behavior.
  • B . Encode all articles into vectors using word2vec, and build a model that returns articles based on vector similarity.
  • C . Build a logistic regression model for each user that predicts whether an article should be recommended to a user.
  • D . Manually label a few hundred articles, and then train an SVM classifier based on the manually classified articles that categorizes additional articles into their respective categories.

Reveal Solution Hide Solution

Correct Answer: A
Question #103

You work for a large social network service provider whose users post articles and discuss news. Millions of comments are posted online each day, and more than 200 human moderators constantly review comments and flag those that are inappropriate. Your team is building an ML model to help human moderators check content on the platform. The model scores each comment and flags suspicious comments to be reviewed by a human.

Which metric(s) should you use to monitor the model’s performance?

  • A . Number of messages flagged by the model per minute
  • B . Number of messages flagged by the model per minute confirmed as being inappropriate by humans.
  • C . Precision and recall estimates based on a random sample of 0.1% of raw messages each minute sent to a human for review
  • D . Precision and recall estimates based on a sample of messages flagged by the model as potentially inappropriate each minute

Reveal Solution Hide Solution

Correct Answer: B
Question #104

You are a lead ML engineer at a retail company. You want to track and manage ML metadata in a centralized way so that your team can have reproducible experiments by generating artifacts.

Which management solution should you recommend to your team?

  • A . Store your tf.logging data in BigQuery.
  • B . Manage all relational entities in the Hive Metastore.
  • C . Store all ML metadata in Google Cloud’s operations suite.
  • D . Manage your ML workflows with Vertex ML Metadata.

Reveal Solution Hide Solution

Correct Answer: C
Question #105

You have been given a dataset with sales predictions based on your company’s marketing activities. The data is structured and stored in BigQuery, and has been carefully managed by a team of data analysts. You need to prepare a report providing insights into the predictive capabilities of the data. You were asked to run several ML models with different levels of sophistication, including simple models and multilayered neural networks. You only have a few hours to gather the results of your experiments.

Which Google Cloud tools should you use to complete this task in the most efficient and self-serviced way?

  • A . Use BigQuery ML to run several regression models, and analyze their performance.
  • B . Read the data from BigQuery using Dataproc, and run several models using SparkML.
  • C . Use Vertex AI Workbench user-managed notebooks with scikit-learn code for a variety of ML algorithms and performance metrics.
  • D . Train a custom TensorFlow model with Vertex AI, reading the data from BigQuery featuring a variety of ML algorithms.

Reveal Solution Hide Solution

Correct Answer: A
Question #106

You are an ML engineer at a bank. You have developed a binary classification model using AutoML Tables to predict whether a customer will make loan payments on time. The output is used to approve or reject loan requests. One customer’s loan request has been rejected by your model, and the bank’s risks department is asking you to provide the reasons that contributed to the model’s decision.

What should you do?

  • A . Use local feature importance from the predictions.
  • B . Use the correlation with target values in the data summary page.
  • C . Use the feature importance percentages in the model evaluation page.
  • D . Vary features independently to identify the threshold per feature that changes the classification.

Reveal Solution Hide Solution

Correct Answer: A
A

Explanation:

This would allow you to identify which features contributed the most to the model’s decision, which would in turn help you to better explain why the model rejected the customer’s loan request. Varying features independently could also be useful, as this would help to identify specific threshold values that change the classification. However, that strategy would be more time-consuming and would not provide as much insight into the inner workings of the model as using local feature importance does.
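
AutoML Tables surfaces local feature importance directly in the console and in prediction output. As an adjacent, programmatic route (not the console flow described above), a model deployed to a Vertex AI endpoint with explanations enabled exposes similar per-prediction attributions; a hedged sketch with hypothetical resource names and feature names:

from google.cloud import aiplatform

# Hypothetical endpoint resource name of the deployed tabular model
# (explanations must be enabled when the model is deployed).
endpoint = aiplatform.Endpoint(
    "projects/my-project/locations/us-central1/endpoints/1234567890"
)

# A single loan application, with hypothetical feature names.
instance = {"income": "42000", "loan_amount": "15000", "employment_years": "3"}

response = endpoint.explain(instances=[instance])

# Each explanation carries per-feature attributions for that prediction.
for attribution in response.explanations[0].attributions:
    print(attribution.feature_attributions)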

Question #107

You work for a magazine distributor and need to build a model that predicts which customers will renew their subscriptions for the upcoming year. Using your company’s historical data as your training set, you created a TensorFlow model and deployed it to AI Platform. You need to determine which customer attribute has the most predictive power for each prediction served by the model.

What should you do?

  • A . Use AI Platform notebooks to perform a Lasso regression analysis on your model, which will eliminate features that do not provide a strong signal.
  • B . Stream prediction results to BigQuery. Use BigQuery’s CORR(X1, X2) function to calculate the Pearson correlation coefficient between each feature and the target variable.
  • C . Use the AI Explanations feature on AI Platform. Submit each prediction request with the ‘explain’ keyword to retrieve feature attributions using the sampled Shapley method.
  • D . Use the What-If tool in Google Cloud to determine how your model will perform when individual features are excluded. Rank the feature importance in order of those that caused the most significant performance drop when removed from the model.

Reveal Solution Hide Solution

Correct Answer: D
Question #108

You are working on a binary classification ML algorithm that detects whether an image of a classified scanned document contains a company’s logo. In the dataset, 96% of examples don’t have the logo, so the dataset is very skewed.

Which metrics would give you the most confidence in your model?

  • A . F-score where recall is weighed more than precision
  • B . RMSE
  • C . F1 score
  • D . F-score where precision is weighed more than recall

Reveal Solution Hide Solution

Correct Answer: A
Question #109

You work on the data science team for a multinational beverage company. You need to develop an ML model to predict the company’s profitability for a new line of naturally flavored bottled waters in different locations. You are provided with historical data that includes product types, product sales volumes, expenses, and profits for all regions.

What should you use as the input and output for your model?

  • A . Use latitude, longitude, and product type as features. Use profit as model output.
  • B . Use latitude, longitude, and product type as features. Use revenue and expenses as model outputs.
  • C . Use product type and the feature cross of latitude with longitude, followed by binning, as features.
    Use profit as model output.
  • D . Use product type and the feature cross of latitude with longitude, followed by binning, as features. Use revenue and expenses as model outputs.

Reveal Solution Hide Solution

Correct Answer: C
Question #110

You work as an ML engineer at a social media company, and you are developing a visual filter for users’ profile photos. This requires you to train an ML model to detect bounding boxes around human faces. You want to use this filter in your company’s iOS-based mobile phone application. You want to minimize code development and want the model to be optimized for inference on mobile phones.

What should you do?

  • A . Train a model using AutoML Vision and use the “export for Core ML” option.
  • B . Train a model using AutoML Vision and use the “export for Coral” option.
  • C . Train a model using AutoML Vision and use the “export for TensorFlow.js” option.
  • D . Train a custom TensorFlow model and convert it to TensorFlow Lite (TFLite).

Reveal Solution Hide Solution

Correct Answer: A

Question #111

You have been asked to build a model using a dataset that is stored in a medium-sized (~10 GB) BigQuery table. You need to quickly determine whether this data is suitable for model development. You want to create a one-time report that includes both informative visualizations of data distributions and more sophisticated statistical analyses to share with other ML engineers on your team. You require maximum flexibility to create your report.

What should you do?

  • A . Use Vertex AI Workbench user-managed notebooks to generate the report.
  • B . Use the Google Data Studio to create the report.
  • C . Use the output from TensorFlow Data Validation on Dataflow to generate the report.
  • D . Use Dataprep to create the report.

Reveal Solution Hide Solution

Correct Answer: C
Question #112

You work on an operations team at an international company that manages a large fleet of on-premises servers located in a few data centers around the world. Your team collects monitoring data from the servers, including CPU/memory consumption. When an incident occurs on a server, your team is responsible for fixing it. Incident data has not been properly labeled yet. Your management team wants you to build a predictive maintenance solution that uses monitoring data from the VMs to detect potential failures and then alerts the service desk team.

What should you do first?

  • A . Train a time-series model to predict the machines’ performance values. Configure an alert if a machine’s actual performance values significantly differ from the predicted performance values.
  • B . Implement a simple heuristic (e.g., based on z-score) to label the machines’ historical performance data. Train a model to predict anomalies based on this labeled dataset.
  • C . Develop a simple heuristic (e.g., based on z-score) to label the machines’ historical performance data. Test this heuristic in a production environment.
  • D . Hire a team of qualified analysts to review and label the machines’ historical performance data.
    Train a model based on this manually labeled dataset.

Reveal Solution Hide Solution

Correct Answer: D
Question #113

You are developing an ML model that uses sliced frames from video feed and creates bounding boxes around specific objects. You want to automate the following steps in your training pipeline: ingestion and preprocessing of data in Cloud Storage, followed by training and hyperparameter tuning of the object model using Vertex AI jobs, and finally deploying the model to an endpoint. You want to orchestrate the entire pipeline with minimal cluster management.

What approach should you use?

  • A . Use Kubeflow Pipelines on Google Kubernetes Engine.
  • B . Use Vertex AI Pipelines with TensorFlow Extended (TFX) SDK.
  • C . Use Vertex AI Pipelines with Kubeflow Pipelines SDK.
  • D . Use Cloud Composer for the orchestration.

Reveal Solution Hide Solution

Correct Answer: A
Question #114

You are training an object detection machine learning model on a dataset that consists of three million X-ray images, each roughly 2 GB in size. You are using Vertex AI Training to run a custom training application on a Compute Engine instance with 32-cores, 128 GB of RAM, and 1 NVIDIA P100 GPU. You notice that model training is taking a very long time. You want to decrease training time without sacrificing model performance.

What should you do?

  • A . Increase the instance memory to 512 GB and increase the batch size.
  • B . Replace the NVIDIA P100 GPU with a v3-32 TPU in the training job.
  • C . Enable early stopping in your Vertex AI Training job.
  • D . Use the tf.distribute.Strategy API and run a distributed training job.

Reveal Solution Hide Solution

Correct Answer: C