Amazon MLS-C01 AWS Certified Machine Learning – Specialty Online Training
Amazon MLS-C01 Online Training
The questions for MLS-C01 were last updated at May 15,2024.
- Exam Code: MLS-C01
- Exam Name: AWS Certified Machine Learning - Specialty
- Certification Provider: Amazon
- Latest update: May 15,2024
A Machine Learning Specialist is building a prediction model for a large number of features using linear models, such as linear regression and logistic regression During exploratory data analysis the Specialist observes that many features are highly correlated with each other This may make the model unstable
What should be done to reduce the impact of having such a large number of features?
- A . Perform one-hot encoding on highly correlated features
- B . Use matrix multiplication on highly correlated features.
- C . Create a new feature space using principal component analysis (PCA)
- D . Apply the Pearson correlation coefficient
A Machine Learning Specialist wants to determine the appropriate SageMaker Variant Invocations Per Instance setting for an endpoint automatic scaling configuration. The Specialist has performed a load test on a single instance and determined that peak requests per second (RPS) without service degradation is about 20 RPS As this is the first deployment, the Specialist intends to set the invocation safety factor to 0 5
Based on the stated parameters and given that the invocations per instance setting is measured on a per-minute basis, what should the Specialist set as the sageMaker variant invocations Per instance setting?
- A . 10
- B . 30
- C . 600
- D . 2,400
A Machine Learning Specialist deployed a model that provides product recommendations on a company’s website Initially, the model was performing very well and resulted in customers buying more products on average However within the past few months the Specialist has noticed that the effect of product recommendations has diminished and customers are starting to return to their original habits of spending less The Specialist is unsure of what happened, as the model has not changed from its initial deployment over a year ago
Which method should the Specialist try to improve model performance?
- A . The model needs to be completely re-engineered because it is unable to handle product inventory changes
- B . The model’s hyperparameters should be periodically updated to prevent drift
- C . The model should be periodically retrained from scratch using the original data while adding a regularization term to handle product inventory changes
- D . The model should be periodically retrained using the original training data plus new data as product inventory changes
A manufacturer of car engines collects data from cars as they are being driven. The data collected includes timestamp, engine temperature, rotations per minute (RPM), and other sensor readings The company wants to predict when an engine is going to have a problem so it can notify drivers in advance to get engine maintenance The engine data is loaded into a data lake for training.
Which is the MOST suitable predictive model that can be deployed into production’?
- A . Add labels over time to indicate which engine faults occur at what time in the future to turn this into a supervised learning problem Use a recurrent neural network (RNN) to train the model to recognize when an engine might need maintenance for a certain fault.
- B . This data requires an unsupervised learning algorithm Use Amazon SageMaker k-means to cluster the data
- C . Add labels over time to indicate which engine faults occur at what time in the future to turn this into a supervised learning problem Use a convolutional neural network (CNN) to train the model to recognize when an engine might need maintenance for a certain fault.
- D . This data is already formulated as a time series Use Amazon SageMaker seq2seq to model the time series.
A Data Scientist is working on an application that performs sentiment analysis. The validation accuracy is poor and the Data Scientist thinks that the cause may be a rich vocabulary and a low average frequency of words in the dataset
Which tool should be used to improve the validation accuracy?
- A . Amazon Comprehend syntax analysts and entity detection
- B . Amazon SageMaker BlazingText allow mode
- C . Natural Language Toolkit (NLTK) stemming and stop word removal
- D . Scikit-learn term frequency-inverse document frequency (TF-IDF) vectorizers
A Machine Learning Specialist is developing recommendation engine for a photography blog Given a picture, the recommendation engine should show a picture that captures similar objects The Specialist would like to create a numerical representation feature to perform nearest-neighbor searches
What actions would allow the Specialist to get relevant numerical representations?
- A . Reduce image resolution and use reduced resolution pixel values as features
- B . Use Amazon Mechanical Turk to label image content and create a one-hot representation indicating the presence of specific labels
- C . Run images through a neural network pie-trained on ImageNet, and collect the feature vectors from the penultimate layer
- D . Average colors by channel to obtain three-dimensional representations of images.
A gaming company has launched an online game where people can start playing for free but they need to pay if they choose to use certain features The company needs to build an automated system to predict whether or not a new user will become a paid user within 1 year The company has gathered a labeled dataset from 1 million users
The training dataset consists of 1.000 positive samples (from users who ended up paying within 1 year) and 999.000 negative samples (from users who did not use any paid features) Each data sample consists of 200 features including user age, device, location, and play patterns Using this dataset for training, the Data Science team trained a random forest model that converged with over 99% accuracy on the training set However, the prediction results on a test dataset were not satisfactory.
Which of the following approaches should the Data Science team take to mitigate this issue? (Select TWO.)
- A . Add more deep trees to the random forest to enable the model to learn more features.
- B . indicate a copy of the samples in the test database in the training dataset
- C . Generate more positive samples by duplicating the positive samples and adding a small amount of noise to the duplicated data.
- D . Change the cost function so that false negatives have a higher impact on the cost value than false positives
- E . Change the cost function so that false positives have a higher impact on the cost value than false negatives
While reviewing the histogram for residuals on regression evaluation data a Machine Learning Specialist notices that the residuals do not form a zero-centered bell shape as shown.
What does this mean?
- A . The model might have prediction errors over a range of target values.
- B . The dataset cannot be accurately represented using the regression model
- C . There are too many variables in the model
- D . The model is predicting its target values perfectly.
During mini-batch training of a neural network for a classification problem, a Data Scientist notices that training accuracy oscillates.
What is the MOST likely cause of this issue?
- A . The class distribution in the dataset is imbalanced
- B . Dataset shuffling is disabled
- C . The batch size is too big
- D . The learning rate is very high
A Machine Learning Specialist observes several performance problems with the training portion of a machine learning solution on Amazon SageMaker The solution uses a large training dataset 2 TB in size and is using the SageMaker k-means algorithm The observed issues include the unacceptable length of time it takes before the training job launches and poor I/O throughput while training the model
What should the Specialist do to address the performance issues with the current solution?
- A . Use the SageMaker batch transform feature
- B . Compress the training data into Apache Parquet format.
- C . Ensure that the input mode for the training job is set to Pipe.
- D . Copy the training dataset to an Amazon EFS volume mounted on the SageMaker instance.