Which of the following approaches would you combine for detecting and managing drift in your ML model?

Exam: MLA-C01

You are a data scientist working for a financial institution that uses a machine learning model to predict loan defaults. The model was trained on historical data from the past five years, but after being deployed for several months, its accuracy has gradually decreased. Upon investigation, you suspect that the underlying data distribution has changed due to economic shifts and changes in customer behavior. This phenomenon is known as model drift, and you need to address it to ensure the model continues to perform well.

Which of the following approaches would you combine for detecting and managing drift in your ML model? (Select two)
A. Decrease the complexity of the model by removing features and layers, thereby turning it into a simpler model that can handle various types of data distributions
B. Increase the complexity of the model by adding more features and deeper layers, ensuring it can adapt to changing data distributions over time
C. Deploy a secondary model trained on different data and compare its predictions with the original model to detect any significant differences, indicating potential drift
D. Retrain the model on the most recent data to ensure it captures current trends, and use model versioning to track performance improvements over time
E. Implement continuous monitoring of input data features and model predictions using statistical tests to detect shifts in data distribution or performance, triggering an alert when drift is detected

Answer: D, E

Explanation:

Correct options:

Implement continuous monitoring of input data features and model predictions using statistical tests to detect shifts in data distribution or performance, triggering an alert when drift is detected

Retrain the model on the most recent data to ensure it captures current trends, and use model versioning to track performance improvements over time
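To make the retraining-plus-versioning option concrete, here is a minimal in-memory sketch. The `ModelRegistry` and `ModelVersion` classes, the AUC metric, and all dates and scores are hypothetical and chosen only for illustration; a real deployment would more likely use a managed model registry. The one assumption that matters is that the candidate and the live model are scored on the same recent hold-out data before a promotion decision is made.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ModelVersion:
    version: int
    trained_through: str  # last date covered by the training window
    recent_auc: float     # score on recent hold-out data at registration time


@dataclass
class ModelRegistry:
    """Hypothetical registry: keeps every version and tracks which one is live."""
    versions: List[ModelVersion] = field(default_factory=list)
    live: Optional[ModelVersion] = None

    def register(self, trained_through, recent_auc, live_recent_auc=None):
        """Record a retrained model and promote it only if it beats the live
        model when both are scored on the same recent hold-out data."""
        candidate = ModelVersion(len(self.versions) + 1, trained_through, recent_auc)
        self.versions.append(candidate)  # history is kept even if not promoted
        if self.live is None:
            self.live = candidate
        elif live_recent_auc is not None and recent_auc > live_recent_auc:
            self.live = candidate
        return candidate


# Illustrative numbers only.
registry = ModelRegistry()
registry.register("2023-12-31", recent_auc=0.81)
# After drift, the live model scores 0.74 on fresh data; the retrained one 0.84.
registry.register("2024-06-30", recent_auc=0.84, live_recent_auc=0.74)
# A later retrain that does worse than the live model is recorded but not promoted.
registry.register("2024-09-30", recent_auc=0.70, live_recent_auc=0.83)

print(registry.live.version, len(registry.versions))
```

Keeping every version, including the ones that were not promoted, is what allows performance to be tracked over time and makes a rollback possible.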

For a model to predict accurately, the data that it is making predictions on must have a similar distribution as the data on which the model was trained. Because data distributions can be expected to drift over time, deploying a model is not a one-time exercise but rather a continuous process. It is a good practice to continuously monitor the incoming data and retrain your model on newer data if you find that the data distribution has deviated significantly from the original training data distribution. If monitoring data to detect a change in the data distribution has a high overhead, then a simpler strategy is to retrain the model periodically, for example, daily, weekly, or monthly.

via – https://aws.amazon.com/blogs/machine-learning/automate-model-retraining-with-amazon-sagemaker-pipelines-when-drift-is-detected/
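The monitoring step described above can be sketched with a simple drift statistic. The source does not name a specific test, so the example below assumes the Population Stability Index (PSI), a measure often used in credit-risk monitoring. The function names, the `income` feature, the synthetic data, and the 0.2 alert threshold (a common rule of thumb, not a value from the source) are all assumptions.

```python
import math
import random


def psi(baseline, current, bins=10, eps=1e-6):
    """Population Stability Index between two samples of one numeric feature."""
    ordered = sorted(baseline)
    # Bin edges are baseline quantiles, so each bin holds roughly 1/bins of it.
    edges = [ordered[len(ordered) * i // bins] for i in range(1, bins)]

    def shares(values):
        counts = [0] * bins
        for v in values:
            # The number of edges below v is the index of v's bin.
            counts[sum(v > e for e in edges)] += 1
        # eps keeps empty bins from causing log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    base, cur = shares(baseline), shares(current)
    return sum((c - b) * math.log(c / b) for b, c in zip(base, cur))


def drift_alert(baseline, current, threshold=0.2):
    """Return True when PSI exceeds the assumed alert threshold."""
    return psi(baseline, current) > threshold


# Synthetic stand-ins for one input feature (for example, applicant income).
rng = random.Random(0)
training_income = [rng.gauss(50, 10) for _ in range(5000)]  # training-time data
stable_income = [rng.gauss(50, 10) for _ in range(5000)]    # same distribution
shifted_income = [rng.gauss(65, 10) for _ in range(5000)]   # after an economic shift

print(drift_alert(training_income, stable_income))
print(drift_alert(training_income, shifted_income))
```

In this synthetic setup the first check should stay below the threshold and the second should exceed it. In practice the same check would run per feature and on the model's prediction scores, on a schedule, with an alert or retraining job triggered when the threshold is crossed.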

Incorrect options:

Deploy a secondary model trained on different data and compare its predictions with the original model to detect any significant differences, indicating potential drift – Comparing predictions from a secondary model might indicate drift, but it’s not a robust or scalable solution. It requires maintaining multiple models, which can be resource-intensive and complex to manage.

Increase the complexity of the model by adding more features and deeper layers, ensuring it can adapt to changing data distributions over time – Increasing model complexity may help capture more nuances in the data, but it does not address drift directly. In fact, a more complex model could exacerbate issues related to overfitting or make the model more sensitive to minor changes in data distribution.

Decrease the complexity of the model by removing features and layers, thereby turning it into a simpler model that can handle various types of data distributions – Decreasing the model complexity and turning the model into a simpler one would introduce more bias into the model, thereby rendering it ineffective at addressing changes in the data distribution.

References:

https://docs.aws.amazon.com/machine-learning/latest/dg/retraining-models-on-new-data.html

https://aws.amazon.com/blogs/machine-learning/automate-model-retraining-with-amazon-sagemaker-pipelines-when-drift-is-detected/