Which of the following code blocks can be used to compute the root mean-squared-error of the model according to the data in preds_df and assign it to the rmse variable?

A data scientist has developed a linear regression model using Spark ML and computed the predictions in a Spark DataFrame preds_df with the following schema: prediction DOUBLE actual DOUBLE Which of the following code blocks can be used to compute the root mean-squared-error of the model according to the data...

March 19, 2025 No Comments READ MORE +

Which of the following changes can the data scientist make to accomplish the task?

A data scientist is attempting to tune a logistic regression model logistic using scikit-learn. They want to specify a search space for two hyperparameters and let the tuning process randomly select values for each evaluation. They attempt to run the following code block, but it does not accomplish the desired...

March 14, 2025 No Comments READ MORE +

Which of the following lines of code can the data scientist run to accomplish the task?

A data scientist is wanting to explore summary statistics for Spark DataFrame spark_df. The data scientist wants to see the count, mean, standard deviation, minimum, maximum, and interquartile range (IQR) for each numerical feature. Which of the following lines of code can the data scientist run to accomplish the task?A...

March 6, 2025 No Comments READ MORE +

Which of the following approaches can they take to include as much information as possible in the feature set?

A data scientist has replaced missing values in their feature set with each respective feature variable’s median value. A colleague suggests that the data scientist is throwing away valuable information by doing this. Which of the following approaches can they take to include as much information as possible in the...

March 5, 2025 No Comments READ MORE +

Which of the following tools can be used to parallelize the hyperparameter tuning process for single-node machine learning models using a Spark cluster?

Which of the following tools can be used to parallelize the hyperparameter tuning process for single-node machine learning models using a Spark cluster?A . MLflow Experiment TrackingB . Spark MLC . Autoscaling clustersD . Autoscaling clustersE . Delta LakeView AnswerAnswer: B Explanation: Spark ML (part of Apache Spark's MLlib) is...

March 5, 2025 No Comments READ MORE +

Which of the Spark operations can be used to randomly split a Spark DataFrame into a training DataFrame and a test DataFrame for downstream use?

Which of the Spark operations can be used to randomly split a Spark DataFrame into a training DataFrame and a test DataFrame for downstream use?A . TrainValidationSplitB . DataFrame.whereC . CrossValidatorD . TrainValidationSplitModelE . DataFrame.randomSplitView AnswerAnswer: E Explanation: The correct method to randomly split a Spark DataFrame into training and...

March 4, 2025 No Comments READ MORE +

Which of the following tools can be used to distribute large-scale feature engineering without the use of a UDF or pandas Function API for machine learning pipelines?

Which of the following tools can be used to distribute large-scale feature engineering without the use of a UDF or pandas Function API for machine learning pipelines?A . KerasB . pandasC . PvTorchD . Spark MLE . Scikit-learnView AnswerAnswer: D Explanation: Spark ML (Machine Learning Library) is designed specifically for...

March 4, 2025 No Comments READ MORE +

Which of the following changes does the data scientist need to make to their objective_function in order to produce a more accurate model?

A data scientist wants to efficiently tune the hyperparameters of a scikit-learn model. They elect to use the Hyperopt library's fmin operation to facilitate this process. Unfortunately, the final model is not very accurate. The data scientist suspects that there is an issue with the objective_function being passed as an...

February 28, 2025 No Comments READ MORE +

What is the name of the method that transforms categorical features into a series of binary indicator feature variables?

What is the name of the method that transforms categorical features into a series of binary indicator feature variables?A . Leave-one-out encodingB . Target encodingC . One-hot encodingD . CategoricalE . String indexingView AnswerAnswer: C Explanation: The method that transforms categorical features into a series of binary indicator variables is...

February 26, 2025 No Comments READ MORE +

Which of the following terms is used to describe this combination of models?

A data scientist has produced two models for a single machine learning problem. One of the models performs well when one of the features has a value of less than 5, and the other model performs well when the value of that feature is greater than or equal to 5....

February 24, 2025 No Comments READ MORE +