Which of the following tools can be used to distribute large-scale feature engineering without the use of a UDF or pandas Function API for machine learning pipelines?
A. Keras
B. pandas
C. PyTorch
D. Spark ML
E. Scikit-learn
Answer: D
Explanation: Spark ML (Machine Learning Library) is designed specifically for...
Which of the following changes does the data scientist need to make to their objective_function in order to produce a more accurate model?
A data scientist wants to efficiently tune the hyperparameters of a scikit-learn model. They elect to use the Hyperopt library's fmin operation to facilitate this process. Unfortunately, the final model is not very accurate. The data scientist suspects that there is an issue with the objective_function being passed as an...
What is the name of the method that transforms categorical features into a series of binary indicator feature variables?
A. Leave-one-out encoding
B. Target encoding
C. One-hot encoding
D. Categorical
E. String indexing
Answer: C
Explanation: The method that transforms categorical features into a series of binary indicator variables is...
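As a rough illustration of one-hot encoding, the sketch below expands one categorical column into binary indicator columns in plain Python. The helper name one_hot_encode and the sample color values are hypothetical, chosen only for this example; libraries such as Spark ML and scikit-learn provide their own encoders.

```python
def one_hot_encode(values):
    """Map each categorical value to a list of 0/1 indicators.

    Hypothetical helper for illustration only. Categories are sorted so
    the order of the indicator columns is deterministic.
    """
    categories = sorted(set(values))
    encoded = [
        [1 if value == category else 0 for category in categories]
        for value in values
    ]
    return categories, encoded


categories, encoded = one_hot_encode(["red", "green", "red", "blue"])
print(categories)  # ['blue', 'green', 'red']
print(encoded)     # [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```

Each row contains exactly one 1, in the column of the category that row held.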
Which of the following terms is used to describe this combination of models?
A data scientist has produced two models for a single machine learning problem. One of the models performs well when one of the features has a value of less than 5, and the other model performs well when the value of that feature is greater than or equal to 5....
Which of the following code blocks will accomplish this task?
A data scientist has a Spark DataFrame spark_df. They want to create a new Spark DataFrame that contains only the rows from spark_df where the value in column price is greater than 0. Which of the following code blocks will accomplish this task?
A. spark_df[spark_df["price"] > 0]
B. spark_df.filter(col("price") >...
Which of the following statements describes a Spark ML estimator?
A. An estimator is a hyperparameter grid that can be used to train a model
B. An estimator chains multiple algorithms together to specify an ML workflow
C. An estimator is a trained ML model which turns a DataFrame with...
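In Spark ML's documented terminology, an estimator is an algorithm that is fit on a DataFrame to produce a transformer (a model). The toy classes below imitate that fit/transform split in plain Python. They are not Spark classes; the names MeanImputer and MeanImputerModel are hypothetical and exist only to show the pattern.

```python
class MeanImputerModel:
    """Toy 'transformer': holds a learned value and applies it to data."""

    def __init__(self, fill_value):
        self.fill_value = fill_value

    def transform(self, rows):
        # Replace missing entries with the value learned during fit().
        return [self.fill_value if row is None else row for row in rows]


class MeanImputer:
    """Toy 'estimator': fit() learns from data and returns a model."""

    def fit(self, rows):
        observed = [row for row in rows if row is not None]
        return MeanImputerModel(sum(observed) / len(observed))


model = MeanImputer().fit([1.0, None, 3.0])
print(model.transform([None, 5.0]))  # [2.0, 5.0]
```

The estimator itself transforms nothing; only the object returned by fit() does.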
Which of the following changes do they need to make to the above code block in order to accomplish the task?
A data scientist wants to tune a set of hyperparameters for a machine learning model. They have wrapped a Spark ML model in the objective function objective_function and they have defined the search space search_space. As a result, they have the following code block: Which of the following changes do...
Which of the following explanations justifies this suggestion?
An organization is developing a feature repository and is electing to one-hot encode all categorical feature variables. A data scientist suggests that the categorical feature variables should not be one-hot encoded within the feature repository. Which of the following explanations justifies this suggestion?
A. One-hot encoding is not supported by...
Which of the following changes can the data scientist make to address the concern?
A data scientist is using Spark ML to engineer features for an exploratory machine learning project. They decide they want to standardize their features using the following code block: Upon code review, a colleague expressed concern with the features being standardized prior to splitting the data into a training set...
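One common way to address a concern like this, sketched here under the assumption that the goal is to keep test-set statistics out of the scaler, is to split first and learn the mean and standard deviation from the training split alone. The example uses plain Python rather than Spark ML, and the helper names and sample values are hypothetical.

```python
from statistics import mean, pstdev


def fit_standardizer(train_values):
    """Learn the mean and standard deviation from the training split only."""
    return mean(train_values), pstdev(train_values)


def standardize(values, mu, sigma):
    """Apply previously learned statistics to any split."""
    return [(value - mu) / sigma for value in values]


# Hypothetical feature values, already split before any scaling.
train = [2.0, 4.0, 6.0, 8.0]
test = [5.0, 11.0]

mu, sigma = fit_standardizer(train)          # statistics come from train only
train_scaled = standardize(train, mu, sigma)
test_scaled = standardize(test, mu, sigma)   # test reuses the train statistics
```

Because the test values never influence mu or sigma, the evaluation on the test split is not contaminated by information from it.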
Which of the following values represents the overall cross-validation root-mean-squared error?
A data scientist uses 3-fold cross-validation when optimizing model hyperparameters for a regression problem. The following root-mean-squared-error values are calculated on each of the validation folds:
• 10.0
• 12.0
• 17.0
Which of the following values represents the overall cross-validation root-mean-squared error?
A. 13.0
B. 17.0
C. 12.0
D. ...
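A common convention, and the one assumed here, is to report the overall cross-validation score as the mean of the per-fold validation metrics. Under that assumption, the arithmetic for the three fold values listed above can be checked as follows.

```python
from statistics import mean

# Per-fold validation RMSE values taken from the question.
fold_rmse = [10.0, 12.0, 17.0]

# Assumption: the overall score is the unweighted mean across folds.
overall_rmse = mean(fold_rmse)
print(overall_rmse)  # 13.0
```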