Which method should Example Corp use to split the data into a training dataset and evaluation dataset?

Example Corp has an annual sale event from October to December. The company has sequential sales data from the past 15 years and wants to use Amazon ML to predict the sales for this year’s upcoming event.

Which method should Example Corp use to split the data into a training dataset and evaluation dataset?
A. Pre-split the data before uploading to Amazon S3.
B. Have Amazon ML split the data randomly.
C. Have Amazon ML split the data sequentially.
D. Perform custom cross-validation on the data.

Answer: C

Explanation:

A sequential split is a method of splitting data into training and evaluation datasets while preserving the order of the data records. This method is useful when the data has a temporal or sequential structure, and the order of the data matters for the prediction task. For example, if the data contains sales data for different months or years, and the goal is to predict the sales for the next month or year, a sequential split can ensure that the training data comes from the earlier period and the evaluation data comes from the later period. This can help avoid data leakage, which occurs when the training data contains information from the future that is not available at the time of prediction. A sequential split can also help evaluate the model performance on the most recent data, which may be more relevant and representative of the future data.

In this question, Example Corp has sequential sales data from the past 15 years and wants to use Amazon ML to predict the sales for this year’s upcoming annual sale event. A sequential split is the most appropriate method because it preserves the order of the records and prevents data leakage. For example, Example Corp can use the data from the first 14 years as the training dataset and the data from the last year as the evaluation dataset. This way, the model learns from the historical data and is tested on the most recent data. A random split (option B), by contrast, would scatter records from recent years into the training dataset, so the evaluation would overstate how well the model predicts sales it has not yet seen.
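The 14-year/1-year split described above can be sketched in plain Python. This is only an illustration, not how Amazon ML implements it: the record layout `(sale date, amount)`, the function name `split_by_cutoff_year`, and the sample years and amounts are all made up for the example.

```python
from datetime import date
from typing import List, Tuple

# Hypothetical record layout: (sale date, sales amount).
Record = Tuple[date, float]


def split_by_cutoff_year(
    records: List[Record], first_eval_year: int
) -> Tuple[List[Record], List[Record]]:
    """Sequential split: everything before first_eval_year trains,
    everything from first_eval_year onward evaluates."""
    ordered = sorted(records, key=lambda r: r[0])  # keep time order
    train = [r for r in ordered if r[0].year < first_eval_year]
    evaluation = [r for r in ordered if r[0].year >= first_eval_year]
    return train, evaluation


# Made-up data: one record per sale month (Oct-Dec) for 15 years.
records = [
    (date(year, month, 1), 100.0 + (year - 2009))
    for year in range(2009, 2024)
    for month in (10, 11, 12)
]

train, evaluation = split_by_cutoff_year(records, first_eval_year=2023)

# No training record is later than any evaluation record,
# so the model never sees "future" data during training.
latest_train = max(r[0] for r in train)
earliest_eval = min(r[0] for r in evaluation)
print(len(train), len(evaluation), latest_train < earliest_eval)
```

Splitting on a cutoff year rather than a percentage keeps each sale season whole, which is a design choice of this sketch rather than something the question requires.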

Amazon ML provides an option to split the data sequentially when creating the training and evaluation data sources. To use this option, Example Corp can specify the percentage of the data to use for training and evaluation, and Amazon ML will use the first part of the data for training and the remaining part of the data for evaluation. For more information, see Splitting Your Data – Amazon Machine Learning.
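As a rough sketch of how the percentages might be expressed outside the console: to the best of my knowledge, the Amazon ML data source API accepts a `DataRearrangement` JSON string with a `splitting` object containing `percentBegin`, `percentEnd`, and `strategy`. Treat those field names as an assumption to verify against the Amazon ML documentation. The helper name `splitting_rearrangement` and the 70/30 ratio are illustrative only, and the snippet only builds the strings without calling any AWS service.

```python
import json


def splitting_rearrangement(
    percent_begin: int, percent_end: int, strategy: str = "sequential"
) -> str:
    """Build a DataRearrangement-style JSON string (assumed format)."""
    if not 0 <= percent_begin < percent_end <= 100:
        raise ValueError("need 0 <= percent_begin < percent_end <= 100")
    return json.dumps(
        {
            "splitting": {
                "percentBegin": percent_begin,
                "percentEnd": percent_end,
                "strategy": strategy,
            }
        }
    )


# First 70% of the records (in file order) for training,
# the remaining 30% for evaluation.
train_rearrangement = splitting_rearrangement(0, 70)
eval_rearrangement = splitting_rearrangement(70, 100)

print(train_rearrangement)
print(eval_rearrangement)
```

Because a sequential strategy takes records in the order they appear in the input, the input file would need to be sorted by date before upload for the split to be chronological.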
