Why do you need to split a machine learning dataset into training data and test data?

A. So you can try two different sets of features
B. To make sure your model is generalized for more than just the training data
C. To allow you to create unit tests in your code
D. So you can use one dataset for a wide model and one for a deep model

Answer: B

Explanation:

Evaluating a predictive model on its training data does not tell you how well it generalizes to new, unseen data. A model selected for its accuracy on the training dataset, rather than on a held-out test dataset, is likely to score lower on unseen data, because it has specialized to the structure of the training dataset instead of learning patterns that carry over. This is called overfitting. Holding out a test set gives you an estimate of performance on data the model has never seen.
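The gap between training and test accuracy can be illustrated with a small sketch. Everything in it is an assumption made for illustration and not part of the original question: the synthetic one-feature dataset with 20% label noise, the 75/25 split, and the hypothetical helper names (`make_dataset`, `train_test_split`, `predict_1nn`, `accuracy`). It uses a 1-nearest-neighbor classifier, which simply memorizes the training points, as a deliberately overfitting model.

```python
import random


def make_dataset(n, noise=0.2, seed=0):
    """Synthetic data: the true label is 1 when x > 0.5, flipped with probability `noise`."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        label = int(x > 0.5)
        if rng.random() < noise:
            label = 1 - label  # inject label noise
        data.append((x, label))
    return data


def train_test_split(data, test_fraction=0.25, seed=0):
    """Shuffle a copy of the data and hold out a fraction of it as the test set."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def predict_1nn(train, x):
    """Return the label of the training point closest to x (pure memorization)."""
    nearest = min(train, key=lambda point: abs(point[0] - x))
    return nearest[1]


def accuracy(train, data):
    """Fraction of points in `data` that the 1-NN model built on `train` labels correctly."""
    correct = sum(predict_1nn(train, x) == y for x, y in data)
    return correct / len(data)


data = make_dataset(400)
train, test = train_test_split(data)

train_acc = accuracy(train, train)  # each training point is its own nearest neighbor
test_acc = accuracy(train, test)    # unseen points expose the memorized noise

print(f"training accuracy: {train_acc:.2f}")
print(f"test accuracy:     {test_acc:.2f}")
```

Because the model has memorized every training point, noisy labels included, its training accuracy looks perfect, while the held-out test set gives a lower and more honest estimate. Judging the model on training accuracy alone would hide that difference.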

Reference: https://machinelearningmastery.com/a-simple-intuition-for-overfitting/
