What should you do?

You are developing an ML model using a dataset with categorical input variables. You have randomly split half of the data into training and test sets. After applying one-hot encoding on the categorical variables in the training set, you discover that one categorical variable is missing from the test set.

What should you do?
A. Randomly redistribute the data, with 70% for the training set and 30% for the test set
B. Use sparse representation in the test set
C. Apply one-hot encoding on the categorical variables in the test data
D. Collect more data representing all categories

Answer: C

Explanation:

The model expects test inputs in the same format as the inputs it was trained on. Because the training set is already one-hot encoded, the test set must go through the same encoding, using the categories learned from the training set, so that both sets end up with identical columns in the same order.

References:
https://machinelearningmastery.com/how-to-one-hot-encode-sequence-data-in-python/
https://machinelearningmastery.com/how-to-use-one-hot-encoding-for-categorical-data/

When working with categorical input variables, it’s important to apply the same preprocessing steps to both the training and test sets. One-hot encoding is a common way to convert a categorical variable into numerical values: each category becomes its own binary column, and the resulting columns can be used as inputs to machine learning models. Applying the same one-hot encoding to the test set gives the test data the same format as the training data, which the model needs in order to make predictions on it.
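
As a minimal sketch of this idea, the Python below learns the categories from the training rows only and reuses them to encode the test rows. It uses only the standard library. The helper names (fit_categories, one_hot) and the toy "color" data are hypothetical and chosen for illustration; they are not taken from the question or the referenced articles. Libraries such as scikit-learn provide an equivalent fit-then-transform workflow through OneHotEncoder.

```python
def fit_categories(rows, columns):
    """Learn the sorted category values for each column from the training rows."""
    return {col: sorted({row[col] for row in rows}) for col in columns}


def one_hot(rows, categories):
    """Encode rows using the categories learned from the training data.

    Every (column, value) pair seen in training becomes one output position,
    so training and test vectors always have the same length and order.
    A value not seen in training encodes as all zeros for that column.
    """
    encoded = []
    for row in rows:
        vector = []
        for col, values in categories.items():
            vector.extend(1 if row[col] == v else 0 for v in values)
        encoded.append(vector)
    return encoded


# Hypothetical toy data: the test set happens to contain no "green" rows.
train = [{"color": "red"}, {"color": "green"}, {"color": "blue"}]
test = [{"color": "blue"}, {"color": "red"}]

categories = fit_categories(train, ["color"])  # learned from training data only
train_encoded = one_hot(train, categories)
test_encoded = one_hot(test, categories)

print(categories)
print(test_encoded)
```

Even though "green" never appears in the test rows, each encoded test vector still has a position for it, so the test data lines up column for column with the training data.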