A Machine Learning Specialist is creating a new natural language processing application that processes a dataset comprised of 1 million sentences. The aim is to then run Word2Vec to generate embeddings of the sentences and enable different types of predictions.
Here is an example from the dataset:
"The quck BROWN FOX jumps over the lazy dog.”
Which of the following are the operations the Specialist needs to perform to correctly sanitize and prepare the data in a repeatable manner? (Choose three.)
A . Perform part-of-speech tagging and keep the action verb and the nouns only.
B . Normalize all words by making the sentence lowercase.
C . Remove stop words using an English stopword dictionary.
D . Correct the typography on "quck" to "quick.”
E . One-hot encode all words in the sentence.
F . Tokenize the sentence into words.