CertNexus AIP-210 CertNexus Certified Artificial Intelligence Practitioner (CAIP) Online Training

Question #1

You and your team need to process large datasets of images as fast as possible for a machine learning task. The project will also use a modular framework with extensible code and an active developer community.

Which of the following would BEST meet your needs?

  • A . Caffe
  • B . Keras
  • C . Microsoft Cognitive Services
  • D . TensorBoard

Correct Answer: A

Explanation:

Caffe is a deep learning framework that is designed for speed and modularity. It can process large datasets of images efficiently and supports various types of neural networks. It also has a large and active developer community that contributes to its code base and documentation. Caffe is suitable for image processing tasks such as classification, segmentation, detection, and recognition.

Question #2

Which of the following principles supports building an ML system with a Privacy by Design methodology?

  • A . Avoiding mechanisms to explain and justify automated decisions.
  • B . Collecting and processing the largest amount of data possible.
  • C . Understanding, documenting, and displaying data lineage.
  • D . Utilizing quasi-identifiers and non-unique identifiers, alone or in combination.

Correct Answer: C

Explanation:

Data lineage is the process of tracking the origin, transformation, and usage of data throughout its lifecycle. It helps to ensure data quality, integrity, and provenance. Data lineage also supports the Privacy by Design methodology, which is a framework that aims to embed privacy principles into the design and operation of systems, processes, and products that involve personal data. By understanding, documenting, and displaying data lineage, an ML system can demonstrate how it collects, processes, stores, and deletes personal data in a transparent and accountable manner.

Question #3

A data scientist is tasked with extracting business intelligence from primary data captured from the public.

Which of the following is the most important aspect the scientist must not forget to include?

  • A . Cyberprotection
  • B . Cybersecurity
  • C . Data privacy
  • D . Data security

Correct Answer: C

Explanation:

Data privacy is the right of individuals to control how their personal data is collected, used, shared, and protected. It also involves complying with relevant laws and regulations that govern the handling of personal data. Data privacy is especially important when extracting business intelligence from primary data captured from the public, as it may contain sensitive or confidential information that could harm the individuals if misused or breached.

Question #4

For a particular classification problem, you are tasked with determining the best algorithm among SVM, random forest, K-nearest neighbors, and a deep neural network. Each of the algorithms has similar accuracy on your data. The stakeholders indicate that they need a model that can convey each feature’s relative contribution to the model’s accuracy.

Which is the best algorithm for this use case?

  • A . Deep neural network
  • B . K-nearest neighbors
  • C . Random forest
  • D . SVM

Correct Answer: C

Explanation:

Random forest is an ensemble learning method that combines multiple decision trees to create a more accurate and robust classifier or regressor. Random forest can convey each feature’s relative contribution to the model’s accuracy by measuring how much the prediction error increases when a feature is randomly permuted. This metric is called feature importance or Gini importance. Random forest can also provide insights into the interactions and dependencies among features by visualizing the decision trees.
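
As an illustration, here is a minimal scikit-learn sketch (the dataset and feature names are synthetic placeholders) that reads the impurity-based importances:

```python
# A minimal sketch, assuming scikit-learn; the dataset and feature names are synthetic.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=4, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# feature_importances_ holds each feature's relative contribution; the values sum to 1.
for name, importance in zip(["f0", "f1", "f2", "f3"], model.feature_importances_):
    print(f"{name}: {importance:.3f}")
```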

Question #5

A dataset can contain a range of values that depict a certain characteristic, such as grades on tests in a class during the semester. A specific student has so far received the following grades: 76, 81, 78, 87, 75, and 72. There is one final test in the semester.

What minimum grade would the student need to achieve on the last test to get an 80% average?

  • A . 82
  • B . 89
  • C . 91
  • D . 94

Correct Answer: C

Explanation:

To calculate the minimum grade needed for an 80% average, multiply the target average by the total number of tests, then subtract the sum of the grades earned so far:

minimum grade = (target average * total number of tests) - sum of grades so far

Plugging in the given values, we get:

minimum grade = (80 * 7) - (76 + 81 + 78 + 87 + 75 + 72)

minimum grade = 560 - 469

minimum grade = 91

Therefore, the student needs to score at least 91 on the last test to get an 80% average.
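
The arithmetic can be verified with a short Python snippet:

```python
# Verify the minimum-grade calculation above.
grades = [76, 81, 78, 87, 75, 72]
target_average = 80
total_tests = len(grades) + 1  # six grades so far plus the final test

minimum_grade = target_average * total_tests - sum(grades)
print(minimum_grade)  # 91
```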

Question #6

Which three security measures could be applied in different ML workflow stages to defend them against malicious activities? (Select three.)

  • A . Disable logging for model access.
  • B . Launch ML instances in a virtual private cloud (VPC).
  • C . Monitor model degradation.
  • D . Use data encryption.
  • E . Use max privilege to control access to ML artifacts.
  • F . Use Secrets Manager to protect credentials.

Correct Answer: BDF

Explanation:

Security measures can be applied in different ML workflow stages to defend them against malicious activities, such as data theft, model tampering, or adversarial attacks.

Some of the security measures are:

Launch ML instances in a virtual private cloud (VPC): A VPC is a logically isolated section of a cloud provider’s network that allows users to launch and control their own resources. By launching ML instances in a VPC, users can enhance the security and privacy of their data and models, as well as restrict the access and traffic to and from the instances.

Use data encryption: Data encryption is the process of transforming data into an unreadable format using a secret key or algorithm. Data encryption can protect the confidentiality, integrity, and availability of data at rest (stored in databases or files) or in transit (transferred over networks). Data encryption can prevent unauthorized access, modification, or leakage of sensitive data.

Use Secrets Manager to protect credentials: Secrets Manager is a service that helps users securely store, manage, and retrieve secrets, such as passwords, API keys, tokens, or certificates. Secrets Manager can help users protect their credentials from unauthorized access or exposure, as well as rotate them automatically to comply with security policies.

Question #7

A healthcare company experiences a cyberattack, where the hackers were able to reverse-engineer a dataset to break confidentiality.

Which of the following is TRUE regarding the dataset parameters?

  • A . The model is overfitted and trained on a high quantity of patient records.
  • B . The model is overfitted and trained on a low quantity of patient records.
  • C . The model is underfitted and trained on a high quantity of patient records.
  • D . The model is underfitted and trained on a low quantity of patient records.

Correct Answer: B

Explanation:

Overfitting is a problem that occurs when a model learns too much from the training data and fails to generalize well to new or unseen data. Overfitting can result from using a low quantity of training data, a high complexity of the model, or a lack of regularization. Overfitting can also increase the risk of reverse-engineering a dataset from a model’s outputs, as the model may reveal too much information about the specific features or patterns of the training data. This can break the confidentiality of the data and expose sensitive information about the individuals in the dataset.

Question #8

When working with textual data and trying to classify text into different languages, which approach to representing features makes the most sense?

  • A . Bag of words model with TF-IDF
  • B . Bag of bigrams (2 letter pairs)
  • C . Word2Vec algorithm
  • D . Clustering similar words and representing words by group membership

Correct Answer: B

Explanation:

A bag of bigrams (2 letter pairs) is an approach to representing features for textual data that involves counting the frequency of each pair of adjacent letters in a text. For example, the word “hello” would be represented as {“he”: 1, “el”: 1, “ll”: 1, “lo”: 1}. A bag of bigrams can capture some information about the spelling and structure of words, which can be useful for identifying the language of a text. For example, some languages have more common bigrams than others, such as “th” in English or “ch” in German.
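
A minimal sketch of character-bigram counting in plain Python:

```python
# Count adjacent letter pairs, matching the "hello" example above.
from collections import Counter

def char_bigrams(text):
    return Counter(text[i:i + 2] for i in range(len(text) - 1))

print(char_bigrams("hello"))  # Counter({'he': 1, 'el': 1, 'll': 1, 'lo': 1})
```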

Question #9

Which of the following options is a correct approach for scheduling model retraining in a weather prediction application?

  • A . As new resources become available
  • B . Once a month
  • C . When the input format changes
  • D . When the input volume changes

Reveal Solution Hide Solution

Correct Answer: C B
C B

Explanation:

The input format is the way that the data is structured, organized, and presented to the model. For example, the input format could be a CSV file, an image file, or a JSON object. The input format can affect how the model interprets and processes the data, and therefore how it makes predictions. When the input format changes, it may require retraining the model to adapt to the new format and ensure its accuracy and reliability. For example, if the weather prediction application switches from using numerical values to categorical values for some features, such as wind direction or cloud cover, it may need to retrain the model to handle these changes.

Question #10

Which of the following tools would you use to create a natural language processing application?

  • A . AWS DeepRacer
  • B . Azure Search
  • C . DeepDream
  • D . NLTK

Correct Answer: D

Explanation:

NLTK (Natural Language Toolkit) is a Python library that provides a set of tools and resources for natural language processing (NLP). NLP is a branch of AI that deals with analyzing, understanding, and generating natural language texts or speech. NLTK offers modules for various NLP tasks, such as tokenization, stemming, lemmatization, parsing, tagging, chunking, sentiment analysis, named entity recognition, machine translation, text summarization, and more.
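
A minimal NLTK sketch, assuming the library is installed and its tokenizer data has been downloaded:

```python
# A minimal NLTK sketch; word_tokenize needs the 'punkt' data
# (newer NLTK releases may ask for 'punkt_tab' instead).
import nltk

nltk.download("punkt", quiet=True)
tokens = nltk.word_tokenize("Natural language processing is fun.")
print(tokens)  # ['Natural', 'language', 'processing', 'is', 'fun', '.']
```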

Question #11

A classifier has been implemented to predict whether or not someone has a specific type of disease.

Considering that only 1% of the population in the dataset has this disease, which measures will work the BEST to evaluate this model?

  • A . Mean squared error
  • B . Precision and accuracy
  • C . Precision and recall
  • D . Recall and explained variance

Correct Answer: C

Explanation:

Precision and recall are two measures that can evaluate the performance of a classifier, especially when the data is imbalanced. Precision is the ratio of true positives (correctly predicted positive cases) to all predicted positive cases. Recall is the ratio of true positives to all actual positive cases. Precision and recall can help assess how well the classifier can identify the positive cases (the disease) and avoid false negatives (missed diagnosis) or false positives (unnecessary treatment).
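
A minimal scikit-learn sketch on a toy imbalanced example:

```python
# Precision and recall on a toy imbalanced example, assuming scikit-learn.
from sklearn.metrics import precision_score, recall_score

y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]  # only 2 of 10 cases have the disease
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]  # one false positive, one false negative

print(precision_score(y_true, y_pred))  # 0.5 -> TP / (TP + FP)
print(recall_score(y_true, y_pred))     # 0.5 -> TP / (TP + FN)
```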

Question #12

Which of the following describes a typical use case of video tracking?

  • A . Augmented dreaming
  • B . Medical diagnosis
  • C . Traffic monitoring
  • D . Video composition

Correct Answer: C

Explanation:

Video tracking is a technique that involves detecting and following moving objects in a video sequence. Video tracking can be used for various applications, such as surveillance, security, sports analysis, and human-computer interaction. One typical use case of video tracking is traffic monitoring, where video tracking can help measure traffic flow, detect congestion, identify violations, and optimize traffic signals.

Question #13

You are developing a prediction model. Your team indicates they need an algorithm that is fast and requires low memory and low processing power.

Assuming the following algorithms have similar accuracy on your data, which is most likely to be an ideal choice for the job?

  • A . Deep learning neural network
  • B . Random forest
  • C . Ridge regression
  • D . Support-vector machine

Correct Answer: C

Explanation:

Ridge regression is a type of linear regression that adds a regularization term to the loss function to reduce overfitting and improve generalization. Ridge regression is fast and requires low memory and low processing power, as it only involves solving a system of linear equations. Ridge regression can also handle multicollinearity (high correlation among predictors) by shrinking the coefficients of correlated predictors.
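
A minimal ridge regression sketch on synthetic data, where alpha sets the regularization strength:

```python
# A minimal ridge regression sketch on synthetic data, assuming scikit-learn.
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# alpha sets the strength of the L2 penalty that shrinks the coefficients.
model = Ridge(alpha=1.0).fit(X, y)
print(model.coef_)
```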

Question #14

For each of the last 10 years, your team has been collecting data from a group of subjects, including their age and numerous biomarkers collected from blood samples. You are tasked with creating a prediction model of age using the biomarkers as input. You start by performing a linear regression using all of the data over the 10-year period, with age as the dependent variable and the biomarkers as predictors.

Which assumption of linear regression is being violated?

  • A . Equality of variance (Homoscedasticity)
  • B . Independence
  • C . Linearity
  • D . Normality

Correct Answer: B

Explanation:

Independence is an assumption of linear regression that states that the errors (residuals) of the model are independent of each other, meaning that they are not correlated or influenced by previous or subsequent errors. Independence can be violated when the data has serial correlation or autocorrelation, which means that the value of a variable at a given time depends on its previous or future values. This can happen when the data is collected over time (time series) or over space (spatial data). In this case, the data is collected over time from a group of subjects, which may introduce serial correlation among the errors.
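
One common diagnostic for serial correlation in residuals is the Durbin-Watson statistic; here is a minimal statsmodels sketch with synthetic stand-ins for the biomarker data:

```python
# A minimal sketch, assuming statsmodels; X and y are synthetic stand-ins
# for the biomarker matrix and age.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(100, 3)))
y = X @ np.array([1.0, 0.5, -0.3, 0.2]) + rng.normal(size=100)

residuals = sm.OLS(y, X).fit().resid
print(durbin_watson(residuals))  # values near 2 suggest independent residuals
```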

Question #15

When should you use semi-supervised learning? (Select two.)

  • A . A small set of labeled data is available but not representative of the entire distribution.
  • B . A small set of labeled data is biased toward one class.
  • C . Labeling data is challenging and expensive.
  • D . There is a large amount of labeled data to be used for predictions.
  • E . There is a large amount of unlabeled data to be used for predictions.

Correct Answer: CE

Explanation:

Semi-supervised learning is a type of machine learning that uses both labeled and unlabeled data to train a model.

Semi-supervised learning can be useful when:

Labeling data is challenging and expensive: Labeling data requires human intervention and domain expertise, which can be costly and time-consuming. Semi-supervised learning can leverage the large amount of unlabeled data that is easier and cheaper to obtain and use it to improve the model’s performance.

There is a large amount of unlabeled data to be used for predictions: Unlabeled data can provide additional information and diversity to the model, which can help it learn more complex patterns and generalize better to new data. Semi-supervised learning can use various techniques, such as self-training, co-training, or generative models, to incorporate unlabeled data into the learning process.
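
A minimal sketch using scikit-learn's LabelSpreading, a graph-based semi-supervised method, where unlabeled samples are marked with -1:

```python
# Semi-supervised sketch with scikit-learn; most labels are hidden below.
from sklearn.datasets import make_classification
from sklearn.semi_supervised import LabelSpreading

X, y = make_classification(n_samples=300, random_state=0)
y_partial = y.copy()
y_partial[30:] = -1  # -1 marks unlabeled samples; only 30 labels remain

model = LabelSpreading().fit(X, y_partial)
print(model.transduction_[:10])  # labels inferred for the full dataset
```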

Question #16

Which of the following can benefit from deploying a deep learning model as an embedded model on edge devices?

  • A . A more complex model
  • B . Guaranteed availability of enough space
  • C . Increase in data bandwidth consumption
  • D . Reduction in latency

Correct Answer: D

Explanation:

Latency is the time delay between a request and a response. Latency can affect the performance and user experience of an application, especially when real-time or near-real-time responses are required. Deploying a deep learning model as an embedded model on edge devices can reduce latency, as the model can run locally on the device without relying on network connectivity or cloud servers. Edge devices are devices that are located at the edge of a network, such as smartphones, tablets, laptops, sensors, cameras, or drones.

Question #17

Which of the following is the definition of accuracy?

  • A . (True Positives + False Positives) / Total Predictions
  • B . (True Positives + True Negatives) / Total Predictions
  • C . True Positives / (True Positives + False Negatives)
  • D . True Positives / (True Positives + False Positives)

Correct Answer: B

Explanation:

Accuracy is a measure of how well a classifier can correctly predict the class of an instance. Accuracy is calculated by dividing the number of correct predictions (true positives and true negatives) by the total number of predictions. True positives are instances that are correctly predicted as positive (belonging to the target class). True negatives are instances that are correctly predicted as negative (not belonging to the target class).
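
A minimal Python sketch of this definition, with illustrative counts:

```python
# Accuracy from confusion-matrix counts, matching the definition above.
def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

print(accuracy(tp=40, tn=50, fp=5, fn=5))  # 0.9
```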

Question #18

Personal data should not be disclosed, made available, or otherwise used for purposes other than specified with which of the following exceptions? (Select two.)

  • A . If it is for a good cause.
  • B . If it was collected accidentally.
  • C . If it was requested by the authority of law.
  • D . If it was with consent of the person it is collected from.
  • E . If the data is only collected once.

Correct Answer: CD

Explanation:

Personal data is any information that relates to an identified or identifiable individual, such as name, address, email, phone number, or biometric data. Personal data should not be disclosed, made available, or otherwise used for purposes other than specified, except with:

The consent of the person it is collected from: Consent is a clear and voluntary indication of agreement by the person to the processing of their personal data for a specific purpose. Consent can be given by a statement or a clear affirmative action, such as ticking a box or clicking a button.

The authority of law: The authority of law is a legal basis or obligation that requires or permits the processing of personal data for a legitimate purpose. For example, the authority of law could be a court order, a subpoena, a warrant, or a statute.

Question #19

Which of the following sentences is TRUE about the definition of cloud models for machine learning pipelines?

  • A . Data as a Service (DaaS) can host the databases providing backups, clustering, and high availability.
  • B . Infrastructure as a Service (IaaS) can provide CPU, memory, disk, network and GPU.
  • C . Platform as a Service (PaaS) can provide some services within an application such as payment applications to create efficient results.
  • D . Software as a Service (SaaS) can provide AI practitioner data science services such as Jupyter notebooks.

Correct Answer: B

Explanation:

Cloud models are service models that provide different levels of abstraction and control over computing resources in a cloud environment. Some of the common cloud models for machine learning pipelines are:

Software as a Service (SaaS): SaaS provides ready-to-use applications that run on the cloud provider’s infrastructure and are accessible through a web browser or an API. SaaS can provide AI practitioner data science services such as Jupyter notebooks, which are web-based interactive environments that allow users to create and share documents that contain code, text, visualizations, and more.

Platform as a Service (PaaS): PaaS provides a platform that allows users to develop, run, and manage applications without worrying about the underlying infrastructure. PaaS can provide some services within an application such as payment applications to create efficient results.

Infrastructure as a Service (IaaS): IaaS provides access to fundamental computing resources such as servers, storage, networks, and operating systems. IaaS can provide CPU, memory, disk, network and GPU resources that can be used to run machine learning models and applications.

Data as a Service (DaaS): DaaS provides access to data sources that can be consumed by applications or users on demand. DaaS can host the databases providing backups, clustering, and high availability.

Question #20

In a self-driving car company, ML engineers want to develop a model for dynamic pathing.

Which of the following approaches would be optimal for this task?

  • A . Dijkstra's algorithm
  • B . Reinforcement learning
  • C . Supervised learning
  • D . Unsupervised learning

Correct Answer: B

Explanation:

Reinforcement learning is a type of machine learning that involves learning from trial and error based on rewards and penalties. Reinforcement learning can be used to develop models for dynamic pathing, which is the problem of finding an optimal path from one point to another in an uncertain and changing environment. Reinforcement learning can enable the model to adapt to new situations and learn from its own actions and feedback. For example, a self-driving car company can use reinforcement learning to train its model to navigate complex traffic scenarios and avoid collisions.

Question #21

R-squared is a statistical measure that:

  • A . Combines precision and recall of a classifier into a single metric by taking their harmonic mean.
  • B . Expresses the extent to which two variables are linearly related.
  • C . Is the proportion of the variance for a dependent variable that's explained by independent variables.
  • D . Represents the extent to which two random variables vary together.

Correct Answer: C

Explanation:

R-squared is a statistical measure that indicates how well a regression model fits the data. R-squared is calculated by dividing the explained variance by the total variance. The explained variance is the amount of variation in the dependent variable that can be attributed to the independent variables. The total variance is the amount of variation in the dependent variable that can be observed in the data. R-squared ranges from 0 to 1, where 0 means no fit and 1 means perfect fit.
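
A minimal scikit-learn sketch with illustrative values:

```python
# R-squared with scikit-learn, on illustrative values.
from sklearn.metrics import r2_score

y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.8, 5.1, 7.2, 8.9]
print(r2_score(y_true, y_pred))  # 0.995 -> a near-perfect fit
```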

Question #22

Which of the following equations best represents an L1 norm?

  • A . |x| + |y|
  • B . |x|+|y|^2
  • C . |x|-|y|
  • D . |x|^2+|y|^2

Correct Answer: A

Explanation:

An L1 norm is a measure of distance or magnitude that is defined as the sum of the absolute values of the components of a vector. For example, if x and y are two components of a vector, then the L1 norm of that vector is |x| + |y|. The L1 norm is also known as the Manhattan distance or the taxicab distance, as it represents the shortest path between two points in a grid-like city.
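
A minimal NumPy sketch contrasting the L1 and L2 norms:

```python
# L1 (Manhattan) norm of a vector, assuming NumPy.
import numpy as np

v = np.array([3.0, -4.0])
print(np.linalg.norm(v, ord=1))  # 7.0 -> |3| + |-4|
print(np.linalg.norm(v, ord=2))  # 5.0 -> the L2 (Euclidean) norm, for contrast
```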

Question #23

Which of the following statements are true regarding highly interpretable models? (Select two.)

  • A . They are usually binary classifiers.
  • B . They are usually easier to explain to business stakeholders.
  • C . They are usually referred to as "black box" models.
  • D . They are usually very good at solving non-linear problems.
  • E . They usually compromise on model accuracy for the sake of interpretability.

Correct Answer: BE

Explanation:

Highly interpretable models are models that can provide clear and intuitive explanations for their predictions, such as decision trees, linear regression, or logistic regression.

Some of the statements that are true regarding highly interpretable models are:

They are usually easier to explain to business stakeholders: Highly interpretable models can help communicate the logic and reasoning behind their predictions, which can increase trust and confidence among business stakeholders. For example, a decision tree can show how each feature contributes to a decision outcome, or a linear regression can show how each coefficient affects the dependent variable.

They usually compromise on model accuracy for the sake of interpretability: Highly interpretable models may not be able to capture complex or non-linear patterns in the data, which can reduce their accuracy and generalization. For example, a decision tree may overfit or underfit the data if it is too deep or too shallow, or a linear regression may not be able to model curved relationships between variables.

Question #24

Which two of the following decrease technical debt in ML systems? (Select two.)

  • A . Boundary erosion
  • B . Design anti-patterns
  • C . Documentation readability
  • D . Model complexity
  • E . Refactoring

Correct Answer: CE

Explanation:

Technical debt is a metaphor that describes the implied cost of additional work or rework caused by choosing an easy or quick solution over a better but more complex solution. Technical debt can accumulate in ML systems due to various factors, such as changing requirements, outdated code, poor documentation, or lack of testing.

Some of the ways to decrease technical debt in ML systems are:

Documentation readability: Documentation readability refers to how easy it is to understand and use the documentation of an ML system. Documentation readability can help reduce technical debt by providing clear and consistent information about the system’s design, functionality, performance, and maintenance. Documentation readability can also facilitate communication and collaboration among different stakeholders, such as developers, testers, users, and managers.

Refactoring: Refactoring is the process of improving the structure and quality of code without changing its functionality. Refactoring can help reduce technical debt by eliminating code smells, such as duplication, complexity, or inconsistency. Refactoring can also enhance the readability, maintainability, and extensibility of code.

Question #25

Which of the following describes a neural network without an activation function?

  • A . A form of a linear regression
  • B . A form of a quantile regression
  • C . An unsupervised learning technique
  • D . A radial basis function kernel

Correct Answer: A

Explanation:

A neural network without an activation function is equivalent to a form of a linear regression. A neural network is a computational model that consists of layers of interconnected nodes (neurons) that process inputs and produce outputs. An activation function is a function that determines the output of a neuron based on its input. An activation function can introduce non-linearity into a neural network, which allows it to model complex and non-linear relationships between inputs and outputs. Without an activation function, a neural network becomes a linear combination of inputs and weights, which is essentially a linear regression model.
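
A minimal NumPy sketch showing that two activation-free layers collapse into a single linear map:

```python
# Two layers with no activation collapse into one linear map (NumPy sketch).
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
x = rng.normal(size=4)

two_layers = W2 @ (W1 @ x)   # "network" output without activation functions
one_layer = (W2 @ W1) @ x    # a single equivalent linear layer
print(np.allclose(two_layers, one_layer))  # True
```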

Question #26

The following confusion matrix is produced when a classifier is used to predict labels on a test dataset:

                     Predicted Positive   Predicted Negative
Actual Positive              37                    7
Actual Negative               8                   48

How precise is the classifier?

  • A . 48/(48+37)
  • B . 37/(37+8)
  • C . 37/(37+7)
  • D . (48+37)/100

Correct Answer: B

Explanation:

Precision is a measure of how well a classifier can avoid false positives (incorrectly predicted positive cases). Precision is calculated by dividing the number of true positives (correctly predicted positive cases) by the number of predicted positive cases (true positives and false positives). In this confusion matrix, the true positives are 37 and the false positives are 8, so the precision is 37/(37+8) = 0.822.

Question #27

Given a feature set with rows that contain missing continuous values, and assuming the data is normally distributed, what is the best way to fill in these missing features?

  • A . Delete entire rows that contain any missing features.
  • B . Fill in missing features with random values for that feature in the training set.
  • C . Fill in missing features with the average of observed values for that feature in the entire dataset.
  • D . Delete entire columns that contain any missing features.

Correct Answer: C

Explanation:

Missing values are a common problem in data analysis and machine learning, as they can affect the quality and reliability of the data and the model. There are various methods to deal with missing values, such as deleting, imputing, or ignoring them. One of the most common methods is imputing, which means replacing the missing values with some estimated values based on some criteria. For continuous variables, one of the simplest and most widely used imputation methods is to fill in the missing values with the mean (average) of the observed values for that variable in the entire dataset. This method can preserve the overall distribution and variance of the data, as well as avoid introducing bias or noise.
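
A minimal scikit-learn sketch of mean imputation on a toy matrix:

```python
# Mean imputation for missing continuous values, assuming scikit-learn.
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, 2.0], [np.nan, 3.0], [7.0, np.nan]])
imputer = SimpleImputer(strategy="mean")
print(imputer.fit_transform(X))  # NaNs replaced by each column's mean
```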

Question #28

In addition to understanding model performance, what does continuous monitoring of bias and variance help ML engineers to do?

  • A . Detect hidden attacks
  • B . Prevent hidden attacks
  • C . Recover from hidden attacks
  • D . Respond to hidden attacks

Correct Answer: A

Explanation:

Hidden attacks are malicious activities that aim to compromise or manipulate an ML system without being detected or noticed. Hidden attacks can target different stages of an ML workflow, such as data collection, model training, model deployment, or model monitoring. Some examples of hidden attacks are data poisoning, backdoor attacks, model stealing, or adversarial examples. Continuous monitoring of bias and variance can help ML engineers detect hidden attacks, as it can reveal anomalies or deviations in the data or the model’s performance that may indicate a potential attack.

Question #29

A company is developing a merchandise sales application. The product team uses training data to teach the AI model to predict sales, and discovers emergent bias.

What caused the biased results?

  • A . The AI model was trained in winter and applied in summer.
  • B . The application was migrated from on-premise to a public cloud.
  • C . The team set flawed expectations when training the model.
  • D . The training data used was inaccurate.

Correct Answer: A

Explanation:

Emergent bias is a type of bias that arises when an AI model encounters new or different data or scenarios that were not present or accounted for during its training or development. Emergent bias can cause the model to make inaccurate or unfair predictions or decisions, as it may not be able to generalize well to new situations or adapt to changing conditions. One possible cause of emergent bias is seasonality, which means that some variables or patterns in the data may vary depending on the time of year. For example, if an AI model for merchandise sales prediction was trained in winter and applied in summer, it may produce biased results due to differences in customer behavior, demand, or preferences.

Question #30

You train a neural network model with two layers, each layer having four nodes, and realize that the model is underfit.

Which of the actions below will NOT work to fix this underfitting?

  • A . Add features to training data
  • B . Get more training data
  • C . Increase the complexity of the model
  • D . Train the model for more epochs

Correct Answer: B

Explanation:

Underfitting is a problem that occurs when a model learns too little from the training data and fails to capture the underlying complexity or structure of the data. Underfitting can result from using insufficient or irrelevant features, a low complexity of the model, or a lack of training data. Underfitting can reduce the accuracy and generalization of the model, as it may produce oversimplified or inaccurate predictions.

Some of the ways to fix underfitting are:

Add features to training data: Adding more features or variables to the training data can help increase the information and diversity of the data, which can help the model learn more complex patterns and relationships.

Increase the complexity of the model: Increasing the complexity of the model can help increase its expressive power and flexibility, which can help it fit better to the data. For example, adding more layers or nodes to a neural network can increase its complexity.

Train the model for more epochs: Training the model for more epochs can help increase its learning ability and convergence, which can help it optimize its parameters and reduce its error.

Getting more training data will not work to fix underfitting, as it will not change the complexity or structure of the data or the model. Getting more training data may help with overfitting, which is when a model learns too much from the training data and fails to generalize well to new or unseen data.
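
A minimal scikit-learn sketch contrasting the underfit two-layer, four-node network from the question with a higher-capacity model trained for more iterations (synthetic data; exact scores will vary):

```python
# Contrast the underfit two-layer, four-node network with a higher-capacity
# model trained longer; data is synthetic and exact scores will vary.
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

underfit = MLPClassifier(hidden_layer_sizes=(4, 4), max_iter=50, random_state=0)
improved = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0)

print(underfit.fit(X, y).score(X, y))  # lower training accuracy
print(improved.fit(X, y).score(X, y))  # more capacity and epochs fit better
```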
