What should the solutions architect do to prevent AWS Glue from reprocessing old data?

A company has an AWS Glue extract, transform, and load (ETL) job that runs every day at the same time. The job processes XML data that is in an Amazon S3 bucket.

New data is added to the S3 bucket every day. A solutions architect notices that AWS Glue is processing all the data during each run.

What should the solutions architect do to prevent AWS Glue from reprocessing old data?
A. Edit the job to use job bookmarks.
B. Edit the job to delete data after the data is processed.
C. Edit the job by setting the NumberOfWorkers field to 1.
D. Use a FindMatches machine learning (ML) transform.

Answer: A

Explanation:

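This is the purpose of job bookmarks: "AWS Glue tracks data that has already been processed during a previous run of an ETL job by persisting state information from the job run. This persisted state information is called a job bookmark. Job bookmarks help AWS Glue maintain state information and prevent the reprocessing of old data."

The other options do not address the problem. Deleting data after processing (B) destroys the source data instead of tracking what has been processed. NumberOfWorkers (C) only controls how many workers are allocated to a run, not which data is read. A FindMatches ML transform (D) identifies matching or duplicate records within a dataset and does not track state between runs.

Reference: https://docs.aws.amazon.com/glue/latest/dg/monitor-continuations.html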
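As an illustration of option A, here is a minimal sketch of starting a run with bookmarks enabled through boto3. The `--job-bookmark-option` job parameter and its three values come from the AWS Glue documentation; the helper names `bookmark_arguments` and `start_run_with_bookmarks` and the job name `daily-xml-etl` are hypothetical and not part of the original question.

```python
# The three values AWS Glue accepts for the --job-bookmark-option job parameter.
BOOKMARK_OPTIONS = {
    "job-bookmark-enable",
    "job-bookmark-disable",
    "job-bookmark-pause",
}


def bookmark_arguments(option="job-bookmark-enable", extra=None):
    """Build the Arguments mapping for a job run (hypothetical helper)."""
    if option not in BOOKMARK_OPTIONS:
        raise ValueError(f"unknown bookmark option: {option}")
    args = dict(extra or {})  # copy so the caller's dict is not mutated
    args["--job-bookmark-option"] = option
    return args


def start_run_with_bookmarks(job_name, glue_client=None):
    """Start a Glue job run with bookmarks enabled and return its run ID."""
    if glue_client is None:
        import boto3  # imported lazily so the helper above works without AWS access
        glue_client = boto3.client("glue")
    response = glue_client.start_job_run(
        JobName=job_name,
        Arguments=bookmark_arguments(),
    )
    return response["JobRunId"]


if __name__ == "__main__":
    # "daily-xml-etl" is an assumed job name used only for illustration.
    print(bookmark_arguments())
```

Enabling the parameter is only part of the setup. The job script itself also has to call `job.init(...)` at the start and `job.commit()` at the end, and give each source a `transformation_ctx`, so that Glue can persist and look up the bookmark state between runs.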
