What should the solutions architect do to prevent AWS Glue from reprocessing old data?

A company has an AWS Glue extract. transform, and load (ETL) job that runs every day at the same time. The job processes XML data that is in an Amazon S3 bucket.

New data is added to the S3 bucket every day. A solutions architect notices that AWS Glue is processing all the data during each run.

What should the solutions architect do to prevent AWS Glue from reprocessing old data?
A . Edit the job to use job bookmarks.
B. Edit the job to delete data after the data is processed
C. Edit the job by setting the NumberOfWorkers field to 1.
D. Use a FindMatches machine learning (ML) transform.

Answer: C

Explanation:

This is the purpose of bookmarks: "AWS Glue tracks data that has already been processed during a previous run of an ETL job by persisting state information from the job run. This persisted state information is called a job bookmark. Job bookmarks help AWS Glue maintain state information and prevent the reprocessing of old data." https://docs.aws.amazon.com/glue/latest/dg/monitor-continuations.html

Latest SAA-C03 Dumps Valid Version with 400 Q&As

Latest And Valid Q&A | Instant Download | Once Fail, Full Refund

Subscribe
Notify of
guest
0 Comments
Inline Feedbacks
View all comments