You have an Azure Databricks workspace that contains a Delta Lake dimension table named Table1. Table1 is a Type 2 slowly changing dimension (SCD) table. You need to apply updates from a source table to Table1.

Which Apache Spark SQL operation should you use?
A. CREATE
B. UPDATE
C. MERGE
D. ALTER

Answer: C

Explanation:

Delta Lake can infer the schema of incoming data, which reduces the effort of managing schema changes. A Type 2 slowly changing dimension (SCD) records every change made to each key in the dimension table, so applying an update requires two actions: updating the existing rows to mark their previous values as no longer current, and inserting new rows that carry the latest values. Given a source table with the updates and a target table with the dimensional data, both actions can be expressed in a single MERGE operation, which is why CREATE, UPDATE, and ALTER are insufficient on their own.

Example (Scala, using the Delta Lake DeltaTable API):

// Implementing an SCD Type 2 update with the DeltaTable merge API.
// customersTable is an io.delta.tables.DeltaTable for the target dimension;
// stagedUpdates is a DataFrame of incoming changes that carries a mergeKey column.
customersTable
  .as("customers")
  .merge(
    stagedUpdates.as("staged_updates"),
    "customers.customerId = mergeKey")
  // Close out the current row when the address has changed.
  .whenMatched("customers.current = true AND customers.address <> staged_updates.address")
  .updateExpr(Map(
    "current" -> "false",
    "endDate" -> "staged_updates.effectiveDate"))
  // Insert a new row for staged rows that have no current match.
  .whenNotMatched()
  .insertExpr(Map(
    "customerId" -> "staged_updates.customerId",
    "address" -> "staged_updates.address",
    "current" -> "true",
    "effectiveDate" -> "staged_updates.effectiveDate",
    "endDate" -> "null"))
  .execute()
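Because the question asks about a Spark SQL operation, the same logic can also be written as a MERGE INTO statement. The following is a minimal sketch, assuming Table1 has customerId, address, current, effectiveDate, and endDate columns, and that the staged changes are exposed as a view named staged_updates with a mergeKey column (these names are illustrative, not from the original question):

-- Apply SCD Type 2 changes to Table1 in a single atomic statement.
MERGE INTO Table1 AS customers
USING staged_updates
ON customers.customerId = staged_updates.mergeKey
-- Close out the current row when the address has changed.
WHEN MATCHED AND customers.current = true AND customers.address <> staged_updates.address THEN
  UPDATE SET current = false, endDate = staged_updates.effectiveDate
-- Insert a new current row for staged rows with no match.
WHEN NOT MATCHED THEN
  INSERT (customerId, address, current, effectiveDate, endDate)
  VALUES (staged_updates.customerId, staged_updates.address, true, staged_updates.effectiveDate, null)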

Reference: https://www.projectpro.io/recipes/what-is-slowly-changing-data-scd-type-2-operation-delta-table-databricks
