Which of the following statements about the lakehouse is correct?

A. Lakehouse only supports machine learning workloads and data warehouses support BI workloads
B. Lakehouse only supports end-to-end streaming workloads and data warehouses support batch workloads
C. Lakehouse does not support ACID
D. In a lakehouse, storage and compute are coupled
E. Lakehouse supports schema enforcement and evolution

Answer: E

Explanation:

The answer is E: Lakehouse supports schema enforcement and evolution.

A lakehouse built on Delta Lake can enforce a schema on write, whereas traditional big data systems can only enforce a schema on read. It also supports evolving the schema over time, with the ability to control how that evolution happens.

For example, the DataFrame writer API supports three modes of enforcement and evolution:

Default: Enforcement only. No schema changes are allowed, and any schema drift or evolution causes the write to fail.
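The default behavior can be pictured with a small pure-Python toy model. This is only a sketch under stated assumptions, not Delta Lake's implementation: schemas are plain dicts of column name to type name, and `enforce_schema` and `SchemaMismatchError` are hypothetical names introduced for illustration.

```python
# Toy model of schema enforcement on write.
# Assumption: this is NOT Delta Lake's implementation; schemas are plain dicts
# and all names here are hypothetical.
class SchemaMismatchError(Exception):
    """Raised when incoming data does not match the table schema."""


def enforce_schema(table_schema: dict, batch_schema: dict) -> None:
    """Reject the write if the batch has unknown columns or different types."""
    extra = sorted(set(batch_schema) - set(table_schema))
    mismatched = sorted(
        col for col in batch_schema
        if col in table_schema and batch_schema[col] != table_schema[col]
    )
    if extra or mismatched:
        raise SchemaMismatchError(
            f"extra columns: {extra}, type mismatches: {mismatched}"
        )


table = {"id": "int", "name": "string"}

# A batch that matches the table schema is accepted.
enforce_schema(table, {"id": "int", "name": "string"})

# A batch with a drifted schema (an unexpected "email" column) is rejected.
try:
    enforce_schema(table, {"id": "int", "name": "string", "email": "string"})
    drift_rejected = False
except SchemaMismatchError:
    drift_rejected = True
```

The point of the sketch is that the check runs at write time, before any data lands in the table, which is what distinguishes schema on write from schema on read.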

Merge: Flexible. Supports both enforcement and evolution.

✑ New columns are added

✑ Nested columns can evolve

✑ Data types can be upcast, for example Byte to Short to Integer to Bigint

How to enable:

✑ DF.write.format("delta").mode("append").option("mergeSchema", "true").saveAsTable("table_name")

✑ or, for the whole Spark session: spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled", "true")
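The merge rules listed above (add new columns, widen compatible types) can be sketched as a pure-Python toy model. It is an illustration under assumptions, not Delta Lake's implementation: schemas are flat dicts, nested columns are not modeled, and `UPCAST_CHAIN` and `merge_schema` are hypothetical names.

```python
# Toy model of mergeSchema-style evolution.
# Assumption: this is NOT Delta Lake's implementation; names are hypothetical
# and nested columns are not modeled.
UPCAST_CHAIN = ["byte", "short", "int", "bigint"]  # widening order from the text


def merge_schema(table_schema: dict, batch_schema: dict) -> dict:
    """Return the evolved schema: new columns appended, types widened if safe."""
    merged = dict(table_schema)
    for col, new_type in batch_schema.items():
        old_type = merged.get(col)
        if old_type is None:
            merged[col] = new_type  # new column is added
        elif old_type == new_type:
            continue  # nothing to evolve
        elif old_type in UPCAST_CHAIN and new_type in UPCAST_CHAIN:
            # Keep whichever type is wider so existing values still fit.
            merged[col] = max(old_type, new_type, key=UPCAST_CHAIN.index)
        else:
            # Incompatible change (e.g. string vs int) is still rejected.
            raise ValueError(f"cannot merge {col}: {old_type} vs {new_type}")
    return merged


evolved = merge_schema(
    {"id": "byte", "name": "string"},
    {"id": "int", "email": "string"},
)
print(evolved)
```

Note that enforcement has not disappeared in this mode: an incompatible type change still fails, which is why merge is described as supporting both enforcement and evolution.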

Overwrite: No enforcement. The existing schema is replaced, which allows:

✑ Dropping columns

✑ Changing a column type, for example string to integer

✑ Renaming columns

How to enable:

✑ DF.write.format("delta").mode("overwrite").option("overwriteSchema", "true").saveAsTable("table_name")
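To contrast overwrite with the other two modes, here is a minimal pure-Python toy model. It is a sketch under assumptions, not Delta Lake's implementation: a table is a dict with `schema` and `rows` keys, and `overwrite_table` is a hypothetical name.

```python
# Toy model of overwrite with schema replacement.
# Assumption: this is NOT Delta Lake's implementation; names are hypothetical.
def overwrite_table(old_table: dict, new_schema: dict, new_rows: list) -> dict:
    """Replace both schema and rows; the old schema places no constraint."""
    # old_table is deliberately ignored: nothing from it is enforced.
    for row in new_rows:
        if set(row) != set(new_schema):
            raise ValueError(f"row {row} does not match the new schema")
    return {"schema": dict(new_schema), "rows": list(new_rows)}


old = {
    "schema": {"id": "string", "name": "string", "legacy_flag": "string"},
    "rows": [{"id": "1", "name": "a", "legacy_flag": "y"}],
}

# Drops legacy_flag, changes id from string to int, renames name to full_name.
new = overwrite_table(
    old,
    {"id": "int", "full_name": "string"},
    [{"id": 1, "full_name": "a"}],
)
```

Because the old data is replaced along with the old schema, all three changes in the list succeed in a single write.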

What Is a Lakehouse? – The Databricks Blog

