Delta Lake — Automatic Schema Evolution | by Vitor Teixeira | Mar, 2023

By Jessie Hobb On Mar 10, 2023

What happens and what you can/can’t do when merging evolving DataFrames

In the last post, we covered the transaction log and how to keep Delta Tables fast and clean. This time we will be covering automatic schema evolution in Delta tables.

Schema evolution is a critical aspect of managing data over time. Data sources commonly evolve to meet new business requirements, which might mean adding or removing fields from an existing data schema. As data consumers, we need to adapt quickly to these changes in our data sources, and automatic schema evolution lets us do so seamlessly.

In this post, we will cover automatic schema evolution in Delta while using the people10m public dataset that is available on Databricks Community Edition. We’ll test adding and removing fields in several scenarios.

Automatic schema evolution can be enabled in two ways, depending on our workload. If we are doing blind appends, all we need to do is enable the mergeSchema option on the write.
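A minimal sketch of such a blind append, assuming a PySpark DataFrame and a Spark session already configured for Delta Lake. The function name and its arguments are hypothetical, chosen only for illustration:

```python
def append_with_merge_schema(df, table_path):
    """Blind-append `df` to the Delta table stored at `table_path`.

    `df` is assumed to be a PySpark DataFrame and `table_path` a location
    that holds (or will hold) a Delta table; both are placeholders.
    """
    (
        df.write
        .format("delta")
        .mode("append")
        # Enable schema evolution for this write only, so columns present
        # in `df` but missing from the table can be added to its schema.
        .option("mergeSchema", "true")
        .save(table_path)
    )
```

Since mergeSchema is passed as a writer option, it applies to that single write and not to the whole session.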

Adding a field

Removing a field

Renaming a column

Changing a column type/order

Adding/Removing a field in a struct

Adding/Removing a field in an array of structs

Adding/Removing a field in a map of structs
