€ 1.399,- (excl. VAT)
2 days
Utrecht, Utrecht (Open Leercentrum)
Description
In the training "Data Engineering with Azure Databricks" we will work with Databricks for two days to
build a Data Lakehouse. The entire spectrum is covered: architecture and design, Databricks setup,
implementation of transformations, orchestration of your tasks, version management, and everything else that is
needed for your Data Lakehouse. At the end of the training you will be able to independently set up a Data Lakehouse
within Databricks.
This training is primarily aimed at Data Engineers and Data Warehouse developers or administrators who have
experience with Data Warehousing or other forms of (batch) data processing and who want to learn more
about Databricks and building a Data Lakehouse. Most participants already have some experience with cloud
environments, but this is not mandatory: even if you are making the step from an on-premises Data Warehouse to a Data
Lakehouse, this training is a good fit.
After this training you:
know what the architecture of a Data Lakehouse looks like and how it works
understand the principles of Databricks, Data Lakehouses and Delta Lake
can set up Databricks independently for a Data Lakehouse
understand how Delta Lake Storage works and how it enables a Data Lakehouse
can manage files in your Data Lake using Databricks
can do orchestration within Databricks using jobs
know which layers exist in a Data Lakehouse and how to bring data from your own practice into them
can do transformation and integration on the data in Databricks using PySpark and SparkSQL
know how to deal with schemas and schema evolution
Prerequisites to follow the Data Engineering with Azure Databricks training
To participate in this training it is important that you have prior knowledge of the following topics:
Basic knowledge of SQL:
Query Concepts (SELECT, WHERE, GROUP BY, ORDER BY, LIMIT, and JOIN)
DDL (CREATE, ALTER, DROP of tables and databases)
DML (DELETE, INSERT, UPDATE, MERGE)
Knowledge of data engineering in the cloud (VMs, storage accounts, AD accounts, etc.)
Basic knowledge of Python (modules, reading data, simple operations)
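As a rough indication of the expected entry level, the sketch below combines the listed SQL and Python basics in one self-contained example. It uses Python's built-in sqlite3 module rather than Databricks, and the table and data are made up purely for illustration:

```python
import sqlite3

# In-memory database; the "sales" table and its rows are hypothetical examples.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: create a table
cur.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")

# DML: insert some rows
cur.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("North", 120.0), ("South", 80.0), ("North", 50.0)],
)

# Query concepts: SELECT with WHERE, GROUP BY and ORDER BY
cur.execute(
    "SELECT region, SUM(amount) AS total "
    "FROM sales WHERE amount > 0 "
    "GROUP BY region ORDER BY total DESC"
)
for region, total in cur.fetchall():
    print(region, total)  # North 170.0, then South 80.0

conn.close()
```

If you can read and write queries and Python at roughly this level, you meet the prerequisites.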
Program
Incremental data processing with Structured Streaming and Auto Loader
Data Lakehouse architecture: the "medallions" (gold, silver, bronze)
Delta Live Tables
Orchestration with Jobs
Databricks SQL
Rights management
Bring dashboards and queries into production
DBFS: the Databricks File System
Manage your Data Lake from Databricks
Transform data in a Data Lakehouse using PySpark
Databricks tables: managed and unmanaged
Hive Metastore
Version management in Databricks
Time travel
Schema enforcement
Schema evolution
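To give an idea of the difference between schema enforcement and schema evolution from the program above, here is a conceptual sketch in plain Python. This is not the Delta Lake API (in Databricks this behavior is typically controlled with write options such as `mergeSchema`); the function and names below are invented for illustration only:

```python
# Conceptual sketch: schema enforcement vs. schema evolution.
# Plain Python, not Delta Lake. The idea: a write whose columns do not
# match the table schema is rejected (enforcement), unless evolution is
# enabled, in which case the schema is extended with the new columns.

def write_rows(table_schema, rows, evolve_schema=False):
    """Append rows to a table, enforcing or evolving its schema.

    table_schema: set of known column names.
    rows: list of dicts mapping column name -> value.
    evolve_schema: if True, unknown columns extend the schema;
                   if False, they cause the write to be rejected.
    Returns the (possibly extended) schema.
    """
    for row in rows:
        new_columns = set(row) - table_schema
        if new_columns and not evolve_schema:
            raise ValueError(f"Schema mismatch, unknown columns: {sorted(new_columns)}")
        table_schema |= new_columns  # schema evolution: add the new columns
    return table_schema

schema = {"id", "amount"}

# Enforcement: a row with an extra column is rejected.
try:
    write_rows(set(schema), [{"id": 1, "amount": 10.0, "currency": "EUR"}])
except ValueError as e:
    print(e)

# Evolution: the same row extends the schema instead.
schema = write_rows(schema, [{"id": 1, "amount": 10.0, "currency": "EUR"}],
                    evolve_schema=True)
print(sorted(schema))  # ['amount', 'currency', 'id']
```

In the training you will see how Delta Lake applies the same principle to real tables, including the interaction with version management and time travel.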
Study material
In the training "Data Engineering with Azure Databricks" we work with official Databricks material. We
make sure that you receive all the necessary material on time.