DP-750: Implementing Data Engineering Solutions with Azure Databricks

Price
Price on Request

Duration
4 days

For companies and job seekers:
this course is 100% fundable!
 

Location

Course Language
English

Training Solutions
Online Live

Data volumes are growing rapidly, processes are becoming more complex, and modern businesses need powerful platforms for analysis, automation, and intelligent evaluation. Cloud-based data engineering solutions are therefore becoming increasingly important. Azure Databricks combines scalable data processing with flexible workflows and provides the technical foundation for high-performance data architectures in modern IT environments.

Key Topics

  • Development of scalable data pipelines
  • Processing large data volumes with Apache Spark
  • Building modern lakehouse architectures
  • Integration of different data sources
  • Optimizing performance and data quality
  • Automation of cloud-based data processes
  • Managing structured and unstructured data
  • Analysis of streaming and real-time data

Prerequisites
Basic knowledge of databases, SQL, and cloud technologies. A technical understanding of data processes and analytical structures is helpful.

Target Audience
Suitable for professionals in data engineering, cloud computing, analytics, business intelligence, software development, and IT projects.

Modern data platforms shape digital business processes, AI applications, and data-driven decisions. Up-to-date expertise in Azure Databricks opens up promising opportunities for challenging projects with high technological relevance.

Course Content
  • Selecting appropriate compute types
  • Configuring performance and scaling
  • Configuring compute features: Photon acceleration, Databricks Runtime and Spark version, and machine learning
  • Installing libraries
  • Managing access rights
  • Apply naming conventions
  • Create catalogs and schemas
  • Create volumes
  • Create tables, views, and materialized views
  • Configure external catalogs
  • Implement DDL operations for tables
  • Configure AI/BI Genie for data discovery
  • Grant permissions for Unity Catalog objects
  • Implement table, column, and row-level security
  • Use Azure Key Vault keys in Azure Databricks
  • Authenticate data access with service principals
  • Authenticate resource access with managed identities
  • Maintaining table and column definitions
  • Configuring attribute-based access control (ABAC) with tags and policies
  • Setting up row filters and column masking
  • Implementation of data retention policies
  • Managing data lineage in Catalog Explorer
  • Configuring audit logs
  • Developing secure Delta Sharing strategies
  • Configure data ingestion and data sources
  • Select a suitable ingestion tool (e.g., Lakeflow Connect, Notebooks, Azure Data Factory)
  • Select batch or streaming ingestion
  • Select table format (Parquet, Delta, CSV, JSON, Iceberg)
  • Define the partitioning scheme
  • Select SCD type
  • Specify appropriate data granularity
  • Record changes over time (temporal or history tables)
  • Define clustering strategy (Liquid Clustering, Z-Ordering, Deletion Vectors)
  • Choose between managed and unmanaged tables
  • Data ingestion with Lakeflow Connect (batch & streaming)
  • Data ingestion using notebooks (batch & streaming)
  • Data import via SQL (CTAS, CREATE OR REPLACE, COPY INTO)
  • Data integration via CDC feeds
  • Data ingestion with Spark Structured Streaming
  • Capturing streaming data from Azure Event Hubs
  • Data integration with Lakeflow Spark Declarative Pipelines, including Auto Loader
  • Profile data and analyze distributions
  • Select appropriate column types
  • Clean up duplicates, missing values, and null values
  • Filter, group, and aggregate data
  • Combine data using join, union, and intersect
  • Pivot and denormalize data
  • Load data using merge, insert, and append
  • Implement validation checks for null values, cardinality, and value ranges
  • Implement data type checks
  • Manage schema validation and schema drift
  • Control data quality with pipeline expectations in Lakeflow Spark Declarative Pipelines
  • Define workflows for data pipelines
  • Choose between notebooks and Lakeflow Spark Declarative Pipelines
  • Develop task logic for Lakeflow Jobs
  • Error handling for pipelines and jobs
  • Create a data pipeline with a notebook
  • Create a data pipeline with Lakeflow Spark Declarative Pipelines
  • Creating a job, including setup and configuration
  • Configuring job triggers
  • Scheduling a job
  • Configuring notifications for a job
  • Configuring automatic restarts for a job or data pipeline
  • Using Git for version control
  • Managing branches, pull requests, and conflicts
  • Implementing test strategies (unit, integration, end-to-end, and user acceptance tests)
  • Configuring and packaging Databricks Asset Bundles
  • Deploy bundles via the Azure Databricks CLI
  • Deploy bundles via the REST API
  • Managing cluster utilization for performance and cost optimization
  • Troubleshooting Lakeflow Jobs, including repair, restart, pause, and run functions
  • Troubleshooting and performance optimization in Apache Spark jobs and notebooks
  • Analysis and resolution of caching, skew, spill, and shuffle issues using DAG, Spark UI, and Query Profile
  • Optimizing Delta tables with OPTIMIZE and VACUUM
  • Implementing log streaming with Log Analytics in Azure Monitor
  • Configuring alerts in Azure Monitor
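
To give a rough idea of the validation checks named in the course content (null values, cardinality, and value ranges), here is a minimal sketch in plain Python. It is not Azure Databricks or Lakeflow code and is not taken from the course material. The function name validate_rows, the column names, and the sample rows are hypothetical and chosen only for illustration.

```python
from typing import Any


def validate_rows(
    rows: list[dict[str, Any]],
    key: str,
    required: list[str],
    ranges: dict[str, tuple[float, float]],
) -> list[str]:
    """Return a list of human-readable violations found in `rows`."""
    violations: list[str] = []
    seen_keys: set[Any] = set()
    for i, row in enumerate(rows):
        # Null check: every required column must hold a value.
        for col in required:
            if row.get(col) is None:
                violations.append(f"row {i}: {col} is null")
        # Cardinality check: the key column must be unique across rows.
        k = row.get(key)
        if k is not None:
            if k in seen_keys:
                violations.append(f"row {i}: duplicate {key}={k!r}")
            seen_keys.add(k)
        # Range check: non-null values must lie inside [low, high].
        for col, (low, high) in ranges.items():
            v = row.get(col)
            if v is not None and not (low <= v <= high):
                violations.append(f"row {i}: {col}={v!r} outside [{low}, {high}]")
    return violations


# Hypothetical sample data: one clean row and two rows with problems.
sample_rows = [
    {"order_id": 1, "amount": 25.0},
    {"order_id": 2, "amount": None},
    {"order_id": 2, "amount": -5.0},
]

for problem in validate_rows(
    sample_rows,
    key="order_id",
    required=["order_id", "amount"],
    ranges={"amount": (0.0, 10000.0)},
):
    print(problem)
```

In a pipeline, the same three kinds of rule would typically be declared on the table or dataset rather than looped over in Python. The sketch only shows what each check tests.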

Frequently Asked Questions

  • Develop data pipelines, process large volumes of data, and deploy scalable solutions with Azure Databricks.
  • The course demonstrates how real-time data, analytics, and cloud workflows can be efficiently integrated into a single platform.
  • Azure Databricks, Apache Spark, Delta Lake, and modern data engineering workflows for production-ready cloud projects.
  • Faster data processing, automated pipelines, and robust analytics solutions for large datasets.
  • For anyone looking to build a professional data engineering practice with Azure or expand their existing cloud skills in a targeted way.
  • The focus is on scalable architectures, high-performance processing, and practical Databricks solutions.
  • Improved data quality, reliable pipelines, and faster analytics through modern lakehouse technologies.
  • Because data integration, AI, and analytics can be efficiently combined into a powerful platform.

Do you have any further questions? Please contact us.