SoftBlues

Delta Lake Migration & Auto-Scaling ETL Platform

Vitalii P.
Senior Data Engineer, 2020 - 2024

Senior Big Data Engineer / Platform Engineer

Data Engineering & Big Data

Key Expertise

Big Data Engineering, Delta Lake Migration, Configuration-Driven Architecture, Cloud-Native Infrastructure, ETL Pipeline Optimization, Scalable Data Pipelines, Data Platform Architecting

Experience

12+ years

Timezone

CET (UTC +1)

Skills

AI / ML

AI-assisted Migration Tooling, Jupyter Notebooks

Languages

Python, Scala

Databases

HDFS, Apache Iceberg, Databricks, AWS S3, Delta Lake

Infrastructure

AWS CloudWatch, AWS Lambda, Jenkins CI/CD, Kubernetes, Docker, CI/CD, YARN

Frameworks

Configuration-Driven Architecture, Custom Logging & Tracing Frameworks, Apache Spark

Integrations & Protocols

AWS Kinesis, Concourse CI, Apache Kafka

Overview

End-to-end ownership of the data-exchange (DX) ETL platform on Databricks for a Tier-1 US telecom and media operator, supporting large-scale ingestion, transformation, and analytics workloads. The project encompassed migrating storage to Delta Lake for ACID guarantees, building auto-scaling compute infrastructure for volatile workloads, and automating operational tooling to reduce manual ops overhead across the data engineering team.
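The "configuration-driven architecture" called out in Key Expertise can be illustrated with a minimal sketch: each pipeline is declared as data, and a generic runner dispatches declared steps, so adding a pipeline is a config change rather than new code. All names here (`PipelineConfig`, `STEP_REGISTRY`, the step functions) are illustrative assumptions, not the project's actual API.

```python
from dataclasses import dataclass

@dataclass
class PipelineConfig:
    name: str
    source_path: str     # e.g. an S3 prefix (illustrative)
    target_table: str
    steps: list          # ordered step names, resolved via STEP_REGISTRY

def drop_nulls(rows):
    # Keep only rows where every field is populated.
    return [r for r in rows if all(v is not None for v in r.values())]

def dedupe(rows):
    # Drop exact duplicate records while preserving first-seen order.
    seen, out = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out

STEP_REGISTRY = {"drop_nulls": drop_nulls, "dedupe": dedupe}

def run_pipeline(config: PipelineConfig, rows):
    # Apply each configured step in order over the input records.
    for step_name in config.steps:
        rows = STEP_REGISTRY[step_name](rows)
    return rows
```

In the real platform the steps would be Spark/Delta transformations rather than list functions, but the dispatch pattern is the same.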

Achievements

Improved query performance by approximately 3x via the Delta Lake migration, cut compute costs by ~25% through Lambda-driven auto-scaling, and eliminated significant manual overhead by automating CloudWatch dashboard and alarm provisioning across 10+ services.
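The dashboard-provisioning automation mentioned above can be sketched as a template that renders one CloudWatch dashboard body per service; the metric namespace, dimension names, and widget layout below are illustrative assumptions, not the project's actual configuration.

```python
import json

def dashboard_body(service: str, region: str = "us-east-1") -> str:
    """Render a CloudWatch dashboard JSON body with one latency widget
    for the given service (hypothetical namespace/metric names)."""
    body = {
        "widgets": [
            {
                "type": "metric",
                "x": 0, "y": 0, "width": 12, "height": 6,
                "properties": {
                    "title": f"{service} p99 latency",
                    "region": region,
                    "metrics": [["DX/ETL", "LatencyMs", "Service", service]],
                    "stat": "p99",
                    "period": 300,
                },
            }
        ]
    }
    return json.dumps(body)
```

From a notebook, the rendered string would be pushed for each of the 10+ services via `boto3.client("cloudwatch").put_dashboard(DashboardName=..., DashboardBody=...)`, replacing manual console configuration.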

Responsibilities

  • Built and maintained 15+ production ETL pipelines on Spark/Databricks, processing large-scale daily ingestion and transformation workloads for downstream analytics consumers.
  • Led the migration of pipeline storage to Delta Lake, introducing ACID compliance and time-travel semantics that boosted query performance roughly 3x and eliminated entire classes of correctness bugs.
  • Designed auto-scaling infrastructure using AWS Lambda to dynamically provision Databricks resources based on workload signals, reducing compute spend by ~25% while sustaining throughput during peak loads.
  • Developed Databricks notebooks that automated provisioning of AWS CloudWatch dashboards and alarms across 10+ services, replacing manual configuration and accelerating incident response.
  • Enhanced Concourse CI jobs and reporting pipelines, improving release cadence and giving the team continuous visibility into data quality and pipeline health metrics.
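The Lambda-driven auto-scaling described above hinges on a decision step that maps workload signals to a target cluster size. A minimal sketch, assuming a backlog-based signal and an illustrative min/max band (the real thresholds and signals are not given in the source):

```python
def target_workers(backlog: int, tasks_per_worker: int,
                   min_workers: int = 2, max_workers: int = 40) -> int:
    """Workers needed to drain the pending backlog, clamped to the
    allowed band so the cluster never scales to zero or runs away."""
    needed = -(-backlog // tasks_per_worker)  # ceiling division
    return max(min_workers, min(max_workers, needed))
```

In the actual setup a Lambda on a schedule (or metric trigger) would compute this target and resize the Databricks cluster through its REST API; keeping the floor small during idle periods is what yields the ~25% compute saving while the cap protects peak-load cost.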

Technologies Used

Apache Spark, Scala, Databricks, Delta Lake, Apache Kafka, AWS Kinesis, AWS Lambda, AWS CloudWatch, Concourse CI, Python

This project was delivered by

Vitalii P.
