Centralized Data Platform & Configuration-Driven Framework
Key Expertise
Experience: 12+ years
Timezone: CET (UTC +1)
Overview
Co-architected a configuration-driven framework that unifies heterogeneous data sources (HDFS, S3, Kafka, and Iceberg) behind a single declarative interface for a next-generation centralized data platform. The framework standardizes how dozens of teams build, deploy, and operate Spark pipelines, replacing fragmented per-team implementations with a consistent foundation that enforces best practices and shortens time to production.
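A minimal sketch, assuming a PySpark stack, of what such a declarative, source-agnostic contract might look like; `SourceSpec` and `load_source` are illustrative stand-ins, not the framework's actual (non-public) API:

```python
# Illustrative sketch only: `SourceSpec` and `load_source` are hypothetical
# stand-ins for the framework's actual configuration schema and loader.
from dataclasses import dataclass, field
from pyspark.sql import DataFrame, SparkSession

@dataclass
class SourceSpec:
    kind: str                        # "hdfs" | "s3" | "kafka" | "iceberg"
    path: str                        # filesystem URI, Kafka topic, or table name
    fmt: str = "parquet"             # file format for filesystem sources
    options: dict = field(default_factory=dict)

def load_source(spark: SparkSession, spec: SourceSpec) -> DataFrame:
    """Resolve a declarative source spec into a DataFrame, hiding the
    per-source Spark reader APIs behind one contract."""
    if spec.kind in ("hdfs", "s3"):
        # hdfs:// and s3a:// URIs go through the same generic file reader
        return spark.read.format(spec.fmt).options(**spec.options).load(spec.path)
    if spec.kind == "kafka":
        # batch read of a topic; bootstrap servers arrive via options
        return (spark.read.format("kafka")
                .option("subscribe", spec.path)
                .options(**spec.options)
                .load())
    if spec.kind == "iceberg":
        # Iceberg tables are addressed by catalog.namespace.table identifiers
        return spark.read.options(**spec.options).table(spec.path)
    raise ValueError(f"unsupported source kind: {spec.kind}")
```

Under a contract like this, switching a pipeline from S3 to Kafka becomes a configuration edit, e.g. `SourceSpec(kind="kafka", path="events", options={"kafka.bootstrap.servers": "broker:9092"})`, rather than a code change.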
Achievements
Onboarded the platform’s daily ingestion workload (50+ pipeline jobs) onto the new framework, improving build stability and deployment flexibility. The configuration layer substantially reduced per-pipeline boilerplate and cut onboarding time for new pipelines from days to hours, positioning the framework as the default ingestion pattern across the broader engineering organization.
Responsibilities
- Architected the configuration layer of the centralized platform, defining a declarative, source-agnostic schema that unified HDFS, S3, Kafka, and Iceberg under a single pipeline contract.
- Co-authored the abstraction interfaces and source connectors, ensuring extensibility for future data formats while preserving backward compatibility with legacy job definitions.
- Onboarded the platform’s daily pipeline workload (50+ jobs) to the new framework, validating output parity, performance characteristics, and observability across environments.
- Contributed to architectural design reviews and technical RFCs, aligning migration strategies with reliability, compliance, and long-term platform-evolution requirements.
- Built validation tooling using Jupyter notebooks to verify data correctness during cutover, enabling parallel-run comparisons between legacy and new pipelines at scale (a comparison of this kind is sketched below).
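A minimal sketch of that kind of parallel-run parity check, assuming PySpark; the function name and DataFrame arguments are hypothetical, not the actual notebook code:

```python
# Hypothetical parity check in the spirit of the cutover tooling; `legacy_df`
# and `new_df` are the two pipeline outputs for the same input window.
from pyspark.sql import DataFrame

def compare_outputs(legacy_df: DataFrame, new_df: DataFrame) -> dict:
    """Compare a legacy pipeline's output with its framework-based
    replacement, using exact-row (duplicate-aware) semantics."""
    only_in_legacy = legacy_df.exceptAll(new_df).count()   # rows lost in cutover
    only_in_new = new_df.exceptAll(legacy_df).count()      # rows newly introduced
    return {
        "legacy_count": legacy_df.count(),
        "new_count": new_df.count(),
        "only_in_legacy": only_in_legacy,
        "only_in_new": only_in_new,
        "parity": only_in_legacy == 0 and only_in_new == 0,
    }
```

`exceptAll` keeps duplicate rows in the diff, so a pipeline that silently deduplicates or double-writes records fails the parity check rather than slipping through a plain set difference.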
Technologies Used
Apache Spark, HDFS, Amazon S3, Apache Kafka, Apache Iceberg, Jupyter
This project was delivered by
Vitalii P.
More Projects by Vitalii P.
Spark Pipeline Migration from YARN to Kubernetes
Senior Big Data Engineer
Modernization of mission-critical content-moderation data infrastructure for one of the world’s largest technology companies, migrating legacy Spark-on-YARN pipelines to a cloud-native Spark-on-Kubernetes platform. The initiative enables elastic scaling, reduces operational overhead, and aligns the data stack with the broader enterprise shift toward containerized infrastructure across thousands of services.
Delta Lake Migration & Auto-Scaling ETL Platform
Senior Data Engineer
End-to-end ownership of the data-exchange (DX) ETL platform on Databricks for a Tier-1 US telecom and media operator, supporting large-scale ingestion, transformation, and analytics workloads. The project encompassed migrating storage to Delta Lake for ACID guarantees, building auto-scaling compute infrastructure for volatile workloads, and automating operational tooling to reduce manual ops overhead across the data engineering team.