
Centralized Data Platform & Configuration-Driven Framework

Senior Big Data Engineer · 2024 – Present

Vitalii P.

Senior Big Data Engineer / Platform Engineer

Data Engineer & Big Data

Key Expertise

Big Data Engineering · Delta Lake Migration · Configuration-Driven Architecture · Cloud-Native Infrastructure · ETL Pipeline Optimization · Scalable Data Pipelines · Data Platform Architecting

Experience

12+ years

Timezone

CET (UTC +1)

Skills

AI / ML

AI-assisted Migration Tooling · Jupyter Notebooks

Languages

Python · Scala

Databases

HDFS · Apache Iceberg · Databricks · AWS S3 · Delta Lake

Infrastructure

AWS CloudWatch · AWS Lambda · Jenkins CI/CD · Kubernetes · Docker · CI/CD · YARN

Frameworks

Configuration-Driven Architecture · Custom Logging & Tracing Frameworks · Apache Spark

Integrations & Protocols

AWS Kinesis · Concourse CI · Apache Kafka

Overview

Co-architected a configuration-driven framework that abstracts heterogeneous data sources (HDFS, S3, Kafka, and Iceberg) behind a single declarative interface for a next-generation centralized data platform. The framework standardizes how dozens of teams build, deploy, and operate Spark pipelines, replacing fragmented per-team implementations with a consistent foundation that enforces best practices and shortens time to production.
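The pattern described above can be sketched in a few lines. This is a minimal, hypothetical illustration (the class and function names are not from the actual framework): a single declarative `SourceConfig` contract and a connector registry, so pipelines declare *what* to read rather than *how*. Real connectors would return Spark DataFrames; here they return the equivalent reader expression as a string to keep the sketch self-contained.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical declarative contract: one schema for every source type.
@dataclass
class SourceConfig:
    kind: str                              # "hdfs" | "s3" | "kafka" | "iceberg"
    location: str                          # path, bucket/key, topic, or table name
    options: Dict[str, str] = field(default_factory=dict)

# Registry mapping a source kind to its connector.
_READERS: Dict[str, Callable[[SourceConfig], str]] = {}

def register(kind: str):
    """Decorator that plugs a connector into the registry, so new
    data formats can be added without touching pipeline code."""
    def wrap(fn: Callable[[SourceConfig], str]):
        _READERS[kind] = fn
        return fn
    return wrap

@register("s3")
def read_s3(cfg: SourceConfig) -> str:
    return f"spark.read.format('parquet').load('s3a://{cfg.location}')"

@register("kafka")
def read_kafka(cfg: SourceConfig) -> str:
    return f"spark.readStream.format('kafka').option('subscribe', '{cfg.location}')"

def build_reader(cfg: SourceConfig) -> str:
    """Single entry point behind which all sources are unified."""
    try:
        return _READERS[cfg.kind](cfg)
    except KeyError:
        raise ValueError(f"unsupported source kind: {cfg.kind}")
```

Because every pipeline goes through `build_reader`, the configuration layer becomes the one place where best practices (formats, options, observability hooks) are enforced.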

Achievements

Onboarded the platform’s daily ingestion workload (50+ pipeline jobs) onto the new framework, with improved build stability and deployment flexibility. The configuration layer reduced per-pipeline boilerplate substantially and accelerated onboarding for new pipelines from days to hours, positioning the framework as the default ingestion pattern across the broader engineering organization.

Responsibilities

  • Architected the configuration layer of the centralized platform, defining a declarative, source-agnostic schema that unified HDFS, S3, Kafka, and Iceberg under a single pipeline contract.
  • Co-authored the abstraction interfaces and source connectors, ensuring extensibility for future data formats while preserving backward compatibility with legacy job definitions.
  • Onboarded the platform’s daily pipeline workload (50+ jobs) to the new framework, validating output parity, performance characteristics, and observability across environments.
  • Contributed to architectural design reviews and technical RFCs, aligning migration strategies with reliability, compliance, and long-term platform-evolution requirements.
  • Built validation tooling using Jupyter notebooks to verify data correctness during cutover, enabling parallel-run comparisons between legacy and new pipelines at scale.
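The parallel-run validation mentioned above can be illustrated with a small sketch (function names are hypothetical, not the actual tooling): an order-insensitive dataset fingerprint lets legacy and new pipeline outputs be compared for parity without sorting large datasets.

```python
import hashlib
from typing import Dict, Iterable, Tuple

def dataset_fingerprint(rows: Iterable[Tuple]) -> Dict[str, object]:
    """Order-insensitive fingerprint: row count plus an XOR-combined
    per-row hash. XOR is commutative, so two datasets containing the
    same rows in any order produce the same fingerprint. (Caveat: a
    row duplicated an even number of times cancels out of the hash,
    which is why the count is checked alongside it.)"""
    count, acc = 0, 0
    for row in rows:
        digest = hashlib.sha256(repr(row).encode()).digest()
        acc ^= int.from_bytes(digest[:8], "big")
        count += 1
    return {"rows": count, "hash": acc}

def parity_check(legacy_rows: Iterable[Tuple], new_rows: Iterable[Tuple]) -> bool:
    """True when both pipeline outputs have identical fingerprints."""
    return dataset_fingerprint(legacy_rows) == dataset_fingerprint(new_rows)
```

In a notebook-driven cutover, each side of the comparison would stream rows from the legacy and new pipeline outputs, and any fingerprint mismatch flags the pipeline for row-level investigation.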

Technologies Used

Apache Spark · Kubernetes · HDFS · AWS S3 · Apache Kafka · Apache Iceberg · Delta Lake · Jupyter Notebooks · Configuration-Driven Architecture · CI/CD

This project was delivered by

Vitalii P.


Ready to Build Your AI Team?

Get matched with the right AI experts for your project. Book a free discovery call to discuss your requirements.

We respond within 24 hours.