Vitalii P.
Senior Big Data Engineer / Platform Engineer
Senior Big Data Engineer with 12+ years of software development experience and deep expertise in building and modernizing large-scale data pipelines. Specialized in Apache Spark, Scala, and cloud-native data infrastructure across Kubernetes, Databricks, and AWS. Proven track record of migrating mission-critical systems, reducing technical debt, and improving pipeline reliability at companies including Grid Dynamics, Comcast, and TiVo/Xperi. Combines strong platform engineering skills with a collaborative, results-driven approach to solving complex data challenges.
Key Expertise
Experience
12+ years
Timezone
CET (UTC +1)
Skills
AI / ML
Languages
Databases
Infrastructure
Frameworks
Integrations & Protocols
1. Centralized Data Platform & Configuration-Driven Framework
Project overview:
Co-architected a configuration-driven framework that abstracts heterogeneous data sources (HDFS, S3, Kafka, and Iceberg) behind a single declarative interface for a next-generation centralized data platform. The framework standardizes how dozens of teams build, deploy, and operate Spark pipelines, replacing fragmented per-team implementations with a consistent foundation that enforces best practices and shortens time-to-production.
Responsibilities:
- Architected the configuration layer of the centralized platform, defining a declarative, source-agnostic schema that unified HDFS, S3, Kafka, and Iceberg under a single pipeline contract.
- Co-authored the abstraction interfaces and source connectors, ensuring extensibility for future data formats while preserving backward compatibility with legacy job definitions.
- Onboarded the platform’s daily pipeline workload (50+ jobs) to the new framework, validating output parity, performance characteristics, and observability across environments.
- Contributed to architectural design reviews and technical RFCs, aligning migration strategies with reliability, compliance, and long-term platform-evolution requirements.
- Built validation tooling using Jupyter notebooks to verify data correctness during cutover, enabling parallel-run comparisons between legacy and new pipelines at scale.
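The source-agnostic pipeline contract described above can be sketched as a small dispatch layer: a declarative config names the source kind, and a registry resolves it to a concrete reader. This is a minimal illustrative sketch, not the framework's actual schema; `SourceConfig`, `READERS`, and `build_source` are hypothetical names, and the readers return placeholder strings where the real framework would return Spark DataFrames.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical declarative pipeline config; field names are illustrative,
# not the framework's real schema.
@dataclass(frozen=True)
class SourceConfig:
    kind: str           # "hdfs" | "s3" | "kafka" | "iceberg"
    location: str       # path, bucket/key, topic, or table name
    fmt: str = "parquet"

# Registry mapping source kinds to reader constructors; the real framework
# would build Spark readers here instead of returning strings.
READERS: Dict[str, Callable[[SourceConfig], str]] = {
    "hdfs": lambda c: f"read {c.fmt} from hdfs://{c.location}",
    "s3": lambda c: f"read {c.fmt} from s3://{c.location}",
    "kafka": lambda c: f"subscribe to topic {c.location}",
    "iceberg": lambda c: f"load iceberg table {c.location}",
}

def build_source(cfg: SourceConfig) -> str:
    """Resolve a declarative config to a concrete reader, source-agnostically."""
    try:
        return READERS[cfg.kind](cfg)
    except KeyError:
        raise ValueError(f"unsupported source kind: {cfg.kind}")
```

Adding a new backend then means registering one more entry rather than changing any pipeline definitions, which is what keeps legacy job definitions backward compatible.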
Achievements:
Onboarded the platform’s daily ingestion workload (50+ pipeline jobs) onto the new framework, with improved build stability and deployment flexibility. The configuration layer reduced per-pipeline boilerplate substantially and accelerated onboarding for new pipelines from days to hours, positioning the framework as the default ingestion pattern across the broader engineering organization.
Technology stack:
2. Spark Pipeline Migration from YARN to Kubernetes
Project overview:
Modernized mission-critical content-moderation data infrastructure for one of the world’s largest technology companies, migrating legacy Spark-on-YARN pipelines to a cloud-native Spark-on-Kubernetes platform. The initiative enabled elastic scaling, reduced operational overhead, and aligned the data stack with the broader enterprise shift toward containerized infrastructure across thousands of services.
Responsibilities:
- Designed and tuned the containerized Spark resource model on Kubernetes (cluster sizing, executor configuration, partition strategy), driving the migration’s compute-efficiency gains while validating performance parity against legacy YARN through systematic benchmarking.
- Owned the end-to-end migration of the content-moderation pipeline portfolio (Spark 2.x → 3.4), handling dependency upgrades, configuration tuning, and production cutover with zero data loss.
- Conducted dependency-tree audits across the platform, eliminating ~30% of obsolete libraries and resolving CVEs to harden the security posture.
- Built advanced logging and tracing instrumentation that surfaced root causes during complex builds and deployments, cutting debugging time by approximately 40%.
- Partnered with Site Reliability Engineers to finalize production onboarding, ensuring deployment pipelines, health checks, and observability met operational SLAs.
Achievements:
Successfully migrated the full content-moderation pipeline portfolio (20+ production pipelines, each processing 1–2 TB per run) to Kubernetes with full performance parity against legacy YARN. Reduced per-job compute footprint by ~6x, from 600 to 100 instances (3-core, 20 GB RAM each); eliminated ~30% of obsolete dependencies, reducing CVE exposure; and cut debugging time by ~40% through enhanced observability. Delivered ahead of compliance deadlines.
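The ~6x footprint reduction above can be sanity-checked with simple fleet arithmetic. The helper below is illustrative only; it assumes both fleets use the 3-core / 20 GB instance shape quoted for the tuned Kubernetes cluster.

```python
def cluster_footprint(instances: int, cores: int, mem_gb: int):
    """Total cores and memory for a homogeneous executor fleet."""
    return instances * cores, instances * mem_gb

# Figures from the project above; applying the same per-instance shape
# to the legacy fleet is an assumption for illustration.
old_cores, old_mem = cluster_footprint(600, 3, 20)  # 1800 cores, 12000 GB
new_cores, new_mem = cluster_footprint(100, 3, 20)  # 300 cores, 2000 GB
reduction = old_cores / new_cores                   # 6.0
```

Shrinking the instance count (rather than shrinking each executor) preserves the per-executor core/memory ratio, which is why performance parity could be validated against the YARN baseline.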
Technology stack:
3. Delta Lake Migration & Auto-Scaling ETL Platform
Project overview:
End-to-end ownership of the data-exchange (DX) ETL platform on Databricks for a Tier-1 US telecom and media operator, supporting large-scale ingestion, transformation, and analytics workloads. The project encompassed migrating storage to Delta Lake for ACID guarantees, building auto-scaling compute infrastructure for volatile workloads, and automating operational tooling to reduce manual ops overhead across the data engineering team.
Responsibilities:
- Built and maintained 15+ production ETL pipelines on Spark/Databricks, processing large-scale daily ingestion and transformation workloads for downstream analytics consumers.
- Led the migration of pipeline storage to Delta Lake, introducing ACID compliance and time-travel semantics that boosted query performance roughly 3x and eliminated entire classes of correctness bugs.
- Designed auto-scaling infrastructure using AWS Lambda to dynamically provision Databricks resources based on workload signals, reducing compute spend by ~25% while sustaining throughput during peak loads.
- Developed Databricks notebooks that automated provisioning of AWS CloudWatch dashboards and alarms across 10+ services, replacing manual configuration and accelerating incident response.
- Enhanced Concourse CI jobs and reporting pipelines, improving release cadence and giving the team continuous visibility into data quality and pipeline health metrics.
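The Lambda-driven auto-scaling described above boils down to a scaling decision computed from a workload signal. This is a minimal sketch of that decision logic under assumed names and thresholds; the real system would read its signal from metrics and apply the result via the Databricks cluster-resize API, both of which are omitted here.

```python
import math

def target_workers(pending_tasks: int, tasks_per_worker: int = 50,
                   min_workers: int = 2, max_workers: int = 40) -> int:
    """Scale the worker count to pending work, clamped to a safe band.

    All parameter names and defaults are hypothetical; real thresholds
    would be tuned against observed Databricks workload profiles.
    """
    needed = math.ceil(pending_tasks / tasks_per_worker) if pending_tasks else 0
    return max(min_workers, min(max_workers, needed))
```

Clamping to a floor keeps latency-sensitive jobs warm during quiet periods, while the ceiling caps spend during demand spikes, which is the trade-off behind the ~25% compute-cost reduction cited above.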
Achievements:
Improved query performance by approximately 3x via the Delta Lake migration, cut compute costs by ~25% through Lambda-driven auto-scaling, and eliminated significant manual overhead by automating CloudWatch dashboard and alarm provisioning across 10+ services.
Technology stack: