Vitalii P.
Senior Big Data Engineer / Platform Engineer
Senior Big Data Engineer with 12+ years of software development experience and deep expertise in building and modernizing large-scale data pipelines. Specialized in Apache Spark, Scala, and cloud-native data infrastructure across Kubernetes, Databricks, and AWS. Proven track record of migrating mission-critical systems, reducing technical debt, and improving pipeline reliability at companies including Grid Dynamics, Comcast, and TiVo/Xperi. Combines strong platform engineering skills with a collaborative, results-driven approach to solving complex data challenges.
Key Expertise
Experience
12+ years
Timezone
CET (UTC +1)
Skills
AI / ML
Languages
Databases
Infrastructure
Frameworks
Integrations & Protocols
1. Centralized Data Platform & Configuration-Driven Framework
Project overview:
Co-architected a configuration-driven framework that abstracts heterogeneous data sources (HDFS, S3, Kafka, and Iceberg) behind a single declarative interface for a next-generation centralized data platform. The framework standardizes how dozens of teams build, deploy, and operate Spark pipelines, replacing fragmented per-team implementations with a consistent foundation that enforces best practices and shortens time-to-production.
Responsibilities:
- Architected the configuration layer of the centralized platform, defining a declarative, source-agnostic schema that unified HDFS, S3, Kafka, and Iceberg under a single pipeline contract.
- Co-authored the abstraction interfaces and source connectors, ensuring extensibility for future data formats while preserving backward compatibility with legacy job definitions.
- Onboarded the platform’s daily pipeline workload (50+ jobs) to the new framework, validating output parity, performance characteristics, and observability across environments.
- Contributed to architectural design reviews and technical RFCs, aligning migration strategies with reliability, compliance, and long-term platform-evolution requirements.
- Built validation tooling using Jupyter notebooks to verify data correctness during cutover, enabling parallel-run comparisons between legacy and new pipelines at scale.
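The source-agnostic pipeline contract described above can be sketched as a small dispatch layer: a declarative config names the source kind, and a registry resolves it to a concrete reader. This is a minimal illustrative sketch, not the framework's actual schema; `SourceConfig`, `READERS`, and `build_source` are hypothetical names, and the readers return placeholder strings where the real framework would return Spark DataFrames.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical declarative pipeline config; field names are illustrative,
# not the framework's real schema.
@dataclass(frozen=True)
class SourceConfig:
    kind: str           # "hdfs" | "s3" | "kafka" | "iceberg"
    location: str       # path, bucket/key, topic, or table name
    fmt: str = "parquet"

# Registry mapping source kinds to reader constructors; the real framework
# would build Spark readers here instead of returning strings.
READERS: Dict[str, Callable[[SourceConfig], str]] = {
    "hdfs": lambda c: f"read {c.fmt} from hdfs://{c.location}",
    "s3": lambda c: f"read {c.fmt} from s3://{c.location}",
    "kafka": lambda c: f"subscribe to topic {c.location}",
    "iceberg": lambda c: f"load iceberg table {c.location}",
}

def build_source(cfg: SourceConfig) -> str:
    """Resolve a declarative config to a concrete reader, source-agnostically."""
    try:
        return READERS[cfg.kind](cfg)
    except KeyError:
        raise ValueError(f"unsupported source kind: {cfg.kind}")
```

Adding a new backend then means registering one more entry rather than changing any pipeline definitions, which is what keeps legacy job definitions backward compatible.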
Achievements:
Onboarded the platform’s daily ingestion workload (50+ pipeline jobs) onto the new framework, with improved build stability and deployment flexibility. The configuration layer reduced per-pipeline boilerplate substantially and accelerated onboarding for new pipelines from days to hours, positioning the framework as the default ingestion pattern across the broader engineering organization.
Technology stack:
2. Spark Pipeline Migration from YARN to Kubernetes
Project overview:
Modernized mission-critical content-moderation data infrastructure for one of the world’s largest technology companies, migrating legacy Spark-on-YARN pipelines to a cloud-native Spark-on-Kubernetes platform. The initiative enabled elastic scaling, reduced operational overhead, and aligned the data stack with the broader enterprise shift toward containerized infrastructure across thousands of services.
Responsibilities:
- Designed and tuned the containerized Spark resource model on Kubernetes (cluster sizing, executor configuration, partition strategy), driving the migration’s compute-efficiency gains while validating performance parity against legacy YARN through systematic benchmarking.
- Owned the end-to-end migration of the content-moderation pipeline portfolio (Spark 2.x → 3.4), handling dependency upgrades, configuration tuning, and production cutover with zero data loss.
- Conducted dependency-tree audits across the platform, eliminating ~30% of obsolete libraries and resolving CVEs to harden the security posture.
- Built advanced logging and tracing instrumentation that surfaced root causes during complex builds and deployments, cutting debugging time by approximately 40%.
- Partnered with Site Reliability Engineers to finalize production onboarding, ensuring deployment pipelines, health checks, and observability met operational SLAs.
Achievements:
Successfully migrated the full content-moderation pipeline portfolio (20+ production pipelines, each processing 1–2 TB per run) to Kubernetes with full performance parity against legacy YARN. Reduced per-job compute footprint by ~6x, from 600 to 100 instances (3-core, 20 GB RAM each); eliminated ~30% of obsolete dependencies, reducing CVE exposure; and cut debugging time by ~40% through enhanced observability. Delivered ahead of compliance deadlines.
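The ~6x footprint reduction above can be sanity-checked with simple fleet arithmetic. The helper below is illustrative only; it assumes both fleets use the 3-core / 20 GB instance shape quoted for the tuned Kubernetes cluster.

```python
def cluster_footprint(instances: int, cores: int, mem_gb: int):
    """Total cores and memory for a homogeneous executor fleet."""
    return instances * cores, instances * mem_gb

# Figures from the project above; applying the same per-instance shape
# to the legacy fleet is an assumption for illustration.
old_cores, old_mem = cluster_footprint(600, 3, 20)  # 1800 cores, 12000 GB
new_cores, new_mem = cluster_footprint(100, 3, 20)  # 300 cores, 2000 GB
reduction = old_cores / new_cores                   # 6.0
```

Shrinking the instance count (rather than shrinking each executor) preserves the per-executor core/memory ratio, which is why performance parity could be validated against the YARN baseline.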
Technology stack:
3. Delta Lake Migration & Auto-Scaling ETL Platform
Project overview:
End-to-end ownership of the data-exchange (DX) ETL platform on Databricks for a Tier-1 US telecom and media operator, supporting large-scale ingestion, transformation, and analytics workloads. The project encompassed migrating storage to Delta Lake for ACID guarantees, building auto-scaling compute infrastructure for volatile workloads, and automating operational tooling to reduce manual ops overhead across the data engineering team.
Responsibilities:
- Built and maintained 15+ production ETL pipelines on Spark/Databricks, processing large-scale daily ingestion and transformation workloads for downstream analytics consumers.
- Led the migration of pipeline storage to Delta Lake, introducing ACID compliance and time-travel semantics that boosted query performance roughly 3x and eliminated entire classes of correctness bugs.
- Designed auto-scaling infrastructure using AWS Lambda to dynamically provision Databricks resources based on workload signals, reducing compute spend by ~25% while sustaining throughput during peak loads.
- Developed Databricks notebooks that automated provisioning of AWS CloudWatch dashboards and alarms across 10+ services, replacing manual configuration and accelerating incident response.
- Enhanced Concourse CI jobs and reporting pipelines, improving release cadence and giving the team continuous visibility into data quality and pipeline health metrics.
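The Lambda-driven auto-scaling described above boils down to a scaling decision computed from a workload signal. This is a minimal sketch of that decision logic under assumed names and thresholds; the real system would read its signal from metrics and apply the result via the Databricks cluster-resize API, both of which are omitted here.

```python
import math

def target_workers(pending_tasks: int, tasks_per_worker: int = 50,
                   min_workers: int = 2, max_workers: int = 40) -> int:
    """Scale the worker count to pending work, clamped to a safe band.

    All parameter names and defaults are hypothetical; real thresholds
    would be tuned against observed Databricks workload profiles.
    """
    needed = math.ceil(pending_tasks / tasks_per_worker) if pending_tasks else 0
    return max(min_workers, min(max_workers, needed))
```

Clamping to a floor keeps latency-sensitive jobs warm during quiet periods, while the ceiling caps spend during demand spikes, which is the trade-off behind the ~25% compute-cost reduction cited above.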
Achievements:
Improved query performance by approximately 3x via the Delta Lake migration, cut compute costs by ~25% through Lambda-driven auto-scaling, and eliminated significant manual overhead by automating CloudWatch dashboard and alarm provisioning across 10+ services.
Technology stack: