
Vitalii P.

Senior Big Data Engineer / Platform Engineer

Senior Big Data Engineer with 12+ years of software development experience and deep expertise in building and modernizing large-scale data pipelines. Specialized in Apache Spark, Scala, and cloud-native data infrastructure across Kubernetes, Databricks, and AWS. Proven track record of migrating mission-critical systems, reducing technical debt, and improving pipeline reliability at companies including Grid Dynamics, Comcast, and TiVo/Xperi. Combines strong platform engineering skills with a collaborative, results-driven approach to solving complex data challenges.

Key Expertise

Big Data Engineering · Delta Lake Migration · Configuration-Driven Architecture · Cloud-Native Infrastructure · ETL Pipeline Optimization · Scalable Data Pipelines · Data Platform Architecting

Experience

12+ years

Timezone

CET (UTC +1)

Skills

AI / ML

AI-assisted Migration Tooling · Jupyter Notebooks

Languages

Python · Scala

Databases

HDFS · Apache Iceberg · Databricks · AWS S3 · Delta Lake

Infrastructure

AWS CloudWatch · AWS Lambda · Jenkins CI/CD · Kubernetes · Docker · CI/CD · YARN

Frameworks

Configuration-Driven Architecture · Custom Logging & Tracing Frameworks · Apache Spark

Integrations & Protocols

AWS Kinesis · Concourse CI · Apache Kafka

1. Centralized Data Platform & Configuration-Driven Framework

Senior Big Data Engineer · 2024 – Present

Project overview:

Co-architected a configuration-driven framework that abstracts heterogeneous data sources (HDFS, S3, Kafka, and Iceberg) behind a single declarative interface for a next-generation centralized data platform. The framework standardizes how dozens of teams build, deploy, and operate Spark pipelines, replacing fragmented per-team implementations with a consistent foundation that enforces best practices and shortens time-to-production.
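
The framework itself is internal, but the core pattern can be sketched: pipeline behavior is declared in configuration, and a thin dispatch layer maps each declared source to the right Spark reader. A minimal sketch, assuming a hypothetical YAML contract (all names and the layout are illustrative, not the platform's actual schema):

```python
# Minimal sketch of a configuration-driven pipeline contract; the YAML layout
# and all names are illustrative assumptions, not the platform's actual schema.
import yaml
from pyspark.sql import DataFrame, SparkSession

PIPELINE_YAML = """
source:
  type: kafka                 # one contract covering hdfs | s3 | kafka | iceberg
  options:
    kafka.bootstrap.servers: broker:9092
    subscribe: events
sink:
  type: iceberg
  table: analytics.events
"""

def read_source(spark: SparkSession, cfg: dict) -> DataFrame:
    """Dispatch on the declared source type instead of per-team reader code."""
    kind, opts = cfg["type"], dict(cfg.get("options", {}))
    if kind == "kafka":
        return spark.readStream.format("kafka").options(**opts).load()
    if kind in ("hdfs", "s3"):
        fmt = opts.pop("format", "parquet")
        return spark.read.format(fmt).options(**opts).load(cfg["path"])
    if kind == "iceberg":
        return spark.read.table(cfg["table"])
    raise ValueError(f"unsupported source type: {kind}")

cfg = yaml.safe_load(PIPELINE_YAML)
spark = SparkSession.builder.getOrCreate()
events = read_source(spark, cfg["source"])
```

Adding a new source format then means adding one branch (or connector class) rather than touching dozens of per-team pipelines, which is what keeps legacy job definitions backward compatible.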

Responsibilities:

  • Architected the configuration layer of the centralized platform, defining a declarative, source-agnostic schema that unified HDFS, S3, Kafka, and Iceberg under a single pipeline contract.
  • Co-authored the abstraction interfaces and source connectors, ensuring extensibility for future data formats while preserving backward compatibility with legacy job definitions.
  • Onboarded the platform’s daily pipeline workload (50+ jobs) to the new framework, validating output parity, performance characteristics, and observability across environments.
  • Contributed to architectural design reviews and technical RFCs, aligning migration strategies with reliability, compliance, and long-term platform-evolution requirements.
  • Built validation tooling using Jupyter notebooks to verify data correctness during cutover, enabling parallel-run comparisons between legacy and new pipelines at scale, as sketched below.
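
A typical parallel-run parity check compares legacy and migrated outputs row-for-row. A minimal PySpark sketch of the idea (table names are hypothetical):

```python
# Hedged sketch of a parallel-run parity check between legacy and migrated
# pipeline outputs; the table names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cutover-parity-check").getOrCreate()

legacy = spark.read.table("legacy.daily_events")
migrated = spark.read.table("platform.daily_events")

# Row counts must match before a deeper comparison is worthwhile.
assert legacy.count() == migrated.count(), "row-count mismatch"

# exceptAll is order-insensitive and keeps duplicates, so both diffs must be
# empty for true output parity.
only_in_legacy = legacy.exceptAll(migrated).count()
only_in_migrated = migrated.exceptAll(legacy).count()
print(f"legacy-only rows: {only_in_legacy}, migrated-only: {only_in_migrated}")
assert only_in_legacy == 0 and only_in_migrated == 0, "parity check failed"
```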

Achievements:

Onboarded the platform’s daily ingestion workload (50+ pipeline jobs) onto the new framework, with improved build stability and deployment flexibility. The configuration layer reduced per-pipeline boilerplate substantially and accelerated onboarding for new pipelines from days to hours, positioning the framework as the default ingestion pattern across the broader engineering organization.

Technology stack:

Apache Spark · Kubernetes · HDFS · AWS S3 · Apache Kafka · Apache Iceberg · Delta Lake · Jupyter Notebooks · Configuration-Driven Architecture · CI/CD

2. Spark Pipeline Migration from YARN to Kubernetes

Senior Big Data Engineer · 2024 – Present

Project overview:

Modernization of mission-critical content-moderation data infrastructure for one of the world’s largest technology companies, migrating legacy Spark-on-YARN pipelines to a cloud-native Spark-on-Kubernetes platform. The initiative enables elastic scaling, reduces operational overhead, and aligns the data stack with the broader enterprise shift toward containerized infrastructure across thousands of services.

Responsibilities:

  • Designed and tuned the containerized Spark resource model on Kubernetes (cluster sizing, executor configuration, partition strategy), driving the migration's compute-efficiency gains while validating performance parity against legacy YARN through systematic benchmarking; a configuration sketch follows this list.
  • Owned the end-to-end migration of the content-moderation pipeline portfolio (Spark 2.x → 3.4), handling dependency upgrades, configuration tuning, and production cutover with zero data loss.
  • Conducted dependency-tree audits across the platform, eliminating ~30% of obsolete libraries and resolving CVEs to harden the security posture.
  • Built advanced logging and tracing instrumentation that surfaced root causes during complex builds and deployments, cutting debugging time by approximately 40%.
  • Partnered with Site Reliability Engineers to finalize production onboarding, ensuring deployment pipelines, health checks, and observability met operational SLAs.
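
The resource model reported below (100 executors at 3 cores / 20 GB each) maps directly onto Spark-on-Kubernetes configuration. A hedged sketch with illustrative image, namespace, and partition values:

```python
# Sketch of the Spark-on-Kubernetes resource model described in this project;
# the image, namespace, and service-account names are illustrative assumptions.
# In cluster mode, --master k8s://<api-server> is normally supplied by
# spark-submit rather than set in code.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("content-moderation-pipeline")
    # Executor sizing behind the ~6x footprint reduction: 100 pods instead of
    # 600 YARN containers, each with 3 cores and ~20 GB of RAM.
    .config("spark.executor.instances", "100")
    .config("spark.executor.cores", "3")
    .config("spark.executor.memory", "18g")
    .config("spark.executor.memoryOverhead", "2g")  # pod limit ≈ 20 GB total
    # Kubernetes-specific wiring (values are placeholders).
    .config("spark.kubernetes.container.image", "registry.example.com/spark:3.4")
    .config("spark.kubernetes.namespace", "data-pipelines")
    .config("spark.kubernetes.authenticate.driver.serviceAccountName", "spark")
    # Fewer, larger executors usually call for retuned shuffle parallelism.
    .config("spark.sql.shuffle.partitions", "600")
    .getOrCreate()
)
```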

Achievements:

Successfully migrated the full content-moderation pipeline portfolio (20+ production pipelines, each processing 1–2 TB per run) to Kubernetes with full performance parity against legacy YARN. Reduced the per-job compute footprint by ~6x, from 600 to 100 instances (3 cores, 20 GB RAM each), eliminated ~30% of obsolete dependencies (reducing CVE exposure), and cut debugging time by ~40% through enhanced observability. Delivered ahead of compliance deadlines.

Technology stack:

Apache Spark · Kubernetes · Scala · Python · Docker · YARN · Jenkins CI/CD · Custom Logging & Tracing Frameworks

3. Delta Lake Migration & Auto-Scaling ETL Platform

Senior Data Engineer · 2020 – 2024

Project overview:

End-to-end ownership of the data-exchange (DX) ETL platform on Databricks for a Tier-1 US telecom and media operator, supporting large-scale ingestion, transformation, and analytics workloads. The project encompassed migrating storage to Delta Lake for ACID guarantees, building auto-scaling compute infrastructure for volatile workloads, and automating operational tooling to reduce manual ops overhead across the data engineering team.
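
Delta Lake's ACID guarantees come from converting existing Parquet tables in place and then reading and writing through the Delta format. A minimal sketch of the migration pattern on Databricks (paths and table names are hypothetical):

```python
# Sketch of the Parquet-to-Delta migration pattern; paths and partition columns
# are hypothetical. Assumes a Databricks cluster with Delta Lake available.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Convert an existing partitioned Parquet dataset to Delta in place, gaining
# ACID transactions without rewriting the data.
spark.sql("""
  CONVERT TO DELTA parquet.`s3://dx-data/events`
  PARTITIONED BY (event_date DATE)
""")

# Writers now get atomic, isolated commits...
df = spark.read.format("delta").load("s3://dx-data/events")
df.filter("event_date = '2023-01-01'").write.format("delta") \
    .mode("append").save("s3://dx-data/events_daily")

# ...and readers get time travel for reproducible backfills and debugging.
snapshot = spark.read.format("delta").option("versionAsOf", 3) \
    .load("s3://dx-data/events")
```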

Responsibilities:

  • Built and maintained 15+ production ETL pipelines on Spark/Databricks, processing large-scale daily ingestion and transformation workloads for downstream analytics consumers.
  • Led the migration of pipeline storage to Delta Lake, introducing ACID compliance and time-travel semantics that boosted query performance roughly 3x and eliminated entire classes of correctness bugs.
  • Designed auto-scaling infrastructure using AWS Lambda to dynamically provision Databricks resources based on workload signals, reducing compute spend by ~25% while sustaining throughput during peak loads (see the first sketch after this list).
  • Developed Databricks notebooks that automated provisioning of AWS CloudWatch dashboards and alarms across 10+ services, replacing manual configuration and accelerating incident response (see the second sketch after this list).
  • Enhanced Concourse CI jobs and reporting pipelines, improving release cadence and giving the team continuous visibility into data quality and pipeline health metrics.
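
The auto-scaling loop can be sketched as a Lambda handler that reads a workload signal and resizes a Databricks cluster through the Clusters API. All identifiers below are hypothetical and the step policy is deliberately simplified:

```python
# Hedged sketch of Lambda-driven Databricks auto-scaling; the backlog metric,
# cluster ID, and sizing thresholds are all illustrative assumptions.
import json
import os
import urllib.request

DATABRICKS_HOST = os.environ["DATABRICKS_HOST"]   # e.g. https://<workspace-url>
DATABRICKS_TOKEN = os.environ["DATABRICKS_TOKEN"]
CLUSTER_ID = os.environ["CLUSTER_ID"]

def resize_cluster(num_workers: int) -> None:
    """Call the Databricks Clusters API 2.0 'resize' endpoint."""
    body = json.dumps({"cluster_id": CLUSTER_ID, "num_workers": num_workers})
    req = urllib.request.Request(
        f"{DATABRICKS_HOST}/api/2.0/clusters/resize",
        data=body.encode(),
        headers={"Authorization": f"Bearer {DATABRICKS_TOKEN}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    urllib.request.urlopen(req)

def handler(event, context):
    # A workload signal (e.g. backlog depth delivered in the event) drives a
    # simple step policy; a real policy would add cooldowns and upper bounds.
    backlog = int(event.get("backlog_depth", 0))
    workers = 2 if backlog < 1_000 else 8 if backlog < 10_000 else 16
    resize_cluster(workers)
    return {"backlog": backlog, "num_workers": workers}
```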
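
Dashboard provisioning follows the same automate-everything pattern: render a standard dashboard body per service and push it through boto3. The service list and widget layout below are illustrative:

```python
# Sketch of automated CloudWatch dashboard and alarm provisioning; the services
# list, metric namespace, and widget layout are hypothetical.
import json
import boto3

cloudwatch = boto3.client("cloudwatch")
SERVICES = ["ingest", "transform", "publish"]  # illustrative; real list had 10+

def dashboard_body(service: str) -> str:
    """One standard error widget per service, rendered from a template."""
    return json.dumps({
        "widgets": [{
            "type": "metric",
            "x": 0, "y": 0, "width": 12, "height": 6,
            "properties": {
                "title": f"{service} errors",
                "metrics": [["DX/Pipelines", "Errors", "Service", service]],
                "stat": "Sum", "period": 300, "region": "us-east-1",
            },
        }],
    })

for service in SERVICES:
    cloudwatch.put_dashboard(
        DashboardName=f"dx-{service}",
        DashboardBody=dashboard_body(service),
    )
    # A matching alarm replaces manual console setup for each service.
    cloudwatch.put_metric_alarm(
        AlarmName=f"dx-{service}-errors",
        Namespace="DX/Pipelines", MetricName="Errors",
        Dimensions=[{"Name": "Service", "Value": service}],
        Statistic="Sum", Period=300, EvaluationPeriods=1,
        Threshold=0, ComparisonOperator="GreaterThanThreshold",
    )
```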

Achievements:

Improved query performance by approximately 3x via the Delta Lake migration, cut compute costs by ~25% through Lambda-driven auto-scaling, and eliminated significant manual overhead by automating CloudWatch dashboard and alarm provisioning across 10+ services.

Technology stack:

Apache Spark · Scala · Databricks · Delta Lake · Apache Kafka · AWS Kinesis · AWS Lambda · AWS CloudWatch · Concourse CI · Python