
Yaroslav K

Big Data Engineer

Big Data Engineer with 8+ years of experience designing and building cloud-native data platforms and streaming pipelines on AWS and Azure Databricks. Proficient in Spark, Python, Scala, Java, and infrastructure-as-code. Proven ability to deliver scalable, reliable, and cost-efficient solutions for global retail, telecom, and technology clients.

Experience

8+ years

Timezone

CET (UTC +1)

Skills

AI / ML

SageMaker Unified Studio

Languages

Scala

Databases

HDFS, Pandas, Iceberg, Kudu, Oracle, Redshift, Athena

Infrastructure

AWS, Terraform, Jenkins CI, Azure DevOps, YARN, Azure Databricks

Frameworks

Airflow, Spark, Hadoop

Integrations & Protocols

EventBridge, Step Functions

1. Telecom BI Platform Migration to Hadoop

Big Data Engineer · 2019–2020

Project overview:

The project involved migrating a legacy Oracle-based BI platform to a unified Hadoop-based solution for a major telecom company. The system supported ingestion and processing of TAP/RAP files containing telecom charging and tax data, while maintaining compatibility with existing Oracle ETL pipelines. The key challenge was improving scalability and processing efficiency while ensuring a smooth transition to a distributed data processing architecture.
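
The production ingestion was implemented in Scala on Spark (see the technology stack below); the following PySpark sketch only illustrates the general shape of such a batch step, assuming TAP/RAP records have already been decoded from their binary format into delimited text by an upstream step. Paths and field names are hypothetical.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("tap-rap-ingest").getOrCreate()

# Read TAP/RAP records that an upstream decoder has flattened into
# pipe-delimited text (hypothetical landing path and layout).
raw = (spark.read
       .option("sep", "|")
       .option("header", "true")
       .csv("hdfs:///landing/tap_rap/"))

charges = (raw
           .filter(F.col("record_type").isin("TAP", "RAP"))
           .withColumn("charge_amount", F.col("charge_amount").cast("decimal(18,4)"))
           .withColumn("event_ts", F.to_timestamp("event_ts")))

# Partition the curated layout by event date so both the Hadoop reporting
# jobs and the still-running Oracle ETL exports can consume it consistently.
(charges
 .withColumn("event_date", F.to_date("event_ts"))
 .write.mode("overwrite")
 .partitionBy("event_date")
 .parquet("hdfs:///curated/charging_events/"))
```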

Responsibilities:

  • Contributed to replacing legacy Oracle BI processing with a unified Hadoop-based solution built on Apache Spark.
  • Implemented ingestion pipelines for TAP/RAP telecom charging and tax files using Spark batch processing.
  • Ensured interoperability with existing Oracle ETL pipelines to support a smooth migration path.
  • Delivered end-to-end data processing pipelines for business-critical telecom reporting.
  • Optimized pipeline performance to improve processing efficiency and scalability.

Achievements:

Contributed to the replacement of a legacy Oracle BI platform with a Hadoop-based solution built around Apache Spark. Delivered end-to-end batch processing pipelines for telecom data and helped improve the scalability and performance of business-critical reporting workflows.

Technology stack:

Scala, Hadoop, HDFS, YARN, Kudu, Spark, Oracle

2. Enterprise Retail Data Platform

Big Data Engineer · 2021–2025

Project overview:

The project involved building and maintaining an enterprise-scale data platform for a global apparel and footwear company. The platform processed shopping and transactional data to deliver curated datasets for analytics, reporting, and business decision-making. It combined Spark-based batch processing, lightweight Lambda workflows, Redshift analytical transformations, and unified orchestration. In later phases, the platform was migrated from AWS-based pipelines to Azure Databricks as part of the company’s cloud modernization strategy.
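
As a rough illustration of the Glue-based batch layer described above, here is a hedged PySpark sketch of such a job, including two of the tuning levers (shuffle parallelism and broadcast joins) that typically drive the kind of runtime reduction noted below. Bucket names, schemas, and configuration values are assumptions, not the client's actual setup.

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.sql import functions as F

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Two common levers behind large Glue runtime wins: right-sizing shuffle
# parallelism for the data volume, and broadcasting small dimension tables
# so large fact tables avoid a shuffle join. Values here are illustrative.
spark.conf.set("spark.sql.shuffle.partitions", "400")

orders = spark.read.parquet("s3://example-raw/shopping_orders/")  # hypothetical
stores = spark.read.parquet("s3://example-raw/store_dim/")        # hypothetical

daily_revenue = (orders
                 .join(F.broadcast(stores), "store_id")
                 .groupBy("store_id", F.to_date("order_ts").alias("order_date"))
                 .agg(F.sum("amount").alias("daily_revenue")))

# Coalesce to keep the curated zone free of many small files.
(daily_revenue
 .coalesce(64)
 .write.mode("overwrite")
 .parquet("s3://example-curated/daily_revenue/"))

job.commit()
```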

Responsibilities:

  • Designed and maintained production pipelines for shopping and transactional data, ensuring timely delivery of curated datasets within strict SLA requirements.
  • Optimized AWS Glue Spark jobs, reducing execution time by 40% through Spark tuning, transformation refactoring, and improved resource usage.
  • Designed and implemented the migration architecture for moving data processing workloads from AWS-based pipelines to Azure Databricks.
  • Built and optimized PySpark jobs on AWS Glue and Databricks, improving pipeline throughput, reliability, and maintainability.
  • Developed lightweight Pandas-based AWS Lambda jobs for small-scale processing workflows and operational automation (a minimal sketch follows this list).
  • Created reusable PySpark and Pandas-based internal libraries to standardize recurring data engineering patterns.
  • Implemented analytical transformations and stored procedures in Amazon Redshift to support reporting and downstream business use cases.
  • Orchestrated workflows using AWS Step Functions and Databricks Workflows, and automated infrastructure provisioning with Terraform, Jenkins, and Azure DevOps.
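
A minimal sketch of the kind of Pandas-based Lambda mentioned above, assuming a simple event shape, hypothetical bucket/key names, and a bundled pyarrow dependency for Parquet output:

```python
import io

import boto3
import pandas as pd

s3 = boto3.client("s3")

def handler(event, context):
    # Assumed event shape: {"bucket": "...", "key": "path/to/file.csv"}
    bucket, key = event["bucket"], event["key"]

    obj = s3.get_object(Bucket=bucket, Key=key)
    df = pd.read_csv(io.BytesIO(obj["Body"].read()))

    # Lightweight cleanup suitable for in-memory processing: normalize
    # column names and drop exact duplicate rows.
    df.columns = [c.strip().lower() for c in df.columns]
    df = df.drop_duplicates()

    # Requires pyarrow to be bundled with the function (layer or package).
    out = io.BytesIO()
    df.to_parquet(out, index=False)
    s3.put_object(Bucket=bucket,
                  Key=key.rsplit(".", 1)[0] + ".parquet",
                  Body=out.getvalue())
    return {"rows": len(df)}
```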

Achievements:

Improved AWS Glue job execution time by 40% through Spark performance optimization, transformation refactoring, and better resource utilization. Designed and implemented the target architecture for migrating data processing workloads from AWS to Azure Databricks. Delivered reliable shopping data pipelines within strict SLAs and created reusable PySpark and Pandas-based libraries that reduced repetitive development effort. Helped improve reporting consistency by exposing trusted curated datasets through Amazon Redshift.

Technology stack:

Python, AWS, Redshift, Step Functions, Lambda, Spark, Pandas, Azure Databricks, Azure DevOps, Terraform, Jenkins CI

3. Identity Verification Data Platform Modernization

Senior Big Data Engineer · 2025–2026

Project overview:

The project involved modernizing a large-scale data processing platform used for identity validation, fraud detection, and analytical reporting. The system ingested data from external service providers and transformed it into reliable metrics for BI dashboards. A key part of the initiative was migrating the platform from Delta Lake to Apache Iceberg while preserving performance, stability, and cost efficiency. To reduce migration risk, a temporary dual-stack architecture was introduced, allowing Delta and Iceberg pipelines to run in parallel during the transition.
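
The dual-writer itself was Scala-based (see the responsibilities below); the following PySpark sketch shows the same dual-write pattern under assumed path and table names, with each batch written to both the legacy Delta location and the new Iceberg table so the two stacks stay comparable during the transition window.

```python
from pyspark.sql import DataFrame

def dual_write(df: DataFrame, delta_path: str, iceberg_table: str) -> None:
    # Write each batch to the legacy Delta location...
    df.write.format("delta").mode("append").save(delta_path)
    # ...and to the new Iceberg table (DataFrameWriterV2, Spark 3+),
    # so both sides can be validated against each other during migration.
    df.writeTo(iceberg_table).append()

# Hypothetical usage during the transition window:
# dual_write(metrics_df,
#            "s3://example-lake/delta/identity_metrics",
#            "lake.analytics.identity_metrics")
```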

Responsibilities:

  • Designed and implemented PySpark-based pipelines for generating aggregated identity verification metrics for BI dashboards.
  • Built initial prototype pipelines using Athena and EventBridge before migrating the logic to production-grade Spark jobs.
  • Implemented a Scala-based dual-writer architecture to support parallel Delta Lake and Apache Iceberg writes during the migration phase.
  • Led the Delta Lake to Apache Iceberg migration, ensuring schema compatibility, partition strategy alignment, and metadata consistency.
  • Tuned Spark configurations and execution logic to improve Iceberg pipeline efficiency.
  • Integrated pipelines into Airflow DAGs and managed infrastructure changes using Terraform (a minimal DAG sketch follows this list).
  • Worked across mixed Python and Scala codebases, including maintenance and extension of legacy Scala modules.
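
A minimal Airflow DAG sketch of the orchestration mentioned above (Airflow 2.x syntax; the DAG id, schedule, and task commands are illustrative assumptions):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="identity_metrics_daily",   # hypothetical DAG id
    start_date=datetime(2025, 1, 1),
    schedule="@daily",                 # Airflow 2.4+ argument name
    catchup=False,
) as dag:
    run_metrics = BashOperator(
        task_id="run_metrics_job",
        bash_command=("spark-submit --deploy-mode cluster "
                      "s3://example-jobs/identity_metrics.py"),  # hypothetical
    )
    publish = BashOperator(
        task_id="publish_to_bi",
        bash_command="echo 'trigger BI refresh'",  # placeholder step
    )
    run_metrics >> publish
```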

Achievements:

Led the migration from Delta Lake to Apache Iceberg while maintaining schema compatibility, partitioning consistency, and metadata reliability. Improved effective cluster resource utilization by 60% while keeping the solution cost-efficient. Tuned Spark execution and Iceberg configurations to achieve comparable or better performance than the legacy implementation.
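
For illustration only, a hedged example of the kind of Spark session configuration used for Iceberg pipelines like these; the catalog name, warehouse path, and tuning values are assumptions rather than the project's actual settings:

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("iceberg-metrics")
         # Enable Iceberg SQL extensions and register a catalog named "lake"
         # (a Hadoop-style catalog over S3 here, purely for illustration).
         .config("spark.sql.extensions",
                 "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
         .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
         .config("spark.sql.catalog.lake.type", "hadoop")
         .config("spark.sql.catalog.lake.warehouse", "s3://example-lake/warehouse")
         # Illustrative tuning: parallelism sized to the workload; fewer,
         # larger output files generally scan better in Iceberg.
         .config("spark.sql.shuffle.partitions", "200")
         .getOrCreate())

# File sizing is typically controlled per table via an Iceberg property, e.g.:
# ALTER TABLE lake.analytics.identity_metrics
#   SET TBLPROPERTIES ('write.target-file-size-bytes'='536870912')
```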

Technology stack:

Python, Scala, AWS, EventBridge, SageMaker, S3, SageMaker Unified Studio, Airflow, Spark, Iceberg, Azure Databricks, Terraform, Azure DevOps