SoftBlues

Enterprise Retail Data Platform

Big Data Engineer, 2021–2025

Yaroslav K

Big Data Engineer (Data Engineer & Big Data)

Experience

8+ years

Timezone

CET (UTC +1)

Skills

AI / ML

SageMaker Unified Studio

Languages

Scala

Databases

HDFS, Pandas, Iceberg, Kudu, Oracle, Redshift, Athena

Infrastructure

AWS, Terraform, Jenkins CI, Azure DevOps, YARN, Azure Databricks

Frameworks

Airflow, Spark, Hadoop

Integrations & Protocols

EventBridge, Step Functions

Overview

The project involved building and maintaining an enterprise-scale data platform for a global apparel and footwear company. The platform processed shopping and transactional data to deliver curated datasets for analytics, reporting, and business decision-making. It combined Spark-based batch processing, lightweight Lambda workflows, Redshift analytical transformations, and unified orchestration. In later phases, the platform was migrated from AWS-based pipelines to Azure Databricks as part of the company’s cloud modernization strategy.
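The curation flow described above can be sketched at small scale. The following is a minimal, illustrative Pandas version of a curation step (the production pipelines ran Spark on AWS Glue and Databricks); the schema and column names are hypothetical, not taken from the project.

```python
import pandas as pd

# Hypothetical raw shopping transactions. In production these would come
# from the landing zone read by Spark batch jobs; the schema is illustrative.
raw = pd.DataFrame(
    {
        "store_id": ["S1", "S1", "S2"],
        "sale_date": ["2024-03-01", "2024-03-01", "2024-03-01"],
        "amount": [19.99, 5.00, 42.50],
    }
)

def curate_daily_sales(df: pd.DataFrame) -> pd.DataFrame:
    """Aggregate raw transactions into a curated daily-sales dataset."""
    return (
        df.assign(sale_date=pd.to_datetime(df["sale_date"]))
        .groupby(["store_id", "sale_date"], as_index=False)
        .agg(total_amount=("amount", "sum"), txn_count=("amount", "count"))
    )

curated = curate_daily_sales(raw)
print(curated)
```

In PySpark the equivalent shape is a `groupBy(...).agg(...)` over the raw table, with the curated result written out for Redshift and reporting consumers.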

Achievements

  • Improved AWS Glue job execution time by 40% through Spark performance optimization, transformation refactoring, and better resource utilization.
  • Designed and implemented the target architecture for migrating data processing workloads from AWS to Azure Databricks.
  • Delivered reliable shopping data pipelines within strict SLAs and created reusable PySpark and Pandas-based libraries that reduced repetitive development effort.
  • Helped improve reporting consistency by exposing trusted curated datasets through Amazon Redshift.
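One refactoring of the kind behind this class of Glue speed-up is reordering aggregation and join so the join touches far fewer rows. The sketch below shows the idea in Pandas with hypothetical data; in Spark the same reordering shrinks the shuffle that usually dominates job time. This illustrates the general technique, not the project's actual code.

```python
import pandas as pd

# Illustrative line-item fact data and a small store dimension (hypothetical).
line_items = pd.DataFrame(
    {"store_id": ["S1", "S1", "S2", "S2", "S2"],
     "amount": [10.0, 20.0, 5.0, 5.0, 15.0]}
)
stores = pd.DataFrame({"store_id": ["S1", "S2"], "region": ["EU", "US"]})

# Naive shape: join every line item to the dimension, then aggregate.
naive = (
    line_items.merge(stores, on="store_id")
    .groupby("region", as_index=False)["amount"].sum()
)

# Refactored shape: aggregate first, then join the much smaller result.
# In Spark this ordering moves far less data through the shuffle.
refactored = (
    line_items.groupby("store_id", as_index=False)["amount"].sum()
    .merge(stores, on="store_id")
    .groupby("region", as_index=False)["amount"].sum()
)
print(refactored)
```

Both shapes produce the same totals; the refactored one joins two rows instead of five, and the gap grows with data volume.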

Responsibilities

  • Designed and maintained production pipelines for shopping and transactional data, ensuring timely delivery of curated datasets within strict SLA requirements.
  • Optimized AWS Glue Spark jobs, reducing execution time by 40% through Spark tuning, transformation refactoring, and improved resource usage.
  • Designed and implemented the migration architecture for moving data processing workloads from AWS-based pipelines to Azure Databricks.
  • Built and optimized PySpark jobs on AWS Glue and Databricks, improving pipeline throughput, reliability, and maintainability.
  • Developed lightweight Pandas-based AWS Lambda jobs for small-scale processing workflows and operational automation.
  • Created reusable PySpark and Pandas-based internal libraries to standardize recurring data engineering patterns.
  • Implemented analytical transformations and stored procedures in Amazon Redshift to support reporting and downstream business use cases.
  • Orchestrated workflows using AWS Step Functions and Databricks Workflows, and automated infrastructure provisioning with Terraform, Jenkins, and Azure DevOps.
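A Pandas-based Lambda job of the kind listed above can be sketched as follows. The handler is self-contained and hypothetical: a real handler would read its input from S3 via boto3 and write results back, and all field names here are invented for illustration.

```python
import pandas as pd

def lambda_handler(event, context=None):
    """Small-scale processing sketch in the style of a Pandas-based Lambda.

    Hypothetical: input records arrive in the event payload so the sketch
    stays self-contained; a production handler would pull them from S3.
    """
    df = pd.DataFrame(event["records"])
    df = df[df["status"] == "completed"]                    # drop incomplete orders
    totals = df.groupby("currency")["amount"].sum().to_dict()
    return {
        "rows_in": len(event["records"]),
        "rows_kept": len(df),
        "totals": totals,
    }

# Example invocation with an inline payload.
result = lambda_handler({
    "records": [
        {"order_id": 1, "status": "completed", "currency": "EUR", "amount": 30.0},
        {"order_id": 2, "status": "cancelled", "currency": "EUR", "amount": 12.0},
        {"order_id": 3, "status": "completed", "currency": "USD", "amount": 8.5},
    ]
})
print(result)
```

Keeping the handler a pure function of its event makes it easy to unit-test and to chain from Step Functions or EventBridge.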

Technologies Used

Python, AWS, Redshift, Step Functions, Lambda, Spark, Pandas, Azure Databricks, Azure DevOps, Terraform, Jenkins CI

This project was delivered by

Yaroslav K

