Ihor M.
Data Engineering Team Lead
Lead Data Engineer with 11+ years of experience building data-intensive systems. Expert in Python and cloud architecture, specializing in scaling global platforms to hundreds of millions of records. Proven track record of leveraging LLMs and automation to accelerate delivery and of reinforcing robust data foundations under high engineering standards.
Key Expertise
Experience
11+ years
Timezone
CET (UTC +1)
Skills
AI / ML
Languages
Databases
Infrastructure
Frameworks
Integrations & Protocols
1. Large-Scale Data Platform for AI-Driven Recruiting
Project overview:
WorkHQ is an AI-powered recruiting platform designed to help companies source, contact, and manage talent at scale. The project centred on architecting and scaling a production-grade data platform capable of ingesting, normalising, and serving nearly 1 billion candidate profiles sourced from multiple global data providers. The core engineering challenge was transforming an unstable, custom legacy infrastructure (AWS S3 + Airflow) into a reliable, high-throughput Lakehouse architecture capable of supporting real-time semantic search and AI-powered candidate matching across 7 global regions.
Responsibilities:
- Led end-to-end technical strategy and architecture for the data platform, managing a team of two Data Engineers and driving key decisions across ingestion, transformation, and serving layers.
- Architected a Lakehouse-centric data delivery framework using Databricks and Delta Lake with Medallion Architecture, replacing legacy 2 TB S3-to-PostgreSQL pipelines.
- Designed a taxonomy mapping and normalisation system using vector embeddings and cosine similarity, with LLMs (OpenAI API) as an intelligent fallback for full alignment with global job-title standards.
- Implemented multi-region infrastructure to support daily ingestion across 7 global regions, scaling platform capacity 2.3x to 700M+ profiles.
- Directed the development of custom ML models for data extraction and Lightcast Occupation Taxonomy (LOT) enrichment, boosting taxonomy coverage from 60% to 99%.
- Optimised synchronisation across OpenSearch and PostgreSQL to support high-throughput semantic search and Alembic-managed schema evolution.
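The taxonomy mapping described above can be sketched in miniature: embed each incoming job title, compare it to pre-embedded taxonomy entries by cosine similarity, and fall back to an LLM when no match clears a confidence threshold. The vectors, taxonomy entries, and threshold below are illustrative stand-ins, not the production values.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy embeddings standing in for real model outputs (hypothetical values).
TAXONOMY = {
    "Software Engineer": [0.9, 0.1, 0.0],
    "Data Engineer":     [0.2, 0.9, 0.1],
}

def map_title(title_vec, threshold=0.85):
    """Return the best taxonomy match, or None to signal LLM fallback."""
    best, score = max(
        ((name, cosine(title_vec, vec)) for name, vec in TAXONOMY.items()),
        key=lambda pair: pair[1],
    )
    return best if score >= threshold else None  # None -> route to LLM
```

In the real pipeline the `None` branch would call the OpenAI API with the unmatched title and candidate taxonomy entries, so the expensive LLM path is reserved for the minority of titles that vector similarity cannot resolve.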
Achievements:
Orchestrated a full architectural migration from Airflow + AWS S3 to Databricks and Delta Lake within 6 months, reducing pipeline execution time by 80% (from 24+ hours to 5 hours) and transitioning from weekly to daily incremental processing. Expanded global profile coverage 2.3x to 700M+ records while maintaining cost efficiency. Increased Lightcast Occupation Taxonomy (LOT) enrichment coverage from 60% to 99% through custom ML model development, improving data richness 4x across all candidate work experiences.
Technology stack:
2. AI Recruiter - Intelligent Candidate Matching & Automated Recruitment Pipeline
Project overview:
As a sub-project within the WorkHQ platform, the AI Recruiter was built to automate and intelligently augment the end-to-end recruitment workflow for HR teams. The system allows recruiters to search for the best-matched candidates either through a conversational chat interface or by uploading a job vacancy file. It combines semantic vector search across hundreds of millions of profiles with OpenAI Reasoning models to surface and explain the most relevant candidates. Beyond matching, the system automates the full downstream recruitment funnel - from application emails and online assessment links to interview slot scheduling and outcome notifications - all orchestrated via serverless AWS infrastructure.
Responsibilities:
- Designed and built the backend architecture for the AI Recruiter, including REST API design using AWS Lambda, AWS SAM, and Powertools for AWS Lambda, whose event-handler routing provides a FastAPI-style developer experience in a serverless context.
- Implemented semantic candidate search using vector databases and OpenAI Reasoning models to surface the most relevant profiles per vacancy, with transparent result explanations for HR users.
- Engineered the automated recruitment funnel using AWS Step Functions and Lambda: application email dispatch, online test delivery, interview slot selection, and outcome notification.
- Integrated external email services and communication workflows as part of the end-to-end automation pipeline.
- Managed infrastructure-as-code using CloudFormation and CI/CD pipelines via GitHub Actions, ensuring reliable, repeatable deployments across environments.
- Built the vacancy parsing layer to extract structured parameters (location, title, required skills) from job description files to drive vector search queries and LLM prompts.
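The vacancy parsing layer above can be illustrated with a deliberately naive sketch: extract a title, a location, and known skills from a job description to seed vector-search queries and LLM prompts. The field markers and skill dictionary here are hypothetical; a production parser would likely rely on an LLM or trained extractor rather than regexes.

```python
import re

KNOWN_SKILLS = {"python", "sql", "aws", "spark"}  # hypothetical skill dictionary

def parse_vacancy(text: str) -> dict:
    """Naive sketch: pull title, location, and skills from a job description."""
    title = re.search(r"(?im)^title:\s*(.+)$", text)
    location = re.search(r"(?im)^location:\s*(.+)$", text)
    # Tokenise and intersect with the known-skill dictionary.
    words = {w.lower() for w in re.findall(r"[A-Za-z+#]+", text)}
    return {
        "title": title.group(1).strip() if title else None,
        "location": location.group(1).strip() if location else None,
        "skills": sorted(KNOWN_SKILLS & words),
    }
```

The structured output maps directly onto the downstream steps described above: the title and skills drive the semantic vector search, and the full record is injected into the reasoning-model prompt that explains each match.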
Achievements:
Significantly reduced manual recruiter effort by automating the complete candidate pipeline from discovery through to interview scheduling. Delivered explainable AI matching results, increasing HR-team confidence in AI-assisted hiring decisions. Eliminated traditional server management by building the entire backend on a serverless-first architecture with AWS Lambda and SAM, reducing infrastructure overhead and deployment complexity.
Technology stack: