Large-Scale Data Platform for AI-Driven Recruiting

Data Engineering Team Lead | 2025 – Present | Ihor M.

Data Engineering Team Lead

Data Engineer & Big Data

Key Expertise

Lakehouse Architecture Design, Data Team Leadership, AI/LLM Integration, Semantic Vector Search, Scalable Data Pipelines, ML Data Enrichment, Cross-Platform Framework Design

Experience

11+ years

Timezone

CET (UTC +1)

Skills

AI / ML

Vector Embeddings, Cosine Similarity, LLMs, OpenAI API

Languages

Python

Databases

Databricks, Vector DB, OpenSearch, PostgreSQL, Delta Lake

Infrastructure

Medallion Architecture, AWS SAM, API Gateway, AWS, GitHub Actions, CloudFormation, Microservices Architecture

Frameworks

AIOHTTP, AWS Lambda Powertools, asyncio, PySpark, Apache Airflow

Integrations & Protocols

WhatsApp Business API, Facebook Messenger API, REST API, WebSockets

Overview

WorkHQ is an AI-powered recruiting platform designed to help companies source, contact, and manage talent at scale. The project centred on architecting and scaling a production-grade data platform capable of ingesting, normalising, and serving nearly 1 billion candidate profiles sourced from multiple global data providers. The core engineering challenge was transforming an unstable, custom legacy infrastructure (AWS S3 + Airflow) into a reliable, high-throughput Lakehouse architecture capable of supporting real-time semantic search and AI-powered candidate matching across 7 global regions.

Achievements

Orchestrated a full architectural migration from Airflow + AWS S3 to Databricks and Delta Lake within 6 months, reducing pipeline execution time by 80% (from 24+ hours to 5 hours) and transitioning from weekly to daily incremental processing.
Expanded global profile coverage 2.3x to 700M+ records while maintaining cost efficiency.
Increased Lightcast Occupation Taxonomy (LOT) enrichment coverage from 60% to 99% through custom ML model development, improving data richness 4x across all candidate work experiences.

Responsibilities

Led end-to-end technical strategy and architecture for the data platform, managing a team of two Data Engineers and driving key decisions across ingestion, transformation, and serving layers.
Architected a Lakehouse-centric data delivery framework using Databricks and Delta Lake with Medallion Architecture, replacing legacy 2 TB S3-to-PostgreSQL pipelines.
Designed a taxonomy mapping and normalisation system using vector embeddings and cosine similarity, with LLMs (OpenAI API) as an intelligent fallback for full alignment with global job-title standards.
Implemented multi-region infrastructure to support daily ingestion across 7 global regions, scaling platform capacity 2.3x to 700M+ profiles.
Directed the development of custom ML models for data extraction and LOT enrichment, boosting taxonomy coverage from 60% to 99%.
Optimised synchronisation across OpenSearch and PostgreSQL to support high-throughput semantic search and Alembic-managed schema evolution.
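
The taxonomy mapping item above (vector embeddings plus cosine similarity, with an LLM as a fallback) can be illustrated with a minimal sketch. This is not the project's actual implementation: the function names, the 0.85 threshold, the toy three-dimensional vectors, and the `llm_fallback` callable are all assumptions made for illustration. In practice the vectors would come from an embedding model and the fallback would call an LLM API.

```python
import math
from typing import Callable, Dict, Optional, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def map_title(
    title_vec: Vector,
    taxonomy: Dict[str, Vector],
    threshold: float = 0.85,  # hypothetical cut-off, not taken from the project
    llm_fallback: Optional[Callable[[], str]] = None,
) -> Tuple[Optional[str], str]:
    """Return (label, source) for the closest taxonomy entry.

    If the best cosine score is below the threshold, defer to the
    (hypothetical) LLM fallback; if none is supplied, report "unmapped".
    """
    best_label: Optional[str] = None
    best_score = -1.0
    for label, vec in taxonomy.items():
        score = cosine_similarity(title_vec, vec)
        if score > best_score:
            best_label, best_score = label, score

    if best_label is not None and best_score >= threshold:
        return best_label, "embedding"
    if llm_fallback is not None:
        return llm_fallback(), "llm_fallback"
    return None, "unmapped"


# Toy embeddings standing in for real model output (illustrative only).
TAXONOMY = {
    "Software Developer": [1.0, 0.0, 0.0],
    "Data Engineer": [0.0, 1.0, 0.0],
}

confident = map_title([0.9, 0.1, 0.0], TAXONOMY)
ambiguous = map_title(
    [0.7, 0.7, 0.1], TAXONOMY, llm_fallback=lambda: "Data Engineer"
)
```

Here `confident` resolves through the embedding match alone, while `ambiguous` scores about 0.70 against both entries and is routed to the fallback. Keeping the threshold explicit makes the trade-off between cheap vector matches and costlier LLM calls tunable.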

Technologies Used

Databricks, Delta Lake, PySpark, Medallion Architecture, AWS S3, Apache Airflow, PostgreSQL, OpenSearch, Vector Embeddings, Cosine Similarity, OpenAI API, LLMs

This project was delivered by

Ihor M.

More Projects by Ihor M.

2025

AI Recruiter - Intelligent Candidate Matching & Automated Recruitment Pipeline

Senior Data / Backend Engineer

As a sub-project within the WorkHQ platform, the AI Recruiter was built to automate and augment the end-to-end recruitment workflow for HR teams. Recruiters can search for the best-matched candidates either through a conversational chat interface or by uploading a job vacancy file. The system combines semantic vector search across hundreds of millions of profiles with OpenAI reasoning models to surface and explain the most relevant candidates. Beyond matching, it automates the full downstream recruitment funnel, from application emails and online assessment links to interview slot scheduling and outcome notifications, all orchestrated via serverless AWS infrastructure.

Python, Lambda, AWS SAM, AWS Lambda Powertools, Step Functions (+7 more)
