SoftBlues

Large-Scale Data Platform for AI-Driven Recruiting

Data Engineering Team Lead | 2025 – Present | Ihor M.
Ihor M.

Data Engineering Team Lead

Data Engineer & Big Data

Key Expertise

Lakehouse Architecture Design, Data Team Leadership, AI/LLM Integration, Semantic Vector Search, Scalable Data Pipelines, ML Data Enrichment, Cross-Platform Framework Design

Experience

11+ years

Timezone

CET (UTC +1)

Skills

AI / ML

Vector Embeddings, Cosine Similarity, LLMs, OpenAI API

Languages

Python

Databases

Databricks, Vector DB, OpenSearch, PostgreSQL, Delta Lake

Infrastructure

Medallion Architecture, AWS SAM, API Gateway, AWS, GitHub Actions, CloudFormation, Microservices Architecture

Frameworks

AIOHTTP, AWS Lambda Powertools, asyncio, PySpark, Apache Airflow

Integrations & Protocols

WhatsApp Business API, Facebook Messenger API, REST API, WebSockets

Overview

WorkHQ is an AI-powered recruiting platform designed to help companies source, contact, and manage talent at scale. The project centred on architecting and scaling a production-grade data platform capable of ingesting, normalising, and serving nearly 1 billion candidate profiles sourced from multiple global data providers. The core engineering challenge was transforming an unstable, custom legacy infrastructure (AWS S3 + Airflow) into a reliable, high-throughput Lakehouse architecture capable of supporting real-time semantic search and AI-powered candidate matching across 7 global regions.

Achievements

Orchestrated a full architectural migration from Airflow + AWS S3 to Databricks and Delta Lake within 6 months, reducing pipeline execution time by 80% (from 24+ hours to 5 hours) and transitioning from weekly to daily incremental processing. Expanded global profile coverage 2.3x to 700M+ records while maintaining cost efficiency. Increased Lightcast Occupation Taxonomy (LOT) enrichment coverage from 60% to 99% through custom ML model development, improving data richness 4x across all candidate work experiences.
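The move from weekly full runs to daily incremental processing can be illustrated with a watermark-based upsert. The page does not describe the actual mechanism, so the following is only a minimal sketch under stated assumptions: the `Profile` fields, the `incremental_upsert` helper, and the in-memory `target` dictionary are all hypothetical stand-ins. On Delta Lake, the same idea is commonly expressed with a MERGE operation rather than Python dictionaries.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Profile:
    """Hypothetical candidate record; field names are assumptions."""
    profile_id: str
    title: str
    updated_at: datetime


def incremental_upsert(
    target: Dict[str, Profile],
    source: Iterable[Profile],
    watermark: datetime,
) -> Tuple[int, datetime]:
    """Upsert only rows changed after `watermark` into `target`.

    Returns the number of rows written and the new watermark, which is
    the latest `updated_at` seen among the processed rows.
    """
    processed = 0
    new_watermark = watermark
    for row in source:
        if row.updated_at <= watermark:
            continue  # unchanged since the previous run, skip it
        target[row.profile_id] = row  # insert or overwrite by key
        processed += 1
        if row.updated_at > new_watermark:
            new_watermark = row.updated_at
    return processed, new_watermark


# Toy data: one existing profile, then a daily batch with one update,
# one new profile, and one stale record.
day1, day2, day3 = (datetime(2025, 1, d) for d in (1, 2, 3))
target = {"p1": Profile("p1", "Engineer", day1)}
batch = [
    Profile("p1", "Senior Engineer", day3),  # updated
    Profile("p2", "Recruiter", day2),        # new
    Profile("p3", "Analyst", day1),          # not newer than watermark
]
processed, watermark = incremental_upsert(target, batch, day1)
```

Because each run touches only rows newer than the stored watermark, the cost of a run scales with the daily change volume rather than with the full dataset.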

Responsibilities

  • Led end-to-end technical strategy and architecture for the data platform, managing a team of two Data Engineers and driving key decisions across ingestion, transformation, and serving layers.
  • Architected a Lakehouse-centric data delivery framework using Databricks and Delta Lake with Medallion Architecture, replacing legacy 2 TB S3-to-PostgreSQL pipelines.
  • Designed a taxonomy mapping and normalisation system using vector embeddings and cosine similarity, with LLMs (OpenAI API) as an intelligent fallback for full alignment with global job-title standards.
  • Implemented multi-region infrastructure to support daily ingestion across 7 global regions, scaling platform capacity 2.3x to 700M+ profiles.
  • Directed the development of custom ML models for data extraction and LOT enrichment, boosting taxonomy coverage from 60% to 99%.
  • Optimised synchronisation across OpenSearch and PostgreSQL to support high-throughput semantic search and Alembic-managed schema evolution.
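
The taxonomy mapping described above (embedding similarity first, LLM as fallback) can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the project's implementation: the toy three-dimensional vectors, the `TAXONOMY` entries, the `0.85` threshold, and the `map_title` helper are all hypothetical, and the fallback is passed in as a plain callable so that a real LLM call (for example via the OpenAI API) could be plugged in without being invented here.

```python
import math
from typing import Callable, Dict, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical taxonomy: standard job title -> precomputed embedding.
TAXONOMY: Dict[str, Sequence[float]] = {
    "Software Developer": [1.0, 0.0, 0.0],
    "Data Engineer": [0.0, 1.0, 0.0],
}


def map_title(
    raw_title: str,
    title_vec: Sequence[float],
    taxonomy: Dict[str, Sequence[float]],
    threshold: float = 0.85,  # assumed cut-off, would need tuning on real data
    fallback: Optional[Callable[[str], str]] = None,
) -> Tuple[Optional[str], float, str]:
    """Return (label, best similarity, source of the decision)."""
    best_label: Optional[str] = None
    best_score = -1.0
    for label, vec in taxonomy.items():
        score = cosine_similarity(title_vec, vec)
        if score > best_score:
            best_label, best_score = label, score
    if best_score >= threshold:
        return best_label, best_score, "embedding"
    if fallback is not None:
        # Low-confidence match: defer to the fallback (an LLM in the source).
        return fallback(raw_title), best_score, "fallback"
    return None, best_score, "unmatched"


# A close vector resolves through embeddings alone.
confident = map_title("Sr. Software Dev", [0.9, 0.1, 0.0], TAXONOMY)

# An ambiguous vector falls below the threshold and goes to the fallback,
# stubbed here with a fixed answer in place of a real LLM call.
ambiguous = map_title(
    "Growth Ninja", [0.6, 0.6, 0.5], TAXONOMY,
    fallback=lambda title: "Data Engineer",
)
```

Routing only low-similarity titles to the fallback keeps the expensive LLM path off the common case, which matters at the hundreds-of-millions scale the project describes.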

Technologies Used

Databricks, Delta Lake, PySpark, Medallion Architecture, AWS S3, Apache Airflow, PostgreSQL, OpenSearch, Vector Embeddings, Cosine Similarity, OpenAI API, LLMs

This project was delivered by

Ihor M.
