SoftBlues

Email Archive Processing & AI Search Pipeline

Cloud / AI Engineer · 2024 · Dmytro B.

AI / Cloud Engineer

LLM & AI Agents

Key Expertise

Vector Similarity Search, Generative AI Solutions, Multi-tenant RAG Systems, Enterprise Data Sync

Experience

6+ years

Timezone

CET (UTC +1)

Skills

AI / ML

Vertex AI Search, Matching Engine, Speech-to-Text, Gemini, text-embedding, Google Vertex AI, Discovery Engine

Languages

Python

Databases

Redis, PostgreSQL

Infrastructure

Nginx, Cloud Storage (GCP), Docker Compose, oauth2-proxy, Docker

Frameworks

FastAPI, aiogram, Celery

Integrations & Protocols

LibreOffice, Google Workspace OAuth

Overview

Built a distributed ETL platform that processes massive email archives (EML files packed in ZIP, 7Z, or RAR) into structured PDFs, uploads them to Cloud Storage, and indexes them in Vertex AI Search. The platform runs 9 isolated workspaces with strict data separation and was designed for a government organization.
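The first stage of the pipeline described above (reading EML files out of an archive and pulling the HTML body) can be sketched with the Python standard library alone. This is a minimal illustration, not the project's actual code: the function names `iter_eml_messages`, `extract_html`, and `build_sample_archive` are hypothetical, and only ZIP is handled here because 7Z and RAR need third-party libraries.

```python
import io
import zipfile
from email import policy
from email.parser import BytesParser


def iter_eml_messages(archive_bytes: bytes):
    """Yield (member_name, EmailMessage) for every .eml file in a ZIP archive."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".eml"):
                continue  # skip directories and non-email members
            raw = archive.read(name)
            yield name, BytesParser(policy=policy.default).parsebytes(raw)


def extract_html(message) -> str:
    """Return the HTML body if present, otherwise the plain-text body."""
    body = message.get_body(preferencelist=("html", "plain"))
    return body.get_content() if body is not None else ""


def build_sample_archive() -> bytes:
    """Create a tiny in-memory ZIP with one EML file (demo data only)."""
    eml = (
        b"From: alice@example.com\r\n"
        b"To: bob@example.com\r\n"
        b"Subject: Quarterly report\r\n"
        b"MIME-Version: 1.0\r\n"
        b"Content-Type: text/html; charset=utf-8\r\n"
        b"\r\n"
        b"<p>Hello</p>\r\n"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("inbox/report.eml", eml)
        archive.writestr("inbox/notes.txt", b"not an email")
    return buffer.getvalue()


if __name__ == "__main__":
    for name, msg in iter_eml_messages(build_sample_archive()):
        print(name, msg["Subject"], extract_html(msg).strip())
```

In a pipeline like the one described, the extracted HTML would then be handed to a converter such as LibreOffice to produce the PDF; that step is outside this sketch.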

Achievements

Ran 18 parallel workers across 3 containers with per-app queue isolation. Delivered real-time status updates over WebSocket. Processed thousands of email files using batched operations with retry logic.
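One way to picture the layout above is as a small planning function that assigns queues to containers and prints the matching Celery worker commands. The even split (3 queues and 6 processes per container) is an assumption made for illustration; the source gives only the totals. The `ws.` queue prefix, the `WorkerSpec` and `plan_workers` names, and the `pipeline.celery_app` module path are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class WorkerSpec:
    name: str
    queues: Tuple[str, ...]
    concurrency: int

    def command(self, app_module: str = "pipeline.celery_app") -> str:
        """Render the Celery CLI invocation for this worker container."""
        return (
            f"celery -A {app_module} worker "
            f"-n {self.name}@%h -Q {','.join(self.queues)} "
            f"--concurrency={self.concurrency}"
        )


def plan_workers(
    workspaces: Sequence[str],
    containers: int = 3,
    total_processes: int = 18,
) -> List[WorkerSpec]:
    """Split workspaces evenly across containers (assumed layout)."""
    if len(workspaces) % containers or total_processes % containers:
        raise ValueError("workspaces and processes must divide evenly")
    per_container = len(workspaces) // containers
    concurrency = total_processes // containers
    specs = []
    for index in range(containers):
        chunk = workspaces[index * per_container:(index + 1) * per_container]
        # One dedicated queue per workspace keeps each app's tasks separate.
        queues = tuple(f"ws.{workspace}" for workspace in chunk)
        specs.append(WorkerSpec(f"worker{index + 1}", queues, concurrency))
    return specs


if __name__ == "__main__":
    for spec in plan_workers([f"app{i}" for i in range(1, 10)]):
        print(spec.command())
```

Because each worker subscribes only to its own queues via `-Q`, a task routed to one workspace's queue is never picked up by a worker serving another workspace.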

Responsibilities

  • Designed distributed task architecture: 9 isolated queues across 3 Celery workers with chord-based batch processing
  • Built EML pipeline: archive extraction → HTML parsing → format conversion via LibreOffice → PDF generation
  • Implemented real-time status: Redis pub/sub → FastAPI → WebSocket broadcast
  • Configured OAuth2 proxy with email allowlist for access control
  • Built retry-aware GCS upload and Discovery Engine incremental import
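The retry-aware, batched upload mentioned in the last bullet could look roughly like the sketch below. It is a generic pattern under stated assumptions, not the project's implementation: the upload itself is an injected callable (`upload_batch`, hypothetical) so that no Cloud Storage client API is assumed, and the batch size, attempt count, and exponential backoff delays are illustrative values.

```python
import time
from typing import Callable, Iterator, List, Sequence


def batched(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def with_retry(
    operation: Callable[[], None],
    *,
    attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Run `operation`, retrying with exponential backoff on failure."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception:  # real code would catch only transient errors
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))


def upload_in_batches(
    paths: Sequence[str],
    upload_batch: Callable[[List[str]], None],
    batch_size: int = 100,
    **retry_kwargs,
) -> int:
    """Upload paths batch by batch; return how many paths were uploaded."""
    uploaded = 0
    for batch in batched(paths, batch_size):
        with_retry(lambda b=batch: upload_batch(b), **retry_kwargs)
        uploaded += len(batch)
    return uploaded
```

Retrying per batch rather than per run means a transient failure repeats only the affected batch, which matters when an archive expands into thousands of files.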

Technologies Used

Python, FastAPI, Celery, Redis, PostgreSQL, Google Vertex AI, Cloud Storage (GCP), oauth2-proxy, LibreOffice, Docker Compose, Nginx

This project was delivered by

Dmytro B.
