Key Expertise
Experience
7+ years
Timezone
CET (GMT +1)
Skills
AI / ML
Languages
Databases
Infrastructure
Frameworks
Integrations & Protocols
Overview
Automated document processing system for extracting structured data from diverse file formats and translating into target languages. Input: PDF, images, DOCX, TXT and other file formats. Core Pipeline: File ingestion → OCR extraction (Qwen2.5-VL) → structured JSON output for rendering → translation to target language (Gemma 3).
Achievements
Scalable pipeline that processes heterogeneous documents into structured, translatable output with support for multiple target languages.
Responsibilities
- Designed end-to-end document processing architecture: ingestion → OCR → structuring → translation.
- Implemented OCR extraction using Qwen2.5-VL (Qwen3-VL later) served via vLLM for high-throughput inference.
- Built translation module using Gemma 3 (GemmaTranslate later) served via Ollama for multi-language support.
- Developed structured JSON output schema for consistent rendering across document types.
- Configured Kafka message queue for asynchronous document processing and load balancing.
- Set up distributed orchestration with Ray.io for parallel processing of large document batches.
- Containerized all services with Docker for reproducible deployment.
Technologies Used
Key Expertise
Experience
7+ years
Timezone
CET (GMT +1)
Skills
AI / ML
Languages
Databases
Infrastructure
Frameworks
Integrations & Protocols
This project was delivered by
Mykhailo Z.
More Projects by Mykhailo Z.
Interrogation Transcription System for Law Enforcement
Voice AI Engineer
Automated real-time transcription of interviews to generate official protocols in a secure environment. On-premise (air-gapped) deployment ensuring maximum security and data privacy. Core Model: Python, OpenAI Whisper, Pyannote, Docker, on-premise deployment Orchestration: Custom system for real-time processing (voice detection + chunking + transcription). Supports up to 10 simultaneous sessions. Fine-tuning Pipeline: Created a pipeline for periodic model updates using client-provided datasets (edited transcripts). Focused on adapting to (local dialect) and low-quality audio. Metrics: Used WER (Word Error Rate) and CER (Character Error Rate) to validate model performance. Deployment: On-premise (Air-gapped). All components are deployed locally to ensure maximum security and data privacy.
Financial Voice Agent for Call Center
Voice AI Engineer
Voice agent integration for a financial services company with a focus on mobile stability. Focus: Integrated AI agents with telephony infrastructure. Solved architectural challenges regarding vendor integrations. Performance: Focused on maintaining high communication quality over mobile networks.
Ready to Build Your AI Team?
Get matched with the right AI experts for your project. Book a free discovery call to discuss your requirements.
We respond within 24 hours.