Overview
Automated document processing system for extracting structured data from diverse file formats and translating into target languages. Input: PDF, images, DOCX, TXT and other file formats. Core Pipeline: File ingestion → OCR extraction (Qwen2.5-VL) → structured JSON output for rendering → translation to target language (Gemma 3).
Achievements
Scalable pipeline that processes heterogeneous documents into structured, translatable output with support for multiple target languages.
Responsibilities
- Designed end-to-end document processing architecture: ingestion → OCR → structuring → translation.
- Implemented OCR extraction using Qwen2.5-VL (Qwen3-VL later) served via vLLM for high-throughput inference.
- Built translation module using Gemma 3 (GemmaTranslate later) served via Ollama for multi-language support.
- Developed structured JSON output schema for consistent rendering across document types.
- Configured Kafka message queue for asynchronous document processing and load balancing.
- Set up distributed orchestration with Ray.io for parallel processing of large document batches.
- Containerized all services with Docker for reproducible deployment.
This project was delivered by
Mykhailo Z.
More Projects by Mykhailo Z.
Interrogation Transcription System for Law Enforcement
Voice AI Engineer
Automated real-time transcription of interviews to generate official protocols in a secure environment. On-premise (air-gapped) deployment ensuring maximum security and data privacy. Core Model: Python, OpenAI Whisper, Pyannote, Docker, on-premise deployment Orchestration: Custom system for real-time processing (voice detection + chunking + transcription). Supports up to 10 simultaneous sessions. Fine-tuning Pipeline: Created a pipeline for periodic model updates using client-provided datasets (edited transcripts). Focused on adapting to (local dialect) and low-quality audio. Metrics: Used WER (Word Error Rate) and CER (Character Error Rate) to validate model performance. Deployment: On-premise (Air-gapped). All components are deployed locally to ensure maximum security and data privacy.
Financial Voice Agent for Call Center
Voice AI Engineer
Voice agent integration for a financial services company with a focus on mobile stability. Focus: Integrated AI agents with telephony infrastructure. Solved architectural challenges regarding vendor integrations. Performance: Focused on maintaining high communication quality over mobile networks.
Ready to Build Your AI Team?
Get matched with the right AI experts for your project. Book a free discovery call to discuss your requirements.
No commitment required. We respond within 24 hours.