Skip to main content
Download free report
SoftBlues
Back to Projects

OCR & Document translation pipeline

AI Engineer2025Mykhailo Z.
Mykhailo Z.
Mykhailo Z.

Voice AI Engineer

Voice AI Engineer

Key Expertise

Voice AI AgentsReal-time Audio StreamingAgentic RAG SystemsOn-premise LLM DeploymentSpeech EngineeringMultimodal Document AI

Experience

7+ years

Timezone

CET (GMT +1)

Skills

AI / ML

DeepseekRay.ioTTSNVIDIA NeMoTransformersTriton Inference ServerNVIDIA RivaEmbedding modelsLlamaOllamaWhisperLangGraphMistral/MixtralSTTQwenllama.cppAgentic frameworksRAGOCRGeminiLlamaIndexElevenLabsDiarizationPyannoteKAGMLflowvLLMClaudeLangChainPydantic Agents

Languages

Python

Databases

ChromaDBQdrantMongoDBCosmosDBOpenSearchPineconeElasticsearchRedisPostgreSQLFAISS

Infrastructure

KafkaDocker ComposeLangfuseKubernetesSageMakerDockerPydantic’s LogfireEKSLangSmith

Frameworks

Dagstern8nApache Airflow

Integrations & Protocols

RTP over UDPWebSocketLiveKitAsterisk PBXWebRTC
7-day risk-free trial
Response within 24 hours
View Full Profile

Overview

Automated document processing system for extracting structured data from diverse file formats and translating into target languages. Input: PDF, images, DOCX, TXT and other file formats. Core Pipeline: File ingestion → OCR extraction (Qwen2.5-VL) → structured JSON output for rendering → translation to target language (Gemma 3).

Achievements

Scalable pipeline that processes heterogeneous documents into structured, translatable output with support for multiple target languages.

Responsibilities

  • Designed end-to-end document processing architecture: ingestion → OCR → structuring → translation.
  • Implemented OCR extraction using Qwen2.5-VL (Qwen3-VL later) served via vLLM for high-throughput inference.
  • Built translation module using Gemma 3 (GemmaTranslate later) served via Ollama for multi-language support.
  • Developed structured JSON output schema for consistent rendering across document types.
  • Configured Kafka message queue for asynchronous document processing and load balancing.
  • Set up distributed orchestration with Ray.io for parallel processing of large document batches.
  • Containerized all services with Docker for reproducible deployment.

Technologies Used

OCRQwenOllamaKafkaRay.ioDockerPython
Mykhailo Z.

This project was delivered by

Mykhailo Z.

View Full Profile

Ready to Build Your AI Team?

Get matched with the right AI experts for your project. Book a free discovery call to discuss your requirements.

We respond within 24 hours.