Mykhailo Z.

Voice AI Engineer

Mykhailo is an experienced AI Engineer specializing in Voice AI and real-time audio processing. His expertise spans high-level LLM orchestration and low-level telephony engineering. He builds low-latency voice agents and resilient STT/TTS pipelines in which every stage, from codec selection to custom VAD logic, is tuned for performance in challenging environments. His track record ranges from secure, air-gapped transcription systems for law enforcement to financial AI assistants that stay stable over unreliable mobile networks. Beyond voice technologies, Mykhailo designs Agentic RAG systems and multimodal architectures using LangGraph, Neo4j, and modern vision-language models. He also covers the full MLOps lifecycle, from distributed data processing on Ray.io and Kubernetes to high-load model serving with vLLM and Triton. Data security and operational efficiency are consistent priorities in his work.

Key Expertise

Voice AI Agents, Real-time Audio Streaming, Agentic RAG Systems, On-premise LLM Deployment, Speech Engineering, Multimodal Document AI

Experience

7+ years

Timezone

CET (GMT +1)

Skills

AI / ML

Deepseek, Ray.io, TTS, NVIDIA NeMo, Transformers, Triton Inference Server, NVIDIA Riva, Embedding models, Llama, Ollama, Whisper, LangGraph, Mistral/Mixtral, STT, Qwen, llama.cpp, Agentic frameworks, RAG, OCR, Gemini, LlamaIndex, ElevenLabs, Diarization, Pyannote, KAG, MLflow, vLLM, Claude, LangChain, Pydantic Agents

Languages

Python

Databases

ChromaDB, Qdrant, MongoDB, CosmosDB, OpenSearch, Pinecone, Elasticsearch, Redis, PostgreSQL, FAISS

Infrastructure

Kafka, Docker Compose, Langfuse, Kubernetes, SageMaker, Docker, Pydantic’s Logfire, EKS, LangSmith

Frameworks

Dagster, n8n, Apache Airflow

Integrations & Protocols

RTP over UDP, WebSocket, LiveKit, Asterisk PBX, WebRTC

7-day risk-free trial

Response within 24 hours

1. Interrogation Transcription System for Law Enforcement

Voice AI Engineer · 2024

Project overview:

Automated real-time transcription of interviews to generate official protocols in a secure environment.
Core Model: Python, OpenAI Whisper, Pyannote, Docker.
Orchestration: Custom system for real-time processing (voice detection + chunking + transcription). Supports up to 10 simultaneous sessions.
Fine-tuning Pipeline: Periodic model updates using client-provided datasets (edited transcripts), focused on adapting to the local dialect and low-quality audio.
Metrics: WER (Word Error Rate) and CER (Character Error Rate) used to validate model performance.
Deployment: On-premise (air-gapped). All components are deployed locally to ensure maximum security and data privacy.
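
The voice detection + chunking step mentioned in the overview can be illustrated with a minimal sketch. It assumes a plain energy threshold as a stand-in for the Pyannote-based detection used in the project. The sample rate, frame length, threshold, and the names `frame_rms` and `split_on_silence` are hypothetical, chosen only for illustration.

```python
import math
import struct

SAMPLE_RATE = 16_000                              # assumed input rate in Hz
FRAME_MS = 30                                     # assumed analysis frame length
FRAME_SAMPLES = SAMPLE_RATE * FRAME_MS // 1000    # 480 samples per frame


def frame_rms(frame: bytes) -> float:
    """Root-mean-square energy of one 16-bit little-endian PCM frame."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", frame[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def split_on_silence(pcm: bytes, threshold: float = 500.0,
                     max_silence_frames: int = 10) -> list[bytes]:
    """Group voiced frames into chunks and close a chunk after a run of silence.

    Silent frames that follow speech are kept in the chunk, so short pauses
    do not split an utterance. Leading silence is dropped.
    """
    frame_bytes = FRAME_SAMPLES * 2
    chunks: list[bytes] = []
    current = bytearray()
    silent_run = 0
    for start in range(0, len(pcm), frame_bytes):
        frame = pcm[start : start + frame_bytes]
        if frame_rms(frame) >= threshold:
            current.extend(frame)
            silent_run = 0
        elif current:
            current.extend(frame)
            silent_run += 1
            if silent_run >= max_silence_frames:
                chunks.append(bytes(current))
                current = bytearray()
                silent_run = 0
    if current:
        chunks.append(bytes(current))
    return chunks
```

Each returned chunk could then be passed to the transcription model. A production detector would typically replace the energy test with a trained VAD model.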

Responsibilities:

Designed and built custom real-time processing system: voice detection (Pyannote) + chunking + transcription pipeline.
Implemented batch optimization, buffer tuning, and custom VAD logic for real-time Whisper-based recognition.
Created fine-tuning pipeline for periodic model updates using client-provided datasets (edited transcripts).
Managed model evolution: OpenAI Whisper Medium → Large → Turbo.
Built automated QA pipeline using WER/CER metrics with auto-retraining triggers.
Deployed all components locally on air-gapped infrastructure (via Ray.io, Docker, vLLM).
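
The WER/CER check behind the automated QA step can be sketched as follows. This is a minimal illustration based on the standard Levenshtein definition of both metrics, not the project's actual pipeline. The 7% limits are the thresholds quoted in this profile, while the function names and the way a retraining trigger would be wired up are assumptions.

```python
def edit_distance(ref: list, hyp: list) -> int:
    """Levenshtein distance between two token sequences (row-by-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / max(len(ref_words), 1)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edits divided by reference length."""
    return edit_distance(list(reference), list(hypothesis)) / max(len(reference), 1)


WER_LIMIT = 0.07  # thresholds quoted in the profile
CER_LIMIT = 0.07


def needs_retraining(wer_value: float, cer_value: float) -> bool:
    """Hypothetical trigger: flag the model when either metric exceeds its limit."""
    return wer_value > WER_LIMIT or cer_value > CER_LIMIT
```

In practice both strings are usually normalized (casing, punctuation) before scoring, since the metrics are sensitive to formatting differences.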

Achievements:

• A production-ready system (active for 3+ years) that generates real-time protocols from microphone input, resilient to background noise and street recordings.
• Supports up to 10 simultaneous transcription sessions (with different numbers of users per session).
• Reduced Whisper latency from ~4s (Medium model) to 1.05s (Turbo) while maintaining high accuracy in noisy environments.
• Production quality thresholds: WER < 7%, CER < 7% with automated re-training when exceeded.
• Successfully adapted recognition for local dialect and low-quality audio sources.

Technology stack:

Python, OpenAI, Whisper, Pyannote, Docker

2. Financial Voice Agent for Call Center

Voice AI Engineer · 2025

Project overview:

Voice agent integration for a financial services company, with a focus on stability over mobile networks.
Focus: Integrated AI agents with telephony infrastructure and solved architectural challenges around vendor integrations.
Performance: Maintained high communication quality over mobile networks.

Responsibilities:

Designed and implemented real-time voice communication architecture: Asterisk PBX ↔ AudioSocket ↔ OpenAI Realtime API.
Optimized latency by switching from 16-bit PCM to 8-bit codecs (G.711/μ-law).
Configured intermediate server routing for optimal network paths.
Implemented RTP over UDP for production telephony to minimize delays.
Integrated AI agents with telephony infrastructure, resolving vendor-specific challenges.
Built and maintained voice agent flow: call handling, speech recognition, LLM processing, TTS response.
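
The codec switch listed above, from 16-bit PCM to 8-bit G.711 μ-law, halves the audio payload per sample. The sketch below follows the widely used reference companding algorithm in pure Python. It is an illustration only: the function names are hypothetical and it does not reproduce the project's latency figures.

```python
import struct

BIAS = 0x84    # standard mu-law bias (132)
CLIP = 32635   # largest magnitude that still fits after the bias is added


def linear_to_ulaw(sample: int) -> int:
    """Compress one signed 16-bit PCM sample into one mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # Find the segment (exponent): position of the highest set bit, from bit 14 down to 7.
    exponent = 7
    mask = 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def ulaw_to_linear(byte: int) -> int:
    """Expand one mu-law byte back to an approximate 16-bit PCM sample."""
    byte = ~byte & 0xFF
    exponent = (byte >> 4) & 0x07
    mantissa = byte & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if byte & 0x80 else magnitude


def pcm16_to_ulaw(pcm: bytes) -> bytes:
    """Convert little-endian 16-bit PCM audio to mu-law, one byte per sample."""
    count = len(pcm) // 2
    samples = struct.unpack(f"<{count}h", pcm[: count * 2])
    return bytes(linear_to_ulaw(s) for s in samples)
```

The compression is lossy: quiet samples keep fine resolution, while loud samples are quantized more coarsely, which suits narrowband telephony speech.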

Achievements:

• Reduced voice latency from 200–250ms to 60–80ms through codec optimization and routing.
• Successfully deployed real-time voice AI agent over unstable mobile networks in Uganda.
• Built a production-grade bridge between Asterisk PBX and OpenAI Realtime API.
• Achieved stable call quality using UDP protocol with minimal buffering layers.
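
For context on RTP over UDP, the following sketch packs μ-law audio into RTP packets with the standard 12-byte fixed header. It assumes 20 ms frames at 8 kHz and the static payload type 0 (PCMU). The function names and packet size are illustrative assumptions, not details taken from the project.

```python
import struct

RTP_VERSION = 2
PAYLOAD_TYPE_PCMU = 0       # static RTP payload type for G.711 mu-law
SAMPLES_PER_PACKET = 160    # assumed: 20 ms of 8 kHz audio, one byte per sample


def build_rtp_packet(payload: bytes, seq: int, timestamp: int, ssrc: int,
                     marker: bool = False) -> bytes:
    """Prepend a 12-byte RTP fixed header (no padding, extension, or CSRCs)."""
    first_byte = RTP_VERSION << 6
    second_byte = (int(marker) << 7) | PAYLOAD_TYPE_PCMU
    header = struct.pack("!BBHII", first_byte, second_byte,
                         seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload


def packetize(ulaw_audio: bytes, ssrc: int, start_seq: int = 0, start_ts: int = 0):
    """Yield RTP packets that each carry up to 20 ms of mu-law audio."""
    for n, offset in enumerate(range(0, len(ulaw_audio), SAMPLES_PER_PACKET)):
        frame = ulaw_audio[offset : offset + SAMPLES_PER_PACKET]
        yield build_rtp_packet(
            frame,
            seq=start_seq + n,
            timestamp=start_ts + n * SAMPLES_PER_PACKET,  # timestamp counts samples
            ssrc=ssrc,
            marker=(n == 0),                              # mark the start of a talkspurt
        )
```

Each packet can be sent with a plain UDP socket (`socket.SOCK_DGRAM`) via `sendto`. Because UDP does not retransmit, a lost packet costs one short frame of audio rather than stalling the stream.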

Technology stack:

Python, Asterisk PBX, OpenAI, OpenAI API, WebSocket, Docker, RTP over UDP

3. RAG for Medical Equipment Marketplace

AI Engineer · 2023-2024

Project overview:

Knowledge base system for medical device documentation with semantic search capabilities.
Pipeline: Web scraping of manufacturer manuals for specified medical devices → chunking → indexing with metadata → storage in a vector database.
Core Functionality: On query, retrieves relevant documentation and specifications for a given medical device.
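
The chunking and metadata-indexing steps of the pipeline can be sketched as below. This is a minimal illustration assuming fixed-size word windows with overlap. The project's actual chunking strategy is not described, and the `Chunk` type, the window sizes, and any metadata keys are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)


def chunk_document(text: str, metadata: dict, chunk_words: int = 200,
                   overlap_words: int = 40) -> list[Chunk]:
    """Split text into overlapping word windows, copying metadata onto each chunk."""
    if overlap_words >= chunk_words:
        raise ValueError("overlap_words must be smaller than chunk_words")
    words = text.split()
    step = chunk_words - overlap_words
    chunks: list[Chunk] = []
    for index, start in enumerate(range(0, len(words), step)):
        window = words[start : start + chunk_words]
        chunks.append(Chunk(" ".join(window), {**metadata, "chunk_index": index}))
        if start + chunk_words >= len(words):
            break  # the last window already reaches the end of the document
    return chunks
```

Carrying fields such as a device model or manufacturer on every chunk is what later allows a query to be restricted to one device before similarity ranking.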

Responsibilities:

Designed and implemented web scraping pipelines to collect manufacturer manuals and device documentation.
Developed chunking and indexing strategies with metadata tagging for accurate retrieval.
Configured ChromaDB as vector store with metadata filtering for device-specific queries.
Integrated all-MiniLM-L6-v2 embedding model for semantic search capabilities.
Built RAG pipeline using LangChain with Llama 2 as the generation model.
Set up distributed processing with Ray.io for scalable document ingestion.

Achievements:

• Built a comprehensive knowledge base covering manufacturer manuals and specifications for medical devices.
• Achieved high retrieval accuracy through optimized chunking strategies and metadata enrichment.
• Reduced query response time to sub-second levels through embedding model selection and vector store optimization (hybrid search combining cosine similarity with metadata filtering).
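
The combination of cosine similarity and metadata filtering can be illustrated with an in-memory toy. In the project this role is filled by ChromaDB, so the record layout, the `search` function, and the example values below are hypothetical.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec, records, where: dict, top_k: int = 3):
    """Filter records by exact metadata match, then rank survivors by similarity.

    records: list of dicts with 'id', 'embedding', and 'metadata' keys.
    Returns a list of (score, id) tuples, best match first.
    """
    candidates = [
        r for r in records
        if all(r["metadata"].get(key) == value for key, value in where.items())
    ]
    scored = [(cosine_similarity(query_vec, r["embedding"]), r["id"]) for r in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional vectors; real all-MiniLM-L6-v2 embeddings have 384 dimensions.
records = [
    {"id": "pump-manual-1", "embedding": [1.0, 0.0, 0.0], "metadata": {"device": "pump"}},
    {"id": "pump-manual-2", "embedding": [0.9, 0.1, 0.0], "metadata": {"device": "pump"}},
    {"id": "monitor-spec", "embedding": [1.0, 0.0, 0.0], "metadata": {"device": "monitor"}},
]
results = search([1.0, 0.0, 0.0], records, where={"device": "pump"})
```

Filtering first shrinks the candidate set, so similarity is computed only for chunks that belong to the requested device.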

Technology stack:

Ray.io, ChromaDB, Llama, LangChain

4. OCR & Document Translation Pipeline

AI Engineer · 2025

Project overview:

Automated document processing system that extracts structured data from diverse file formats and translates it into target languages.
Input: PDF, images, DOCX, TXT, and other file formats.
Core Pipeline: File ingestion → OCR extraction (Qwen2.5-VL) → structured JSON output for rendering → translation to target language (Gemma 3).

Responsibilities:

Designed end-to-end document processing architecture: ingestion → OCR → structuring → translation.
Implemented OCR extraction using Qwen2.5-VL (later Qwen3-VL) served via vLLM for high-throughput inference.
Built translation module using Gemma 3 (later GemmaTranslate) served via Ollama for multi-language support.
Developed structured JSON output schema for consistent rendering across document types.
Configured Kafka message queue for asynchronous document processing and load balancing.
Set up distributed orchestration with Ray.io for parallel processing of large document batches.
Containerized all services with Docker for reproducible deployment.
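
One way to picture the structured JSON output and the translation step is the sketch below. The schema is not described in this profile, so every field name here (`kind`, `bbox`, `pages`, and so on) is an assumption. The `translate` callable is a hypothetical stand-in for a model call such as Gemma 3 served via Ollama.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Callable


@dataclass
class Block:
    kind: str                                  # e.g. "heading" or "paragraph"
    text: str
    bbox: tuple[float, float, float, float]    # x0, y0, x1, y1 in page coordinates


@dataclass
class Page:
    number: int
    blocks: list[Block] = field(default_factory=list)


@dataclass
class Document:
    source_file: str
    language: str
    pages: list[Page] = field(default_factory=list)


def translate_document(doc: Document, target_language: str,
                       translate: Callable[[str, str], str]) -> Document:
    """Return a translated copy; only block text changes, layout fields are kept."""
    pages = [
        Page(p.number, [Block(b.kind, translate(b.text, target_language), b.bbox)
                        for b in p.blocks])
        for p in doc.pages
    ]
    return Document(doc.source_file, target_language, pages)


def to_json(doc: Document) -> str:
    """Serialize the document tree to JSON for a downstream renderer."""
    return json.dumps(asdict(doc), ensure_ascii=False, indent=2)
```

Keeping text and layout in separate fields lets the translation step touch only the text, so a renderer can place the translated blocks where the originals were.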

Achievements:

• Delivered a scalable pipeline that processes heterogeneous documents into structured, translatable output with support for multiple target languages.

Technology stack:

OCR, Qwen, Ollama, Kafka, Ray.io, Docker, Python
