Overview
Voice agent integration for a financial services company, with a focus on stability over mobile networks. Focus: integrating AI agents with telephony infrastructure and solving architectural challenges in vendor integrations. Performance: maintaining high communication quality over mobile networks.
Achievements
• Reduced voice latency from 200–250 ms to 60–80 ms through codec optimization and routing.
• Deployed a real-time voice AI agent over unstable mobile networks in Uganda.
• Built a production-grade bridge between Asterisk PBX and the OpenAI Realtime API.
• Achieved stable call quality using UDP with minimal buffering layers.
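The bandwidth side of the codec change can be shown with per-packet payload arithmetic. This sketch assumes 8 kHz mono audio and 20 ms frames, a common telephony packetization interval that the project does not state. It covers payload size only. The measured latency also depended on routing, which this arithmetic does not model.

```python
def payload_bytes(sample_rate_hz: int, bits_per_sample: int, frame_ms: int) -> int:
    """Bytes of audio payload in one frame of mono audio."""
    samples = sample_rate_hz * frame_ms // 1000
    return samples * bits_per_sample // 8


def bitrate_kbps(sample_rate_hz: int, bits_per_sample: int) -> float:
    """Raw codec bitrate in kbit/s, excluding RTP/UDP/IP headers."""
    return sample_rate_hz * bits_per_sample / 1000


FRAME_MS = 20  # assumed packetization interval (ptime); not given in the project

for name, rate, bits in [
    ("16-bit PCM @ 8 kHz", 8000, 16),
    ("G.711 mu-law @ 8 kHz", 8000, 8),
]:
    print(f"{name}: {payload_bytes(rate, bits, FRAME_MS)} bytes/frame, "
          f"{bitrate_kbps(rate, bits):.0f} kbps")
```

This prints 320 versus 160 bytes per frame and 128 versus 64 kbps. Halving the payload matters most on congested mobile uplinks, where smaller packets queue and serialize faster.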
Responsibilities
- Designed and implemented real-time voice communication architecture: Asterisk PBX ↔ AudioSocket ↔ OpenAI Realtime API.
- Reduced latency by switching from 16-bit PCM to 8-bit G.711 μ-law encoding.
- Configured intermediate server routing for optimal network paths.
- Implemented RTP over UDP for production telephony to minimize delays.
- Integrated AI agents with telephony infrastructure, resolving vendor-specific challenges.
- Built and maintained the voice agent flow: call handling, speech recognition, LLM processing and TTS response.
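The switch from 16-bit PCM to G.711 μ-law can be illustrated with the classic μ-law companding routine (bias 0x84, clip 32635, as in the widely circulated CCITT reference code). This is a sketch, not the project's own code. It assumes 16-bit little-endian signed PCM input at 8 kHz. Each sample becomes one byte, which halves the payload.

```python
import struct

BIAS = 0x84   # 132, added to the magnitude before segment lookup
CLIP = 32635  # clamp so that magnitude + BIAS fits in 15 bits


def linear_to_ulaw(sample: int) -> int:
    """Encode one signed 16-bit sample as an 8-bit G.711 mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(-sample if sample < 0 else sample, CLIP) + BIAS
    exponent = magnitude.bit_length() - 8             # segment number 0..7
    mantissa = (magnitude >> (exponent + 3)) & 0x0F   # 4-bit step within the segment
    return ~(sign | (exponent << 4) | mantissa) & 0xFF  # mu-law bytes are stored inverted


def ulaw_to_linear(code: int) -> int:
    """Decode one mu-law byte back to a signed 16-bit sample (segment midpoint)."""
    code = ~code & 0xFF
    sign = code & 0x80
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude


def pcm16le_to_ulaw(pcm: bytes) -> bytes:
    """Convert little-endian 16-bit PCM to mu-law: 2 bytes in, 1 byte out."""
    return bytes(linear_to_ulaw(s) for (s,) in struct.iter_unpack("<h", pcm))
```

Python's standard library once offered this as `audioop.lin2ulaw`, but the `audioop` module was removed in Python 3.13. A small routine like this (or a vectorized equivalent) is one option on newer runtimes.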
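On the Asterisk ↔ AudioSocket leg, the bridge has to split a TCP byte stream into messages. Our understanding of the AudioSocket wire format is a 1-byte message kind, a 2-byte big-endian payload length, and then the payload. Known kinds include 0x00 (hangup), 0x01 (16-byte call UUID), 0x10 (16-bit signed linear audio) and 0xff (error). Treat these constants as assumptions to verify against your Asterisk version. The incremental parser below is a minimal sketch with hypothetical names, not the project's implementation.

```python
import struct
import uuid
from dataclasses import dataclass
from typing import List

# Assumed AudioSocket message kinds (verify against your Asterisk version)
KIND_HANGUP = 0x00
KIND_UUID = 0x01
KIND_AUDIO = 0x10
KIND_ERROR = 0xFF

HEADER = struct.Struct(">BH")  # 1-byte kind + 2-byte big-endian payload length


@dataclass
class Frame:
    kind: int
    payload: bytes


def encode_frame(kind: int, payload: bytes = b"") -> bytes:
    """Serialize one message: header followed by payload."""
    return HEADER.pack(kind, len(payload)) + payload


class FrameParser:
    """Buffers TCP reads and yields only complete frames."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, data: bytes) -> List[Frame]:
        self._buf.extend(data)
        frames: List[Frame] = []
        while len(self._buf) >= HEADER.size:
            kind, length = HEADER.unpack_from(self._buf, 0)
            end = HEADER.size + length
            if len(self._buf) < end:
                break  # payload not fully received yet; wait for the next read
            frames.append(Frame(kind, bytes(self._buf[HEADER.size:end])))
            del self._buf[:end]
        return frames


# Usage: a call UUID, 20 ms of 8 kHz silence, then hangup, delivered in two TCP reads
call_id = uuid.uuid4()
stream = (encode_frame(KIND_UUID, call_id.bytes)
          + encode_frame(KIND_AUDIO, b"\x00\x00" * 160)
          + encode_frame(KIND_HANGUP))
parser = FrameParser()
frames = parser.feed(stream[:10]) + parser.feed(stream[10:])
```

Buffering until a full frame arrives matters because TCP delivers a byte stream, so a single read can end mid-header or mid-payload.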
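For the production RTP-over-UDP path, each G.711 packet carries the 12-byte fixed RTP header defined in RFC 3550. This minimal sketch assumes static payload type 0 (PCMU, per RFC 3551), an 8 kHz timestamp clock and 160-byte (20 ms) packets. The function names are hypothetical, and we do not know how the project's bridge actually built its packets.

```python
import struct
from typing import List

RTP_VERSION = 2
PT_PCMU = 0  # static RTP payload type for G.711 mu-law (RFC 3551)


def rtp_header(seq: int, timestamp: int, ssrc: int,
               payload_type: int = PT_PCMU, marker: bool = False) -> bytes:
    """Pack the 12-byte RTP fixed header (no padding, extension or CSRCs)."""
    byte0 = RTP_VERSION << 6                          # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def packetize(ulaw: bytes, ssrc: int, samples_per_packet: int = 160,
              seq0: int = 0, ts0: int = 0) -> List[bytes]:
    """Split a mu-law stream (1 byte per sample) into RTP packets."""
    packets = []
    for i, start in enumerate(range(0, len(ulaw), samples_per_packet)):
        chunk = ulaw[start:start + samples_per_packet]
        # Timestamp advances in sample units; marker flags the start of a talkspurt
        header = rtp_header(seq0 + i, ts0 + i * samples_per_packet, ssrc, marker=(i == 0))
        packets.append(header + chunk)
    return packets
```

Each packet would then go out with `socket.sendto(packet, (host, port))` on a `SOCK_DGRAM` socket. A real sender should also randomize the initial sequence number and timestamp, as RFC 3550 recommends. Sending directly over UDP, without retransmission, is what keeps buffering minimal: a late packet is simply dropped rather than delaying the ones behind it.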
This project was delivered by Mykhailo Z.
More Projects by Mykhailo Z.
Interrogation Transcription System for Law Enforcement
Voice AI Engineer
Automated real-time transcription of interviews to generate official protocols in a secure environment.
Core stack: Python, OpenAI Whisper, Pyannote, Docker.
Orchestration: Custom system for real-time processing (voice detection + chunking + transcription). Supports up to 10 simultaneous sessions.
Fine-tuning pipeline: Created a pipeline for periodic model updates using client-provided datasets (edited transcripts). Focused on adapting to (local dialect) and low-quality audio.
Metrics: Used WER (Word Error Rate) and CER (Character Error Rate) to validate model performance.
Deployment: On-premise (air-gapped). All components are deployed locally to ensure maximum security and data privacy.
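WER and CER are both edit distances, normalized by reference length. WER counts substitutions, deletions and insertions over words, and CER counts them over characters. The sketch below is a plain Levenshtein implementation. Text normalization (casing, punctuation), which affects the scores in practice, is deliberately left out and would be an assumption to settle with the client.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two token sequences (two-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion
                         cur[j - 1] + 1,          # insertion
                         prev[j - 1] + (r != h))  # substitution (0 if match)
        prev = cur
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edits / number of reference words."""
    ref = reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hypothesis.split()) / len(ref)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edits / reference length (spaces included)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

For example, dropping one word from a five-word reference gives a WER of 0.2. Libraries such as jiwer provide the same metrics with configurable normalization, which is usually preferable when comparing models across fine-tuning rounds.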
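The orchestration step detects voice and cuts the stream into chunks before transcription. The project does not say which detector it used. A simple energy-threshold detector is swapped in here purely for illustration; production systems often use a model-based VAD instead. Assumptions: 16 kHz mono int16 samples (the rate Whisper models consume) and 30 ms frames. The threshold, hangover and maximum-chunk values are hypothetical tuning parameters.

```python
import math
from typing import List, Optional, Sequence, Tuple


def frame_rms(frame: Sequence[int]) -> float:
    """Root-mean-square energy of one frame of int16 samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def chunk_speech(samples: Sequence[int], sample_rate: int = 16000, frame_ms: int = 30,
                 threshold: float = 500.0, hangover_frames: int = 10,
                 max_chunk_frames: int = 1000) -> List[Tuple[int, int]]:
    """Return (start, end) sample offsets of speech chunks.

    A chunk closes after `hangover_frames` consecutive silent frames (trailing
    silence is trimmed) or when it reaches `max_chunk_frames`
    (1000 x 30 ms = 30 s, matching Whisper's 30-second input window).
    """
    frame_len = sample_rate * frame_ms // 1000
    chunks: List[Tuple[int, int]] = []
    start: Optional[int] = None
    last_voiced_end = 0
    frames_in_chunk = 0
    silent_run = 0
    for pos in range(0, len(samples) - frame_len + 1, frame_len):
        voiced = frame_rms(samples[pos:pos + frame_len]) >= threshold
        if start is None:
            if not voiced:
                continue  # still in silence between chunks
            start, frames_in_chunk, silent_run = pos, 0, 0
        frames_in_chunk += 1
        if voiced:
            silent_run = 0
            last_voiced_end = pos + frame_len
        else:
            silent_run += 1
        if silent_run >= hangover_frames or frames_in_chunk >= max_chunk_frames:
            chunks.append((start, last_voiced_end))
            start = None
    if start is not None:  # flush a chunk still open at end of stream
        chunks.append((start, last_voiced_end))
    return chunks
```

The hangover keeps short pauses inside a sentence from splitting it into separate transcription requests. The length cap bounds the delay before a chunk reaches the model, which matters with several sessions running at once.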
OCR & Document translation pipeline
AI Engineer
Automated document processing system for extracting structured data from diverse file formats and translating it into target languages.
Input: PDF, images, DOCX, TXT and other file formats.
Core pipeline: File ingestion → OCR extraction (Qwen2.5-VL) → structured JSON output for rendering → translation to target language (Gemma 3).
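The pipeline's key idea is that OCR produces structured JSON, and translation then rewrites only the text while leaving the structure intact for rendering. The sketch below shows that shape. The OCR and translation models appear as injected callables, and the stubs stand in for Qwen2.5-VL and Gemma 3. The block schema (`kind`, `text`, `page`), the plain-text shortcut for TXT files, and all function names are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical model hooks; real versions would call Qwen2.5-VL and Gemma 3
OcrFn = Callable[[bytes], List[Dict]]
TranslateFn = Callable[[str, str], str]

TEXT_SUFFIXES = {".txt"}  # assumption: plain text skips OCR


@dataclass
class Block:
    kind: str  # assumed block types, e.g. "heading", "paragraph"
    text: str
    page: int


@dataclass
class Document:
    source: str
    blocks: List[Block] = field(default_factory=list)


def ingest(path: Path, data: bytes, ocr: OcrFn) -> Document:
    """Route a file to direct text parsing or to the OCR hook."""
    doc = Document(source=path.name)
    if path.suffix.lower() in TEXT_SUFFIXES:
        for para in data.decode("utf-8").split("\n\n"):
            if para.strip():
                doc.blocks.append(Block("paragraph", para.strip(), page=1))
    else:
        # Non-text formats go to the OCR hook; rendering PDF/DOCX pages to
        # images is assumed to happen inside it (not shown)
        for item in ocr(data):
            doc.blocks.append(Block(item["kind"], item["text"], item["page"]))
    return doc


def translate_document(doc: Document, target_lang: str, translate: TranslateFn) -> Document:
    """Translate text only; kind and page are kept so rendering can reuse the layout."""
    return Document(doc.source,
                    [Block(b.kind, translate(b.text, target_lang), b.page) for b in doc.blocks])


def to_json(doc: Document) -> str:
    return json.dumps(asdict(doc), ensure_ascii=False, indent=2)


# Usage with stub models (stand-ins, not real model calls)
def fake_ocr(data: bytes) -> List[Dict]:
    return [{"kind": "heading", "text": "Invoice", "page": 1}]


def fake_translate(text: str, lang: str) -> str:
    return f"[{lang}] {text}"


translated = translate_document(ingest(Path("scan.png"), b"\x89PNG", fake_ocr), "de", fake_translate)
print(to_json(translated))
```

Keeping OCR and translation as separate stages means each model can be swapped or re-run independently, and the intermediate JSON doubles as a checkpoint for reviewing extraction quality.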