Interrogation Transcription System for Law Enforcement

Voice AI Engineer

2024

Mykhailo Z.

Voice AI Engineer

Key Expertise

Voice AI Agents, Real-time Audio Streaming, Agentic RAG Systems, On-premise LLM Deployment, Speech Engineering, Multimodal Document AI

Experience

7+ years

Timezone

CET (GMT +1)

Skills

AI / ML

Deepseek, Ray.io, TTS, NVIDIA NeMo, Transformers, Triton Inference Server, NVIDIA Riva, Embedding models, Llama, Ollama, Whisper, LangGraph, Mistral/Mixtral, STT, Qwen, llama.cpp, Agentic frameworks, RAG, OCR, Gemini, LlamaIndex, ElevenLabs, Diarization, Pyannote, KAG, MLflow, vLLM, Claude, LangChain, Pydantic Agents

Languages

Python

Databases

ChromaDB, Qdrant, MongoDB, CosmosDB, OpenSearch, Pinecone, Elasticsearch, Redis, PostgreSQL, FAISS

Infrastructure

Kafka, Docker Compose, Langfuse, Kubernetes, SageMaker, Docker, Pydantic’s Logfire, EKS, LangSmith

Frameworks

Dagster, n8n, Apache Airflow

Integrations & Protocols

RTP over UDP, WebSocket, LiveKit, Asterisk PBX, WebRTC

Overview

Automated real-time transcription of interviews, used to generate official protocols in a secure environment.

Core Model: Python, OpenAI Whisper, Pyannote, Docker.
Orchestration: Custom system for real-time processing (voice detection + chunking + transcription). Supports up to 10 simultaneous sessions.
Fine-tuning Pipeline: Periodic model updates using client-provided datasets (edited transcripts), focused on adapting to the local dialect and low-quality audio.
Metrics: WER (Word Error Rate) and CER (Character Error Rate) are used to validate model performance.
Deployment: On-premise (air-gapped). All components are deployed locally to ensure maximum security and data privacy.
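For readers unfamiliar with the two metrics, here is a minimal sketch of how WER and CER are commonly computed: the edit distance between a reference transcript and the model output, divided by the reference length (in words for WER, in characters for CER). The function names are illustrative and are not taken from the project; a production pipeline would usually also normalize casing and punctuation before scoring, which this sketch does not do.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions and deletions."""
    prev = list(range(len(hyp) + 1))  # distance from empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

For example, `wer("the suspect left the building", "the suspect left building")` is one deleted word out of five reference words, so the function returns 0.2.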

Achievements

• A production-ready system (active for 3+ years) that generates real-time protocols from microphone input, resilient to background noise and street recordings.
• Supports up to 10 simultaneous transcription sessions (with different numbers of users per session).
• Reduced Whisper latency from ~4s (Medium model) to 1.05s (Turbo) while maintaining high accuracy in noisy environments.
• Production quality thresholds: WER < 7%, CER < 7%, with automated re-training when exceeded.
• Successfully adapted recognition for local dialect and low-quality audio sources.
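The quality thresholds above can be read as a simple gate: a model passes only while both metrics stay below 7%. The sketch below shows one way such a check might look. The class and method names are hypothetical, and the project's actual retraining trigger is not described in the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityGate:
    """Hypothetical gate mirroring the stated thresholds (WER < 7%, CER < 7%)."""
    max_wer: float = 0.07
    max_cer: float = 0.07

    def passes(self, wer: float, cer: float) -> bool:
        # Both metrics must be strictly below their thresholds.
        return wer < self.max_wer and cer < self.max_cer

    def needs_retraining(self, wer: float, cer: float) -> bool:
        # Assumption: any failed threshold is enough to request retraining.
        return not self.passes(wer, cer)
```

A scheduled evaluation job could call `QualityGate().needs_retraining(wer, cer)` on the latest scores and start the fine-tuning pipeline when it returns `True`.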

Responsibilities

Designed and built a custom real-time processing system: voice detection (Pyannote) + chunking + transcription pipeline.
Implemented batch optimization, buffer tuning, and custom VAD logic for real-time Whisper-based recognition.
Created a fine-tuning pipeline for periodic model updates using client-provided datasets (edited transcripts).
Managed model evolution: OpenAI Whisper Medium → Large → Turbo.
Built an automated QA pipeline using WER/CER metrics with auto-retraining triggers.
Deployed all components locally on air-gapped infrastructure (via Ray.io, Docker, vLLM).
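The chunking step in the pipeline above can be illustrated with a small sketch. It assumes the voice detector has already produced speech segments as (start, end) times in seconds, and it merges neighbouring segments into chunks short enough to pass to the transcription model. The function name and the default limits (20 s per chunk, 0.5 s maximum gap) are assumptions for illustration, not the project's actual settings.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start_seconds, end_seconds)


def build_chunks(segments: List[Segment],
                 max_chunk_s: float = 20.0,
                 max_gap_s: float = 0.5) -> List[Segment]:
    """Merge adjacent speech segments into chunks for transcription.

    Two segments are merged when the pause between them is at most
    max_gap_s and the merged chunk stays within max_chunk_s.
    A single segment longer than max_chunk_s is kept whole, not split.
    """
    chunks: List[Segment] = []
    for start, end in sorted(segments):
        if end <= start:
            continue  # skip empty or inverted segments
        if chunks:
            chunk_start, chunk_end = chunks[-1]
            close_enough = start - chunk_end <= max_gap_s
            fits = end - chunk_start <= max_chunk_s
            if close_enough and fits:
                chunks[-1] = (chunk_start, max(chunk_end, end))
                continue
        chunks.append((start, end))
    return chunks
```

Shorter chunks reduce the delay before text appears, while longer chunks give the model more context, so limits like these are typically tuned against both latency and WER.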

Technologies Used

Python, OpenAI, Whisper, Pyannote, Docker

This project was delivered by

Mykhailo Z.

More Projects by Mykhailo Z.

2025

Financial Voice Agent for Call Center

Voice AI Engineer

Voice agent integration for a financial services company with a focus on mobile stability. Focus: Integrated AI agents with telephony infrastructure. Solved architectural challenges regarding vendor integrations. Performance: Focused on maintaining high communication quality over mobile networks.

Python, Asterisk PBX, OpenAI, OpenAI API, WebSocket, +2 more

2023-2024

RAG for Medical Equipment Marketplace

AI Engineer

Knowledge base system for medical device documentation with semantic search capabilities. Pipeline: Web scraping of manufacturer manuals for specified medical devices → chunking → indexing with metadata → storage in vector database. Core Functionality: On query, retrieves relevant documentation and specifications for a given medical device.

Ray.io, ChromaDB, Llama, LangChain