AI that conducts the interview itself.
AI Voice Interviewer for an HR Tech Platform
A real-time voice AI that generates adaptive questions tailored to each role, conducts the conversation in three languages, and decides on the fly when to probe deeper, when to move on, and when to end the interview. 300+ live interviews per week, in production.
Book a Case Walkthrough
The client operates an AI-powered candidate assessment platform serving recruitment, HR, and talent acquisition teams. Their core process required a 2 to 3 hour structured human interview per candidate, conducted by trained expert analysts assessing personality, cognitive style, and role fit. Demand was outgrowing the analyst team, with a queue of candidates waiting weeks to be interviewed and analyst capacity becoming the binding constraint on platform growth.
The Challenge
Senior interviewers were the binding constraint on platform throughput. The platform needed a voice AI that could run live interviews itself, adapt to each candidate, and handle three languages at production quality.
- Senior interviewers spending 2 to 3 hours per candidate, capping how many assessments the platform could deliver
- Inconsistent interview quality across human interviewers, complicating fair comparison between candidates
- Multilingual requirement across English, Russian, and Ukrainian, with poor voice quality from off-the-shelf models on non-English languages
- Need for adaptive interview flow: the conversation must follow what the candidate says, not just read down a script
- Strict consistency requirement so every candidate received the same structured interview experience regardless of who or what conducted it
- High candidate volume requiring concurrent interview capacity, not a queue
Our Solution
Softblues built a real-time AI voice interviewer that generates a unique interview per candidate, runs the conversation in three languages, and adapts on the fly. It frees senior interviewers from conducting every live session while preserving recruiter oversight and decision-making.
- Real-time AI voice interviewer that generates a unique 65+ question set per candidate, adapted to the role and the candidate profile
- Adaptive conversation flow that decides when to probe deeper, when to move on, and when to gently redirect off-topic responses
- ElevenLabs voice models selected specifically for multilingual quality, supporting English, Russian, and Ukrainian at production-grade fidelity
- LiveKit for concurrent room management, enabling dozens of parallel interviews without queue waiting
- Voice activity detection and turn-taking logic for natural conversational pacing
- Multi-agent orchestration via LangGraph so each conversational decision (intent, completeness, follow-up generation) runs in its own narrow node
Built with Enterprise-Grade Technology
Goals and Objectives
The client came to us with clear objectives to transform their operations.
Replace Live-Interview Bottleneck
Free senior interviewers from running every live session themselves. Voice agent conducts the interview while recruiters retain oversight and decision-making.
Adaptive Question Generation
Generate a unique, role-appropriate interview for every candidate. No two interviews identical; every conversation tailored to position and respondent.
Production Multilingual Voice
Deliver natural, real-time voice interviews in English, Russian, and Ukrainian with consistent quality across all three languages, not just English.
Conversational Intelligence
The AI must probe when answers are shallow, move on when they are complete, and redirect when they go off-topic. Not a script reader.
Concurrent Capacity
Run dozens of interviews in parallel without queue waiting. Architecture must scale horizontally for peak hiring cycles.
See the Platform in Action
From intake to completion, explore how the solution transforms operations.
Adaptive Interview Engine
The agent conducts a structured interview drawn from a bank of 65+ questions across multiple competency dimensions. Claude Sonnet chooses the next question in real time based on what the candidate has already said, which areas still need evidence, and how much time remains. Every candidate experiences a coherent structured interview, but the path through the question bank is adaptive: no two candidates are asked exactly the same sequence, and yet every candidate is evaluated against the same framework.
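The selection logic described above can be sketched in miniature. This is an illustrative reconstruction, not the production code: the `Question`, `InterviewState`, and `pick_next_question` names are hypothetical, and in the real system Claude Sonnet makes this choice with far richer context than an evidence tally.

```python
from dataclasses import dataclass, field

@dataclass
class Question:
    id: str
    dimension: str  # competency dimension this question probes

@dataclass
class InterviewState:
    remaining: list                                   # questions not yet asked
    evidence: dict = field(default_factory=dict)      # dimension -> accumulated signal

def pick_next_question(state: InterviewState, target: float = 1.0):
    """Pick a question for the dimension with the least evidence so far,
    skipping dimensions that already have enough signal to score."""
    open_dims = {q.dimension for q in state.remaining}
    for dim in sorted(open_dims, key=lambda d: state.evidence.get(d, 0.0)):
        if state.evidence.get(dim, 0.0) >= target:
            continue  # this dimension is already covered
        for q in state.remaining:
            if q.dimension == dim:
                return q
    return None  # every dimension has enough signal -> wrap up the interview
```

The key property matches the text: the path through the bank adapts per candidate, but the evaluation framework (the set of dimensions and the evidence target) stays fixed for everyone.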
Real-Time Conversation Engine
Candidates connect via web or telephone. ElevenLabs handles real-time speech recognition and synthesis with production-grade quality in English, Russian, and Ukrainian. LiveKit manages concurrent interview rooms so the platform scales without queue waiting. Claude Sonnet orchestrates the conversation flow and decides when to probe, when to move on, and when to end the interview. Round-trip latency stays under two seconds, which is the threshold at which conversations feel natural rather than stilted.
Multi-Agent Orchestration Inside the Engine
The conversation engine from Block 02 is itself a graph of specialised LangGraph nodes. Each node handles one narrow conversational decision: intent recognition (what did the candidate mean), completeness check (is the answer complete enough), follow-up generation (what to ask next), topic transition (when to move on), time pacing (are we on schedule), and language detection (has the candidate switched languages). The nodes coordinate through a shared graph state, each reading what others have written and contributing its own decision. This is why the voice agent stays accurate across hundreds of distinct conversational scenarios. A single mega-prompt would crack at 20 to 30 scenarios; the graph approach scales cleanly per node, and new conversational behaviours are added as new nodes rather than as bigger prompts.
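The node-per-decision pattern can be shown in plain Python, stripped of the LLM calls and of LangGraph's routing. Each stub below stands in for a node that would, in production, run its own narrow prompt; the function names and the simple heuristics are illustrative assumptions, not the shipped logic.

```python
# Each "node" reads the shared state and writes back one narrow decision,
# mirroring how the LangGraph nodes coordinate through graph state.
def detect_intent(state):
    # Stub: the real node classifies what the candidate meant via an LLM call.
    state["intent"] = "answer" if state["transcript"].strip() else "silence"

def check_completeness(state):
    # Stub: the real node judges substance, not word count.
    state["complete"] = len(state["transcript"].split()) >= 8

def generate_followup(state):
    if state["complete"]:
        state["next_utterance"] = "Thanks, let's move to the next topic."
    else:
        state["next_utterance"] = "Could you give a concrete example of that?"

PIPELINE = [detect_intent, check_completeness, generate_followup]

def run_turn(transcript):
    state = {"transcript": transcript}
    for node in PIPELINE:  # LangGraph routes this graph conditionally in production
        node(state)
    return state
```

Adding a behaviour (say, off-topic redirection) means appending one more narrow node rather than growing a single prompt, which is the scaling property the text describes.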
How It All Works Together
Voice Processing Layer
ElevenLabs Flash models handle real-time speech recognition and synthesis across English, Russian, and Ukrainian. LiveKit manages the WebRTC media transport and orchestrates concurrent interview rooms, enabling the platform to run dozens of parallel interviews without contention. Voice activity detection and turn-taking logic ensure natural conversational pacing.
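To make the turn-taking logic concrete, here is a minimal energy-based end-of-turn detector. It is a sketch under stated assumptions (20 ms frames, a fixed energy threshold, a silence "hangover" window); production VAD models are learned, not threshold-based, and the parameter values here are illustrative only.

```python
def end_of_turn(frames, threshold=0.01, silence_ms=700, frame_ms=20):
    """Decide the candidate has finished their turn once we have seen
    `silence_ms` of consecutive low-energy frames after any speech."""
    needed = silence_ms // frame_ms  # silent frames required before yielding the turn
    silent_run = 0
    spoke = False
    for energy in frames:
        if energy > threshold:
            spoke = True       # candidate is (still) speaking
            silent_run = 0     # reset the silence window
        else:
            silent_run += 1
            if spoke and silent_run >= needed:
                return True    # long enough pause after speech: take the turn
    return False
```

The `silence_ms` window is the pacing knob: too short and the agent interrupts mid-thought, too long and the conversation feels laggy.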
Conversation Orchestration Layer
Claude Sonnet conducts the interview using a structured conversation graph. The model decides which question to ask next based on the candidate responses, when to probe deeper into a topic, and when an assessment dimension has enough signal to score. LangGraph manages the conversation state and ensures consistency across candidates and across languages.
Conversation Decision Layer
A graph of specialised nodes built on LangGraph handles each conversational decision in its own narrow prompt: intent recognition, response completeness, follow-up generation, topic transition, and language detection. This pattern keeps accuracy high under conversational load. A single mega-prompt would crack at around 20 to 30 distinct scenarios; the graph approach scales cleanly as the question set and use cases grow.
Reporting and Integration Layer
Scored reports are written to PostgreSQL with vector embeddings stored in PGVector for similarity search across candidates. BigQuery powers the recruiter-facing analytics dashboards. The platform exposes APIs that integrate into customers' ATS and HRIS systems so reports flow directly into recruiters' existing workflows.
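What the PGVector similarity search does can be spelled out in plain Python: rank stored candidate embeddings by cosine distance to a query embedding and keep the closest k. The record shape and function names below are hypothetical; in production this is a single indexed SQL query against the vector column rather than an in-memory scan.

```python
import math

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def most_similar(query, reports, k=5):
    """In-memory equivalent of a PGVector cosine-distance query:
    order stored reports by distance to `query`, keep the top k."""
    return sorted(reports, key=lambda r: cosine_distance(query, r["embedding"]))[:k]
```

This is the recruiter-facing "find candidates like this one" operation; the database index makes it fast at scale, but the ranking semantics are exactly these.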
Value and Impact Delivered
Measurable improvements across every dimension of operations.
300+ Live Interviews per Week
Voice agent in production handling 300+ live interviews per week, removing senior interviewer capacity as the limiting factor on platform growth.
65+ Adaptive Questions per Interview
Each candidate receives a tailored interview with 65+ questions adapted to the role and their profile, plus follow-up questions generated dynamically during the conversation.
Sub-2-Second Response Latency
Real-time integrated voice models keep round-trip latency under 2 seconds, so the conversation feels natural rather than scripted.
3 Languages, Production Quality
English, Russian, and Ukrainian supported at production fidelity. ElevenLabs chosen specifically for multilingual quality where standard voice models drop accuracy.
Dozens of Concurrent Interviews
LiveKit-based architecture scales horizontally, running many interviews in parallel without queue waiting during peak hiring cycles.
High-Quality Interview Data
The voice agent's adaptive interviewing produces structured interview data on which downstream scoring achieves 90%+ accuracy versus senior expert analysts. See our forthcoming AI assessment case study for the analysis side.
Frequently Asked Questions
Ready to Transform Your HR Tech Operations?
See how AI can help your organisation reduce errors, speed up processing, and improve outcomes. Let's discuss your specific challenges.
Book Discovery Call
Explore Other Projects
Discover more AI solutions delivering measurable results across industries