AI that conducts the interview itself.
AI Voice Interviewer for an HR Tech Platform
A real-time voice AI that generates adaptive questions tailored to each role, conducts the conversation in three languages, and decides on the fly when to probe deeper, when to move on, and when to end the interview. 300+ live interviews per week, in production.
Book a Case Walkthrough
The client operates an AI-powered candidate assessment platform serving recruitment, HR, and talent acquisition teams. Their core process required a 2 to 3 hour structured human interview per candidate, conducted by trained expert analysts assessing personality, cognitive style, and role fit. Demand was outgrowing the analyst team, with a queue of candidates waiting weeks to be interviewed and analyst capacity becoming the binding constraint on platform growth.
The Challenge
Senior interviewers were the binding constraint on platform throughput. The platform needed a voice AI that could run live interviews itself, adapt to each candidate, and handle three languages at production quality.
- Senior interviewers spending 2 to 3 hours per candidate, capping how many assessments the platform could deliver
- Inconsistent interview quality across human interviewers, complicating fair comparison between candidates
- Multilingual requirement across English, Russian, and Ukrainian, with poor voice quality from off-the-shelf models on non-English languages
- Need for adaptive interview flow: the conversation must follow what the candidate says, not just read down a script
- Strict consistency requirement so every candidate received the same structured interview experience regardless of who or what conducted it
- High candidate volume requiring concurrent interview capacity, not a queue
Our Solution
Softblues built a real-time AI voice interviewer that generates a unique interview per candidate, runs the conversation in three languages, and adapts on the fly. It frees senior interviewers from conducting every live session while preserving recruiter oversight and decision-making.
- Real-time AI voice interviewer that generates a unique 65+ question set per candidate, adapted to the role and the candidate profile
- Adaptive conversation flow that decides when to probe deeper, when to move on, and when to gently redirect off-topic responses
- ElevenLabs voice models selected specifically for multilingual quality, supporting English, Russian, and Ukrainian at production-grade fidelity
- LiveKit for concurrent room management, enabling dozens of parallel interviews without queue waiting
- Voice activity detection and turn-taking logic for natural conversational pacing
- Multi-agent orchestration via LangGraph so each conversational decision (intent, completeness, follow-up generation) runs in its own narrow node
Built with Enterprise-Grade Technology
Goals and Objectives
The client came to us with clear objectives to transform their operations.
Replace Live-Interview Bottleneck
Free senior interviewers from running every live session themselves. Voice agent conducts the interview while recruiters retain oversight and decision-making.
Adaptive Question Generation
Generate a unique, role-appropriate interview for every candidate. No two interviews identical; every conversation tailored to position and respondent.
Production Multilingual Voice
Deliver natural, real-time voice interviews in English, Russian, and Ukrainian with consistent quality across all three languages, not just English.
Conversational Intelligence
The AI must probe when answers are shallow, move on when they are complete, and redirect when they go off-topic. Not a script reader.
Concurrent Capacity
Run dozens of interviews in parallel without queue waiting. Architecture must scale horizontally for peak hiring cycles.
See the Platform in Action
From intake to completion, explore how the solution transforms operations.
Adaptive Interview Engine
The agent conducts a structured interview drawn from a bank of 65+ questions across multiple competency dimensions. Claude Sonnet chooses the next question in real time based on what the candidate has already said, which areas still need evidence, and how much time remains. Every candidate experiences a coherent structured interview, but the path through the question bank is adaptive: no two candidates are asked exactly the same sequence, and yet every candidate is evaluated against the same framework.
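The selection logic described above can be sketched in miniature. This is an illustrative reconstruction, not the production code: the `Question`, `InterviewState`, and `pick_next_question` names are hypothetical, and in the real system Claude Sonnet makes this choice with far richer context than an evidence tally.

```python
from dataclasses import dataclass, field

@dataclass
class Question:
    id: str
    dimension: str  # competency dimension this question probes

@dataclass
class InterviewState:
    remaining: list                                   # questions not yet asked
    evidence: dict = field(default_factory=dict)      # dimension -> accumulated signal

def pick_next_question(state: InterviewState, target: float = 1.0):
    """Pick a question for the dimension with the least evidence so far,
    skipping dimensions that already have enough signal to score."""
    open_dims = {q.dimension for q in state.remaining}
    for dim in sorted(open_dims, key=lambda d: state.evidence.get(d, 0.0)):
        if state.evidence.get(dim, 0.0) >= target:
            continue  # this dimension is already covered
        for q in state.remaining:
            if q.dimension == dim:
                return q
    return None  # every dimension has enough signal -> wrap up the interview
```

The key property matches the text: the path through the bank adapts per candidate, but the evaluation framework (the set of dimensions and the evidence target) stays fixed for everyone.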
Real-Time Conversation Engine
Candidates connect via web or telephone. ElevenLabs handles real-time speech recognition and synthesis with production-grade quality in English, Russian, and Ukrainian. LiveKit manages concurrent interview rooms so the platform scales without queue waiting. Claude Sonnet orchestrates the conversation flow and decides when to probe, when to move on, and when to end the interview. Round-trip latency stays under two seconds, which is the threshold at which conversations feel natural rather than stilted.
Multi-Agent Orchestration Inside the Engine
The conversation engine from Block 02 is itself a graph of specialised LangGraph nodes. Each node handles one narrow conversational decision: intent recognition (what did the candidate mean), completeness check (is the answer complete enough), follow-up generation (what to ask next), topic transition (when to move on), time pacing (are we on schedule), and language detection (has the candidate switched languages). The nodes coordinate through a shared graph state, each reading what others have written and contributing its own decision. This is why the voice agent stays accurate across hundreds of distinct conversational scenarios. A single mega-prompt would crack at 20 to 30 scenarios; the graph approach scales cleanly per node, and new conversational behaviours are added as new nodes rather than as bigger prompts.
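The node-per-decision pattern can be shown in plain Python, stripped of the LLM calls and of LangGraph's routing. Each stub below stands in for a node that would, in production, run its own narrow prompt; the function names and the simple heuristics are illustrative assumptions, not the shipped logic.

```python
# Each "node" reads the shared state and writes back one narrow decision,
# mirroring how the LangGraph nodes coordinate through graph state.
def detect_intent(state):
    # Stub: the real node classifies what the candidate meant via an LLM call.
    state["intent"] = "answer" if state["transcript"].strip() else "silence"

def check_completeness(state):
    # Stub: the real node judges substance, not word count.
    state["complete"] = len(state["transcript"].split()) >= 8

def generate_followup(state):
    if state["complete"]:
        state["next_utterance"] = "Thanks, let's move to the next topic."
    else:
        state["next_utterance"] = "Could you give a concrete example of that?"

PIPELINE = [detect_intent, check_completeness, generate_followup]

def run_turn(transcript):
    state = {"transcript": transcript}
    for node in PIPELINE:  # LangGraph routes this graph conditionally in production
        node(state)
    return state
```

Adding a behaviour (say, off-topic redirection) means appending one more narrow node rather than growing a single prompt, which is the scaling property the text describes.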
How It All Works Together
Voice Processing Layer
ElevenLabs Flash models handle real-time speech recognition and synthesis across English, Russian, and Ukrainian. LiveKit manages the WebRTC media transport and orchestrates concurrent interview rooms, enabling the platform to run dozens of parallel interviews without contention. Voice activity detection and turn-taking logic ensure natural conversational pacing.
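To make the turn-taking logic concrete, here is a minimal energy-based end-of-turn detector. It is a sketch under stated assumptions (20 ms frames, a fixed energy threshold, a silence "hangover" window); production VAD models are learned, not threshold-based, and the parameter values here are illustrative only.

```python
def end_of_turn(frames, threshold=0.01, silence_ms=700, frame_ms=20):
    """Decide the candidate has finished their turn once we have seen
    `silence_ms` of consecutive low-energy frames after any speech."""
    needed = silence_ms // frame_ms  # silent frames required before yielding the turn
    silent_run = 0
    spoke = False
    for energy in frames:
        if energy > threshold:
            spoke = True       # candidate is (still) speaking
            silent_run = 0     # reset the silence window
        else:
            silent_run += 1
            if spoke and silent_run >= needed:
                return True    # long enough pause after speech: take the turn
    return False
```

The `silence_ms` window is the pacing knob: too short and the agent interrupts mid-thought, too long and the conversation feels laggy.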
Conversation Orchestration Layer
Claude Sonnet conducts the interview using a structured conversation graph. The model decides which question to ask next based on the candidate responses, when to probe deeper into a topic, and when an assessment dimension has enough signal to score. LangGraph manages the conversation state and ensures consistency across candidates and across languages.
Conversation Decision Layer
A graph of specialised nodes built on LangGraph handles each conversational decision in its own narrow prompt: intent recognition, response completeness, follow-up generation, topic transition, and language detection. This pattern keeps accuracy high under conversational load. A single mega-prompt would crack at around 20 to 30 distinct scenarios; the graph approach scales cleanly as the question set and use cases grow.
Reporting and Integration Layer
Scored reports are written to PostgreSQL with vector embeddings stored in PGVector for similarity search across candidates. BigQuery powers the recruiter-facing analytics dashboards. The platform exposes APIs that integrate into customers' ATS and HRIS systems so reports flow directly into recruiters' existing workflows.
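What the PGVector similarity search does can be spelled out in plain Python: rank stored candidate embeddings by cosine distance to a query embedding and keep the closest k. The record shape and function names below are hypothetical; in production this is a single indexed SQL query against the vector column rather than an in-memory scan.

```python
import math

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def most_similar(query, reports, k=5):
    """In-memory equivalent of a PGVector cosine-distance query:
    order stored reports by distance to `query`, keep the top k."""
    return sorted(reports, key=lambda r: cosine_distance(query, r["embedding"]))[:k]
```

This is the recruiter-facing "find candidates like this one" operation; the database index makes it fast at scale, but the ranking semantics are exactly these.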
Value and Impact Delivered
Measurable improvements across every dimension of operations.
300+ Live Interviews per Week
Voice agent in production handling 300+ live interviews per week, removing senior interviewer capacity as the limiting factor on platform growth.
65+ Adaptive Questions per Interview
Each candidate receives a tailored interview with 65+ questions adapted to the role and their profile, plus follow-up questions generated dynamically during the conversation.
Sub-2-Second Response Latency
Real-time integrated voice models keep round-trip latency under 2 seconds, so the conversation feels natural rather than scripted.
3 Languages, Production Quality
English, Russian, and Ukrainian supported at production fidelity. ElevenLabs chosen specifically for multilingual quality where standard voice models drop accuracy.
Dozens of Concurrent Interviews
LiveKit-based architecture scales horizontally, running many interviews in parallel without queue waiting during peak hiring cycles.
High-Quality Interview Data
The voice agent's adaptive interviewing produces structured interview data on which downstream scoring achieves 90%+ accuracy versus senior expert analysts. See our forthcoming AI assessment case study for the analysis side.
Frequently Asked Questions
Ready to Transform Your HR Tech Operations?
See how AI can help your organisation reduce errors, speed up processing, and improve outcomes. Let's discuss your specific challenges.
Book Discovery Call
Explore Other Projects
Discover more AI solutions delivering measurable results across industries