Building Voice Agents with Azure Communication Services, the Voice Live API, and Azure AI Agent Service
🎯 TL;DR: Real-time Voice Agent Implementation
This post walks through building a voice agent that connects traditional phone calls to Azure’s AI services. The system intercepts incoming calls via Azure Communication Services, streams audio in real time to the Voice Live API, and processes conversations through pre-configured AI agents in Azure AI Studio. The implementation uses FastAPI for webhook handling, WebSocket connections for bidirectional audio streaming, and Azure Managed Identity for authentication (no API keys to manage). The architecture handles multiple concurrent calls on a single Python thread using asyncio.
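To make the webhook side concrete, here is a minimal sketch of how such an entry point can look, assuming the azure-identity and azure-communication-callautomation packages. The route path and the ACS_ENDPOINT / CALLBACK_URI environment variables are illustrative placeholders, not the project's exact code.

```python
# Minimal webhook sketch: Event Grid delivers IncomingCall events, and the app
# answers the call via ACS Call Automation. Route and env var names are illustrative.
import os
from fastapi import FastAPI, Request
from azure.identity import DefaultAzureCredential
from azure.communication.callautomation import CallAutomationClient

app = FastAPI()

# Managed Identity in Azure (developer credentials locally) -- no API keys to manage.
credential = DefaultAzureCredential()
acs_client = CallAutomationClient(os.environ["ACS_ENDPOINT"], credential)

@app.post("/api/incoming-call")
async def incoming_call(request: Request):
    events = await request.json()
    for event in events:
        # Event Grid first sends a validation handshake that must be echoed back.
        if event["eventType"] == "Microsoft.EventGrid.SubscriptionValidationEvent":
            return {"validationResponse": event["data"]["validationCode"]}
        if event["eventType"] == "Microsoft.Communication.IncomingCall":
            # The full implementation would also pass media streaming options that
            # point ACS at the WebSocket audio handler (omitted in this sketch).
            acs_client.answer_call(
                incoming_call_context=event["data"]["incomingCallContext"],
                callback_url=f"{os.environ['CALLBACK_URI']}/api/callbacks",
            )
    return {}
```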
Implementation details: Audio resampling between 16kHz (ACS requirement) and 24kHz (Voice Live requirement), connection resilience for preview services, and production deployment considerations. Full source code and documentation are available here.
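Conceptually, each audio chunk just needs a sample-rate conversion on its way through the bridge. Below is a minimal sketch using the standard library's audioop module (removed in Python 3.13, so the audioop-lts backport or a DSP library would be needed there); the helper names are mine, not from the repo.

```python
# Minimal resampling sketch for 16-bit mono PCM, using the stdlib audioop module
# (removed in Python 3.13 -- use the audioop-lts backport or a DSP library there).
import audioop

def upsample_16k_to_24k(pcm16k: bytes, state=None):
    """Upsample a chunk of 16 kHz PCM from ACS to the 24 kHz Voice Live expects."""
    return audioop.ratecv(pcm16k, 2, 1, 16000, 24000, state)  # -> (pcm24k, new_state)

def downsample_24k_to_16k(pcm24k: bytes, state=None):
    """Downsample a chunk of 24 kHz PCM from Voice Live back to 16 kHz for ACS."""
    return audioop.ratecv(pcm24k, 2, 1, 24000, 16000, state)
```

Passing the returned state back in with the next chunk keeps the conversion continuous across chunk boundaries instead of introducing audible clicks.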
Recently, I found myself co-leading an innovation project that pushed me into uncharted territory. The challenge? Developing a voice-based agentic solution with an ambitious goal - routing at least 25% of current contact center calls to AI voice agents. This was bleeding-edge stuff, with both the Azure Voice Live API and Azure AI Agent Service voice agents still in preview at the time of writing.
When you’re working with preview services, documentation is often sparse, and you quickly learn that reverse engineering network calls and maintaining close relationships with product teams becomes part of your daily routine. This blog post shares the practical lessons learned and the working solution we built to integrate these cutting-edge services.
The Innovation Challenge
Building a voice agent system that could handle real customer interactions meant tackling several complex requirements:
- Real-time voice processing with minimal latency
- Natural conversation flow without awkward pauses
- Integration with existing contact center infrastructure
- Scalability to handle multiple concurrent calls
- Reliability for production use cases
With both the Azure Voice Live API and Azure AI Agent Service in preview, we were essentially building on shifting sands. But that’s what innovation is about - pushing boundaries and finding solutions where documentation doesn’t yet exist.
Understanding the Architecture
Our solution bridges Azure Communication Services (ACS) with Azure AI services to create an intelligent voice agent. Here’s how the pieces fit together:
```mermaid
graph TB
    subgraph "Phone Network"
        PSTN[📞 PSTN Number<br/>+1-555-123-4567]
    end
    subgraph "Azure Communication Services"
        ACS[🔗 ACS Call Automation<br/>Event Grid Webhooks]
        MEDIA[🎵 Media Streaming<br/>WebSocket Audio]
    end
    subgraph "Python FastAPI App"
        API[🐍 FastAPI Server<br/>localhost:49412]
        WS[🔌 WebSocket Handler<br/>Audio Processing]
        HANDLER[⚡ Media Handler<br/>Audio Resampling]
    end
    subgraph "Azure OpenAI"
        VOICE[🤖 Voice Live API<br/>Agent Mode<br/>gpt-4o Realtime]
        AGENT[👤 Pre-configured Agent<br/>Azure AI Studio]
    end
    subgraph "Dev Infrastructure"
        TUNNEL[🚇 Dev Tunnel<br/>Public HTTPS Endpoint]
    end

    PSTN -->|Incoming Call| ACS
    ACS -->|Webhook Events| TUNNEL
    TUNNEL -->|HTTPS| API
    ACS -->|WebSocket Audio| WS
    WS -->|PCM 16kHz| HANDLER
    HANDLER -->|PCM 24kHz| VOICE
    VOICE -->|Agent Processing| AGENT
    AGENT -->|AI Response| VOICE
    VOICE -->|AI Response| HANDLER
    HANDLER -->|PCM 16kHz| WS
    WS -->|Audio Stream| ACS
    ACS -->|Audio| PSTN

    style PSTN fill:#ff9999
    style ACS fill:#87CEEB
    style API fill:#90EE90
    style VOICE fill:#DDA0DD
    style TUNNEL fill:#F0E68C
```
Core Components
- Azure Communication Services: Handles the telephony infrastructure, providing phone numbers and call routing
- Voice Live API: Enables real-time speech recognition and synthesis with WebRTC streaming
- Azure AI Agent Service: Provides the intelligence layer for understanding and responding to customer queries
- WebSocket Bridge: Our custom Python application that connects these services (a conceptual sketch follows below)
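To show what the bridge boils down to, here is a conceptual sketch of one call: two asyncio tasks pumping audio in opposite directions between the ACS media WebSocket and a Voice Live session. The Voice Live URL, auth header, and JSON message shapes are illustrative placeholders for the preview contract, not verbatim from the service documentation; acs_ws is assumed to be an already-accepted FastAPI/Starlette WebSocket.

```python
# Conceptual bridge sketch for one call: both pumps share the asyncio event
# loop, which is how a single Python thread can serve many concurrent calls.
import asyncio
import audioop  # stdlib up to Python 3.12; see resampling note above
import base64
import json

import websockets  # third-party 'websockets' package


async def bridge_call(acs_ws, voice_live_url: str, token: str):
    async with websockets.connect(
        voice_live_url,  # placeholder wss:// endpoint for the Voice Live session
        additional_headers={"Authorization": f"Bearer {token}"},  # 'extra_headers' on older releases
    ) as vl_ws:
        up_state = down_state = None  # resampler state carried across chunks

        async def pump_acs_to_voicelive():
            nonlocal up_state
            # ACS media streaming sends JSON frames; AudioData frames carry 16 kHz PCM.
            async for message in acs_ws.iter_text():
                frame = json.loads(message)
                if frame.get("kind") == "AudioData":
                    pcm16k = base64.b64decode(frame["audioData"]["data"])
                    pcm24k, up_state = audioop.ratecv(pcm16k, 2, 1, 16000, 24000, up_state)
                    await vl_ws.send(json.dumps({
                        "type": "input_audio_buffer.append",
                        "audio": base64.b64encode(pcm24k).decode(),
                    }))

        async def pump_voicelive_to_acs():
            nonlocal down_state
            # Voice Live streams JSON events; audio deltas carry 24 kHz PCM.
            async for message in vl_ws:
                event = json.loads(message)
                if event.get("type") == "response.audio.delta":
                    pcm24k = base64.b64decode(event["delta"])
                    pcm16k, down_state = audioop.ratecv(pcm24k, 2, 1, 24000, 16000, down_state)
                    await acs_ws.send_text(json.dumps({
                        "kind": "AudioData",
                        "audioData": {"data": base64.b64encode(pcm16k).decode()},
                    }))

        # Run both directions concurrently until either side disconnects.
        await asyncio.gather(pump_acs_to_voicelive(), pump_voicelive_to_acs())
```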