
Azure AI VoiceLive SDK for Java


Build Real-Time Voice Apps with Azure AI

Developers need a way to integrate real-time bidirectional voice conversations into Java applications. This skill provides complete code examples and best practices for implementing voice AI using Azure AI VoiceLive SDK with WebSocket streaming.

Supports: Claude Code (CC)
Overall quality score: 69 (Adequate)
1. Download the skill ZIP
2. Upload in Claude: go to Settings → Capabilities → Skills → Upload skill
3. Toggle it on and start using it

Test it

Using "Azure AI VoiceLive SDK for Java". Create a VoiceLive client with DefaultAzureCredential

Expected outcome:

```java
VoiceLiveAsyncClient client = new VoiceLiveClientBuilder()
    .endpoint(System.getenv("AZURE_VOICELIVE_ENDPOINT"))
    .credential(new DefaultAzureCredentialBuilder().build())
    .buildAsyncClient();
```

Using "Azure AI VoiceLive SDK for Java". Configure turn detection for natural conversation

Expected outcome:

```java
ServerVadTurnDetection turnDetection = new ServerVadTurnDetection()
    .setThreshold(0.5)
    .setPrefixPaddingMs(300)
    .setSilenceDurationMs(500)
    .setInterruptResponse(true)
    .setAutoTruncate(true);
```
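To build intuition for what the threshold and silence-duration settings control, here is a purely illustrative, self-contained energy-based voice activity sketch (the class and method names are hypothetical; this is not the SDK's server-side implementation):

```java
public class VadSketch {
    // Returns true once RMS energy stays below `threshold` for at least
    // silenceDurationMs, i.e. the speaker's turn is considered ended.
    static boolean turnEnded(short[] samples, int sampleRate,
                             double threshold, int silenceDurationMs) {
        int frameSize = sampleRate / 100;          // 10 ms frames
        int framesNeeded = silenceDurationMs / 10; // consecutive silent frames
        int silentRun = 0;
        for (int start = 0; start + frameSize <= samples.length; start += frameSize) {
            double sumSq = 0;
            for (int i = start; i < start + frameSize; i++) {
                double s = samples[i] / 32768.0;   // normalize to [-1, 1]
                sumSq += s * s;
            }
            double rms = Math.sqrt(sumSq / frameSize);
            silentRun = (rms < threshold) ? silentRun + 1 : 0;
            if (silentRun >= framesNeeded) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        short[] silence = new short[24_000];       // 1 s of silence at 24 kHz
        System.out.println(turnEnded(silence, 24_000, 0.5, 500)); // prints true
    }
}
```

A lower threshold makes the detector more sensitive to quiet speech; a longer silence duration tolerates longer pauses before ending the turn.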

Security Audit

Safe
v1 • 2/24/2026

Prompt-only documentation skill containing code examples for the Azure AI VoiceLive SDK. Static analysis scanned 0 files and found 0 potential security issues. Risk score: 0/100. No suspicious patterns detected. The skill provides legitimate documentation for a Microsoft Azure service and contains no executable code.

  • Files scanned: 0
  • Lines analyzed: 0
  • Findings: 0
  • Total audits: 1
No security issues found
Audited by: claude

Quality Score

  • Architecture: 38
  • Maintainability: 100
  • Content: 87
  • Community: 31
  • Security: 100
  • Spec Compliance: 74

What You Can Build

Customer Service Voice Bot

Build an interactive voice assistant for customer support that handles inquiries in real-time using natural speech

Accessibility Tool Development

Create voice-enabled applications for users who prefer voice interaction over text-based interfaces

IoT Voice Control Interface

Implement voice control for IoT devices with low-latency bidirectional communication

Try These Prompts

Basic Voice Client Setup
Show me how to set up a basic VoiceLiveAsyncClient in Java with API key authentication using the Azure AI VoiceLive SDK.
Configure Voice Session
How do I configure VoiceLiveSessionOptions with turn detection, voice selection, and audio format settings for a natural conversation flow?
Handle Voice Events
Write Java code to handle voice events including speech start/stop detection, audio delta streaming, and error handling in the VoiceLive session.
Implement Function Calling
Show me how to integrate function calling with VoiceLive to enable the AI assistant to execute real actions like weather lookups during conversation.

Best Practices

  • Use DefaultAzureCredential instead of API keys for production deployments to leverage Azure managed identities
  • Configure ServerVadTurnDetection with appropriate threshold and silence duration to match your use case for natural conversation flow
  • Always implement proper error handling and reconnection logic for production voice applications
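The reconnection advice above is commonly implemented with capped exponential backoff. A minimal sketch, assuming you rebuild the VoiceLive session after each wait (the class and method names here are illustrative, not SDK APIs):

```java
public class ReconnectBackoff {
    // Capped exponential backoff: 1 s, 2 s, 4 s, ... up to maxDelayMs.
    static long delayForAttempt(int attempt, long baseDelayMs, long maxDelayMs) {
        long delay = baseDelayMs << Math.min(attempt, 20); // clamp shift to avoid overflow
        return Math.min(delay, maxDelayMs);
    }

    public static void main(String[] args) {
        for (int attempt = 0; attempt < 5; attempt++) {
            long delay = delayForAttempt(attempt, 1_000, 30_000);
            System.out.println("attempt " + attempt + " waits " + delay + " ms");
            // In a real client you would Thread.sleep(delay) here,
            // then rebuild the WebSocket session before retrying.
        }
    }
}
```

Adding random jitter to each delay is also a common refinement to avoid thundering-herd reconnects.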

Avoid

  • Do not hardcode API keys in source code - use environment variables or Azure Key Vault instead
  • Avoid blocking calls in reactive streams - use non-blocking patterns throughout
  • Do not skip audio format validation - ensure input matches 24kHz 16-bit PCM requirements
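The 24 kHz 16-bit PCM requirement from the last bullet can be expressed and checked with the standard javax.sound.sampled API before streaming. The format values come from the VoiceLive requirements stated above; the class and helper names are illustrative:

```java
import javax.sound.sampled.AudioFormat;

public class VoiceLiveAudioFormat {
    // 24 kHz sample rate, 16-bit, mono, signed PCM, little-endian (bigEndian = false)
    static AudioFormat required() {
        return new AudioFormat(24_000f, 16, 1, true, false);
    }

    // Validate that a capture line's format matches the VoiceLive input requirements.
    static boolean matches(AudioFormat f) {
        return f.getSampleRate() == 24_000f
                && f.getSampleSizeInBits() == 16
                && f.getChannels() == 1
                && AudioFormat.Encoding.PCM_SIGNED.equals(f.getEncoding())
                && !f.isBigEndian();
    }

    public static void main(String[] args) {
        System.out.println(matches(required())); // prints true
    }
}
```

Running this check against your microphone line's actual format catches mismatches before they surface as garbled audio in the session.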

Frequently Asked Questions

What audio format does Azure AI VoiceLive require?
Azure AI VoiceLive requires 24kHz sample rate, 16-bit PCM, mono channel, signed PCM little-endian format.
How do I authenticate with Azure AI VoiceLive?
You can use AzureKeyCredential with API key, or DefaultAzureCredential for managed identity support in production.
What voices are available for Azure AI VoiceLive?
The SDK supports OpenAI voices (ALLOY, ASH, BALLAD, CORAL, ECHO, SAGE, SHIMMER, VERSE) and Azure voices including Standard, Custom, and Personal voices.
Can I use function calling with VoiceLive?
Yes, you can define functions using VoiceLiveFunctionDefinition and pass them via setTools() in VoiceLiveSessionOptions.
How does turn detection work in VoiceLive?
ServerVadTurnDetection uses voice activity detection to automatically detect when the user starts and stops speaking, with configurable threshold and silence duration.
What is the difference between TEXT and AUDIO modalities?
The TEXT modality sends and receives text; the AUDIO modality sends and receives audio. You can combine both using Arrays.asList(InteractionModality.TEXT, InteractionModality.AUDIO).
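Since audio deltas arrive as little-endian signed 16-bit PCM bytes (per the format in the first FAQ answer), decoding them into samples in Java is a ByteBuffer one-liner. The class and helper names below are illustrative:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class PcmDecode {
    // Decode little-endian signed 16-bit PCM bytes into 16-bit samples.
    static short[] toSamples(byte[] pcm) {
        short[] samples = new short[pcm.length / 2];
        ByteBuffer.wrap(pcm)
                .order(ByteOrder.LITTLE_ENDIAN)
                .asShortBuffer()
                .get(samples);
        return samples;
    }

    public static void main(String[] args) {
        // Sample value 1 is bytes {0x01, 0x00} in little-endian; -1 is {0xFF, 0xFF}
        byte[] pcm = {0x01, 0x00, (byte) 0xFF, (byte) 0xFF};
        short[] s = toSamples(pcm);
        System.out.println(s[0] + " " + s[1]); // prints 1 -1
    }
}
```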

Developer Details

File structure

📄 SKILL.md