Azure Speech to Text REST API for Short Audio

Name: Azure Speech to Text REST API for Short Audio
Author: sickn33

Transcribe Audio with Azure Speech to Text REST API

Transcribe short audio files (up to 60 seconds) using the Azure Speech to Text REST API, without installing the Speech SDK. This skill provides ready-to-use Python code for authentication, audio formatting, and error handling.

Supports: Claude, Codex, Code (CC)

📊 70 Adequate

1. Download the skill ZIP.
2. Upload it in Claude: go to Settings → Capabilities → Skills → Upload skill.
3. Toggle the skill on and start using it.

Test it

Using "Azure Speech to Text REST API for Short Audio". Transcribe audio.wav using Azure Speech to Text with detailed format

Expected outcome:

Transcription successful!

Display Text: What's the weather like today?
Confidence: 90.5%

The audio has been transcribed using Azure Speech to Text REST API with detailed response format.

Using "Azure Speech to Text REST API for Short Audio". How do I authenticate with Azure Speech to Text using bearer tokens?

Expected outcome:

You can authenticate using bearer tokens for improved security. The token is valid for 10 minutes. Here's how to implement it:

1. Get access token from the token endpoint
2. Use the token in the Authorization header
3. Cache the token and renew before expiration

This approach is recommended for production applications.
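
The three steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not the skill's own code: `token_url`, `fetch_token`, and `TokenCache` are hypothetical names, and the 9-minute renewal window is an assumption chosen to stay inside the 10-minute token lifetime. The token endpoint and the `Ocp-Apim-Subscription-Key` header follow the documented Azure pattern; verify them against your resource before relying on this.

```python
import time
from typing import Callable, Dict, Optional


def token_url(region: str) -> str:
    """Build the token endpoint URL for a Speech resource region."""
    return f"https://{region}.api.cognitive.microsoft.com/sts/v1.0/issueToken"


def fetch_token(region: str, api_key: str) -> str:
    """Step 1: exchange the resource key for a bearer token (network call)."""
    import requests  # imported lazily so TokenCache works without requests

    response = requests.post(
        token_url(region),
        headers={"Ocp-Apim-Subscription-Key": api_key},
        timeout=10,
    )
    response.raise_for_status()
    return response.text  # the response body is the token itself


class TokenCache:
    """Step 3: cache a token and renew it before the 10-minute expiry."""

    def __init__(
        self,
        fetch: Callable[[], str],
        ttl_seconds: float = 9 * 60,  # assumption: renew one minute early
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._expires_at:
            self._token = self._fetch()
            self._expires_at = now + self._ttl
        return self._token

    def auth_header(self) -> Dict[str, str]:
        """Step 2: the header to send with each recognition request."""
        return {"Authorization": f"Bearer {self.get()}"}
```

A cache could be created with `TokenCache(lambda: fetch_token(region, api_key))`, and `auth_header()` would then replace the subscription-key header on recognition requests.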

Security Audit

Safe

v1 • 2/25/2026

This is a prompt-only documentation skill that provides guidance for integrating with the Azure Speech to Text REST API. Static analysis found no suspicious patterns, no executable code, and no risk factors. The skill consists of documentation and code examples for legitimate API integration. No security concerns were identified.

No security issues found

Audited by: claude

Quality Score

Architecture: 100

Maintainability

Content

Community: 100

Security

Spec Compliance

What You Can Build

Transcribe Voice Memos

Convert short voice recordings to text for note-taking and documentation

Automated Call Transcription

Transcribe short phone call recordings for analysis and records

Multilingual Content Transcription

Transcribe audio content in multiple languages using Azure language support

Try These Prompts

Basic Audio Transcription

Use the Azure Speech to Text REST API skill to transcribe the audio file at path [AUDIO_FILE_PATH] to text. Use language [LANGUAGE_CODE] (e.g., en-US).

Detailed Transcription with Confidence

Use the Azure Speech to Text REST API skill to transcribe [AUDIO_FILE_PATH] using detailed format to get confidence scores. Language: [LANGUAGE]. Handle errors gracefully.

Async Transcription for Performance

Use the Azure Speech to Text REST API skill to transcribe [AUDIO_FILE_PATH] asynchronously. Show how to implement the async version with aiohttp for better performance.

Custom Error Handling

Use the Azure Speech to Text REST API skill to write a transcription function that handles all RecognitionStatus values (Success, NoMatch, InitialSilenceTimeout, BabbleTimeout, Error) with appropriate responses.
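
As an illustration of what the last prompt asks for, here is a minimal sketch that maps each `RecognitionStatus` value to a response. The function name `interpret_result`, the returned dictionary shape, and the message wording are assumptions made for this example; it expects a parsed simple-format response.

```python
from typing import Dict, Union

# Hypothetical user-facing messages for each non-success status.
STATUS_MESSAGES = {
    "NoMatch": "Speech was detected, but no words in the target language were matched.",
    "InitialSilenceTimeout": "The start of the audio contained only silence.",
    "BabbleTimeout": "The start of the audio contained only noise.",
    "Error": "The service reported an internal error; retry later.",
}


def interpret_result(result: Dict[str, object]) -> Dict[str, Union[bool, str]]:
    """Turn a parsed simple-format response into an ok/text/message record."""
    status = str(result.get("RecognitionStatus", "Error"))
    if status == "Success":
        return {
            "ok": True,
            "text": str(result.get("DisplayText", "")),
            "message": "Transcription successful.",
        }
    return {
        "ok": False,
        "text": "",
        "message": STATUS_MESSAGES.get(status, f"Unrecognized status: {status}"),
    }
```

Unknown status values fall through to a generic message rather than raising, so a caller can log them without crashing.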

Best Practices

Use WAV PCM format at 16 kHz mono for best recognition accuracy
Cache bearer tokens for 9 minutes (they expire after 10) to avoid repeated authentication
Enable chunked transfer encoding for lower latency on larger files
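
One way to get chunked transfer encoding from Python is to stream the request body from a generator. The sketch below is an assumption about how this could be done, not the skill's own code: `iter_audio_chunks` and the 1024-byte chunk size are hypothetical choices.

```python
from typing import BinaryIO, Iterator


def iter_audio_chunks(stream: BinaryIO, chunk_size: int = 1024) -> Iterator[bytes]:
    """Yield the audio stream in fixed-size pieces until it is exhausted."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk
```

When a generator like this is passed as the `data=` argument of `requests.post`, the requests library sends the body with `Transfer-Encoding: chunked`, so the service can start processing before the upload finishes.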

Avoid

Do not send audio files longer than 60 seconds; use the Batch Transcription API instead
Do not use this for real-time streaming; use the Speech SDK's streaming recognition instead
Do not hardcode API keys in source code; use environment variables
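
A minimal sketch of reading credentials from the environment instead of source code. The variable names `AZURE_SPEECH_KEY` and `AZURE_SPEECH_REGION` and the helper `load_speech_config` are assumptions for this example; use whatever names your deployment defines.

```python
import os
from typing import Mapping, Tuple


def load_speech_config(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (api_key, region), failing loudly if either is missing."""
    key = env.get("AZURE_SPEECH_KEY", "").strip()
    region = env.get("AZURE_SPEECH_REGION", "").strip()
    missing = [
        name
        for name, value in (("AZURE_SPEECH_KEY", key), ("AZURE_SPEECH_REGION", region))
        if not value
    ]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return key, region
```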

Frequently Asked Questions

What audio formats does Azure Speech to Text REST API support?

The Azure Speech to Text REST API for short audio accepts WAV with the PCM codec (16 kHz mono recommended) and OGG with the Opus codec. The audio must be no longer than 60 seconds.
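
Before uploading, a WAV file can be checked against these constraints with the standard-library `wave` module. This is a sketch under assumptions: `check_wav` is a hypothetical helper, and `wave` only reads uncompressed PCM WAV files, so other containers (including OGG) raise `wave.Error` and need a different check.

```python
import wave
from typing import List

MAX_SECONDS = 60.0  # short-audio limit stated above


def check_wav(path: str) -> List[str]:
    """Return a list of problems; an empty list means the file looks suitable."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        seconds = wav.getnframes() / float(rate)

    problems = []
    if channels != 1:
        problems.append(f"expected mono audio, found {channels} channels")
    if rate != 16000:
        problems.append(f"expected 16000 Hz sample rate, found {rate} Hz")
    if seconds > MAX_SECONDS:
        problems.append(f"audio is {seconds:.1f} s, longer than {MAX_SECONDS:.0f} s")
    return problems
```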

Do I need to install the Azure Speech SDK to use this skill?

No. This skill calls the REST API directly with the requests library, so no SDK installation is required. You only need to install the requests package.
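
For orientation, a direct call with requests might look like the sketch below. It is not the skill's own code: `build_request` and `transcribe_file` are hypothetical names, the URL follows the documented short-audio endpoint pattern, and the `Content-Type` value assumes 16 kHz PCM WAV input.

```python
from typing import Dict, Tuple


def build_request(
    region: str, api_key: str, language: str = "en-US", detailed: bool = False
) -> Tuple[str, Dict[str, str], Dict[str, str]]:
    """Return (url, query params, headers) for a short-audio recognition call."""
    url = (
        f"https://{region}.stt.speech.microsoft.com"
        "/speech/recognition/conversation/cognitiveservices/v1"
    )
    params = {"language": language, "format": "detailed" if detailed else "simple"}
    headers = {
        "Ocp-Apim-Subscription-Key": api_key,
        "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
        "Accept": "application/json",
    }
    return url, params, headers


def transcribe_file(
    path: str, region: str, api_key: str, language: str = "en-US", detailed: bool = False
) -> dict:
    """Upload one audio file and return the parsed JSON response."""
    import requests  # the only third-party dependency; imported lazily

    url, params, headers = build_request(region, api_key, language, detailed)
    with open(path, "rb") as audio:
        response = requests.post(
            url, params=params, headers=headers, data=audio, timeout=30
        )
    response.raise_for_status()
    return response.json()
```

Keeping the request construction separate from the network call makes the URL, language, and format choices easy to inspect or unit-test without contacting the service.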

How do I get Azure Speech to Text credentials?

Create an Azure subscription, then create a Speech resource in the Azure portal. Open the resource's Keys and Endpoint page to get your API key and region.

What is the difference between simple and detailed response format?

Simple format returns the RecognitionStatus, DisplayText, Offset, and Duration. Detailed format adds an NBest list in which each candidate carries a confidence score, lexical form, ITN (inverse text normalization), masked ITN, and display text.
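
To make the detailed format concrete, the sketch below picks the highest-confidence candidate from a parsed response. `best_hypothesis` is a hypothetical helper, and `SAMPLE_DETAILED` is illustrative data written for this example, not captured service output.

```python
from typing import Optional

# Illustrative detailed-format response (values invented for the example).
SAMPLE_DETAILED = {
    "RecognitionStatus": "Success",
    "Offset": 1200000,
    "Duration": 18500000,
    "NBest": [
        {
            "Confidence": 0.905,
            "Lexical": "what's the weather like today",
            "ITN": "what's the weather like today",
            "MaskedITN": "what's the weather like today",
            "Display": "What's the weather like today?",
        },
        {
            "Confidence": 0.61,
            "Lexical": "what's the whether like today",
            "ITN": "what's the whether like today",
            "MaskedITN": "what's the whether like today",
            "Display": "What's the whether like today?",
        },
    ],
}


def best_hypothesis(result: dict) -> Optional[dict]:
    """Return the NBest entry with the highest confidence, or None."""
    candidates = result.get("NBest") or []
    if result.get("RecognitionStatus") != "Success" or not candidates:
        return None
    return max(candidates, key=lambda item: item.get("Confidence", 0.0))


best = best_hypothesis(SAMPLE_DETAILED)
if best is not None:
    print(f"Display Text: {best['Display']}")
    print(f"Confidence: {best['Confidence']:.1%}")
```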

Can I transcribe audio in languages other than English?

Yes, Azure Speech to Text supports many languages. Specify the language using the language query parameter (e.g., de-DE for German, fr-FR for French).

How do I handle authentication errors?

Check that your API key is correct and, if you authenticate with a bearer token, that the token has not expired (tokens are valid for 10 minutes). Ensure the region in your URL matches your resource region. Use bearer tokens for production to avoid key exposure.
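
These checks can be turned into a small lookup that converts an HTTP status code into a troubleshooting hint. The helper name `explain_http_error` and the hint wording are assumptions for this sketch; the meanings attached to 400, 401, and 403 follow the status codes the service documents.

```python
HTTP_HINTS = {
    400: "Bad request: check the language code and that the audio file is valid.",
    401: "Unauthorized: the key or token is invalid for this region, or the endpoint is wrong.",
    403: "Forbidden: the request is missing a resource key or authorization token.",
}


def explain_http_error(status_code: int) -> str:
    """Map an HTTP status code from the service to a troubleshooting hint."""
    return HTTP_HINTS.get(
        status_code, f"Unexpected HTTP status {status_code}; inspect the response body."
    )
```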

Developer Details

Author

sickn33

License

MIT

Repository

https://github.com/sickn33/antigravity-awesome-skills/tree/main/skills/azure-speech-to-text-rest-py

Ref

main

File structure

📄 SKILL.md