speech-to-text

Name: speech-to-text
Author: inference-sh-9

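Safe ⚙️ External commands 🌐 Network access

Audio-to-text transcription with Whisper AI

Convert recordings into accurate text transcripts using state-of-the-art Whisper models. Well suited to transcribing meetings, podcasts, and voice memos, and to automatically generating video captions.

Supports: Claude Codex Code(CC)

📊 69 Adequate

Download the skill ZIP

Upload in Claude

Go to Settings → Capabilities → Skills → Upload skill

Turn it on and start using it

Test it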

Using "speech-to-text". Transcribe the meeting recording at https://files.example.com/team-meeting.mp3

Expected result:

Full text transcript of the meeting, including speaker identification and the detected language

Using "speech-to-text". Transcribe https://audio.example.com/interview.mp3 with timestamps

Expected result:

JSON transcript containing the full text, timestamped segments, and the detected language code

Using "speech-to-text". Translate the French audio at https://files.example.com/french-speech.mp3 to English

Expected result:

English translation of the French audio content

Security audit

Safe

v1 • 3/1/2026

All 37 static analysis findings are false positives from Markdown code examples in the documentation. The skill contains only documentation (SKILL.md) with bash command examples demonstrating inference.sh CLI usage. No executable code, no prompt injection attempts, and no malicious intent were detected. The allowed-tools directive properly restricts the Bash tool to infsh commands only.

Files scanned

130

Lines analyzed

Findings

Total audits

Audited by: claude

Quality score

Architecture

100

Maintainability

Content

Community

100

Security

Spec compliance

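What you can build

Meeting transcription

Convert recorded meeting audio into searchable text for record-keeping and sharing

Podcast production

Generate show notes and transcripts for podcast episodes to improve accessibility

Video captioning

Create accurate video captions by transcribing the audio track with timestamps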
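
To illustrate the captioning use case, here is a minimal Python sketch that turns timestamped segments into SRT caption text. The segment shape (a list of objects with `start`, `end`, and `text` keys, times in seconds) is an assumption for illustration; this listing does not document the exact JSON the skill returns, so adjust the key names to match the real output.

```python
def format_srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Build SRT text from segments.

    Assumes each segment has "start", "end" (seconds) and "text" keys;
    these key names are hypothetical, not confirmed by the skill.
    """
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_srt_time(seg["start"])
        end = format_srt_time(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # Each block ends with a newline, so joining with "\n" leaves the
    # blank line that SRT requires between captions.
    return "\n".join(blocks)


# Hypothetical segments standing in for a real transcript.
example_segments = [
    {"start": 0.0, "end": 2.4, "text": " Hello everyone."},
    {"start": 2.4, "end": 5.0, "text": "Let's begin."},
]
print(segments_to_srt(example_segments))
```

The resulting text can be saved with an `.srt` extension and loaded by most video players and editors.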

Try these prompts

Basic transcription

Transcribe the audio file at https://example.com/meeting.mp3 to text

With timestamps

Transcribe https://example.com/podcast.mp3 and include timestamps for each segment

Translate to English

Translate the Spanish audio at https://example.com/spanish.mp3 to English text

Video captioning workflow

Extract audio from https://example.com/video.mp4, transcribe it with timestamps, and prepare it for adding captions

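Best practices

Use high-quality recordings for the best transcription accuracy
Include timestamps when producing captions or when you need to reference specific points in time
Choose the Fast Whisper model for speed and Whisper V3 Large for maximum accuracy
Provide audio files in common formats such as MP3, WAV, or M4A for the best compatibility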

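Avoid

Do not try to transcribe live audio streams; this tool requires a file URL
Avoid using very low-quality or excessively noisy recordings without planning for post-processing
Do not forget to install the inference.sh CLI before attempting transcription
Avoid requesting transcription of copyrighted content without proper authorization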

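FAQ

Which audio formats are supported?

The tool accepts audio files that are reachable through a public URL. It supports MP3, WAV, M4A, and other formats supported by the Whisper models.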
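
As a rough pre-flight check before sending a prompt, the Python sketch below flags inputs that are not http(s) URLs or that lack one of the three formats named above. The function name and the warning wording are hypothetical, and the extension test is only a heuristic: other formats may still work, and the code cannot verify that a URL is actually publicly reachable.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Formats named explicitly in this listing; others may also be accepted.
COMMON_AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a"}


def check_audio_url(url: str) -> list[str]:
    """Return warnings for an audio URL; an empty list means none were found."""
    warnings = []
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        warnings.append("not an http(s) URL; the tool expects a file URL")
    suffix = PurePosixPath(parsed.path).suffix.lower()
    if suffix not in COMMON_AUDIO_EXTENSIONS:
        warnings.append(
            f"extension {suffix or '(none)'} is not MP3, WAV, or M4A; it may still work"
        )
    return warnings


print(check_audio_url("https://example.com/meeting.mp3"))
print(check_audio_url("meeting.mp3"))
```

Returning warnings instead of raising an error keeps the check advisory, since the listing states that formats beyond these three are accepted.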

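How accurate is the transcription?

Whisper V3 Large provides state-of-the-art accuracy. Fast Whisper Large V3 provides similar accuracy with faster processing. Both support 99+ languages.

Do I need to install anything?

Yes. You need to install the inference.sh CLI tool with the following command: curl -fsSL https://cli.inference.sh | sh && infsh login

Can I transcribe live audio?

No. This tool only works with pre-recorded audio files. Live transcription requires a different solution designed for streaming audio.

What is the difference between the two models?

Fast Whisper Large V3 prioritizes speed while maintaining accuracy. Whisper V3 Large offers the highest possible accuracy but may take longer to process.

Can I translate foreign-language audio into English?

Yes. Use the translate task parameter to transcribe foreign-language audio and translate it into English text in a single step.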

Developer details

Author

inference-sh-9

License

MIT

Repository

https://github.com/inference-sh-9/skills/tree/main/skills/speech-to-text/

Ref

main

File structure

📄 SKILL.md
