ai-avatar-video

Name: ai-avatar-video
Author: inference-skills

Safe

Create AI Avatar and Talking Head Videos

Also available from: doany-ai, qu-skills, inference-sh-skills, infsh-skills, agentspace-so, inference-sh, skills-shell, runcomfy-com

Creating professional AI avatar videos traditionally requires complex video editing or expensive SaaS platforms. This skill provides a unified interface to generate talking head videos from images, audio, or text scripts using the inference.sh CLI.

Supports: Claude, Codex, Claude Code (CC)

📊 69 Adequate

Download the skill ZIP

Upload in Claude

Go to Settings → Capabilities → Skills → Upload skill

Toggle on and start using

Test it

Using "ai-avatar-video". Portrait image of a professional + script: 'Welcome to our quarterly review...'

Expected outcome:

A downloadable video in which the portrait speaks the script, with realistic lip movements synchronized to the generated speech audio.

Using "ai-avatar-video". Portrait image + existing audio file of a speech

Expected outcome:

A talking head video where the person in the image appears to deliver the speech with natural facial movements and accurate lip synchronization.

Using "ai-avatar-video". Original training video + translated Spanish audio

Expected outcome:

A version of the training video in which the original on-screen presenter appears to speak the translated Spanish audio with accurate lip sync.

Security Audit

Safe

v1 • 5/4/2026

Documentation skill for AI video generation via inference.sh CLI. All static findings are false positives. The external_commands (29) are example CLI commands in code blocks demonstrating belt tool usage. The network URLs (20) reference the inference.sh service API endpoints and documentation. The weak_crypto flag (1) is a false positive triggered by YAML frontmatter text mentioning 'algorithm'. No malicious code, command injection, or data exfiltration patterns present.

Files scanned

216


High Risk Issues (1)

SKILL.md:3

Weak Cryptographic Algorithm Flag (False Positive)

Static analyzer flagged 'weak_crypto: Weak cryptographic algorithm' at SKILL.md:3. This is a false positive. Line 3 is YAML frontmatter describing the skill's capabilities (text such as 'talking head generation, virtual presenters'). The analyzer most likely matched a keyword such as 'algorithm' in that description. No cryptographic code exists in this documentation file.

Medium Risk Issues (1)

SKILL.md:15 SKILL.md:17-26 SKILL.md:26-34 SKILL.md:34-37 SKILL.md:53-61 SKILL.md:61-65 SKILL.md:65-74 SKILL.md:74-78 SKILL.md:78-83 SKILL.md:83-89 SKILL.md:89-102 SKILL.md:102-106 SKILL.md:106-111 SKILL.md:111-117 SKILL.md:117-122 SKILL.md:122-126 SKILL.md:126-131 SKILL.md:131-137 SKILL.md:137-148 SKILL.md:148-152 SKILL.md:152-166 SKILL.md:166-183 SKILL.md:183-189 SKILL.md:189-207

External Commands Documentation (False Positive)

Static analyzer flagged 29 instances of 'external_commands' (Ruby/shell backtick execution) at various SKILL.md lines. These are all shell command examples displayed in fenced code blocks (```bash blocks). The backtick detection likely triggered on code block syntax. These are documented CLI commands (`belt app run ...`) demonstrating proper belt CLI usage for the inference.sh service. No command injection vulnerabilities exist - the commands are static examples showing API usage patterns.

Low Risk Issues (1)

SKILL.md:9 SKILL.md:11 SKILL.md:15 SKILL.md:22 SKILL.md:55 SKILL.md:67 SKILL.md:80 SKILL.md:81 SKILL.md:108 SKILL.md:109 SKILL.md:119 SKILL.md:120 SKILL.md:128 SKILL.md:129 SKILL.md:145 SKILL.md:154 SKILL.md:163 SKILL.md:213 SKILL.md:214 SKILL.md:215

Hardcoded URLs to External Service (False Positive)

Static analyzer flagged 20 instances of 'hardcoded URLs' (network pattern). These URLs point to: inference.sh service endpoints, documentation links, and image assets. All URLs are legitimate references to the inference.sh service that this skill documents. No suspicious external connections or data exfiltration detected.

Audited by: claude

Quality Score

Architecture

100


What You Can Build

Product Demo Videos

Create engaging product demonstrations with an AI presenter. Upload a professional portrait and script your talking points - the avatar delivers your message with natural lip synchronization.

Training Content Localization

Translate training videos into multiple languages. Transcribe the original, translate the script, generate new audio, and sync to your presenter avatar for consistent global training materials.

Social Media Content Creation

Produce consistent avatar content for social channels. Generate talking head videos from portrait images with AI-generated voices, reducing video production costs and turnaround time.

Try These Prompts

Basic Avatar from Script

Generate an avatar video using a portrait image with a text script and AI voice

Avatar from Audio File

Create a talking head video that syncs an existing portrait to a provided audio file

Multi-Language Dubbing

Transcribe, translate, and create a lip-synced avatar version of a video in a target language

Full Portrait + Avatar Pipeline

Generate a portrait image first, then create an avatar video from that portrait with TTS

Best Practices

Use high-quality, front-facing portrait photos with clear visibility of the face and good lighting for best results
Generate audio with clear speech and minimal background noise before creating avatar videos
Use P-Video-Avatar for the best balance of speed, cost, and quality - it includes built-in TTS and 1080p output

Avoid

Do not use low-quality or heavily filtered portrait images - avatar lip sync quality depends on input image clarity
Do not use audio with significant background noise - this degrades lip sync accuracy
Do not skip the audio generation step when using models without built-in TTS (OmniHuman, PixVerse)
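The last point can be turned into a small pre-flight check. The sketch below is illustrative only: the model keys and the `needs_separate_audio` helper are hypothetical names, and the built-in TTS flags reflect only what this page states (P-Video-Avatar includes TTS; OmniHuman and PixVerse do not).

```python
# Hypothetical lookup of built-in TTS support, based only on this page.
# The keys are illustrative labels, not confirmed app identifiers.
BUILT_IN_TTS = {
    "p-video-avatar": True,
    "omnihuman": False,
    "pixverse": False,
}


def needs_separate_audio(model: str) -> bool:
    """Return True when audio must be generated before the avatar step."""
    key = model.strip().lower()
    if key not in BUILT_IN_TTS:
        raise ValueError(f"unknown model: {model!r}")
    return not BUILT_IN_TTS[key]
```

Calling `needs_separate_audio("OmniHuman")` returns `True`, signalling that a speech file should be prepared first.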

Frequently Asked Questions

What is the recommended model for avatar video generation?

P-Video-Avatar is recommended. It is 18x faster and 6x cheaper than alternatives, supports built-in TTS with 30 voices in 10 languages, and outputs at 1080p resolution.

How do I create an avatar if I do not have a portrait image?

Generate a portrait image first using the p-image model (pruna/p-image) with prompts like 'professional headshot portrait of a young woman, neutral background', then use that image as input for avatar creation.

Can I use my own voice instead of AI-generated speech?

Yes. Upload your own audio file using the 'audio' parameter instead of 'voice_script'. This is supported by all models including P-Video-Avatar.
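As a rough sketch of how the two options differ, the helper below builds an input payload with either 'audio' or 'voice_script'. The parameter names 'audio' and 'voice_script' come from the answer above; the 'image' key, the `build_avatar_input` helper, and the rule that exactly one of the two must be set are assumptions made for illustration, not confirmed API behavior.

```python
from typing import Optional


def build_avatar_input(image: str,
                       audio: Optional[str] = None,
                       voice_script: Optional[str] = None) -> dict:
    """Build a hypothetical input payload for an avatar model.

    Assumption: a request carries either your own audio or a script
    for built-in TTS, but not both.
    """
    if (audio is None) == (voice_script is None):
        raise ValueError("provide exactly one of 'audio' or 'voice_script'")
    payload = {"image": image}  # 'image' key is an assumed name
    if audio is not None:
        payload["audio"] = audio
    else:
        payload["voice_script"] = voice_script
    return payload
```

Passing `audio=` produces a payload for your own recording, while passing `voice_script=` leaves speech generation to the model.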

How do I localize videos into other languages?

Use fast-whisper to transcribe the original video, translate the text, generate new speech with kokoro-tts in the target language, then sync using latentsync-1-6.
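The four stages above can be written down as an ordered plan. This is a minimal sketch that only describes the steps; it does not call any service. The `Step` class and `localization_plan` function are hypothetical, the tool names are copied from the answer above as written, and the translation step is left open because no tool is named for it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    tool: str     # tool name as given on this page (not a verified app ID)
    purpose: str  # what the step contributes to the pipeline


def localization_plan(target_language: str) -> list[Step]:
    """Return the localization stages in the order they should run."""
    return [
        Step("fast-whisper", "transcribe the original video's speech"),
        Step("(translator of your choice)",
             f"translate the transcript into {target_language}"),
        Step("kokoro-tts",
             f"generate {target_language} speech from the translation"),
        Step("latentsync-1-6", "lip-sync the video to the new audio"),
    ]


for number, step in enumerate(localization_plan("Spanish"), start=1):
    print(f"{number}. {step.tool}: {step.purpose}")
```

Each step consumes the previous step's output, so the order matters: the lip-sync stage needs the new audio, which in turn needs the translated text.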

What image formats are supported?

Provide image URLs (http:// or https://). Supported formats include JPG, PNG, and WebP. For best results, use high-quality portraits with front-facing composition.
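A quick client-side check can catch unsupported inputs before a job is submitted. The sketch below assumes the format can be inferred from the URL's file extension, which is not guaranteed (some valid image URLs carry no extension); `is_supported_image_url` is a hypothetical helper, and treating `.jpeg` as a JPG variant is an assumption.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# JPG, PNG, and WebP per this page; ".jpeg" is assumed to count as JPG.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def is_supported_image_url(url: str) -> bool:
    """Check for an http(s) URL whose path ends in a supported extension."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False
    suffix = PurePosixPath(parsed.path).suffix.lower()
    return suffix in SUPPORTED_EXTENSIONS
```

Local file paths and other schemes such as `ftp://` are rejected, matching the requirement that images be supplied as http:// or https:// URLs.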

How long does video generation take?

Generation time varies by model. P-Video-Avatar processes at ~1.83 seconds per second of video, while OmniHuman 1.5 takes ~28 seconds per second of video. Higher resolutions take longer to process.
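These rates make it easy to estimate turnaround for a given clip length. The sketch below simply multiplies the approximate per-second rates quoted above by the clip duration; the dictionary keys and the `estimate_generation_seconds` helper are hypothetical, and the estimate ignores queueing, upload time, and resolution effects.

```python
# Approximate processing seconds per second of output video, from this page.
SECONDS_PER_VIDEO_SECOND = {
    "p-video-avatar": 1.83,
    "omnihuman-1.5": 28.0,
}


def estimate_generation_seconds(model: str, video_seconds: float) -> float:
    """Rough processing-time estimate: rate multiplied by clip length."""
    if video_seconds < 0:
        raise ValueError("video_seconds must be non-negative")
    return SECONDS_PER_VIDEO_SECOND[model] * video_seconds
```

For a 30-second clip this gives roughly 55 seconds for P-Video-Avatar and about 840 seconds (14 minutes) for OmniHuman 1.5.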

Developer Details

Author

inference-skills

License

MIT

Repository

https://github.com/inference-skills/skills/tree/main/tools/video/ai-avatar-video/

Ref

main

File structure

📄 SKILL.md