airunway-aks-setup
Set up AI Runway on AKS from bare cluster to running model
Deploying LLMs on Azure Kubernetes Service means orchestrating several components: GPU node pools, an inference provider, and model artifacts. This skill guides you through the complete setup process, from cluster verification to your first working model deployment.
Download the skill ZIP
Upload in Claude
Go to Settings → Capabilities → Skills → Upload skill
Toggle on and start using
Test it
Using "airunway-aks-setup". Set up AI Runway on my AKS cluster
Expected outcome:
Step 1 - Cluster Verification:
✓ kubectl found
✓ make found
✓ curl found
✓ Cluster context: my-cluster
✓ Nodes: 3 total, 1 GPU node detected
- GPU Node: Standard_A100_80GB
- VRAM: 80 GB
- bfloat16: Supported
Ready to proceed to Step 2?
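The Step 1 checks above boil down to verifying that each required CLI tool is on the PATH before touching the cluster. A minimal Python sketch of that logic (the tool list comes from the expected outcome; the function name and structure are illustrative, not the skill's actual implementation):

```python
# Hypothetical sketch of the Step 1 tool checks; the skill itself ships
# documentation, so this is an illustration of the logic, not its code.
import shutil

REQUIRED_TOOLS = ["kubectl", "make", "curl"]

def check_prerequisites(tools=REQUIRED_TOOLS):
    """Map each required tool to True if it is found on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}

if __name__ == "__main__":
    for tool, found in check_prerequisites().items():
        print(f"{'found' if found else 'MISSING'}: {tool}")
```

Checking `kubectl config current-context` and `kubectl get nodes` would follow the same pattern once the binaries are confirmed present.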
Using "airunway-aks-setup". Deploy a model with 8B parameters on my cluster
Expected outcome:
Recommendation: meta-llama/Llama-3.1-8B-Instruct
Provider: KAITO (vLLM)
Reason: Your A100-80GB has capacity for 8B models with tensor parallel options.
This is a gated model requiring a HuggingFace access token. Shall I proceed with token collection?
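The sizing reasoning behind that recommendation can be made concrete: bfloat16 weights take 2 bytes per parameter, so an 8B model needs roughly 16 GB for weights alone, leaving ample headroom on an 80 GB card for KV cache and activations. A back-of-envelope sketch (the 1.2× overhead factor is an assumption for illustration, not a figure from the skill):

```python
def estimate_weight_memory_gb(params_billion: float, bytes_per_param: int = 2) -> float:
    """Approximate GPU memory for model weights alone (bf16 = 2 bytes/param)."""
    return params_billion * bytes_per_param

def fits_on_gpu(params_billion: float, vram_gb: float, overhead: float = 1.2) -> bool:
    """Very rough fit check: weights times an assumed overhead factor
    (KV cache, activations) must stay under available VRAM."""
    return estimate_weight_memory_gb(params_billion) * overhead <= vram_gb

print(estimate_weight_memory_gb(8))   # 16.0 GB of weights for an 8B bf16 model
print(fits_on_gpu(8, vram_gb=80))     # True: comfortable fit on an A100-80GB
print(fits_on_gpu(70, vram_gb=80))    # False: 70B bf16 needs ~140 GB for weights alone
```

The same arithmetic explains why a 70B model would need quantization or multi-GPU tensor parallelism on this hardware.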
Security Audit
Low Risk
This is a legitimate Microsoft-published documentation skill for AI Runway AKS setup. The static scanner flagged documentation files containing bash/PowerShell code examples as potential security issues. After evaluation, all findings are false positives: the skill provides markdown documentation with command examples for human execution, not executable code. No actual command injection, path traversal vulnerabilities, or malicious patterns exist. The skill is safe for publication at a low risk level.
What You Can Build
First-time AI Runway deployment
New to AI Runway on AKS? Get a complete walkthrough from cluster verification through your first working model deployment with GPU acceleration.
GPU capability assessment
Discover available GPU hardware, check dtype support (bfloat16, float16), and receive model recommendations based on your cluster VRAM capacity.
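On Kubernetes, GPU availability shows up in each node's `status.capacity` under the `nvidia.com/gpu` resource key once the NVIDIA device plugin is running. A sketch of how a capability assessment could read that from `kubectl get nodes -o json` output; the sample payload, node names, and function name are illustrative:

```python
def gpu_nodes(nodes_json: dict) -> list:
    """Return (node name, GPU count) pairs for nodes that advertise
    nvidia.com/gpu capacity in their status."""
    result = []
    for node in nodes_json.get("items", []):
        count = int(node["status"]["capacity"].get("nvidia.com/gpu", 0))
        if count > 0:
            result.append((node["metadata"]["name"], count))
    return result

# Sample shaped like `kubectl get nodes -o json`, trimmed to relevant fields.
sample = {
    "items": [
        {"metadata": {"name": "aks-cpu-1"},
         "status": {"capacity": {"cpu": "4", "memory": "16Gi"}}},
        {"metadata": {"name": "aks-gpu-1"},
         "status": {"capacity": {"cpu": "24", "nvidia.com/gpu": "1"}}},
    ]
}

print(gpu_nodes(sample))  # [('aks-gpu-1', 1)]
```

VM size (and therefore VRAM) comes from node labels such as `node.kubernetes.io/instance-type`, which the assessment can cross-reference against a GPU profile table.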
Troubleshoot failed deployments
Resume from a specific step after partial setup, or follow rollback procedures to undo a failed deployment and start fresh.
Try These Prompts
Set up AI Runway on my AKS cluster. I have an existing cluster with GPU nodes.
Skip to step 4 and set up the KAITO inference provider on my AKS cluster.
Check what GPUs are available in my AKS cluster and tell me which models I can run.
Deploy the Llama-3.1-8B model to my AKS cluster using AI Runway. I have an A100-80GB node.
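When the KAITO provider is selected (as in the expected outcome above), the final deployment step amounts to applying a KAITO Workspace custom resource. A hedged sketch of what that manifest can look like; the API version reflects KAITO's v1alpha1 schema, and the instance type, label selector, and preset name are all illustrative, so take the real values from the skill's recommendation step and the KAITO documentation:

```yaml
# Illustrative KAITO Workspace; field values are assumptions, not skill output.
apiVersion: kaito.sh/v1alpha1
kind: Workspace
metadata:
  name: workspace-llama-3-1-8b
resource:
  instanceType: "Standard_NC24ads_A100_v4"   # an A100-80GB SKU; match your GPU node pool
  labelSelector:
    matchLabels:
      apps: llama-inference
inference:
  preset:
    name: llama-3.1-8b-instruct              # preset name is illustrative; check KAITO presets
```

For a gated model like Llama, KAITO additionally needs the HuggingFace access token supplied as a Kubernetes secret, which is what the token-collection step in the walkthrough prepares.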
Best Practices
- Always confirm GPU node availability and VRAM capacity before selecting a model size
- Start with non-gated models like Phi-3 or Gemma to validate the setup before using gated models
- Use the skip-to-step parameter to resume from specific steps after interruptions
Avoid
- Do not run this skill without first confirming you understand GPU compute costs on Azure
- Do not skip cluster verification; understanding your GPU hardware is required for model selection
- Do not attempt gated models (Llama, etc.) until you have validated the setup with a non-gated model
Frequently Asked Questions
What is AI Runway?
Do I need an existing AKS cluster?
What GPUs are supported?
Why did my deployment fail with bfloat16 errors?
How long does model deployment take?
How do I rollback a failed deployment?
Developer Details
File structure
references/
  steps/
    step-1-verify.md
    step-2-controller.md
    step-3-gpu.md
    step-4-provider.md
    step-5-deploy.md
    step-6-summary.md
  gpu-profiles.md
  model-sizing.md
  powershell-notes.md
  troubleshooting.md
SKILL.md