airunway-aks-setup
Set up AI Runway on AKS from bare cluster to running model
Deploying LLMs on Azure Kubernetes Service means orchestrating several components: GPU node pools, an inference provider, and model artifacts. This skill guides you through the complete setup process, from cluster verification to your first working model deployment.
Download the skill ZIP
Upload in Claude
Go to Settings → Capabilities → Skills → Upload skill
Toggle on and start using
Test it
Using "airunway-aks-setup". Set up AI Runway on my AKS cluster
Expected outcome:
Step 1 - Cluster Verification:
✓ kubectl found
✓ make found
✓ curl found
✓ Cluster context: my-cluster
✓ Nodes: 3 total, 1 GPU node detected
- GPU Node: Standard_A100_80GB
- VRAM: 80 GB
- bfloat16: Supported
Ready to proceed to Step 2?
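The Step 1 checks above boil down to verifying that each required CLI tool is on the PATH before touching the cluster. A minimal Python sketch of that logic (the tool list comes from the expected outcome; the function name and structure are illustrative, not the skill's actual implementation):

```python
# Hypothetical sketch of the Step 1 tool checks; the skill itself ships
# documentation, so this is an illustration of the logic, not its code.
import shutil

REQUIRED_TOOLS = ["kubectl", "make", "curl"]

def check_prerequisites(tools=REQUIRED_TOOLS):
    """Map each required tool to True if it is found on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}

if __name__ == "__main__":
    for tool, found in check_prerequisites().items():
        print(f"{'found' if found else 'MISSING'}: {tool}")
```

Checking `kubectl config current-context` and `kubectl get nodes` would follow the same pattern once the binaries are confirmed present.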
Using "airunway-aks-setup". Deploy a model with 8B parameters on my cluster
Expected outcome:
Recommendation: meta-llama/Llama-3.1-8B-Instruct
Provider: KAITO (vLLM)
Reason: Your A100-80GB has capacity for 8B models with tensor parallel options.
This is a gated model requiring a HuggingFace access token. Shall I proceed with token collection?
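The sizing reasoning behind that recommendation can be made concrete: bfloat16 weights take 2 bytes per parameter, so an 8B model needs roughly 16 GB for weights alone, leaving ample headroom on an 80 GB card for KV cache and activations. A back-of-envelope sketch (the 1.2× overhead factor is an assumption for illustration, not a figure from the skill):

```python
def estimate_weight_memory_gb(params_billion: float, bytes_per_param: int = 2) -> float:
    """Approximate GPU memory for model weights alone (bf16 = 2 bytes/param)."""
    return params_billion * bytes_per_param

def fits_on_gpu(params_billion: float, vram_gb: float, overhead: float = 1.2) -> bool:
    """Very rough fit check: weights times an assumed overhead factor
    (KV cache, activations) must stay under available VRAM."""
    return estimate_weight_memory_gb(params_billion) * overhead <= vram_gb

print(estimate_weight_memory_gb(8))   # 16.0 GB of weights for an 8B bf16 model
print(fits_on_gpu(8, vram_gb=80))     # True: comfortable fit on an A100-80GB
print(fits_on_gpu(70, vram_gb=80))    # False: 70B bf16 needs ~140 GB for weights alone
```

The same arithmetic explains why a 70B model would need quantization or multi-GPU tensor parallelism on this hardware.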
Security Audit
Low Risk
This is a legitimate Microsoft-published documentation skill for AI Runway AKS setup. The static scanner flagged documentation files containing bash/PowerShell code examples as potential security issues. After evaluation, all findings are false positives: the skill provides markdown documentation with command examples for human execution, not executable code. No actual command injection, path traversal vulnerabilities, or malicious patterns exist. The skill is safe for publication at a low risk level.
What You Can Build
First-time AI Runway deployment
New to AI Runway on AKS? Get a complete walkthrough from cluster verification through your first working model deployment with GPU acceleration.
GPU capability assessment
Discover available GPU hardware, check dtype support (bfloat16, float16), and receive model recommendations based on your cluster VRAM capacity.
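On Kubernetes, GPU availability shows up in each node's `status.capacity` under the `nvidia.com/gpu` resource key once the NVIDIA device plugin is running. A sketch of how a capability assessment could read that from `kubectl get nodes -o json` output; the sample payload, node names, and function name are illustrative:

```python
def gpu_nodes(nodes_json: dict) -> list:
    """Return (node name, GPU count) pairs for nodes that advertise
    nvidia.com/gpu capacity in their status."""
    result = []
    for node in nodes_json.get("items", []):
        count = int(node["status"]["capacity"].get("nvidia.com/gpu", 0))
        if count > 0:
            result.append((node["metadata"]["name"], count))
    return result

# Sample shaped like `kubectl get nodes -o json`, trimmed to relevant fields.
sample = {
    "items": [
        {"metadata": {"name": "aks-cpu-1"},
         "status": {"capacity": {"cpu": "4", "memory": "16Gi"}}},
        {"metadata": {"name": "aks-gpu-1"},
         "status": {"capacity": {"cpu": "24", "nvidia.com/gpu": "1"}}},
    ]
}

print(gpu_nodes(sample))  # [('aks-gpu-1', 1)]
```

VM size (and therefore VRAM) comes from node labels such as `node.kubernetes.io/instance-type`, which the assessment can cross-reference against a GPU profile table.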
Troubleshoot failed deployments
Resume from a specific step after partial setup, or follow rollback procedures to undo a failed deployment and start fresh.
Try These Prompts
Set up AI Runway on my AKS cluster. I have an existing cluster with GPU nodes.
Skip to step 4 and set up the KAITO inference provider on my AKS cluster.
Check what GPUs are available in my AKS cluster and tell me which models I can run.
Deploy the Llama-3.1-8B model to my AKS cluster using AI Runway. I have an A100-80GB node.
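When the KAITO provider is selected (as in the expected outcome above), the final deployment step amounts to applying a KAITO Workspace custom resource. A hedged sketch of what that manifest can look like; the API version reflects KAITO's v1alpha1 schema, and the instance type, label selector, and preset name are all illustrative, so take the real values from the skill's recommendation step and the KAITO documentation:

```yaml
# Illustrative KAITO Workspace; field values are assumptions, not skill output.
apiVersion: kaito.sh/v1alpha1
kind: Workspace
metadata:
  name: workspace-llama-3-1-8b
resource:
  instanceType: "Standard_NC24ads_A100_v4"   # an A100-80GB SKU; match your GPU node pool
  labelSelector:
    matchLabels:
      apps: llama-inference
inference:
  preset:
    name: llama-3.1-8b-instruct              # preset name is illustrative; check KAITO presets
```

For a gated model like Llama, KAITO additionally needs the HuggingFace access token supplied as a Kubernetes secret, which is what the token-collection step in the walkthrough prepares.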
Best Practices
- Always confirm GPU node availability and VRAM capacity before selecting a model size
- Start with non-gated models like Phi-3 or Gemma to validate the setup before using gated models
- Use the skip-to-step parameter to resume from specific steps after interruptions
Avoid
- Do not run this skill without first confirming you understand GPU compute costs on Azure
- Do not skip cluster verification; understanding your GPU hardware is required for model selection
- Do not attempt gated models (Llama, etc.) until you have validated the setup with a non-gated model
Frequently Asked Questions
What is AI Runway?
Do I need an existing AKS cluster?
What GPUs are supported?
Why did my deployment fail with bfloat16 errors?
How long does model deployment take?
How do I rollback a failed deployment?
Developer Details
File structure
references/
  steps/
    step-1-verify.md
    step-2-controller.md
    step-3-gpu.md
    step-4-provider.md
    step-5-deploy.md
    step-6-summary.md
  gpu-profiles.md
  model-sizing.md
  powershell-notes.md
  troubleshooting.md
SKILL.md