modal
Run Python code in the cloud
Also available from: davila7
Modal is a serverless platform for running Python code in the cloud. It provides instant access to GPUs, automatic scaling, and pay-per-use billing. Deploy ML models, run batch processing jobs, and serve APIs without managing infrastructure.
Download the skill ZIP
Upload in Claude
Go to Settings → Capabilities → Skills → Upload skill
Toggle on and start using
Test it
Using "modal". Deploy a Python function that summarizes text using a HuggingFace model on GPU
Expected outcome:
- ✓ Created Modal app with L40S GPU access
- ✓ Built container image with transformers and torch
- ✓ Deployed web endpoint for text summarization
- ✓ Endpoint available at https://your-app.modal.run
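A deployment like this outcome can be sketched with Modal's Python SDK. This is a minimal sketch, not the skill's own code: the app name, model choice, and endpoint shape are assumptions, and the decorator for web endpoints varies by Modal version (`@modal.fastapi_endpoint` in recent releases, `@modal.web_endpoint` in older ones).

```python
import modal

app = modal.App("text-summarizer")  # hypothetical app name

# Container image with the ML dependencies installed
image = modal.Image.debian_slim().pip_install("transformers", "torch", "fastapi[standard]")

@app.function(gpu="L40S", image=image)
@modal.fastapi_endpoint(method="POST")
def summarize(item: dict) -> dict:
    # Import inside the function body so container startup stays fast
    from transformers import pipeline

    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
    result = summarizer(item["text"], max_length=130, min_length=30)
    return {"summary": result[0]["summary_text"]}
```

Running `modal deploy` on this file builds the image, attaches the GPU, and prints the `*.modal.run` endpoint URL.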
Using "modal". Run a batch job to process 1000 images in parallel
Expected outcome:
- ✓ Created worker function with 4 CPU cores and 8GB memory
- ✓ Configured parallel processing across 50 containers
- ✓ Processed 1000 images in ~8 minutes
- ✓ Results saved to Modal Volume at /data/output/
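The parallel batch pattern behind this outcome uses Modal's `.map()` to fan calls out across containers. A minimal sketch, assuming a hypothetical app name, a Volume named `image-data`, and a placeholder transformation (real work would use an imaging library such as Pillow):

```python
import modal

app = modal.App("image-batch")  # hypothetical app name
vol = modal.Volume.from_name("image-data", create_if_missing=True)

@app.function(cpu=4, memory=8192, volumes={"/data": vol})  # 4 cores, 8 GB (memory is in MiB)
def process_image(path: str) -> str:
    # Placeholder: read the image at `path`, transform it, write the result
    out_path = "/data/output/" + path.rsplit("/", 1)[-1]
    return out_path

@app.local_entrypoint()
def main():
    paths = [f"/data/input/img_{i}.png" for i in range(1000)]
    # .map() dispatches the 1000 calls across many containers in parallel
    for result in process_image.map(paths):
        print(result)
```

Modal decides how many containers to spin up based on the queue depth and any concurrency limits you configure.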
Using "modal". Schedule daily model retraining at midnight
Expected outcome:
- ✓ Created scheduled function with cron expression '0 0 * * *'
- ✓ Configured GPU (A100) for training computations
- ✓ Set up secret management for API credentials
- ✓ Training logs available in Modal dashboard
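A scheduled-training setup along these lines combines `modal.Cron` with a GPU request and a Modal Secret. A sketch under assumptions (the secret name `training-credentials` and the `API_KEY` variable it injects are hypothetical):

```python
import modal

app = modal.App("nightly-retrain")  # hypothetical app name

@app.function(
    gpu="A100",
    schedule=modal.Cron("0 0 * * *"),  # every day at midnight UTC
    secrets=[modal.Secret.from_name("training-credentials")],  # hypothetical secret
)
def retrain():
    import os

    api_key = os.environ["API_KEY"]  # injected from the Modal Secret at runtime
    # ... fetch data, run the training loop, save updated weights ...
```

Once deployed, the function runs on schedule with no trigger needed, and its stdout/stderr appear in the Modal dashboard logs.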
Security Audit
Safe
This is a documentation-only skill for Modal, a legitimate serverless cloud computing platform. All 572 static findings are false positives: the scanner misinterprets Markdown documentation code examples as executable code. Flagged patterns include CLI commands in documentation (modal run, modal deploy), environment variable documentation, and legitimate Modal API patterns. No malicious code, credential exfiltration, or actual security vulnerabilities exist. This skill contains only documentation files teaching users how to properly use the Modal platform.
Risk Factors
⚙️ External commands (6)
🌐 Network access (3)
📁 Filesystem access (3)
🔑 Env variables (3)
What You Can Build
Deploy ML models for inference
Deploy trained models (LLMs, image classifiers) to production with GPU acceleration and auto-scaling for variable traffic.
Run batch processing jobs
Process large datasets in parallel across multiple containers, handling thousands of files or data rows simultaneously.
Execute GPU compute tasks
Run computationally intensive research tasks on H100 or A100 GPUs. Schedule training jobs and long-running computations.
Try These Prompts
Create a Modal app that runs a Python function on an L40S GPU. The function should load a HuggingFace model and return predictions. Use an appropriate container image with torch and transformers installed.
Set up a Modal function that processes CSV files in parallel. The function should read files from an S3 bucket, apply transformations, and save results. Use CPU parallelism with multiple cores.
Create a Modal scheduled function that runs daily at 2 AM. The function should refresh cached data from an API and update model weights stored in a Modal Volume.
Build a Modal web endpoint that accepts POST requests with input data. The endpoint should run inference using a deployed model and return predictions. Include proper error handling and authentication.
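The last prompt can be sketched as a Modal web endpoint with basic validation and bearer-token auth. A minimal sketch, not a production pattern: the token check is a hypothetical placeholder (a real deployment would read the token from a Modal Secret), and `@modal.fastapi_endpoint` assumes a recent Modal version (older releases use `@modal.web_endpoint`):

```python
import modal
from fastapi import Header, HTTPException

app = modal.App("inference-api")  # hypothetical app name
image = modal.Image.debian_slim().pip_install("fastapi[standard]")

@app.function(image=image)
@modal.fastapi_endpoint(method="POST")
def predict(item: dict, authorization: str = Header(default="")) -> dict:
    # Hypothetical static token; use a Modal Secret in real deployments
    if authorization != "Bearer my-token":
        raise HTTPException(status_code=401, detail="Unauthorized")
    if "input" not in item:
        raise HTTPException(status_code=400, detail="Missing 'input' field")
    # ... run inference with the deployed model here ...
    return {"prediction": None}
```

FastAPI handles request parsing and error responses; Modal handles scaling the containers behind the endpoint.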
Best Practices
- Pin all Python package versions in image definitions to ensure reproducible builds and deployments
- Use separate Modal Secrets for different environments (dev, staging, production) to prevent credential leakage
- Configure appropriate min_containers to reduce cold start latency for latency-sensitive endpoints
Avoid
- Hardcoding API keys or credentials directly in function code instead of using Modal Secrets
- Importing heavy dependencies at module scope instead of inside function bodies, which slows container startup
- Using sequential loops for batch processing instead of .map() for parallel execution across containers
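The last point is worth seeing side by side. A minimal sketch (the `transform` function and app name are placeholders): `.remote()` in a loop waits for each call before issuing the next, while `.map()` dispatches the whole batch across containers at once.

```python
import modal

app = modal.App("map-demo")  # hypothetical app name

@app.function()
def transform(x: int) -> int:
    return x * 2  # placeholder per-item work

@app.local_entrypoint()
def main():
    rows = list(range(100))

    # Slow: one sequential remote call per item
    # results = [transform.remote(x) for x in rows]

    # Fast: fan the 100 calls out across containers in parallel
    results = list(transform.map(rows))
```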