Skills ai-native-development
📦

ai-native-development

Medium Risk ⚡ Contains scripts🌐 Network access📁 Filesystem access🔑 Env variables⚙️ External commands

Build Production AI Applications

AI applications need reliable retrieval, tool use, cost controls, and monitoring to work in production. This skill provides patterns and templates for RAG, agents, vector databases, streaming, and observability.

Supports: Claude Codex Code(CC)
⚠️ 50 Poor
1

Download the skill ZIP

2

Upload in Claude

Go to Settings → Capabilities → Skills → Upload skill

3

Toggle on and start using

Test it

Using "ai-native-development". Plan a customer support RAG assistant for product documentation.

Expected outcome:

  • A retrieval architecture with document ingestion, chunking, embeddings, vector indexing, query retrieval, grounded generation, and citations.
  • A readiness checklist covering answer validation, monitoring, rate limits, fallback behavior, and cost alerts.

Using "ai-native-development". Design tool calling for an order status assistant.

Expected outcome:

  • A tool set with strict schemas for lookup operations and clear separation from side-effecting actions.
  • A control plan with authentication, input validation, audit logs, retry handling, and user confirmation before changes.

Using "ai-native-development". Reduce latency and cost in an existing AI chatbot.

Expected outcome:

  • A prioritized optimization plan using model routing, prompt caching, token limits, batch processing, retrieval tuning, and usage monitoring.
  • A measurement plan for latency, token cost, retrieval precision, answer accuracy, and user feedback.

Security Audit

Medium Risk
v6 • 6/28/2026

Static analysis reported many high-risk patterns, but most are false positives from Markdown code fences, template strings, API documentation links, and normal SDK environment-variable configuration. No prompt injection attempt, malicious exfiltration, or hidden command execution intent was found. The main residual risk is unsafe copy-paste sample code, especially an eval-based calculator tool and broad autonomous-agent tool templates.

10
Files scanned
4,519
Lines analyzed
12
findings
6
Total audits
Medium Risk Issues (3)
Unsafe eval-based calculator example
The agent workflow reference defines a calculator tool that returns eval(expression). This is dangerous if copied into an agent because model-controlled or user-controlled input could execute arbitrary JavaScript. The surrounding context is educational reference material, so this is not evidence of malicious intent. Verdict: TRUE_POSITIVE for unsafe sample code. confidence: 0.94. confidence_reasoning: Direct eval() is present in a tool handler, and the semantic context shows the expression comes from tool input. Risk is reduced because it is documentation, not hidden runtime code.
Autonomous tool templates need authorization gates
The agent workflow template demonstrates web search, database query, and email tools that can be selected by an LLM-driven loop. The sample implementations are placeholders, but production use would need authorization, confirmation for side effects, allowlists, and argument validation. Verdict: NEEDS_REVIEW for safe integration controls. confidence: 0.78. confidence_reasoning: The template explicitly exposes side-effect-capable tools to an autonomous agent loop, but the functions are demonstrative placeholders rather than active malicious actions.
RAG context is inserted into prompts without explicit untrusted-context guard
The RAG template and chatbot example place retrieved document text and user messages directly into model messages. The system prompt restricts answers to context, but it does not explicitly instruct the model to treat retrieved content as untrusted and ignore instructions inside documents. Verdict: NEEDS_REVIEW for prompt-injection resilience. confidence: 0.70. confidence_reasoning: The pattern is common and legitimate, but the sampled code lacks a clear document-instruction isolation rule, which is a known risk for RAG systems.
Low Risk Issues (4)
Markdown code fences misclassified as shell execution
Most external command findings are false positives caused by Markdown code fences and TypeScript template literals. The reviewed locations are documentation examples, not Ruby backtick execution or shell command invocation. Verdict: FALSE_POSITIVE. confidence: 0.93. confidence_reasoning: Line-number review shows code fences and template strings, and no child_process, exec, spawn, or shell invocation evidence was found in the sampled files.
Environment-variable access is standard SDK configuration
The env_access and secret findings reference SDK initialization with API keys from process.env. I found no evidence that these values are logged, written to files, or sent to unauthorized endpoints. Verdict: FALSE_POSITIVE for credential theft, with normal secret-handling caution. confidence: 0.88. confidence_reasoning: The cited lines pass environment variables to OpenAI, Pinecone, Anthropic, or observability SDK clients, which is expected configuration behavior.
Path traversal and weak-crypto scanner hits are contextual false positives
The path traversal hits are relative documentation imports or cross-skill references, and the weak-cryptography hits align with ordinary AI terminology such as embeddings, models, similarity metrics, and checklist headings. No evidence found of file reads, crypto implementation, or traversal against user-supplied paths. Verdict: FALSE_POSITIVE. confidence: 0.86. confidence_reasoning: Reviewed locations show imports, reference links, and checklist text rather than filesystem access or cryptographic code.
Hardcoded URLs are documentation and local service examples
The network findings include vendor documentation links, localhost vector database endpoints, and an example weather API call. These are not covert destinations or exfiltration endpoints, but production code should encode URL parameters and use configured endpoints. Verdict: FALSE_POSITIVE for malicious networking. confidence: 0.84. confidence_reasoning: The URLs are visible examples tied to the skill topic, and no secret material is sent to them in the reviewed context.

Detected Patterns

Dynamic JavaScript evaluation in tool handlerModel-selected tools can perform side effects
Audited by: codex View Audit History →

Quality Score

45
Architecture
100
Maintainability
87
Content
71
Community
42
Security
78
Spec Compliance

What You Can Build

Launch a support knowledge assistant

Design a RAG chatbot with citations, retrieval validation, streaming responses, and cost tracking.

Add tool use to an AI workflow

Structure function schemas, agent loops, tool execution, and error handling for controlled automation.

Review AI production readiness

Use the checklist to evaluate monitoring, prompt quality, retrieval quality, security, and operating costs.

Try These Prompts

Plan a Basic RAG App
Use the AI-native development skill to outline a simple RAG application for my documents. Include chunking, vector storage, retrieval, answer generation, and citations.
Choose a Vector Database
Compare Pinecone, Chroma, Weaviate, and Qdrant for my use case. Consider scale, hosting, filtering, latency, operations, and cost.
Design Safe Tool Calling
Design a function calling workflow for this task. Include tool schemas, validation, authorization checks, error handling, and confirmation for side effects.
Audit an AI System Architecture
Review my AI application architecture for retrieval quality, prompt injection risk, observability, token cost, latency, model fallback, and deployment readiness.

Best Practices

  • Treat retrieved documents and tool outputs as untrusted context in prompts.
  • Require validation, authorization, and logging before executing model-selected tools.
  • Track retrieval quality, answer quality, latency, and token cost from the first prototype.

Avoid

  • Copying example agent tools into production without permission checks.
  • Using eval or unrestricted interpreters for calculator or automation tools.
  • Sending secrets, private records, or raw debug prompts into model context.

Frequently Asked Questions

What does this skill help me build?
It helps build AI applications that use retrieval, embeddings, tools, agents, streaming responses, and observability.
Does it include runnable production code?
It includes templates and examples, but they need project-specific hardening, testing, and deployment work.
Which AI providers does it discuss?
It covers OpenAI and Anthropic patterns, with general concepts that also apply to other LLM providers.
Can I use it with Claude Code or Codex?
Yes. The skill is marked for Claude, Codex, and Claude Code workflows.
What should I review before production use?
Review tool permissions, prompt injection defenses, secret handling, observability, rate limits, cost controls, and model fallback behavior.
Why is the security risk medium?
Most static findings are false positives, but some examples show unsafe patterns that require caution before copying into production.