Creating data transformation pipelines for AI applications requires understanding complex ETL patterns, embedding models, and vector databases. CocoIndex provides a unified framework for building real-time indexing flows that extract from multiple sources, transform with chunking and embeddings, and export to vector databases and knowledge graphs.
Download the skill ZIP
Upload it in Claude
Go to Settings → Features → Skills → Upload skill
Open it and start using it
Test it
Using "cocoindex". Build a CocoIndex flow that embeds my documents
Expected results:
- Set up project with cocoindex package
- Create flow definition with LocalFile source
- Apply SplitRecursively for chunking
- Use SentenceTransformerEmbed or EmbedText for vectors
- Export to vector database target
- Run setup then update to build index
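The expected steps above can be sketched as one flow definition. This follows the quickstart pattern from the CocoIndex documentation, but the directory path, embedding model, and the exact target class name (`cocoindex.targets.Postgres` here; older releases used `cocoindex.storages.Postgres`) are assumptions to verify against your installed version:

```python
import cocoindex

@cocoindex.flow_def(name="DocEmbedding")
def doc_embedding_flow(flow_builder: cocoindex.FlowBuilder,
                       data_scope: cocoindex.DataScope):
    # Source: markdown files from a local directory (path is illustrative)
    data_scope["documents"] = flow_builder.add_source(
        cocoindex.sources.LocalFile(path="docs", included_patterns=["*.md"]))

    doc_embeddings = data_scope.add_collector()

    with data_scope["documents"].row() as doc:
        # Assign the chunks to a row field, not a local variable
        doc["chunks"] = doc["content"].transform(
            cocoindex.functions.SplitRecursively(),
            language="markdown", chunk_size=2000, chunk_overlap=500)

        with doc["chunks"].row() as chunk:
            chunk["embedding"] = chunk["text"].transform(
                cocoindex.functions.SentenceTransformerEmbed(
                    model="sentence-transformers/all-MiniLM-L6-v2"))
            doc_embeddings.collect(
                filename=doc["filename"], location=chunk["location"],
                text=chunk["text"], embedding=chunk["embedding"])

    # Export to Postgres with pgvector as the vector database target
    doc_embeddings.export(
        "doc_embeddings",
        cocoindex.targets.Postgres(),
        primary_key_fields=["filename", "location"])
```

Assuming the standard CLI, you would then build the index with `cocoindex setup main.py` followed by `cocoindex update main.py`.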
Security audit
Safe: Pure documentation skill containing only markdown reference files for the CocoIndex library. It contains no executable code, scripts, or runtime components; it only displays documentation and performs no file access, network operations, or code execution.
What you can build
Build vector search indexes
Create pipelines that embed documents and store in vector databases for semantic search.
Process data for AI applications
Transform raw data through chunking, embedding, and extraction for AI model consumption.
Construct knowledge graphs
Extract structured entities using LLMs and build graph databases for relationship-based queries.
Try these prompts
Help me create a CocoIndex flow that reads markdown files from a local directory, splits them into chunks of 2000 characters with 500 overlap, generates embeddings using OpenAI text-embedding-3-small, and exports to Postgres with pgvector for semantic search.
Show me how to use CocoIndex to read JSON product files, extract structured information using GPT-4, and export the results as nodes and relationships in a Neo4j knowledge graph.
I want to create a CocoIndex flow with live updates. Help me configure a local file source with a refresh interval and set up automatic processing when files change.
I need to create a custom CocoIndex function that calls an external API to enrich my data. Show me how to use the spec+executor pattern with caching and API authentication.
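The chunk sizes in the first prompt (2000 characters with 500 overlap) are easy to reason about with a plain-Python sketch, independent of CocoIndex. `chunk_text` is a hypothetical helper, not part of the library's API:

```python
def chunk_text(text: str, chunk_size: int = 2000, overlap: int = 500) -> list[str]:
    """Split text into fixed-size chunks; consecutive chunks share `overlap` characters."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    step = chunk_size - overlap  # how far each new chunk advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # last chunk already reaches the end of the text
    return chunks

doc = "x" * 5000
chunks = chunk_text(doc)
print(len(chunks))     # 3 chunks: [0:2000], [1500:3500], [3000:5000]
print(len(chunks[0]))  # 2000
```

CocoIndex's `SplitRecursively` is smarter than this (it respects document structure rather than cutting at fixed offsets), but the size/overlap parameters play the same role.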
Best practices
- Use evaluate command to test flows before running update
- Always assign transformed data to row fields, not local variables
- Increment behavior_version when modifying cached functions
- Add refresh_interval to sources for live update mode
Avoid
- Using local variables instead of row fields for transformation results
- Creating unnecessary dataclasses to mirror flow field schemas
- Omitting type annotations on custom function return values
- Running update without first running setup on new flows
Frequently asked questions
Which AI tools is CocoIndex compatible with?
What are the size limits for data processing?
How do I integrate with my existing codebase?
Is my data safe when using CocoIndex?
Why does my flow fail with database connection error?
How does CocoIndex compare to LangChain or LlamaIndex?
Developer details
License
MIT
Ref
main
File structure