vector-database-engineer

Safe

Build scalable vector search systems

Implement production-ready vector databases and semantic search. This skill provides expert guidance on embedding strategies, index optimization, and RAG architecture for modern AI applications.

Supports: Claude Codex Code(CC)
🥉 74 Bronze
1. Download the skill ZIP.
2. Upload it in Claude: go to Settings → Capabilities → Skills → Upload skill.
3. Enable it and start using it.

Test it

Using "vector-database-engineer": How should I chunk 500-page PDFs for semantic search?

Expected result:

Use recursive character text splitting with 1000-1500 character chunks and 200 character overlap. This preserves context while maintaining semantic coherence. For technical documents, consider structure-aware chunking that respects section boundaries.
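
The overlap idea can be sketched in a few lines. This is a minimal fixed-size sliding window, not a recursive or structure-aware splitter; the function name `chunk_text` and the default sizes are illustrative assumptions taken from the ranges quoted above.

```python
def chunk_text(text: str, chunk_size: int = 1200, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# Hypothetical 3000-character document made of repeating digits.
sample = "".join(str(i % 10) for i in range(3000))
chunks = chunk_text(sample, chunk_size=1200, overlap=200)
```

Each chunk begins with the last 200 characters of the previous one, so a sentence cut at a boundary still appears whole in at least one chunk.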

Using "vector-database-engineer": Compare Pinecone vs Weaviate for production

Expected result:

Pinecone is a fully managed service, so it scales with little operational overhead, but it ties you to one vendor. Weaviate is open source with hybrid search built in and can be self-hosted, which adds flexibility but means managing the infrastructure yourself. Choose Pinecone for rapid development, Weaviate for cost control at scale.

Security audit

Safe
v1 • 2/25/2026

All static analysis findings are false positives. The skill contains only documentation text with no executable code, network requests, or security risks. The 'external_commands' flag was triggered by the word 'open' in a documentation sentence, not actual command execution. This is a legitimate educational skill about vector database engineering.

Files scanned: 1
Lines analyzed: 63
Findings: 0
Total audits: 1
No security issues
Audited by: claude

Quality score

Architecture: 38
Maintainability: 100
Content: 85
Community: 50
Security: 100
Spec compliance: 91

What you can build

Build a RAG knowledge base

Design semantic search over documentation for AI-powered question answering

Implement recommendation engine

Create similarity-based product recommendations using vector embeddings

Optimize vector search performance

Tune indexing and chunking strategies for millions of vectors

Try these prompts

Select a vector database
Help me choose between Pinecone, Weaviate, and Qdrant for a document search system with 1 million vectors
Design embedding strategy
Design an embedding pipeline for technical documentation. Recommend chunking size, overlap, and model selection
Configure HNSW index
Configure HNSW index parameters for 90% recall at under 50ms latency on 5 million vectors
Implement hybrid search
Implement hybrid search combining vector similarity with keyword filters for product search

Best practices

  • Always test embedding models on your specific domain before production deployment
  • Start with simple chunking strategies before optimizing for complex document structures
  • Monitor vector drift and plan periodic re-embedding cycles
  • Use metadata filtering to reduce search space before vector queries

Avoid

  • Using larger embedding dimensions without testing if smaller models work for your use case
  • Chunking documents without overlap, losing context between segments
  • Skipping recall testing and only measuring latency
  • Storing embeddings without their source text or metadata references
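
Recall testing, mentioned in the list above, can be sketched as a comparison between an approximate index's output and a brute-force ground truth. The vectors, the ids, and the `approx` list below are made-up stand-ins; in practice `approx` would come from your actual index.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def exact_top_k(query, vectors, k):
    """Brute-force ground truth: ids of the k most similar vectors."""
    ranked = sorted(vectors, key=lambda vid: cosine(query, vectors[vid]), reverse=True)
    return ranked[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-k that the approximate search also returned."""
    if not exact_ids:
        return 1.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


# Hypothetical toy data.
vectors = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.7, 0.7]}
query = [1.0, 0.05]

truth = exact_top_k(query, vectors, k=2)
approx = ["a", "d"]  # assumed output of an approximate index
score = recall_at_k(approx, truth)
```

Averaging this score over a held-out set of queries gives a recall figure to report next to latency.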

Frequently asked questions

What is the difference between HNSW and IVF indexing?
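HNSW (Hierarchical Navigable Small World) is a graph index that typically answers queries faster at a given recall, at the cost of higher memory usage for the graph links. IVF (Inverted File) partitions vectors into clusters and searches only a few of them per query; it usually needs less memory but is slower to reach the same recall. Use HNSW for real-time applications, IVF for cost-sensitive large-scale deployments.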
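
To make the IVF side concrete, here is a toy sketch of an inverted-file index in plain Python. The class `ToyIVF` and the hard-coded centroids are assumptions for illustration (real systems learn centroids, usually with k-means); the `nprobe` name follows common usage for "how many clusters to scan".

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class ToyIVF:
    """Toy inverted-file index: vectors are bucketed by their nearest centroid."""

    def __init__(self, centroids):
        self.centroids = centroids
        self.lists = {i: [] for i in range(len(centroids))}

    def _nearest_centroids(self, vec, n):
        order = sorted(range(len(self.centroids)),
                       key=lambda i: sq_dist(vec, self.centroids[i]))
        return order[:n]

    def add(self, vec_id, vec):
        cell = self._nearest_centroids(vec, 1)[0]
        self.lists[cell].append((vec_id, vec))

    def search(self, query, k, nprobe=1):
        # Scan only the nprobe closest cells instead of every vector.
        candidates = []
        for cell in self._nearest_centroids(query, nprobe):
            candidates.extend(self.lists[cell])
        candidates.sort(key=lambda item: sq_dist(query, item[1]))
        return [vec_id for vec_id, _ in candidates[:k]]


index = ToyIVF(centroids=[[0.0, 0.0], [10.0, 10.0]])  # hypothetical centroids
index.add("p1", [1.0, 1.0])
index.add("p2", [4.9, 4.9])
index.add("p3", [5.2, 5.2])
index.add("p4", [9.0, 9.0])

query = [5.1, 5.1]
narrow = index.search(query, k=2, nprobe=1)  # scans one cell
wide = index.search(query, k=2, nprobe=2)    # scans both cells
```

With `nprobe=1` the search misses `p2`, which sits just across the cell boundary; raising `nprobe` recovers it at the price of scanning more vectors. That is the recall versus speed dial IVF exposes.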
How do I choose embedding dimensions?
Higher dimensions (1536) capture more semantic nuance but increase storage and latency. Start with 384-768 dimensions for most use cases. Only use 1536 if you have complex semantic relationships and sufficient infrastructure budget.
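
The storage side of that trade-off is simple arithmetic. The sketch below assumes uncompressed float32 values (4 bytes each) and ignores index overhead and metadata; the helper name `raw_vector_bytes` is illustrative.

```python
def raw_vector_bytes(num_vectors: int, dims: int, bytes_per_value: int = 4) -> int:
    """Raw storage for the vectors alone, assuming float32 by default."""
    return num_vectors * dims * bytes_per_value


for dims in (384, 768, 1536):
    gib = raw_vector_bytes(1_000_000, dims) / 2**30
    print(f"{dims:>4} dims: {gib:.2f} GiB per million vectors")
```

For one million vectors this works out to roughly 1.4, 2.9, and 5.7 GiB respectively, before any index structures are added.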
Should I use pre-filtering or post-filtering for metadata?
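Pre-filtering restricts the candidate set before the vector search, so every result satisfies the filter, but a very selective filter can hurt the speed or recall of an approximate index. Post-filtering runs the vector search first and then discards non-matching hits, which wastes computation and can return fewer results than requested. Use pre-filtering for strict constraints, post-filtering for soft preferences.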
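
The mechanical difference between the two orders of operations can be sketched with a brute-force search. The documents, the `lang` field, and the function names are hypothetical; real databases apply filters inside their index, so behavior there will differ in detail.

```python
def dot(a, b):
    """Dot-product similarity between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k(query, items, k):
    """The k items most similar to the query."""
    ranked = sorted(items, key=lambda it: dot(query, it["vec"]), reverse=True)
    return ranked[:k]


def pre_filter_search(query, items, k, predicate):
    """Filter first, then rank only the items that passed."""
    allowed = [it for it in items if predicate(it)]
    return [it["id"] for it in top_k(query, allowed, k)]


def post_filter_search(query, items, k, predicate):
    """Rank everything, then drop hits that fail the filter."""
    hits = top_k(query, items, k)
    return [it["id"] for it in hits if predicate(it)]


def is_english(item):
    return item["lang"] == "en"


# Hypothetical toy corpus.
items = [
    {"id": "doc1", "vec": [0.9, 0.1], "lang": "en"},
    {"id": "doc2", "vec": [0.8, 0.2], "lang": "de"},
    {"id": "doc3", "vec": [0.7, 0.3], "lang": "de"},
    {"id": "doc4", "vec": [0.1, 0.9], "lang": "en"},
]
query = [1.0, 0.0]

pre = pre_filter_search(query, items, 2, is_english)
post = post_filter_search(query, items, 2, is_english)
```

Here the pre-filtered search returns two English documents, while the post-filtered search returns only one because a German document occupied a top-2 slot. A common workaround is to over-fetch (request more than k) before post-filtering.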
What vector database should I use?
Pinecone for managed simplicity, Weaviate for hybrid search features, Qdrant for performance and filtering, pgvector if you already use PostgreSQL. Choose based on your team's expertise and infrastructure preferences.
How do I handle embedding drift?
Embedding drift occurs when your data distribution changes over time. Monitor search quality metrics monthly and schedule quarterly re-embedding for critical applications. Use A/B testing to compare old and new embeddings before full migration.
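
One simple way to put a number on drift is to compare the mean embedding of a baseline window with that of recent data. This is only a sketch of one possible signal, not an established standard: the sample vectors and `DRIFT_THRESHOLD` are made-up assumptions, and search quality metrics remain the primary check.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine_distance(a, b):
    """1 - cosine similarity; 0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (na * nb)


# Hypothetical embeddings from an older and a recent time window.
baseline = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2]]
recent = [[0.2, 0.8], [0.1, 0.9], [0.0, 1.0]]

DRIFT_THRESHOLD = 0.2  # assumed value; tune on your own data
shift = cosine_distance(centroid(baseline), centroid(recent))
needs_review = shift > DRIFT_THRESHOLD
```

A shift above the threshold would be a prompt to inspect search quality and consider re-embedding, not an automatic trigger.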
Can I use this skill to directly query my vector database?
No, this skill provides guidance and code generation for vector database architecture and optimization. It does not execute queries or connect to your database directly. You must implement the suggested code in your application.

Developer details

File structure

📄 SKILL.md