senior-data-engineer
Build scalable data pipelines and ETL systems
Also available from: alirezarezvani
Apply senior-level expertise to design and implement production-grade data pipelines. Use Python, SQL, Spark, and modern data stack tools to turn raw data into reliable, scalable analytics infrastructure.
Download the skill ZIP
Upload to Claude
Go to Settings → Capabilities → Skills → Upload skill
Enable it and start using it
Test it
Using "senior-data-engineer": Design a data pipeline for processing customer clickstream data in real time
Expected result:
- Source: Kafka topics for raw clickstream events
- Ingestion: Apache Kafka with exactly-once semantics
- Stream Processing: Apache Flink for windowed aggregations
- Storage: Bronze (raw), Silver (cleaned), Gold (aggregated) layers in Delta Lake
- Orchestration: Airflow for batch reconciliation jobs
- Monitoring: Data quality checks at each layer, latency alerts under 100ms P99
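The stream-processing step above can be sketched in plain Python. This is a simplified stand-in for the windowed aggregation a framework like Flink would run; the event shape and 10-second tumbling window are illustrative assumptions, not part of the skill itself:

```python
from collections import defaultdict

# Hypothetical event shape: (user_id, page, epoch_seconds)
events = [
    ("u1", "/home", 100),
    ("u2", "/home", 104),
    ("u1", "/cart", 112),
    ("u3", "/home", 118),
]

def tumbling_window_counts(events, window_size=10):
    """Count page views per (window_start, page) — a pure-Python stand-in
    for the windowed aggregation a stream processor would compute."""
    counts = defaultdict(int)
    for _, page, ts in events:
        window_start = (ts // window_size) * window_size
        counts[(window_start, page)] += 1
    return dict(counts)

print(tumbling_window_counts(events))
# {(100, '/home'): 2, (110, '/cart'): 1, (110, '/home'): 1}
```

In a real deployment the same grouping key (window start, page) would be maintained incrementally by the stream processor with watermarks handling late events.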
Using "senior-data-engineer": How do I improve data quality in my ETL pipelines?
Expected result:
- Implement schema validation at ingestion with Great Expectations or similar
- Add data profiling to detect anomalies and drift early
- Use checksum validation to ensure data integrity through transformations
- Build automated alerts for data quality failures
- Create data contracts between producers and consumers
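The schema-validation and checksum checks above can be sketched with the standard library alone. In practice a tool like Great Expectations would express these checks declaratively; the field names and contract here are hypothetical:

```python
import hashlib
import json

# Hypothetical data contract: field name -> expected Python type
SCHEMA = {"user_id": str, "amount": float, "country": str}

def validate_record(record, schema=SCHEMA):
    """Return a list of violations; an empty list means the record passes."""
    errors = []
    for field, expected_type in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(f"{field}: expected {expected_type.__name__}")
    return errors

def row_checksum(record):
    """Stable checksum so a row can be verified after each transformation."""
    payload = json.dumps(record, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

good = {"user_id": "u1", "amount": 9.99, "country": "DE"}
bad = {"user_id": "u1", "amount": "9.99"}
print(validate_record(good))  # []
print(validate_record(bad))   # ['amount: expected float', 'missing field: country']
```

Comparing `row_checksum` before and after a transformation that should preserve a row catches silent corruption between pipeline stages.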
Using "senior-data-engineer": What are best practices for monitoring data pipelines?
Expected result:
- Track pipeline latency, throughput, and error rates at each stage
- Implement alerting for SLA breaches and data freshness issues
- Use lineage tracking to understand data flow and impact
- Monitor data quality metrics: completeness, accuracy, freshness
- Set up automated incident response for common failures
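A minimal sketch of the stage-level metrics and freshness checks listed above, using only the standard library; the stage name, thresholds, and timestamps are illustrative assumptions:

```python
import time

class StageMetrics:
    """Track throughput, error rate, and latencies for one pipeline stage."""

    def __init__(self, name):
        self.name = name
        self.processed = 0
        self.errors = 0
        self.latencies = []

    def record(self, latency_s, ok=True):
        self.processed += 1
        self.latencies.append(latency_s)
        if not ok:
            self.errors += 1

    def error_rate(self):
        return self.errors / self.processed if self.processed else 0.0

def is_stale(last_update_ts, max_age_s, now=None):
    """Data-freshness SLA check: True if the dataset is older than allowed."""
    now = time.time() if now is None else now
    return (now - last_update_ts) > max_age_s

m = StageMetrics("transform")
m.record(0.12)
m.record(0.34, ok=False)
print(m.error_rate())                              # 0.5
print(is_stale(1_000, max_age_s=300, now=1_400))   # True
```

In production these counters would be exported to a metrics backend and the freshness check wired to an alerting rule rather than evaluated inline.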
Security audit
Low risk. Static analysis flagged 57 patterns as HIGH risk, but all are false positives: the scanner misinterpreted documentation text ("algorithms", "encryption") and template code (argparse calls, markdown backticks) as malicious patterns. The actual code uses only standard Python libraries, with no network calls, no credential access, and no external command execution. The scripts are production-ready templates with safe implementations.
Risk factors
⚡ Contains scripts (3)
📁 Filesystem access (3)
Quality score
What you can build
Design pipeline architectures
Build robust data pipeline designs with proper error handling and monitoring strategies.
Improve data quality
Implement validation frameworks to ensure data accuracy and consistency across pipelines.
Scale data infrastructure
Build production-ready data infrastructure for machine learning workloads and real-time inference.
Try these prompts
Design a production-grade data pipeline architecture for [use case]. Include source systems, transformation logic, and target data warehouse schema. Recommend appropriate tools from the modern data stack.
Analyze and optimize my ETL pipeline for [workload type]. Identify bottlenecks and suggest improvements for throughput and latency using [specific tool].
Create a comprehensive data quality validation framework for [data type]. Include checks for completeness, accuracy, consistency, and timeliness.
Define DataOps best practices for our data team including CI/CD for data pipelines, monitoring strategies, and incident response procedures.
Best practices
- Design for failure with proper error handling and retry mechanisms
- Implement data quality checks at the ingestion, transformation, and output stages
- Optimize performance with incremental processing and strategic caching
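The retry practice above can be sketched as exponential backoff with jitter; the attempt count, delays, and `flaky_extract` step are hypothetical, and the `sleep` parameter exists only so the sketch can be exercised without real waits:

```python
import random
import time

def retry(fn, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Retry a flaky pipeline step with exponential backoff and jitter."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # exhausted: surface the error to the orchestrator
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            sleep(delay)

# Hypothetical extract step that fails twice before succeeding
calls = {"n": 0}
def flaky_extract():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "rows"

result = retry(flaky_extract, sleep=lambda _: None)
print(result)  # 'rows' (succeeds on the third attempt)
```

Capping total retry time and retrying only on transient error types (not, say, schema errors) keeps this pattern safe in real pipelines.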
Avoid
- Building monolithic pipelines without proper error handling
- Skipping data validation and leaving pipeline health unmonitored
- Processing data in real time when batch processing meets the requirements