data-engineering-data-pipeline
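Build Scalable Data Pipelines
Designing production-ready data pipelines is complex and error-prone. This skill provides proven architecture patterns and implementation guidance for ETL, streaming, and lakehouse systems.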
Download the skill ZIP
Upload it in Claude
Go to Settings → Capabilities → Skills → Upload skill
Turn on the toggle and start using it
Try it out
Using "data-engineering-data-pipeline": Design a batch pipeline for daily customer data sync from MySQL to Snowflake
Expected result:
Architecture: ELT pattern with incremental loading. Components: (1) Extract using watermark column 'updated_at', (2) Load raw data to S3 staging, (3) Transform in Snowflake with dbt, (4) Validate with dbt tests, (5) Alert on failures via Slack. Key considerations: Handle late-arriving data, implement retry logic, monitor row count variance.
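The watermark-based extract in component (1) can be sketched as follows. This is a minimal illustration, assuming Python's built-in `sqlite3` as a stand-in for MySQL; the `customers` table, its columns, and the one-hour `LOOKBACK` window for late-arriving data are hypothetical. A real pipeline would persist the watermark durably and stage the batch in S3 instead of returning it.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical re-read window: rows that commit late with an older updated_at
# are still picked up on the next run. Re-read rows must be loaded idempotently.
LOOKBACK = timedelta(hours=1)


def extract_incremental(conn, last_watermark):
    """Return rows changed since (watermark - LOOKBACK) plus the advanced watermark."""
    since = (last_watermark - LOOKBACK).isoformat(sep=" ")
    rows = conn.execute(
        "SELECT id, name, updated_at FROM customers "
        "WHERE updated_at > ? ORDER BY updated_at",
        (since,),
    ).fetchall()
    # Never move the watermark backwards, even if the batch only contains
    # rows from inside the lookback window (or is empty).
    new_watermark = max(
        [last_watermark, *(datetime.fromisoformat(r[2]) for r in rows)]
    )
    return rows, new_watermark


# Demo with an in-memory table; timestamps are stored as ISO-8601 text so
# string comparison in SQL matches chronological order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        (1, "Ada", "2024-01-01 08:00:00"),
        (2, "Lin", "2024-01-01 09:30:00"),
        (3, "Kim", "2024-01-01 11:00:00"),
    ],
)
rows, wm = extract_incremental(conn, datetime(2024, 1, 1, 10, 0))
print([r[0] for r in rows], wm)  # [2, 3] 2024-01-01 11:00:00
```

Row 2 is older than the 10:00 watermark but falls inside the lookback window, so it is extracted again. That overlap is the price of catching late-arriving data, which is why the load step should merge on the primary key rather than append.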
Using "data-engineering-data-pipeline": How do I handle schema evolution in a streaming pipeline?
Expected result:
Strategy: Use a schema registry with compatibility checks. For additive changes, give new fields default values. For breaking changes, implement dual-write during migration. Tools: Confluent Schema Registry for Kafka, and Delta Lake schema evolution with the mergeSchema option. Always test backward compatibility before deployment.
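The backward-compatibility test mentioned above can be illustrated with a simplified rule modeled on Avro schema resolution: a reader using the new schema can decode data written with the old one if every field it declares either already existed with the same type or carries a default. This is a hypothetical sketch with schemas as plain dicts. It is not the Confluent Schema Registry API, which also handles type promotion, unions, and nested records.

```python
from typing import Any, Dict

# Hypothetical schema shape: field name -> {"type": str, optional "default": value}
Schema = Dict[str, Dict[str, Any]]


def is_backward_compatible(old: Schema, new: Schema) -> bool:
    """True if readers on `new` can decode records written with `old`."""
    for name, spec in new.items():
        if name in old:
            # Changing a type in place is treated as breaking here
            # (real Avro rules allow some promotions, e.g. int -> long).
            if old[name]["type"] != spec["type"]:
                return False
        elif "default" not in spec:
            # A new field without a default cannot be filled for old records.
            return False
    # Fields dropped from `new` are simply ignored when reading old data.
    return True


v1 = {"id": {"type": "long"}, "email": {"type": "string"}}
v2_additive = {**v1, "country": {"type": "string", "default": "unknown"}}
v2_breaking = {**v1, "country": {"type": "string"}}

print(is_backward_compatible(v1, v2_additive))  # True
print(is_backward_compatible(v1, v2_breaking))  # False
```

A `False` result marks the kind of breaking change where the dual-write migration is needed instead of an in-place schema update.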
Security audit
Low risk. All static analyzer findings are false positives. The skill is documentation-only, providing architectural guidance and educational code examples. No executable code, external commands, or security risks were detected. Safe for publication.
Low-risk issues (3)
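What you can build
Greenfield pipeline architecture design
Design a complete data pipeline from scratch for a startup migrating from spreadsheets to a modern data stack.
Streaming migration strategy
Convert an existing batch pipeline into a real-time streaming architecture using Kafka and a stream-processing framework.
Data quality framework implementation
Implement comprehensive data quality checks using Great Expectations and dbt tests with automated alerting.
Try these prompts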
I need to build a data pipeline that extracts data from PostgreSQL daily, transforms it, and loads it to a data warehouse. What architecture should I use and what are the key components?
We have high-volume event data from our application and need near-real-time analytics. Compare Lambda vs. Kappa architectures for our use case of 1M events per minute.
Show me how to implement data quality checks for our orders table using Great Expectations. We need to validate uniqueness of order IDs, non-null customer IDs, and positive order amounts.
Our monthly data pipeline costs have doubled. Review our architecture and provide specific recommendations to reduce costs while maintaining our SLA. Current stack: Airflow, Spark, S3, Redshift.
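Best practices
- Assess data sources, volume, latency requirements, and target systems before choosing an architecture pattern
- Implement incremental processing with watermark columns to avoid reprocessing entire datasets
- Add data quality gates at every pipeline stage, with automatic alerts when validation fails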
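A quality gate at a single pipeline stage might look like the sketch below. It uses plain Python rather than Great Expectations so that no library API is assumed. The three checks mirror the orders example from the prompts above (unique order IDs, non-null customer IDs, positive amounts), the row-count variance check and its 50% `tolerance` are hypothetical choices, and the `alert` callback is a stand-in for a Slack or pager integration.

```python
from statistics import mean
from typing import Callable, Dict, List


def check_orders(rows: List[Dict]) -> List[str]:
    """Return failure messages; an empty list means the batch passes."""
    failures = []
    ids = [r["order_id"] for r in rows]
    if len(ids) != len(set(ids)):
        failures.append("order_id is not unique")
    if any(r["customer_id"] is None for r in rows):
        failures.append("customer_id contains nulls")
    if any(r["amount"] is None or r["amount"] <= 0 for r in rows):
        failures.append("amount must be positive")
    return failures


def row_count_ok(today: int, history: List[int], tolerance: float = 0.5) -> bool:
    """Hypothetical variance check: today's count within ±tolerance of the trailing mean."""
    if not history:
        return True  # nothing to compare against yet
    baseline = mean(history)
    return abs(today - baseline) <= tolerance * baseline


def quality_gate(rows: List[Dict], history: List[int],
                 alert: Callable[[str], None]) -> bool:
    """Run all checks, send one alert per failure, and return False to block the next stage."""
    failures = check_orders(rows)
    if not row_count_ok(len(rows), history):
        failures.append(f"row count {len(rows)} deviates from recent history")
    for msg in failures:
        alert(msg)
    return not failures


batch = [
    {"order_id": 1, "customer_id": 10, "amount": 25.0},
    {"order_id": 1, "customer_id": None, "amount": -5.0},
]
alerts: List[str] = []
passed = quality_gate(batch, history=[2, 3, 2], alert=alerts.append)
print(passed, len(alerts))  # False 3
```

Returning a boolean lets an orchestrator such as Airflow stop downstream tasks when the gate fails, while the alerts explain why.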
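Avoid
- Copying production patterns as-is without tuning them to your specific data volume and velocity requirements
- Choosing an architecture based on trends rather than business requirements and team capabilities
- Prioritizing features over monitoring, observability, and runbooks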