senior-data-engineer
Build scalable data pipelines and ETL systems
Also available from: alirezarezvani
Apply senior-level expertise to design and implement production-grade data pipelines. Use Python, SQL, Spark, and modern data stack tools to turn raw data into reliable, scalable analytics infrastructure.
Download the skill ZIP
Upload to Claude
Go to Settings → Capabilities → Skills → Upload skill
Enable it and start using it
Test it
Using "senior-data-engineer": Design a data pipeline for processing customer clickstream data in real time
Expected result:
- Source: Kafka topics for raw clickstream events
- Ingestion: Apache Kafka with exactly-once semantics
- Stream Processing: Apache Flink for windowed aggregations
- Storage: Bronze (raw), Silver (cleaned), Gold (aggregated) layers in Delta Lake
- Orchestration: Airflow for batch reconciliation jobs
- Monitoring: Data quality checks at each layer, latency alerts under 100ms P99
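The stream-processing step above can be sketched in plain Python. This is a simplified stand-in for the windowed aggregation a framework like Flink would run; the event shape and 10-second tumbling window are illustrative assumptions, not part of the skill itself:

```python
from collections import defaultdict

# Hypothetical event shape: (user_id, page, epoch_seconds)
events = [
    ("u1", "/home", 100),
    ("u2", "/home", 104),
    ("u1", "/cart", 112),
    ("u3", "/home", 118),
]

def tumbling_window_counts(events, window_size=10):
    """Count page views per (window_start, page) — a pure-Python stand-in
    for the windowed aggregation a stream processor would compute."""
    counts = defaultdict(int)
    for _, page, ts in events:
        window_start = (ts // window_size) * window_size
        counts[(window_start, page)] += 1
    return dict(counts)

print(tumbling_window_counts(events))
# {(100, '/home'): 2, (110, '/cart'): 1, (110, '/home'): 1}
```

In a real deployment the same grouping key (window start, page) would be maintained incrementally by the stream processor with watermarks handling late events.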
Using "senior-data-engineer": How do I improve data quality in my ETL pipelines?
Expected result:
- Implement schema validation at ingestion with Great Expectations or similar
- Add data profiling to detect anomalies and drift early
- Use checksum validation to ensure data integrity through transformations
- Build automated alerts for data quality failures
- Create data contracts between producers and consumers
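The schema-validation and checksum checks above can be sketched with the standard library alone. In practice a tool like Great Expectations would express these checks declaratively; the field names and contract here are hypothetical:

```python
import hashlib
import json

# Hypothetical data contract: field name -> expected Python type
SCHEMA = {"user_id": str, "amount": float, "country": str}

def validate_record(record, schema=SCHEMA):
    """Return a list of violations; an empty list means the record passes."""
    errors = []
    for field, expected_type in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(f"{field}: expected {expected_type.__name__}")
    return errors

def row_checksum(record):
    """Stable checksum so a row can be verified after each transformation."""
    payload = json.dumps(record, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

good = {"user_id": "u1", "amount": 9.99, "country": "DE"}
bad = {"user_id": "u1", "amount": "9.99"}
print(validate_record(good))  # []
print(validate_record(bad))   # ['amount: expected float', 'missing field: country']
```

Comparing `row_checksum` before and after a transformation that should preserve a row catches silent corruption between pipeline stages.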
Using "senior-data-engineer": What are best practices for monitoring data pipelines?
Expected result:
- Track pipeline latency, throughput, and error rates at each stage
- Implement alerting for SLA breaches and data freshness issues
- Use lineage tracking to understand data flow and impact
- Monitor data quality metrics: completeness, accuracy, freshness
- Set up automated incident response for common failures
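A minimal sketch of the stage-level metrics and freshness checks listed above, using only the standard library; the stage name, thresholds, and timestamps are illustrative assumptions:

```python
import time

class StageMetrics:
    """Track throughput, error rate, and latencies for one pipeline stage."""

    def __init__(self, name):
        self.name = name
        self.processed = 0
        self.errors = 0
        self.latencies = []

    def record(self, latency_s, ok=True):
        self.processed += 1
        self.latencies.append(latency_s)
        if not ok:
            self.errors += 1

    def error_rate(self):
        return self.errors / self.processed if self.processed else 0.0

def is_stale(last_update_ts, max_age_s, now=None):
    """Data-freshness SLA check: True if the dataset is older than allowed."""
    now = time.time() if now is None else now
    return (now - last_update_ts) > max_age_s

m = StageMetrics("transform")
m.record(0.12)
m.record(0.34, ok=False)
print(m.error_rate())                              # 0.5
print(is_stale(1_000, max_age_s=300, now=1_400))   # True
```

In production these counters would be exported to a metrics backend and the freshness check wired to an alerting rule rather than evaluated inline.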
Security audit
Low risk. Static analysis flagged 57 patterns as HIGH risk, but all are false positives: the scanner misinterpreted documentation text ("algorithms", "encryption") and template code (argparse calls, markdown backticks) as malicious patterns. The actual code uses only standard Python libraries, with no network calls, no credential access, and no external command execution. The scripts are production-ready templates with safe implementations.
Risk factors
⚡ Contains scripts (3)
📁 Filesystem access (3)
Quality score
What you can build
Design pipeline architectures
Build robust data pipeline designs with proper error handling and monitoring strategies.
Improve data quality
Implement validation frameworks to ensure data accuracy and consistency across pipelines.
Scale data infrastructure
Build production-ready data infrastructure for machine learning workloads and real-time inference.
Try these prompts
Design a production-grade data pipeline architecture for [use case]. Include source systems, transformation logic, and target data warehouse schema. Recommend appropriate tools from the modern data stack.
Analyze and optimize my ETL pipeline for [workload type]. Identify bottlenecks and suggest improvements for throughput and latency using [specific tool].
Create a comprehensive data quality validation framework for [data type]. Include checks for completeness, accuracy, consistency, and timeliness.
Define DataOps best practices for our data team including CI/CD for data pipelines, monitoring strategies, and incident response procedures.
Best practices
- Design for failure with proper error handling and retry mechanisms
- Implement data quality checks at the ingestion, transformation, and output stages
- Optimize performance with incremental processing and strategic caching
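The retry practice above can be sketched as exponential backoff with jitter; the attempt count, delays, and `flaky_extract` step are hypothetical, and the `sleep` parameter exists only so the sketch can be exercised without real waits:

```python
import random
import time

def retry(fn, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Retry a flaky pipeline step with exponential backoff and jitter."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # exhausted: surface the error to the orchestrator
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            sleep(delay)

# Hypothetical extract step that fails twice before succeeding
calls = {"n": 0}
def flaky_extract():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "rows"

result = retry(flaky_extract, sleep=lambda _: None)
print(result)  # 'rows' (succeeds on the third attempt)
```

Capping total retry time and retrying only on transient error types (not, say, schema errors) keeps this pattern safe in real pipelines.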
Avoid
- Building monolithic pipelines without proper error handling
- Skipping data validation and leaving pipeline health unmonitored
- Processing data in real time when batch processing meets the requirements