Process large datasets in memory with Polars, a high-performance DataFrame library. It features lazy evaluation, parallel execution, and an Apache Arrow backend, often delivering operations many times faster than pandas.
Download the skill ZIP
Upload it in Claude
Go to Settings → Capabilities → Skills → Upload skill
Open it and start using it
Test it
Using "polars". Load a CSV file and filter rows where age is greater than 25
Expected results:
- Created DataFrame with columns: name, age, city
- Filtered to 2 rows where age > 25
- Columns selected: name, age
Using "polars". Group sales data by product category and calculate total and average sales
Expected results:
- Grouped by product_category
- Calculated sum and mean of sales_amount
- Result includes: category, total_sales, avg_sales
Using "polars". Read a Parquet file using lazy evaluation and collect only needed columns
Expected results:
- Used scan_parquet for lazy loading
- Selected only required columns early
- Collected with predicate pushdown optimization
Security audit
Safe — This skill contains ONLY markdown documentation files with Python code examples. All 690 static findings are FALSE POSITIVES: the analyzer misidentified markdown code blocks, Python syntax, and Polars library methods as security threats. No executable code, shell commands, credential access, or network operations exist.
Risk factors
⚙️ External commands (647)
🔑 Environment variables (9)
⚡ Contains scripts (1)
🌐 Network access (3)
Quality score
What you can build
Build ETL pipelines
Create efficient data pipelines with lazy evaluation for memory optimization and parallel execution.
Transform and aggregate data
Filter, group, and aggregate large datasets with expression-based syntax and window functions.
Replace pandas with a faster alternative
Migrate existing pandas code to Polars for significant performance improvements on medium-sized datasets.
Try these prompts
Load a CSV file with Polars and show the first rows, column types, and basic statistics.
Filter rows where a column meets a condition and select specific columns using Polars expressions.
Group data by one or more columns and compute aggregations like mean, sum, and count.
Convert this DataFrame operation to use lazy evaluation and explain the performance benefits.
Best practices
- Use scan_csv or scan_parquet with lazy evaluation for large datasets to enable query optimization
- Filter and select columns early in your pipeline to reduce memory usage and improve performance
- Prefer native Polars expressions over Python functions to enable parallel execution
Avoid
- Avoid using read_csv on large files when lazy evaluation would suffice
- Do not apply Python functions inside hot paths when Polars expressions can accomplish the same task
- Avoid loading entire datasets into memory when streaming with collect(streaming=True) would work