The free Data Engineering Zoomcamp starts on January 16.
GitHub - DataTalksClub/data-engineering-zoomcamp
Data Engineering Zoomcamp is a free 9-week course on building production-ready data pipelines. The next cohort starts in January 2026.
A new book by Andy Grove, creator of DataFusion, about query engines. DataFusion is an extensible query planning, optimization, and execution framework, written in Rust, that uses Apache Arrow as its in-memory format.
AWS Lambdas - Python vs Rust. Performance and Cost Savings. - Confessions of a Data Guy
https://www.confessionsofadataguy.com/aws-lambdas-python-vs-rust-performance-and-cost-savings/
Save money, save money!! Hear, hear! Someone on LinkedIn recently brought up the point that companies could save gobs of money by swapping out AWS Python lambdas for Rust ones. While it raised the ire of many a Python Data Engineer, I thought it sounded like…
Guide to Partitions Calculation for Processing Data Files in Apache Spark - DZone
https://dzone.com/articles/guide-to-partitions-calculation-for-processing-dat
Get to know how Spark implicitly chooses the number of partitions when reading a set of data files into an RDD or a Dataset.
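As a rough illustration of what that calculation looks like for file-based Dataset reads, here is a minimal sketch in plain Python. It assumes the commonly documented defaults for `spark.sql.files.maxPartitionBytes` (128 MB) and `spark.sql.files.openCostInBytes` (4 MB), treats every file as splittable, and ignores version-specific settings. The function names `max_split_bytes` and `estimate_partitions` are hypothetical helpers written for this note, not Spark APIs, and the RDD path uses a different rule.

```python
MB = 1024 * 1024


def max_split_bytes(file_sizes, default_parallelism,
                    max_partition_bytes=128 * MB, open_cost=4 * MB):
    """Estimate the split size used when planning a file scan.

    Assumed defaults: max_partition_bytes and open_cost stand in for
    spark.sql.files.maxPartitionBytes and spark.sql.files.openCostInBytes.
    """
    # Each file is padded with the cost of opening it.
    total_bytes = sum(size + open_cost for size in file_sizes)
    bytes_per_core = total_bytes // default_parallelism
    return min(max_partition_bytes, max(open_cost, bytes_per_core))


def estimate_partitions(file_sizes, default_parallelism,
                        max_partition_bytes=128 * MB, open_cost=4 * MB):
    """Estimate how many partitions a read of these files would produce.

    Hypothetical helper: assumes all files are splittable (for example,
    uncompressed or block-compressed formats).
    """
    split = max_split_bytes(file_sizes, default_parallelism,
                            max_partition_bytes, open_cost)

    # Cut every file into chunks of at most `split` bytes.
    chunks = []
    for size in file_sizes:
        offset = 0
        while offset < size:
            chunks.append(min(split, size - offset))
            offset += split

    # Pack the chunks, largest first, closing a partition once the
    # next chunk would push it past the split size.
    chunks.sort(reverse=True)
    partitions = 0
    current_size = 0
    has_open_partition = False
    for chunk in chunks:
        if has_open_partition and current_size + chunk > split:
            partitions += 1
            current_size = 0
            has_open_partition = False
        current_size += chunk + open_cost
        has_open_partition = True
    if has_open_partition:
        partitions += 1
    return partitions


if __name__ == "__main__":
    # One 1 GB file read with a default parallelism of 8.
    print(estimate_partitions([1024 * MB], default_parallelism=8))
    # Ten 1 MB files read with a default parallelism of 4.
    print(estimate_partitions([1 * MB] * 10, default_parallelism=4))
```

The sketch shows why many small files collapse into a few partitions (the open cost dominates) while one large file is cut at the maximum partition size. Actual counts on a real cluster can differ, so treat the result as an estimate and check it against `df.rdd.getNumPartitions()`.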
Build a poor man’s data lake from scratch with DuckDB | Dagster Blog
https://dagster.io/blog/duckdb-data-lake
Build a Data Lake with DuckDB + Dagster
Use DuckDB, Python, and Dagster to build a lightweight data lake with SQL transforms and Parquet file support.
Pandas 2.0 and its Ecosystem (Arrow, Polars, DuckDB) | Airbyte
https://airbyte.com/blog/pandas-2-0-ecosystem-arrow-polars-duckdb
Dive deeper into the power of Pandas and how leveraging it can benefit your organization. Explore a new way to work with data and unlock powerful insights!
Home - Apache Doris
https://doris.apache.org/
Apache Doris: Open source data warehouse for real time data analytics - Apache Doris
Apache Doris is an open-source database based on an MPP architecture, designed for ease of use and high performance. As a modern data warehouse, Apache Doris supports OLAP queries and database analytics.
GitHub - MaterializeInc/datagen: Generate authentic looking mock data based on a SQL, JSON or Avro schema and produce to Kafka in JSON or Avro format.
https://github.com/MaterializeInc/datagen