GitHub - MaterializeInc/datagen: Generate authentic looking mock data based on a SQL, JSON or Avro schema and produce to Kafka in JSON or Avro format.
https://github.com/MaterializeInc/datagen
GitHub - eto-ai/lance: Modern columnar data format for ML implemented in Rust. Convert from parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, Pyarrow, with more integrations coming.
https://github.com/eto-ai/lance
GitHub - lancedb/lance: Modern columnar data format for ML and LLMs implemented in Rust. Convert from parquet in 2 lines of code…
Modern columnar data format for ML and LLMs implemented in Rust. Convert from parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, Du...
👍1
Lightning fast aggregations by distributing DuckDB across AWS Lambda functions | by BoilingData.com | Medium
https://boilingdata.medium.com/lightning-fast-aggregations-by-distributing-duckdb-across-aws-lambda-functions-e4775931ab04
DuckDB is rapidly changing the way data scientists and engineers work. Its efficient and internally parallelised architecture means that a…
Simplify Online Analytical Processing (OLAP) queries in Amazon Redshift using new SQL constructs such as ROLLUP, CUBE, and GROUPING SETS
Amazon
Amazon Redshift is a fully managed, petabyte-scale, massively parallel data warehouse that makes it fast, simple, and cost-effective to analyze all your data using standard SQL and your existing business intelligence (BI) tools. We are continuously investing…
BigQuery under the hood: Behind the serverless storage and query optimizations that supercharge performance
Google Cloud Blog
Inside BigQuery’s storage and query optimizations | Google Cloud Blog
BigQuery’s serverless architecture features storage and query optimizations that deliver transformational data analytics performance.
Build a real-time GDPR-aligned Apache Iceberg data lake.
Amazon
Data lakes are a popular choice for today’s organizations to store their data around their business activities. As a best practice of a data lake design, data should be immutable once stored. But regulations such as the General Data Protection Regulation…
👍3
Implement slowly changing dimensions in a data lake using AWS Glue and Delta | AWS Big Data Blog
https://aws.amazon.com/blogs/big-data/implement-slowly-changing-dimensions-in-a-data-lake-using-aws-glue-and-delta/
In a data warehouse, a dimension is a structure that categorizes facts and measures in order to enable users to answer business questions. To illustrate an example, in a typical sales domain, customer, time or product are dimensions and sales transactions…
Welcome to Marvin - Marvin
https://www.askmarvin.ai/
A powerful framework for building AI applications
The Truth about Prefect, Mage, and Airflow.
https://dataengineeringcentral.substack.com/p/the-truth-about-prefect-mage-and
The Battle for the Orchestration Future.
MLOps Is Overfitting: Here’s Why
https://lakefs.io/blog/mlops-is-overfitting/
How To Improve ML Pipeline Development With Reproducibility
In this article we'll explore how to improve your ML pipeline development with MLOps tools for reproducible experiments. Read on to learn more.
The future of the data engineer — Part I | by Analytics at Meta | Apr, 2023 | Medium
https://medium.com/@AnalyticsAtMeta/the-future-of-the-data-engineer-part-i-32bd125465be
❤1