Data1984 – Telegram
This channel is mostly about data-related topics; the main ones are #DataEngineering #SQL #Python #cloud.

Contact: @gorros
Arroyo is an open-source stream processing engine, enabling users to transform, filter, aggregate, and join their data streams in real-time with SQL queries. It's designed to be easy enough for any SQL user to build correct, reliable, and scalable streaming pipelines.
https://www.arroyo.dev/blog/why-arrow-and-datafusion
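To make "aggregate data streams in real time" a bit more concrete, here is a minimal pure-Python sketch of a tumbling-window count, the kind of result a windowed GROUP BY in a streaming SQL engine produces. It does not use Arroyo's API. The event shape `(timestamp_seconds, key)`, the 60-second window, and the function name `tumbling_window_counts` are assumptions for illustration. It also assumes events arrive in timestamp order, which real engines cannot rely on (they use watermarks to handle late events).

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

# Hypothetical event shape for this sketch: (timestamp_seconds, key)
Event = Tuple[int, str]


def tumbling_window_counts(
    events: Iterable[Event], window_seconds: int = 60
) -> Iterator[Tuple[int, dict]]:
    """Yield (window_start, counts_per_key) each time a window closes.

    Assumes events arrive in non-decreasing timestamp order; a real
    streaming engine would use watermarks to tolerate late events.
    Windows that receive no events are not emitted.
    """
    current_start = None
    counts: Counter = Counter()
    for ts, key in events:
        start = ts - ts % window_seconds  # align timestamp to its window boundary
        if current_start is None:
            current_start = start
        elif start != current_start:
            # A new window began, so the previous one is complete.
            yield current_start, dict(counts)
            current_start, counts = start, Counter()
        counts[key] += 1
    if current_start is not None:
        # Flush the last (possibly partial) window at end of input.
        yield current_start, dict(counts)


if __name__ == "__main__":
    stream = [(0, "click"), (10, "view"), (59, "click"), (61, "click"), (130, "view")]
    for window_start, window_counts in tumbling_window_counts(stream):
        print(window_start, window_counts)
```

A streaming engine does the same bookkeeping continuously, per key and per window, while also handling out-of-order data, state checkpointing, and parallelism, which is where a dedicated engine earns its keep.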
Interesting projects from Microsoft for building LLM applications:
- AICI: Prompts as (Wasm) Programs
- AutoGen: A programming framework for agentic AI
- Semantic Kernel: Integrate cutting-edge LLM technology quickly and easily into your apps
From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples

https://arxiv.org/html/2404.07544v1
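The core idea of the paper is that an LLM, given (input, output) pairs directly in its prompt, can predict the output for a new input without any fine-tuning. Below is a minimal sketch of how such a few-shot regression prompt might be assembled. The "Feature i: ... / Output: ..." layout, the two-decimal rounding, and the names `build_regression_prompt` and `format_features` are assumptions for illustration and may differ from the paper's exact template; sending the prompt to a model is left out.

```python
from typing import List, Sequence, Tuple

# Hypothetical example type for this sketch: (feature vector, target value)
Example = Tuple[Sequence[float], float]


def format_features(x: Sequence[float], decimals: int = 2) -> str:
    # One "Feature i: value" line per input dimension.
    return "\n".join(f"Feature {i}: {v:.{decimals}f}" for i, v in enumerate(x))


def build_regression_prompt(
    train: List[Example], query: Sequence[float], decimals: int = 2
) -> str:
    """Concatenate in-context (x, y) examples, then the query with an empty output."""
    blocks = []
    for x, y in train:
        blocks.append(f"{format_features(x, decimals)}\nOutput: {y:.{decimals}f}")
    # The model is expected to continue the text after the final "Output:".
    blocks.append(f"{format_features(query, decimals)}\nOutput:")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    # Synthetic data from y = 2*x0 + x1 (an assumption for the demo).
    train = [([1.0, 2.0], 4.0), ([3.0, 1.0], 7.0), ([0.5, 0.5], 1.5)]
    print(build_regression_prompt(train, [2.0, 2.0]))
```

The model's completion after the last "Output:" is parsed as the prediction; for the toy data above, a model that picked up the pattern would answer something close to 6.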
Not sure if this is a dbt alternative or just an abstraction layer 🤔
https://www.malloydata.dev/
DataFrames at Scale Comparison: TPC-H

https://docs.coiled.io/blog/tpch.html
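The comparison runs TPC-H queries on DataFrame engines. To show what such a query actually asks for, here is a sketch of a subset of TPC-H Query 1 (the pricing summary report) over a few in-memory rows in plain Python: filter `lineitem` by ship date, group by return flag and line status, and aggregate. It computes only three of Q1's aggregates, uses the usual 90-day delta, and the sample rows are made up; the engines being benchmarked do the same filter, group, and aggregate on millions of rows.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Dict, List, Tuple

# Q1 cutoff: DATE '1998-12-01' minus a 90-day delta.
CUTOFF = date(1998, 12, 1) - timedelta(days=90)


def tpch_q1_subset(lineitem: List[dict]) -> List[Tuple]:
    """Subset of TPC-H Q1: sum_qty, sum_disc_price and count_order per group."""
    groups: Dict[Tuple[str, str], dict] = defaultdict(
        lambda: {"sum_qty": 0.0, "sum_disc_price": 0.0, "count_order": 0}
    )
    for row in lineitem:
        if row["l_shipdate"] > CUTOFF:
            continue  # WHERE l_shipdate <= cutoff
        g = groups[(row["l_returnflag"], row["l_linestatus"])]
        g["sum_qty"] += row["l_quantity"]
        g["sum_disc_price"] += row["l_extendedprice"] * (1 - row["l_discount"])
        g["count_order"] += 1
    # ORDER BY l_returnflag, l_linestatus
    return [
        (flag, status, g["sum_qty"], g["sum_disc_price"], g["count_order"])
        for (flag, status), g in sorted(groups.items())
    ]


if __name__ == "__main__":
    # Made-up sample rows with only the columns this subset needs.
    rows = [
        dict(l_returnflag="A", l_linestatus="F", l_quantity=10, l_extendedprice=100.0,
             l_discount=0.5, l_shipdate=date(1998, 1, 1)),
        dict(l_returnflag="A", l_linestatus="F", l_quantity=5, l_extendedprice=40.0,
             l_discount=0.0, l_shipdate=date(1998, 2, 1)),
        dict(l_returnflag="N", l_linestatus="O", l_quantity=7, l_extendedprice=70.0,
             l_discount=0.0, l_shipdate=date(1998, 11, 30)),  # after cutoff: filtered out
    ]
    for result_row in tpch_q1_subset(rows):
        print(result_row)
```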
It looks like Databricks is up to something: first it was Iceberg, and now Hudi. It is not clear whether they are trying to converge on a single open table format or just making sure Delta Lake becomes the dominant one. In my opinion, Iceberg has more potential to become the dominant format.
Build Data Products and a Data Mesh with dbt Cloud, a tutorial from Snowflake. It is very similar to a project I am currently working on.