Arroyo is an open-source stream processing engine, enabling users to transform, filter, aggregate, and join their data streams in real-time with SQL queries. It's designed to be easy enough for any SQL user to build correct, reliable, and scalable streaming pipelines.
https://www.arroyo.dev/blog/why-arrow-and-datafusion

www.arroyo.dev

We built a new SQL Engine on Arrow and DataFusion

Arroyo 0.10 has an entirely new SQL engine built with Apache Arrow and DataFusion. It's much faster, smaller, and easier to run. Read on for why and how we're making this change.
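The description above mentions aggregating streams in real time. The plain-Python sketch below illustrates what a windowed streaming aggregation computes by implementing a tumbling-window sum. It is not Arroyo's API or implementation. The `(timestamp, key, value)` event shape and the `tumbling_sum` name are hypothetical, and the sketch assumes events arrive in event-time order; a real engine would use watermarks to handle late or out-of-order data.

```python
from collections import defaultdict
from collections.abc import Iterable, Iterator
from typing import Optional

# Hypothetical event shape for this sketch: (event_time_seconds, key, value).
Event = tuple[int, str, float]


def tumbling_sum(events: Iterable[Event], size: int) -> Iterator[tuple[int, str, float]]:
    """Yield (window_start, key, total) once each fixed-size window closes.

    Assumes events arrive in non-decreasing event-time order, so a window is
    closed as soon as an event for a later window appears. There are no
    watermarks and no late-data handling here.
    """
    current_start: Optional[int] = None
    totals: dict[str, float] = defaultdict(float)
    for ts, key, value in events:
        start = ts - ts % size  # round down to the window boundary
        if current_start is not None and start != current_start:
            # The previous window can no longer receive events: emit it and reset.
            for k in sorted(totals):
                yield current_start, k, totals[k]
            totals.clear()
        current_start = start
        totals[key] += value
    # End of input: flush the last open window, if any.
    if current_start is not None:
        for k in sorted(totals):
            yield current_start, k, totals[k]


if __name__ == "__main__":
    stream = [(0, "a", 1.0), (3, "b", 2.0), (9, "a", 4.0), (10, "a", 8.0), (14, "b", 1.0), (25, "a", 3.0)]
    for window_start, key, total in tumbling_sum(stream, size=10):
        print(f"[{window_start}, {window_start + 10}) {key}: {total}")
```

Emitting a window only when an event for a later window arrives is the simplest possible trigger. That is also why the in-order assumption matters here.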

733 views, edited 18:10

Data1984

Devin, the first AI software engineer 🤯

X (formerly Twitter)

Cognition (@cognition) on X

Today we're excited to introduce Devin, the first AI software engineer.

Devin is the new state-of-the-art on the SWE-Bench coding benchmark, has successfully passed practical engineering interviews from leading AI companies, and has even completed real jobs…

765 views, edited 18:28

Data1984

Interesting projects from Microsoft for building LLM applications:
- AICI: Prompts as (Wasm) Programs
- AutoGen: A programming framework for agentic AI
- Semantic Kernel: Integrate cutting-edge LLM technology quickly and easily into your apps

GitHub

GitHub - microsoft/aici: AICI: Prompts as (Wasm) Programs

890 views, edited 19:01

Data1984

TaskWeaver: A code-first agent framework for seamlessly planning and executing data analytics tasks.

GitHub

GitHub - microsoft/TaskWeaver: A code-first agent framework for seamlessly planning and executing data analytics tasks.

853 views, 08:22

Data1984

https://delta-io.github.io/delta-rs/integrations/delta-lake-datafusion/

817 views, 10:36

Data1984

The Past, Present and Future of Stream Processing - Kai Waehner
https://www.kai-waehner.de/blog/2024/03/20/the-past-present-and-future-of-stream-processing/

Kai Waehner

The Past, Present and Future of Stream Processing

Stream Processing Journey with IBM, Apama, TIBCO StreamBase, Kafka Streams, Apache Flink, Streaming Databases, GenAI and Apache Iceberg.

1.02K views, 11:07

Data1984

From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples

https://arxiv.org/html/2404.07544v1
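The idea in the title is to show the model a handful of (input, output) pairs as text and let it complete the output for a new input. Below is a minimal sketch of that prompt construction in Python. The `Feature i:` / `Output:` template, the two-decimal formatting, and the helper names `build_regression_prompt` and `parse_prediction` are illustrative assumptions, not necessarily the paper's exact format, and the call to an actual LLM is deliberately left out.

```python
import re
from collections.abc import Sequence


def build_regression_prompt(
    examples: Sequence[tuple[Sequence[float], float]],
    query: Sequence[float],
    decimals: int = 2,
) -> str:
    """Serialize (features, target) pairs as text, ending with an unanswered query."""
    blocks = []
    for features, target in examples:
        lines = [f"Feature {i}: {x:.{decimals}f}" for i, x in enumerate(features)]
        lines.append(f"Output: {target:.{decimals}f}")
        blocks.append("\n".join(lines))
    # The query uses the same template, but the output is left for the model.
    query_lines = [f"Feature {i}: {x:.{decimals}f}" for i, x in enumerate(query)]
    query_lines.append("Output:")
    blocks.append("\n".join(query_lines))
    return "\n\n".join(blocks)


def parse_prediction(completion: str) -> float:
    """Take the first number in the model's completion as the prediction."""
    match = re.search(r"-?\d+(?:\.\d+)?", completion)
    if match is None:
        raise ValueError(f"no number in completion: {completion!r}")
    return float(match.group())


if __name__ == "__main__":
    prompt = build_regression_prompt([([1.0, 2.0], 5.0), ([3.0, 1.0], 7.0)], query=[2.0, 2.0])
    print(prompt)
```

Whichever model is used, the idea is to send `prompt` as a completion request and pass the returned text to `parse_prediction`; treating the first number in the reply as the answer is a simplifying assumption.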

944 views, 06:06

Data1984

https://youtu.be/5VqM5nmcmPI?si=3OMZM-Cc0Oqsf8tT

YouTube

#200 50 Years of SQL | Don Chamberlin Computer Scientist and Co-Inventor of SQL

Over the past 199 episodes of DataFramed, we’ve heard from people at the forefront of data and AI, and over the past year we’ve constantly looked ahead to the future AI might bring. But all of the technologies and ways of working we’ve witnessed have been…

1.01K views, 17:27

Data1984

Not sure if this is a dbt alternative or just an abstraction layer 🤔
https://www.malloydata.dev/

www.malloydata.dev

A modern open source language for analyzing, transforming, and modeling data.

1.06K views, edited 17:04

Data1984

https://x.com/felixzumstein/status/1791054825179718003

989 views, 09:25

Data1984

DataFrames at Scale Comparison: TPC-H

https://docs.coiled.io/blog/tpch.html
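The linked comparison is based on the TPC-H benchmark queries. To show what such a query computes, here is a stdlib-only Python sketch of TPC-H Q1 (the pricing summary report): filter `lineitem` by ship date, group by return flag and line status, then compute sums, averages, and a count. The `LineItem` class and the sample rows are made up for illustration, only the columns Q1 reads are modeled, and `delta_days=90` assumes the validation value of the query's DELTA parameter (the spec lets it vary between 60 and 120).

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class LineItem:
    """Hypothetical row type: the subset of TPC-H `lineitem` columns that Q1 reads."""
    returnflag: str
    linestatus: str
    quantity: float
    extendedprice: float
    discount: float
    tax: float
    shipdate: date


def tpch_q1(rows: list[LineItem], delta_days: int = 90) -> list[dict]:
    """Q1 semantics: filter on ship date, group and order by (returnflag, linestatus)."""
    cutoff = date(1998, 12, 1) - timedelta(days=delta_days)
    groups: dict[tuple[str, str], list[LineItem]] = defaultdict(list)
    for r in rows:
        if r.shipdate <= cutoff:  # WHERE l_shipdate <= date '1998-12-01' - DELTA days
            groups[(r.returnflag, r.linestatus)].append(r)

    report = []
    for flag, status in sorted(groups):  # GROUP BY ... ORDER BY returnflag, linestatus
        items = groups[(flag, status)]
        n = len(items)
        disc_prices = [r.extendedprice * (1 - r.discount) for r in items]
        report.append({
            "l_returnflag": flag,
            "l_linestatus": status,
            "sum_qty": sum(r.quantity for r in items),
            "sum_base_price": sum(r.extendedprice for r in items),
            "sum_disc_price": sum(disc_prices),
            "sum_charge": sum(p * (1 + r.tax) for p, r in zip(disc_prices, items)),
            "avg_qty": sum(r.quantity for r in items) / n,
            "avg_price": sum(r.extendedprice for r in items) / n,
            "avg_disc": sum(r.discount for r in items) / n,
            "count_order": n,
        })
    return report


if __name__ == "__main__":
    rows = [
        LineItem("A", "F", 10.0, 100.0, 0.5, 0.0, date(1995, 1, 1)),
        LineItem("A", "F", 20.0, 200.0, 0.25, 0.5, date(1996, 1, 1)),
        LineItem("N", "O", 1.0, 10.0, 0.0, 0.0, date(1998, 9, 2)),    # exactly at the cutoff: kept
        LineItem("N", "O", 5.0, 50.0, 0.0, 0.0, date(1998, 11, 30)),  # after the cutoff: filtered out
    ]
    for row in tpch_q1(rows):
        print(row)
```

The benchmark's value lies in running this same logic over millions of rows, which is where the DataFrame engines in the comparison differ.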

920 views, edited 21:18

Data1984

https://github.com/opendatadiscovery/awesome-data-catalogs

GitHub

GitHub - opendatadiscovery/awesome-data-catalogs: 📙 Awesome Data Catalogs and Observability Platforms.

843 views, 18:14

Data1984

It looks like Databricks is up to something. First it was Iceberg and now Hudi. It is not clear whether they are trying to converge on a single open table format or just want to make sure Delta Lake becomes the one. In my opinion, Iceberg has more potential to become the one.

TechCrunch

Databricks acquires Tabular to build a common data lakehouse standard

Databricks has acquired Tabular, a data management startup, in its quest to build a common standard for data lakehouses.

1.06K views, 17:40

Data1984

Bufstream: Kafka at 10x lower cost - Buf
https://buf.build/blog/bufstream-kafka-lower-cost

buf.build

Bufstream: Kafka at 8x lower cost

We're excited to announce the public beta of Bufstream, a drop-in replacement for Apache Kafka deployed entirely in your own VPC that's 8x less expensive to operate.

1.03K views, 22:08

Data1984