[llm][usecase][text-to-sql]
https://medium.com/pinterest-engineering/how-we-built-text-to-sql-at-pinterest-30bad30dabff
Medium
How we built Text-to-SQL at Pinterest
Adam Obeng | Data Scientist, Data Platform Science; J.C. Zhong | Tech Lead, Analytics Platform; Charlie Gu | Sr. Manager, Engineering
👍1
[news][ai][hackathon]
Great projects out of the Mistral AI hackathon, which took place in Paris.
https://x.com/alexreibman/status/1796349663710511114?s=46&t=eNN3Y-GKeBSlFyyj1ozvgg
👍3
[distributed systems][kafka]
Kora: A Cloud-Native Event Streaming Platform For Kafka
https://www.vldb.org/pvldb/vol16/p3822-povzner.pdf
[memory]
What Every Programmer Should Know About Memory
This paper explains the structure of memory subsystems in use on modern commodity hardware, illustrating why CPU caches were developed, how they work, and what programs should do to achieve optimal performance by utilizing them.
https://people.freebsd.org/~lstewart/articles/cpumemory.pdf
🔥2
[learning][distributed systems]
A colleague shared a great way to study distributed systems: by building them.
https://fly.io/dist-sys/1/
Fly
Challenge #1: Echo
Documentation and guides from the team at Fly.io.
🔥3
[video]
How computers work
https://www.youtube.com/watch?v=HaBMAD-Dr8M&list=PLnAxReCloSeTJc8ZGogzjtCtXl_eE6yzA&index=2
YouTube
Logic gates - From transistors to logic gates NAND, AND, NOR, OR, NOT, XOR how computers work PART 1
Logic Gates - This video describes how the main logic gates are built starting from transistors in CMOS technology, which is used in most CPUs and RAM. We see the NAND, AND, OR, NOR, NOT, and XOR gates. At the end we see how to build a three-input AND gate and…
👍1🔥1
[distributed systems][paper]
Event-Based Programming without Inversion of Control
https://lampwww.epfl.ch/~odersky/papers/jmlc06.pdf
[asyncio][python]
https://www.roguelynn.com/words/asyncio-we-did-it-wrong/
roguelynn
asyncio: We Did It Wrong
"The concurrent Python programmer’s dream", the answer to everyone's asynchronous prayers. The `asyncio` module has various layers of abstraction allowing developers as much control as they need and are comfortable with. But it's easy to get lulled into a…
[paper][GC][state machine]
https://arxiv.org/html/2405.11182v1
In this paper, the authors quantify the overhead of running a state machine replication system for cloud systems written in a language with garbage collection (GC). To this end, they (1) design a canonical cloud system—a distributed, consensus-based, linearizable key-value store—from scratch, (2) implement it in C++, Java, Rust, and Go, and (3) evaluate the implementations under update-heavy and read-heavy workloads on AWS with different resource constraints, aiming to maximize throughput while maintaining low tail latency. The results show that GC incurs a non-trivial cost, even with ample memory. With limited memory, languages with manual memory management can achieve an order of magnitude higher throughput than those with GC on the same hardware. A key observation is that if a cloud system is expected to scale significantly, building it in a language with manual memory management, despite the higher development cost, may lead to substantial cloud cost savings in the long run.
🔥2
[ethz][computer architecture][lectures]
https://safari.ethz.ch/architecture/fall2022/doku.php?id=schedule
🔥1
[paper][ClickHouse]
This paper presents an overview of ClickHouse, a popular open-source OLAP database designed for high-performance analytics over petabyte-scale data sets with high ingestion rates. Its storage layer combines a data format based on traditional log-structured merge (LSM) trees with novel techniques for continuous transformation (e.g. aggregation, archiving) of historical data in the background. Queries are written in a convenient SQL dialect and processed by a state-of-the-art vectorized query execution engine with optional code compilation. ClickHouse makes aggressive use of pruning techniques to avoid evaluating irrelevant data in queries. Other data management systems can be integrated at the table function, table engine, or database engine level. Real-world benchmarks demonstrate that ClickHouse is amongst the fastest analytical databases on the market.
https://www.vldb.org/pvldb/vol17/p3731-schulze.pdf
👍2