Engineer Readings – Telegram
[paper][GC][state machine]
https://arxiv.org/html/2405.11182v1

In this paper, the authors quantify the overhead of running a state machine replication system for cloud systems written in a language with garbage collection (GC). To this end, they (1) design a canonical cloud system—a distributed, consensus-based, linearizable key-value store—from scratch, (2) implement it in C++, Java, Rust, and Go, and (3) evaluate the implementations under update-heavy and read-heavy workloads on AWS with different resource constraints, aiming to maximize throughput while maintaining low tail latency. The results show that GC incurs a non-trivial cost, even with ample memory. With limited memory, languages with manual memory management can achieve an order of magnitude higher throughput than those with GC on the same hardware. A key observation is that if a cloud system is expected to scale significantly, building it in a language with manual memory management, despite the higher development cost, may lead to substantial cloud cost savings in the long run.
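To make "state machine replication" concrete, here is a minimal sketch of the idea, not the paper's implementation. It assumes consensus has already produced an agreed, ordered log; the `Command`, `KVStateMachine`, and `replay` names are hypothetical. Each replica applies the same log in the same order and therefore ends in the same state.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Command:
    """One entry in the replicated log (hypothetical format)."""
    op: str                      # "put" or "delete"
    key: str
    value: Optional[str] = None  # only used by "put"


@dataclass
class KVStateMachine:
    """Deterministic key-value state machine fed by an ordered log."""
    data: Dict[str, str] = field(default_factory=dict)
    last_applied: int = 0        # index of the last log entry applied

    def apply(self, index: int, cmd: Command) -> None:
        # Entries must be applied exactly once and in log order.
        if index != self.last_applied + 1:
            raise ValueError(f"expected entry {self.last_applied + 1}, got {index}")
        if cmd.op == "put":
            if cmd.value is None:
                raise ValueError("put requires a value")
            self.data[cmd.key] = cmd.value
        elif cmd.op == "delete":
            self.data.pop(cmd.key, None)
        else:
            raise ValueError(f"unknown op: {cmd.op}")
        self.last_applied = index


def replay(log: List[Command]) -> KVStateMachine:
    """Build a replica's state by applying the agreed log from the start."""
    sm = KVStateMachine()
    for index, cmd in enumerate(log, start=1):
        sm.apply(index, cmd)
    return sm
```

Replaying the same log on two replicas should give identical `data`. The consensus protocol that agrees on the log is deliberately left out of this sketch.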
[paper][ClickHouse]

This paper presents an overview of ClickHouse, a popular open-source OLAP database designed for high-performance analytics over petabyte-scale data sets with high ingestion rates. Its storage layer combines a data format based on traditional log-structured merge (LSM) trees with novel techniques for continuous transformation (e.g. aggregation, archiving) of historical data in the background. Queries are written in a convenient SQL dialect and processed by a state-of-the-art vectorized query execution engine with optional code compilation. ClickHouse makes aggressive use of pruning techniques to avoid evaluating irrelevant data in queries. Other data management systems can be integrated at the table function, table engine, or database engine level. Real-world benchmarks demonstrate that ClickHouse is amongst the fastest analytical databases on the market.

https://www.vldb.org/pvldb/vol17/p3731-schulze.pdf
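
The "pruning" mentioned in the ClickHouse abstract can be illustrated with a toy min/max index. This is a simplified sketch of the general technique, not ClickHouse code. The `Granule` layout and the function names are assumptions made for illustration: each block of rows keeps its minimum and maximum value, and a range query skips any block whose range cannot overlap the filter.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Granule:
    """A block of rows plus min/max metadata for one column (hypothetical layout)."""
    min_value: int
    max_value: int
    rows: List[int]


def build_granules(values: List[int], granule_size: int) -> List[Granule]:
    """Split values into fixed-size blocks and record each block's min and max."""
    granules = []
    for start in range(0, len(values), granule_size):
        chunk = values[start:start + granule_size]
        granules.append(Granule(min(chunk), max(chunk), chunk))
    return granules


def scan_range(granules: List[Granule], low: int, high: int) -> Tuple[List[int], int]:
    """Return (matching values, number of granules read) for low <= v <= high."""
    matches: List[int] = []
    granules_read = 0
    for g in granules:
        # Prune: the block's [min, max] range does not overlap [low, high].
        if g.max_value < low or g.min_value > high:
            continue
        granules_read += 1
        matches.extend(v for v in g.rows if low <= v <= high)
    return matches, granules_read
```

Pruning of this kind pays off most when the data is sorted or clustered on the filtered column, because then most blocks fall entirely outside the requested range.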
[cicd][uber]

https://www.uber.com/en-NL/blog/continuous-deployment/

“Uber’s business runs on a myriad of microservices. Ensuring that changes to all of these services are deployed safely and in a timely manner is critical. By utilizing continuous deployment to automate this process, we ensure that new features, library updates, and security patches are all delivered to production without unnecessary delays, improving the overall quality of code serving our business.
In this article, we share how we reimagined continuous deployment of microservices at Uber to improve our deployment automation and the user experience of managing microservices, while tackling some of the peculiar challenges of working with large monorepos with increasing commit volumes.”
[ai][moshi]

Moshi is made of three main components: Helium, a 7B language model trained on 2.1T tokens; Mimi, a neural audio codec that models semantic and acoustic information; and a new multi-stream architecture that jointly models audio from the user and Moshi on separate channels.

https://kyutai.org/Moshi.pdf

https://github.com/kyutai-labs/moshi

https://huggingface.co/kmhf
[book][MIT]
Mathematics for Computer Science

https://courses.csail.mit.edu/6.042/spring18/mcs.pdf
[competitive programming][book]
When we are on the open market looking for a job, we aim to be good enough to pass the gates. But what if we aim for perfection instead?

https://cses.fi/book/book.pdf
[data structures][paper]

Cache-Oblivious Algorithms and Data Structures

https://erikdemaine.org/papers/BRICS2002/paper.pdf
[genAI][clone your c-lvl]
The promise of human behavioral simulation (general-purpose computational agents that replicate human behavior across domains) could enable broad applications in policymaking and social science. We present a novel agent architecture that simulates the attitudes and behaviors of 1,052 real individuals, applying large language models to qualitative interviews about their lives, then measuring how well these agents replicate the attitudes and behaviors of the individuals that they represent. The generative agents replicate participants' responses on the General Social Survey 85% as accurately as participants replicate their own answers two weeks later, and perform comparably in predicting personality traits and outcomes in experimental replications.
https://arxiv.org/pdf/2411.10109
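
One way to read "85% as accurately as participants replicate their own answers" is as a ratio: the agent's agreement with a participant, divided by that participant's agreement with themselves two weeks later. The sketch below assumes that reading. The function names and the sample answers are hypothetical, not data from the paper.

```python
from typing import Sequence


def agreement(a: Sequence[str], b: Sequence[str]) -> float:
    """Fraction of questions on which two answer lists match."""
    if len(a) != len(b) or not a:
        raise ValueError("answer lists must be non-empty and the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def normalized_accuracy(agent: Sequence[str],
                        first_wave: Sequence[str],
                        second_wave: Sequence[str]) -> float:
    """Agent accuracy relative to the participant's own test-retest consistency."""
    self_consistency = agreement(first_wave, second_wave)
    if self_consistency == 0:
        raise ValueError("participant never repeated an answer; ratio undefined")
    return agreement(agent, first_wave) / self_consistency


# Hypothetical answers to five survey questions (illustration only).
first_wave = ["agree", "disagree", "agree", "neutral", "agree"]
second_wave = ["agree", "disagree", "agree", "agree", "agree"]   # 4 of 5 repeated
agent = ["agree", "disagree", "neutral", "agree", "agree"]       # 3 of 5 match
score = normalized_accuracy(agent, first_wave, second_wave)      # 0.6 / 0.8
```

Dividing by self-consistency sets the ceiling at what a person can reproduce about themselves, rather than at a perfect score that no participant reaches.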