Engineer Readings – Telegram
[article][distributed]

We present a formal, machine checked TLA+ safety proof of MongoRaftReconfig, a distributed dynamic reconfiguration protocol. MongoRaftReconfig was designed for and imple- mented in MongoDB, a distributed database whose replica- tion protocol is derived from the Raft consensus algorithm. We present an inductive invariant for MongoRaftReconfig that is formalized in TLA+ and formally proved using the TLA+ proof system (TLAPS). We also present a formal TLAPS proof of two key safety properties of MongoRaftReconfig, Leader- Completeness and StateMachineSafety. To our knowledge, these are the first machine checked inductive invariant and safety proof of a dynamic reconfiguration protocol for a Raft based replication system.

https://will62794.github.io/assets/papers/cpp22-formal-verification-reconfig.pdf
[article]

Many organizations are shifting to a data management paradigm called the “Lakehouse,” which implements the functionality of struc- tured data warehouses on top of unstructured data lakes. This presents new challenges for query execution engines. The execu- tion engine needs to provide good performance on the raw un- curated datasets that are ubiquitous in data lakes, and excellent performance on structured data stored in popular columnar file formats like Apache Parquet. Toward these goals, we present Pho- ton, a vectorized query engine for Lakehouse environments that we developed at Databricks. Photon can outperform existing cloud data warehouses in SQL workloads, but implements a more general exe- cution framework that enables efficient processing of raw data and also enables Photon to support the Apache Spark API. We discuss the design choices we made in Photon (e.g., vectorization vs. code generation) and describe its integration with our existing SQL and Apache Spark runtimes, its task model, and its memory manager. Photon has accelerated some customer workloads by over 10× and has recently allowed Databricks to set a new audited performance record for the official 100TB TPC-DS benchmark.

https://www-cs.stanford.edu/~matei/papers/2022/sigmod_photon.pdf