DevOps&SRE Library – Telegram
A library of articles on DevOps and SRE.

Advertising: @ostinostin
Content: @mxssl

RKN (Roskomnadzor): https://www.gosuslugi.ru/snet/67704b536aa9672b963777b3
Dear friend, you have built a Kubernetes

I am afraid to inform you that you have built a Kubernetes. I know you wanted to "choose boring tech" to just run some containers. You said that "Kubernetes is overkill" and "it's just way too complex for a simple task" and yet, six months later, you have a pile of shell scripts that do not work, breaking every time there's a slight shift in the winds of production.


https://www.macchaffee.com/blog/2024/you-have-built-a-kubernetes
Choosing the right Postgres indexes

Indexes can make a world of difference to performance in Postgres, but it’s not always obvious when you’ve written a query that could do with an index. Here we’ll cover:

- What indexes are
- Some use cases for when they’re helpful
- Rules of thumb for figuring out which sort of index to add
- How to identify when you’re missing an index
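Postgres's default index type is a B-tree. The core idea (keep keys sorted so lookups use binary search instead of reading every row) can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how Postgres stores indexes: the "table" is a hypothetical in-memory list of dicts, row IDs are list positions, and `SortedIndex` is a hypothetical sorted list standing in for a real B-tree.

```python
import bisect
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def seq_scan(rows: List[Row], column: str, value: Any) -> List[int]:
    """Without an index: inspect every row, O(n) per query."""
    return [row_id for row_id, row in enumerate(rows) if row[column] == value]


class SortedIndex:
    """Hypothetical stand-in for a B-tree: (key, row_id) pairs kept in key order."""

    def __init__(self, rows: List[Row], column: str) -> None:
        self.entries: List[Tuple[Any, int]] = sorted(
            (row[column], row_id) for row_id, row in enumerate(rows)
        )
        self.keys = [key for key, _ in self.entries]

    def lookup(self, value: Any) -> List[int]:
        """Equality lookup: binary search finds the matching run in O(log n)."""
        lo = bisect.bisect_left(self.keys, value)
        hi = bisect.bisect_right(self.keys, value)
        return [row_id for _, row_id in self.entries[lo:hi]]

    def scan_range(self, low: Any, high: Any) -> List[int]:
        """Range lookup for low <= key < high; sorted order makes this cheap too."""
        lo = bisect.bisect_left(self.keys, low)
        hi = bisect.bisect_left(self.keys, high)
        return [row_id for _, row_id in self.entries[lo:hi]]


if __name__ == "__main__":
    rows = [{"id": i, "status": "open" if i % 3 == 0 else "done"} for i in range(10)]
    idx = SortedIndex(rows, "status")
    print(idx.lookup("open"))  # [0, 3, 6, 9]
```

The trade-off mirrors a real index: the sorted structure must be maintained on every write, in exchange for cheap equality and range reads on that column.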


https://incident.io/blog/choosing-the-right-postgres-indexes
BemiDB

BemiDB is a Postgres read replica optimized for analytics. It consists of a single binary that seamlessly connects to a Postgres database, replicates the data in a compressed columnar format, and allows you to run complex queries using its Postgres-compatible analytical query engine.
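To show why a compressed columnar layout suits analytics, here is a generic Python sketch. It is not BemiDB's actual storage format (the blurb does not specify one). It pivots hypothetical row records into per-column lists, applies run-length encoding (one of several compression schemes columnar formats commonly use), and answers an aggregate query by reading only the two columns involved.

```python
from itertools import groupby
from typing import Any, Dict, List, Tuple


def to_columns(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot row-oriented records into one list per column (assumes uniform keys)."""
    columns: Dict[str, List[Any]] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def rle_encode(values: List[Any]) -> List[Tuple[Any, int]]:
    """Run-length encode: consecutive repeats collapse into (value, run_length)."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(runs: List[Tuple[Any, int]]) -> List[Any]:
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in runs for _ in range(count)]


def sum_where(columns: Dict[str, List[Any]], filter_col: str, match: Any, sum_col: str) -> Any:
    """Analytical query that touches only the two columns it needs, not whole rows."""
    return sum(v for k, v in zip(columns[filter_col], columns[sum_col]) if k == match)


if __name__ == "__main__":
    rows = [
        {"region": "eu", "amount": 10},
        {"region": "eu", "amount": 5},
        {"region": "us", "amount": 7},
        {"region": "us", "amount": 1},
        {"region": "eu", "amount": 2},
    ]
    cols = to_columns(rows)
    print(rle_encode(cols["region"]))  # [('eu', 2), ('us', 2), ('eu', 1)]
    print(sum_where(cols, "region", "eu", "amount"))  # 17
```

Low-cardinality columns such as `region` compress well because values repeat; production engines combine this with dictionary encoding, vectorized execution, and more.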


https://github.com/BemiHQ/BemiDB
65,000 nodes and counting: Google Kubernetes Engine is ready for trillion-parameter AI models

As generative AI evolves, we're beginning to see the transformative potential it is having across industries and our lives. And as large language models (LLMs) increase in size — current models are reaching hundreds of billions of parameters, and the most advanced ones are approaching 2 trillion — the need for computational power will only intensify. In fact, training these large models on modern accelerators already requires clusters that exceed 10,000 nodes.

With support for 15,000-node clusters — the world’s largest — Google Kubernetes Engine (GKE) has the capacity to handle these demanding training workloads. Today, in anticipation of even larger models, we are introducing support for 65,000-node clusters.

With support for up to 65,000 nodes, we believe GKE offers more than 10X larger scale than the other two largest public cloud providers.


https://cloud.google.com/blog/products/containers-kubernetes/gke-65k-nodes-and-counting
netavark

Netavark is a Rust-based network stack for containers.


https://github.com/containers/netavark
mise

mise is a polyglot tool version manager. It replaces tools like asdf, nvm, pyenv, rbenv, etc.

mise allows you to switch sets of env vars in different project directories. It can replace direnv.

mise is a task runner that can replace make or npm scripts.
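On the per-directory env vars point: the general idea behind directory-scoped environments (as in direnv-style tools) is to collect settings from the current directory and its ancestors, letting the nearest directory win. The Python sketch below is a hypothetical illustration of that idea only, not mise's implementation or config format; `configs` is a hypothetical in-memory mapping standing in for per-directory config files.

```python
from pathlib import PurePosixPath
from typing import Dict, Mapping


def resolve_env(cwd: str, configs: Mapping[str, Dict[str, str]]) -> Dict[str, str]:
    """Merge env vars from every ancestor directory's config; the nearest directory wins."""
    path = PurePosixPath(cwd)
    chain = [path, *path.parents]  # cwd first, filesystem root last
    env: Dict[str, str] = {}
    for directory in reversed(chain):  # apply root first so nearer dirs override
        env.update(configs.get(str(directory), {}))
    return env


if __name__ == "__main__":
    # Hypothetical per-directory settings
    configs = {
        "/home/me": {"EDITOR": "vim", "NODE_ENV": "development"},
        "/home/me/api": {"NODE_ENV": "production", "PORT": "8080"},
    }
    print(resolve_env("/home/me/api/src", configs))
    print(resolve_env("/home/me/web", configs))
```

Changing directory from `/home/me/web` to `/home/me/api/src` switches `NODE_ENV` and adds `PORT` without any manual `export`, which is the workflow such tools automate.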


https://github.com/jdx/mise
Migrating billions of records: moving our active DNS database while it’s in use

According to a survey done by W3Techs, as of October 2024, Cloudflare is used as an authoritative DNS provider by 14.5% of all websites. As an authoritative DNS provider, we are responsible for managing and serving all the DNS records for our clients’ domains. This means we have an enormous responsibility to provide the best service possible, starting at the data plane. As such, we are constantly investing in our infrastructure to ensure the reliability and performance of our systems.


https://blog.cloudflare.com/migrating-billions-of-records-moving-our-active-dns-database-while-in-use
Against Incident Severities and in Favor of Incident Types

About a year ago, Honeycomb kicked off an internal experiment to structure how we do incident response. We looked at the usual severity-based approach (usually using a SEV scale), but decided to adopt a type-based approach instead, aiming to give multiple departments quick, shared definitions they can all work from. This post is a short report on our experience doing it.


https://www.honeycomb.io/blog/against-incident-severities-favor-incident-types
How to Build Smaller Container Images: Docker Multi-Stage Builds

https://labs.iximiuz.com/tutorials/docker-multi-stage-builds
slackdump

Save or export your private and public Slack messages, threads, files, and users locally without admin privileges.


https://github.com/rusq/slackdump
automatisch

The open source Zapier alternative. Build workflow automation without spending time and money.


https://github.com/automatisch/automatisch
pglite-fusion

Embed an SQLite database in your PostgreSQL table. AKA multitenancy has been solved.


https://github.com/frectonz/pglite-fusion
There’s No Such Thing as a Free Lunch!

How Slack trains engineers in incident response by ordering lunch together.


https://slack.engineering/theres-no-such-thing-as-a-free-lunch
lla

lla is a high-performance, extensible alternative to the traditional ls command, written in Rust. It offers enhanced functionality, customizable output, and a plugin system for extended capabilities.


https://github.com/triyanox/lla
wesql

WeSQL is an innovative MySQL distribution that adopts a compute-storage separation architecture, with storage backed by S3 (and S3-compatible systems). It can run on any cloud, ensuring no vendor lock-in.

WeSQL has completely replaced MySQL’s traditional disk storage with S3. All MySQL data—binlogs, schemas, storage engine metadata, WAL, and data files—are entirely (not partially!) stored as objects in S3. The 11 nines of durability provided by S3 significantly enhances data reliability. Additionally, WeSQL can start from a clean, empty instance, connect to S3, load the data, and begin serving immediately with no additional setup required.
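For a sense of scale, "11 nines" means a designed annual durability of 99.999999999% per object. A rough back-of-envelope estimate, assuming object losses are independent and treating the design target as the actual loss rate (both simplifications), gives the expected number of objects lost per year when storing N objects:

```latex
\mathbb{E}[\text{objects lost per year}] = N \cdot (1 - d),
\qquad d = 0.99999999999 = 1 - 10^{-11}

N = 10^{7}: \quad 10^{7} \cdot 10^{-11} = 10^{-4}\ \text{objects/year}
\;\Rightarrow\; \text{about one object every } 10^{4}\ \text{years}
```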

It is ideal for users who need an easy-to-manage, cost-effective, and developer-friendly MySQL database solution, especially for those needing support for both Serverless and BYOC (Bring Your Own Cloud).


https://github.com/wesql/wesql
10 Essential AWS Security Steps for Your AWS Account

After spending years helping teams set up their AWS infrastructure, I've noticed something interesting: many of us face the same security challenges when starting out. You know what I mean if you've ever wondered "Wait, is my S3 bucket actually secure?" or "Should I really be using the root account for this?" (Spoiler: probably not!)

The good news? I've put together this guide to help you build a rock-solid AWS security foundation from day one. We'll cover 10 essential security measures that I've seen make a real difference in protecting AWS environments. While absolute security is a journey rather than a destination, implementing these steps will put you way ahead of the game in defending against common attack vectors.

I've also created a Terraform project that you can use as a baseline for securing your AWS account!

The best part? It's all under the AWS free tier! 😉

Essentially, I got tired of reading the same posts about people (or organizations) getting their accounts hacked, so here's my solution for that!


https://cloudnature.net/blog/10-essential-aws-security-steps-for-your-aws-account
terrateam

Terrateam is an open-source GitOps CI/CD platform for automating infrastructure workflows. It integrates with GitHub to orchestrate Terraform, OpenTofu, CDKTF, and Terragrunt operations via pull requests. Use our hosted service or run on-premise.


https://github.com/terrateamio/terrateam
Using Sealed Secrets with Your Kubernetes Applications

This blog post walks you through working with Sealed Secrets by Bitnami.


https://devoriales.com/post/351/using-sealed-secrets-with-your-kubernetes-applications