DevOps&SRE Library – Telegram
A library of articles on DevOps and SRE.

Advertising: @ostinostin
Content: @mxssl

RKN (Roskomnadzor): https://www.gosuslugi.ru/snet/67704b536aa9672b963777b3
Percentile

What is it? Why is it used? And why is it important in the context of optimization and reliability engineering? Bonus: a browser app that lets you play with data.
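
As a rough illustration of why percentiles matter for reliability work, here is a minimal sketch that compares the mean with a few percentiles of a latency sample. It uses the nearest-rank method, which is only one of several percentile definitions; the function name and the sample data are made up for this example and are not taken from the linked article.

```python
import math


def percentile_nearest_rank(values, p):
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(values)
    # 1-based rank of the smallest value that covers p percent of the sample.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical request latencies in milliseconds with a single slow outlier.
latencies_ms = [12, 15, 11, 14, 13, 16, 12, 15, 14, 950]

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50 = percentile_nearest_rank(latencies_ms, 50)
p90 = percentile_nearest_rank(latencies_ms, 90)
p99 = percentile_nearest_rank(latencies_ms, 99)

print(f"mean={mean_ms} p50={p50} p90={p90} p99={p99}")
```

In this sample the single outlier drags the mean far above what most requests experience, while the median stays close to the typical latency and the high percentile exposes the tail.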


https://blog.alexewerlof.com/p/percentile
Enhancing Netflix Reliability with Service-Level Prioritized Load Shedding

Applying Quality of Service techniques at the application level
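
The general idea of prioritized load shedding can be sketched as follows: each request carries a priority, and lower priorities are rejected first as utilization climbs. This is a minimal sketch under stated assumptions, not the implementation described in the linked post; the priority names, the thresholds, and the `admit` helper are all hypothetical.

```python
from enum import IntEnum


class Priority(IntEnum):
    """Hypothetical request classes, most important first."""
    CRITICAL = 0
    DEGRADED = 1
    BEST_EFFORT = 2
    BULK = 3


# Hypothetical utilization levels (0.0 to 1.0) at which each class starts
# being shed. Less important classes are shed at lower utilization.
SHED_AT = {
    Priority.BULK: 0.60,
    Priority.BEST_EFFORT: 0.70,
    Priority.DEGRADED: 0.80,
    Priority.CRITICAL: 0.95,
}


def should_shed(priority, utilization):
    """Return True if a request of this priority should be rejected."""
    return utilization >= SHED_AT[priority]


def admit(requests, utilization):
    """Keep only the (name, priority) pairs that survive shedding."""
    return [(name, prio) for name, prio in requests
            if not should_shed(prio, utilization)]


incoming = [
    ("playback-start", Priority.CRITICAL),
    ("prefetch", Priority.BULK),
    ("recommendations", Priority.BEST_EFFORT),
]
print(admit(incoming, utilization=0.65))
```

At 65% utilization only the bulk request is dropped; at 75% the best-effort request goes as well, and the critical request is kept until the system is close to saturation.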


https://netflixtechblog.com/enhancing-netflix-reliability-with-service-level-prioritized-load-shedding-e735e6ce8f7d
A write-ahead log is not a universal part of durability

A database does not need a write-ahead log (WAL) to achieve durability. A database can write its long-term data structure durably to disk before returning to a client. Granted, this is a bad idea! And granted, a WAL is critical for durability by design in most databases. But I think it's helpful to understand WALs by understanding what you could do without them.
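
To make the "no WAL" option concrete, here is a minimal sketch of a toy key-value store that rewrites its whole on-disk structure durably before returning to the caller. It assumes a POSIX filesystem and a JSON file as the "long-term data structure"; the `durable_put` and `durable_get` names are invented for this example and do not come from the linked article.

```python
import json
import os
import tempfile


def durable_put(path, key, value):
    """Persist key=value by rewriting the whole file; return only once it is on disk."""
    data = {}
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as f:
            data = json.load(f)
    data[key] = value

    directory = os.path.dirname(os.path.abspath(path))
    # Write the new state to a temporary file in the same directory so the
    # final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())  # force the file contents to stable storage
        os.replace(tmp_path, path)  # atomically swap in the new state
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise

    # Sync the directory so the rename itself survives a crash (POSIX assumption).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


def durable_get(path, key):
    """Read a key back from the on-disk state."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)[key]
```

Every write rewrites and syncs the entire file, which is why this approach scales poorly and why most databases append to a log instead.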


https://notes.eatonphil.com/2024-07-01-a-write-ahead-log-is-not-a-universal-part-of-durability.html
ConfigMap Conundrum: Subtleties of Dynamic Updates in Kubernetes Configurations

Know the differences between ConfigMaps mounted as Volumes and ConfigMaps defined as environment variables.
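
From the application's side, the difference can be sketched like this: an environment variable is fixed when the process starts, while a file from a mounted volume can be re-read to pick up new content. This is a minimal sketch, assuming the application reads its settings on demand; the function names and any paths passed to them are hypothetical.

```python
import os
from pathlib import Path


def setting_from_env(name, default=""):
    # Environment variables are copied into the process at start-up, so a
    # later ConfigMap edit is not visible here until the container restarts.
    return os.environ.get(name, default)


def setting_from_volume(path):
    # Re-reading the file on every call returns whatever content is on disk
    # right now, including a refreshed ConfigMap projection.
    return Path(path).read_text(encoding="utf-8").strip()
```

An application that wants to benefit from volume updates has to re-read or watch the file; reading it once at start-up behaves no differently from an environment variable.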


https://blog.adityasamant.dev/configmap-conundrum-subtleties-of-dynamic-updates-in-kubernetes-configurations
Argo Events — Event Bus and Webhook

Argo Events is a Kubernetes-based event automation engine. It is part of the Argo project. Argo Events can be used with or independently of other Argo projects.

I will be writing a series of articles on Argo Events; in these articles I will look at how we can use Argo Events to automate processes inside and outside a Kubernetes cluster.

In this first article of the series, we will examine Argo Events' core concepts, installation, and provisioning of the different event buses that Argo Events uses to forward events to their sinks. Finally, we will set up a webhook event flow to verify our setup.


https://medium.chuklee.com/argo-events-event-bus-and-webhook-ac34e5714209
Users, Groups, Roles and API Access in Kubernetes

The nuances of how users and groups are configured in Kubernetes and how the role-based access control (RBAC) mechanism applies for them.


https://blog.adityasamant.dev/users-groups-roles-and-api-access-in-kubernetes
Kubernetes Events — News feed of your cluster

Understand Kubernetes Events and learn to use kubectl events to monitor and troubleshoot your cluster’s issues effectively.


https://decisivedevops.com/kubernetes-events-news-feed-of-your-kubernetes-cluster-826e08892d7a
GPU Virtualization in K8s: Challenges and State of the Art

Kubernetes schedules GPU workloads by assigning a whole device to a single job exclusively. This one-to-one relationship leads to massive GPU underutilization, especially for interactive jobs, characterized by significant idle periods and infrequent bursts of heavy GPU usage. Current solutions enable GPU sharing by statically assigning a fixed slice of GPU memory to each co-located job. These solutions are not suitable for interactive scenarios since the number of co-located jobs is limited by the size of physical GPU memory. Consequently, users must know the GPU memory demand of their jobs before submitting them for execution, which is impractical.


https://www.arrikto.com/blog/gpu-virtualization-in-k8s-challenges-and-state-of-the-art
gitops-bridge

The GitOps Bridge is a community project that aims to showcase best practices and patterns for bridging the process of creating a Kubernetes cluster to subsequently managing everything through GitOps. It focuses on using ArgoCD or FluxCD, both of which are CNCF-graduated projects.


https://github.com/gitops-bridge-dev/gitops-bridge
Ephemeral Values in Terraform

Since long before I worked at HashiCorp, I've been interested in the problems with how Terraform interacts with security-sensitive information like passwords and private keys.


https://log.martinatkins.me/2024/05/22/terraform-ephemeral-values
Solving large logs with ClickHouse

Embrace engineers share a few key learnings from supporting larger log sizes, including working around a current limitation in ClickHouse and testing several skip indices to optimize query performance and storage cost.


https://embrace.io/blog/solving-large-logs-with-clickhouse
Inside EKS Networking: Decoding the Service IP Journey

https://dev.to/chen/inside-eks-networking-decoding-the-service-ip-journey-4k1
omni

Omni manages Kubernetes on bare metal, virtual machines, or in a cloud. Built on Talos Linux by the folks at Sidero.

Boot from an Omni image. Click to allocate to a cluster. That’s it!

- Vanilla Kubernetes, on your machines, under your control.
- Elegant UI for management and operations
- Security taken care of—ties into your Enterprise ID provider
- Highly Available Kubernetes API endpoint built in
- Firewall friendly—manage edge nodes securely
- From single-node clusters to the largest scale
- Support for GPUs and most CSIs


https://github.com/siderolabs/omni
nginx-gateway-fabric

NGINX Gateway Fabric is an open-source project that provides an implementation of the Gateway API using NGINX as the data plane.


https://github.com/nginxinc/nginx-gateway-fabric