DevOps&SRE Library – Telegram
A library of articles on DevOps and SRE.

Advertising: @ostinostin
Content: @mxssl

RKN (Roskomnadzor) registration: https://www.gosuslugi.ru/snet/67704b536aa9672b963777b3
Why we migrated from Terraspace to Terramate: A technical journey

This is not a paid partnership but only a summary of our experience after a year of usage with Terramate.


https://medium.com/alan/why-we-migrated-from-terraspace-to-terramate-a-technical-journey-91a6d667f6ec
From Keycloak to Cognito: Building a Self-Hosted Terraform Registry on AWS

https://cdn.infrahouse.com/blog/2025-10-26-building-terraform-aws-registry
spacelift-intent

Spacelift Intent is an MCP Server that lets you define cloud resources in natural language and have them provisioned by directly calling provider APIs - no OpenTofu or Terraform code required.


https://github.com/spacelift-io/spacelift-intent
lefthook

Fast and powerful Git hooks manager for any type of projects.


https://github.com/evilmartians/lefthook
ShipShipShip

A modern, self-hostable changelog and roadmap platform that helps you share product updates with your community and gather feedback through feature voting.


https://github.com/GauthierNelkinsky/ShipShipShip
hl

High-performance log viewer and processor that transforms logs in JSON and logfmt formats into a human-readable output. Built with efficiency in mind, it enables quick parsing and analysis of large log files with minimal overhead.


https://github.com/pamburus/hl
headlamp

Headlamp is an easy-to-use and extensible Kubernetes web UI.

Headlamp was created to blend the traditional feature set of other web UIs/dashboards (i.e., to list and view resources) with added functionality.


https://github.com/kubernetes-sigs/headlamp
seaweedfs-operator

This Kubernetes Operator is made to easily deploy SeaweedFS onto your Kubernetes cluster.

The operator manages the complete SeaweedFS infrastructure on Kubernetes, including Master servers, Volume servers, Filer services, and IAM (Identity and Access Management) services. This provides a scalable, resilient distributed file system with S3-compatible API and built-in authentication.


https://github.com/seaweedfs/seaweedfs-operator
Kubernetes CPU Limits: Scylla and Charybdis

Kubernetes limits — especially CPU limits — are often a source of confusion. Some argue you should always use them, while others insist you should never use them. In this post, I’ll explain why the reality is simply a tradeoff between resource utilization and performance predictability.


https://medium.com/@vladimir.prus/kubernetes-cpu-limits-scylla-and-charybdis-6a9aa3a8c6ca
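
The tradeoff described in this entry can be made concrete with a little arithmetic. The sketch below is not taken from the linked post; it assumes the usual Linux CFS bandwidth model (a quota of CPU time per 100 ms period) and a simplified workload where every thread is always runnable on its own core. The function names are hypothetical.

```python
# Simplified model of CPU-limit throttling under the Linux CFS bandwidth
# controller. Assumptions (not from the linked post): a 100 ms period and
# threads that are always runnable, each on its own core.

DEFAULT_PERIOD_US = 100_000  # commonly used default CFS period, in microseconds


def cfs_quota_us(cpu_limit_cores: float, period_us: int = DEFAULT_PERIOD_US) -> int:
    """CPU time the container may consume per period, in microseconds."""
    return int(round(cpu_limit_cores * period_us))


def throttled_us_per_period(
    cpu_limit_cores: float,
    busy_threads: int,
    period_us: int = DEFAULT_PERIOD_US,
) -> int:
    """Wall-clock time per period during which the container is throttled."""
    quota = cfs_quota_us(cpu_limit_cores, period_us)
    demand = busy_threads * period_us  # CPU time wanted in one period
    if demand <= quota:
        return 0  # the limit is never hit
    # All threads burn quota in parallel, so it runs out after quota / threads.
    wall_time_running = quota / busy_threads
    return int(round(period_us - wall_time_running))


if __name__ == "__main__":
    # A limit of 1 CPU with 4 busy threads: quota is gone after 25 ms.
    print(throttled_us_per_period(1.0, 4))
    # A limit of 4 CPUs with 4 busy threads: no throttling.
    print(throttled_us_per_period(4.0, 4))
```

Under these assumptions a tighter limit buys predictable usage on the node at the cost of stalls inside each period, which is the utilization versus predictability tradeoff the blurb refers to.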
Kubernetes v1.34: Finer-Grained Control Over Container Restarts

With the release of Kubernetes 1.34, a new alpha feature is introduced that gives you more granular control over container restarts within a Pod. This feature, named Container Restart Policy and Rules, allows you to specify a restart policy for each container individually, overriding the Pod's global restart policy. In addition, it also allows you to conditionally restart individual containers based on their exit codes. This feature is available behind the alpha feature gate ContainerRestartRules.

This has been a long-requested feature. Let's dive into how it works and how you can use it.


https://kubernetes.io/blog/2025/08/29/kubernetes-v1-34-per-container-restart-policy/
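
As a rough illustration of the behaviour described in this entry, the sketch below models how a per-container restart decision could be evaluated: exit-code rules are checked first, then the container's own policy, then the Pod's. This is a simplified model and not the kubelet's implementation. The field names (`restartPolicy`, `restartPolicyRules`, `action`, `exitCodes`, `operator`, `values`) follow the alpha API as presented in the linked post; the container name, exit code 42, and the helper functions are hypothetical.

```python
# Simplified model of per-container restart decisions (Kubernetes 1.34 alpha,
# feature gate ContainerRestartRules). Not the kubelet's actual code.


def rule_matches(rule: dict, exit_code: int) -> bool:
    """Check whether an exit code satisfies a rule's exitCodes requirement."""
    codes = rule["exitCodes"]
    if codes["operator"] == "In":
        return exit_code in codes["values"]
    if codes["operator"] == "NotIn":
        return exit_code not in codes["values"]
    raise ValueError(f"unsupported operator: {codes['operator']}")


def should_restart(exit_code: int, container: dict, pod_restart_policy: str) -> bool:
    """Decide whether an exited container is restarted.

    Order assumed here: first matching rule, then the container-level
    restartPolicy, then the Pod-level restartPolicy.
    """
    for rule in container.get("restartPolicyRules", []):
        if rule_matches(rule, exit_code):
            return rule["action"] == "Restart"
    policy = container.get("restartPolicy", pod_restart_policy)
    if policy == "Always":
        return True
    if policy == "OnFailure":
        return exit_code != 0
    return False  # "Never"


# Hypothetical container: never restart, except on exit code 42.
container = {
    "name": "worker",
    "restartPolicy": "Never",
    "restartPolicyRules": [
        {"action": "Restart", "exitCodes": {"operator": "In", "values": [42]}},
    ],
}

if __name__ == "__main__":
    print(should_restart(42, container, "Never"))  # rule matches
    print(should_restart(1, container, "Never"))   # falls back to "Never"
```

The dictionary mirrors the shape of a container entry in a Pod spec, so the same structure could be dumped to YAML for a cluster that has the feature gate enabled.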
Understanding the True Cost of a Kubernetes Workload

Trace individual microservice costs by combining Kubernetes metrics, APM, and CUR for granular spending insights


https://medium.com/life-at-telkomsel/understanding-the-true-cost-of-a-kubernetes-workload-3a81e2b9529b
Battle for Resources or the SSA Path to Kubernetes Diplomacy

https://hackernoon.com/battle-for-resources-or-the-ssa-path-to-kubernetes-diplomacy
Monitoring Kubernetes Cluster with Prometheus and Grafana using ArgoCD

https://jackjapar.com/monitoring-kubernetes-cluster-with-prometheus-and-grafana-using-argocd
Cluster API + Talos + Proxmox = ❤️

https://a-cup-of.coffee/blog/talos-capi-proxmox
webdav

A simple and standalone WebDAV server.


https://github.com/hacdias/webdav
Failure is inevitable: Learning from a large outage, and building for reliability in depth at Datadog

https://www.datadoghq.com/blog/engineering/rethinking-reliability
Why we're leaving serverless

Every millisecond matters when you're in the critical path of API authentication. After two years of fighting serverless limitations, we rebuilt our entire API stack and slashed the end-to-end latency.


https://www.unkey.com/blog/serverless-exit
Advancing Our Chef Infrastructure: Safety Without Disruption

Building a safer, more reliable path forward for Chef at Slack


https://slack.engineering/advancing-our-chef-infrastructure-safety-without-disruption