DevOps & SRE notes – Telegram
Helpful articles and tools for DevOps & SRE

WhatsApp: https://whatsapp.com/channel/0029Vb79nmmHVvTUnc4tfp2F

For paid consultation (RU/EN), contact: @tutunak


All the ways to support the channel: https://telegra.ph/How-support-the-channel-02-19
The write-up by Overcast examines whether Kubernetes pods really get evicted because of CPU pressure or whether other mechanisms are at play. By dissecting eviction events and kubelet metrics, the author gives operators actionable tips for diagnosing and preventing unexpected pod terminations.
https://overcast.blog/do-pods-really-get-evicted-due-to-cpu-pressure-2b27274a670c
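Background for the question: the kubelet's node-pressure eviction watches memory, disk, inode and PID signals, not CPU. CPU is a compressible resource, so contention leads to throttling rather than eviction. A minimal Python model of that check, assuming the default Linux hard-eviction thresholds (`memory.available<100Mi`, `nodefs.available<10%`, `nodefs.inodesFree<5%`, `imagefs.available<15%`); the `NodeStats` type and `breached_signals` function are hypothetical, and newer kubelet versions may define additional signals:

```python
from dataclasses import dataclass

MiB = 1024 * 1024

@dataclass
class NodeStats:
    """Hypothetical snapshot of the node stats the kubelet compares against thresholds."""
    memory_available_bytes: int
    nodefs_available_pct: float
    nodefs_inodes_free_pct: float
    imagefs_available_pct: float
    cpu_utilization_pct: float  # present only to show that eviction ignores it

# Assumed default hard-eviction thresholds on Linux nodes.
HARD_THRESHOLDS = {
    "memory.available": lambda s: s.memory_available_bytes < 100 * MiB,
    "nodefs.available": lambda s: s.nodefs_available_pct < 10,
    "nodefs.inodesFree": lambda s: s.nodefs_inodes_free_pct < 5,
    "imagefs.available": lambda s: s.imagefs_available_pct < 15,
}

def breached_signals(stats: NodeStats) -> list[str]:
    """Return the eviction signals whose hard threshold is breached.

    There is deliberately no CPU entry: CPU is compressible, so a saturated
    node throttles containers instead of evicting pods.
    """
    return [name for name, breached in HARD_THRESHOLDS.items() if breached(stats)]

# A node pinned at 100% CPU but with plenty of memory and disk: nothing to evict.
busy_cpu = NodeStats(4096 * MiB, 40.0, 50.0, 40.0, cpu_utilization_pct=100.0)
# A node that is nearly out of memory: memory.available triggers eviction.
low_memory = NodeStats(64 * MiB, 40.0, 50.0, 40.0, cpu_utilization_pct=5.0)

print(breached_signals(busy_cpu))    # []
print(breached_signals(low_memory))  # ['memory.available']
```

If pods on a CPU-saturated node still disappear, this model suggests looking at memory or disk signals, or at causes outside node-pressure eviction.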
This piece argues that GitHub won the code-hosting market mainly through good timing, arriving just as Git was taking off, and through its focus on developer experience. It contrasts GitHub with competing platforms and offers lessons for toolmakers who want to build thriving ecosystems.
https://blog.gitbutler.com/why-github-actually-won/
vCluster: create fully functional virtual Kubernetes clusters. Each vCluster runs inside a namespace of the underlying Kubernetes cluster. It is cheaper than creating separate full-blown clusters and offers better multi-tenancy and isolation than regular namespaces.

https://github.com/loft-sh/vcluster?tab=readme-ov-file
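One way to picture "a virtual cluster inside a namespace": the virtual cluster keeps its own API view of namespaces and objects, while a syncer creates the workloads that actually run in a single host namespace under translated names so they cannot collide. The following is a toy Python model of that idea; the `-x-` naming scheme and the `host_name`/`sync_to_host` helpers are illustrative assumptions, not vCluster's actual code or naming rules:

```python
def host_name(name: str, virtual_namespace: str, vcluster: str) -> str:
    """Hypothetical translation of a virtual object's name into a host-unique name."""
    return f"{name}-x-{virtual_namespace}-x-{vcluster}"

def sync_to_host(virtual_pods, vcluster: str, host_namespace: str) -> dict:
    """Toy syncer: flatten pods from many virtual namespaces into one host namespace.

    `virtual_pods` is an iterable of (virtual_namespace, pod_name) pairs.
    A real syncer also rewrites labels, handles name-length limits and syncs
    status back to the virtual cluster; this sketch only shows the name mapping.
    """
    host = {}
    for ns, pod in virtual_pods:
        host[(host_namespace, host_name(pod, ns, vcluster))] = (ns, pod)
    return host

# Two tenants both created a pod called "web" in their own virtual namespaces.
pods = [("team-a", "web"), ("team-b", "web")]
for (ns, name), origin in sync_to_host(pods, "dev", "vcluster-dev").items():
    print(ns, name, "<-", origin)
```

Both pods land in the same host namespace without clashing, which is why a virtual cluster costs roughly a namespace plus a small control plane rather than a whole cluster.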
In his overview, Martin Heinz unpacks a recent technical challenge and walks readers through the solution, complete with code snippets and performance benchmarks. The candid narrative emphasizes practical learning and encourages experimentation in everyday development workflows.
https://martinheinz.dev/blog/111
This story dives into structuring Argo CD repositories with ApplicationSets so that teams can manage dozens of Kubernetes environments from a single source of truth. Clear diagrams and YAML examples make the pattern easy to adopt for both greenfield and legacy clusters.
https://medium.com/containers-101/how-to-structure-your-argo-cd-repositories-using-application-sets-1150e75d05b3
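The core ApplicationSet idea is one Application template plus a generator that supplies per-environment parameters. A minimal Python model of a list generator, assuming `{{param}}`-style placeholders like the ones Argo CD's default templating uses; the template fields, repository layout and cluster URLs are hypothetical, and real ApplicationSets are YAML resources rendered by the ApplicationSet controller:

```python
# Simplified Application template; field names only loosely mirror the real spec.
TEMPLATE = {
    "name": "guestbook-{{cluster}}",
    "path": "apps/guestbook/overlays/{{cluster}}",
    "destination_server": "{{url}}",
}

# List generator: one element per target environment (hypothetical clusters).
ELEMENTS = [
    {"cluster": "staging", "url": "https://staging.example.com"},
    {"cluster": "production", "url": "https://production.example.com"},
]

def render(template: dict, params: dict) -> dict:
    """Substitute every {{key}} placeholder in the template's string values."""
    rendered = {}
    for field, value in template.items():
        for key, param in params.items():
            value = value.replace("{{" + key + "}}", param)
        rendered[field] = value
    return rendered

def generate_applications(template: dict, elements: list) -> list:
    """Produce one Application per generator element."""
    return [render(template, element) for element in elements]

for app in generate_applications(TEMPLATE, ELEMENTS):
    print(app["name"], app["path"], app["destination_server"])
```

With this shape, adding an environment means adding one generator element instead of copying a whole Application manifest.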
This post on Random Tinkering details how to schedule lightweight CronJobs that scrape node-level metrics with Node Exporter and ship them to Prometheus. It balances operational guidance with security considerations, aiming for observability without overloading the control plane.
https://randomtinkering.hashnode.dev/how-to-collect-kubernetes-node-metrics-with-node-exporter-using-cronjobs
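Whatever schedules the collection, the payload Node Exporter serves on `/metrics` is in Prometheus's text exposition format. A simplified Python parser for that format, to show what a scrape returns; it is not the article's code, it ignores timestamps, does not unescape label values, and assumes no `}` appears inside a label value. The sample lines are illustrative:

```python
import re

# metric_name{label="value",...} value [timestamp]
SAMPLE_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{([^}]*)\})?\s+(\S+)")
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

def parse_exposition(text: str) -> list:
    """Return (name, labels, value) tuples, skipping # HELP / # TYPE comments."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # malformed line; a real parser would report it
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))  # float() accepts NaN and +Inf
    return samples

SCRAPE = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
node_filesystem_avail_bytes{device="/dev/sda1",fstype="ext4",mountpoint="/"} 1.2e+10
"""

for name, labels, value in parse_exposition(SCRAPE):
    print(name, labels, value)
```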
This entry introduces Talos, a minimal Linux distribution purpose-built for Kubernetes, and explains how its immutable design reduces drift and patching headaches. It walks through the installation flow, cluster bootstrap, and day-two operations from a practitioner’s viewpoint.
https://a-cup-of.coffee/blog/talos/
This living cheat sheet by Ashish B. collects common Google Cloud tasks, from IAM gotchas to cost-saving tricks with gcloud commands. It is a handy reference for engineers who switch between cloud providers and need quick recall of GCP specifics.
https://ashishb.net/programming/google-cloud/
The article “Making Your System Observable” outlines practical techniques for moving from scattered logs to coherent observability across services. It explains why a holistic, signals-first mindset matters more than bolting on dashboards late in the game.
https://www.architecture-weekly.com/p/making-your-system-observability
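A common first step from scattered logs toward correlated signals is structured logging with a shared request ID, so every line from one request can be joined up later (in a traced system, that ID would typically be the trace ID). A minimal sketch using only the Python standard library; the field names and the `checkout` logger are hypothetical, and this is one technique rather than the article's recipe:

```python
import contextvars
import io
import json
import logging
import uuid

# Request-scoped ID; contextvars keeps it isolated per thread and per asyncio task.
request_id = contextvars.ContextVar("request_id", default="-")

class JsonFormatter(logging.Formatter):
    """Emit each record as one JSON object so log pipelines can index fields."""
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
            "request_id": request_id.get(),
        })

def make_logger(stream, name: str = "checkout") -> logging.Logger:
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger

def handle_request(logger: logging.Logger) -> None:
    token = request_id.set(uuid.uuid4().hex)  # one ID per incoming request
    try:
        logger.info("order received")
        logger.info("payment authorized")
    finally:
        request_id.reset(token)

def demo() -> list:
    """Run one request and return the parsed JSON log lines."""
    stream = io.StringIO()
    handle_request(make_logger(stream))
    return [json.loads(line) for line in stream.getvalue().splitlines()]

print(demo())
```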
In this blog post, Yandex SRE Dmitry Ziablov recounts a late-night incident in which a seemingly harmless retry loop caused a production outage. He dissects the cascade of failures and offers a framework for spotting bad retry patterns before they bite.
https://medium.com/yandex/good-retry-bad-retry-an-incident-story-648072d3cee6
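Two widely used safeguards against retry storms are capped exponential backoff with jitter, so clients do not retry in lockstep, and a retry budget, so retries cannot multiply load on a struggling dependency. A minimal Python sketch of both; this is a general pattern, not necessarily the framework from the post, and the `RetryBudget` class (which counts over its whole lifetime rather than a sliding window) and its default ratio are hypothetical:

```python
import random
import time

class RetryBudget:
    """Permit retries only while they stay below a fraction of total requests.

    Simplification: counts over the object's lifetime; production budgets
    usually use a sliding time window.
    """
    def __init__(self, ratio: float = 0.1, min_retries: int = 3):
        self.ratio = ratio
        self.min_retries = min_retries
        self.requests = 0
        self.retries = 0

    def record_request(self) -> None:
        self.requests += 1

    def can_retry(self) -> bool:
        return self.retries < max(self.min_retries, self.ratio * self.requests)

    def record_retry(self) -> None:
        self.retries += 1

def backoff_delays(attempts: int, base: float = 0.1, cap: float = 5.0, rng=random.random) -> list:
    """'Full jitter': each delay is uniform in [0, min(cap, base * 2**n))."""
    return [rng() * min(cap, base * 2 ** n) for n in range(attempts)]

def call_with_retries(fn, budget: RetryBudget, max_attempts: int = 4, sleep=time.sleep):
    """Call fn, retrying transient ConnectionErrors with backoff while the budget allows."""
    budget.record_request()
    delays = backoff_delays(max_attempts - 1)
    for attempt in range(max_attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == max_attempts - 1 or not budget.can_retry():
                raise  # out of attempts, or more retries would overload the dependency
            budget.record_retry()
            sleep(delays[attempt])

def make_flaky(failures: int):
    """Hypothetical dependency that fails `failures` times, then succeeds."""
    calls = {"n": 0}
    def fn():
        calls["n"] += 1
        if calls["n"] <= failures:
            raise ConnectionError("upstream unavailable")
        return "ok"
    return fn

print(call_with_retries(make_flaky(2), RetryBudget(), sleep=lambda s: None))  # ok
```

Sharing one `RetryBudget` across all calls to a dependency is what caps the extra load: once retries exceed the ratio, failures surface immediately instead of amplifying the outage.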
The piece argues that traces beat metrics when you need to pinpoint latency spikes and hidden dependencies. It walks through three concrete debugging scenarios that show why span data can surface root causes in seconds.
https://jaywhy13.hashnode.dev/3-reasons-traces-better-than-metrics-for-debugging-your-application
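This is the kind of question a trace answers and an aggregate latency metric cannot: which operation inside one slow request actually consumed the time. A minimal Python sketch over a hand-built span tree; the `Span` class, the self-time heuristic (which assumes child spans do not overlap), and the sample trace are hypothetical, not the article's scenarios:

```python
from dataclasses import dataclass, field

@dataclass
class Span:
    name: str
    start_ms: float
    end_ms: float
    children: list = field(default_factory=list)

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms

    @property
    def self_time_ms(self) -> float:
        # Time not covered by child spans; assumes children do not overlap.
        return self.duration_ms - sum(c.duration_ms for c in self.children)

def slowest_self_time(root: Span) -> Span:
    """Walk the span tree and return the span with the most exclusive time."""
    best, stack = root, [root]
    while stack:
        span = stack.pop()
        if span.self_time_ms > best.self_time_ms:
            best = span
        stack.extend(span.children)
    return best

# A latency metric would only say "checkout took 500 ms"; the trace shows where.
TRACE = Span("checkout", 0, 500, [
    Span("auth", 0, 20),
    Span("inventory", 20, 60),
    Span("payment", 60, 480, [Span("fraud-check", 70, 470)]),
])

culprit = slowest_self_time(TRACE)
print(culprit.name, culprit.self_time_ms)  # fraud-check 400
```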
In Slack’s detailed write-up, engineers explain how Unified Grid re-architected Slack for its largest Enterprise Grid customers, some with hundreds of thousands of users, so that people can work across all of their organization’s workspaces from a single view instead of switching between them. The narrative covers the architectural changes, migration challenges, and the results that followed.
https://slack.engineering/unified-grid-how-we-re-architected-slack-for-our-largest-customers/