DevOps & SRE notes – Telegram
Helpful articles and tools for DevOps & SRE

WhatsApp: https://whatsapp.com/channel/0029Vb79nmmHVvTUnc4tfp2F

For paid consultation (RU/EN), contact: @tutunak


All the ways to support the channel: https://telegra.ph/How-support-the-channel-02-19
This analysis explores how eBPF (extended Berkeley Packet Filter) can be used to gain insight into SSL/TLS-encrypted traffic in real time. The author, TJ. Podobnik, discusses how this technology allows for monitoring without compromising security.
https://medium.com/all-things-ebpf/what-insights-can-ebpf-provide-into-real-time-ssl-tls-encrypted-traffic-and-how-435c8ad33efc
This post by Brian Chambers reflects on the lessons learned from launching an edge compute platform at Chick-fil-A. It discusses the challenges and successes of developing and scaling the platform from within the Enterprise Architecture team.
https://medium.com/chick-fil-atech/what-we-learned-from-launching-edge-compute-from-enterprise-architecture-1dc34e49482f
This article discusses the importance of the "what went well" section in incident write-ups, arguing that it's more than just a morale booster. Lorin Hochstein suggests that detailing successful improvisations and diagnostic work can be a powerful learning tool for future incident responders.
https://surfingcomplexity.blog/2025/06/14/what-went-well-is-more-than-just-a-pat-on-the-back/
Forwarded from DevOps & SRE notes (tutunak)
Looking for a hosting platform to practice with Linux, Kubernetes, etc.? Register using my referral link on DigitalOcean and get $200 in credit for 60 days. By registering through my referral link, you also support this Telegram channel.

👉 Register
This piece, "The MTTI Manifesto," argues for the importance of a new metric in incident response: Mean Time to Isolate. The author contends that the majority of outage time is spent identifying the problem's source, not fixing it, and that focusing on MTTI can drive significant improvements in system architecture and observability.
https://www.oldschoolburke.com/the-mtti-manifesto/
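To make the metric concrete, here is a minimal sketch of how MTTI could be computed from incident timestamps. It assumes MTTI is measured from detection to the moment the faulty component is isolated; the linked article may define the boundaries differently. The incident records, field order, and function names are hypothetical and exist only for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical incident records: (detected_at, isolated_at, resolved_at).
# These timestamps are made up for illustration, not real outage data.
incidents = [
    (datetime(2025, 1, 1, 10, 0), datetime(2025, 1, 1, 10, 40), datetime(2025, 1, 1, 10, 50)),
    (datetime(2025, 1, 2, 14, 0), datetime(2025, 1, 2, 14, 20), datetime(2025, 1, 2, 14, 30)),
]


def mean_duration(durations):
    """Average a non-empty list of timedelta values."""
    return sum(durations, timedelta()) / len(durations)


def mtti(records):
    """Mean time from detection to isolation (assumed definition of MTTI)."""
    return mean_duration([isolated - detected for detected, isolated, _ in records])


def mttr(records):
    """Mean time from detection to resolution, for comparison."""
    return mean_duration([resolved - detected for detected, _, resolved in records])


def isolation_share(records):
    """Fraction of total outage time spent isolating the problem."""
    isolating = sum((i - d for d, i, _ in records), timedelta())
    total = sum((r - d for d, _, r in records), timedelta())
    return isolating / total


print("MTTI:", mtti(incidents))
print("MTTR:", mttr(incidents))
print("Share of outage time spent isolating:", isolation_share(incidents))
```

Tracking the isolation share next to MTTR is one way to check the article's claim against your own incident history: a high share suggests that observability and architecture work would pay off more than faster fixes.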
This write-up explores the emerging discipline of AI Reliability Engineering (AIRe) as the "Third Age of SRE." It argues that the unique challenges of AI workloads, such as their probabilistic nature and new failure modes like model decay, require an evolution of traditional Site Reliability Engineering principles.
https://thenewstack.io/ai-reliability-engineering-welcome-to-the-third-age-of-sre/
This post offers a detailed walkthrough for backend engineers on creating a Kubernetes Operator using Go and Kubebuilder. The author, Amr Elhewy, simplifies complex DevOps concepts by building a practical "PodTracker" operator that sends Slack notifications when new pods are created.
https://hewi.blog/a-backend-engineer-lost-in-the-devops-world-making-a-kubernetes-operator-with-go
Forwarded from AWS Notes (Roman Siewko)
🔥 FREE premium exam prep on AWS Skill Builder until Jan 5, 2026!

https://skillbuilder.aws/

🎓 𝗖𝗼𝘃𝗲𝗿𝘀:
🔸AWS Certified Cloud Practitioner (CLF-C02)
🔸AWS AI Practitioner

💡 𝗪𝗵𝗮𝘁 𝘆𝗼𝘂 𝗴𝗲𝘁 (𝗻𝗼𝗿𝗺𝗮𝗹𝗹𝘆 𝗽𝗮𝗶𝗱):
• Official practice exams
• Hands-on labs (SimuLearn)
• AWS Escape Room (learning by playing)
• Flashcards & learning plans

Plus, there are always-free resources:
• Official practice questions
• Free AWS training events
• AWS Educate (labs + potential free exam vouchers)

#AWS_certification
This post compares Amazon EKS Auto Mode and Azure AKS Automatic, evaluating which platform offers a superior managed Kubernetes solution. While acknowledging AWS's progress, the author ultimately argues that AKS Automatic's more comprehensive, end-to-end automation makes it the clear winner for a truly hands-off experience.
https://pixelrobots.co.uk/2024/12/amazon-eks-auto-mode-vs-azure-aks-automatic-the-better-managed-kubernetes-solution/
This article covers disaster recovery architectures that go beyond simple high availability, so that systems remain operational even when HA fails. Yakaiah Bommishetti outlines various DR strategies, from cold backups to active-active multi-site setups, emphasizing the critical difference between preventing failures and restoring services after a catastrophe.
https://hackernoon.com/beyond-high-availability-disaster-recovery-architectures-that-keep-running-when-ha-fails