DevOps & SRE notes – Telegram
Helpful articles and tools for DevOps & SRE

WhatsApp: https://whatsapp.com/channel/0029Vb79nmmHVvTUnc4tfp2F

For paid consultation (RU/EN), contact: @tutunak


All the ways to support the channel: https://telegra.ph/How-support-the-channel-02-19
Debugging complex distributed systems often requires innovative solutions, especially when issues arise only in production environments. This article explores the concept of "production neighbors," a debugging technique that leverages similar workloads and configurations in neighboring systems to troubleshoot and resolve production issues effectively. Through real-world examples, readers can gain insights into how Uber's engineering team improves debugging efficiency and reduces downtime.

https://www.uber.com/en-IN/blog/debugging-with-production-neighbors/
👍6
Building a Kubernetes Admission Controller can be a powerful way to manage and enforce policies within a cluster, especially when dealing with specific infrastructure requirements. This article details the process of developing an Admission Controller using Kotlin to address an Azure Kubernetes Service (AKS) add-on issue related to User Defined Routes (UDR). It provides practical guidance on how to implement custom checks and configurations, helping developers better control Kubernetes behavior in AKS environments.

https://eggboy.medium.com/developing-kubernetes-admission-controller-with-kotlin-fixing-aks-add-on-issue-in-udr-23418ab21d56
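
For orientation, here is a minimal sketch of what a validating admission webhook does with each request. It is written in Python rather than the article's Kotlin, and it does not reproduce the article's AKS/UDR fix. The policy (a required "team" label) and the names REQUIRED_LABEL and review are hypothetical; only the AdmissionReview envelope (apiVersion admission.k8s.io/v1, response.uid, response.allowed) follows the Kubernetes API.

```python
from typing import Any

# Hypothetical policy for illustration: every object must carry this label.
REQUIRED_LABEL = "team"


def review(admission_review: dict[str, Any]) -> dict[str, Any]:
    """Build the AdmissionReview response for one incoming AdmissionReview request."""
    request = admission_review["request"]
    # "object" can be absent or null (for example on DELETE), so fall back to {}.
    metadata = (request.get("object") or {}).get("metadata") or {}
    labels = metadata.get("labels") or {}
    allowed = REQUIRED_LABEL in labels

    # The response must echo the request uid so the API server can match them.
    response: dict[str, Any] = {"uid": request["uid"], "allowed": allowed}
    if not allowed:
        response["status"] = {
            "code": 403,
            "message": f"missing required label '{REQUIRED_LABEL}'",
        }
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }
```

In a real webhook this function would sit behind an HTTPS endpoint registered through a ValidatingWebhookConfiguration; that wiring is omitted here.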
Understanding the inner workings of container runtimes is essential for developers and DevOps professionals working with containerized applications. This article provides an in-depth look at how containers are spawned using runc, a low-level container runtime. By diving into the plumbing behind runc, readers can learn about the underlying processes and system calls involved in creating and managing containers, offering valuable insights into containerization fundamentals.

https://medium.com/@rishabhsvats/plumbing-of-spawning-container-with-runc-ed409ac02ae3
👍1
Maintaining true zero-downtime in Kubernetes rolling deployments is key to delivering a seamless experience for users and preserving active client connections. This article delves into effective strategies and techniques to manage Kubernetes deployments without interrupting ongoing sessions. By exploring solutions for connection stability and load distribution, it provides practical insights for achieving flawless, uninterrupted updates in live production setups.

https://kunmidevopstories.hashnode.dev/how-to-achieve-real-zero-downtime-in-kubernetes-rolling-deployments-avoiding-broken-client-connections
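
One common ingredient of such setups is connection draining on the application side; this is a general pattern and not necessarily the article's exact approach. When a pod is terminated, Kubernetes sends SIGTERM and waits up to terminationGracePeriodSeconds before sending SIGKILL. The sketch below, with a hypothetical Drainer class, flips readiness on shutdown, keeps serving requests that still arrive while endpoints update, and waits for in-flight work to finish.

```python
import threading


class Drainer:
    """Tracks in-flight requests so a process can fail readiness and drain before exit."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._in_flight = 0
        self._shutting_down = False

    def ready(self) -> bool:
        # Back the readiness probe with this; False asks Kubernetes to stop routing here.
        with self._cond:
            return not self._shutting_down

    def begin_request(self) -> None:
        # Requests are still accepted during shutdown: traffic can keep arriving
        # for a short while until the endpoint removal has propagated.
        with self._cond:
            self._in_flight += 1

    def end_request(self) -> None:
        with self._cond:
            self._in_flight -= 1
            self._cond.notify_all()

    def begin_shutdown(self) -> None:
        # Call this when SIGTERM is received.
        with self._cond:
            self._shutting_down = True

    def wait_for_drain(self, timeout: float) -> bool:
        # True if all in-flight requests finished within the timeout, else False.
        with self._cond:
            return self._cond.wait_for(lambda: self._in_flight == 0, timeout=timeout)
```

A server would wrap each request in begin_request() and end_request(), call begin_shutdown() on SIGTERM, and exit once wait_for_drain() returns, using a timeout shorter than the pod's grace period.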
👍8
Optimizing Kubernetes resource usage without incurring additional costs can significantly improve efficiency within GitOps pipelines. This article explains how to achieve zero-cost resource tuning, leveraging GitOps practices to fine-tune Kubernetes workloads. By focusing on resource allocation and automation, it offers a practical approach to refining performance and scaling resources effectively within a GitOps-driven environment.

https://itnext.io/zero-cost-kubernetes-resource-tuning-in-your-gitops-pipelines-fba02f1dd9da
👍71
Amazon S3 is a powerful storage solution, but managing it effectively can reveal unexpected complexities. This article highlights lesser-known aspects of S3 that can impact performance, security, and cost management. From data consistency issues to access controls, it provides insights into nuances that users often encounter, equipping them to handle S3 with a more informed approach.

https://www.plerion.com/blog/things-you-wish-you-didnt-need-to-know-about-s3
👍7🔥1💯1
🚀 Join Our DevOps & SRE Community! 🚀

Connect with fellow professionals, discuss posts, share insights, and stay updated on the latest trends. Let’s learn and grow together! 💡🔧

🗣 t.me/devops_sre_notes_dis
👍6
Becoming an expert in any field requires more than just time; it involves strategic learning, dedication, and continuous improvement. This article outlines actionable steps for mastering any skill, from setting clear goals to adopting effective study techniques. Readers will gain insights into building expertise methodically, helping them develop a structured approach to achieve proficiency in their chosen area.

https://newsletter.techworld-with-milan.com/p/how-to-become-an-expert-in-anything
🔥3👍2
Handling incidents effectively is essential for a data team’s success, especially when managing complex data systems. This article explores how the data team at Incident.io approaches incident response, detailing their strategies for quick detection, communication, and resolution. By sharing insights into their structured processes, it offers readers a glimpse into efficient data incident management and the tools that make it possible.

https://incident.io/blog/how-our-data-team-handles-incidents
👍3
While Continuous Integration (CI) and Continuous Delivery (CD) are often paired together, separating them can lead to more effective and focused development workflows. This article discusses the benefits of decoupling CI and CD, arguing that distinct processes allow teams to optimize each for its unique goals and challenges. By examining the evolving needs of software development, it provides insights into how treating CI and CD independently can enhance efficiency and agility.

https://thenewstack.io/why-ci-and-cd-need-to-go-their-separate-ways/
👍21👌1
Looking for a hosting platform to practice with Linux, Kubernetes, etc.? Register using my referral link on DigitalOcean and get $200 in credit for 60 days. By registering through my referral link, you also support this Telegram channel.

👉 Register
👍7🔥7❤‍🔥3👏2
A Service Level Agreement (SLA) is a foundational element in service-based industries, defining the performance standards and reliability expectations between a service provider and its clients. This article breaks down the essentials of SLAs, explaining their purpose, key components, and how they help manage customer expectations. By understanding SLAs, both providers and clients can foster transparency and accountability in service delivery.

https://uptimerobot.com/blog/what-is-an-sla/
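
To make an availability target concrete, it helps to convert the percentage into a downtime budget. The sketch below is plain arithmetic and is not taken from the article; the function name and the 30-day default period are assumptions for illustration. For example, 99.9% over 30 days leaves roughly 43 minutes of allowed downtime.

```python
def allowed_downtime_minutes(availability_pct: float, period_days: int = 30) -> float:
    """Return the downtime budget, in minutes, implied by an availability target."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    # The budget is the fraction of the period NOT covered by the target.
    return total_minutes * (100 - availability_pct) / 100


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        budget = allowed_downtime_minutes(target)
        print(f"{target}% over 30 days allows about {budget:.1f} minutes of downtime")
```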
👍5
Turning incidents into learning opportunities is a powerful way to build resilient systems and teams. This article explores three strategies for leveraging incidents as a catalyst for improvement, focusing on reflection, analysis, and proactive change. By fostering a culture of learning, teams can transform setbacks into valuable insights, enhancing both technical capabilities and team collaboration.

https://thenewstack.io/3-strategies-to-turn-incidents-into-learning-opportunities/
👍4