DevOps & SRE notes – Telegram
Helpful articles and tools for DevOps & SRE

WhatsApp: https://whatsapp.com/channel/0029Vb79nmmHVvTUnc4tfp2F

For paid consultation (RU/EN), contact: @tutunak


All ways to support https://telegra.ph/How-support-the-channel-02-19
Migrating to a new Kubernetes platform can be a complex journey filled with unforeseen challenges and adjustments. This article shares Klaviyo’s experience in navigating this transition, highlighting the technical hurdles, strategic decisions, and lessons learned along the way. By detailing their approach to a seamless migration, it offers valuable insights for teams planning similar Kubernetes moves and helps them anticipate potential obstacles.

https://klaviyo.tech/piloting-through-the-fog-a-tale-of-migrating-to-a-new-kubernetes-platform-7fe5677310fa
👍2
Autonomous cost optimization in Kubernetes is essential for managing cloud resources efficiently without compromising performance. This article from StormForge introduces autonomous cost optimization, explaining how machine learning and automation can be applied to Kubernetes clusters to reduce costs. Learn how to optimize resource usage, balance workloads, and save on cloud expenses while maintaining system performance.


https://stormforge.io/blog/intro-autonomous-cost-optimization-kubernetes
👍3
Kyverno and Gatekeeper are two popular tools for policy management in Kubernetes, each with its own strengths. In this article, Glen Yu explains why he prefers Kyverno over Gatekeeper for Kubernetes-native policy management. The article covers Kyverno's ease of use, integration capabilities, and flexibility, and argues that it is often a better fit for teams looking for simpler yet powerful policy enforcement.

https://medium.com/@glen.yu/why-i-prefer-kyverno-over-gatekeeper-for-native-kubernetes-policy-management-35a05bb94964
👍5
The video covers data agility, focusing on the challenges of building large-scale data systems. Martin Kleppmann discusses the complexity of integrating multiple components like databases, caches, search engines, and graph systems. He introduces event streams and systems like Kafka and Samza as solutions to improve scalability and reduce complexity by processing data in a unified, ordered log. Kleppmann emphasizes loose coupling of components, event-driven architectures, and stream processing to achieve a more scalable and maintainable system.
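To make the "unified, ordered log" idea concrete, here is a minimal in-memory sketch. It is not taken from the talk and does not use the Kafka or Samza APIs: the names EventLog, Consumer, CacheView, and SearchIndexView, and the event shape with "key" and "value" fields, are all hypothetical. It only illustrates how loosely coupled components can each derive their own view by reading the same append-only log at their own pace.

```python
from dataclasses import dataclass, field


@dataclass
class EventLog:
    """Append-only, ordered log (a toy stand-in for a single log partition)."""
    events: list = field(default_factory=list)

    def append(self, event: dict) -> int:
        self.events.append(event)
        return len(self.events) - 1  # offset of the event just written

    def read_from(self, offset: int) -> list:
        return self.events[offset:]


class Consumer:
    """Tracks its own offset, so it can lag or catch up independently."""

    def __init__(self, log: EventLog):
        self.log = log
        self.offset = 0

    def poll(self) -> None:
        # Apply every event not yet seen, in log order.
        for event in self.log.read_from(self.offset):
            self.apply(event)
            self.offset += 1

    def apply(self, event: dict) -> None:
        raise NotImplementedError


class CacheView(Consumer):
    """Derived view: latest value per key."""

    def __init__(self, log: EventLog):
        super().__init__(log)
        self.data = {}

    def apply(self, event: dict) -> None:
        self.data[event["key"]] = event["value"]


class SearchIndexView(Consumer):
    """Derived view: word -> set of keys whose latest value contains it."""

    def __init__(self, log: EventLog):
        super().__init__(log)
        self.index = {}

    def apply(self, event: dict) -> None:
        key = event["key"]
        # Drop entries from an older version of this key before re-indexing.
        for keys in self.index.values():
            keys.discard(key)
        for word in event["value"].lower().split():
            self.index.setdefault(word, set()).add(key)


log = EventLog()
log.append({"key": "post:1", "value": "Kafka stream processing"})
log.append({"key": "post:2", "value": "Samza stream jobs"})
log.append({"key": "post:1", "value": "Kafka log compaction"})

cache = CacheView(log)
search = SearchIndexView(log)
cache.poll()
search.poll()
```

The writer only appends to the log; the cache and the search index never talk to each other, and either one could be rebuilt from scratch by resetting its offset to 0 and polling again.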

https://www.youtube.com/watch?v=b_H4FFE3wP0
👍32
Policy enforcement is critical in Kubernetes environments to ensure security and compliance. This article by Javier Canizalez explains how to use Gatekeeper to restrict the kubectl exec command, enhancing security by preventing unauthorized access to running containers. Learn about the steps to configure Gatekeeper for policy enforcement and how to restrict potentially dangerous operations within your Kubernetes clusters.

https://medium.com/@javier-canizalez/policy-enforcement-in-kubernetes-restricting-kubectl-exec-with-gatekeeper-7e99823465c9
👍4
Upgrading AWS EKS clusters can be complex, but using a blue-green deployment strategy can make the process more seamless and reduce downtime. This article from OneFootball Locker Room explains how to optimize EKS cluster upgrades using the blue-green tactic. Learn how this approach ensures smooth transitions between cluster versions, minimizes risk, and maintains high availability during the upgrade process.

https://medium.com/onefootball-locker-room/from-blue-to-green-optimizing-aws-eks-clusters-upgrade-with-blue-green-tactic-2ee7c4920755
👍3
Security training is a fundamental part of maintaining a secure and resilient organization. This article from PagerDuty outlines their approach to security training, detailing how they empower employees to recognize and mitigate security threats. Learn about the key components of their security training program, including best practices, ongoing education, and the importance of fostering a security-conscious culture across the company.

https://www.pagerduty.com/blog/security-training-at-pagerduty/
👍4
🎵 Do you listen to tech podcasts?
Final results: Yes 56%, No 44%
👍2
Running GPU-accelerated workloads, especially large language models (LLMs), on Amazon EKS can significantly enhance performance for AI and machine learning applications. This article from Prodigy Engineering explains how to configure and manage GPU-accelerated workloads on EKS. Learn about the necessary steps, best practices, and challenges involved in optimizing Kubernetes clusters to run GPU-intensive tasks efficiently.

https://medium.com/prodigy-engineering/running-gpu-accelerated-llm-workloads-on-eks-9928c07d30ea
👍2
Kubernetes can offer tremendous benefits, but it's not without its challenges. This article from Encore shares real-world "horror stories" from Kubernetes environments, highlighting common mistakes and pitfalls teams have faced. Through these cautionary tales, learn how to avoid misconfigurations, optimize cluster performance, and prevent operational disasters in your own Kubernetes deployments.

https://encore.dev/blog/horror-stories-k8s
👍5💩1