Feature Flags vs. Feature Management: A Technical Deep Dive for SREs
https://www.cloudbees.com/blog/feature-flag-vs-feature-management
kubeseal-convert
https://github.com/EladLeev/kubeseal-convert
A tool for importing secrets from pre-existing secrets management systems (e.g. Vault, Secrets Manager) into a SealedSecret.
krr
https://github.com/robusta-dev/krr
Robusta KRR (Kubernetes Resource Recommender) is a CLI tool for optimizing resource allocation in Kubernetes clusters. It gathers pod usage data from Prometheus and recommends requests and limits for CPU and memory. This reduces costs and improves performance.
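The idea of turning usage samples into requests and limits can be sketched as follows. This is only an illustration, not KRR's actual implementation: the function names, the 95th-percentile CPU rule, and the 15% memory headroom are all assumptions chosen for the example, and the samples are assumed to have been fetched from Prometheus already.

```python
def percentile(samples, pct):
    """Nearest-rank percentile (pct is an integer 1..100) of a non-empty list."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def recommend(cpu_cores, memory_bytes, cpu_pct=95, memory_buffer_pct=15):
    """Hypothetical recommendation rule (an assumption, not KRR's algorithm):
    CPU request = cpu_pct-th percentile of observed usage,
    memory request and limit = observed peak plus a safety buffer."""
    peak = max(memory_bytes)
    memory = (peak * (100 + memory_buffer_pct) + 99) // 100  # integer ceiling
    return {
        "cpu_request": percentile(cpu_cores, cpu_pct),
        "memory_request": memory,
        "memory_limit": memory,
    }
```

Using a percentile for CPU ignores short spikes (CPU is compressible and only gets throttled), while sizing memory from the peak guards against OOM kills.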
End-to-end Testing of Kubernetes Resources with the e2e-framework
https://medium.com/programming-kubernetes/end-to-end-testing-of-kubernetes-resources-with-the-e2e-framework-ac52e7e58db8
Understand how graceful shutdown can achieve zero downtime during k8s rolling update
https://dev.to/yutaroyamanaka/understand-how-graceful-shutdown-can-achieve-zero-downtime-during-k8s-rolling-update-15eh
In modern cloud-native environments, Kafka consumers are increasingly deployed within Kubernetes. This setup offers benefits in scalability and deployment ease but also introduces the need for sophisticated scaling strategies that can adapt to the volatile nature of Kafka’s data streams.
https://kedify.io/resources/blog/keda-kafka-improve-performance-by-62-15-at-peak-loads/
KEDA + Kafka: Improve performance by 62.15% at peak loads | Kedify
How Wise reduced AWS RDS maintenance downtimes from 10 minutes to 100 milliseconds is an interesting story for those who do DB operations.
From time to time, it's necessary to apply changes that require downtime. However, it's unacceptable to have long "maintenance windows" nowadays. So, one has to be creative.
#dba #mariadb
How Wise reduced AWS RDS maintenance downtimes from 10 minutes to 100 milliseconds
A story of a fruitful collaboration between Site Reliability and Database Engineering teams
Kafka 101
https://highscalability.com/unnoscriptd-2
Originally developed at LinkedIn in 2011, Apache Kafka is one of the most popular open-source Apache projects. So far it has had 24 notable releases and, most intriguingly, its code base has grown at an average rate of 24% with each of those releases.
Becoming a Senior Site Reliability Engineer: A Guide to Upskilling
https://reliabilityengineering.substack.com/p/becoming-a-senior-site-reliability
Learn how to upskill yourself to become a senior site reliability engineer.
Tetragon is a flexible Kubernetes-aware security observability and runtime enforcement tool that applies policy and filtering directly with eBPF, allowing for reduced observation overhead, tracking of any process, and real-time enforcement of policies.
https://tetragon.io/
Tetragon - eBPF-based Security Observability and Runtime Enforcement
Tetragon is a sub-project of Cilium and a CNCF project.
In this article, the Exness SOC (Security Operations Center) team shares approaches to monitoring and detecting threats in the K8s environment.
https://scribe.rip/exness-blog/threat-detection-in-the-k8s-environment-d5fdcd88a094
GPU Virtualization in K8s: Challenges and State of the Art
https://www.arrikto.com/blog/gpu-virtualization-in-k8s-challenges-and-state-of-the-art
Kubernetes schedules GPU workloads by assigning a whole device to a single job exclusively. This one-to-one relationship leads to massive GPU underutilization, especially for interactive jobs, characterized by significant idle periods and infrequent bursts of heavy GPU usage. Current solutions enable GPU sharing by statically assigning a fixed slice of GPU memory to each co-located job. These solutions are not suitable for interactive scenarios since the number of co-located jobs is limited by the size of physical GPU memory. Consequently, users must know the GPU memory demand of their jobs before submitting them for execution, which is impractical.
Kubernetes Events — News feed of your cluster
https://decisivedevops.com/kubernetes-events-news-feed-of-your-kubernetes-cluster-826e08892d7a
Understand Kubernetes Events and learn to use kubectl events to monitor and troubleshoot your cluster’s issues effectively.
Users, Groups, Roles and API Access in Kubernetes
https://blog.adityasamant.dev/users-groups-roles-and-api-access-in-kubernetes
The nuances of how users and groups are configured in Kubernetes and how the role-based access control (RBAC) mechanism applies to them.
Argo Events — Event Bus and Webhook
https://medium.chuklee.com/argo-events-event-bus-and-webhook-ac34e5714209
Argo Events is a Kubernetes-based event automation engine. It is part of the Argo project and can be used with or independently of the other Argo projects.
I will be writing a series of articles on Argo Events; in these articles I will look at how we can use Argo Events to automate processes inside and outside a Kubernetes cluster.
In this first article of the series, we will examine Argo Events' core concepts, its installation, and the provisioning of the different event buses that Argo Events uses to forward events to their sinks. Finally, we will set up a webhook event flow to verify our setup.
ConfigMap Conundrum: Subtleties of Dynamic Updates in Kubernetes Configurations
https://blog.adityasamant.dev/configmap-conundrum-subtleties-of-dynamic-updates-in-kubernetes-configurations
Know the differences between ConfigMaps mounted as Volumes and ConfigMaps defined as environment variables.
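From the application's point of view, the difference can be sketched like this. Environment variables are fixed when the container starts, whereas files from a ConfigMap volume are eventually refreshed by the kubelet (except for `subPath` mounts), so a process sees changes only if it re-reads the file. The class names, the variable name, and the file path in the test are hypothetical, and this is a sketch of the behavior rather than code from the linked post.

```python
import os


class EnvSetting:
    """Value captured once at startup, the way a ConfigMap key injected as an
    environment variable behaves: later ConfigMap edits are not seen until
    the pod is restarted."""

    def __init__(self, name, default=""):
        self.value = os.environ.get(name, default)


class MountedSetting:
    """Value re-read from a file on every access, the way an application can
    pick up changes to a ConfigMap key mounted as a volume."""

    def __init__(self, path):
        self.path = path

    def get(self):
        with open(self.path, encoding="utf-8") as f:
            return f.read().strip()
```

Re-reading on every access keeps the sketch simple; a real service would more likely poll or watch the file and reload on change.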
Useful git commands for SRE and DevOps engineers
https://reliabilityengineering.substack.com/p/useful-git-commands-for-sre-and-devops
A write-ahead log is not a universal part of durability
https://notes.eatonphil.com/2024-07-01-a-write-ahead-log-is-not-a-universal-part-of-durability.html
A database does not need a write-ahead log (WAL) to achieve durability. A database can write its long-term data structure durably to disk before returning to a client. Granted, this is a bad idea! And granted, a WAL is critical for durability by design in most databases. But I think it's helpful to understand WALs by understanding what you could do without them.
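A minimal sketch of durability without a WAL, assuming a toy key-value store that rewrites its entire data file on every write and only returns once the file has been fsynced. The `DurableKV` class and its JSON file format are hypothetical and not taken from the linked post; the directory fsync assumes a POSIX system.

```python
import json
import os


class DurableKV:
    """Toy store with no WAL: each put persists the full data structure
    to disk before returning to the caller."""

    def __init__(self, path):
        self.path = path
        self.data = {}
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self.data = json.load(f)

    def get(self, key):
        return self.data.get(key)

    def put(self, key, value):
        updated = dict(self.data)
        updated[key] = value
        self._persist(updated)
        # The in-memory view changes only after the data is on disk.
        self.data = updated

    def _persist(self, data):
        tmp = self.path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())  # force file contents to stable storage
        os.replace(tmp, self.path)  # atomically swap in the new file
        # Fsync the parent directory so the rename itself survives a crash.
        dir_fd = os.open(os.path.dirname(os.path.abspath(self.path)), os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

Every write costs a full rewrite plus two fsyncs, which illustrates why this is a bad idea in practice and why most databases append to a WAL instead.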