Comparing Open Source k8s Load Balancers

In this article we discuss three open source load-balancer controllers that can be used with any distribution of Kubernetes.

https://medium.com/thermokline/comparing-k8s-load-balancers-2f5c76ea8f31

4.54K views, 15:00

DevOps&SRE Library

From four to five 9s of uptime by migrating to Kubernetes

https://workos.com/blog/from-four-to-five-9s-of-uptime-by-migrating-to-kubernetes

4.05K views, 06:59

Let’s Consign CAP to the Cabinet of Curiosities

CAP? Again? Still?

https://brooker.co.za/blog/2024/07/25/cap-again.html

3.89K views, 14:59

tntk-infra

Put your DevOps skills to the test with our hands-on capstone project. Designed for anyone interested in gaining practical experience, this project challenges you to integrate AWS, Terraform, Kubernetes, GitHub Actions, ArgoCD, Datadog, and PagerDuty to build and manage a production-like environment. Showcase your ability to create a complete, real-world solution by building cloud infrastructure, implementing observability, developing CI/CD pipelines, and managing incidents.

https://github.com/tntk-io/tntk-infra

4.46K views, 06:01

kardinal

Kardinal is a framework for creating extremely lightweight ephemeral development environments within a shared Kubernetes cluster. In Kardinal, an environment is called a "flow" because it represents a path that a request takes through the cluster. Versions of services that are under development are deployed on-demand, and then shared across all development work that depends on that version. Read more about Kardinal in our docs.

https://github.com/kurtosis-tech/kardinal

4.94K views, 07:00

grpcmd

grpcmd is a simple, easy-to-use, and developer-friendly CLI tool for gRPC.

https://github.com/grpcmd/grpcmd

4.59K views, 15:00

The Art of System Debugging — Decoding CPU Utilization

This blog post is a case study of how we diagnosed, root-caused, and then mitigated a performance issue in one of our applications at Flipkart. Along the way, we describe the different tools (eBPF-based and traditional) that can help debug performance issues.

https://blog.flipkart.tech/the-art-of-system-debugging-decoding-cpu-utilization-da75f09ef1ff

4.46K views, 07:02

Observability at the Edge

How Chick-fil-A provides observability for 2,800+ K8s clusters

https://medium.com/chick-fil-atech/observability-at-the-edge-b2385065ab6e

5.19K views, 15:02

The raise of Hosted Control Plane in Kubernetes

In the early days of Kubernetes adoption, single-cluster deployments were the norm, offering a straightforward approach to managing applications and services. As the adoption of Kubernetes expanded, the limitations of single-cluster models surfaced. The increasing demand for Kubernetes clusters requires a shift to multicluster deployments and an innovative Hosted Control Plane architecture.

https://clastix.io/post/the-raise-of-hosted-control-plane-in-kubernetes

5.61K views, 07:01

Advantages of storing configuration in container registries rather than git

https://medium.com/@bgrant0607/advantages-of-storing-configuration-in-container-registries-rather-than-git-b4266dc0c79f

5.58K views, 15:01

Comparing Multi-tenancy Options in Kubernetes

https://www.loft.sh/blog/comparing-multi-tenancy-options-in-kubernetes

5.37K views, 07:01

Implementing Scalable GitOps With Argo CD and ApplicationSets: A Case Study

https://aviadhaham.me/posts/implementing-gitops-with-argo-cd-and-applicationsets

5.36K views, 15:02

Managing many Helm Charts with Kluctl

Learn how easy it is to manage multiple Helm Charts from one deployment project using Kluctl.

https://kluctl.io/blog/2023/02/28/managing-many-helm-charts-with-kluctl

4.87K views, 07:01

Kubernetes on Proxmox

In this article we’ll create a two-node cluster with one control-plane node and one worker node as a proof-of-concept. As an extra challenge we’ll also take a look at how to do PCIe passthrough for the worker node.

https://blog.stonegarden.dev/articles/2024/03/proxmox-k8s-with-cilium

4.76K views, 15:01

book6

A collaborative IPv6 book.

The intention is a practical introduction to IPv6 for technical people, kept up to date by active practitioners.

https://github.com/becarpenter/book6

4.39K views, 07:00

Piloting through the Fog: A Tale of Migrating to a New Kubernetes Platform

It’s a tale as old as UNIX_MIN_TIMESTAMP. Your team owns a service that you treat like a black box as long as it’s working. Sure, there’s a small maintenance task here and there that the most tenured member of the team almost exclusively picks up. How they fix it might as well be a wizard’s incantation with a sprinkle of fairy dust. But this time around they’re busy on another task, or worse, gone from the company altogether. Here’s my story of such a maintenance task. In this post I go through my journey of migrating one such service from Klaviyo’s legacy Kubernetes platform to our new, spiffy, well-managed platform.

https://klaviyo.tech/piloting-through-the-fog-a-tale-of-migrating-to-a-new-kubernetes-platform-7fe5677310fa

4.06K views, 15:01

How our data team handles incidents

Historically, data teams have not been closely involved in the incident management process (at least, not in the traditional “get woken up at 2AM by a SEV0” sense). But with a growing involvement of data (and therefore data teams) in core business processes, decision making, and user-facing products, data-related incidents are increasingly common, and more important than ever.

At incident.io, the Data team works across multiple areas of the business, enabling go-to-market and product teams alike to make data-driven decisions. Given our broad involvement, we’re no strangers to data incidents and are heavy users of our own product to monitor, triage, and respond to them. Here’s a quick run-through of how we’ve set this up.

https://incident.io/blog/how-our-data-team-handles-incidents

4K views, 07:02

What is an SLA?

A Service Level Agreement (SLA) is a formal document that outlines the expectations, responsibilities, and performance metrics between a service provider and a customer.
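
One performance metric commonly found in an SLA is an availability target. As a rough illustration (not taken from the linked article), the sketch below converts such a target into a downtime budget. The function name `allowed_downtime_minutes` and the 30-day default period are assumptions chosen for this example.

```python
def allowed_downtime_minutes(availability_percent: float, period_days: float = 30.0) -> float:
    """Return the downtime budget, in minutes, for an availability target.

    availability_percent: target such as 99.9 (hypothetical example input).
    period_days: length of the measurement window; 30 days is an assumption.
    """
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    # The budget is the fraction of the window that may be unavailable.
    return total_minutes * (1.0 - availability_percent / 100.0)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999):
        budget = allowed_downtime_minutes(target)
        print(f"{target}% -> {budget:.2f} minutes per 30 days")
```

Under these assumptions, a 99.9% target leaves roughly 43 minutes of downtime per 30 days, and each additional nine shrinks the budget by a factor of ten.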

https://uptimerobot.com/blog/what-is-an-sla

3.97K views, 15:01

Optimizing global message transit latency: a journey through TCP configuration

https://ably.com/blog/optimizing-global-message-transit-latency-a-journey-through-tcp-configuration

4.17K views, 07:01

kubetrim