Advanced Network Observability – Supercharging Container Network Observability in Azure Kubernetes Service (AKS)
https://pixelrobots.co.uk/2024/06/advanced-network-observability-supercharging-container-network-observability-in-azure-kubernetes-service-aks
Scaling Kubernetes Pods Based on HTTP Traffic using KEDA HTTP Add-on
https://blog.raulnq.com/scaling-kubernetes-pods-based-on-http-traffic-using-keda-http-add-on
system-upgrade-controller
https://github.com/rancher/system-upgrade-controller
This project aims to provide a general-purpose, Kubernetes-native upgrade controller (for nodes). It introduces a new CRD, the Plan, for defining any and all of your upgrade policies/requirements. A Plan is an outstanding intent to mutate nodes in your cluster. For up-to-date details on defining a plan please review v1/types.go.
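As a rough illustration of the Plan concept, the sketch below uses the Kubernetes Python client to submit a Plan custom resource. The group/version (upgrade.cattle.io/v1) matches the project's CRD, but the spec fields shown (concurrency, node selector, upgrade container) are only illustrative and should be checked against v1/types.go, as the README suggests.

```python
# Hedged sketch: submit a system-upgrade-controller Plan via the Kubernetes
# Python client. The spec mirrors the common "mutate nodes with a containerized
# job" pattern; verify exact field names against v1/types.go.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod

plan = {
    "apiVersion": "upgrade.cattle.io/v1",
    "kind": "Plan",
    "metadata": {"name": "os-patch", "namespace": "system-upgrade"},
    "spec": {
        "concurrency": 1,                        # upgrade one node at a time
        "serviceAccountName": "system-upgrade",  # SA the upgrade pods run as
        "nodeSelector": {                        # which nodes this Plan targets
            "matchExpressions": [
                {"key": "patch-me", "operator": "Exists"},
            ]
        },
        "version": "latest",                     # version the Plan applies
        "upgrade": {
            "image": "alpine",                   # container that mutates the host
            "command": ["chroot", "/host"],
            "args": ["sh", "-c", "echo 'apply upgrade steps here'"],
        },
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="upgrade.cattle.io",
    version="v1",
    namespace="system-upgrade",
    plural="plans",
    body=plan,
)
```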
https://github.com/rancher/system-upgrade-controller
kraan
https://github.com/fidelity/kraan
Kraan helps you deploy and manage 'layers' on top of Kubernetes. By applying layers to K8s clusters, you can build focused platforms on top of K8s, e.g. ML platforms, data platforms, etc. Each layer is a collection of addons, and dependencies can be established between layers: a "mgmt-layer" can depend on a "common-layer", and Kraan will always ensure that the addons in the "common-layer" are deployed successfully before deploying the "mgmt-layer" addons. A layer is represented as a Kubernetes custom resource, and Kraan is an operator deployed into the cluster that works continuously to reconcile the state of the layer custom resource.
Kraan is powered by flux2 and builds on top of projects like source-controller and helm-controller.
https://github.com/fidelity/kraan
intel-device-plugins-for-kubernetes
https://github.com/intel/intel-device-plugins-for-kubernetes
Collection of Intel device plugins for Kubernetes
https://github.com/intel/intel-device-plugins-for-kubernetes
sops-secrets-operator
https://github.com/isindir/sops-secrets-operator
Operator which manages Kubernetes Secret resources created from user-defined SopsSecret CRs, inspired by Bitnami SealedSecrets and sops.
https://github.com/isindir/sops-secrets-operator
cubefs
https://github.com/cubefs/cubefs
As an open-source distributed storage system, CubeFS can serve as your datacenter filesystem, data lake storage infrastructure, and private or hybrid cloud storage. In particular, CubeFS enables the separation of storage and compute for databases and AI/ML applications.
https://github.com/cubefs/cubefs
mani-diffy
https://github.com/chime/mani-diffy
This program walks a hierarchy of Argo CD Application templates, renders Kubernetes manifests from the input templates, and posts the rendered files back for the user to review and validate.
It is designed to be called from a CI job within a pull request, enabling the author to update templates and see the resulting manifests directly within the pull request before the changes are applied to the Kubernetes cluster.
The rendered manifests are kept within the repository, making diffs between revisions easy to parse, dramatically improving safety when updating complex application templates.
https://github.com/chime/mani-diffy
bashly
https://github.com/DannyBen/bashly
Bashly is a command line application (written in Ruby) that lets you generate feature-rich bash command line tools.
Bashly lets you focus on your specific code, without worrying about command line argument parsing, usage texts, error messages and other functions that are usually handled by a framework in any other programming language.
https://github.com/DannyBen/bashly
Dear friend, you have built a Kubernetes
https://www.macchaffee.com/blog/2024/you-have-built-a-kubernetes
I am afraid to inform you that you have built a Kubernetes. I know you wanted to "choose boring tech" to just run some containers. You said that "Kubernetes is overkill" and "it's just way too complex for a simple task" and yet, six months later, you have a pile of shell scripts that do not work—breaking every time there's a slight shift in the winds of production.
https://www.macchaffee.com/blog/2024/you-have-built-a-kubernetes
Choosing the right Postgres indexes
https://incident.io/blog/choosing-the-right-postgres-indexes
Indexes can make a world of difference to performance in Postgres, but it’s not always obvious when you’ve written a query that could do with an index. Here we’ll cover:
- What indexes are
- Some use cases for when they’re helpful
- Rules of thumb for figuring out which sort of index to add
- How to identify when you’re missing an index
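To make those rules of thumb concrete, here is a small, hedged sketch using Python and psycopg2. The table, columns, and DSN are made up for illustration; the point is the EXPLAIN-before/after-index workflow.

```python
# Illustrative only: the "incidents" table, its columns, and the DSN are
# hypothetical; the workflow (EXPLAIN, add index, EXPLAIN again) is the point.
import psycopg2

conn = psycopg2.connect("dbname=app user=app")  # placeholder DSN
conn.autocommit = True  # CREATE INDEX CONCURRENTLY cannot run in a transaction
cur = conn.cursor()

query = ("SELECT * FROM incidents WHERE organisation_id = %s "
         "ORDER BY created_at DESC LIMIT 20")

# 1. Look at the plan first: a sequential scan on a large table hints at a missing index.
cur.execute("EXPLAIN ANALYZE " + query, ("org_123",))
print("\n".join(row[0] for row in cur.fetchall()))

# 2. Add a composite B-tree index matching the filter + sort.
#    CONCURRENTLY avoids blocking writes while the index builds.
cur.execute(
    "CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_incidents_org_created "
    "ON incidents (organisation_id, created_at DESC)"
)

# 3. Re-run EXPLAIN ANALYZE: the plan should now show an index scan.
cur.execute("EXPLAIN ANALYZE " + query, ("org_123",))
print("\n".join(row[0] for row in cur.fetchall()))
```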
https://incident.io/blog/choosing-the-right-postgres-indexes
BemiDB
https://github.com/BemiHQ/BemiDB
BemiDB is a Postgres read replica optimized for analytics. It consists of a single binary that seamlessly connects to a Postgres database, replicates the data in a compressed columnar format, and allows you to run complex queries using its Postgres-compatible analytical query engine.
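Since BemiDB is Postgres-compatible, any Postgres client can query it. The sketch below uses psycopg2; the host, port, credentials, and the "orders" table are placeholder assumptions rather than BemiDB defaults.

```python
# Hedged sketch: querying a BemiDB instance over the Postgres wire protocol.
# Connection details and the "orders" table are illustrative assumptions;
# check the BemiDB README for your deployment's actual defaults.
import psycopg2

conn = psycopg2.connect(host="localhost", port=54321, dbname="bemidb", user="postgres")
cur = conn.cursor()

# An analytical aggregation that benefits from compressed columnar storage.
cur.execute("""
    SELECT date_trunc('month', created_at) AS month,
           count(*)                        AS orders,
           sum(total_cents) / 100.0        AS revenue
    FROM orders
    GROUP BY 1
    ORDER BY 1
""")
for month, orders, revenue in cur.fetchall():
    print(month, orders, revenue)
```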
https://github.com/BemiHQ/BemiDB
65,000 nodes and counting: Google Kubernetes Engine is ready for trillion-parameter AI models
https://cloud.google.com/blog/products/containers-kubernetes/gke-65k-nodes-and-counting
As generative AI evolves, we're beginning to see the transformative potential it is having across industries and our lives. And as large language models (LLMs) increase in size — current models are reaching hundreds of billions of parameters, and the most advanced ones are approaching 2 trillion — the need for computational power will only intensify. In fact, training these large models on modern accelerators already requires clusters that exceed 10,000 nodes.
With support for 15,000-node clusters — the world’s largest — Google Kubernetes Engine (GKE) has the capacity to handle these demanding training workloads. Today, in anticipation of even larger models, we are introducing support for 65,000-node clusters.
With support for up to 65,000 nodes, we believe GKE offers more than 10X larger scale than the other two largest public cloud providers.
https://cloud.google.com/blog/products/containers-kubernetes/gke-65k-nodes-and-counting
netavark
https://github.com/containers/netavark
Netavark is a Rust-based network stack for containers.
https://github.com/containers/netavark
mise
https://github.com/jdx/mise
mise is a polyglot tool version manager. It replaces tools like asdf, nvm, pyenv, rbenv, etc.
mise allows you to switch sets of env vars in different project directories. It can replace direnv.
mise is a task runner that can replace make, or npm scripts.
https://github.com/jdx/mise
Migrating billions of records: moving our active DNS database while it’s in use
https://blog.cloudflare.com/migrating-billions-of-records-moving-our-active-dns-database-while-in-use
According to a survey done by W3Techs, as of October 2024, Cloudflare is used as an authoritative DNS provider by 14.5% of all websites. As an authoritative DNS provider, we are responsible for managing and serving all the DNS records for our clients’ domains. This means we have an enormous responsibility to provide the best service possible, starting at the data plane. As such, we are constantly investing in our infrastructure to ensure the reliability and performance of our systems.
https://blog.cloudflare.com/migrating-billions-of-records-moving-our-active-dns-database-while-in-use
Against Incident Severities and in Favor of Incident Types
https://www.honeycomb.io/blog/against-incident-severities-favor-incident-types
About a year ago, Honeycomb kicked off an internal experiment to structure how we do incident response. We looked at the usual severity-based approach (usually using a SEV scale), but decided to adopt an approach based on incident types instead, aiming to provide quick, shared definitions that work across multiple departments. This post is a short report on our experience doing it.
https://www.honeycomb.io/blog/against-incident-severities-favor-incident-types
Way too many ways to wait on a child process with a timeout
https://gaultier.github.io/blog/way_too_many_ways_to_wait_for_a_child_process_with_a_timeout.html
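The article explores the many OS-level ways to do this; purely as a point of reference, here is a minimal Python sketch of the basic pattern (spawn, wait with a deadline, kill on timeout), with a placeholder command.

```python
# Minimal sketch of "wait on a child process with a timeout": spawn a child,
# wait up to a deadline, and kill it if it overruns. The command is a placeholder.
import subprocess

proc = subprocess.Popen(["sleep", "10"])  # placeholder child process
try:
    returncode = proc.wait(timeout=2)     # block for at most 2 seconds
    print("child exited with", returncode)
except subprocess.TimeoutExpired:
    proc.kill()                           # SIGKILL the child...
    proc.wait()                           # ...and reap it to avoid a zombie
    print("child timed out and was killed")
```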
How to Build Smaller Container Images: Docker Multi-Stage Builds
https://labs.iximiuz.com/tutorials/docker-multi-stage-builds
slackdump
https://github.com/rusq/slackdump
Save or export your private and public Slack messages, threads, files, and users locally without admin privileges.
https://github.com/rusq/slackdump
automatisch
https://github.com/automatisch/automatisch
The open source Zapier alternative. Build workflow automation without spending time and money.
https://github.com/automatisch/automatisch