This driver enables Dynamic Resource Allocation (DRA) of GPUs and ComputeDomains (for multi-node NVLink) in Kubernetes, letting workloads share and reconfigure GPUs dynamically rather than statically.
More: https://ku.bz/vVQHtF-jK
This tutorial teaches how to use the updatecli tool to automate updates for third-party Helm charts in a GitOps repository by automatically creating pull requests.
More: https://ku.bz/-CSZk8hb5
This Go tool inspects GPU resource allocation and usage within Kubernetes clusters, reducing API server overhead by up to roughly 75% while giving clear usage stats per node and pod.
More: https://ku.bz/kTS27gwPz
This tutorial teaches how to set up partitioned log storage on S3 using Fluent Bit in Kubernetes with date-based organization and covers configuration for log management at scale.
More: https://ku.bz/1GqSRFDdY
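As a rough illustration of date-based organization, the Python sketch below builds an object key partitioned by UTC date. It is not Fluent Bit configuration and is not taken from the tutorial; the `year=/month=/day=` layout, the `partitioned_key` helper, and the `logs` prefix are all assumptions chosen for the example.

```python
from datetime import datetime, timedelta, timezone


def partitioned_key(prefix: str, ts: datetime, name: str) -> str:
    """Build an object key partitioned by UTC date (hypothetical layout)."""
    # Normalize to UTC so records near midnight land in a consistent partition.
    ts = ts.astimezone(timezone.utc)
    return f"{prefix}/year={ts:%Y}/month={ts:%m}/day={ts:%d}/{name}"


# A record timestamped in UTC.
utc_ts = datetime(2025, 1, 7, 23, 30, tzinfo=timezone.utc)
print(partitioned_key("logs", utc_ts, "app.log.gz"))

# A record timestamped in UTC+2 just after local midnight still maps to the
# previous UTC day.
local_ts = datetime(2025, 1, 8, 0, 30, tzinfo=timezone(timedelta(hours=2)))
print(partitioned_key("logs", local_ts, "app.log.gz"))
```

Both calls yield a key under `day=07`, which shows why normalizing to a single time zone matters when partitions are derived from timestamps.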
This Virtual Kubelet implementation makes Kubernetes appear to have a GPU-backed node via RunPod, letting you schedule GPU workloads seamlessly without owning GPU infrastructure.
More: https://ku.bz/DFjPtqVqS
Forwarded from KubeFM
Alex Arnell, Principal Member of Technical Staff at Heroku, explains how OpenTelemetry's "T-shape" methodology distinguishes monitoring from observability in practice.
He breaks down how 80% of observability should cover basic golden signals (standard RED metrics that trigger alerts), while the remaining 20% focuses on business-specific, custom telemetry that enables deeper investigation.
Alex walks through a real incident response workflow: starting with high-level monitoring alerts, then using distributed tracing to pinpoint exact failure points across infrastructure components.
Watch the full interview: https://ku.bz/Lsr8gltrH
This interview is a reaction to Miguel Luna's episode https://ku.bz/WwS04jYvv
Forwarded from Kube Careers
This week's 6 Kubernetes jobs that offer VISA sponsorships are:
Platform Engineer with ClickHouse
💰 $14.1M to $23M a year
👨💻 Remote from the United States of America
→ https://ku.bz/2X8VqBpCy
Software Engineer with Inflection AI
💰 $175K to $350K a year
👨💻 Remote from the United States of America
→ https://ku.bz/ZJBvwh94F
Software Engineer with Teleport
💰 $173.6K to $293K a year
👨💻 Remote from the United States of America
→ https://ku.bz/99fD7rRtt
Software Engineer with Muon Space
💰 $207K to $234K a year
🏠🏃🏻♂️🌎 Silicon Valley, CA, USA
→ https://ku.bz/Qj3YWmb_8
Infrastructure Architect with Anchorage Digital
💰 $145K to $205K a year
👨💻 Remote from the United States of America
→ https://ku.bz/WDnzs-grR
👉 Browse 1072 jobs on Kube Careers https://kube.careers
Forwarded from LearnKube news
This week on Learn Kubernetes Weekly 164:
📊 Queue-Based Autoscaling Without Flapping: Rethinking App Scaling with Kubernetes, KEDA, and RabbitMQ
🔄 Announcing Changed Block Tracking API support
🐳 Why I Ditched Docker for Podman (And You Should Too)
🔐 That Time I Found a Service Account Token in my Log Files
☁️ Deploying a .NET Weather Forecast App to AKS Using GitHub Actions and Argo CD
Read it now: https://kube.today/issues/164
⭐️ This issue is brought to you by LearnKube — master Kubernetes with hands-on training designed for engineers who want to learn the smart way https://ku.bz/hypSbyc-V
This tutorial teaches how to configure stretched Layer 2 networks between KubeVirt clusters using OVN-Kubernetes for VM connectivity across multiple Kubernetes environments.
More: https://ku.bz/_RJh812W-
Kaniko builds container images from Dockerfiles without needing a Docker daemon and supports secure CI/CD workflows via signed images and SBOMs.
More: https://ku.bz/bQy9bGCv9
Forwarded from KubeFM
Zain Malik, Software @ Exostellar, describes the massive multi-tenant Kubernetes infrastructure he previously managed at City Storage Systems.
Their environment ran approximately 30,000 pods across different clusters, with sizes ranging from 400 to 950 nodes (the upper limit imposed by AKS constraints). Zain explains how pod density varied dramatically based on workload requirements—some nodes hosted just 1-2 pods while others were densely packed, allowing them to run up to 10,000 pods on 500 nodes during optimal conditions.
Watch the full episode: https://ku.bz/5PLksqVlk
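To put the density figures in perspective, here is a small Python sketch of the average pods per node implied by the peak case quoted above (10,000 pods on 500 nodes). The `average_pods_per_node` helper is hypothetical and only does the arithmetic; it says nothing about how pods were actually distributed across nodes.

```python
def average_pods_per_node(pods: int, nodes: int) -> float:
    """Return the mean number of pods per node."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return pods / nodes


# Peak case from the post: 10,000 pods on 500 nodes.
print(average_pods_per_node(10_000, 500))  # 20.0
```

An average of 20 pods per node sits alongside nodes running only 1-2 pods, so the densely packed nodes must have carried well above that average.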
This tutorial teaches how to build a SaaS Kubernetes platform using Kamaji for control plane management, Argo CD for GitOps, and Sveltos for multi-cluster automation.
More: https://ku.bz/2xcV89JQd
The kubectl-ip-check plugin checks whether pod IPs are reachable and correctly routed from cluster nodes, helping you quickly detect basic Kubernetes networking and CNI issues before deeper debugging.
More: https://ku.bz/b_JyHlVQz
This article explains how Freshworks built their own production-grade, high-performance Redis cluster on Kubernetes and Envoy, handling over 2.5 million IOPS and 1.5 TB of data with 99.99% uptime.
More: https://ku.bz/8PGp083YQ
Forwarded from Kube Careers
This week's 6 Kubernetes jobs that offer VISA sponsorships are:
Platform Engineer with ClickHouse
💰 $14.1M to $23M a year
👨💻 Remote from the United States of America
→ https://ku.bz/2X8VqBpCy
Software Engineer with Inflection AI
💰 $175K to $350K a year
👨💻 Remote from the United States of America
→ https://ku.bz/ZJBvwh94F
Software Engineer with Teleport
💰 $173.6K to $293K a year
👨💻 Remote from the United States of America
→ https://ku.bz/99fD7rRtt
Software Engineer with Muon Space
💰 $207K to $234K a year
🏠🏃🏻♂️🌎 Silicon Valley, CA, USA
→ https://ku.bz/Qj3YWmb_8
Infrastructure Architect with Anchorage Digital
💰 $145K to $205K a year
👨💻 Remote from the United States of America
→ https://ku.bz/WDnzs-grR
👉 Browse 1179 jobs on Kube Careers https://kube.careers
Forwarded from LearnKube news
This week on Learn Kubernetes Weekly 165:
🔥 GPU-Based Containers as a Service
🚀 Bifrost's Journey from Nginx to Envoy Gateway for Intelligent Rate Limiting
🤖 Building Production-Ready Multi-Agent Systems on Kubernetes: Deploying 11 Specialized AI Agents
🔒 Kubernetes Security Fundamentals: Networking
🐛 Debugging the One-in-a-Million Failure: Migrating Pinterest's Search to Kubernetes
Read it now: https://kube.today/issues/165
⭐️ This issue is brought to you by LearnKube — master Kubernetes with hands-on training designed for engineers who want to learn the smart way https://ku.bz/hypSbyc-V
Kubernetes NMState provides declarative host networking configuration for Kubernetes nodes using NMState to manage interfaces, bonds, VLANs, and routes through custom resources.
More: https://ku.bz/d5hxkBdlh
Forwarded from LearnKube news
Kide is an observability platform that ingests and indexes logs and metrics in real time so you can search, analyze, and alert on cluster and application data without waiting.
More: https://ku.bz/zpsL842PB
The Kubernetes Autoscaling Mixin plugin provides a set of reusable Grafana dashboards and Prometheus rules to help you observe and improve Kubernetes autoscaling performance in your clusters.
More: https://ku.bz/rxvH9xRhn
This tutorial teaches how to set up distributed tracing with Grafana Tempo on AKS using Azure Blob Storage and Private Link for secure, cost-effective observability with workload identity and automated Private Link Service provisioning.
More: https://ku.bz/80GykkQ3W