GPU Virtualization in K8s: Challenges and State of the Art
https://www.arrikto.com/blog/gpu-virtualization-in-k8s-challenges-and-state-of-the-art
Kubernetes schedules GPU workloads by assigning a whole device to a single job exclusively. This one-to-one relationship leads to massive GPU underutilization, especially for interactive jobs, characterized by significant idle periods and infrequent bursts of heavy GPU usage. Current solutions enable GPU sharing by statically assigning a fixed slice of GPU memory to each co-located job. These solutions are not suitable for interactive scenarios since the number of co-located jobs is limited by the size of physical GPU memory. Consequently, users must know the GPU memory demand of their jobs before submitting them for execution, which is impractical.
Kubernetes Events — News feed of your cluster
https://decisivedevops.com/kubernetes-events-news-feed-of-your-kubernetes-cluster-826e08892d7a
Understand Kubernetes Events and learn to use kubectl events to monitor and troubleshoot your cluster’s issues effectively.
Users, Groups, Roles and API Access in Kubernetes
https://blog.adityasamant.dev/users-groups-roles-and-api-access-in-kubernetes
The nuances of how users and groups are configured in Kubernetes and how the role-based access control (RBAC) mechanism applies for them.
Argo Events — Event Bus and Webhook
https://medium.chuklee.com/argo-events-event-bus-and-webhook-ac34e5714209
Argo Events is a Kubernetes-based event automation engine. It is part of the Argo project and can be used with or independently of the other Argo projects.
I will be writing a series of articles on Argo Events, looking at how we can use it to automate processes inside and outside a Kubernetes cluster.
In this first article of the series, we will examine Argo Events' core concepts, its installation, and the provisioning of the different event buses that Argo Events uses to forward events to their sinks. Finally, we will set up a webhook event flow to verify our setup.
ConfigMap Conundrum: Subtleties of Dynamic Updates in Kubernetes Configurations
https://blog.adityasamant.dev/configmap-conundrum-subtleties-of-dynamic-updates-in-kubernetes-configurations
Know the differences between ConfigMaps mounted as Volumes and ConfigMaps defined as environment variables.
Useful git commands for SRE and DevOps engineers
https://reliabilityengineering.substack.com/p/useful-git-commands-for-sre-and-devops
A write-ahead log is not a universal part of durability
https://notes.eatonphil.com/2024-07-01-a-write-ahead-log-is-not-a-universal-part-of-durability.html
A database does not need a write-ahead log (WAL) to achieve durability. A database can write its long-term data structure durably to disk before returning to a client. Granted, this is a bad idea! And granted, a WAL is critical for durability by design in most databases. But I think it's helpful to understand WALs by understanding what you could do without them.
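As a minimal sketch of that idea, assuming a toy store whose entire "long-term data structure" is a single JSON file, a write can be made durable without a WAL by rewriting the file and syncing it before returning. The function name `durable_put` and the whole-file rewrite are illustrative assumptions, not the article's code, and the directory sync assumes a POSIX system.

```python
import json
import os
import tempfile


def durable_put(path: str, data: dict) -> None:
    """Rewrite the whole data file and return only once it is on disk."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write the new version to a temporary file in the same directory,
    # so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(data, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # force the file contents to stable storage
        os.replace(tmp_path, path)  # atomically swap in the new version
    except BaseException:
        # Clean up the temporary file if anything failed before the swap.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Sync the directory so the rename itself survives a crash (POSIX only).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Every call rewrites and syncs the full file, which hints at why this approach is described as a bad idea: the cost of each write grows with the size of the data rather than with the size of the change.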
Enhancing Netflix Reliability with Service-Level Prioritized Load Shedding
https://netflixtechblog.com/enhancing-netflix-reliability-with-service-level-prioritized-load-shedding-e735e6ce8f7d
Applying Quality of Service techniques at the application level
Is Kubernetes rolling update truly zero downtime?
https://medium.com/@chawlajanit/is-kubernetes-rolling-update-truly-zero-downtime-a83103af65a5
Percentile
https://blog.alexewerlof.com/p/percentile
What is it? Why is it used? And why is it important in the context of optimization and reliability engineering? Bonus: a browser app that lets you play with data.
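To make the idea concrete, here is a small sketch of one common definition, the nearest-rank percentile. The function and the latency numbers are made up for illustration and are not taken from the article, which may use a different definition (interpolating variants also exist).

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: the smallest value with at least p% of the data at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank in the sorted data
    return ordered[rank - 1]


# Hypothetical request latencies in milliseconds, with one slow outlier.
latencies_ms = [120, 80, 95, 300, 110, 105, 90, 100, 2500, 115]

p50 = percentile(latencies_ms, 50)  # 105
p90 = percentile(latencies_ms, 90)  # 300
p99 = percentile(latencies_ms, 99)  # 2500
```

The median stays close to the typical request, while the higher percentiles expose the slow tail that an average would blur.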
Terraform at LumApps
Part 1: https://medium.com/lumapps-engineering/terraform-at-lumapps-part-1-f37660b4ed95
Part 2: https://medium.com/lumapps-engineering/terraform-at-lumapps-part-2-27494897def4
Part 3: https://medium.com/lumapps-engineering/terraform-at-lumapps-part-3-daa3c869f0f4
We have:
- 15 terragrunt.hcl files for every service.
- Around 900 terragrunt.hcl files (15*60) in total.
Updatecli is a command-line tool used to define and apply update strategies. It reads a manifest, then works in three stages:
1. Source, which describes where the information to be used is retrieved from.
2. Target, which describes what to update using the information defined by the source.
3. Condition, which defines a condition that must be satisfied before the target is updated.
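The three stages above can be sketched as a tiny pipeline. This is a hypothetical Python model for illustration only: it is not Updatecli's manifest format or API, and the class, function names, and version numbers are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class UpdatePipeline:
    """Hypothetical model of the three stages; not Updatecli's real API."""
    source: Callable[[], str]         # retrieves the information to use
    condition: Callable[[str], bool]  # must hold before the target is touched
    target: Callable[[str], bool]     # applies the value; True if something changed

    def run(self) -> bool:
        value = self.source()
        if not self.condition(value):
            return False  # assumption not validated: leave the target alone
        return self.target(value)


# Illustrative data: every name and version below is made up.
config = {"image_tag": "1.0.0"}
published_tags = {"1.0.0", "1.1.0"}


def latest_release() -> str:
    return "1.1.0"


def tag_is_published(tag: str) -> bool:
    return tag in published_tags


def set_image_tag(tag: str) -> bool:
    if config["image_tag"] == tag:
        return False  # already up to date, nothing to change
    config["image_tag"] = tag
    return True


pipeline = UpdatePipeline(latest_release, tag_is_published, set_image_tag)
changed = pipeline.run()
```

Running the pipeline a second time reports no change, because the target already holds the value from the source.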
Deciding how, when, and where to update information is hard.
There are many tools that can apply continuous delivery or continuous deployment. We configure our infrastructure with Ansible playbooks, Puppet manifests, Helm charts, and others. We rely on configuration files to specify the version we need to install. Unfortunately, too often those files are updated manually, because it's hard to automatically detect what information to update, and when.
The logic that manipulates information from a configuration file is defined outside that configuration file. Information comes from different sources such as Maven, Docker, files, Git repositories, and elsewhere. Before modifying information, we should validate our assumptions.
Updatecli allows combining blocks, aka plugins, to specify what information needs to be updated, when, and where. We can easily implement the workflow that suits our needs.
https://www.updatecli.io
Updatecli
Updatecli is a tool used to apply file update strategies. Designed to be used from everywhere, each application "run" detects whether a value needs to be updated using a custom strategy, then applies changes according to that strategy.
MongoDB-Powered Autoscaling: Harnessing KEDA to Scale Applications Dynamically Based on Database Events Triggered by MongoDB Query Results
https://medium.com/@mohammadsaquib.ee/mongodb-powered-autoscaling-harnessing-keda-to-scale-applications-dynamically-based-on-database-f38a68e71db6
#observability #keda #k8s #kubernetes
Open source distributed Platform as a Service (PaaS). A self-hosted Vercel / Netlify / Cloudflare alternative
https://github.com/taubyte/tau
#paas #vercel #netlify #cloudflare
What's the Problem with OpenTelemetry?
https://www.hyperdx.io/blog/whats-the-problem-with-opentelemetry
#opentelemetry #monitoring #observability
Atuin replaces your existing shell history with a SQLite database, and records additional context for your commands. With this context, Atuin gives you faster and better search of your shell history.
Additionally, Atuin (optionally) syncs your shell history between all of your machines. Fully end-to-end encrypted, of course.
https://atuin.sh/
Since the 1.7 release, the OpenTofu community and core team have been hard at work on much-requested features, making .tf code easier to write, reducing unnecessary boilerplate, improving performance, and more.
https://opentofu.org/blog/opentofu-1-8-0/
OpenTofu 1.8.0 is out with Early Evaluation, Provider Mocking, and a Coder-Friendly Future | OpenTofu
OpenTofu 1.8.0 is now available with early variable/locals evaluation, provider mocking for tests, and a future that makes everyday Tofu code a lot simpler.