GPU Virtualization in K8s: Challenges and State of the Art

Kubernetes schedules GPU workloads by assigning a whole device to a single job exclusively. This one-to-one relationship leads to massive GPU underutilization, especially for interactive jobs, characterized by significant idle periods and infrequent bursts of heavy GPU usage. Current solutions enable GPU sharing by statically assigning a fixed slice of GPU memory to each co-located job. These solutions are not suitable for interactive scenarios since the number of co-located jobs is limited by the size of physical GPU memory. Consequently, users must know the GPU memory demand of their jobs before submitting them for execution, which is impractical.

https://www.arrikto.com/blog/gpu-virtualization-in-k8s-challenges-and-state-of-the-art

DevOps drawer

Kubernetes Events — News feed of your cluster

Understand Kubernetes Events and learn to use kubectl events to monitor and troubleshoot your cluster’s issues effectively.

https://decisivedevops.com/kubernetes-events-news-feed-of-your-kubernetes-cluster-826e08892d7a

Users, Groups, Roles and API Access in Kubernetes

The nuances of how users and groups are configured in Kubernetes and how the role-based access control (RBAC) mechanism applies to them.

https://blog.adityasamant.dev/users-groups-roles-and-api-access-in-kubernetes

Argo Events — Event Bus and Webhook

Argo Events is a Kubernetes-based event automation engine. It is part of the Argo project. Argo Events can be used with or independently of the other Argo projects.

I will be writing a series of articles on Argo Events; in these articles I will look at how we can use Argo Events to automate processes inside and outside a Kubernetes cluster.

In this first article of the series, we will examine Argo Events core concepts, installation, and provisioning of the different event buses that Argo Events uses to forward events to their sink. Finally, we will set up a webhook event flow to verify our setup.

https://medium.chuklee.com/argo-events-event-bus-and-webhook-ac34e5714209

ConfigMap Conundrum: Subtleties of Dynamic Updates in Kubernetes Configurations

Know the differences between ConfigMaps mounted as Volumes and ConfigMaps defined as environment variables.

https://blog.adityasamant.dev/configmap-conundrum-subtleties-of-dynamic-updates-in-kubernetes-configurations

Practical Guide to Kubernetes API

https://blog.kubesimplify.com/practical-guide-to-kubernetes-api

Useful git commands for SRE and DevOps engineers

https://reliabilityengineering.substack.com/p/useful-git-commands-for-sre-and-devops

A write-ahead log is not a universal part of durability

A database does not need a write-ahead log (WAL) to achieve durability. A database can write its long-term data structure durably to disk before returning to a client. Granted, this is a bad idea! And granted, a WAL is critical for durability by design in most databases. But I think it's helpful to understand WALs by understanding what you could do without them.

https://notes.eatonphil.com/2024-07-01-a-write-ahead-log-is-not-a-universal-part-of-durability.html
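
A minimal sketch of the WAL-free approach described above, assuming the "long-term data structure" is a single JSON file. The `DurableKV` class and its file layout are hypothetical and not taken from the article. Each `set` rewrites the whole file, flushes it to disk, and only then returns to the caller, which is durable but slow.

```python
import json
import os
import tempfile


class DurableKV:
    """Toy key-value store that is durable without a write-ahead log.

    Hypothetical illustration: every write rewrites the full data file
    and flushes it to disk before returning to the caller.
    """

    def __init__(self, path):
        self.path = path
        self.data = {}
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as f:
                self.data = json.load(f)

    def set(self, key, value):
        new_data = dict(self.data)
        new_data[key] = value
        self._persist(new_data)
        # Update in-memory state only after the disk write succeeded.
        self.data = new_data

    def get(self, key, default=None):
        return self.data.get(key, default)

    def _persist(self, data):
        directory = os.path.dirname(os.path.abspath(self.path))
        # Write to a temp file in the same directory so the rename is atomic.
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as f:
                json.dump(data, f)
                f.flush()
                os.fsync(f.fileno())
            os.replace(tmp_path, self.path)
        except BaseException:
            if os.path.exists(tmp_path):
                os.unlink(tmp_path)
            raise
        # fsync the directory so the rename itself survives a crash
        # (skipped on platforms without O_DIRECTORY).
        if hasattr(os, "O_DIRECTORY"):
            dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
            try:
                os.fsync(dir_fd)
            finally:
                os.close(dir_fd)
```

The cost is visible in `_persist`: every write pays for a full rewrite and an fsync, which is the overhead a WAL is usually introduced to avoid.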

Enhancing Netflix Reliability with Service-Level Prioritized Load Shedding

Applying Quality of Service techniques at the application level

https://netflixtechblog.com/enhancing-netflix-reliability-with-service-level-prioritized-load-shedding-e735e6ce8f7d
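
A rough sketch of the general idea of prioritized load shedding, not Netflix's implementation. The priority classes, the thresholds, and the `should_shed` and `admit` helpers are all assumptions made for illustration: as utilization rises, the least important requests are rejected first.

```python
from enum import IntEnum


class Priority(IntEnum):
    """Hypothetical request classes; lower value means more important."""
    CRITICAL = 0
    NORMAL = 1
    BACKGROUND = 2


# Assumed thresholds: the utilization (0.0 to 1.0) at which each class is shed.
SHED_AT = {
    Priority.BACKGROUND: 0.70,
    Priority.NORMAL: 0.85,
    Priority.CRITICAL: 1.00,
}


def should_shed(priority: Priority, utilization: float) -> bool:
    """Return True if a request of this priority should be rejected now."""
    return utilization >= SHED_AT[priority]


def admit(requests, utilization):
    """Split (name, priority) pairs into admitted and shed lists."""
    admitted, shed = [], []
    for name, priority in requests:
        (shed if should_shed(priority, utilization) else admitted).append(name)
    return admitted, shed
```

For example, at 75% utilization `admit` would reject only `BACKGROUND` requests, while `NORMAL` and `CRITICAL` traffic is still served.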

Is Kubernetes rolling update truly zero downtime?

https://medium.com/@chawlajanit/is-kubernetes-rolling-update-truly-zero-downtime-a83103af65a5

Percentile

What is it? Why is it used? And why is it important in the context of optimization and reliability engineering? Bonus: a browser app that lets you play with data.

https://blog.alexewerlof.com/p/percentile
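
To make the idea concrete, here is a small sketch that computes percentiles with the nearest-rank method. This is one of several common definitions and may differ from the one used in the article; the `percentile` function and the sample latencies are made up for illustration.

```python
import math


def percentile(values, p):
    """Return the p-th percentile of values using the nearest-rank method."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(values)
    # Nearest rank: the smallest rank that covers at least p% of the samples.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical request latencies in milliseconds, with two slow outliers.
latencies_ms = [12, 15, 11, 14, 13, 250, 16, 12, 13, 900]

p50 = percentile(latencies_ms, 50)
p90 = percentile(latencies_ms, 90)
p99 = percentile(latencies_ms, 99)
mean = sum(latencies_ms) / len(latencies_ms)
```

With this sample the median is 13 ms while the mean is 125.6 ms, which shows why percentiles describe the typical and the worst-case experience better than an average does.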

Terraform at LumApps

We have:

- 15 terragrunt.hcl files for every service.
- Around 900 terragrunt.hcl files (15*60) in total.

Part 1: https://medium.com/lumapps-engineering/terraform-at-lumapps-part-1-f37660b4ed95

Part 2: https://medium.com/lumapps-engineering/terraform-at-lumapps-part-2-27494897def4

Part 3: https://medium.com/lumapps-engineering/terraform-at-lumapps-part-3-daa3c869f0f4

Kubernetes instance calculator

https://learnk8s.io/kubernetes-instance-calculator

Updatecli is a command-line tool used to define and apply update strategies. It reads a manifest, then works in three stages:
1. Source, which describes where a piece of information is retrieved from.
2. Target, which describes what to update using the information defined by the source.
3. Condition, which defines a condition that must be satisfied before the target is updated.

Deciding how, when, where to update information is hard.

There are many tools that can apply continuous delivery or continuous deployment. We configure our infrastructure with Ansible playbooks, Puppet manifests, Helm charts, and others. We rely on configuration files to specify the version we need to install. Unfortunately, those files are too often updated manually, because it is hard to detect automatically what information to update, and when.

The logic that manipulates information from a configuration file is defined outside that configuration file. Information comes from different sources such as Maven, Docker, files, Git repositories, and elsewhere. Before modifying information, we should validate our assumptions.

Updatecli allows combining blocks, aka plugins, to specify what information needs to be updated, when, and where. We can easily implement the workflow that suits our needs.
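
The source, condition, and target stages can be pictured with a small model. This is a hypothetical Python illustration of the concept only: it is not Updatecli's manifest format or API, and the `Pipeline` class, the helper functions, and the version numbers are all invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Pipeline:
    """Hypothetical model of the three stages; not Updatecli's real API."""
    source: Callable[[], str]                      # where the value comes from
    condition: Callable[[str], bool]               # must hold before updating
    target: Callable[[Dict[str, str], str], bool]  # what to update; True if changed

    def run(self, config: Dict[str, str]) -> bool:
        value = self.source()
        if not self.condition(value):
            return False
        return self.target(config, value)


def latest_release() -> str:
    # Stand-in for a real lookup (registry, Git tag, Maven repository, ...).
    return "1.4.2"


def image_is_published(version: str) -> bool:
    # Stand-in for checking that the matching artifact really exists.
    published = {"1.4.1", "1.4.2"}
    return version in published


def set_app_version(config: Dict[str, str], version: str) -> bool:
    if config.get("appVersion") == version:
        return False  # already up to date, nothing to change
    config["appVersion"] = version
    return True


config = {"appVersion": "1.4.1"}
pipeline = Pipeline(latest_release, image_is_published, set_app_version)
changed = pipeline.run(config)
```

Keeping the three stages as separate, swappable pieces is what lets the same workflow be reused across different kinds of configuration files.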

https://www.updatecli.io

Updatecli

Updatecli is a tool used to apply file update strategies. Designed to be used from everywhere, each application "run" detects whether a value needs to be updated using a custom strategy, then applies changes according to that strategy.

MongoDB-Powered Autoscaling: Harnessing KEDA to Scale Applications Dynamically Based on Database Events Triggered by MongoDB Query Results

https://medium.com/@mohammadsaquib.ee/mongodb-powered-autoscaling-harnessing-keda-to-scale-applications-dynamically-based-on-database-f38a68e71db6

#observability #keda #k8s #kubernetes

Open source distributed Platform as a Service (PaaS). A self-hosted Vercel / Netlify / Cloudflare alternative

https://github.com/taubyte/tau

#paas #vercel #netlify #cloudflare

What's the Problem with OpenTelemetry?

https://www.hyperdx.io/blog/whats-the-problem-with-opentelemetry

#opentelemetry #monitoring #observability

Reliability Engineering Mindset

https://blog.alexewerlof.com/p/rem?r=10ywg9

#sre #devops #sysadmin

Atuin replaces your existing shell history with a SQLite database, and records additional context for your commands. With this context, Atuin gives you faster and better search of your shell history.

Additionally, Atuin (optionally) syncs your shell history between all of your machines. Fully end-to-end encrypted, of course.

https://atuin.sh/
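
A small sketch of what shell history in SQLite with extra context can look like. The schema and the `open_history`, `record`, and `search` helpers are assumptions for illustration; they are not Atuin's actual schema or code.

```python
import sqlite3
import time


def open_history(path=":memory:"):
    """Open (or create) a history database with a hypothetical schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS history (
            id        INTEGER PRIMARY KEY,
            command   TEXT    NOT NULL,
            cwd       TEXT    NOT NULL,
            exit_code INTEGER NOT NULL,
            started   REAL    NOT NULL
        )
        """
    )
    return conn


def record(conn, command, cwd, exit_code, started=None):
    """Store one command together with its context."""
    conn.execute(
        "INSERT INTO history (command, cwd, exit_code, started) VALUES (?, ?, ?, ?)",
        (command, cwd, exit_code, time.time() if started is None else started),
    )
    conn.commit()


def search(conn, text, cwd=None, only_successful=False):
    """Return matching commands, newest first, optionally filtered by context."""
    query = "SELECT command FROM history WHERE instr(command, ?) > 0"
    params = [text]
    if cwd is not None:
        query += " AND cwd = ?"
        params.append(cwd)
    if only_successful:
        query += " AND exit_code = 0"
    query += " ORDER BY started DESC, id DESC"
    return [row[0] for row in conn.execute(query, params)]
```

Storing the working directory and exit code next to each command is what makes filters such as "only commands that succeeded in this directory" possible, which a flat history file cannot offer.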

Since the 1.7 release, the OpenTofu community and core team have been hard at work on much-requested features, making .tf code easier to write, reducing unnecessary boilerplate, improving performance, and more.

https://opentofu.org/blog/opentofu-1-8-0/

opentofu.org

OpenTofu 1.8.0 is out with Early Evaluation, Provider Mocking, and a Coder-Friendly Future | OpenTofu

OpenTofu 1.8.0 is now available with early variable/locals evaluation, provider mocking for tests, and a future that makes every-day Tofu code a lot simpler.

RFC1178 Choosing a Name for Your Computer

https://www.rfc-editor.org/rfc/rfc1178.html
