Released a tool I built and personally use a lot - Is it THAT risky??
Hi, I just released a tool I built in Go, which is an AI agent that can run system commands using the latest GPT-5.2. It helps me with automations and fast actions.
Honestly, it works great, and I use it a lot. Got initial feedback that it's unwise and that it shouldn't be used IN ANY CASE.
Is it that bad?
It's super convenient, and I want to start using it in remote environments.
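One common way to reduce the risk raised in that feedback is to gate each model-proposed command before it runs. The following is a minimal sketch, written in Python rather than the tool's Go, and it is not OsDevil's actual behavior: the allowlist contents, the metacharacter set, and the `classify_command` name are all assumptions made for illustration.

```python
import shlex

# Hypothetical allowlist: program names the agent may run without asking.
SAFE_PROGRAMS = {"ls", "cat", "df", "uptime", "whoami"}

# Shell metacharacters that allow chaining, substitution, or redirection.
SHELL_METACHARS = set(";|&><`$\n")


def classify_command(command: str) -> str:
    """Return 'allow', 'confirm', or 'deny' for a model-proposed command."""
    # Refuse anything that could chain or redirect commands.
    if any(ch in SHELL_METACHARS for ch in command):
        return "deny"
    try:
        argv = shlex.split(command)
    except ValueError:
        return "deny"  # e.g. unbalanced quotes
    if not argv:
        return "deny"
    if argv[0] in SAFE_PROGRAMS:
        return "allow"
    # Everything else needs an explicit human yes/no.
    return "confirm"
```

Anything classified as `confirm` would be shown to a human before execution. An allowlist keyed on program name is coarse (`cat` can still read sensitive files), so this narrows the risk rather than removing it, which matters more once the agent runs in remote environments.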
https://github.com/matank001/OsDevil
https://redd.it/1pljy5c
@r_devops
OsDevil is a lightweight AI agent that converts natural language into operating system commands, enabling automation across development, operations, and DevSecOps workflows - matank001/OsDevil
DevOps Engineer trying to stay afloat after a layoff and a few bad decisions.
Hi everyone,
I’m posting here because I need to say this somewhere, and I don’t feel comfortable dumping it all on the people in my life.
I’m a DevOps / infrastructure engineer in Canada with several years of experience. I’ve worked across cloud, CI/CD, containers, automation, and I hold multiple certifications (AWS, Docker, Terraform, Kubernetes-related). On paper, I should be “fine.” That’s part of what makes this harder.
Earlier this year I was laid off, and it really broke something in me. Since then, my confidence hasn’t fully come back. I second-guess myself constantly, panic in interviews, and replay mistakes in my head over and over. I’ve fumbled questions I know I know. My brain just locks up under pressure.
Recently, in a state of anxiety, I left a job too quickly — a decision I regret. I’m about to start at a new org that, based on people already working there, is extremely micromanaging and heavy on interference. Even before day one, it’s triggering a lot of dread. I already feel like I’m bracing myself just to survive instead of grow.
I still have savings and insurance, so I’m not financially desperate, but mentally I feel exhausted all the time. There’s a constant low-grade tension in my body, like my nervous system is always switched on. I overthink every decision, beat myself up for past ones, and feel like I’m slowly shrinking as a person.
Sometimes my thoughts drift into very bleak, philosophical territory about life, purpose, and suffering, not because I want to harm myself (I don’t), but because I feel worn down by the constant effort of “keeping it together.” I want to be clear: I am safe. This is burnout, anxiety, and mental fatigue, not a crisis.
I’m trying to cope by:
- Focusing on small wins (certs, small goals, structure)
- Taking things one day at a time
- Continuing to apply for other roles quietly
- Reminding myself that jobs can be temporary, even if they’re bad
I guess I’m looking to hear from people who’ve been through something similar:
- Has anyone else had anxiety completely hijack their decision-making?
- How did you rebuild confidence after layoffs or professional burnout?
- How do you survive a micromanaging environment without it destroying your mental health?
If you made it this far, thank you for reading. Writing this already helps me feel a little less alone.
https://redd.it/1plmm5f
@r_devops
A short whinge about the current state of the sub and lack of moderation
Hi,
As many readers are aware, this subreddit is a dump.
It is filled with posts that the majority of users do not want as evidenced by the downvotes the majority of posts receive.
Reporting the absolute garbage posted unfortunately doesn't result in a removal either.
A quick scan of posts finds:
- AI blogspam
- Vendor blogspam
- "I created X to solve Y (imaginary problem)"
- Product market research
- Covert marketing
- Problems that would be solved with less effort by using Google rather than making a Reddit post
Can the mods open up applications to people who actually want to moderate the sub and consult with the community on evolving the current ruleset?
https://redd.it/1plo7f0
@r_devops
Building a QEMU/KVM based virtual home lab with automated Linux VM provisioning and resource management with local domain control
I have been building and using an automation toolkit for running a complete virtual home lab on KVM/QEMU. I understand there are a lot of open-source alternatives available, but this was built for fun and for managing a custom lab setup.
The automated setup deploys a central lab infrastructure server VM that runs all essential services for the lab: DNS (BIND), DHCP (KEA), iPXE, NFS, and an NGINX web server for OS provisioning. You manage everything from your host machine using custom-built CLI tools, and the lab infra server handles all the backend services for your local domain (like .lab.local).
You can deploy VMs two ways: network boot using iPXE/PXE for traditional provisioning, or clone golden images for instant deployment. Build a base image once, then spin up multiple copies in seconds. The CLI tools let you manage the complete lifecycle—deploy, reimage, resize resources, hot-add or remove disks and network interfaces, access serial consoles, and monitor health. Your local DNS infrastructure is handled dynamically as you create or destroy VMs, and you can manage DNS records with a centralized tool.
Supports AlmaLinux, Rocky Linux, Oracle Linux, CentOS Stream, RHEL, Ubuntu LTS, and openSUSE Leap using Kickstart, Cloud-init, and AutoYaST for automated provisioning.
The whole point is to make it a playground to build, break, and rebuild without fear. Perfect for spinning up Kubernetes clusters, testing multi-node setups, or experimenting with any Linux-based infrastructure. Everything is written in bash with no complex dependencies. Ansible is utilized for lab infrastructure server provisioning.
GitHub: https://github.com/Muthukumar-Subramaniam/server-hub
Been using this in my homelab and made it public so anyone with similar interests or requirements can use it. Please have a look and share your ideas and advice if any.
https://redd.it/1plns7z
@r_devops
Automate KVM image creation for testing purposes
I'm trying to clean up the testing workflow for a project I'm working on, a database built on top of io_uring and NVMe.
Right now I'm using KVM and its NVMe device emulator to power the dev environment, but the developer experience is poor: I have a script to recreate the KVM image, but it requires some manual steps, and I don't want to commit the KVM image itself for obvious reasons.
My questions are:
- Is there an alternative to Dockerfiles for KVM images?
- If not, what are my best options for my use case?
- What other options do I have to emulate NVMe devices?
Things I tried:
- Running an nvmevirt device emulator, but it's not suitable for my test environment because it requires loading a kernel module
- Mocking an NVMe device with some code and a memory-backed file, but it's not real testing
https://redd.it/1pln6hj
@r_devops
Exposing Services on a KIND Cluster on Contabo VPS, MetalLB vs cloud-provider-kind?
I'm setting up a test Kubernetes environment on a Contabo VPS, using KIND to spin up the cluster.
I’m figuring out the least hacky way to expose services externally.
So far, I see two main options:
1. MetalLB
2. cloud-provider-kind
My goal isn’t production traffic, but I do want something that:
- Behaves close to real Kubernetes networking
- Doesn’t rely on NodePort hacks
- Is reasonable for CI/testing
For those who’ve run KIND on VPS providers like Contabo/Hetzner:
- Which approach did you settle on?
- Any gotchas with MetalLB on a single-node KIND cluster?
https://redd.it/1plv49u
@r_devops
GitHub - eznix86/kseal: CLI tool to view, export, and encrypt Kubernetes SealedSecrets.
I’ve been using *kubeseal* (the Bitnami sealed-secrets CLI) on my clusters for a while now, and all my secrets stay sealed with Bitnami SealedSecrets so I can safely commit them to Git.
At first I had a bunch of *bash* one-liners and little helpers to export secrets, view them, or re-encrypt them in place. That worked… until it didn’t. Every time I wanted to peek inside a secret or grab all the sealed secrets out into plaintext for debugging, I’d end up reinventing the wheel. So naturally I thought:
>“Why not wrap this up in a proper script?”
A few hours later I ended up with **kseal**, a tiny Python CLI that sits on top of kubeseal and gives me a few things that made my life easier:
* `kseal cat`: print a decrypted secret right in the terminal
* `kseal export`: dump secrets to files (local or from cluster)
* `kseal encrypt`: seal plaintext secrets using `kubeseal`
* `kseal init`: generate a config so you don’t have to rerun the same flags forever
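To make the "sits on top of kubeseal" idea concrete, here is a minimal sketch of how such a wrapper might shell out to `kubeseal` for the encrypt step. It is not kseal's actual code: the function names `build_kubeseal_argv` and `seal` are hypothetical, and it assumes `kubeseal` is on the `PATH` and reads a Secret manifest from stdin.

```python
import subprocess
from typing import List, Optional


def build_kubeseal_argv(controller_name: Optional[str] = None,
                        controller_namespace: Optional[str] = None) -> List[str]:
    """Assemble the kubeseal command line (hypothetical helper)."""
    argv = ["kubeseal", "--format", "yaml"]
    # Only pass controller flags when the caller overrides the defaults.
    if controller_name:
        argv += ["--controller-name", controller_name]
    if controller_namespace:
        argv += ["--controller-namespace", controller_namespace]
    return argv


def seal(secret_yaml: str, **kwargs: Optional[str]) -> str:
    """Pipe a plaintext Secret manifest through kubeseal; return the sealed YAML."""
    result = subprocess.run(
        build_kubeseal_argv(**kwargs),
        input=secret_yaml,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if kubeseal fails
    )
    return result.stdout
```

Keeping the argument assembly separate from the subprocess call is what lets a config file replace the flags you would otherwise retype.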
You can install it with pip/pipx and run it wherever you already have access to your cluster. It’s basically just automating the stuff I was doing manually and providing a consistent interface instead of a pile of ad-hoc scripts. ([GitHub](https://github.com/eznix86/kseal/))
It is just something that *helped me* and maybe helps someone else who’s tired of:
* remembering kubeseal flags
* juggling secrets in different dirs
* reinventing small helper scripts every few weeks
Check it out if you’re in the same boat: [https://github.com/eznix86/kseal/](https://github.com/eznix86/kseal/)
https://redd.it/1plw3n7
@r_devops
Looking for Slack App Feedback - Slack --> Github/Linear Issues
As a systems engineer (clearly used to writing too many user stories), I tend to have many ideas that get lost in chat or that I need to copy-paste over to GitHub. I was playing around in Discord and got a pretty handy tool (for me at least) going, where I react to URLs or messages and port those over into GitHub. I refer to the process as Capture Clean Create.
**What it does:**
- React with an emoji to any message with a URL → creates a GitHub issue or Linear ticket
- Use `/idea capture` to summarize the last N messages into a structured issue
- AI extracts title, summary, category, and key points automatically
Just looking for some feedback on whether this is a useful tool for you; it's mostly for developers/PMs. Outside of Slack/GitHub it currently supports Linear and Discord. Jira and Teams are next up.
https://slack.com/oauth/v2/authorize?client_id=9193114002786.10095883648134&scope=channels:history,channels:read,chat:write,reactions:read,users:read,team:read,commands&user_scope=
https://redd.it/1pltrez
@r_devops
Multi region AI deployment and every country has different data residency laws, compliance is impossible.
We are expanding our AI product to Europe and Asia and thought we had compliance figured out, but Germany requires data to be processed in Germany, France has different rules, Singapore has different ones again, and Japan is even stricter. We tried regional deployments, but then we had data sync problems and model consistency issues. We tried to centralize, but that violates residency laws.
The legal team sent us a spreadsheet with 47 rows of different rules per country, and some contradict each other. How are companies with global AI products handling this? It feels like we need a different deployment per country, which is impossible to maintain.
https://redd.it/1plyiz1
@r_devops
Terraform still? - I live under a rock
Apparently, I live under a rock and missed that Terraform/IBM caused quite a bit of drama this year.
I'm a DE who is building his own server, which I'll be using for fun and for some learning toward a little job security. My employer does not have an IaC solution right now, or I would just choose whatever they were going with, so I am kind of at a loss on what tool I should be using. I'll be using Proxmox, with a mix of LXCs and VMs to deploy Ubuntu Server and SQL Server instances, as well as some Azure resources.
Originally I planned on using Terraform, but from everything I've been reading it sounds like Terraform is losing market share to OpenTofu and Pulumi. With my focus being on learning and job security as a data engineer, is there an obvious choice of IaC solution for me?
Go easy, I fully admit I'm a rookie here.
https://redd.it/1pm49co
@r_devops
I need help figuring out what this is called and where to start.
My manager just let me know that I will be taking over the Terraform repo for Azure AI/ML because one of my teammates left and the one who trained under him did not pick up anything.
The AI/ML project will be resuming next month with the dev side starting to train their own models. My manager told me to self study to prep myself for it.
Right now the Terraform repo is used to deploy models and build the endpoints, but that is it, at least from what I can see. I was able to deploy a test instance and learn how to deploy them in different regions, etc. However, my manager said that, as of right now, I will also be responsible for building out the infra for devs to train their own ML models and for making sure we have high availability. I may be doing more, but we are not sure yet. The dev that I talked to said the same thing.
Is this considered platform ops? MLops? AI engineer? Would the Azure AI Engineer cert be the thing for me?
Does anyone do something similar and can give me some recommendations on learning resources? Or can you give me an idea of what other things you do related to this (build-out, IaC, pipelines, etc.)? I can try to ask my company for Pluralsight access if there is anything good there. I already have KodeKloud but haven't been through the material since I've been busy; is there anything there that you would recommend?
I'm super excited but also overwhelmed since this is new to me and the company.
https://redd.it/1pm5r94
@r_devops
Sensitive Data in Error Messages: When Your Stack Traces Give Away the Database Schema 📋
https://instatunnel.my/blog/sensitive-data-in-error-messages-when-your-stack-traces-give-away-the-database-schema
https://redd.it/1pm4bs0
@r_devops
InstaTunnel
Sensitive Data in Error Messages: How Stack Traces Expose
Discover how verbose error messages and stack traces leak database schemas, file paths, and secrets. Learn why production apps must hide detailed errors in 2025
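As a rough illustration of the "hide detailed errors in production" idea from the summary above, the sketch below keeps the full exception in server-side logs and returns only a generic message plus a correlation ID to the client. It is not taken from the linked article: the `public_error` name, the `debug` flag, and the response shape are assumptions.

```python
import logging
import uuid
from typing import Dict

logger = logging.getLogger("app.errors")


def public_error(exc: Exception, debug: bool = False) -> Dict[str, str]:
    """Build the client-facing body for an unhandled exception."""
    # The ID lets support staff find the matching server-side log entry.
    error_id = uuid.uuid4().hex
    # Full detail (message and traceback) goes to the logs only.
    logger.error("unhandled error %s", error_id, exc_info=exc)
    if debug:
        # Development only: echo the exception type and message.
        return {"error_id": error_id, "message": f"{type(exc).__name__}: {exc}"}
    # Production: nothing about schemas, paths, or queries leaves the server.
    return {"error_id": error_id, "message": "Internal server error"}
```

With `debug` left off, a database error such as `relation "users" does not exist` never reaches the response body, while the error ID still ties a user report to the logged traceback.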
Supply chain compromises: why runtime matters
Even if your dependencies are “safe” at build time, runtime can reveal malicious activity. It’s kind of scary how one tiny package can create huge issues once workloads are live.
This blog explains how these runtime threats show up: link
Do you monitor runtime behaviors for dependencies, or mostly rely on pre-deployment scans?
https://redd.it/1pmcc2c
@r_devops
ARMO
The Real Cloud Attack Vectors to Watch in 2026- ARMO
Learn the 3 most prevalent runtime threat vectors behind modern cloud breaches: application-layer attacks, supply chain compromises, and stolen cloud identities.
Built an LLM-powered GitHub Actions failure analyzer (no PR spam, advisory-only)
Hi all,
As a DevOps engineer, I often realize that I still spend too much time reading failed GitHub Actions logs.
After a quick search, I couldn’t find anything that focuses specifically on **post-mortem analysis of failed CI jobs**, so I built one myself.
What it does:
- Runs only when a GitHub Actions job fails
- Collects and normalizes job logs
- Uses an LLM to explain the root cause and suggest possible fixes
- Publishes the result directly into the Job Summary (no PR spam, no comments)
Key points:
- Language-agnostic (works with almost any stack that produces logs)
- LLM-agnostic (OpenAI / Claude / OpenRouter / self-hosted)
- Designed for DevOps workflows, not code review
- Optimizes logs before sending them to the LLM to reduce token cost
This is advisory-only (no autofix), by design.
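For readers wondering what "optimizing logs" to cut token cost could look like, here is a minimal sketch. It is not the project's actual implementation: the `condense_log` name, the error keywords, and the context/limit defaults are assumptions, as is the premise that raw log lines start with an ISO 8601 timestamp.

```python
import re

# ANSI color/style escape sequences emitted by many build tools.
ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")
# Assumed line prefix: an ISO 8601 UTC timestamp followed by whitespace.
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z\s+")
# Hypothetical keywords that mark a line as worth keeping.
ERROR_HINT = re.compile(r"error|failed|exception|traceback", re.IGNORECASE)


def condense_log(raw: str, context: int = 2, max_lines: int = 200) -> str:
    """Strip noise and keep only lines near error-looking lines."""
    lines = [TIMESTAMP.sub("", ANSI_ESCAPE.sub("", line))
             for line in raw.splitlines()]
    keep = set()
    for i, line in enumerate(lines):
        if ERROR_HINT.search(line):
            # Keep the matching line plus `context` lines on each side.
            keep.update(range(max(0, i - context),
                              min(len(lines), i + context + 1)))
    selected = [lines[i] for i in sorted(keep)]
    if not selected:
        # No obvious error line: fall back to the whole (cleaned) log.
        selected = lines
    # Failures usually surface near the end, so cap from the tail.
    return "\n".join(selected[-max_lines:])
```

Keyword filtering is crude and can drop the real cause when it is logged without an obvious marker, which is one reason to keep the tail fallback.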
You can find and try it here:
https://github.com/ratibor78/actions-ai-advisor
I’d really appreciate feedback from people who live in CI/CD every day:
What would make this genuinely useful for you?
https://redd.it/1pmdb1i
@r_devops
BCP/DR/GRC at your company: real readiness or mostly paperwork?
I'm starting a position as an SRE group lead.
I’m trying to better understand how **BCP, DR, and GRC actually work in practice**, not how they’re supposed to work on paper.
In many companies I’ve seen, there are:
* Policies, runbooks, and risk registers
* SOC2 / ISO / internal audits that get “passed”
* Diagrams and recovery plans that look good in reviews
But I’m curious about the **day-to-day reality**:
* When something breaks, **do people actually use the DR/BCP docs?**
* How often are DR or recovery plans *really* tested end-to-end?
* Do incident learnings meaningfully feed back into controls and risk tracking - or does that break down?
* Where do things still rely on spreadsheets, docs, or tribal knowledge?
I’m not looking to judge — just trying to learn from people who live this.
What surprised you the most during a real incident or audit?
(Let me know your company size, since I guess the answer differs by size.)
https://redd.it/1pmg8a9
@r_devops
Best way to isolate Development Environments without Docker/Hyper-V?
I really dislike polluting my main OS with development tools, runtimes, and dependencies.
For a while, I’ve been using Docker to solve this problem, and in many ways it works well.
However, it’s a very advanced tool with lots of features I never actually use, so for the simple goal of isolating each project’s development environment it often feels like overkill.
On top of that, Docker tends to run in the background if you forget to shut it down, consumes a noticeable amount of system resources, and (on Windows) requires Hyper-V/WSL2, which adds even more overhead.
I’m wondering if there are simpler or lighter alternatives for keeping development environments isolated without “polluting” the host OS. I just want to keep it simple.
https://redd.it/1pmfb2e
@r_devops
ingress-nginx retiring March 2026 - what's your migration plan?
So the official **Kubernetes ingress-nginx** is being retired (announcement from SIG Network in November). Best-effort maintenance **until March 2026**, then no more updates or security patches.
Currently evaluating options for our GKE clusters (~160 Ingress resources):
* **Envoy Gateway** (Gateway API native) - seems like the "future-proof" choice
* **F5 NGINX Ingress Controller** - different project, still maintained, easier migration path
* **Traefik** - heard good things, anyone running it at scale?
* **Istio Gateway** - feels overkill if we don't need full service mesh
For those already migrating or who've made the switch:
* What did you choose and why?
* How painful was moving away from annotation hell?
* Is Gateway API mature enough for prod?
Leaning toward Envoy Gateway but curious about real-world experiences.
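For anyone sizing up the "annotation hell" part before picking a target, one possible first step is to count which ingress-nginx annotations are actually in use. The sketch below assumes you feed it the output of `kubectl get ingress -A -o json`; the function name is made up for illustration, and the counts say nothing about how each annotation maps to Gateway API or another controller.

```python
from collections import Counter

# Annotation prefix used by the community ingress-nginx controller.
NGINX_PREFIX = "nginx.ingress.kubernetes.io/"


def count_nginx_annotations(ingress_list: dict) -> Counter:
    """Count ingress-nginx annotations across a parsed `kubectl ... -o json` list."""
    counts = Counter()
    for item in ingress_list.get("items", []):
        # "annotations" can be missing or null on an Ingress without any.
        annotations = item.get("metadata", {}).get("annotations") or {}
        for key in annotations:
            if key.startswith(NGINX_PREFIX):
                counts[key[len(NGINX_PREFIX):]] += 1
    return counts
```

To use it, parse the JSON from stdin with `json.load(sys.stdin)`, pass the result to `count_nginx_annotations`, and print `most_common()`. That gives a ranked list of annotations to check against whichever controller you are evaluating.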
https://redd.it/1pmkjqq
@r_devops
One Ubuntu setting that quietly breaks services: ulimit -n
I’ve seen enough strange production issues turn out to be one OS limit most of us never check.
ulimit -n caused random 500s, frozen JVMs, dropped SSH sessions, and broken containers. I wrote this from personal debugging pain, not theory.
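If you want to check this from inside a running service rather than from a shell, a minimal sketch using Python's standard `resource` module (Unix only) could look like the following. The helper names are illustrative, and the `/proc/self/fd` count only works on Linux.

```python
import os
import resource  # Unix-only standard library module
from typing import Optional, Tuple


def fd_limits() -> Tuple[int, int]:
    """Return the (soft, hard) open-file limits for the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def open_fd_count() -> Optional[int]:
    """Count this process's open file descriptors via /proc (Linux only)."""
    fd_dir = "/proc/self/fd"
    if not os.path.isdir(fd_dir):
        return None
    return len(os.listdir(fd_dir))


soft, hard = fd_limits()
used = open_fd_count()
print(f"soft limit: {soft}, hard limit: {hard}, currently open: {used}")
```

The soft limit is the one the process actually hits. A service started by systemd or inside a container may see a different value than `ulimit -n` reports in your interactive shell, which is part of why this is easy to miss.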
Curious how many others have been bitten by this.
Link : https://medium.com/stackademic/the-one-setting-in-ubuntu-that-quietly-breaks-your-apps-ulimit-n-f458ab437b7d?sk=4e540d4a7b6d16eb826f469de8b8f9ad
https://redd.it/1pmjooe
@r_devops
Medium
The One Setting in Ubuntu That Quietly Breaks Your Apps: ulimit -n
Why one hidden OS limit makes services stall, logs stop, and developers blame everything except the real cause.
Stay in a stable job or work for an AI company.
Hi,
I am working for a company in Berlin as a senior infrastructure engineer. The company is stable but does not pay well. I am working on impactful projects and working hard. I asked for a raise, but it seems I will not get a significant increase, maybe 5-8%.
Meanwhile, I have an interview with an AI company that is not EU-based. It raised 130M in investment last year and wants to expand in EMEA.
They pay ~30% more than what I make at the moment.
Given the market, does it make sense to take the risk or stay in a stable job for a while until the market gets better?
https://redd.it/1pmn9hh
@r_devops
Anyone automating their i18n/localization workflow in CI/CD?
My team is building towards launching in new markets, and the manual translation process is becoming a real bottleneck. We've been exploring ways to integrate localization automation into our DevOps pipeline.
Our current setup involves manually extracting JSON strings, sending them out for translation, and then manually re-integrating them—it’s slow and error-prone. I've been looking at ways to make this a seamless part of our "develop → commit → deploy" flow.
One tool I came across and have started testing for this is the Lingo.dev CLI. It's an open-source, AI-powered toolkit designed to handle translation automation locally and fit into a CI/CD pipeline. Its core feature seems to be that you point it at your translation files, and it can automatically translate them using a specified LLM, outputting files in the correct structure.
The concept of integrating this into a pipeline looks powerful. For instance, you can configure a GitHub Action to run the lingo.dev i18n command on every push or pull request. It uses an i18n.lock file with content checksums to translate only changed text, which keeps costs down and speeds things up.
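To illustrate the general idea of checksum-based change detection (not Lingo.dev's actual lock file format, which I haven't inspected), here is a small sketch. The function names and the flat key-to-hash layout are assumptions made for the example.

```python
import hashlib
from typing import Dict, List


def checksum(text: str) -> str:
    """SHA-256 of the source string, hex-encoded."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def changed_keys(source: Dict[str, str], lock: Dict[str, str]) -> List[str]:
    """Keys whose source text is new or differs from the recorded checksum."""
    return [key for key, text in source.items() if lock.get(key) != checksum(text)]


def update_lock(source: Dict[str, str]) -> Dict[str, str]:
    """Build a fresh lock mapping after a successful translation run."""
    return {key: checksum(text) for key, text in source.items()}


# Example: only the edited and the newly added string need translating.
source = {"greeting": "Hello", "farewell": "Goodbye"}
lock = update_lock(source)
source["greeting"] = "Hi there"
source["cta"] = "Sign up"
print(changed_keys(source, lock))  # → ['greeting', 'cta']
```

Because the lock is derived purely from the source text, committing it next to the translation files means git history doubles as an audit trail of what was sent for translation and when.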
I'm curious about the practical side from other DevOps/SRE folks:
* **When does automation make sense?** Do you run translations on every PR, on merges to main, or as a scheduled job?
* **Handling the output:** Do you commit the newly generated translation files directly back to the feature branch or PR? What does that review process look like?
* **Provider choice:** The CLI seems to support both "bring your own key" (e.g., OpenAI, Anthropic) and a managed cloud option. Any strong opinions on managing API keys/credential rotation in CI vs. using a managed service?
* **Rollback & state:** The checksum-based lock file seems crucial for idempotency. How do you handle scenarios where you need to roll back a batch of translations or audit what was changed?
Basically, I'm trying to figure out if this "set it and forget it" approach is viable or if it introduces more complexity than it solves. I'd love to hear about your real-world implementations, pitfalls, or any alternative tools in this space.
https://redd.it/1pmnax4
@r_devops
lingo.dev
Lingo.dev - Automated AI localization for web & mobile apps
State-of-the-art AI localization for apps, right in CI/CD. Ship faster, release more often, and have more paying customers.
How to master
Amid mass layoffs and restructuring, I ended up on a DevOps team after moving from a backend engineering team.
It’s been a couple of months. I am mostly doing pipeline support work, meaning application teams use our templates and infra and we support them in all areas, from onboarding to stability.
There are a ton of teams and their stacks are very different (and so are the templates). How do I get a grasp of all the pieces?
I know it's hard to help without a lot more detail, but I’d like to know if there is a framework I can follow to understand all the moving parts.
We are on GitLab and AWS. Appreciate your help.
https://redd.it/1pmsh7u
@r_devops