Reddit DevOps – Telegram
Sick of having to manually read documentation diffs in CI

Does anyone else hate reviewing documentation changes in PRs?
We do docs-as-code, but if someone rewrites a paragraph for clarity, the standard diff just nukes the whole thing. I have to manually check every single word to make sure they didn't sneakily change a price or a deadline. It's a massive waste of time.
I finally hacked together a small tool that compares semantic meaning instead of syntax. For example, changing "The system is fast" to "System performance is quick" gets ignored, but changing "Price: $10" to "Price: $20" actually flags it as a real change.
It uses LLM classification, but I built a safety net around it: if the model is unsure or the output breaks, it falls back to a standard diff. It fails safe.
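A minimal sketch of the fail-safe pattern described above, assuming the LLM call is wrapped in a hypothetical `classify(old, new)` callable that returns a label and a confidence score. The label names, the 0.9 threshold and the function names are illustrative, not the tool's actual code. Anything other than a confident "equivalent" verdict, including an exception or malformed output, falls back to a plain `difflib` unified diff.

```python
import difflib
from typing import Callable, Optional, Tuple

# Hypothetical classifier: returns (label, confidence), where label is
# "equivalent" or "changed". In practice this would wrap the LLM call.
Classifier = Callable[[str, str], Tuple[str, float]]


def review_paragraph(old: str, new: str, classify: Classifier,
                     min_confidence: float = 0.9) -> Optional[str]:
    """Return None for a meaning-preserving rewrite, otherwise a diff to review."""
    standard_diff = "\n".join(difflib.unified_diff(
        old.splitlines(), new.splitlines(), "old", "new", lineterm=""))
    try:
        label, confidence = classify(old, new)
        confident_match = label == "equivalent" and float(confidence) >= min_confidence
    except Exception:
        # Model call failed or its output could not be parsed: fail safe.
        return standard_diff
    # "changed", an unexpected label, or low confidence all show the plain diff.
    return None if confident_match else standard_diff
```

The design choice is that the model can only ever suppress a diff, never invent one, so the worst case is the same review burden you have today.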

Demo here: https://context-diff.vercel.app/

Thinking of packaging this as a simple GitHub Action. Would you actually run this step in your pipeline for docs review?

https://redd.it/1pnqubh
@r_devops
resh v0.9.0 – an AI-native automation shell with URI-based resource handles

Hi all — I wanted to share a recent release of an open source project I’ve been working on, resh v0.9.0.

resh is an automation-focused shell designed to reduce brittleness in infrastructure and systems automation. Instead of stringly-typed CLI output, it models system resources as **URI-based handles** with structured JSON output, making it friendlier for automation, tooling, and AI agents.

Core idea:

    file://, svc://, net://, http://, proc://, secret://, snapshot://, mq://, log://


Each handle exposes explicit verbs (e.g., `status`, `verify`, `tail`, `ping`, `get`, `put`) and returns deterministic, machine-readable results. The goal is to make automation safer, composable, and introspectable — especially as more teams experiment with AI-assisted ops.
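To illustrate the handle-plus-verb idea (this is not resh's actual implementation or CLI), here is a hypothetical Python sketch: it splits a URI handle into scheme and target with `urllib.parse.urlsplit`, checks the verb against an illustrative per-scheme table, and always returns a JSON envelope with sorted keys so identical inputs produce byte-identical output. The verb table, envelope fields and error codes are all assumptions.

```python
import json
from urllib.parse import urlsplit

# Hypothetical verb table for illustration only; resh's real per-handle
# verbs are documented in its repo and may differ.
ALLOWED_VERBS = {
    "svc": {"status", "verify"},
    "log": {"tail"},
    "net": {"ping"},
    "http": {"get", "put"},
    "file": {"get", "put", "verify"},
}


def dispatch(handle: str, verb: str) -> str:
    """Validate a handle/verb pair and return a deterministic JSON envelope."""
    parts = urlsplit(handle)
    target = parts.netloc + parts.path  # "svc://nginx" -> "nginx", "file:///etc/hosts" -> "/etc/hosts"
    if parts.scheme not in ALLOWED_VERBS:
        envelope = {"ok": False, "handle": handle, "verb": verb,
                    "error": {"code": "unknown_scheme", "scheme": parts.scheme}}
    elif verb not in ALLOWED_VERBS[parts.scheme]:
        envelope = {"ok": False, "handle": handle, "verb": verb,
                    "error": {"code": "unsupported_verb", "scheme": parts.scheme}}
    else:
        # A real shell would perform the action here; this sketch only echoes it.
        envelope = {"ok": True, "handle": handle, "verb": verb,
                    "result": {"scheme": parts.scheme, "target": target}}
    # sort_keys keeps the serialized output stable for identical inputs.
    return json.dumps(envelope, sort_keys=True)
```

A caller (script or agent) can then branch on `ok` and `error.code` instead of scraping free-form CLI text.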

What’s new in v0.9.0 (high level):

* Expanded handle set (file, net, http, secret, svc, snapshot, mq, log, etc.)

* Stronger JSON envelopes and error determinism across verbs

* Improved service control (systemd/OpenRC)

* Better HTTP handling for automation use cases

* Continued focus on test coverage and production-safe defaults

This is early-stage OSS, not meant to replace Bash interactively, but to serve as a reliable automation substrate that other tools (or agents) can call.

Repo & docs are here if you’re curious:

👉 [https://github.com/millertechnologygroup/resh](https://github.com/millertechnologygroup/resh)

Feedback — especially from folks who’ve fought fragile shell automation in CI/CD or ops tooling — is very welcome. If this isn’t useful for your workflow, that’s totally fair; I’m mainly looking for informed critique and real-world perspectives.

Thanks for reading.

https://redd.it/1pns1cl
@r_devops
are we teaching juniors how to build, or just how to use ai?

i’ve noticed a lot of newer devs are really good at getting something working quickly with ai help, but things slow down fast when the output isn’t quite right. once the happy path breaks, it’s harder to reason about what’s going on.

tools like chatgpt or cosine are genuinely useful, but they work best as support, not a replacement for understanding. if you don’t know why something works, debugging turns into trial and error pretty quickly. it feels like there’s a fine line between using ai well and leaning on it too much.

curious how others approach this. how do you encourage good ai usage without letting core skills slip?

https://redd.it/1pnuh2g
@r_devops
Has anyone actually found cloud cost visibility tools that don't feel like they were designed for accountants?

Ok so I'm the only DevOps person at a 12-person startup, and I've somehow become the "cloud cost guy", which honestly was not in my job description lol. Our AWS bill went from about $2,800 to $4,300 over the last few months, my CTO keeps asking me where all the money is going, and half the time I genuinely have no idea, which is kind of embarrassing to admit.

Cost Explorer is fine, I guess, but it's always delayed by a day or two, so by the time I actually see a spike the damage is already done. I've been poking around at different options, but everything either looks like it was designed for finance teams who want 47 different pivot tables, or it's so expensive that it kind of defeats the whole purpose of trying to save money in the first place, you know?

We're not big enough to justify hiring a dedicated FinOps person, but we're definitely past the point where I can just ignore costs and hope for the best. We're running mostly EKS with some Lambda and RDS, so nothing crazy, but complex enough that tagging everything properly feels like a part-time job on its own.

What are you all running for this kind of thing? Bonus points if it's something that doesn't require a week of setup or a sales call just to see a demo, because I really don't have time for that right now.
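If you want something scriptable in the meantime, a rough sketch using boto3's Cost Explorer client (`get_cost_and_usage`) can rank services by spend. It assumes boto3 is installed and your credentials allow `ce:GetCostAndUsage`; the function names are hypothetical. It reads the same data as the Cost Explorer console, so it won't remove the day-or-two delay, and Cost Explorer API requests are billed per call.

```python
from collections import defaultdict


def top_services(response: dict, limit: int = 5) -> list[tuple[str, float]]:
    """Sum per-service unblended cost across all periods of a Cost Explorer response."""
    totals: dict[str, float] = defaultdict(float)
    for period in response["ResultsByTime"]:
        for group in period.get("Groups", []):
            service = group["Keys"][0]  # one GroupBy key -> one entry in Keys
            totals[service] += float(group["Metrics"]["UnblendedCost"]["Amount"])
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:limit]


def fetch_daily_costs(start: str, end: str) -> dict:
    """Daily unblended cost grouped by service; dates are YYYY-MM-DD, end is exclusive."""
    import boto3  # imported lazily so top_services() works without AWS dependencies
    ce = boto3.client("ce")
    return ce.get_cost_and_usage(
        TimePeriod={"Start": start, "End": end},
        Granularity="DAILY",
        Metrics=["UnblendedCost"],
        GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
    )
```

Usage would be something like `top_services(fetch_daily_costs(start, end))` run on a schedule, posting the top few services to chat so a jump shows up without anyone opening the console.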

https://redd.it/1pnufax
@r_devops
Stuck installing Argo CD using Terraform

So I'm trying to create a VPC and an EKS cluster using modules in my Terraform code, but I can't find a way to EASILY install Argo CD on the cluster and apply application.yaml (the manifest for the Argo CD config) after creating it, in the same IaC.

I tried googling/LLMing to find way.

I tried using the EKS module's outputs to set the host in the Helm provider and install with helm_release, but it's not working; I get some kind of REST endpoint error.

What's the easiest way to do this? Should I use Ansible? And is it really this tedious to set up Argo CD using Terraform?


Please share a code example if possible. You can look at my code here: https://github.com/c0dysharma/microservices-demo-Iaac
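One common pattern is to install Argo CD with a Terraform `helm_release` (Helm provider pointed at the cluster endpoint, CA certificate and an auth token) and only then create the `Application`, because its CRD doesn't exist until Argo CD is installed. As a sketch of that second half only, here is a Python helper that builds a minimal Argo CD `Application` manifest. The repo URL, path and namespaces are placeholders, and the automated sync policy is a choice, not a requirement.

```python
import json


def argocd_application(name: str, repo_url: str, path: str,
                       dest_namespace: str, revision: str = "HEAD") -> dict:
    """Build a minimal Argo CD Application manifest as a plain dict."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {"name": name, "namespace": "argocd"},
        "spec": {
            "project": "default",
            "source": {"repoURL": repo_url, "targetRevision": revision, "path": path},
            # In-cluster API server address, as seen from Argo CD itself.
            "destination": {"server": "https://kubernetes.default.svc",
                            "namespace": dest_namespace},
            "syncPolicy": {
                "automated": {"prune": True, "selfHeal": True},
                "syncOptions": ["CreateNamespace=true"],
            },
        },
    }


if __name__ == "__main__":
    # Placeholder values; replace with your own manifests repo and path.
    app = argocd_application("demo", "https://github.com/example/manifests.git",
                             "k8s", "demo")
    print(json.dumps(app, indent=2))
```

Since JSON is valid YAML, the printed output can be saved as application.yaml or piped to `kubectl apply -f -` once Argo CD is running.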

https://redd.it/1pnv123
@r_devops
KodeKloud question

Hello, I recently got fired from a Cloud Support position, and now I'm ready to subscribe to KodeKloud. I want to grind as much as I can for the next few months. My question is: is the Pro subscription enough, or would the next tier, the AI one, be more beneficial? I don't know how much the AI Tutor and assisted labs would help me given the price, so I'm torn on whether it's worth it. Thank you in advance!

https://redd.it/1pnxzu4
@r_devops
Sources to stay ahead of trends

Hi r/devops

I'm approaching Senior level in our field and have noticed the requirements include architectural knowledge and an opinion on trends. I'm aware of The DevOps Handbook, ByteByteGo, and generally where to go if I were to interview at a different company.

For example, at my current company we're adopting a modular design of self-service products and bringing the tooling we create closer to the developers. This includes investing in a GitOps strategy, naturally with ArgoCD, and Terraform module projects designed with Terraform Enterprise in mind. And of course, IDPs have been all the rage recently.

I'm more than happy with the tools and how to implement them, but I'm finding that I learn about these best practices from more senior colleagues rather than from reading material in my own time.

I appreciate that every company has a different problem to solve, so the shoe doesn't always fit. But I'm interested to hear how you all keep up to date with new(er) methodologies and learn to implement them critically from a philosophical standpoint (if that makes sense!).

Happy to clarify or expand on this quick ramble post.

Thanks.

https://redd.it/1pnz0xx
@r_devops
How are you handling integrations between SaaS, internal systems, and data pipelines without creating ops debt?

We’re seeing more workflows break not because infra fails, but because integrations quietly rot.

Some of us are:

* Maintaining custom scripts and cron jobs
* Using iPaaS tools that feel heavy or limited
* Pushing everything into queues and hoping for the best

What’s your current setup? What’s been solid, and what’s been a constant source of alerts at 2 a.m.?

https://redd.it/1pnztvd
@r_devops
How to create FedRAMP compliant cloud environments with IaC for repeatable deployment

Is it possible to build a full cloud environment using Infrastructure as Code and make it FedRAMP compliant from the start? The goal would be to offer pre-authorized environments to companies seeking FedRAMP approval. Since everything is IaC, the setup could be repeated across accounts and tenants. The main challenge is understanding the actual effort for audits, ongoing compliance, and maintenance in production.

https://redd.it/1po0swi
@r_devops
Upgraded Github runner version - now facing build errors

We were forced to upgrade from v2.28 to v2.330 because the older version was deprecated.

However, we now have dependency issues:

>C:\Users\ContainerAdministrator\.nuget\packages\microsoft.build.sql\0.1.14-preview\Sdk\Sdk.targets(22,89): error MSB4226: The imported project "C:\Program Files (x86)\Microsoft Visual Studio\18\BuildTools\MSBuild\Microsoft\VisualStudio\v16.0\SSDT\Microsoft.Data.Tools.Schema.SqlTasks.targets" was not found. Also, tried to find "Microsoft\VisualStudio\v16.0\SSDT\Microsoft.Data.Tools.Schema.SqlTasks.targets" in the fallback search path(s) for $(MSBuildExtensionsPath) - "C:\Program Files (x86)\MSBuild" . These search paths are defined in "C:\Program Files (x86)\Microsoft Visual Studio\18\BuildTools\MSBuild\Current\Bin\msbuild.exe.Config". Confirm that the path in the <Import> declaration is correct, and that the file exists on disk in one of the search paths. [C:\home\runner\_work\Main\Main\src\Core.Database\Core.Database.sqlproj]

Any idea how I could fix this?

https://redd.it/1po2lvy
@r_devops
Do you actually trust K8s rightsizing recommendations?

Working at a bank, I've noticed teams straight up ignore cost optimization tools because the recommendations feel risky: cutting resources too aggressively can cause outages, and nobody wants to get paged at 3 a.m. to save $50/month.

So the tools just... get ignored.

Got me thinking: would it help if a tool were explicitly asymmetric? Meaning it prioritizes "don't break anything" over "save maximum money", recommending conservative cuts that won't cause OOMKills, even if that leaves some savings on the table.
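A minimal sketch of what "asymmetric" could mean for memory requests, under illustrative assumptions: size to the larger of observed peak and p99 × headroom, raise immediately when usage exceeds that, but cut by at most a fixed fraction per step. The function name, the 1.3 headroom and the 20% maximum cut are hypothetical defaults. Because the input is historical usage, this reduces OOMKill risk rather than guaranteeing against it.

```python
def conservative_memory_request(samples_mib: list[float], current_request_mib: float,
                                headroom: float = 1.3, max_cut: float = 0.2) -> float:
    """Suggest a memory request (MiB) that shrinks cautiously and never drops below peak usage."""
    if not samples_mib:
        return current_request_mib  # No data: recommend no change.
    ordered = sorted(samples_mib)
    # Nearest-rank p99, computing ceil(0.99 * n) with integers to avoid float rounding.
    rank = (99 * len(ordered) + 99) // 100
    p99 = ordered[rank - 1]
    peak = ordered[-1]
    target = max(p99 * headroom, peak)
    # Asymmetry: jump straight to target when under-provisioned,
    # but never cut more than max_cut of the current request in one step.
    floor = current_request_mib * (1 - max_cut)
    return max(target, floor)
```

Applied repeatedly (say, once per review cycle), an over-provisioned request walks down gradually, while any observed spike pushes it back up in a single step.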

For those managing K8s clusters:

* Do you actually follow rightsizing suggestions today?
* Would you trust a tool more if it guaranteed no under-provisioning risk?
* Or is the problem something else entirely?

Genuinely curious how others handle this tradeoff.

https://redd.it/1po3lto
@r_devops
What's your note-taking system for tech learning?

I've been jumping between note apps trying to find the "perfect" system - Notion, Obsidian, Logseq, Inkdrop, Affine... you name it, I've probably tried it.

But here's my problem: I take all these notes and then never actually remember the stuff later. I'll write detailed notes about Docker or some AWS service, then 2 weeks later I'm googling the same thing again like I never learned it.

So I'm curious:
- What note-taking app/system do you actually use?
- More importantly, how do you take notes so you actually remember things later?
- Or do you just not bother with notes and learn by doing?

Feels like I'm spending more time organizing notes than learning. Maybe I'm overthinking this whole thing?

What works for you?

https://redd.it/1po3nok
@r_devops
Need Grafana alternatives

Hey, there's a good chance I just don't know how to use Grafana, but is there a better "logs visualizer" than it?
For context, I come from Uptrace, which has an amazing frontend, but Grafana has been a PITA for getting and filtering logs. My other backend is VictoriaLogs, which has vlogscli, but I was hoping for something simpler, like vmui is for metrics. Please lmk if y'all know of anything.

Have a good one

https://redd.it/1po25gk
@r_devops
GitHub Actions introducing a per-minute fee for self-hosted runners

GitHub has just sent out an email announcing a $0.002/minute fee for self-hosted runners.

Just ran the numbers, and for us, that's close to $3.5k a month extra on our GitHub bill.
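The arithmetic is easy to sanity-check. Here is a small sketch using `Decimal` to avoid float rounding. It assumes every billed minute is a self-hosted minute in a private repository (public-repo usage stays free per GitHub's announcement) and ignores how the charge interacts with plan-included minutes. The function names are hypothetical.

```python
from decimal import Decimal

# Rate from GitHub's announcement: $0.002 per minute of self-hosted runner usage.
RATE_PER_MINUTE = Decimal("0.002")
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # 43,200 minutes per runner running 24/7


def monthly_platform_fee(private_minutes: int) -> Decimal:
    """Fee in USD for one month of self-hosted minutes in private repos."""
    return private_minutes * RATE_PER_MINUTE


def minutes_for_fee(fee_usd: Decimal) -> Decimal:
    """Invert the rate: how many billed minutes a given monthly fee implies."""
    return fee_usd / RATE_PER_MINUTE
```

For example, $3,500 / $0.002 = 1,750,000 minutes a month, which is roughly 40 self-hosted runners busy around the clock in a 30-day month (1,750,000 / 43,200 ≈ 40.5).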

https://resources.github.com/actions/2026-pricing-changes-for-github-actions/

https://redd.it/1po8hj5
@r_devops
Pricing changes for GitHub Actions

On January 1, 2026, you will receive up to a 39% reduction in the net price of GitHub-hosted runners.
On March 1, 2026, we are introducing a new $0.002 per-minute GitHub Actions cloud platform charge that will apply to self-hosted runner usage. Any usage subject to this charge will count toward the minutes included in your plan.

"Please note the price for runner usage in public repositories will remain free, and there will be no changes in price structure for GitHub Enterprise Server customers"

source: https://resources.github.com/actions/2026-pricing-changes-for-github-actions/


P.S. Their email states 96% of users will see a cost reduction, but the linked page says 15%... draw your own conclusions.

https://redd.it/1po92bm
@r_devops
Working for a company where people maybe don’t have that much tech knowledge

I’m not sure, since I haven’t started yet, but it seems they may not be very knowledgeable about current technology. Maybe I’m getting the wrong impression. I do know for sure that I’m the only one who knows the cloud we will be using.

What are the pros and cons of working in this kind of environment?

I’m excited for how much I can be involved in but a little nervous about how much might be on my plate right away and a potential lack of onboarding/time to understand the new environment I’m in. Any tips? Thank you!

https://redd.it/1poa3nv
@r_devops
All pods for a service using max memory regardless of low traffic

Hi all,
We use Kubernetes, along with Jenkins for CI.
We have a service that currently runs 4 pods, and its memory has always been utilised to max capacity (the k8s resources website literally shows each pod's memory utilisation in red).
I have to analyse what's causing this and resolve it.

Can you help me figure out how I can at least find the root cause of this issue?
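A reasonable first step is to compare each container's memory usage with its limit over time; `kubectl top pod --containers` shows this interactively. The sketch below reads the same data programmatically, assuming metrics-server is installed and the official `kubernetes` Python client is available. The quantity parser is simplified (plain integers plus common suffixes only), and `limits_bytes` is hypothetical input you would read from the Deployment spec. As a general observation, not a diagnosis: usage pinned near the limit regardless of traffic often comes from a runtime that sizes its heap or caches to the memory available, rather than from a leak, so check how the app's memory settings relate to the container limit.

```python
import re

_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
_DECIMAL = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def parse_memory(quantity: str) -> int:
    """Convert a Kubernetes memory quantity such as '512Mi' or '1G' to bytes (simplified)."""
    match = re.fullmatch(r"(\d+)([KMGT]i|[kMGT])?", quantity.strip())
    if not match:
        raise ValueError(f"unsupported quantity: {quantity!r}")
    number, suffix = match.groups()
    factor = _BINARY.get(suffix) or _DECIMAL.get(suffix) or 1
    return int(number) * factor


def memory_utilisation(pod_metrics: dict, limits_bytes: dict[str, int]) -> dict[str, float]:
    """Fraction of each container's memory limit in use, from one metrics.k8s.io PodMetrics item."""
    result = {}
    for container in pod_metrics["containers"]:
        used = parse_memory(container["usage"]["memory"])
        limit = limits_bytes.get(container["name"])
        if limit:  # skip containers without a known limit
            result[container["name"]] = used / limit
    return result


def fetch_pod_metrics(namespace: str) -> list[dict]:
    """List PodMetrics items for a namespace (same data as `kubectl top pod`)."""
    from kubernetes import client, config  # lazy import; needs cluster access
    config.load_kube_config()
    api = client.CustomObjectsApi()
    resp = api.list_namespaced_custom_object("metrics.k8s.io", "v1beta1", namespace, "pods")
    return resp["items"]
```

Sampling `memory_utilisation` for all 4 pods periodically shows whether usage climbs steadily (leak-like) or jumps to a plateau right after startup (configuration-like).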

https://redd.it/1poaov2
@r_devops
People who do on-call: assuming no MDM, do you prefer 2 separate phones, on 2 eSIMs installed into your personal phone? Why?

Assuming no MDM is required, when you’re on-call, do you prefer to have 2 physically separate phones, or a 2nd SIM/eSIM installed into your personal phone?

EDIT: meant to say “or 2 eSIMs” instead of “on”.

https://redd.it/1po97bh
@r_devops
Amazon confirms a Russian GRU unit hacked Western energy and infrastructure networks for years

The threat wasn’t malware; it was silent credential theft from live traffic.

From 2021 to 2025, APT44 relied less on zero-days and more on exposed routers and VPN gateways.

source: https://thehackernews.com/2025/12/amazon-exposes-years-long-gru-cyber.html

https://redd.it/1po8v1p
@r_devops
What’s the hardest thing to actually “see”/observe in your system, and what incident misled you the most?

TL;DR: Curious about two things: what feels basically invisible in your system even though you have monitoring, and what is the most misleading incident you have dealt with.

1. What is the hardest thing to actually see in your system today?

I do not mean “we forgot to add a metric.” I mean the things that stay fuzzy even when you are staring at all the graphs. Maybe it is concurrency weirdness that only shows up under load. Maybe it is figuring out what really changed when you have multiple deploy paths and config surfaces. Maybe it is hidden dependencies that only show up when they are on fire. For you, what is that blind spot that always makes incidents messier than they should be?

2. What is the most misleading incident you have worked?

I love the stories where all the symptoms pointed at the wrong thing. CPU looked bad but the real issue was a retry storm. Latency screamed “network” but it was actually cache. Everyone blamed the database and it turned out to be some tiny config or feature flag. You know, the “we debugged the wrong thing for three hours and only then saw it” moments.

For me it is that “what actually changed” question. I have been in situations where everyone swore nothing changed, and then three tools later we find some “small” config tweak or background job rollout that no one thought counted as a real change. On paper everything was monitored. In reality we were just poking around until someone tripped over the real diff.

That experience is what made me curious about how people actually reason during incidents, not just which tool they use.
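A toy sketch of answering "what actually changed" mechanically: flatten two config snapshots into dotted keys and list additions, removals and modifications. Where the snapshots come from (exported service configs, feature-flag state and so on) is an assumption, and the function names are hypothetical.

```python
def flatten(config: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into dotted keys, e.g. {'a': {'b': 1}} -> {'a.b': 1}."""
    flat = {}
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def what_changed(before: dict, after: dict) -> list[str]:
    """List added (+), removed (-) and modified (~) keys between two snapshots."""
    old, new = flatten(before), flatten(after)
    changes = []
    for key in sorted(old.keys() | new.keys()):
        if key not in old:
            changes.append(f"+ {key} = {new[key]!r}")
        elif key not in new:
            changes.append(f"- {key} (was {old[key]!r})")
        elif old[key] != new[key]:
            changes.append(f"~ {key}: {old[key]!r} -> {new[key]!r}")
    return changes
```

The hard part in practice is capturing snapshots from every config surface, not diffing them; a "small" tweak only shows up if the system it lives in is snapshotted at all.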

https://redd.it/1pocyvd
@r_devops
What’s the best way to practice DevOps tools? I built something for beginners + need your thoughts

A lot of people entering DevOps keep asking the same question:
“Where can I practice CI/CD, Kubernetes, Terraform, etc. without paying for a bootcamp?”

Instead of repeating answers, I ended up building a small learning hub that has:

Free DevOps tutorials and blogs
Hands-on practice challenges
Simple explanations of complex tools
Mini projects for beginners

If any of you are willing to take a look and tell me what’s good/bad/missing, I’d appreciate it:
**https://thedevopsworld.com**

Not selling anything, just trying to make a genuinely useful practice resource for newcomers to our field.
It will always remain free, with no intention of making money.

Would love your suggestions on features, topics, or improvements if you've already tried it!
Future updates:
We will be adding a community mentoring feature.
We have signed a collaboration with an agentic AI cloud deployment company to provide a playground for our super.

Please don't sell anything or promote anyone's paid service. We respect you, but the community runs on a different funding model, and none of it comes from users.

https://redd.it/1pog8iw
@r_devops