Reddit DevOps – Telegram
are we teaching juniors how to build, or just how to use ai?

i’ve noticed a lot of newer devs are really good at getting something working quickly with ai help, but things slow down fast when the output isn’t quite right. once the happy path breaks, it’s harder to reason about what’s going on.

tools like chatgpt or cosine are genuinely useful, but they work best as support, not a replacement for understanding. if you don’t know why something works, debugging turns into trial and error pretty quickly. it feels like there’s a fine line between using ai well and leaning on it too much.

curious how others approach this. how do you encourage good ai usage without letting core skills slip?

https://redd.it/1pnuh2g
@r_devops
Has anyone actually found cloud cost visibility tools that don't feel like they were designed for accountants?

Ok so I'm the only DevOps person at a 12-person startup, and I've somehow become the "cloud cost guy", which honestly was not in my job description lol. Our AWS bill went from like $2,800 to $4,300 over the last few months, and my CTO keeps asking me where all the money is going. I genuinely have no idea half the time, which is kind of embarrassing to admit.

Cost Explorer is fine I guess, but it's always delayed by like a day or two, and by the time I actually see a spike the damage is already done. So I've been poking around at different options, but everything either looks like it was designed for finance teams who want 47 different pivot tables, or it's so expensive that it kind of defeats the whole purpose of trying to save money in the first place, you know?

We're not big enough to justify hiring a dedicated FinOps person, but we're definitely past the point where I can just ignore costs and hope for the best. We're running mostly EKS with some Lambda and RDS, so nothing crazy, but complex enough that tagging everything properly feels like a part-time job on its own.

What are you all running for this kind of thing? Bonus points if it's something that doesn't require a week of setup or a sales call just to see a demo, because I really don't have time for that right now.
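For anyone in the same spot, here is a rough sketch of a do-it-yourself spike check. It assumes you already have daily cost per service as a list of numbers, for example pulled from the Cost Explorer API (boto3 `ce` client, `get_cost_and_usage` with `DAILY` granularity) or from a Cost and Usage Report export. The numbers below are made up, and the 7-day window and 1.3x threshold are arbitrary. It won't beat Cost Explorer's data delay, but it flags a spike as soon as the data lands instead of whenever someone next opens the console.

```python
from statistics import mean


def flag_spikes(daily_costs, window=7, threshold=1.3):
    """Return (day_index, cost, baseline) for each day whose cost is more than
    `threshold` times the average of the previous `window` days."""
    spikes = []
    for i in range(window, len(daily_costs)):
        baseline = mean(daily_costs[i - window:i])  # trailing average, excludes day i
        if daily_costs[i] > baseline * threshold:
            spikes.append((i, daily_costs[i], round(baseline, 2)))
    return spikes


# Hypothetical daily EKS spend in USD (made-up numbers).
eks_daily = [90, 92, 91, 89, 93, 90, 92, 95, 140, 96]
print(flag_spikes(eks_daily))  # [(8, 140, 91.71)]
```

Running this per service (EKS, Lambda, RDS) from a daily scheduled job and posting any hits to chat is usually enough to answer "where did the money go this week" before the monthly bill does.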

https://redd.it/1pnufax
@r_devops
Stuck installing ArgoCD using Terraform

So I am trying to create a VPC and EKS using modules in my Terraform code. But I am unable to find a way to EASILY install ArgoCD on my cluster and apply application.yaml (the manifest for the ArgoCD config) on the cluster after creating it, in the same IaC.

I tried googling/LLMing to find a way.

I tried using the EKS module's output to set the host in Helm and install using helm_release, but it's not working; it gives me some kind of REST endpoint error.

What is the easiest way to do this? Should I use Ansible? And is it really this tedious to set up ArgoCD using Terraform?

Please share a code example if possible. You can look at my code at - https://github.com/c0dysharma/microservices-demo-Iaac

https://redd.it/1pnv123
@r_devops
KODEKLOUD QUESTION

Hello, I recently got fired from a Cloud Support position, and now I am ready to subscribe there. I want to grind as much as I can for the next few months. My question is: is the Pro subscription enough, or would the next tier (the AI one) be more beneficial? I don't know how much the AI Tutor and assisted labs would help me considering the price, so I'm in a dilemma over whether it's worth it. Thank you in advance!

https://redd.it/1pnxzu4
@r_devops
Sources to stay ahead of trends

Hi r/devops

I am approaching Senior level in our field and have noticed the requirements are to have architectural knowledge and an opinion on trends. I am aware of the DevOps Handbook, ByteByteGo and generally where to go if I were to interview for a different company.

For example, at my current company we're adopting a modular design of self-service products and bringing the tooling we create closer to the developers. This includes investing in a GitOps strategy, naturally with ArgoCD, and Terraform module projects designed with Terraform Enterprise in mind. And of course, IDPs have been all the rage recently.

I am more than happy with the tools and how to implement them, but I am finding I am learning about these best practices from colleagues above me rather than from reading material in my own time.

I appreciate every company has a different problem to solve, so the shoe doesn't always fit. But I'm interested to hear from you all on how you keep up to date with new(er) methodologies and learn how to critically implement them from a philosophical standpoint (if that makes sense!).

Happy to clarify or expand on this quick ramble post.

Thanks.

https://redd.it/1pnz0xx
@r_devops
How are you handling integrations between SaaS, internal systems, and data pipelines without creating ops debt?

We’re seeing more workflows break not because infra fails, but because integrations quietly rot.

Some of us are:

* Maintaining custom scripts and cron jobs
* Using iPaaS tools that feel heavy or limited
* Pushing everything into queues and hoping for the best

What’s your current setup? What’s been solid, and what’s been a constant source of alerts at 2 a.m.?
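One cheap guard against integrations that "quietly rot" is a freshness check: record when each integration last succeeded and alert when that is much older than its expected cadence, instead of alerting only on explicit failures. A minimal sketch, assuming you already store a last-success timestamp per integration somewhere; the integration names, intervals and the 2x grace factor are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def stale_integrations(last_success, expected_every, now, grace=2.0):
    """Return names whose last success is older than `grace` x their expected interval.

    An integration listed in `expected_every` but missing from `last_success`
    (never succeeded) is also reported as stale.
    """
    stale = []
    for name, interval in expected_every.items():
        ts = last_success.get(name)
        if ts is None or now - ts > interval * grace:
            stale.append(name)
    return sorted(stale)


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
expected = {
    "crm_sync": timedelta(hours=1),          # hypothetical hourly cron job
    "billing_export": timedelta(minutes=15),
    "warehouse_load": timedelta(days=1),
}
last_ok = {
    "crm_sync": now - timedelta(hours=3),    # silently stopped running
    "billing_export": now - timedelta(minutes=20),
}
print(stale_integrations(last_ok, expected, now))  # ['crm_sync', 'warehouse_load']
```

The design choice is to alert on the absence of success rather than the presence of errors, which catches the cron job that stopped being scheduled at all.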

https://redd.it/1pnztvd
@r_devops
How to create FedRAMP compliant cloud environments with IaC for repeatable deployment

Is it possible to build a full cloud environment using Infrastructure as Code and make it FedRAMP compliant from the start? The goal would be to offer pre-authorized environments to companies seeking FedRAMP approval. Since everything is IaC, the setup could be repeated across accounts and tenants. The main challenge is understanding the actual effort for audits, ongoing compliance, and maintenance in production.
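Part of the "compliant from the start" idea is cheap to automate once everything is IaC: gate every change on policy checks against the Terraform plan, so every account and tenant gets the same controls. A minimal Python sketch, assuming plans are exported with `terraform show -json <planfile>`; the rule table is a hypothetical example, and dedicated policy-as-code tools (OPA/Conftest, Checkov) do this far more thoroughly. It does not address the audit, continuous-monitoring and authorization effort the post asks about.

```python
import json

# Illustrative rules only: resource type -> attribute that must be true.
# Encryption at rest relates to one NIST SP 800-53 control (SC-28) used by FedRAMP,
# but a real baseline covers far more than this.
REQUIRED_TRUE = {
    "aws_ebs_volume": "encrypted",
    "aws_db_instance": "storage_encrypted",
    "aws_rds_cluster": "storage_encrypted",
}


def load_plan(path):
    """Load the output of `terraform show -json <planfile>` saved to a file."""
    with open(path) as f:
        return json.load(f)


def find_violations(plan):
    """Return one message per planned resource that breaks a rule in REQUIRED_TRUE."""
    violations = []
    for rc in plan.get("resource_changes", []):
        attr = REQUIRED_TRUE.get(rc.get("type"))
        after = (rc.get("change") or {}).get("after")
        if attr is None or after is None:  # unchecked type, or resource being deleted
            continue
        if after.get(attr) is not True:
            violations.append(f"{rc['address']}: {attr} must be true")
    return violations


sample_plan = {
    "resource_changes": [
        {"address": "aws_ebs_volume.data", "type": "aws_ebs_volume",
         "change": {"actions": ["create"], "after": {"encrypted": False, "size": 100}}},
        {"address": "aws_db_instance.main", "type": "aws_db_instance",
         "change": {"actions": ["create"], "after": {"storage_encrypted": True}}},
    ]
}
print(find_violations(sample_plan))  # ['aws_ebs_volume.data: encrypted must be true']
```

In CI, a non-empty result would fail the pipeline before `terraform apply`. Note that values not known until apply appear under `after_unknown` in the plan JSON, so a real checker has to decide how to treat those.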

https://redd.it/1po0swi
@r_devops
Upgraded GitHub runner version - now facing build errors

We were forced to upgrade from v2.28 to v2.330 as the older one was deprecated.

However, now we have dependency issues:

>C:\\Users\\ContainerAdministrator\\.nuget\\packages\\microsoft.build.sql\\0.1.14-preview\\Sdk\\Sdk.targets(22,89): error MSB4226: The imported project "C:\\Program Files (x86)\\Microsoft Visual Studio\\18\\BuildTools\\MSBuild\\Microsoft\\VisualStudio\\v16.0\\SSDT\\Microsoft.Data.Tools.Schema.SqlTasks.targets" was not found. Also, tried to find "Microsoft\\VisualStudio\\v16.0\\SSDT\\Microsoft.Data.Tools.Schema.SqlTasks.targets" in the fallback search path(s) for $(MSBuildExtensionsPath) - "C:\\Program Files (x86)\\MSBuild" . These search paths are defined in "C:\\Program Files (x86)\\Microsoft Visual Studio\\18\\BuildTools\\MSBuild\\Current\\Bin\\msbuild.exe.Config". Confirm that the path in the <Import> declaration is correct, and that the file exists on disk in one of the search paths. [C:\\home\\runner\\_work\\Main\\Main\\src\\Core.Database\\Core.Database.sqlproj\]

Any idea how I could fix this?

https://redd.it/1po2lvy
@r_devops
Do you actually trust K8s rightsizing recommendations?

Working at a bank, I've noticed teams straight up ignore cost optimization tools because the recommendations feel risky — cutting resources too aggressively can cause outages, and nobody wants to get paged at 3 am to save $50/month.

So the tools just... get ignored.

Got me thinking: would it help if a tool was explicitly asymmetric? Meaning it prioritizes "don't break anything" over "save maximum money" — recommending conservative cuts that won't cause OOMKills, even if it leaves some savings on the table.
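The asymmetric idea can be made concrete with two floors: never recommend less than the observed peak plus headroom, and never cut more than a fixed fraction of the current value in one step. If peak plus headroom already exceeds the current request, the "recommendation" becomes an increase. A minimal sketch with integer MiB values; the 25% headroom and 20% maximum step are hypothetical defaults, and the samples are assumed to come from something like Prometheus's `container_memory_working_set_bytes` over a window long enough to include your real peaks.

```python
def ceil_div(a, b):
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def conservative_memory_request(samples_mib, current_mib, headroom_pct=25, max_cut_pct=20):
    """Suggest a memory value (MiB) that favours safety over savings.

    - never below observed peak + headroom_pct percent
    - never more than max_cut_pct percent below the current value in one step
    May return a value above current_mib when usage already exceeds the safe floor.
    """
    peak = max(samples_mib)
    usage_floor = ceil_div(peak * (100 + headroom_pct), 100)
    step_floor = ceil_div(current_mib * (100 - max_cut_pct), 100)
    return max(usage_floor, step_floor)


samples = [300, 350, 410, 380]  # hypothetical working-set samples in MiB
print(conservative_memory_request(samples, 1024))  # 820: capped by the 20% step
print(conservative_memory_request(samples, 820))   # 656: next step down
```

Repeated runs walk down gradually (1024, 820, 656, 525, 513) and stop at peak plus headroom. Since OOMKills are triggered by the memory limit rather than the request, the usage floor matters most when applied to the limit.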

For those managing K8s clusters:

* Do you actually follow rightsizing suggestions today?
* Would you trust a tool more if it guaranteed no under-provisioning risk?
* Or is the problem something else entirely?

Genuinely curious how others handle this tradeoff.

https://redd.it/1po3lto
@r_devops
What's your note-taking system for tech learning?

I've been jumping between note apps trying to find the "perfect" system - Notion, Obsidian, Logseq, Inkdrop, Affine... you name it, I've probably tried it.

But here's my problem: I take all these notes and then never actually remember the stuff later. I'll write detailed notes about Docker or some AWS service, then 2 weeks later I'm googling the same thing again like I never learned it.

So I'm curious:
- What note-taking app/system do you actually use?
- More importantly, how do you take notes so you actually remember things later?
- Or do you just not bother with notes and learn by doing?

Feels like I'm spending more time organizing notes than learning. Maybe I'm overthinking this whole thing?

What works for you?

https://redd.it/1po3nok
@r_devops
need grafana alternatives

Hey, there's a good chance that i don't know how to use grafana, but is there a better "logs visualizer" than it?
for context, i come from uptrace (amazing frontend), but grafana has been a pita for getting logs, filtering, etc. my other backend is victorialogs, which has vlogscli, but i was hoping for something simpler, like vmui for metrics. please lmk if y'all know of anything.

Have a good one
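If the goal is just quick log search without Grafana, VictoriaLogs can also be queried over plain HTTP. A minimal sketch, assuming a VictoriaLogs instance on its default port 9428 and its `/select/logsql/query` endpoint, which returns one JSON object per line with fields such as `_time` and `_msg`; check the VictoriaLogs docs for your version before relying on it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def logsql_url(base_url, query, limit=100):
    """Build a LogsQL query URL (endpoint path is an assumption, see above)."""
    return f"{base_url}/select/logsql/query?" + urlencode({"query": query, "limit": limit})


def parse_json_lines(body):
    """The response is newline-delimited JSON: one log entry per line."""
    return [json.loads(line) for line in body.splitlines() if line.strip()]


def tail(base_url, query, limit=100):
    """Fetch matching entries and print time + message, roughly like a terminal tail."""
    with urlopen(logsql_url(base_url, query, limit)) as resp:
        for entry in parse_json_lines(resp.read().decode("utf-8")):
            print(entry.get("_time", ""), entry.get("_msg", ""))


print(logsql_url("http://localhost:9428", "_time:5m error"))
# http://localhost:9428/select/logsql/query?query=_time%3A5m+error&limit=100
```

Calling `tail("http://localhost:9428", "_time:5m error")` would print the last five minutes of entries containing "error", which covers a lot of day-to-day filtering without any dashboard.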

https://redd.it/1po25gk
@r_devops
GitHub Actions introducing a per-minute fee for self-hosted runners

GitHub has just sent out an email announcing a $0.002/minute fee for self-hosted runners.

Just ran the numbers, and for us, that's close to $3.5k a month extra on our GitHub bill.

https://resources.github.com/actions/2026-pricing-changes-for-github-actions/
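To sanity-check numbers like this, the arithmetic is just minutes times rate. A minimal sketch using the $0.002/minute figure from the announcement; it ignores any included-minute allowance and public-repo usage (which stays free), and assumes a 30-day month, so treat the results as rough upper-bound estimates.

```python
RATE_USD_PER_MIN = 0.002          # self-hosted runner charge from the announcement
RUNNER_MONTH_MIN = 30 * 24 * 60   # 43,200 minutes: one runner busy 24/7 for 30 days


def monthly_fee(minutes):
    """Extra monthly charge in USD for a given number of self-hosted runner minutes."""
    return minutes * RATE_USD_PER_MIN


def minutes_for_fee(fee_usd):
    """Runner minutes per month implied by a given extra monthly charge."""
    return fee_usd / RATE_USD_PER_MIN


minutes = minutes_for_fee(3500)
print(round(minutes))                            # 1750000
print(round(minutes / RUNNER_MONTH_MIN, 1))      # 40.5 runners busy around the clock
print(round(monthly_fee(10 * 8 * 60 * 22), 2))   # 211.2: 10 runners, 8 h/day, 22 workdays
```

So roughly $3.5k/month corresponds to about 1.75 million self-hosted minutes, the equivalent of about 40 runners busy nonstop.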

https://redd.it/1po8hj5
@r_devops
Pricing changes for GitHub Actions

On January 1, 2026, you will receive up to a 39% reduction in the net price of GitHub-hosted runners.
On March 1, 2026, we are introducing a new $0.002 per-minute GitHub Actions cloud platform charge that will apply to self-hosted runner usage. Any usage subject to this charge will count toward the minutes included in your plan.

"Please note the price for runner usage in public repositories will remain free, and there will be no changes in price structure for GitHub Enterprise Server customers"

source: https://resources.github.com/actions/2026-pricing-changes-for-github-actions/


P.S. Their email states 96% of users will see a cost reduction, but the linked page says 15%... draw your own conclusions.

https://redd.it/1po92bm
@r_devops
Working for a company where people maybe don’t have that much tech knowledge

I’m not sure because I haven’t started yet but it seems they may not be so knowledgeable about current technology but maybe I’m getting the wrong impression. I know for sure I’m the only one who knows the cloud we will be using.

What are the pros and cons of working in this kinda environment?

I’m excited for how much I can be involved in but a little nervous about how much might be on my plate right away and a potential lack of onboarding/time to understand the new environment I’m in. Any tips? Thank you!

https://redd.it/1poa3nv
@r_devops
All pods' memory for a service being utilised to the max regardless of low traffic

Hi all,
We use Kubernetes along with Jenkins for CI.
We have a service that currently has 4 pods running, and its memory has always been utilised to max capacity (the k8s resource website literally shows the memory utilisation as red marks for the pod).
I have to analyse what the main cause of this is and resolve it.

Can you please help me out by explaining how I can at least find the root cause of this issue?
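A useful first split is whether memory is climbing over time or sitting flat near the limit regardless of traffic. Flat-at-the-limit often points at the runtime sizing itself to the container (a JVM with a large heap setting is the classic case) rather than a leak, and some dashboards chart `container_memory_usage_bytes`, which includes page cache, rather than `container_memory_working_set_bytes`. A crude heuristic sketch, assuming evenly spaced samples (e.g. from `kubectl top pod` or Prometheus) in the same unit as the limit; the 90% and growth thresholds are arbitrary.

```python
def least_squares_slope(ys):
    """Slope of the best-fit line through (0, y0), (1, y1), ...; needs >= 2 points."""
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


def classify_memory(samples, limit, near_limit=0.9, growth_per_sample=0.005):
    """Rough label for a pod's memory series:
    'growing'          -> steady upward trend, look for leaks or unbounded caches
    'flat-near-limit'  -> stable but close to the limit, check runtime/heap sizing
    'flat-below-limit' -> nothing obviously wrong
    """
    if least_squares_slope(samples) > growth_per_sample * limit:
        return "growing"
    if sum(samples) / len(samples) >= near_limit * limit:
        return "flat-near-limit"
    return "flat-below-limit"


limit_mib = 2048
print(classify_memory([1950] * 12, limit_mib))                        # flat-near-limit
print(classify_memory([600 + 50 * i for i in range(12)], limit_mib))  # growing
```

"Always at max with little traffic" usually lands in the flat-near-limit bucket, which shifts the investigation toward heap and cache settings rather than leak hunting.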

https://redd.it/1poaov2
@r_devops
People who do on-call: assuming no MDM, do you prefer 2 separate phones, on 2 eSIMs installed into your personal phone? Why?

Assuming no MDM is required, when you’re on-call, do you prefer to have 2 physically separate phones, or a 2nd SIM/eSIM installed into your personal phone?

EDIT: meant to say “or 2 eSIMs” instead of “on”.

https://redd.it/1po97bh
@r_devops
Amazon confirms a Russian GRU unit hacked Western energy and infrastructure networks for years

The threat wasn’t malware; it was silent credential theft from live traffic.

From 2021 to 2025, APT44 relied less on zero-days and more on exposed routers and VPN gateways.

source: https://thehackernews.com/2025/12/amazon-exposes-years-long-gru-cyber.html

https://redd.it/1po8v1p
@r_devops
What’s the hardest thing to actually “see”/observe in your system, and what incident misled you the most?

TL;DR: Curious about two things: what feels basically invisible in your system even though you have monitoring, and what is the most misleading incident you have dealt with.

1. What is the hardest thing to actually see in your system today?

I do not mean “we forgot to add a metric.” I mean the things that stay fuzzy even when you are staring at all the graphs. Maybe it is concurrency weirdness that only shows up under load. Maybe it is figuring out what really changed when you have multiple deploy paths and config surfaces. Maybe it is hidden dependencies that only show up when they are on fire. For you, what is that blind spot that always makes incidents messier than they should be?

2. What is the most misleading incident you have worked?

I love the stories where all the symptoms pointed at the wrong thing. CPU looked bad but the real issue was a retry storm. Latency screamed “network” but it was actually cache. Everyone blamed the database and it turned out to be some tiny config or feature flag. You know, the “we debugged the wrong thing for three hours and only then saw it” moments.

For me it is that “what actually changed” question. I have been in situations where everyone swore nothing changed, and then three tools later we find some “small” config tweak or background job rollout that no one thought counted as a real change. On paper everything was monitored. In reality we were just poking around until someone tripped over the real diff.

That experience is what made me curious about how people actually reason during incidents, not just which tool they use.
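For the "what actually changed" question, even a simple diff of flattened config snapshots (taken per deploy, or on a schedule) turns "nobody changed anything" into a concrete list. A minimal sketch, assuming configs and feature flags can be exported as nested dicts (for example from JSON or YAML); the keys and values below are hypothetical.

```python
def flatten(d, prefix=""):
    """Turn {'db': {'timeout_s': 5}} into {'db.timeout_s': 5}."""
    out = {}
    for k, v in d.items():
        key = f"{prefix}.{k}" if prefix else str(k)
        if isinstance(v, dict):
            out.update(flatten(v, key))
        else:
            out[key] = v
    return out


def diff_configs(before, after):
    """List removed (-), added (+) and changed (~) keys between two snapshots."""
    a, b = flatten(before), flatten(after)
    changes = []
    for key in sorted(a.keys() | b.keys()):
        if key not in b:
            changes.append(f"- {key} (was {a[key]!r})")
        elif key not in a:
            changes.append(f"+ {key} = {b[key]!r}")
        elif a[key] != b[key]:
            changes.append(f"~ {key}: {a[key]!r} -> {b[key]!r}")
    return changes


before = {"db": {"pool_size": 20, "timeout_s": 5}, "flags": {"new_checkout": False}}
after = {"db": {"pool_size": 20, "timeout_s": 2}, "flags": {"new_checkout": True},
         "jobs": {"reindex": "enabled"}}
for line in diff_configs(before, after):
    print(line)
# ~ db.timeout_s: 5 -> 2
# ~ flags.new_checkout: False -> True
# + jobs.reindex = 'enabled'
```

Storing these snapshots alongside deploy events gives incident responders one place to ask "what differs from yesterday" instead of checking three tools.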



https://redd.it/1pocyvd
@r_devops
What’s the best way to practice DevOps tools? I built something for beginners + need your thoughts

A lot of people entering DevOps keep asking the same question:
“Where can I practice CI/CD, Kubernetes, Terraform, etc. without paying for a bootcamp?”

Instead of repeating answers, I ended up building a small learning hub that has:

* Free DevOps tutorials and blogs
* Hands-on practice challenges
* Simple explanations of complex tools
* Mini projects for beginners

If any of you are willing to take a look and tell me what’s good/bad/missing, I’d appreciate it:
**https://thedevopsworld.com**

Not selling anything; I'm just trying to make a genuinely useful practice resource for newcomers to our field.
It will always remain free, with no intention of making money.

Would love your suggestions on features, topics, or improvements if you've already tried it!
Future updates:
* We will be adding a community mentoring feature.
* We have signed a collaboration with an agentic AI cloud deployment company to provide a playground for our super.

Please don't sell anything or anyone's paid service. We respect you, but the community runs on a different funding model and none of it comes from users.

https://redd.it/1pog8iw
@r_devops
why is devops so hard😩

backend developer here trying to learn devops. is it just me who feels it is complex to understand devops as a beginner? isn't there an easy way to do this?

https://redd.it/1pooror
@r_devops
From C++ Terminal Tetris to Kubernetes and AI: My open source journey (60k+ stars total)

I have been writing code for many years. Recently, I looked back at my GitHub profile. The projects I led have accumulated over 60,000 stars.

I wanted to share my path and some thoughts.

The Journey

* In College: I started with C++. I wrote a Tetris game that runs entirely in the terminal. I had to handle cursor movement and color erasing manually. It was raw but fun. (Repo: `fanux/tetris`)
* Early Career: I switched to Go. I wrote lhttp, a websocket framework. (Repo: `fanux/lhttp`)
* Infrastructure Era: Later, I focused on Kubernetes. I built Sealos, a Kubernetes distribution. This was my first big project. (Repo: `labring/sealos`)
* Startup Founder: Then I started my own company. We built Laf (serverless) and FastGPT (AI knowledge base). (Repos: `labring/laf` and `labring/FastGPT`)
* Now: I am building Fulling, an AI coding tool. (Repo: `FullAgent/fulling`)

My Thoughts

Even though I am a CEO now, I still insist on doing open source. Here is what I learned:

1. The Drive: Open source is fun. Creating value for the developer community is my internal drive. It is the only reason I can keep doing this for so long.
2. The Challenge: Just pushing code to GitHub is meaningless. The hardest part is the start. You have to accumulate early users one by one. Promoting a project is a very long-term process.
3. No Shortcuts: After all these years, I still haven't found a shortcut. To make a project successful, I still have to do the "dumb" work: writing blogs, creating content, and explaining the value.

The Struggle

Honestly, it is sometimes painful. Every time I start a new project (like the current one), it feels like starting from zero. I often feel lonely because I have to do the promotion myself.

Writing code makes me happy and fulfilled. But writing code that no one uses makes me sad. So I have to force myself to do marketing, which I am not naturally good at. It is a conflict.

How do you balance the joy of coding with the pain of promotion?

https://redd.it/1poq7dn
@r_devops