Reddit DevOps – Telegram
Real production war scenarios and trainings? Has anyone bought this?

I came across this training on LinkedIn. They say they teach real production war scenarios; the pitch reads: "Master production-grade tools, fire-drill scenarios, and cross-cloud architectures. Every skill here is forged through real outages, real deployments, and real engineering war rooms." https://elite.infrathrone.xyz/

Does anyone have any idea about it? How is it?

https://redd.it/1p08yh8
@r_devops
Can you really automate QA testing without headcount or is everyone just lying?

Serious question, because I'm tired of the LinkedIn hype. Every other post is someone claiming they "automated 90% of QA" and "eliminated manual testing", but then you talk to them and they still have a QA team.

Here's my situation: we have 3 QA engineers for a team of 25 devs. They're constantly underwater, and we keep getting bugs in production anyway. Leadership wants to "automate QA" instead of hiring more people, but I'm skeptical this is actually possible. It feels like one of those things that works in theory but not in practice.

I've seen test automation frameworks, we use some already, but they still need someone to write and maintain the tests and they don't catch the weird edge cases that a human would. Plus our integration tests are flaky as hell and take forever to run.

So what's the reality here? Can you actually reduce headcount with automation or is it just shifting the work around? And if you did pull this off, what did you use? Not interested in solutions that require hiring a separate automation team, that defeats the whole point.

https://redd.it/1p0a727
@r_devops
Is the internet really decentralized, or just fragile?

Most people don’t realize this: the internet they think is distributed is actually held together by a handful of infrastructure chokepoints. Cloudflare sneezes, and half the web catches a fever. We’ve built our digital world on a fragile stack of AWS, Cloudflare, Google Cloud, and a few telcos.

When one fails, everything collapses like dominoes. The internet wasn’t supposed to be this vulnerable.

Edit: By “Internet” I meant what regular users experience daily: the apps, websites, payments, and services they rely on.

https://redd.it/1p0bcub
@r_devops
Do you have a backup plan in case your provider goes down?

I've been seeing an issue with Cloudflare for almost 45 minutes now. I didn't prepare any plan for this case, and I can't move my DNS because Namecheap is also down. How do you prepare for such cases?

https://redd.it/1p0b4tf
@r_devops
A few weeks back Docker Hub was down, along with a bunch of others. Now Cloudflare.

Can someone senior please tell us wtf is going on lately?

How is this happening? This sounds like a DevOps problem, but it could be a physical IT problem as well, such as data center failures.

Any info about these outages?

As an up-and-coming DevOps engineer, I would like to be ready for anything, and this is interesting to me, since it seems there are always surprises in this field.

https://redd.it/1p0aa5g
@r_devops
Datadog? Eval

Hello! I’m interviewing for a role at Datadog and want to get some candid feedback on their product. If you use it in any capacity, it’d be great to hear the good, bad, and ugly. How are you using it? How has it impacted your day to day or overall strategy? What are the downsides? I know there are already threads in here, but I want to be sure I get any feedback on new feature launches or recent changes. Thanks in advance!

https://redd.it/1p0ffnz
@r_devops
Curious About Internal Workflows During Massive Outages

With the current Cloudflare outage going on, I’ve been wondering what the internal workflow looks like inside large tech companies during incidents of this scale.

How do different teams coordinate when something huge breaks?

Do SRE/DevOps/Network teams all jump in at once or does it follow a strict escalation path?
And how is communication handled across so many teams and time zones?

https://redd.it/1p0bsur
@r_devops
Trying to transition to DevOps

Hi all, pretty new here and was hoping for some advice.

Context: By trade I’m currently a civil design engineer, with my uni background also in civil engineering. I’ve been doing it for about 2 years now.

Recently I’ve been really interested in DevOps and I’m determined to transition my career. I started by learning Python and I’m pretty confident at an intermediate level. I’ve also done my first Azure certification (AZ-900) to get my fundamentals right. I have also done some networking fundamentals and I’m pretty confident in my understanding of the OSI layers. I’m currently working on getting my admin associate certification (AZ-104). My plan is to then learn Terraform, as well as Azure DevOps or GitHub Actions (leaning towards GitHub Actions). I’m learning PowerShell slowly on the side right now too.

Outside of my core learning I’ve done some high-level research on containerization and orchestration too, knowing I’ll have to focus on those when the time comes.

Just wanted to get thoughts from people who already do it, and a steer on what would help. Thanks.

https://redd.it/1p0mx4b
@r_devops
Is there anything new to learn in 2025?

Aside from Kubernetes and Terraform, is there anything to learn as a software developer or DevOps engineer? What would you suggest and why?

https://redd.it/1p0sfmj
@r_devops
hello devops fam, I just passed my AWS SAA and wanna go straight to learning devops

Hello fam, I just passed my AWS course. Any YouTube channels, Udemy courses, etc. that you'd recommend for learning DevOps? Any recommendations are greatly appreciated.

https://redd.it/1p0t19c
@r_devops
What a day...

I spent the last 3 weeks working on a project management pipeline that was heavy in GitHub Actions and was set to demo it today in a huge meeting in front of all of the project managers and developers. I started the demo at 3:30 EST this afternoon.

I started off at the user creation command line and created a new user, switched to them and ran a custom SSH and GitHub config wizard I wrote which abstracted away the burdens of dealing with configuring those for PMs.

It worked flawlessly. It ran the check, verified everything was good, pulled repos. It was golden.

I went further into the systems and went to have it send some project management files into a branch to be picked up by CI....

Suddenly git was broken. I was flabbergasted.

It was 3:40, GitHub was down. I sat there like an idiot fudging it for 10 minutes until the meeting moved to another presentation....

It was devastating....

What a day fellas (fellettes), what a day...

https://redd.it/1p0vx27
@r_devops
When does Policy-as-Code become "The Slow Lane" for developers?

Hey r/devops,

I'm working on scaling up our internal developer platform (IDP) and one of the biggest points of friction is how we enforce DevSecOps and compliance policies without killing our velocity. We're trying to shift left, but it feels like we've just shifted all the pipeline friction right onto the developer's lap.

We moved from a few post-merge human approval tollgates to an aggressive Policy-as-Code strategy using tools like Open Policy Agent (OPA) with Rego on every pull request (PR).

The result? Our security posture is fantastic. Our IaC drift is near zero. But our average PR time is up 25%, and the team is starting to view the pipeline as an adversary, not an enabler.

The checks are running: SAST, SCA, Terrascan, custom checks for naming conventions, and resource tagging compliance. All before merge. The problem is that a failed low-severity SAST finding can hold up a critical patch that has a clean functional change.

My burning question to the community:

How are you balancing the enforcement of non-critical-but-mandatory policies (like resource tagging or specific naming conventions) in the pipeline?

1. Do you have an explicit 'fail fast/fail hard' policy only for critical security issues, and let minor compliance issues run through the main pipeline, alerting to a dashboard for follow-up? (i.e., making them blocking in pre-prod but non-blocking in the main CI?)
2. Are you using a separate, performance-optimized "compliance-only" pipeline that runs less frequently, thereby unblocking the core CI/CD flow?

I’m looking for actual tooling or architectural patterns that allow for selective blocking without relying on us writing custom logic in every single Jenkinsfile/GitHub Actions workflow.
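
To make "selective blocking" concrete, here is a minimal Python sketch of a single shared gate that every pipeline could call instead of carrying its own logic. Everything in it is an assumption for illustration: the `Finding` shape, the severity labels, and the default `high` threshold are hypothetical and not taken from OPA, Terrascan, or any scanner's output format.

```python
from dataclasses import dataclass

# Hypothetical severity scale; real scanners each use their own labels.
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}


@dataclass(frozen=True)
class Finding:
    check: str      # which check produced it, e.g. "sast" or "tagging"
    severity: str   # one of the SEVERITY_RANK keys
    message: str


def gate(findings, block_at="high"):
    """Split findings into (blocking, advisory) around a severity threshold."""
    threshold = SEVERITY_RANK[block_at]
    blocking, advisory = [], []
    for finding in findings:
        if SEVERITY_RANK[finding.severity] >= threshold:
            blocking.append(finding)
        else:
            advisory.append(finding)
    return blocking, advisory


def exit_code(findings, block_at="high"):
    """Return 1 only when something at or above the threshold was found."""
    blocking, _ = gate(findings, block_at)
    return 1 if blocking else 0
```

In this sketch a low-severity tagging finding ends up in the advisory list (which could feed a dashboard) while the pipeline step still exits 0, and the same gate could be run with a stricter `block_at` in pre-prod.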




https://redd.it/1p0z7dp
@r_devops
Antigravity IDE for DevOps: Any feedback on integrations & automation?

Anyone tried using Antigravity by Google for DevOps workflows? I noticed the AI can suggest fixes/refactors and the IDE supports agent-like automation (e.g., review agent, code agent). Integration with Gemini 3 and VS Code style interface helped me resurrect a legacy web app.



- Anyone tested the Chrome extension/API or CI/CD integrations?
- How's the support for Docker, containerized dev flows, pipelines?
- Is the multi-agent system practical for DevOps use cases?

https://redd.it/1p10asi
@r_devops
I do not know what is going wrong and I am desperate for help. I cannot build an EKS Cluster for whatever reason and I cannot figure it out.

Hello,

I'm attempting to get into DevOps, and I'm trying to build a personal project as a way to learn and understand DevOps stuff.

My goal is to build an EKS cluster via Terraform, set up a prod and dev environment, and then slap in a dumb little website and load balance it.

I have followed EVERY TUTORIAL I COULD FIND and every single time, they give me code. I either download their code or set it up EXACTLY as they do (including the tutorial from Terraform themselves!) and for whatever reason, my EC2 instances NEVER JOIN AS NODES. It always always ALWAYS gives me the issue type of NodeCreationFailure.

I discovered that if I add the vpc-cni addon to the cluster, suddenly it works and everything is happy. So I thought maybe all I have to do in Terraform is specify that it should add the vpc-cni add-on before compute is built in the cluster and it solves everything.

BUT THEN I RAN INTO A NEW PROBLEM. The vpc-cni add-on ALWAYS finds conflicts, even on a new cluster, and will not install. I have tried every single thing I can try in Terraform to make it so that it will run with OVERRIDE on the conflicts, but it is not working. No matter which way I do it, I cannot set it to override, and therefore the vpc-cni addon can never be added to the cluster via Terraform.
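
One way to narrow this down outside Terraform is to install the add-on directly through the EKS API with conflict resolution set to overwrite. A minimal sketch, assuming a boto3 EKS client is passed in (`boto3.client("eks")`); the function name and the cluster name in the usage note are hypothetical, and this is a troubleshooting aid, not a replacement for the Terraform setup:

```python
def install_vpc_cni(eks_client, cluster_name):
    """Ask EKS to install the vpc-cni add-on, overwriting conflicting settings.

    `eks_client` is assumed to be a boto3 EKS client, i.e. boto3.client("eks").
    """
    response = eks_client.create_addon(
        clusterName=cluster_name,
        addonName="vpc-cni",
        # Overwrite conflicting field values instead of failing on them.
        resolveConflicts="OVERWRITE",
    )
    # The response carries the add-on description, including its status.
    return response["addon"]["status"]
```

Called as `install_vpc_cni(boto3.client("eks"), "my-cluster")`, this should report a status such as `CREATING`. If the add-on installs this way but not through Terraform, the difference is probably in how the add-on resource is configured there; recent AWS provider versions appear to expose the same setting on `aws_eks_addon` as `resolve_conflicts_on_create`, which is worth checking against the provider docs for the version in use.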

I do not know what else I can do. I have tried everything and looked at every possible resource. This is driving me absolutely insane because I cannot find anything anywhere that solves my problem.

Please, if you know how to fix this, or at the very least, if you know how to help me troubleshoot this, please help me. I just want to get this project working so I can get experience. This is the first step and I'm already failing.

https://redd.it/1p11mnv
@r_devops
Is there a way to create jobs that I can trigger with certain parameters in GitHub Actions?

I've used Jenkins for a while, and sometimes other teams we worked with needed to e.g. onboard a client, and we created a Jenkins job that takes parameters (relating to their details) and runs a certain number of tasks for them to automate the onboarding process.

Is such a thing possible in GitHub Actions?

I'm thinking of things such as this: let's say I want to hook up two VPCs. I just go to the job, input the ID and CIDR range of VPC 1 and the ID and CIDR range of VPC 2, and it automatically makes the API calls to create a Peering Connection between the two and updates their respective route tables.

Or I want to whitelist a client's IP in our AWS WAF, so you input the parameter, and it runs the job. As far as I can see, there is no way to feed a parameter into a job in GitHub Actions?

Any advice would be much appreciated.
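
For what it's worth, GitHub Actions documents a `workflow_dispatch` trigger that accepts named inputs, and a workflow with that trigger can be started from the Actions tab or through the REST API. A minimal sketch of the API side for the VPC peering example, assuming a hypothetical workflow file `peer-vpcs.yml` that declares four `workflow_dispatch` inputs with the names used below; the owner, repo, and token values are placeholders. The request is only built here, not sent:

```python
import ipaddress
import json
import urllib.request

API = "https://api.github.com"


def peering_inputs(vpc1_id, vpc1_cidr, vpc2_id, vpc2_cidr):
    """Validate the parameters and return them as workflow inputs.

    The input names are hypothetical and must match the workflow file.
    """
    net1 = ipaddress.ip_network(vpc1_cidr)
    net2 = ipaddress.ip_network(vpc2_cidr)
    # AWS does not allow peering between VPCs with overlapping CIDR blocks.
    if net1.overlaps(net2):
        raise ValueError(f"{vpc1_cidr} overlaps {vpc2_cidr}")
    return {
        "vpc1_id": vpc1_id,
        "vpc1_cidr": str(net1),
        "vpc2_id": vpc2_id,
        "vpc2_cidr": str(net2),
    }


def build_dispatch_request(owner, repo, workflow_file, ref, inputs, token):
    """Build (but do not send) a workflow_dispatch request for the REST API."""
    url = f"{API}/repos/{owner}/{repo}/actions/workflows/{workflow_file}/dispatches"
    body = json.dumps({"ref": ref, "inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )
```

Passing the resulting request to `urllib.request.urlopen` would trigger the run; inside the workflow, the steps would read the values from the `inputs` context and make the AWS calls.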

https://redd.it/1p131v1
@r_devops
Wrote a blog about things to focus on when starting a new DevEx role

Hey everyone! I've been working in the platform engineering/devex space for about 3 years now. Based on what I've heard from the community and my own experiences I put together a guide of things to focus on in the first 30 days of starting a new role. Hope this helps!

Read here: https://metalbear.com/blog/devex-engineer/

https://redd.it/1p131e9
@r_devops