Are these real production war scenario trainings legit? Has anyone bought this?
I came across this training on LinkedIn. They say they teach real production war scenarios; the site says "Master production-grade tools, fire-drill scenarios, and cross-cloud architectures. Every skill here is forged through real outages, real deployments, and real engineering war rooms." https://elite.infrathrone.xyz/
Does anyone know anything about it? How is it?
https://redd.it/1p08yh8
@r_devops
Infrathrone
Infrathrone: The DevOps War Room
Where Prod Goes Down, and You Rise Up. Elite DevOps & SRE training with real-world production simulations.
Can you really automate QA testing without headcount or is everyone just lying?
Serious question, because I'm tired of the LinkedIn hype. Every other post is someone claiming they "automated 90% of QA" and "eliminated manual testing", but then you talk to them and they still have a QA team.
Here's my situation: we have 3 QA engineers for a team of 25 devs. They're constantly underwater and we keep getting bugs in production anyway. Leadership wants to "automate QA" instead of hiring more people, but I'm skeptical this is actually possible. It feels like one of those things that works in theory but not in practice.
I've seen test automation frameworks, we use some already, but they still need someone to write and maintain the tests and they don't catch the weird edge cases that a human would. Plus our integration tests are flaky as hell and take forever to run.
So what's the reality here? Can you actually reduce headcount with automation or is it just shifting the work around? And if you did pull this off, what did you use? Not interested in solutions that require hiring a separate automation team, that defeats the whole point.
https://redd.it/1p0a727
@r_devops
Reddit
From the devops community on Reddit
Explore this post and more from the devops community
Is the internet really decentralized, or just fragile?
Most people don’t realize this: the internet they think is distributed is actually held together by a handful of infrastructure chokepoints. Cloudflare sneezes, and half the web catches a fever. We’ve built our digital world on a fragile stack of AWS, Cloudflare, Google Cloud, and a few telcos.
When one fails, everything collapses like dominoes. The internet wasn’t supposed to be this vulnerable.
Edit: By “Internet” I meant what regular users experience daily: the apps, websites, payments, and services they rely on.
https://redd.it/1p0bcub
@r_devops
Cloudflare outage
Well, you all probably know about this, but for those who don’t:
https://www.techradar.com/pro/live/a-cloudflare-outage-is-taking-down-parts-of-the-internet
https://redd.it/1p0bv5p
@r_devops
TechRadar
A major Cloudflare outage took down large parts of the internet - X, ChatGPT and more were affected, but all recovered now
Cloudflare issues fixed following major outage
Do you have a backup plan in case your provider goes down?
I've been seeing issues with Cloudflare for almost 45 minutes. I didn't prepare any plan for this case, and I can't move my DNS because Namecheap is also down. How do you prepare for such cases?
https://redd.it/1p0b4tf
@r_devops
A few weeks back Docker Hub was down, along with a bunch of others. Now Cloudflare
Can someone senior please tell us wtf is going on lately?
How is this happening? This sounds like a devops problem, but it could be a physical IT problem as well, like data center failures.
Any info about these outages?
As an up-and-coming devops engineer, I would like to be ready for anything, and this is interesting to me, since it seems there are always surprises in this field.
https://redd.it/1p0aa5g
@r_devops
Datadog? Eval
Hello! I’m interviewing for a role at Datadog and want to get some candid feedback on their product. If you use it in any capacity it’d be great to hear the good, bad, and ugly. How are you using it? How has it impacted your day to day or overall strategy? What are the downsides? I know there are already threads in here but I want to be sure I get any feedback on new feature launches or recent changes. Thanks in advance!
https://redd.it/1p0ffnz
@r_devops
Curious About Internal Workflows During Massive Outages
With the current Cloudflare outage going on, I’ve been wondering what the internal workflow looks like inside large tech companies during incidents of this scale.
How do different teams coordinate when something huge breaks?
Do SRE/DevOps/Network teams all jump in at once or does it follow a strict escalation path?
And how is communication handled across so many teams and time zones?
https://redd.it/1p0bsur
@r_devops
GitHub is down!
Does anyone have any more information? https://www.githubstatus.com/
https://redd.it/1p0o9zb
@r_devops
Githubstatus
GitHub Status
Welcome to GitHub's home for real-time and historical data on system performance.
Trying to transition to DevOps
Hi all, pretty new here and was hoping for some advice.
Context: By trade I’m currently a civil design engineer, with my uni background also in civil engineering. I’ve been doing it for about 2 years now.
Recently I’ve been really interested in DevOps and I’m determined to transition my career. I started by learning Python and I’m pretty confident at an intermediate level. I’ve also done my first Azure certification (AZ-900) to get my fundamentals right. I have also done some networking fundamentals and I’m pretty confident in my understanding of the OSI layers. I’m currently working on getting my admin associate certification (AZ-104). My plan is to then learn Terraform, as well as Azure DevOps or GitHub Actions (leaning towards GitHub Actions). I’m learning PowerShell slowly on the side right now too.
Outside of my core learning I’ve done some high-level research on containerization and orchestration too, knowing I’ll have to focus on those when the time comes.
Just wanted to get thoughts from people who already do it and a steer on what would help, thanks.
https://redd.it/1p0mx4b
@r_devops
AI and Cloud service perception survey for University (Anonymous)
Hello! If any of you lovely people have a couple of minutes to spare, could you please do my survey? It's for a marketing campaign I'm making at University. Cheers! https://forms.gle/Gmr4hqbnvRq6LxQz9
https://redd.it/1p0qgka
@r_devops
Google Docs
Cloud service and AI perception survey
This survey explores how professionals perceive and use cloud and AI tools, with a focus on Google Cloud and Google Gemini. Your responses are anonymous and will help identify real user needs, barriers, and expectations.
Is there anything new to learn in 2025?
Aside from Kubernetes and Terraform, is there anything to learn as a software developer or DevOps engineer? What would you suggest and why?
https://redd.it/1p0sfmj
@r_devops
Hello devops fam, I just passed my AWS SAA and wanna go straight to learning DevOps
Hello fam, I just passed my AWS course. Are there any YouTube channels, Udemy courses, etc. that you'd recommend for learning DevOps? Any recommendations are greatly appreciated.
https://redd.it/1p0t19c
@r_devops
What a day...
I spent the last 3 weeks working on a project management pipeline that was heavy on GitHub Actions, and was set to demo it today in a huge meeting in front of all of the project managers and developers. I started the demo at 3:30 EST this afternoon.
I started off at the user creation command line and created a new user, switched to them and ran a custom SSH and GitHub config wizard I wrote which abstracted away the burdens of dealing with configuring those for PMs.
It worked flawlessly. It ran the check, verified everything was good, pulled repos. It was golden.
I went further into the systems and went to have it send some project management files into a branch to be picked up by CI....
Suddenly git was broken. I was flabbergasted.
It was 3:40, GitHub was down. I sat there like an idiot fudging it for 10 minutes until the meeting moved to another presentation....
It was devastating....
What a day fellas (fellettes), what a day...
https://redd.it/1p0vx27
@r_devops
JWT Algorithm Confusion: Turning RS256 Tokens into HS256 Disasters 🔄
https://instatunnel.my/blog/jwt-algorithm-confusion-turning-rs256-tokens-into-hs256-disasters
https://redd.it/1p0wy52
@r_devops
InstaTunnel
JWT Algorithm Confusion: Exploiting RS256 to HS256 Token
Discover how JWT algorithm confusion lets attackers forge tokens by switching RS256 to HS256, using public keys as HMAC secrets. Learn real-world exploits
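The attack summarized above can be illustrated with a minimal, standard-library-only sketch. It is a toy under stated assumptions, not the linked article's code: `naive_verify`, `pinned_verify`, and the placeholder PEM are hypothetical names, and the actual RS256 signature check is left out because it needs a crypto library. The point is only to show why a verifier that trusts the token's own `alg` header accepts an HS256 token signed with the public key bytes, while a verifier that pins the expected algorithm rejects it.

```python
import base64
import hashlib
import hmac
import json

# Placeholder standing in for the server's *public* RSA key (hypothetical, not a real key).
PUBLIC_KEY_PEM = b"-----BEGIN PUBLIC KEY-----\nplaceholder\n-----END PUBLIC KEY-----\n"


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Decode base64url, restoring the padding that JWTs strip."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_hs256(payload: dict, key: bytes) -> str:
    """Build an HS256 token, using an arbitrary byte string as the HMAC key."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = b64url(json.dumps(payload).encode())
    signing_input = f"{header}.{body}"
    sig = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(sig)}"


def hs256_ok(token: str, key: bytes) -> bool:
    """Recompute the HMAC over header.payload and compare in constant time."""
    signing_input, _, sig = token.rpartition(".")
    expected = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return hmac.compare_digest(expected, b64url_decode(sig))


def token_alg(token: str) -> str:
    """Read the (attacker-controlled) alg field from the token header."""
    return json.loads(b64url_decode(token.split(".")[0])).get("alg", "")


def naive_verify(token: str, verification_key: bytes) -> bool:
    """VULNERABLE: the token's header decides how verification_key is used."""
    if token_alg(token) == "HS256":
        return hs256_ok(token, verification_key)
    raise NotImplementedError("RS256 check omitted from this sketch")


def pinned_verify(token: str, verification_key: bytes, expected_alg: str = "RS256") -> bool:
    """Safer: reject any token whose alg is not the one the server expects."""
    if token_alg(token) != expected_alg:
        return False
    raise NotImplementedError("RS256 check omitted from this sketch")


# The attacker knows the public key and uses its bytes as an HMAC secret.
forged = sign_hs256({"sub": "admin"}, PUBLIC_KEY_PEM)
print(naive_verify(forged, PUBLIC_KEY_PEM))   # True: forged token accepted
print(pinned_verify(forged, PUBLIC_KEY_PEM))  # False: algorithm mismatch rejected
```

In a real service the same idea usually means passing an explicit allow-list of algorithms to whatever JWT library is in use, rather than hand-rolling verification as done here.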
When does Policy-as-Code become "The Slow Lane" for developers?
Hey r/devops,
I'm working on scaling up our internal developer platform (IDP) and one of the biggest points of friction is how we enforce DevSecOps and compliance policies without killing our velocity. We're trying to shift left, but it feels like we've just shifted all the pipeline friction right onto the developer's lap.
We moved from a few post-merge human approval tollgates to an aggressive Policy-as-Code strategy using tools like Open Policy Agent (OPA) with Rego on every pull request (PR).
The result? Our security posture is fantastic. Our IaC drift is near zero. But our average PR time is up 25%, and the team is starting to view the pipeline as an adversary, not an enabler.
The checks are running: SAST, SCA, Terrascan, custom checks for naming conventions, and resource tagging compliance. All before merge. The problem is that a failed low-severity SAST finding can hold up a critical patch that has a clean functional change.
My burning question to the community:
How are you balancing the enforcement of non-critical-but-mandatory policies (like resource tagging or specific naming conventions) in the pipeline?
1. Do you have an explicit 'fail fast/fail hard' policy only for critical security issues, and let minor compliance issues run through the main pipeline, alerting to a dashboard for follow-up? (i.e., making them blocking in pre-prod but non-blocking in the main CI?)
2. Are you using a separate, performance-optimized "compliance-only" pipeline that runs less frequently, thereby unblocking the core CI/CD flow?
I’m looking for actual tooling or architectural patterns that allow for selective blocking that doesn't rely on us writing custom logic in every single Jenkinsfile/GitHub Action workflow.
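One way to picture the "selective blocking" idea from option 1 is a single shared gate script that every pipeline calls after the scanners run, so the severity policy lives in one place instead of in each Jenkinsfile or workflow. The sketch below is only an illustration under assumptions: the findings format (`rule` and `severity` keys) and the names `BLOCKING_SEVERITIES`, `split_findings`, and `gate` are hypothetical, and real scanners emit their own formats that would need normalizing first.

```python
# Hypothetical policy: only these severities fail the build.
BLOCKING_SEVERITIES = {"critical", "high"}


def split_findings(findings, blocking=BLOCKING_SEVERITIES):
    """Partition findings into (blockers, advisories) by severity.

    A finding with a missing severity is treated as blocking, so a
    malformed scanner report cannot silently pass the gate.
    """
    blockers, advisories = [], []
    for finding in findings:
        severity = str(finding.get("severity", "")).lower()
        if severity in blocking or not severity:
            blockers.append(finding)
        else:
            advisories.append(finding)
    return blockers, advisories


def gate(findings):
    """Print a report and return the exit code a CI step would use."""
    blockers, advisories = split_findings(findings)
    for finding in advisories:
        print(f"WARN  [{finding['severity']}] {finding.get('rule', '?')}: reported, not blocking")
    for finding in blockers:
        print(f"BLOCK [{finding.get('severity', 'unknown')}] {finding.get('rule', '?')}")
    return 1 if blockers else 0


# Illustrative data only: a tagging issue should not block, an injection finding should.
SAMPLE = [
    {"rule": "missing-cost-center-tag", "severity": "low"},
    {"rule": "sql-injection", "severity": "critical"},
]
exit_code = gate(SAMPLE)
print("exit code:", exit_code)
```

A pipeline step would pass the script's return value to the process exit status, and the advisories could be shipped to a dashboard for follow-up instead of just printed.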
https://redd.it/1p0z7dp
@r_devops
Antigravity IDE for DevOps: any feedback on integrations & automation?
Anyone tried using Antigravity by Google for DevOps workflows? I noticed the AI can suggest fixes/refactors and the IDE supports agent-like automation (e.g., review agent, code agent). Integration with Gemini 3 and VS Code style interface helped me resurrect a legacy web app.
- Anyone tested Chrome extension/API or CI/CD integrations?
- How's the support for Docker, containerized dev flows, pipelines?
- Is the multi-agent system practical for DevOps use cases?
https://redd.it/1p10asi
@r_devops
I do not know what is going wrong and I am desperate for help. I cannot build an EKS Cluster for whatever reason and I cannot figure it out.
Hello,
I'm attempting to get into DevOps, and I'm trying to build a personal project as a way to learn and understand DevOps stuff.
My goal is to build an EKS cluster via Terraform, set up a prod and dev environment, and then slap in a dumb little website and load balance it.
I have followed EVERY TUTORIAL I COULD FIND and every single time, they give me code. I either download their code or set it up EXACTLY as they do (including the tutorial from Terraform themselves!) and for whatever reason, my ec2 instances NEVER JOIN AS NODES. It always always ALWAYS gives me the issue type of NodeCreationFailure.
I discovered that if I add the vpc-cni addon to the cluster, suddenly it works and everything is happy. So I thought maybe all I have to do in Terraform is specify that it should add the vpc-cni add-on before compute is built in the cluster and it solves everything.
BUT THEN I RAN INTO A NEW PROBLEM. The vpc-cni add-on ALWAYS finds conflicts, even on a new cluster, and will not install. I have tried every single thing I can try in Terraform to make it so that it will run with OVERRIDE on the conflicts, but it is not working. No matter which way I do it, I cannot set it to override, and therefore the vpc-cni addon can never be added to the cluster via Terraform.
I do not know what else I can do. I have tried everything and looked at every possible resource. This is driving me absolutely insane because I cannot find anything anywhere that solves my problem.
Please, if you know how to fix this, or at the very least, if you know how to help me troubleshoot this, please help me. I just want to get this project working so I can get experience. This is the first step and I'm already failing.
https://redd.it/1p11mnv
@r_devops
Is there a way to create jobs that I can trigger with certain parameters in GitHub Actions?
I've used Jenkins for a while, and sometimes other teams we worked with needed to e.g. onboard a client, and we created a Jenkins job that takes parameters (relating to their details) and runs a certain number of tasks for them to automate the onboarding process.
Is such a thing possible in GitHub Actions?
I'm thinking of things such as, let's say I want to hook up two VPCs: I just go to the job, I input the ID and CIDR range of VPC 1 and the ID and CIDR range of VPC 2, and it automatically makes the API calls to create a Peering Connection between the two and updates their respective route tables.
Or I want to whitelist a client's IP in our AWS WAF, so you input the parameter, and it runs the job. As far as I can see, there is no way to feed a parameter into a job in GitHub Actions?
Any advice would be much appreciated.
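For what it's worth, GitHub Actions does have a parameterized manual trigger: a workflow that declares `on: workflow_dispatch` with `inputs` can be started from the "Run workflow" button in the Actions tab or through the REST API. The sketch below only builds the API request so the shape is visible. The org, repo, workflow file name, and input names are all hypothetical, and the token is a dummy.

```python
import json
import urllib.request

# The workflow file itself (YAML, shown here as a comment) would declare the inputs, e.g.:
#
#   on:
#     workflow_dispatch:
#       inputs:
#         vpc_a_id:
#           description: "ID of the first VPC"
#           required: true
#
# and steps would read them as ${{ inputs.vpc_a_id }}.


def build_dispatch_request(owner, repo, workflow_file, ref, inputs, token):
    """Build (but do not send) a 'create a workflow dispatch event' request."""
    url = (
        f"https://api.github.com/repos/{owner}/{repo}"
        f"/actions/workflows/{workflow_file}/dispatches"
    )
    body = json.dumps({"ref": ref, "inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical names throughout; nothing is sent over the network here.
request = build_dispatch_request(
    owner="example-org",
    repo="infra-automation",
    workflow_file="vpc-peering.yml",
    ref="main",
    inputs={"vpc_a_id": "vpc-aaa111", "vpc_b_id": "vpc-bbb222"},
    token="dummy-token",
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Sending it would be a matter of `urllib.request.urlopen(request)` with a real token that has permission to run workflows in the repository. For the Jenkins-style "fill in a form" experience, the web UI button alone is usually enough.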
https://redd.it/1p131v1
@r_devops
Wrote a blog about things to focus on when starting a new DevEx role
Hey everyone! I've been working in the platform engineering/devex space for about 3 years now. Based on what I've heard from the community and my own experiences I put together a guide of things to focus on in the first 30 days of starting a new role. Hope this helps!
Read here: https://metalbear.com/blog/devex-engineer/
https://redd.it/1p131e9
@r_devops
MetalBear 🐻
Your First 30 Days as a DevEx Engineer: What to Audit and Improve
A practical 30 day audit framework for new DevEx engineers to benchmark feedback loops, reduce context switching, and eliminate outdated rituals that slow teams down.
Quarkus with Buildpacks and OpenShift Builds
How to build images for Quarkus apps with Cloud Native Buildpacks locally and in OpenShift: https://piotrminkowski.com/2025/11/19/quarkus-with-buildpacks-and-openshift-builds/
https://redd.it/1p12iih
@r_devops
Piotr's TechBlog
Quarkus with Buildpacks and OpenShift Builds - Piotr's TechBlog
In this article, you will learn how to build Quarkus application images using Cloud Native Buildpacks and OpenShift Builds