How do you know a CloudFormation change won’t break something subtle?
You change one resource.
The stack deploys successfully.
Nothing errors.
But something downstream breaks.
How do you catch that before deploy?
Or do you just accept the risk?
Curious how people think about this in practice.
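One partial way to catch this before deploy is to create a change set and review it, since it lists which resources would be modified, replaced, or removed. Below is a minimal sketch in Python, assuming the input is the `Changes` list from CloudFormation's DescribeChangeSet response. The function name `risky_changes` is made up for illustration. It only flags removals and possible replacements inside the stack, so it will not see breakage in systems that merely depend on the stack.

```python
from typing import Any


def risky_changes(changes: list[dict[str, Any]]) -> list[str]:
    """Return one line per change that removes or may replace a resource.

    `changes` is assumed to have the shape of the "Changes" list in a
    CloudFormation DescribeChangeSet response.
    """
    flagged = []
    for change in changes:
        rc = change.get("ResourceChange", {})
        action = rc.get("Action")
        # "Replacement" is only reported for Modify actions.
        replacement = rc.get("Replacement", "False")
        may_replace = action == "Modify" and replacement in ("True", "Conditional")
        if action == "Remove" or may_replace:
            flagged.append(
                f"{rc.get('LogicalResourceId')} ({rc.get('ResourceType')}): "
                f"action={action}, replacement={replacement}"
            )
    return flagged
```

Anything this returns is worth a manual look before executing the change set, because a replaced resource usually gets a new physical ID that downstream consumers may still be pointing at.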
https://redd.it/1pzu7dl
@r_devops
Reddit
From the devops community on Reddit
Explore this post and more from the devops community
Does anyone here use RapidAPI? Having issues making a payment
I'm trying to add my card to purchase a subscription, but my card keeps getting declined. So I tried Klarna as a pay-later option, and that was declined too. Then I used Affirm: the loan was charged, but the payment was blocked by RapidAPI. The only explanation I can think of is that I was making API calls from my laptop over a hotspot, so maybe RapidAPI treated this as a proxy and decided to block me from making payments?
https://redd.it/1q00zkz
@r_devops
I built a browser extension for managing multiple AWS accounts
I wanted to share this browser extension I built a few days ago. I built it to solve my own problem while working with different clients’ AWS environments. My password manager was not very helpful, as it struggled to keep credentials organized in one place and quickly became messy.
So I decided to build a solution for myself, and I thought I would share it here in case others are dealing with a similar issue.
The extension is very simple and does the following:
- Stores AWS accounts with nicknames and color coding
- Displays a colored banner in the AWS console to identify the current account
- Supports one-click account switching
- Provides keyboard shortcuts (Cmd or Ctrl + Shift + 1 to 5) for frequently used accounts
- Allows importing accounts from CSV or `~/.aws/config`
- Groups accounts by project or client
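To give a rough idea of what importing from `~/.aws/config` involves, here is a minimal sketch. It is written in Python purely for illustration and is not the extension's actual code. The function name `load_profiles` and the choice of which keys to keep are assumptions.

```python
import configparser


def load_profiles(config_text: str) -> list[dict[str, str]]:
    """Extract named profiles from the text of an AWS CLI config file."""
    # Interpolation is disabled so a literal "%" in a value cannot raise.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)

    accounts = []
    for section in parser.sections():
        # Named profiles look like "[profile name]"; the default is "[default]".
        if section == "default":
            name = "default"
        elif section.startswith("profile "):
            name = section[len("profile "):].strip()
        else:
            continue  # skip other sections, e.g. "[sso-session ...]"

        entry = {"nickname": name}
        # Keep only a few keys that help identify the account (an assumption).
        for key in ("sso_account_id", "role_arn", "region"):
            if parser.has_option(section, key):
                entry[key] = parser.get(section, key)
        accounts.append(entry)
    return accounts
```

Each returned entry could then be given a color and a group in the UI.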
I have currently published it on the Firefox Store:
https://addons.mozilla.org/en-US/firefox/addon/aws-omniconsole/
The source code is also available on GitHub:
https://github.com/mraza007/aws-omni
https://redd.it/1q02rc4
@r_devops
Is it just me or are some KodKloud course materials AI-generated?
Been using KodeKloud for a while now — love the hands-on labs and sandbox environments, they're genuinely useful for practical learning.
But I've started noticing some of the written course content has all the hallmarks of AI-generated text:
- Forced analogies every other paragraph ("think of it like a VIP list...")
- Formulaic transitions ("First things first," "Next up," "Time for a test run")
- Repeated phrases/typos that suggest no human reviewed it ("violations and violations," "real-world world scenario")
- Generic safety disclaimers at the end
Combined with other production issues I've noticed — choppy video edits, inconsistent audio quality, pixelated graphics, cropped screenshots cutting off text — it feels like they're prioritizing quantity over quality.
Anyone else noticing this? For what we pay, I'd expect better QA on the content. The practical stuff is solid but the courseware itself feels rushed.
EDIT: Typo in the title, oops, KodeKloud.
https://redd.it/1q04riy
@r_devops
Artifactory nginx replacement
I am hosting Artifactory on EKS with the nginx ingress controller for URL rewrites. Since the nginx ingress controller is being retired, what should I use instead? My first thought is to use ALB because it now supports URL rewrites. Any other options?
Please let me know your opinions and experience.
Thank you.
https://redd.it/1q071hx
@r_devops
Stuck on the Java 8 / Spring Boot 2 upgrade. Do you need a "Map" or a "Driver"?
We are currently debating how to handle a massive legacy migration (Java/Spring) that has been postponed for years. The team is paralyzed because nobody knows the blast radius or the exact effort involved.
We are trying to validate what would actually unblock teams in this situation.
The Hypothetical Solution:
Imagine a "Risk Intelligence Service" where you grant read-access to the repo, and you get back a comprehensive Upgrade Strategy Report.
It identifies exactly what breaks, where the test gaps are, and provides a step-by-step migration plan (e.g., "Fix these 3 libs first, then upgrade module X").
My question to Engineering Managers / Tech Leads:
If you had budget ($3k-$10k range) to solve this headache, which option would you actually buy?
- Option A (The Map): "Just give us the deep-dive analysis and the plan. We have the devs, we just need to know exactly what to do so we don't waste weeks on research."
- Option B (The Driver): "I don't want a report. I want you to come in, do the grunt work (refactoring/upgrading), and hand me a clean PR."
- Option C (Status Quo): "We wouldn't pay for either. We just accept the pain and do it manually in-house."
Trying to figure out if the bottleneck is knowledge (risk assessment) or capacity (doing the work).
https://redd.it/1q07zt7
@r_devops
The cognitive overhead of cloud infra choices feels under-discussed
Curious how people here think about this from an ops perspective.
We started on AWS (like most teams), and functionally it does everything we need. That said, once you move past basic usage, the combination of IAM complexity, cost attribution, and compliance-related questions adds a non-trivial amount of cognitive overhead. For context, our requirements are fairly standard: VMs, networking, backups, and some basic automation, nothing particularly exotic.
Because we’re EU-focused, I’ve been benchmarking a few non-hyperscaler setups in parallel, mostly as a sanity check to understand tradeoffs rather than as a migration plan. One of the environments I tested was a Swiss-based IaaS (Xelon), primarily to look at API completeness, snapshot semantics, and what day-2 operations actually feel like compared to AWS.
The experience was mixed in predictable ways: fewer abstractions and less surface area, but also a smaller ecosystem and less polish overall. It did, however, make it easier to reason about certain operational behaviors.
Idk what the “right” long-term answer is, but I’m interested in how others approach this in practice: Do you default to hyperscalers until scale demands otherwise, or do you intentionally optimize for simplicity earlier on?
https://redd.it/1q07j39
@r_devops
How did you get into DevOps and what actually mattered early on?
I’m learning DevOps right now and trying to be smart about where I spend my time.
For people already working in DevOps:
- What actually helped you get your first role?
- What did you stress about early on that didn’t really matter later?
- When did you personally feel “ready” for a job versus just learning tools?
One thing I keep thinking about is commands. I understand concepts pretty well, but I don’t always remember exact syntax. In real work, do you mostly rely on memory, or is it normal to lean on docs, old scripts, and Google as long as you understand what you’re doing?
I’m more interested in real experiences than generic advice. Would love to hear how it was for you.
https://redd.it/1q09vxa
@r_devops
Reflections on DevOps over the past year
This is more of a thinking-out-loud post than a hot take.
Looking back over the past year, I can’t shake the feeling that DevOps has gotten both more powerful and more fragile at the same time.
We have better tooling than ever:
- managed services everywhere
- more automation
- more abstraction
- AI creeping into workflows
- dashboards, alerts, pipelines for everything
And yet… a lot of the incidents I’ve seen still come down to the same old things.
Misconfigurations (still rampant at my company).
Shared failure domains that nobody realized were shared.
Deployments that technically “worked” but took the system down anyway (thinking of the AWS one specifically).
Observability that only told us what happened after users noticed.
It feels like we keep adding layers on top of systems without always revisiting the fundamentals underneath them.
I’ve been part of incidents where:
- redundancy existed on paper, but not in reality
- CI/CD pipelines became a bigger risk than the code changes themselves (felt this personally since our team took control of the cloud pipelines at my company)
- costs exploded quietly until someone finally asked “why is this so expensive?”
- security issues weren’t exotic attacks — just permissions that were too broad
None of this is new. But it feels more frequent, or at least more visible.
I’m genuinely curious how others see it:
- Do you feel like the DevOps role is shifting?
- Are we actually solving different problems now, or just re-solving the same ones with new tools?
- Has the push toward speed and abstraction made things easier… or just harder to reason about?
Not looking for definitive answers — just interested in how others experienced this past year.
https://redd.it/1q0cvl1
@r_devops
How do you track your LLM/API costs per user?
Building a SaaS with multiple LLMs (OpenAI, Anthropic, Mistral) + various APIs (Supabase, etc).
My problem: I have zero visibility on costs.
* How much does each user cost me?
* Which feature burns the most tokens?
* When should I rate-limit a user?
Right now I'm basically flying blind until the invoice hits.
Tried looking at Helicone/LangFuse but not sure I want a proxy sitting between me and my LLM calls.
How do you guys handle this? Any simple solutions?
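One proxy-free direction for the questions above is to record usage in-process right after each call. This assumes each provider's response reports input and output token counts. A minimal sketch follows. The model names and prices in `PRICES` are hypothetical placeholders, and `CostLedger` is an illustrative name, not an existing library.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical prices in USD per 1M tokens: (input, output).
# Replace these with the real rates of the models in use.
PRICES = {
    "model-a": (1.00, 2.00),
    "model-b": (0.50, 1.50),
}


@dataclass
class CostLedger:
    """Accumulates estimated spend per user and per feature, in memory."""

    by_user: dict = field(default_factory=lambda: defaultdict(float))
    by_feature: dict = field(default_factory=lambda: defaultdict(float))

    def record(self, user_id: str, feature: str, model: str,
               input_tokens: int, output_tokens: int) -> float:
        """Add one call's estimated cost to the totals and return it."""
        in_price, out_price = PRICES[model]
        cost = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
        self.by_user[user_id] += cost
        self.by_feature[feature] += cost
        return cost

    def over_budget(self, user_id: str, budget_usd: float) -> bool:
        """True once a user's accumulated cost reaches the budget."""
        return self.by_user[user_id] >= budget_usd
```

In a real service the totals would live in a database rather than in memory, and `over_budget` could be the trigger for rate-limiting a user.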
https://redd.it/1q0ecii
@r_devops
I have been working on a self-hosted GitHub Actions runner orchestrator
Hey folks,
I have been working on CIHub, an open-source project that lets you run self-hosted GitHub Actions runners on your own bare-metal servers using Firecracker. Each job runs in its own isolated VM for better security.
It integrates directly with standard GitHub Actions workflows, allowing you to specify runner resources (e.g. by adding the label `runs-on: cihub-2cpu-4gb-amd64`), and includes a server + agent setup for scaling across machines.
The project is still early and under active development, and I'd really appreciate any feedback or ideas!
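To illustrate how a label such as `cihub-2cpu-4gb-amd64` could map to VM resources, here is a small sketch. The pattern `cihub-<N>cpu-<M>gb-<arch>` is inferred from that single example and is an assumption. `RunnerSpec` and `parse_label` are hypothetical names, and this is not CIHub's actual parser.

```python
import re
from typing import NamedTuple, Optional


class RunnerSpec(NamedTuple):
    cpus: int
    memory_gb: int
    arch: str


# Assumed label shape: cihub-<cpus>cpu-<memory>gb-<arch>
_LABEL = re.compile(r"^cihub-(\d+)cpu-(\d+)gb-([a-z0-9]+)$")


def parse_label(label: str) -> Optional[RunnerSpec]:
    """Return the requested resources, or None if the label does not match."""
    match = _LABEL.match(label)
    if match is None:
        return None
    return RunnerSpec(int(match.group(1)), int(match.group(2)), match.group(3))
```

Returning `None` for labels that do not match lets an orchestrator ignore jobs meant for other runners.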
GitHub: https://github.com/getcihub/cihub
https://redd.it/1q0gh41
@r_devops
How Meta evolved the DevOps toolchain for eBPF
Every server at Meta runs eBPF, and 50% of them run over 180 programs. They needed to rethink their CI/CD pipeline to handle challenges like attaching programs to multiple attach points and deploying programs across more than 100 kernel variants.
Talk: https://www.youtube.com/watch?v=wXuykaYSFCQ&t=818s
Slides: https://static.sched.com/hosted_files/kccncna2025/68/BPF%20CICD%20KubeCon%20Talk.pdf?_gl=1*usbsj8*_gcl_au*MjExMTAzMDkxNi4xNzY3MDQ0NDcy*FPAU*MjExMTAzMDkxNi4xNzY3MDQ0NDcy
https://redd.it/1q0huye
@r_devops
YouTube
Fast and the Furious: CICD Pipeline for eBPF Programs at Meta S... Theophilus Benson & Prankur Gupta
Boss conflict with Scrum Relations during Christmas (Xmas-Nondenominational winter-solstice festivities) Holiday Season - PSU Course Focus
Hi all, hope you're enjoying Christmas (Xmas-Nondenominational winter-solstice festivities). Wanted to hear your thoughts on this situation. My boss and I were passive-aggressively arguing during the latest sprint meeting about new operation methodologies leading into Q1 of 2026. For background: as scrum master of my sector, we currently operate with a 70% interest towards improving ART (Agile Release Train) performance, a 25% interest in current burndown navigation rounds, a 3.8% (t.l.d.r this is calculated by total story points over an averaged period of time over three to four quarters divided by total confidence metric), and a 1.3% interest in handling "team issues" (story point assignment, workplace relationships, failed deadlines, simple stuff like that). My boss believes we should average out the interest relationship at 5% (t.l.d.r this is calculated by total story points over an averaged period of time over three to four quarters divided by total confidence metric) rather than 3.8%. The internet is telling me this is due to a knowledge deficit caused by my non-acquisition of USUX scrum focus within the PSU scrum course (I will admit, I was watching the newest Marvel movie (Fantastic Four anyone???) and planning my Disney vacation while taking that part of the course; I tried getting my partner to screen record, but they were getting the new booster vaccine).
Has anyone run into something similar in regard to priority assignments? Why specifically at the end of the year (for Gregorian calendar users) and not the end of the fiscal year (for American taxpayers)? Also, what scrum cert would you recommend for a 15-year-old child who is interested in turning his startup into a fully functioning scrum environment?
https://redd.it/1q0iwlm
@r_devops