Coming from a Kubernetes-heavy SRE background and moving into AWS/ECS ops – could use some perspective
Hey all, looking for some perspective from people who’ve been around this longer than me.
I’ve been working as an SRE for just under three years now, and almost all of that time has been in Kubernetes-based environments. I spent most of my days dealing with production issues, on-call rotations, scaling problems, deployments that went sideways, and generally keeping clusters alive. Observability was a big part of my work too: Prometheus, Grafana, ELK, Datadog, and some Jaeger tracing. Basically, I was living inside k8s and the tooling around it.
I’m now interviewing for a role that’s a lot more AWS-ops heavy, and honestly it feels like a bit of a mental shift. They don’t run Kubernetes at all. Everything is ECS on AWS, and the role is much more focused on things like cost optimization, release and change management, versioning, and day-to-day production issues at the AWS service level. None of that sounds crazy to me in theory, but I can feel where my experience is thinner when it comes to AWS-native workflows, especially around ECS and FinOps.
I’m not trying to pretend I’m an AWS expert. I know how to think about capacity, failures, rollbacks, and noisy systems, but now I’m trying to translate that into how AWS actually does things. Stuff like how people really manage releases in ECS, where AWS costs usually get out of hand in real environments, and what ops teams actually look at first when something breaks in production outside of Kubernetes.
If you’ve moved from a Kubernetes-heavy setup into more traditional AWS or ECS-based ops work, I’d really like to hear how that transition went for you. What did you wish you understood earlier? What mattered way more than you expected? And what things did you overthink that turned out not to be that important?
Just trying to level myself up properly and not walk into this role blind. Appreciate any advice.
https://redd.it/1qzhbcr
@r_devops
Reddit
From the devops community on Reddit
Explore this post and more from the devops community
Vouch: earn the right to submit a pull request (from Mitchell Hashimoto)
Mitchell Hashimoto got tired of watching open-source maintainers drown in AI-generated pull requests. So he built Vouch, a contributor trust management system. The concept is almost absurdly simple: before you can submit a PR to a project using Vouch, someone already trusted has to vouch for you.
The whole thing lives in a single text file inside the repo. One username per line. A minus sign means denounced. You can parse it with grep.
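As a rough illustration of the format described above (one username per line, a leading minus sign for a denounced user), here is a minimal Python sketch. The function name `parse_vouch_list`, the sample usernames, and the handling of blank lines, `#` comments, and conflicting entries are assumptions made for illustration; this is not Vouch's actual parser or file specification.

```python
def parse_vouch_list(text):
    """Split a vouch list into (vouched, denounced) sets of usernames."""
    vouched, denounced = set(), set()
    for raw in text.splitlines():
        line = raw.strip()
        # Assumption: blank lines and '#' comments are ignored.
        if not line or line.startswith("#"):
            continue
        if line.startswith("-"):
            # A leading minus sign marks a denounced user.
            denounced.add(line[1:].strip())
        else:
            vouched.add(line)
    # Assumed policy: a denouncement wins if a name appears in both forms.
    return vouched - denounced, denounced


# Hypothetical file contents.
sample = """\
alice
bob
-mallory
"""

vouched, denounced = parse_vouch_list(sample)
print(sorted(vouched))    # ['alice', 'bob']
print(sorted(denounced))  # ['mallory']
```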
Sigstore verifies artifacts. SLSA verifies builds. Dependabot checks dependencies. None of them answer the question of whether a given person should be contributing to a project at all. That's the gap Vouch fills: contributor trust, not artifact trust.
Hashimoto designed it the same way he designed Terraform. Declarative. Human-readable. Version-controlled. Instead of .tf files for infrastructure, you get .td files for trust. Same brain, different domain.
The xz-utils backdoor is the elephant in the room. "Jia Tan" spent two years earning trust through legitimate contributions before planting a CVSS 10.0 backdoor. Vouch wouldn't have stopped that attack. But the vouch record (who vouched for them, and when) would've been visible in the git history, and the denouncement would propagate to every project subscribing to that vouch list. Less of a lock, more of a security camera.
Ghostty is already integrating it. The repo picked up 600 stars in three days. A GitHub staff member commented on the HN thread saying they'd ship changes "next week."
The concerns are real though. Gatekeeping is the obvious one. Open source is supposed to be open, and Vouch creates an explicit barrier where there wasn't one before. One HN commenter called it "social credit on GitHub." The persona gaming problem hasn't gone away either; someone could still spend months building trust before going rogue.
Hashimoto himself flags it as experimental. But it's the first serious attempt at making contributor trust visible and version-controlled.
I wrote up the full breakdown, including how Vouch compares to PGP's web of trust, Advogato, and Debian's maintainer process, here if you want the deep dive.
https://redd.it/1qzgoao
@r_devops
GitHub
GitHub - mitchellh/vouch: A community trust management system based on explicit vouches to participate.
A community trust management system based on explicit vouches to participate. - mitchellh/vouch
State of OpenTofu?
Has OpenTofu gained anything on Terraform? Has it proven itself as an alternative?
I unfortunately don't use IaC in my current deployment but I'm curious how the landscape has changed.
https://redd.it/1qz67sq
@r_devops
Need advice: trying to document an installation guide for production
Hey guys, I recently open-sourced a pretty huge self-hosted project. I've set up a docker-compose.yaml that worked fine for local deployments, but I suppose I made a lot of rookie mistakes for a production deployment guide.
I don't have much experience in DevOps except for small services and deploying websites with nginx+letsencrypt, and when people started coming to me for advice on why their setup failed, I was a bit overwhelmed.
For the last three evenings I've been trying to come to a default installation guide for a reverse proxy that would work fine for production.
So, the current setup is pretty standard:
- `docker-compose.yaml` with setup on localhost by default
- pretty much a default Go backend container
- frontend container that builds the frontend with baked-in nginx that serves the static files on `/` and sets up a localhost reverse proxy on `/api`
My initial prod setup directed people to build the images manually and to edit the `frontend/nginx.conf.template` that the frontend container uses, so that people change their servername/adjust their IP address and so on.
Well, after debugging a couple environment-specific problems that people faced trying to deploy it this way, I realized that I need to adjust the guide ASAP.
At first, I thought that I needed to remove the baked-in nginx from the frontend container and move it up to `docker-compose.yaml`, but then I read a suggestion online that I could just put another reverse proxy in front of the frontend's internal nginx.
So, my current thinking process is:
1. adjust nginx.conf.template to accept the DOMAIN and BACKENDPORT, so that they're provided by docker-compose, not changed by the user (or should the baked in nginx.conf be left untouched, without accepting those env vars, staying localhost-only?)
2. add a new container in docker-compose for prod setups - caddy with a reverse proxy in front (maybe as an override file)
Also, is it fine to mix caddy and nginx this way? or am I better off overhauling the setup entirely? If so, what's the best course of action for me?
In case someone wants to take a look: https://github.com/Vsein/Neohabit (the setup files are docker-compose.yaml, .env.example, frontend/nginx.conf.template; all of them are mentioned in the installation guide "building manually from source")
And here's what I've been trying to do: https://github.com/Vsein/Neohabit/pull/110
Anyway, sorry if this post is amateurish, I just genuinely feel like I'm wasting my time trying to do something that might be a wrong direction entirely.
https://redd.it/1qzmth6
@r_devops
GitHub
GitHub - Vsein/Neohabit: A self-hosted habit-tracker with a new approach to heatmaps, and flexible habits that happen X times in…
A self-hosted habit-tracker with a new approach to heatmaps, and flexible habits that happen X times in Y days. - Vsein/Neohabit
How do devs secure their notebooks?
Hi guys,
How do devs typically secure/monitor the hygiene of their notebooks?
I scanned about 5000 random notebooks on GitHub and ended up finding almost 30 aws/oai/hf/google keys (frankly, they were inactive, but still).
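A scan like this can be approximated in a few lines. The sketch below is not the scanner behind the numbers above; it assumes the standard `.ipynb` JSON layout (a top-level `cells` list whose entries carry a `source` field) and checks only for the `AKIA` prefix of AWS access key IDs. The function name and the sample notebook are hypothetical.

```python
import json
import re

# Long-term AWS access key IDs start with "AKIA" followed by 16 uppercase
# letters or digits. Other providers need their own patterns.
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def find_aws_key_ids(notebook_json):
    """Return (cell_index, match) pairs for key-like strings in cell sources."""
    notebook = json.loads(notebook_json)
    hits = []
    for index, cell in enumerate(notebook.get("cells", [])):
        source = cell.get("source", "")
        # The notebook format stores source as a string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        for match in AWS_KEY_ID.findall(source):
            hits.append((index, match))
    return hits


# Hypothetical notebook containing AWS's documented example key ID.
sample = json.dumps({
    "cells": [
        {"cell_type": "code", "source": ["import boto3\n"]},
        {"cell_type": "code", "source": ["key = 'AKIAIOSFODNN7EXAMPLE'\n"]},
    ]
})
print(find_aws_key_ids(sample))  # [(1, 'AKIAIOSFODNN7EXAMPLE')]
```

Note that this only looks at cell sources. Keys can also leak through saved cell outputs, which a real scan would need to cover as well.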
https://redd.it/1qzn7f2
@r_devops
Priority Dilemma: Academic GPA vs. Personal Projects in DevOps
Hi everyone,
I’m a first-year Computer Science student, and I’m currently facing a dilemma that I’d love to get your take on (especially from the recruiters and hiring managers here).
On one hand, a high GPA is often seen as a critical resource and a primary screening tool for many companies.
On the other hand, I feel that the DevOps world is highly practical.
A project that demonstrates a complete End-to-End Pipeline (using tools like GitHub Actions, AWS, Docker, K8s, Terraform, Ansible, etc.) shows hands-on toolchain knowledge and real-world application, qualities that are hard to measure through a GPA alone.
I’d like to ask about your priorities:
1. When screening for a Junior or Student position, what would make you stop and look at my CV—a 90 GPA with no projects, or an 80 GPA with a portfolio that demonstrates a deep understanding of CI/CD and IaC?
2. Do you have any tips on how to properly present such projects on a CV or in an interview to effectively reflect architectural understanding?
Thanks in advance for your insights! 🙏
https://redd.it/1qzoupy
@r_devops
What should I prepare / learn in detail before a DevOps / Cloud Engineer internship? (GitLab, Terraform, AWS)
Hi everyone,
I have a **DevOps / Cloud Engineer internship** coming up (about **4–5 months long**), and the main tools used are **GitLab, Terraform, and AWS**.
For context, I already have:
* **AWS Solutions Architect Associate**
* **Terraform Associate**
* **CKA (In progress)**
So I’m familiar with the **concepts and theory**, but I don’t have much **real hands-on / production-style experience yet**, which I’d like to work on before the internship starts.
I’d really appreciate advice from people in DevOps / cloud roles on:
* What **hands-on skills** I should focus on with:
  * **GitLab** (CI/CD pipelines, runners, YAML, etc.)
  * **Terraform** (state management, modules, best practices?)
  * **AWS** (which services matter most at intern level?)
* Any **common gaps interns usually have**, even with certs
* Things you wish you had practiced *before* your first DevOps / cloud role
I’m not trying to master everything, just want to be **useful quickly and not completely lost** on day one 😅
Any advice, learning priorities, or “focus on this, ignore that” tips would be really appreciated. Thanks!
https://redd.it/1qzrs6y
@r_devops
What decides where to run the build: Git runners or cloud build machines? Which is better in the long run if you may have multiple clouds?
We're currently using AWS CI/CD, but the new DevOps guy is using Git runners.
No idea what the right strategy is.
https://redd.it/1qzw544
@r_devops
[Weekly/temp] Built a tool? New idea? Seeking feedback? Share in this thread.
This is a weekly thread for sharing new tools, side projects, github repositories and early stage ideas like micro-SaaS or MVPs.
What type of content may be suitable:
* new tools solving something you have been doing manually all this time
* something you have put together over the weekend and want to ask for feedback
* "I built X..."
etc.
If you have built something like this and want to show it, please post it here.
Individual posts of this type may be removed and redirected here.
Please remember to follow the rules and remain civil and professional.
*This is a trial weekly thread.*
https://redd.it/1qzyfzf
@r_devops
Weekly/temp DevOps ENTRY LEVEL - internship / fresher & changing careers
This is a weekly thread to ask questions about getting into DevOps.
If you are a student, or want to start a career in DevOps but do not know how, ask here.
Changing careers but do not have basic prerequisites? Ask here.
Before asking:
* try to search if your question was asked and answered
* try these resources:
  * [https://roadmap.sh/devops](https://roadmap.sh/devops)
  * (please suggest more)
_____________
Individual posts of this type may be removed and redirected here.
Please remember to follow the rules and remain civil and professional.
This is a trial weekly thread.
https://redd.it/1qzzvku
@r_devops
roadmap.sh
DevOps Roadmap: Learn to become a DevOps Engineer or SRE
Step by step guide for DevOps, SRE or any other Operations Role in 2026
SSL/TLS explained (newbie-friendly): certificates, CA chain of trust, and making HTTPS work locally with OpenSSL
I kept hearing “just add SSL” and realized I didn’t actually understand what a certificate proves, how browsers trust it, or what’s happening during verification—so I wrote a short “newbie’s log” while learning.
In this post I cover:
* What an “SSL certificate” (TLS, really) is: issuer info + public key + signature
* Why the signature matters and how verification works
* The chain of trust (Root CA → Intermediate CA → your cert) and why your OS/browser already trusts certain roots
* A practical walkthrough: generate a local root CA + sign a localhost cert (SAN included), then serve a local site over HTTPS with a tiny Python server + import the root cert into Firefox
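For the last step of the walkthrough, a tiny Python HTTPS server might look like the sketch below. This is not the code from the blog post. It assumes the localhost certificate and its private key have already been generated and signed by the local root CA, and the file names `localhost.crt` and `localhost.key`, the port, and both function names are placeholders.

```python
import http.server
import ssl


def build_tls_context(certfile, keyfile):
    """Create a server-side TLS context from a certificate and private key."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    # The certificate (ideally with the CA chain appended) and key are PEM files.
    context.load_cert_chain(certfile=certfile, keyfile=keyfile)
    return context


def serve_https(certfile, keyfile, host="127.0.0.1", port=8443):
    """Serve the current directory over HTTPS until interrupted."""
    handler = http.server.SimpleHTTPRequestHandler
    httpd = http.server.HTTPServer((host, port), handler)
    context = build_tls_context(certfile, keyfile)
    # Wrap the listening socket so every accepted connection speaks TLS.
    httpd.socket = context.wrap_socket(httpd.socket, server_side=True)
    httpd.serve_forever()


# Example call (placeholder file names; blocks until interrupted):
# serve_https("localhost.crt", "localhost.key")
```

Once the local root CA has been imported into the browser, `https://localhost:8443/` should load without a warning, provided the certificate's SAN includes `localhost`.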
Blog Link: https://journal.farhaan.me/ssl-how-it-works-and-why-it-matters
https://redd.it/1r07ejx
@r_devops
Monitoring performance and security together feels harder than it should be
One thing I have noticed is how disconnected performance monitoring and cloud security often are. You might notice latency or error spikes, but the security signals live somewhere else entirely. Or a security alert fires with no context about what the system was doing at that moment.
Trying to manage both sides separately feels inefficient, especially when incidents usually involve some mix of performance, configuration, and access issues. Having to cross check everything manually slows down response time and makes postmortems messy.
I am curious if others have found ways to bring performance data and security signals closer together so incidents are easier to understand and respond to.
https://redd.it/1r0dbxa
@r_devops
When is it time to quit?
I wrapped up a tech panel for a Principal Azure Engineer role at an investment bank a couple of hours ago. This followed an interview with the hiring manager last Wednesday. We know each other from the past, i.e., I’ve interviewed for multiple roles at this firm over the last 5-6 years.
This role landed on my LinkedIn feed randomly. I commented on the post and emailed the hiring manager directly, we had a short back-and-forth, and his recruiter called me almost immediately. The process has been unusually smooth by modern standards.
Today’s panel felt strong. I’m confident I cleared the bar with both the Azure SME and the hiring manager. I saw visible agreement on several answers, got verbal acknowledgment more than once and handled questions from a junior panelist with ease. I was told that I’m “first in line” (not sure if that means FIFO or first on the shortlist), however, it seemed to be directionally positive.
Here’s the problem: I was laid off a little over six months ago and I am EXHAUSTED. It's like I've been on the hamster wheel of interviews since 8/4/2025. I’ve done the prep, the loops, the panels, the follow-ups. I know I’m good enough to be gainfully employed as a DevOps engineer.
If this role doesn’t turn into an offer, I’m seriously questioning whether I want to continue in tech at all. I don’t know if I have it in me to keep doing 5–7 round interview gauntlets, only to be rejected for vague reasons like “culture fit” or not smiling enough. I’ve given my adult life to STEM / engineering / corporate IT / tech and I am exhausted from having to engage with recruiters who want someone to take managerial roles for IC level pay.
I’m not bitter about rejection. I’m tired of dysfunction: hiring managers who don’t know the difference between EC2 and AWS Lambda, recruiters who can’t distinguish an AWS account from an Azure subscription, and BS interview processes that ding candidates for being "too intense".
So I’m asking honestly: when is it time to walk away?
For those who’ve been at a similar crossroads...did you step back temporarily, change strategy or leave tech altogether?
TL;DR: Six months, countless interviews, strong signals in today's tech panel. If today's tech panel doesn’t result in an offer, I’m seriously considering being done with the tech interview industrial complex.
https://redd.it/1r0jghq
@r_devops
Security findings come in Jira tickets with zero context
Security scanner runs nightly and I wake up to 15 Jira tickets. Each one says fix CVE-2025-XXXX in dependency Y with no explanation of what the dependency does, where it's used, or why it matters.
I'm supposed to drop whatever sprint work I'm on, research the CVE, find where we use that package, assess actual risk, test the upgrade, and hope nothing breaks.
Meanwhile the ticket was auto-generated and the security team has no idea what they're asking me to fix. Just scanner said critical so here's a ticket.
Why can't these tools give actual context? Like this package is used in auth flow, vulnerability allows account takeover, here's how to fix it. Instead of just screaming CVE numbers at me.
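One piece of that missing context, "where do we use this package", can be gathered automatically before a human ever opens the ticket. The following is a minimal sketch, assuming a Python codebase; `find_imports` is a hypothetical helper, not a feature of any particular scanner, and it only sees static imports.

```python
import ast
from pathlib import Path


def find_imports(root: str, package: str) -> list[tuple[str, int]]:
    """Return (file, line) pairs where `package` or one of its submodules is imported."""
    hits: list[tuple[str, int]] = []
    for path in sorted(Path(root).rglob("*.py")):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that cannot be parsed as Python
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                names = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom):
                # Relative imports (level > 0) refer to local modules, so ignore them.
                names = [node.module or ""] if node.level == 0 else []
            else:
                continue
            if any(n == package or n.startswith(package + ".") for n in names):
                hits.append((str(path), node.lineno))
    return hits


# Example usage (the path and package name are placeholders):
#   for file, line in find_imports("src", "requests"):
#       print(f"{file}:{line}")
```

Attaching this list to the ticket does not assess risk by itself, but it tells the assignee whether the dependency sits in an auth path or in a one-off script.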
https://redd.it/1r4xpz9
@r_devops
Duplicate writes in multi-step automation: where do you enforce idempotency?
Genuine question.
We run multi-step automation that touches tickets, DB writes, API calls and emails.
A step partially failed or timed out. We restarted the run. A downstream write had already happened. Result: duplicate tickets, duplicate notifications.
This does not feel like a simple retry problem. It is about where step boundaries live and how side effects stay idempotent across an entire run.
Things we are trying:
- Treating write-capable steps differently from read-only steps
- Requiring idempotency keys or operation ids for side effects
- Making re-runs step-scoped instead of whole-run
- Keeping a durable per-step ledger with inputs, outputs and timestamps
- Adding manual pause or cancel before certain write steps
It still feels easy to get wrong.
Where do you enforce idempotency in practice?
- Application layer
- Workflow engine
- Middleware or sidecar
- Sagas or outbox pattern
- Approval gates
If you have shipped long-running automation with real side effects, what worked and what caused incidents?
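To make the idempotency-key and per-step ledger ideas above concrete, here is a minimal sketch at the application layer. It assumes a single SQLite database as the durable store; `StepLedger`, `idempotency_key`, and the table schema are all hypothetical names for illustration, not taken from any workflow engine.

```python
import hashlib
import json
import sqlite3
from typing import Any, Callable


def idempotency_key(run_id: str, step: str, inputs: dict[str, Any]) -> str:
    """Derive a stable key from the run id, the step name, and the step inputs."""
    payload = json.dumps({"run": run_id, "step": step, "inputs": inputs}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class StepLedger:
    """Per-step ledger (hypothetical schema): one row per completed side effect."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS ledger ("
            "key TEXT PRIMARY KEY, step TEXT, inputs TEXT, output TEXT, "
            "finished_at TEXT DEFAULT CURRENT_TIMESTAMP)"
        )

    def run_once(
        self,
        run_id: str,
        step: str,
        inputs: dict[str, Any],
        side_effect: Callable[[dict[str, Any]], Any],
    ) -> Any:
        """Run `side_effect` unless this exact step was already recorded as done."""
        key = idempotency_key(run_id, step, inputs)
        row = self.db.execute("SELECT output FROM ledger WHERE key = ?", (key,)).fetchone()
        if row is not None:
            return json.loads(row[0])  # re-run: return the recorded output, no new write
        output = side_effect(inputs)
        with self.db:  # commits the insert, or rolls back on error
            self.db.execute(
                "INSERT INTO ledger (key, step, inputs, output) VALUES (?, ?, ?, ?)",
                (key, step, json.dumps(inputs, sort_keys=True), json.dumps(output)),
            )
        return output
```

A re-run with the same run id, step, and inputs returns the stored output instead of creating a second ticket. The sketch still has a gap: a crash after the side effect but before the ledger insert will repeat the write, so where the downstream API accepts an idempotency key, passing the same key through is the safer complement.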
https://redd.it/1r4u7zr
@r_devops
How are you handling AI agent inventory and compliance in your infrastructure?
With the EU AI Act enforcement date coming up (August 2026), we've been dealing with a problem that I think a lot of DevOps teams are going to hit: figuring out what AI agents are actually running in your infrastructure.
Our situation: we had n8n workflows calling OpenAI, LangChain agents deployed by different teams, random Zapier integrations making API calls to Claude — and nobody had a central view of all of it. Classic shadow AI problem.
The compliance angle made it urgent. The EU AI Act requires organizations to classify AI systems by risk level, maintain documentation, and demonstrate oversight. Can't do any of that if you don't even have an inventory.
What we ended up building was a scanner that walks through your infra and maps AI components — models, agents, API calls, data flows. We open-sourced it as AI-BOM (github.com/Trusera/ai-bom) since we figured other teams are hitting the same wall.
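For a sense of what the simplest form of such an inventory could look like, here is a minimal sketch that scans Python dependency manifests for a few well-known AI SDK package names. This is not how AI-BOM works; `scan_requirements` and the `AI_PACKAGES` list are illustrative assumptions, and a real inventory would also need to cover other ecosystems, API calls, and workflow tools.

```python
import re
from pathlib import Path

# Illustrative and deliberately incomplete list of PyPI package names.
AI_PACKAGES = {"openai", "anthropic", "langchain"}


def scan_requirements(root: str) -> dict[str, list[str]]:
    """Map each AI package found to the requirements files that declare it."""
    findings: dict[str, list[str]] = {}
    for path in sorted(Path(root).rglob("requirements*.txt")):
        for line in path.read_text(encoding="utf-8").splitlines():
            # Keep only the package name: cut at version specifiers, extras, markers.
            name = re.split(r"[<>=!~\[;\s]", line.strip(), maxsplit=1)[0].lower()
            if name in AI_PACKAGES:
                findings.setdefault(name, []).append(str(path))
    return findings
```

The output is only a starting list of repositories to look at; classifying each use by risk level still takes a human.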
But I'm curious how others are approaching this:
- Do you have visibility into what AI/LLM integrations are running across your org?
- Is anyone tracking AI agents as part of their CMDB or asset inventory?
- How are you thinking about EU AI Act compliance from an infrastructure perspective?
- Anyone using SBOM-style approaches for AI components?
Would love to hear what other teams are doing — or if this just isn't on your radar yet.
https://redd.it/1r4y6b7
@r_devops
Any resources to help a senior backend engineer moving into a lead data platform engineering role? My DevOps knowledge is elementary at best and I don't know everything AWS but I'm the most qualified to do this.
For context, I'm a strong backend engineer and I've used Terraform to create my own services and whatnot but I've never done anything this in-depth like the SREs and lead platform engineers at my previous companies.
Establishing engineering best practices for the team, platform monitoring, observability, security/governance, failover, design patterns, architecture, and the whole 9 yards are going to be my main responsibility (this absolutely terrifies me). I'm going to be the main engineer that data/analytics engineers, ml engineers, and management can come to for advice.
My vision here is to build a boring but reliable and well-oiled machine. Ideally costs are optimized and we're not being idiots by leaving resources unattended. Everything's being built from scratch so I have the final say, but I'm worried about screwing it up and doing something stupid that'll cost the company thousands for no reason.
Tooling wise, it's mainly AWS and Snowflake, and I'm thinking of introducing GitLab instead of GitHub.
https://redd.it/1r50dcd
@r_devops
Need help preparing for internship
Hi, I was lucky enough to get a cloud/DevOps engineering internship, but I really only know the basics of the cloud.
Are there any resources/books you recommend to learn more about cloud technologies and do well during the internship?
Thank you so much!
https://redd.it/1r52nkk
@r_devops
Book recommendation
What is the best book to learn networking? I have a general idea of DNS, firewalls, NAT, switches, hubs, etc., but I still don’t feel confident about networking and want to dig deeper.
https://redd.it/1r4wpu8
@r_devops
Do you feel the Heat of AI in DevOps Roles?
As the title suggests, do you feel AI is after your DevOps job?
Have you seen it helping effectively in your role, or eliminating it?
Helping --> generating IaC and Python code for automation, decision making when you're confused about how to use something in DevOps, etc.
Eliminating --> AI can replace you in every possible way.
I can go first:
Helping --> I have seen juniors use it effectively and write better code with faster turnaround. My junior is nothing without AI, yet he is so arrogant that he tells himself and others he knows everything. True to this, my manager supports him because he fixes and provisions infra in no time. But he keeps us in calls for hours just so he can understand the requirements.
Eliminating --> I strongly feel our roles will vanish in the years to come, maybe 5 years at most. The reason I see is the bug, the startup bug. Everyone wants to build something and feels as if they are doing society a favour, but no, they are satisfying their ego. They are looking very closely at all roles to see what can be automated and targeting them. DevOps is no exception here. That's how Amazon also had to let go of many DevOps/cloud engineers.
https://redd.it/1r56d15
@r_devops
ACA autoscaling killing long running jobs — best practice?
Using Azure Container Apps with HTTP autoscaling (set to 10 concurrent users) for report generation. During scale up/down, replicas get terminated and reports fail mid-execution.
Questions:
• Is this the right pattern for long-running jobs on ACA?
• Any Service Bus lock timeout gotchas?
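One pattern sometimes used for this kind of workload, sketched here under assumptions rather than as an ACA-specific recommendation, is to pull report requests from a queue and have each worker finish its in-flight report when asked to stop instead of dying mid-job. The sketch assumes the platform sends SIGTERM before terminating a replica and that the grace period covers one report; `ReportWorker`, `generate_report`, and the `jobs` iterable are hypothetical stand-ins for the real report code and queue receiver.

```python
import signal
from typing import Callable, Iterable


class ReportWorker:
    """Processes jobs one at a time and stops between jobs once a stop is requested."""

    def __init__(self, generate_report: Callable[[str], None]) -> None:
        self.generate_report = generate_report
        self.stopping = False
        self.completed: list[str] = []

    def request_stop(self, signum=None, frame=None) -> None:
        # Signal handler: only set a flag so the current report can finish.
        self.stopping = True

    def run(self, jobs: Iterable[str]) -> None:
        pending = iter(jobs)
        # Check the flag before pulling the next job, so nothing is taken and then dropped.
        while not self.stopping:
            job = next(pending, None)
            if job is None:
                break
            self.generate_report(job)
            self.completed.append(job)


def install_signal_handlers(worker: ReportWorker) -> None:
    """Route SIGTERM and SIGINT to the worker (call this from the main thread)."""
    signal.signal(signal.SIGTERM, worker.request_stop)
    signal.signal(signal.SIGINT, worker.request_stop)
```

With this shape, a scale-down interrupts the loop between reports, and any request that was never picked up stays in the queue for another replica.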
https://redd.it/1r4hkzu
@r_devops