How do I streamline the access update process in my org?
Dealing with a bunch of role changes at my company (project swaps, team changes, etc.) and access updates have been super messy. I've seen some people using HR-triggered workflows to try to automate this, but wondering if there are other things I should be looking into. I've been looking into Console to try to handle small permission tweaks that keep coming up. Would love to hear about how other ppl are handling this!
https://redd.it/1pp8kph
@r_devops
Colleague built a pretty neat tool for managing RabbitMQ DLQs
Hey all,
Just wanted to give a quick shoutout to a dev from my company who built a tool we’ve been using internally for a while now, it’s called Rabbit GUI (https://rabbitgui.com/), and it helps us manage RabbitMQ dead letter queues. We use it to read messages from the queue, search and filter, and republish only specific messages if needed.
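To make the read, filter, republish flow concrete, here is a minimal Python sketch. It does not show how Rabbit GUI works internally (the post does not say), and it uses plain in-memory objects instead of a real RabbitMQ client. The names `DeadLetter`, `first_death_queue`, `select_for_republish`, and the sample queues are made up for illustration. The only RabbitMQ-specific assumption is that dead-lettered messages carry an `x-death` header whose entries record the queue and the reason.

```python
from dataclasses import dataclass, field


@dataclass
class DeadLetter:
    """Hypothetical in-memory stand-in for a message read from a DLQ."""
    body: str
    headers: dict = field(default_factory=dict)


def first_death_queue(msg):
    """Return the queue named in the first x-death entry, or None."""
    deaths = msg.headers.get("x-death", [])
    return deaths[0].get("queue") if deaths else None


def select_for_republish(messages, queue, contains):
    """Keep only messages from `queue` whose body contains `contains`."""
    return [
        m for m in messages
        if first_death_queue(m) == queue and contains in m.body
    ]


# Invented sample data standing in for the contents of a dead letter queue.
dlq = [
    DeadLetter('{"order": 1, "error": "timeout"}',
               {"x-death": [{"queue": "orders", "reason": "rejected"}]}),
    DeadLetter('{"order": 2, "error": "bad schema"}',
               {"x-death": [{"queue": "orders", "reason": "rejected"}]}),
    DeadLetter('{"invoice": 9, "error": "timeout"}',
               {"x-death": [{"queue": "billing", "reason": "expired"}]}),
]

to_republish = select_for_republish(dlq, queue="orders", contains="timeout")
for m in to_republish:
    # A real tool would publish back to the original exchange here.
    print("would republish:", m.body)
```

With a real broker, the selection step would stay the same and only the read and publish calls would change.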
We’ve had it in use for a couple months, and honestly, it’s been super handy. I definitely would not want to give it up.
Disclaimer, it’s a paid tool (lifetime license though, not a subscription), but I think the pricing’s fair for what it does.
Figured I’d help him get a bit more visibility since it’s actually been useful for us.
If anyone checks it out, I’d love to hear your thoughts, happy to pass along any feedback or questions to him!
Cheers
https://redd.it/1pp7fwq
@r_devops
Rabbitgui
RabbitGUI | Manage RabbitMQ dead-letters with ease
Debug, monitor, and manage RabbitMQ with a modern developer interface. RabbitGUI supports multiple connections, quick search, debug mode, and much more.
Is SSL decryption still worth it for AI and SaaS visibility? Am a SecOps lead btw
Anyone still banking on SSL decryption for GenAI and SaaS app visibility? What's breaking in your environment: cert pinning, HSTS, user complaints?
Particularly curious about the network layer vs app layer debate. Seeing more teams pivot to browser-native controls but want to hear operational experiences. What's your take?
https://redd.it/1ppbi0c
@r_devops
Composable DXP in practice... flexibility win or long-term maintenance tax?
I’ve been seeing more teams move away from monolithic CMS platforms toward a composable DXP model with headless CMS, search, personalization, commerce, analytics, all loosely coupled and stitched together with APIs.
On paper it’s best-of-breed everything, faster iteration, and no vendor lock-in.
In practice though, it seems like the real tradeoff shows up later in:
- Integration ownership and version drift
- Observability across multiple vendors
- Reliability when one service upstream sneezes
- The ongoing cost of “keeping the stack composed”
For those running composable DXPs in production today:
- Has it meaningfully improved delivery speed or experience quality?
- Where did the complexity actually concentrate over time (build, ops, integration, governance)?
- And if you’ve lived on both sides, would you still choose composable over a modern all-in-one today?
Less interested in vendor marketing... more in the lived operational reality.
https://redd.it/1ppa6d2
@r_devops
GitHub is "postponing" self-hosted GHA pricing change
https://x.com/github/status/2001372894882918548
The outcry won! (for now)
> We’re postponing the announced billing change for self-hosted GitHub Actions to take time to re-evaluate our approach.
https://redd.it/1ppe91p
@r_devops
X (formerly Twitter)
GitHub (@github) on X
We’ve read your posts and heard your feedback.
1. We’re postponing the announced billing change for self-hosted GitHub Actions to take time to re-evaluate our approach.
2. We are continuing to reduce hosted-runners prices by up to 39% on January 1, 2026.…
Am I Junior Level at least?
I'll preface this by saying I work mainly as an SDET. Lately we've been moving from Azure to AWS, and I was more or less the first person to start messing with things. I wanted to see if what I've done so far is at least "junior level". We're also using GitLab pipelines for CI/CD for the first time.
So far I have:
* Set up CI/CD pipelines in GitLab (ci-yaml file)
* Gotten a working pipeline for deploying to AWS (Beanstalk for now)
* Similarly set up a working pipeline to handle Terraform plan/apply
* Added E2E automated testing to the pipelines (this is less DevOps and more SDET, though)
* Gotten a decent understanding of Terraform modules, and set up Terraform modules for IAM and for S3-backed Terraform state
* Dockerized our reporting tool (Allure) and worked from ECR
* Documented and worked with DevOps on environments, shared resources, etc. for moving fully to GitLab as well as AWS
It doesn't feel like a lot, and I have a ways to go, but I find it interesting. Yes, I obviously used AI for some of the syntax and CLI commands, but I feel like I have a decent idea of the architecture.
https://redd.it/1ppejw8
@r_devops
How do you compare CI/CD providers?
I've been exploring which CI/CD provider to focus on for my organization over the past few months. We've got some things in GitHub Actions and some in Azure DevOps, mostly because different groups of people set up different solutions.
But to be honest, I can't find a compelling reason to go with one or the other. Coin toss?
And then of course, there are other options out there.
What are the key differentiators that you have come across in exploring these tools?
https://redd.it/1pph1m7
@r_devops
This is the kind of work AI should be doing
I already knew what I needed to do. The problem wasn’t lack of knowledge, it was recall. I could’ve spent time poking around, trial and erroring, or Googling until something clicked. All of that would’ve pulled me out of the flow I was in.
Instead, I asked Cosine, got what I needed almost instantly, and kept going. No rabbit holes, no context switch, no wasted mental energy.
For me, that’s the right use of AI. Handle the small, forgettable details so I can stay focused on the parts that actually require thinking.
https://redd.it/1ppjq86
@r_devops
Is £95–100k total comp solid for a senior-ish DevOps role in London?
Hey all,
Looking for a quick sanity check from people in the London market.
I've got two offers for Platform Engineer/SRE roles at large non-FAANG companies in London. Base is in the £80–90k range, total comp comes out around £95–100k with bonus.
I'm 24 and a bit unsure whether this is good for the market or whether I should be pushing harder or looking elsewhere. I'm not trying to min-max; I just want to know if this is a solid place to be or if I'm undervaluing myself.
Would appreciate any perspective from people hiring or working in similar roles. Thanks!
https://redd.it/1ppj4y3
@r_devops
What’s the most common reason CI/CD pipelines break down in growing teams?
As teams grow, CI/CD pipelines that once worked fine can slowly turn messy. More people, more changes, quick fixes, and suddenly the pipeline feels fragile and breaks more often than it should. Tests become flaky, environments don’t match, and everyone starts blaming the tools instead of the process.
What do you think is the main reason CI/CD pipelines break down as teams scale?
https://redd.it/1pplnrt
@r_devops
New Features We Find Exciting in the Kubernetes 1.35 Release
Hey everyone! Wrote a blog post highlighting some of the features I think are worth taking a look at in the latest Kubernetes release, including examples to try them out.
Read here: https://metalbear.com/blog/kubernetes-1-35/
https://redd.it/1ppj9ur
@r_devops
MetalBear 🐻
Explore the key features in Kubernetes 1.35 'World Tree' release, a release focused on strengthening the platform’s foundations with stable OCI image volumes, modernized streaming via WebSockets, clearer traffic distribution, and deeper security and node…
On-demand runner on AWS CodeBuild with Bitbucket Pipelines
I made a package that enables AWS CodeBuild as an on-demand self-hosted runner for Bitbucket Pipelines.
The problem: AWS CodeBuild natively supports managed runners for GitHub Actions, GitLab, etc. - but not Bitbucket.
The solution: This package bridges that gap. Your Bitbucket Pipeline triggers CodeBuild via OIDC, which spins up an ephemeral self-hosted runner on-demand. When the build completes, the runner terminates automatically.
https://github.com/westito/aws-bitbucket-runner
https://redd.it/1ppn1xy
@r_devops
I wrote a garbage collector for my AWS account because 'Status: Available' doesn't mean 'In Use'.
Hey everyone,
I've been diving deep into the AWS SDKs specifically to understand how billing correlates with actual usage, and I realized something annoying: Status != Usage.
The AWS Console shows a NAT Gateway as "Available", but it doesn't warn you that it has processed 0 bytes in 30 days while still costing ~$32/month. It shows an EBS volume as "Available", but not that it was detached 6 months ago from a terminated instance.
I wanted to build something that digs deeper than just metadata.
So I wrote CloudSlash.
It’s an open-source CLI tool (AGPL) written in Go.
The Engineering: I wanted to build a proper specialized tool, not just a script.
Heuristic Engine: It correlates CloudWatch Metrics (actual traffic/IOPS) with Infrastructure State to prove a resource is unused.
The Findings:
Zombie EBS: Volumes attached to stopped instances for >30 days (or unattached).
Vampire NATs: Gateways charging hourly rates with <1GB monthly traffic.
Ghost S3: Incomplete multipart uploads (invisible storage costs).
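The three findings above boil down to joining resource state with usage metrics. The Python sketch below shows the idea only: CloudSlash itself is written in Go and its real rules live in the repo. The `Resource` fields, the `classify` function, and the sample inventory are hypothetical; the thresholds (30 days, 1 GB) are the ones quoted in the post.

```python
from dataclasses import dataclass
from typing import Optional

GB = 1024 ** 3  # assumption: binary gigabytes; the post does not specify


@dataclass
class Resource:
    """Hypothetical flattened view of state plus metrics for one resource."""
    kind: str                             # "ebs", "nat", or "s3_upload"
    instance_state: Optional[str] = None  # EBS: attached instance state, None if unattached
    stopped_days: int = 0                 # EBS: days the attached instance has been stopped
    bytes_last_30d: int = 0               # NAT: traffic processed in the last 30 days
    upload_complete: bool = True          # S3: False for an unfinished multipart upload


def classify(r):
    """Return one of the post's waste labels, or None if the resource looks used."""
    if r.kind == "ebs":
        if r.instance_state is None:
            return "zombie-ebs"
        if r.instance_state == "stopped" and r.stopped_days > 30:
            return "zombie-ebs"
    if r.kind == "nat" and r.bytes_last_30d < 1 * GB:
        return "vampire-nat"
    if r.kind == "s3_upload" and not r.upload_complete:
        return "ghost-s3"
    return None


# Invented inventory; a real scan would fill these fields from the AWS APIs.
inventory = [
    Resource("ebs", instance_state="running"),
    Resource("ebs", instance_state="stopped", stopped_days=45),
    Resource("ebs"),
    Resource("nat", bytes_last_30d=0),
    Resource("nat", bytes_last_30d=50 * GB),
    Resource("s3_upload", upload_complete=False),
]

labels = [classify(r) for r in inventory]
for resource, label in zip(inventory, labels):
    print(resource.kind, "->", label or "in use")
```

The point of the split is that "Available" only ever appears in the state fields, while the verdict comes from the metric fields.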
Stack: Go + Cobra + BubbleTea (for a nice TUI). It builds a strictly local dependency graph of your resources.
Why Use It? It runs with ReadOnlyAccess. It doesn't send data to any SaaS (it's local). It allows you to find waste that the basic free-tier tools might miss.
I also added a "Pro" feature that generates Terraform import blocks and destroy plans to fix the waste automatically, but the core scanning and discovery are 100% free/open source.
I'd really appreciate any feedback on the Golang structure or suggestions for other "waste patterns" I should implement next.
Repo: https://github.com/DrSkyle/CloudSlash
Cheers!
https://redd.it/1ppnn2n
@r_devops
GitHub
The Forensic Cloud Accountant for AWS. Detects zombie resources, vampire NATs, and infrastructure drift using zero-trust heuristics and precision metrics. - DrSkyle/CloudSlash
Unpopular opinion: DORA metrics are becoming "Vanity Metrics" for Engineering Health.
I’ve been looking at our dashboard lately, and on paper, we are an "Elite" team. Deployment frequency is up, and lead time is down.
But if I look at the actual team health? It’s a mess. The Senior Architects are burning out doing code reviews, we are accruing massive tech debt to hit that velocity, and I’m pretty sure we are shipping features that don't actually move the needle just to keep the "deploy count" high.
It feels like DORA measures the efficiency of the pipeline, but not the health of the organization.
I’m trying to move away from just measuring "Output" to measuring "Capacity & Risk" (e.g., Skill Coverage, Bus Factor, Cognitive Load).
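As one concrete example of a "Capacity & Risk" metric, here is a minimal Python sketch of a per-area bus factor: the number of people who can maintain each area, with single-owner areas flagged. The coverage map and names are invented, and definitions of bus factor vary, so treat this as one possible reading rather than a standard formula.

```python
# Hypothetical map of system area -> people able to maintain it.
coverage = {
    "payments": {"ana", "raj"},
    "ci-pipeline": {"ana"},
    "k8s-platform": {"raj", "li", "sam"},
    "legacy-billing": {"sam"},
}


def bus_factor_by_area(cov):
    """Count how many people can maintain each area."""
    return {area: len(people) for area, people in cov.items()}


def single_owner_areas(cov):
    """List areas that depend on exactly one person, sorted by name."""
    return sorted(area for area, people in cov.items() if len(people) == 1)


factors = bus_factor_by_area(coverage)
at_risk = single_owner_areas(coverage)

print(factors)
print("single-owner areas:", at_risk)
```

Unlike deploy counts, this number does not improve by shipping faster, which is the contrast the post is drawing.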
Has anyone successfully implemented metrics that measure sustainability rather than just speed? How do you explain to a board that "High Velocity" != "Good Engineering"?
https://redd.it/1ppphjb
@r_devops
How do I reduce wasted runs on GitHub Actions?
This is from one repo that has not been that active in the last 7 days:
- 39 total CI minutes
- 14 minutes were non-productive
- Biggest driver: failed/re-run workflows and duplicate runs for the same PR
We always assumed “this is normal”, but with billing changes, it adds up fast.
I am looking into some tools that could help with this, but I am curious how others are handling this...
- Do you actively cancel outdated PR runs?
- Or just accept the cost as the price of speed?
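For scale, 14 of 39 minutes is roughly 36% of the week's CI time. On cancelling outdated PR runs: GitHub Actions workflows have a `concurrency` setting, and with `cancel-in-progress: true` a newer run in the same group cancels the one still in flight. The Python sketch below only illustrates the bookkeeping behind that question. The run records use a hypothetical shape (not the GitHub API's), and "superseded" is assumed to mean any run that is not the latest for its PR.

```python
from collections import defaultdict

# Hypothetical run records: (pr_number, run_order, minutes, status)
runs = [
    (101, 1, 4, "success"),
    (101, 2, 4, "success"),   # newer push to the same PR
    (102, 1, 5, "failure"),
    (102, 2, 5, "success"),   # re-run after the failure
    (103, 1, 3, "success"),
]


def superseded_minutes(records):
    """Sum the minutes of every run that is not the latest run for its PR."""
    latest = defaultdict(int)
    for pr, order, _minutes, _status in records:
        latest[pr] = max(latest[pr], order)
    return sum(m for pr, order, m, _s in records if order < latest[pr])


total = sum(m for _pr, _order, m, _s in runs)
wasted = superseded_minutes(runs)
print(f"{wasted} of {total} sample minutes were superseded ({wasted / total:.0%})")

# The figures quoted in the post.
post_share = 14 / 39
print(f"post: {post_share:.1%} of CI minutes were non-productive")
```

Running the same tally over real run history would show how much of the non-productive time cancellation could recover and how much comes from genuine failures.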
https://redd.it/1pppfsd
@r_devops
My "just don't f***ing dance" moment: I just automated 90% of our L2 maintenance team workload and I'm keeping it to myself
I have been an early adopter of the hype around AI, like LLMs and code assistants like Copilot, and I was hyped too. I honestly believed the narrative that says AI won't replace humans, that it's more of a productivity multiplier, something to augment devs and ops people and take on the boring tasks like writing markdown or docstrings.
Then came the agentic stuff, which at first I didn't buy much into because I was skeptical about vendor lock-in. Then MCP came along and suddenly my entire workflow at work changed (at the time I was saying it evolved, but I'm not in that mood): Terraform MCP, Context 7, k8s MCP. I was very impressed but still very optimistic about the things we could do with it and how it would improve our daily lives. It connected to our actual infrastructure; it wasn't a gimmick anymore.
Then came Opus 4.5 and Gemini 3 Pro. People called them beasts, so I did what I always do: I pushed the envelope to see how far it could go.
**And it went far ..**
I built an agentic app that monitors our nightly CI jobs, watches for failures and errors, and reruns jobs if necessary or pushes small fixes.
It also monitors certain apps on our k8s cluster and runs the necessary fixes. These fixes aren't magic; it's everything we documented as guidelines for our L2 maintenance team. The AI agent just .. does it. Sometimes better than humans, because it creates bug tickets with a level of detail I have never seen from a person.
I was beyond hyped, already planning my presentations and my demo to show them how great all of this is.
**Then it hit me.**
I kept thinking about the scene from The Big Short where Brad Pitt, as Ben, tells the young bankers when they start celebrating their bet against the housing market: *"If we're right, people lose homes. People lose jobs. People lose retirement savings, people lose pensions. You know what I hate about fucking banking? It reduces people to numbers. Here's a number - every 1% unemployment goes up, 40,000 people die, did you know that?"*
What I've made doesn't bet against the housing market, but against the L2 maintenance team, actual people in our workforce, people I enjoy having coffee with.
So, I kept it to myself .. but here's the thing: if I don't share it, someone else will, or some other company will propose it as a service with the promise of cost reduction and whatnot. My silence won't help anyone; it just means that I'm not the one to pull the trigger.
I don't have a clear idea of how to feel about this, and I'm not here to moralize to anyone. I'm just ... really, really confused.
This technology is genuinely amazing, but somehow, I don't feel like dancing.
**TL;DR:** Built an agentic SRE tool and it effectively was able to replace the whole L2 maintenance team. Realized the human cost and decided to keep it to myself.
UPDATE:
I will pull my resources together and push them to a GitHub repository. I did my tests in a GCP project, where I deployed a Cloud Run service with a service account that has IAM rights on one of our less important GKE clusters. It interfaces with Gemini and uses an already existing token for GitLab.
I use git MCP, kube MCP, and Google Cloud observability MCP, plus the GitLab Python SDK.
For the guidelines, I exported them as markdown files and added them to the app to use as context.
The bug tickets were test tickets sent via an HTTP request to Jira using my PAT.
https://redd.it/1ppr3qu
@r_devops
Terraform, Terragrunt ... and Terratest?
I'm tasked with figuring out how to integrate terratest (TT) into a moderately large terraform (TF) repo for AWS resources. The deployment and orchestration are all done with terragrunt (TG) (it passes in the variables, etc.). The organization itself has fully adopted using TG with TF.
My question to you all is about using terratest for integration testing of terraform modules that are themselves orchestrated via terragrunt. My searches for best practices, lessons learned, etc. have returned few useful results. Perhaps most telling, no reddit posts have surfaced that either promote or decry using TF+TG+TT. Even the terratest documentation from Gruntwork has zero mention of terragrunt, and there are zero examples in their provided repositories of using TG+TT.
I'm wondering if anyone has gone down this path before and has any lessons learned they could share (good or bad).
Thanks in advance
https://redd.it/1ppqsxv
@r_devops
Is this normal in DevOps?
I joined my organization last week as a DevOps intern. On day 2 I worked on someone's project and built a custom dashboard in CloudWatch. On day 3 I got assigned to a project and was given access to every environment, from stage to prod, plus a Mac to work on, and it's a 5-day work week. Is this the best life? 🤔 Or am I missing something...
https://redd.it/1ppuvza
@r_devops
Jr DevOps profile. Is it enough?
Hello guys,
I am trying to get my first job in DevOps, but I wonder if my profile is even eligible for a company right now. I would really like to have the opinion of the pros to see if I am the kind of person you would hire for a jr role. My assets are:
I'm a Telecommunications Engineer from the biggest engineering university in Spain (Madrid). I also studied in Sweden for a year, in case that counts for you.
Focus on networking and programming. I know networking and troubleshooting with Wireshark, and languages like Java, Python, C...
I have only 1 year of experience as an engineer, at a very big tech company, doing things that are hardly related to DevOps. I have good referrals from my former colleagues at that job.
I just got the AWS Cloud Practitioner certificate.
Now I know this is enough to be hired here, but I am trying to move to another country in the EU and I am not sure if this is enough to get interviews. I don't even care about the money right now, I just want to start.
In the meantime I am working on small projects on Linux and learning basic DevOps skills, to see if I can build myself a repository...
https://redd.it/1ppvrjk
@r_devops
At what headcount did you feel you lost the "Ground Truth" of your engineering org?
There seems to be a specific breaking point in engineering orgs.
When we were 20 people, I knew everyone’s name, their strengths, and exactly what they shipped yesterday.
Now that we are pushing 60+, I feel like I’m relying entirely on layers of management (Directors/EMs) to tell me what’s going on. It feels like a game of "Broken Telephone"—by the time the signal hits my desk, it’s polished, biased, and often late.
I’m trying to avoid hiring a Chief of Staff (feels expensive right now), but I need a way to get raw visibility without micromanaging or skipping levels.
How do you guys stay plugged into the raw data (Jira/Git signals) without becoming a micromanager?
https://redd.it/1ppwxli
@r_devops
Migrating from AppDynamics to Datadog
I'm wondering if anyone has done a migration from AppDynamics to Datadog and can provide some insight into best practices for scripting this. I need to parse existing AppDynamics agent config.xml files, pull relevant fields, and place those into the new Datadog agent yaml config file when it is installed.
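For anyone wondering what the parse-and-map step could look like, here is a minimal Python sketch using only the standard library. Everything specific in it is an assumption for illustration: the sample XML, the element names (application-name, tier-name, node-name), and the mapping of those fields to Datadog tags. Real agent config files may be laid out differently, so the lookups would need adjusting. It only emits `env` and `tags`, which are standard keys in datadog.yaml.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input. Real AppDynamics agent files may use different
# element names or nesting, so adjust the lookups to match your files.
SAMPLE_XML = """<controller-info>
  <application-name>checkout</application-name>
  <tier-name>web</tier-name>
  <node-name>web-01</node-name>
</controller-info>"""

# Assumed mapping: AppDynamics element name -> Datadog tag key.
TAG_MAP = {
    "application-name": "service",
    "tier-name": "tier",
    "node-name": "node",
}


def extract_fields(xml_text):
    """Return {element name: text} for mapped elements that are present and non-empty."""
    root = ET.fromstring(xml_text)
    fields = {}
    for name in TAG_MAP:
        value = root.findtext(name)  # looks at direct children of the root only
        if value and value.strip():
            fields[name] = value.strip()
    return fields


def yaml_quote(value):
    """Double-quote a scalar, escaping backslashes and double quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def render_datadog_yaml(fields, env):
    """Build a datadog.yaml fragment containing `env` and a `tags` list."""
    lines = ["env: " + yaml_quote(env)]
    tags = [f"{TAG_MAP[name]}:{value}" for name, value in fields.items()]
    if tags:
        lines.append("tags:")
        lines.extend("  - " + yaml_quote(tag) for tag in tags)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_datadog_yaml(extract_fields(SAMPLE_XML), env="staging"))
```

This writes the YAML by hand to avoid a third-party dependency, which is fine for generating a fresh fragment. Merging into an existing datadog.yaml would probably be safer with a proper YAML library, or with the config management tool that installs the agent.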
https://redd.it/1ppy748
@r_devops