Reddit DevOps – Telegram
CVE scanners generating more work than actual security

our scanner flagged 800+ critical vulnerabilities last week. spent two days going through them. maybe 15 are actually exploitable in our setup.

the rest? dependencies we don't call. libraries sitting in base images that never execute. stuff in dev containers that aren't even accessible. but security sees a red dashboard and loses it.

tried explaining to my manager that a CVE in an unused package isn't the same as an internet-facing API vulnerability. didn't land. now we're supposed to drop sprint work to patch things that literally can't be reached.

started just focusing on what's actually exposed and ignoring the noise. feels bad but we can't keep doing emergency patches for theoretical risks while real infra problems pile up.

anyone else just... tired of this? feels like we spend more time arguing about scanner output than actually building secure systems.
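
A rough sketch of the "focus on what's actually exposed" triage described above. The fields `reachable`, `internet_facing`, and `environment` are hypothetical and not part of any particular scanner's output; they would have to be filled in from your own reachability and deployment data.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    cve_id: str
    severity: str          # scanner-reported severity, e.g. "critical"
    package: str
    reachable: bool        # hypothetical: is the vulnerable code actually called?
    internet_facing: bool  # hypothetical: is the workload exposed externally?
    environment: str       # hypothetical: "prod", "staging", "dev"


def triage(findings):
    """Split findings into 'patch now' and 'backlog' buckets."""
    urgent, backlog = [], []
    for f in findings:
        # Only reachable code in production counts as exposed in this sketch.
        exposed = f.reachable and f.environment == "prod"
        if f.severity == "critical" and exposed:
            urgent.append(f)
        else:
            backlog.append(f)
    # Within the urgent bucket, internet-facing findings come first.
    urgent.sort(key=lambda f: not f.internet_facing)
    return urgent, backlog
```

The backlog is not "ignored" here, just scheduled normally, which may be an easier position to defend to a security team than a flat dismissal.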

https://redd.it/1nv7pt8
@r_devops
GitLab + Digital Ocean CI/CD

I have a DigitalOcean Ubuntu droplet running an app with a Next.js backend and a React frontend, using GitLab. Right now the deployment is manual. How difficult is it to set up automatic deployment? If I hire someone to do it, how much would it cost and how long does it usually take?

https://redd.it/1nv8xox
@r_devops
Team wants to use Puppet for infra management - am I wrong to question this?

I started a new job 30 days ago. Very large org, but everything is a mess - some stuff is on swarm, some running straight on the metal and everything's an undocumented mess. We're hosting on-premises.

One of my primary tasks has from the get-go been to help get K8s set up so we can start from scratch. Move everything over before it all crashes and burns.

Some consultants have been hired to assist with this. They're competent and have lots of experience with K8s. They want to use Puppet for managing nodes. I have little Puppet experience, which makes it hard to make a good case against it (and it's not very fair either, as I have no real basis to argue from), but I see the current landscape and get a bad feeling about this move.

They want to use the OpenVox fork, saying the recent licensing rugpull by Perforce is a non-issue.

I say the fork looks like it's struggling to gain adoption, they say too many orgs rely on Puppet for that to be the case.

I say Puppet is borderline legacy, they say that just means it's feature complete.

I say Ruby is a problem due to lack of in-house ruby knowledge and general scarcity, they say that's a non-issue as we'll only ever have to touch DSL.

I say we have stable alternatives - Talos being my favorite - but they stick to it.

Am I in the wrong here? If not, how do I make my case?

https://redd.it/1nvanye
@r_devops
CI/CD pipeline to test UPDATE process rather than static PR merge result

Has anyone done this before? Looking for good practice here.

Our project suffered a test environment outage due to a PostgreSQL upgrade process gone wrong. In our CI/CD pipelines we test the end result on a Minikube environment which is created just for the duration of the pipeline. For the PostgreSQL upgrade this went fine, because the Minikube environment was not subjected to the upgrade process, just the (static) end result, which started out directly on version 18.

So now we have an idea to test the update process itself: first check out the base commit ID, set up Minikube, deploy our Helm charts, and run some tests to generate data (and Kafka messages). Next, check out the PR commit ID, which would be the end result of the PR changes, redeploy the Helm charts, run the tests again and watch the results.
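
The base-then-PR sequence above could be sketched as a small driver script. This is only an outline under assumptions: the release name `myapp`, the chart path `./chart`, and the scripts `./tests/seed_data.sh` and `./tests/verify.sh` are hypothetical placeholders, not names from the project.

```python
import subprocess


def upgrade_test_plan(base_sha, pr_sha, release="myapp", chart="./chart"):
    """Return the ordered commands for a base -> PR upgrade test."""
    deploy = ["helm", "upgrade", "--install", release, chart, "--wait"]
    return [
        ["minikube", "start"],
        ["git", "checkout", base_sha],   # state before the PR
        list(deploy),                    # deploy the old charts
        ["./tests/seed_data.sh"],        # hypothetical: generate data and Kafka messages
        ["git", "checkout", pr_sha],     # state after the PR
        list(deploy),                    # upgrade in place, on top of the seeded data
        ["./tests/verify.sh"],           # hypothetical: check data survived the upgrade
    ]


def run_plan(plan, runner=subprocess.run):
    """Run each command in order, stopping at the first failure."""
    for cmd in plan:
        runner(cmd, check=True)
```

Keeping the plan as data makes it easy to print or dry-run in the pipeline before executing it with `run_plan(upgrade_test_plan(base, pr))`.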

Has anybody done this before? Are there some good practices to follow here?

https://redd.it/1nv2hwy
@r_devops
What is Kali Linux and Kali NetHunter, and how are they used in cybersecurity?

I’ve been reading about Kali Linux and Kali NetHunter and how they’re widely mentioned in cybersecurity discussions. From what I understand, Kali Linux is a Debian-based distro built for penetration testing, ethical hacking, and digital forensics, packed with tons of pre-installed security tools.

But I’d like to hear from actual practitioners here..
- How do you (or your teams) actually use Kali in real-world cybersecurity work?
- Is Kali NetHunter really practical for mobile pen-testing, or more of a niche tool?
- Do professionals rely on Kali exclusively, or just as one part of a broader toolkit?

Would love to hear your experiences, best practices, or even limitations you’ve run into.


https://redd.it/1nveezo
@r_devops
When Stability Turns Into Stagnation: Stay or Take the Risk?

Hello, how are you doing? I’d like to share an idea and hear your opinion. I’ve been working with OpenShift and Kubernetes for a few years now. In my current company abroad, the Kubernetes tech lead is a very complicated person. In our 1:1s, he never gave me negative feedback, but I couldn’t stand the way he treated people. I ended up asking to leave — I just couldn’t handle it anymore, and the problem wasn’t me. He even tried to physically assault someone in the company.

I moved to another team and ended up doing only cloud support and a few things, very little with Terraform. I’m feeling a bit frustrated because I spend all day dealing with Kubernetes and cloud issues, and I no longer write a single line of code, whether in Terraform or YAML… and the manager said we are really becoming a support team. I don’t see growth; I feel like I’m going backwards.

Now I’ve received an offer for a DevSecOps role with a pretty good salary, but my current company matched it and says they want me to stay. The problem is that I feel I’m regressing… The company is stable, but the work is always the same. I think over time this could harm me, but at the same time, I’m afraid of leaving and going to a company where I don’t know anyone and have no idea how things will be.

Could you share your opinion, considering security, growth, and risks?

https://redd.it/1nvg6rq
@r_devops
Question about MetBrains DevOps Engineering program - https://www.metbrains.com/

Hi guys, I received this program from someone on LinkedIn. Has anyone taken it before? How is the quality? According to that person, I only need to pay the enrollment fee of CA$483.00 (I'm in Canada). Any feedback is welcome.

https://redd.it/1nvfws8
@r_devops
Learning AWS with a background in Azure DevOps/Services

Hi there,

I'm curious whether anyone who already has a background in Azure DevOps/Services has learned AWS, and whether they found it easier or just different because of their prior knowledge of the concepts.

I’m in a position where I need to now understand both (having had a good 5 years experience in Azure) so wondering what people’s experiences are who have previously followed this path.

https://redd.it/1nvkd44
@r_devops
What are some common issues that go unnoticed for a very long time?

What are some common issues that go unnoticed for a very long time? And what can we do to find them and fix them? Feel free to share.

https://redd.it/1nvng94
@r_devops
Developer platforms vs cloud-native: where do you draw the line?

DevOps community,
When do you recommend teams move from easy platforms (Vercel, Heroku, etc.) to managing their own cloud infrastructure?
What’s usually the breaking point - cost, scale, compliance, team size?
And what’s your experience helping teams make that transition? Any tools that bridge the gap nicely?

https://redd.it/1nvtq0j
@r_devops
Looking for Real-World DevOps Practice Resources/Projects

I'm a backend developer with foundational DevOps experience (VPS deployment, Docker, K8s clusters) and I'm looking to level up my skills with hands-on practice. I'm specifically NOT interested in platform services like Vercel - I want to work with the actual infrastructure layer.

**My current situation:**

* Can deploy applications (done VPS, Docker, K8s)
* Want to practice advanced DevOps concepts
* Don't want to wait until I build complex backends to practice
* Need real-world scenarios and challenges

**What I'm looking for:**

* Practice labs or environments where I can break things and learn
* Projects that simulate production issues (incidents, scaling, monitoring)
* Resources for implementing observability stacks, GitOps, IaC, service mesh, etc.
* Chaos engineering scenarios
* Real infrastructure challenges, not just tutorials

https://redd.it/1nvujkc
@r_devops
Seeking an Advanced AI PR Review Tool that Catches Logical Oversight

Hey everyone,
**TLDR: I'm looking for an AI PR review tool for Azure DevOps that finds deep logical flaws and incomplete features. FYI, Claude Code catches the oversight described below.**

I'm on the hunt for a truly intelligent AI PR review tool, and I'm hoping to get some recommendations from the community.

I'm looking for a tool that can act more like a human reviewer—an "agentic" tool that can traverse the codebase to understand the full context of a change and point out when a feature is incomplete or logically flawed.

To give a concrete example of what I mean, we recently had a PR that SonarCloud's AI feature completely missed. The goal was to add a "Discontinued" status for products in our e-commerce system.

The developer made these changes:

```diff
// --- a/src/Enums/ProductStatus.cs
 public enum ProductStatus {
     Available,
     OutOfStock,
+    Discontinued,
 }

// --- a/src/Models/ProductDetailsDto.cs
 public class ProductDetailsDto {
     public int Id { get; set; }
     public string Name { get; set; }
     public bool IsInStock { get; set; }
+    public bool IsDiscontinued { get; set; }
 }

// --- a/src/Services/ProductAvailabilityService.cs
 public class ProductAvailabilityService {
     public ProductStatus GetProductStatus(ProductDetailsDto product) {
         // OVERSIGHT: The new 'IsDiscontinued' flag is fetched but never checked!
         // An AI reviewer should flag that this new property is unused in the logic that determines status.
         if (!product.IsInStock) {
             return ProductStatus.OutOfStock;
         }

         return ProductStatus.Available;
     }
 }
```

This PR had two major oversights that a human reviewer would spot, but the AI didn't:

1. **The Logical Flaw:** The `ProductAvailabilityService` was never updated to check the `IsDiscontinued` flag. The new `ProductStatus.Discontinued` enum is effectively dead code and would never be returned.
2. **The Architectural Flaw:** The PR introduced the *concept* of a discontinued product but included no endpoint, service, or mechanism to actually *set* a product as discontinued. The feature was fundamentally incomplete.

This is the kind of critical feedback I'm looking for from an AI tool. I want suggestive comments right in the PR that highlight these kinds of oversights.

I know that **"GPT-5-Codex" is good for conducting code reviews and finding critical flaws.** I'm wondering if this level of technology has made its way into any practical tools yet, especially as a plugin for **Azure DevOps**, which is our platform.

So, my question to the community is: **What are you using that can catch these kinds of complex logical issues?**

I'm looking for a tool that:
* Performs deep logical analysis, not just static analysis.
* Is context-aware and can understand the purpose of a change across multiple files.
* Can identify security vulnerabilities that require understanding business logic.
* **Integrates smoothly with Azure DevOps.**
* Writes clear, actionable comments in the PR review.

Have you had success with tools like GitHub Copilot for PRs, CodeRabbit, Bito, Tabnine, or others for these kinds of complex issues? Any hidden gems out there that go beyond the basics?

Thanks in advance for your help

https://redd.it/1nvv5ox
@r_devops
SMS alerts for infra monitoring: what's reliable?

We want to integrate SMS alerts into our monitoring setup for server downtime and urgent incidents. Tried one provider but messages sometimes arrive late, which defeats the point.
Any recommendations for something more reliable than Twilio/Bandwidth?

https://redd.it/1nvwljd
@r_devops
How do you deal with legacy systems that just refuse to die?

I’m at a company that still runs on a bunch of legacy systems, and honestly, it feels like we’re fighting them every day. Any time we try to roll out something new, we get stuck doing a ton of manual work because the old stuff doesn’t play nice.

Half of my time isn’t even spent building, it’s spent babysitting systems that should’ve been retired five years ago. But management doesn’t want to touch them because “they still work.”

Anyone else stuck in this loop? How do you deal with modernizing without breaking half your environment or getting buried in tech debt?

https://redd.it/1nvxjl8
@r_devops
why monorepos??

Just a quick question, can anybody explain this to me? I've gone through the repos of various organizations and found that they're all monorepos, while in the market people are crazy about microservices and keep talking about how important they are. So why do companies prefer the monorepo structure? The vast majority of repos are monorepos. Is it because they're old, or is there some other reason?

Would be great to hear your insights.

https://redd.it/1nvy3if
@r_devops
awsui: A modern Textual-powered AWS CLI TUI

Hi everyone, I'm currently a DevOps/SRE engineer.

# Why build this?

When using the AWS CLI, I sometimes need to switch between multiple profiles. It's easy to forget a profile name, which means I have to spend extra time searching.

So, I needed a tool that not only integrated AWS profile management and quick switching capabilities, but also allowed me to execute AWS CLI commands directly within it. Furthermore, I wanted to be able to directly call AWS Q to perform tasks or ask questions.

# What can awsui do?

Built with Textual, awsui is a completely free and open-source TUI tool that provides the following features:

* Quickly switch and manage AWS profiles.
* Use auto-completion to execute AWS CLI commands without memorizing them.
* Integration with AWS Q eliminates the need to switch between terminal windows.
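
For readers curious what profile discovery involves, here is a minimal standalone sketch. It is not taken from awsui's source and makes no claim about how awsui does it. It only assumes the standard AWS CLI config layout, where profiles in `~/.aws/config` appear as `[default]` and `[profile NAME]` sections; the function name `list_aws_profiles` is made up for this example.

```python
import configparser
from pathlib import Path


def list_aws_profiles(config_path=None):
    """Return the sorted profile names found in an AWS CLI config file."""
    path = Path(config_path) if config_path else Path.home() / ".aws" / "config"
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path)  # a missing file simply yields no sections

    profiles = []
    for section in parser.sections():
        if section == "default":
            profiles.append("default")
        elif section.startswith("profile "):
            profiles.append(section[len("profile "):].strip())
        # Other sections (e.g. "sso-session ...") are not profiles and are skipped.
    return sorted(profiles)
```

A chosen name can then be handed to the AWS CLI through the `AWS_PROFILE` environment variable or the `--profile` option.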

If you encounter any issues or have features you'd like to see, please feel free to let me know and I'll try to make improvements and fixes as soon as possible.

**GitHub Repo:** [https://github.com/junminhong/awsui](https://github.com/junminhong/awsui)

I hope this helps others facing the same challenges, thanks!

https://redd.it/1nw36nq
@r_devops
New DevOps role has me spread so thin, I feel I'm not able to learn core tools. Normal?

I started a role as a DevOps engineer in April '25 having around 10 years experience as a SysAdmin + 3 years as a Cloud Engineer. The company is a smaller mid-size with 5k employees. The advertised stack I'd be supporting in the role was: AWS, K8s (+Helm, Flux, etc.), some Azure, Terraform, Gitlab... standard stuff.

The issue I'm having is that for the last several months, I have been bombarded by ad-hoc tasks that have pulled me away from the core tech stack listed above, being assigned things like ticket resolution, administrative or small SysAdmin troubleshooting tasks, billing tasks, user issues with tooling, VM deployments, contract renewals, and the list goes on and on. I feel as though I'm drowning in these types of "one-off" issues, and have hardly been able to upskill on the core tools pitched to me during the interview process.

One major issue I see is the lack of any project management staff/office at this employer, which has led to several projects running late with no real defined requirements. Additionally, teams run a pseudo-scrum workflow, but there don't seem to be any timelines set for projects because we are constantly shifting our priorities. There is no formal ticket queue management process. Folks just pick tickets up as they see fit, and most fall on the junior folks.

I'm increasingly frustrated at my inability to focus on a project due to the context switching being asked of me, and cannot see how I will be able to attain mastery of some of these core DevOps tools in this environment.

Hoping to hear from some of you folks regarding your experiences and if this sounds normal or not?

Thanks!

https://redd.it/1nw52pd
@r_devops
On-prem IaC: Where do you draw the line between Terraform and Ansible?

At my new job we manage on-prem infra and are automating with Ansible. The cloud teams here rely heavily on Terraform, which got me wondering:

Does Terraform really have a place on-prem?

If so, where do you draw the line between Terraform and Ansible (or maybe other tools)?

My understanding is that Terraform is for provisioning and Ansible is for configuring.

Curious what you guys think about it
Cheers 😄

https://redd.it/1nw6sga
@r_devops
Got blamed for an outage I didn’t even cause

We had a rough incident last week where staging went down for hours. The root cause was a terraform destroy that got executed by an automated job after a junior triggered it.

In the postmortem, the blame still landed on me since I own infra. The reality is I never pushed a button, Terraform just followed the instructions it was given, and the pipeline behaved exactly as designed.

That said, it was on me to get things back online. I re-synced the state, made a few YAML changes, redeployed services, and eventually got staging running again.
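
One possible guardrail for this kind of incident, sketched under assumptions: have the pipeline inspect the plan before applying it and refuse when it would delete more resources than expected. The sketch relies on the JSON produced by `terraform show -json <planfile>`, which lists each resource under `resource_changes` with a `change.actions` array. The threshold and the function names are hypothetical.

```python
import json


def count_deletions(plan_json):
    """Count resources a plan would delete, given `terraform show -json` output."""
    plan = json.loads(plan_json)
    return sum(
        1
        for rc in plan.get("resource_changes", [])
        # Replacements ("delete" plus "create") are counted as deletions too.
        if "delete" in rc.get("change", {}).get("actions", [])
    )


def guard(plan_json, max_deletions=0):
    """Abort with a non-zero exit if the plan deletes more than allowed."""
    deletions = count_deletions(plan_json)
    if deletions > max_deletions:
        raise SystemExit(f"Refusing to apply: plan deletes {deletions} resource(s)")
    return deletions
```

In a pipeline this would sit between `terraform plan -out=tfplan` and `terraform apply tfplan`, fed with the output of `terraform show -json tfplan`. An intentional teardown would then need an explicit override instead of happening because someone triggered a job.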

Has anyone else had to deal with cleaning up a major mess caused by someone else, but still ended up carrying the responsibility?

https://redd.it/1nw756h
@r_devops
Biometric Breach Vector: The reverse search tool that bypassed our basic IAM and access controls.

I run a small DevOps team, and we were doing a casual security audit on our personal digital footprints. We used a powerful biometric search engine to test the integrity of our own security posture.

The audit started with faceseek. I uploaded an ancient, low-res picture of one of our developers that was on a completely separate, non-work-related platform. The goal was to see if the tool could map that face to anything related to our company.

The terrifying discovery: It mapped that single photo to a non-face PFP used on a personal Gitlab repo that contained a legacy, exposed API key (the developer thought the repo was locked down and unindexed). The biometrics served as the unexpected bridge between personal life and professional exposure.

This is a serious security vector. It proves that the weakest link in our Identity and Access Management (IAM) isn't the password or the 2FA token; it's the permanently indexed biometric hash of our team members.

Question for r/devops: How are you integrating biometric threat awareness into your security pipeline? Is anyone using tooling in their CI/CD to audit their own employees' publicly indexable biometric data to preemptively find these kinds of cross-platform security vulnerabilities? We need to treat this as a systemic risk.

https://redd.it/1nw9fm5
@r_devops
DevOps/SRE engineer with 10 years of experience: how to get into quant firms?

Hi all

I’ve been working as an SRE/DevOps engineer for 10 years (CI/CD, infra automation, deployments, monitoring etc). Lately I’ve been curious about roles in quant/prop trading firms.

For someone with my background, should I focus on:

* Linux internals & low-level system performance?
* Programming (C++/Python) for low-latency systems?
* Or just keep building infra/data pipelines?

Also, what roles make sense for me — quant dev, trading infra engineer, low-latency SRE?

Anyone here actually doing SRE/infra at a quant shop — would love to hear what skills really matter and how different it is from regular tech companies.

Thanks!

https://redd.it/1nwaj4v
@r_devops