Reddit DevOps – Telegram
How do you keep knowledge from walking out the door with your senior SRE?

Our senior SRE left two weeks ago and we already felt the pain. Had a P1 last night, DB failover didn’t trigger, nobody knew the manual steps. Spent 45 minutes digging through Slack until we found a 2-year-old Google Doc full of broken commands and “you know what to do here” notes.

We eventually got it working after calling someone who used to work with them, but it took way longer than it should have.

Docs always sound good in theory, but they rot fast and no one maintains them.
So how do you actually capture this kind of tribal knowledge before people leave? What’s actually worked for your team in real life, not just “we should document better”?

https://redd.it/1o7vgi4
@r_devops
How often does your team actually deploy to production?

Just curious how it looks across teams here
Once a day?
Once a week?
Once a quarter and you pray it works? 😅
Feel free to drop your industry too - fintech, SaaS, gov

https://redd.it/1o7zvlx
@r_devops
Efficient tagging in Terraform

Hi everyone,

I keep encountering the same problem at work. When I build infrastructure in AWS with Terraform, I first make sure that everything is running smoothly. Then I look at the costs and have to go back and retrofit the infrastructure with tagging logic. This takes a lot of time to do manually. AI agents are quite inaccurate, especially for large projects. Am I the only one with this problem?

Do you have any tools that make this easier? Are there any best practices, or do you have your own scripts?
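
To make the question concrete, here is a hedged sketch of one approach: the AWS provider's `default_tags` block can apply common tags provider-wide, and a small script can then check a plan for gaps. The Python below assumes the JSON form of a plan (from `terraform show -json`) with a `resource_changes` list. The required tag keys and the sample resources are hypothetical, and depending on the provider version `tags_all` may not be known at plan time, so the check falls back to `tags`.

```python
import json

# Hypothetical tagging policy; replace with whatever your cost reports need.
REQUIRED_TAGS = {"owner", "cost-center", "environment"}


def find_missing_tags(plan, required=REQUIRED_TAGS):
    """Map each resource address to the sorted required tag keys it lacks."""
    missing = {}
    for rc in plan.get("resource_changes", []):
        # "after" is null for resources that are being destroyed.
        after = (rc.get("change") or {}).get("after") or {}
        if "tags" not in after and "tags_all" not in after:
            continue  # no tag attribute in the plan, so skip this resource
        tags = after.get("tags_all") or after.get("tags") or {}
        gaps = sorted(required - set(tags))
        if gaps:
            missing[rc["address"]] = gaps
    return missing


if __name__ == "__main__":
    # Hand-written sample shaped like plan JSON, not real Terraform output.
    sample_plan = {
        "resource_changes": [
            {
                "address": "aws_s3_bucket.logs",
                "change": {"after": {"tags": {"owner": "platform"}}},
            },
            {
                "address": "aws_instance.web",
                "change": {
                    "after": {
                        "tags_all": {
                            "owner": "web",
                            "cost-center": "1234",
                            "environment": "prod",
                        }
                    }
                },
            },
        ]
    }
    print(json.dumps(find_missing_tags(sample_plan), indent=2))
```

Run in CI against the plan, a check like this fails fast on untagged resources instead of leaving tagging as a manual cleanup pass.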

https://redd.it/1o82hs8
@r_devops
Backend dev learning DevOps - looking for a mentor

I'm a backend developer who recently joined a startup and realized I want to get into DevOps properly. We don't have a dedicated DevOps team, so I'm trying to learn and eventually become good at this.

I have some backend experience but I'm a complete beginner when it comes to DevOps. I'm learning through courses and documentation but would really value having someone experienced I could reach out to for guidance - someone who can point me in the right direction when I'm stuck or help me understand what to focus on.

Not expecting anyone to teach me everything, just looking for occasional guidance and advice as I learn. Happy to buy you coffee (virtual or IRL if you're in Bengaluru) or help with anything I can in return.

Thanks!

https://redd.it/1o82sg0
@r_devops
Built something to simplify debugging & exploratory testing — looking for honest feedback from fellow devs/testers

Hey everyone 👋

I’ve been building a side project to make debugging and exploratory testing a bit easier. It’s a Chrome extension + dashboard that records what happens during a browser session — clicks, navigation, console output, screenshots — and then lets you replay the entire flow to understand what really happened.

On top of that, it can automatically generate test scripts for Playwright, Cypress, or Selenium based on your recorded actions. The goal is to turn exploratory testing sessions into ready-to-run automated tests without extra effort.

This came from my own frustration trying to reproduce bugs or document complex steps after a session. I wanted something lightweight, privacy-friendly (no cloud data), and useful for both QA engineers and developers.

I’m now looking for a few people who actually do testing or front-end work to try it out and share honest feedback — what’s helpful, what’s missing, what could make it part of your real workflow.

If you’d be open to giving it a spin (I can offer free access for a year), send me a quick DM and I’ll share the details privately. 🙌

No pressure — just trying to make something genuinely helpful for the community.

https://redd.it/1o81kqo
@r_devops
Was misled into a data analyst role and reluctantly stayed due to lack of options. The projects I worked on helped me indirectly discover the company's salaries. Not sure if it’s better to apply internally or leave knowing this info?

About a year ago, I got a job as a software engineer at a major global tech company. The job description was listed as a software engineer involved with DevOps tools like AWS, Terraform, Docker, and scripting. The interview process felt standard for tech roles, similar to companies like Amazon, but involved 2 hiring managers present in each interview, which I thought was unusual. It was my first full-time corporate position too, and after facing a one-year gap post-graduation, I thought beggars can’t be choosers.

A few days after starting, however, I was informed that I’d actually be working under the other hiring manager. The original manager, who conducted most of my interviews, didn’t need anyone on his team; instead, my actual manager (the other hiring manager) was the one who needed me. They had posted the job under the original manager’s name because it was tied to his cost center, which had lower salary brackets and more resources for vacancies. I found this out on my own later down the line.

Initially, I didn’t think much of it and decided to see how things played out. At first, I was coding and doing cloud-related tasks. However, after six months, I realized my work was far from what was advertised: approximately 70% of my tasks involved Power Automate, Power BI, and Power Apps, with only 30% on actual dev and cloud work. Given they knew my goals and cloud-centric skills, I felt scammed.

As I came to terms with this, I pretty much lost motivation to learn Power Platform, often utilizing AI for most tasks. What was advertised as a software engineering role turned out to be more of a data analyst position working with upper management. Despite the lack of effort on my part, I still managed to meet deadlines, and my work received recognition, even leading to bonuses and a salary bump eight months in.

Anyways, I’m now 1 year into this job. You might wonder why I’ve stayed till now? Honestly, the role is quite easy. I work remotely and don’t need to exert much brain power on my projects as most require basic research because the company lags behind in current practices. Another big reason I've stuck around is the ability to apply for jobs abroad after staying 1.5 years with the company.

More importantly, however, I also unexpectedly hit the “jackpot” in one of my recent projects, where I was given access to payroll data. By combining projects I worked on, I can indirectly figure out the highest-paying roles and the best countries, offices, and teams to work in. I discovered that my manager earns ten times my salary, with his N+1 earning three times as much and N+2 earning five times as much, respectively. I discovered my country consistently offers the shittiest salaries too, and that I need to get out of here if I ever have the chance.

As a result, I plan to apply internally to the best jobs in my company based on the salary knowledge I’ve now acquired. Since I’m coasting most of the day at my current job, I had initially decided to sharpen my cloud and system design skills and also focus on LeetCode. But I’ve been thinking that, since a lot of my actual work experience over the past year has been data-centric, I could combine that with my previous cloud and DevOps skills to pursue a Data Engineering role, at least marketing myself as such on my resume. I believe my current experience + newly acquired skills would give me a better chance of success in applying for data engineering roles rather than purely DevOps ones.

However, my main concern is needing to learn many more technologies in six months. Thus, my question is, which path is more realistic for my career? Is Data Engineering as future-proof as full-scale infrastructure/system design?

And more importantly, to those with years in the field, what is the smartest career path moving
Opinion on using AI in this interview scenario.

I had an interview recently where I was given a laptop that had Ubuntu 24 preinstalled on it. There was a folder with a simple 3-tier web app written in Node.js and a readme with the challenge instructions. The challenge was to get a local k8s cluster up and running, create Dockerfiles for those 3 small apps, create k8s manifests for deployments, services and network policies, and expose the front end.

There were no tools installed by default other than vscode. They did give me sudo. The challenge was documented as supposed to take 90 mins in the readme file even though I was only given 1 hr.

I focused on getting the env set up locally while I had Copilot build me the Dockerfiles and the k8s manifests. I fixed up the Dockerfiles a little bit afterwards to my liking. I got to the point where I had applied the manifests and had all 3 deployments running and a service for each. I was just about to start on egress for the front end, but the 1 hr mark hit and I had to stop.

I was told it was good, but the recruiter vaguely said, “I was told you used AI.” Then the communication stopped. I feel like it’s an unrealistic task for the given time frame, so I figured I’d delegate where I could and then quickly double-check the Dockerfiles and manifests. I had to install Docker and minikube and fumble around on the ThinkPad with no mouse, and that took 10-15 mins right out of the gate. I think one gotcha was that I also had to add liveness probes in Node. Wasn’t hard, just another bit of context.


I asked up front about using AI and I was just told I was free to do it how I saw fit. I’d just like some opinions on the matter.

https://redd.it/1o868ym
@r_devops
How are teams handling versioning and deployment of large datasets alongside code?

Hey everyone,
I’ve been working on a project that involves managing and serving large datasets, both open and proprietary, to humans and machine clients (AI agents, scripts, etc.).

In traditional DevOps pipelines, we have solid version control and CI/CD for code, but when it comes to data, things get messy fast:

Datasets are large, constantly updated, and stored across different systems (S3, Azure, internal repos).
There’s no universal way to “promote” data between environments (dev → staging → prod).
Data provenance and access control are often bolted on, not integrated.

We’ve been experimenting with an approach where datasets are treated like deployable artifacts, with APIs and metadata layers to handle both human and machine access, kind of like “DevOps for data.”
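
As a rough illustration of the "dataset as deployable artifact" idea (a sketch under assumptions, not a description of any particular tool): hash each file, derive a version ID from the manifest of hashes, and make "promotion" nothing more than pointing an environment at that version ID. The function names, file paths, and environment names below are all hypothetical, and files are passed as in-memory bytes to keep the example self-contained.

```python
import hashlib
import json


def build_manifest(files):
    """Build a content-addressed manifest from a {path: bytes} mapping."""
    entries = {
        path: hashlib.sha256(data).hexdigest()
        for path, data in sorted(files.items())
    }
    # Canonical JSON so the same content always produces the same version ID.
    canonical = json.dumps(entries, sort_keys=True, separators=(",", ":"))
    version = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return {"version": version, "files": entries}


def promote(environments, source, target):
    """Return a new env -> version mapping with target pinned to source's version."""
    updated = dict(environments)
    updated[target] = environments[source]
    return updated


if __name__ == "__main__":
    manifest = build_manifest({"train.csv": b"a,b\n1,2\n", "README.md": b"demo"})
    envs = {"dev": manifest["version"], "staging": None, "prod": None}
    envs = promote(envs, "dev", "staging")
    print(envs["staging"] == manifest["version"])
```

Because the version ID is derived from content, promoting dev → staging → prod never copies or mutates data; it only moves a pointer, which also gives a basic provenance trail.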

Curious:

How do your teams manage dataset versioning and deployment?
Are you using internal tooling, DVC, DataHub, or custom pipelines?
How do you handle proprietary data access or licensing in CI/CD?

(For context, I’m part of a team building *OpenDataBay* a data repository for humans and AI. Mentioning it only because we’re exploring DevOps-style approaches for dataset deliver

https://redd.it/1o88e4g
@r_devops
Best AI red teaming for LLM vulnerability assessment?

Looking for AI red teaming service providers to assess our LLMs before production. Need comprehensive coverage beyond basic prompt injection, things like jailbreaks, data exfiltration, model manipulation, etc.

Key requirements:

Detailed reporting with remediation guidance
Coverage of multimodal inputs (text, image, video)
False positive/negative rates documented
Compliance artifacts for audit trail

Anyone have experience with providers that deliver actionable findings? Bonus if they can map findings to policy frameworks.

https://redd.it/1o87qk0
@r_devops
Observability cost ownership: chargeback vs. centralized control?

Hey community,


Coming from an Observability Engineering perspective, I’m looking to understand how organizations handle observability spend.

Do you allocate costs to individual teams/applications based on usage, or does the Observability team own a shared, centralized budget?

I’m trying to identify which model drives better cost accountability and optimization outcomes.
If your org has tried both approaches, I’d love to hear what’s worked and what hasn’t.

https://redd.it/1o8ckmq
@r_devops
Raft Protocol Basic Question that trips up EVERYONE!

If a leader replicates a value from its current term to a quorum of other servers that accept it, must this value eventually be committed, even if the leader crashes before committing it?
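
For anyone weighing in, here is a minimal sketch of the leader-side commit rule as I understand it from the Raft paper (Figure 2): the leader advances its commit index to N only when a majority of servers store entry N and entry N was created in the leader's current term. The function name and the list-based representation are assumptions made for illustration; indexes are 1-based as in the paper.

```python
def advance_commit_index(log_terms, match_index, commit_index, current_term):
    """Return the leader's new commit index.

    log_terms[i - 1] is the term of the entry at index i in the leader's log.
    match_index holds, for each follower (leader excluded), the highest
    log index known to be replicated on that follower.
    """
    cluster_size = len(match_index) + 1
    majority = cluster_size // 2 + 1
    # Scan from the newest entry down and take the highest index that qualifies.
    for n in range(len(log_terms), commit_index, -1):
        replicas = 1 + sum(1 for m in match_index if m >= n)  # leader counts itself
        if replicas >= majority and log_terms[n - 1] == current_term:
            return n
    return commit_index


if __name__ == "__main__":
    # Five servers; the term-2 entry at index 3 is on the leader and two followers.
    print(advance_commit_index([1, 1, 2], [3, 3, 0, 0], 0, 2))
```

The current-term check is the part that matters for this question: entries from earlier terms are never committed by counting replicas alone, only indirectly once a current-term entry after them commits.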

https://redd.it/1o8fdaw
@r_devops
Is it possible to combine DevOps with C#?

I am a support specialist in fintech (Asia). As part of an internal training program, I was given the choice between two paths: C# or DevOps.

My knowledge of C# (.NET) and DevOps is very limited, but I would like to learn more. A developer friend of mine says that they can be studied together for a narrow field (Azure), which has only added to my doubts about which path to pick.

https://redd.it/1o8ieuc
@r_devops
Local dev for analytics stacks: ClickHouse + Redpanda + OLTP in one command

Created a demo application where the dev server (run with moose dev) spins up your entire CDC pipeline's infrastructure: Postgres, Debezium, Redpanda, Stream Sync, ClickHouse, the whole shebang.

Repo: https://github.com/514-labs/debezium-cdc/tree/main
Blog: https://www.fiveonefour.com/blog/cdc-postgres-to-clickhouse-debezium-drizzle


In the application, there's a docker compose override file that allows this (direct link: https://github.com/514-labs/debezium-cdc/blob/main/docker-compose.dev.override.yaml ).


What do y'all think of this approach?

I am thinking of adding file-watcher support to the code relating to the additional infrastructure supported. Are there any local dev experiences like that now?

https://redd.it/1o8jnmr
@r_devops
How can I build a side hustle using my Cloud & DevOps skills?

Hey everyone,
I work full-time as a Cloud/DevOps Engineer mainly focused on Azure, Terraform, Kubernetes, and automation. I’ve tried freelancing on Upwork and Fiverr, but it doesn’t seem worth it: the competition is mostly based on price rather than skill or quality.

I’m looking for ideas or examples of how someone with my background can build a side hustle or business outside of traditional freelancing, maybe something like offering specialized services, automation, or creating small SaaS tools.

Has anyone here done something similar or found a good path to monetize their cloud/DevOps expertise on the side?

Would appreciate any guidance or real-world examples!

https://redd.it/1o8ji75
@r_devops
Interested in giving input on some new helpful tools?

Hey fellow devops geeks & nerds, 

I am part of a team of seasoned engineers with lots of war stories who want to help teams doing devops work.

We finally have enough code checked in to reach our free open beta. If you're interested in participating, we would really appreciate it if you signed up for free here: https://app.ingenimax.ai/auth/login?screen_hint=signup (no cc req, no sales pressure, we promise!) and gave us feedback and input on what we are building.

It’s still early days and we know you all will have a ton of practical insight to help us see if we are doing something useful, and to shape this into the best tool it can be.

Appreciate it!

EDIT: So sorry there was no link to more details, reddit was really giving me grief posting this at all, not sure why. Let's see if this link sticks: https://www.starops.dev

https://redd.it/1o8emz2
@r_devops
What tool are you proudest of having made at work?

What custom script/tool/system that you've developed or implemented at work are you proudest of?

https://redd.it/1o8pa54
@r_devops
How do you actually think outside the box, remember stuff like tags and elements, and not feel useless seeing AI build websites in seconds?

So I’ve been learning basic full-stack (HTML, CSS, a bit of JS) and I’m realizing something. It’s not the syntax that’s hard, it’s actually remembering everything and knowing how to apply it creatively.

Every time I try to make something on my own, I end up stuck thinking “wait, what was that tag again?” or “how did that layout even work?” and it slows me down so much that I lose motivation.

On top of that, I keep seeing reels and videos of AI tools that generate full websites in under a minute. It honestly messes with my head. I start wondering — why am I even learning all this if AI can just do it better and faster? I know those demos probably skip the hard parts, but still, it feels discouraging.

So I wanted to ask people here who’ve been through this — how do you deal with that feeling? How do you stay creative and keep learning when it feels like machines are getting better at what you’re trying to master?

Also, what helped you actually remember HTML/CSS/JS concepts long-term? Like not just understanding them once, but being able to recall and use them naturally later.

I’m not asking for a “study plan” or “10 tricks to learn faster.” I just want honest advice or perspective from someone who’s been where I am right now — stuck between learning and doubting if it’s even worth it.

https://redd.it/1o8pnp4
@r_devops
On-the-edge server for HLS streaming

I'd like to stream HLS streams directly to a mobile app from an edge device. I'm thinking about using an nginx web server coupled with JWT authorization handled by a Python authentication backend. What do you guys think about this architecture? Is it secure, given that I will expose the device port to the public?
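
To make the architecture concrete, here is a minimal sketch of the token check the Python backend might perform, assuming HS256-signed JWTs and nginx delegating each playlist/segment request to the backend (for example via its `auth_request` module). It uses only the standard library; the secret, the device name, and the function names are hypothetical, and a maintained JWT library would be the safer choice in production.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_decode(segment):
    padding = "=" * (-len(segment) % 4)  # JWT segments drop base64 padding
    return base64.urlsafe_b64decode(segment + padding)


def _b64url_encode(raw):
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def sign_hs256(claims, secret):
    """Issue a token (included here only so the sketch is self-contained)."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    digest = hmac.new(secret, f"{header}.{payload}".encode("ascii"), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(digest)}"


def verify_hs256(token, secret, now=None):
    """Return the claims if the signature and expiry are valid, else None."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(payload_b64))
        signature = _b64url_decode(sig_b64)
    except (ValueError, TypeError):
        return None
    if not isinstance(header, dict) or not isinstance(claims, dict):
        return None
    if header.get("alg") != "HS256":  # pin the algorithm, never trust the header
        return None
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):  # constant-time comparison
        return None
    now = time.time() if now is None else now
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        return None  # tokens without a valid, unexpired "exp" are rejected
    return claims


if __name__ == "__main__":
    secret = b"hypothetical-shared-secret"
    token = sign_hs256({"sub": "edge-device-1", "exp": int(time.time()) + 300}, secret)
    print(verify_hs256(token, secret) is not None)
```

Short-lived tokens matter here because HLS clients request many segments; with a publicly exposed port, TLS in front of nginx is what keeps those tokens from being read in transit.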

https://redd.it/1o8tcjm
@r_devops
Stop saying "10x Developer" now that Copilot writes the boilerplate. We need new metrics.

Is anyone else terrified of their codebase right now? My team's "velocity" is up 40% thanks to LLM copilots, but half the new code feels like highly optimized technical debt. We’re shipping faster, but I spend more time debating if the AI’s solution is correct or just plausible. What metrics do you trust besides commit counts?

https://redd.it/1o8tbgw
@r_devops