Reddit DevOps – Telegram
Open source CLI and template for local Kubernetes microservice stacks

Hey all, I created kstack, an open source CLI and reference template for spinning up local Kubernetes environments.

It sets up a kind or k3d cluster and installs Helm-based addons like Prometheus, Grafana, Kafka, Postgres, and an example app. The addons are examples you can replace or extend.

The goal is to have a single, reproducible local setup that feels close to a real environment without writing scripts or stitching together Helmfiles every time. It’s built on top of kind and k3d rather than replacing them.
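
For anyone who hasn't wired this up by hand, the manual equivalent of what the tool automates looks roughly like the steps below; the cluster name, namespaces, and chart choices are illustrative, not kstack's actual defaults:

# create a throwaway local cluster
kind create cluster --name local-dev

# install a couple of example addons via Helm (chart picks are illustrative)
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update
helm install monitoring prometheus-community/kube-prometheus-stack -n monitoring --create-namespace
helm install postgres bitnami/postgresql -n data --create-namespace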

k3d support is still experimental, so if you try it and run into issues, please open a PR.

Would be interested to hear how others handle local Kubernetes stacks or what you’d want from a tool like this.

https://redd.it/1o7hrbt
@r_devops
Can a solo founder actually sell on cloud marketplaces (AWS, Azure, etc.)?

I’m 24, from Eastern Europe, with a few startup experiences but no enterprise background.

I’ve got some IaaS/SaaS tool ideas that could fit well on cloud marketplaces like AWS or Azure, but I’m wondering how realistic that is as a solo founder.

Most buyers there seem to be enterprise clients. Are they even open to buying from small indie vendors, or do they mostly stick with “big name” companies?

Basically: can one-person startups actually make money selling through these marketplaces, or is it too enterprise heavy to be worth it?

Would love to hear from anyone who’s tried it or seen it done successfully.

https://redd.it/1o7n613
@r_devops
senior sre who knew all our incident procedures just left, now we're screwed

had a p1 last night. database failover wasn't happening automatically. nobody knew the manual process. spent 45 min digging through old slack messages trying to find the runbook

found a google doc from 2 years ago. half the commands don't work anymore. infrastructure changed but the doc didn't. one step just says "you know what to do here"

finally got someone who worked with the senior sre on the phone at 11pm. they vaguely remembered the process but weren't sure about the order of operations. we got it working eventually but it took 3x longer than it should have

this person left 2 weeks ago and already we're lost. realized they were the only one who knew how to handle like 6 different critical scenarios

how do you actually capture tribal knowledge before people leave? documenting everything sounds great in theory but nobody maintains docs and they go stale immediately

https://redd.it/1o7p2bq
@r_devops
I’ve been offered a 50% pay hike to move from SRE to CSM. Should I switch or stay technical?

Hey guys,

I started working in tech in 2022 and have been doing mostly SRE/DevOps work (Kubernetes, Ansible, CI/CD, some bug fixes, and infra POCs). My current compensation is decent, but my team is going through reorgs and there’s talk of possible layoffs early next year.

I recently got an offer for a Customer Success Manager (it's a post-sales function) role with about a 50% hike. It’s not a hands-on technical role — more customer-facing and focused on account management.

Long term, I actually wanted to go deeper into SRE/Platform/DevOps, but I’m still early in my prep and not interview-ready yet. But this CSM offer seems tempting, especially considering the salary bump.

I looked into it, and the CS function does seem a bit less stable (Twilio and Snowflake axed their entire CS departments), but this company seems to be growing (it just raised 200 mil), so maybe it's possible to make something good out of it?



The big question:
Do I take the CSM offer (better pay, but not aligned with what I originally wanted, though I'm happy to explore)?

Or stay in my current track, prep for 3–6 months, and aim for devops/SRE roles?
Also curious — if anyone has gone the CSM route in tech, how does the career ladder and compensation growth look long term? Is it a smart pivot or a trap?



TL;DR: SRE → CSM offer with 50% pay bump. Should I take it or double down on tech?


https://redd.it/1o7njof
@r_devops
One man dev, need nginx help

So I started coding some analytics stuff at work months ago. Ended up making a nice React app with a Flask and Node back end, and I serve it from my desktop to about 20 users per day. I was provisioned a Linux dev server, but since I’m a one-man show I don’t really get much help when I have an issue, like trying to get nginx to serve the app. It’s basically xyz.com/abc/, and I need to understand what the nginx config should look like, because I’m led to believe that when I build the front end, certain files have to be pointed to by nginx? Can anyone steer me in the right direction? Thanks!
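
For reference, this kind of setup usually ends up shaped something like the sketch below; the file paths and backend port are placeholders, not details from the post:

server {
    listen 80;
    server_name xyz.com;

    # static files from the React build; copy the build output into
    # /var/www/analytics/abc/ (placeholder path)
    location /abc/ {
        root /var/www/analytics;
        try_files $uri $uri/ /abc/index.html;   # SPA fallback for client-side routes
    }

    # API calls go to the Flask backend; the trailing slash on proxy_pass
    # strips the /abc/api/ prefix before forwarding (port is a placeholder)
    location /abc/api/ {
        proxy_pass http://127.0.0.1:5000/;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}

A second location block pointing at the Node service on its own port would follow the same proxy pattern, and the React app needs its base path set to /abc/ (e.g. "homepage" in package.json or "base" in the Vite config) so asset URLs resolve under the prefix.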

https://redd.it/1o7qnu7
@r_devops
Arbitrary Labels Using Karpenter AWS

I'm migrating from Managed Nodegroups to Karpenter. With Managed Nodegroups, we used arbitrary labels to keep workloads from interfering with each other. I'm having difficulty replicating this with Karpenter.

I've created the following NodePool:
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: trino
spec:
  disruption:
    budgets:
      - nodes: 10%
    consolidateAfter: 30s
    consolidationPolicy: WhenEmptyOrUnderutilized
  template:
    spec:
      expireAfter: 720h
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: randomthing.io/dedicated
          operator: In
          values:
            - trino
        - key: kubernetes.io/arch
          operator: In
          values:
            - amd64
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values:
            - m
        - key: karpenter.k8s.aws/instance-cpu
          operator: In
          values:
            - "8"
        - key: karpenter.k8s.aws/instance-memory
          operator: In
          values:
            - "16384"
      taints:
        - key: randomthing.io/dedicated
          value: trino
          effect: NoSchedule
    metadata:
      labels:
        provisioner: karpenter
        randomthing.io/dedicated: trino
  weight: 10


However, when I create a pod with the relevant tolerations and nodeSelector, I see: label "randomthing.io/dedicated" does not have known values. Is there something that I need to do to get this to work?
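
For reference, the kind of pod spec being described would look roughly like this (pod name and image are placeholders); the nodeSelector and toleration have to line up exactly with the NodePool's label and taint key/value:

apiVersion: v1
kind: Pod
metadata:
  name: trino-smoke-test            # placeholder name
spec:
  nodeSelector:
    randomthing.io/dedicated: trino # must match the NodePool label/requirement
  tolerations:
    - key: randomthing.io/dedicated # must match the NodePool taint
      operator: Equal
      value: trino
      effect: NoSchedule
  containers:
    - name: app
      image: busybox                # placeholder image
      command: ["sleep", "3600"]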


https://redd.it/1o7rg7s
@r_devops
Azure DevOps Pipeline Cost Analysis

Hey folks,

I’m looking for recommendations on open source tools (or partially open ones) to analyze the cost of Azure DevOps pipelines — both for builds and releases.

The goal is to give each vertical or team visibility into how much an implementation, build, or service deployment is costing. Ideally, something like OpenCost or any other tool that could help track usage and translate it into cost metrics.

Have any of you done this kind of analysis? What tools or approaches worked best for you?

https://redd.it/1o7qxi4
@r_devops
How do you keep knowledge from walking out the door with your senior SRE?

Our senior SRE left two weeks ago and we’ve already felt the pain. Had a P1 last night, DB failover didn’t trigger, nobody knew the manual steps. Spent 45 minutes digging through Slack until we found a 2-year-old Google Doc full of broken commands and “you know what to do here” notes.

We eventually got it working after calling someone who used to work with them, but it took way longer than it should have.

Docs always sound good in theory, but they rot fast and no one maintains them.
So how do you actually capture this kind of tribal knowledge before people leave? What’s actually worked for your team in real life, not just “we should document better”?

https://redd.it/1o7vgi4
@r_devops
How often does your team actually deploy to production?

Just curious how it looks across teams here
Once a day?
Once a week?
Once a quarter and you pray it works? 😅
Feel free to drop your industry too - fintech, SaaS, gov

https://redd.it/1o7zvlx
@r_devops
Efficient tagging in Terraform

Hi everyone,

I keep encountering the same problem at work. When I build infrastructure in AWS using Terraform, I first make sure that everything is running smoothly. Then I look at the costs and have to go back and apply a tagging scheme across the whole infrastructure, which takes a lot of time to do manually. AI agents are quite inaccurate at this, especially for large projects. Am I the only one with this problem?

Do you have any tools that make this easier? Are there any best practices, or do you have your own scripts?
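
For context, one widely used building block is the AWS provider's default_tags block, which stamps a common tag set on every taggable resource the provider creates; a minimal sketch, with example tag keys and values:

provider "aws" {
  region = "eu-central-1"                  # example region

  # applied automatically to every taggable resource this provider manages
  default_tags {
    tags = {
      Project     = "analytics-platform"   # example values
      Environment = "dev"
      CostCenter  = "1234"
      ManagedBy   = "terraform"
    }
  }
}

Per-resource tags still merge on top of these defaults, so more specific tagging keeps working without threading a tags variable through every module.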

https://redd.it/1o82hs8
@r_devops
Backend dev learning DevOps - looking for a mentor

I'm a backend developer who recently joined a startup and realized I want to get into DevOps properly. We don't have a dedicated DevOps team, so I'm trying to learn and eventually become good at this.

I have some backend experience but I'm a complete beginner when it comes to DevOps. I'm learning through courses and documentation but would really value having someone experienced I could reach out to for guidance - someone who can point me in the right direction when I'm stuck or help me understand what to focus on.

Not expecting anyone to teach me everything, just looking for occasional guidance and advice as I learn. Happy to buy you coffee (virtual or IRL if you're in Bengaluru) or help with anything I can in return.

Thanks!

https://redd.it/1o82sg0
@r_devops
Built something to simplify debugging & exploratory testing — looking for honest feedback from fellow devs/testers

Hey everyone 👋

I’ve been building a side project to make debugging and exploratory testing a bit easier. It’s a Chrome extension + dashboard that records what happens during a browser session — clicks, navigation, console output, screenshots — and then lets you replay the entire flow to understand what really happened.

On top of that, it can automatically generate test scripts for Playwright, Cypress, or Selenium based on your recorded actions. The goal is to turn exploratory testing sessions into ready-to-run automated tests without extra effort.
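
To make that concrete, the output would be roughly the shape of an ordinary Playwright test like the sketch below; this is illustrative only, not the extension's actual output, and the URL and selectors are made up:

// illustrative example of a recorded flow turned into a Playwright test
import { test, expect } from '@playwright/test';

test('recorded login flow', async ({ page }) => {
  await page.goto('https://example.com/login');
  await page.getByLabel('Email').fill('user@example.com');
  await page.getByLabel('Password').fill('hunter2');
  await page.getByRole('button', { name: 'Sign in' }).click();
  await expect(page).toHaveURL(/dashboard/);   // assert we landed on the dashboard
});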

This came from my own frustration trying to reproduce bugs or document complex steps after a session. I wanted something lightweight, privacy-friendly (no cloud data), and useful for both QA engineers and developers.

I’m now looking for a few people who actually do testing or front-end work to try it out and share honest feedback — what’s helpful, what’s missing, what could make it part of your real workflow.

If you’d be open to giving it a spin (I can offer free access for a year), send me a quick DM and I’ll share the details privately. 🙌

No pressure — just trying to make something genuinely helpful for the community.

https://redd.it/1o81kqo
@r_devops
Was misled into a data analyst role and reluctantly stayed due to lack of options. The projects I worked on helped me indirectly discover the company's salaries. Not sure if it’s better to apply internally or leave knowing this info?

About a year ago, I got a job as a software engineer at a major global tech company. The job description listed a software engineer role involving DevOps tools like AWS, Terraform, Docker, and scripting. The interview process felt standard for tech roles, similar to companies like Amazon, but involved 2 hiring managers present in each interview, which I thought was unusual. It was my first full-time corporate position too, and after facing a one-year gap post-graduation, I thought beggars can’t be choosers.

A few days after starting, however, I was informed that I’d actually be working under the other hiring manager. The original manager, who conducted most of my interviews, didn’t need anyone on his team; instead, my actual manager (the other hiring manager) was the one who needed me. They had posted the job under the original manager’s name because it was tied to his cost center, which had lower salary brackets and more resources for vacancies. I found this out on my own later down the line.

Initially, I didn’t think much of it and decided to see how things played out. At first, I was coding and doing cloud-related tasks. However, after six months, I realized my work was far from what was advertised: approximately 70% of my tasks involved Power Automate, Power BI, and Power Apps, with only 30% on actual dev and cloud work. Given they knew my goals and cloud-centric skills, I felt scammed.

As I came to terms with this, I pretty much lost motivation to learn Power Platform, often utilizing AI for most tasks. What was advertised as a software engineering role turned out to be more of a data analyst position working with upper management. Despite the lack of effort on my part, I still managed to meet deadlines, and my work received recognition, even leading to bonuses and a salary bump eight months in.

Anyways, I’m now 1 year into this job. You might wonder why I’ve stayed till now? Honestly, the role is quite easy. I work remotely and don’t need to exert much brain power on my projects as most require basic research because the company lags behind in current practices. Another big reason I've stuck around is the ability to apply for jobs abroad after staying 1.5 years with the company.

More importantly, however, is that I also unexpectedly hit the “jackpot” in one of my recent projects, where I was provided access to payroll data. By combining projects I worked on, I can indirectly figure out the highest-paying roles and the best countries, offices, and teams to work in. I discovered that my manager earns ten times my salary, with his N+1 earning three times as much and N+2 earning five times as much, respectively. I discovered my country consistently offers the shittiest salaries too, and that I need to get out of here if I ever have the chance.

As a result, I plan to apply internally to the best jobs in my company based on the salary knowledge I’ve now acquired. Since I’m coasting most of the day at my current job, I had initially decided to sharpen my cloud and system design skills and also focus on LeetCode. But I’ve been thinking that, since a lot of my actual work experience over the past year has been data-centric, I could combine that with my previous cloud and DevOps skills to pursue a Data Engineering role, at least marketing myself as such on my resume. I believe my current experience + newly acquired skills would give me a better chance of success in applying for data engineering roles rather than purely DevOps ones.

However, my main concern is needing to learn many more technologies in six months. Thus, my question is, which path is more realistic for my career? Is Data Engineering as future-proof as full-scale infrastructure/system design?

And more importantly, to those with years in the field, what is the smartest career path moving forward?
Opinion on using AI in this interview scenario.

I had an interview recently where I was given a laptop with Ubuntu 24 preinstalled. There was a folder with a simple 3-tier web app written in Node.js and a readme with the challenge instructions. The challenge was to get a local k8s cluster up and running, create Dockerfiles for those 3 small apps, create k8s manifests for deployments, services, and network policies, and expose the front end.

There were no tools installed by default other than VS Code. They did give me sudo. The readme documented the challenge as supposed to take 90 minutes, even though I was only given 1 hour.

I focused on getting the environment set up locally while I had Copilot build me the Dockerfiles and the k8s manifests. I fixed up the Dockerfiles a little bit afterwards to my liking. I got to the point where I applied the manifests and had all 3 deployments running and a service for each. I was just about to start on egress for the front end, but the 1 hour mark hit and I had to stop.

I was told it was good, but the recruiter vaguely said, “I was told you used AI.” Then the communication stopped. I feel like it’s an unrealistic task for the given time frame, so I figured I’d delegate where I could and then quickly double-check the Dockerfiles and manifests. I had to install Docker and minikube and fumble around on the ThinkPad with no mouse, which took 10-15 minutes right out of the gate. I think one gotcha was that I also had to add liveness probes in Node too. Wasn’t hard, just another bit of context.


I asked up front about using AI and was just told I was free to do it however I saw fit. I’d just like some opinions on the matter.

https://redd.it/1o868ym
@r_devops
How are teams handling versioning and deployment of large datasets alongside code?

Hey everyone,
I’ve been working on a project that involves managing and serving large datasets, both open and proprietary, to humans and machine clients (AI agents, scripts, etc.).

In traditional DevOps pipelines, we have solid version control and CI/CD for code, but when it comes to data, things get messy fast:

Datasets are large, constantly updated, and stored across different systems (S3, Azure, internal repos).
There’s no universal way to “promote” data between environments (dev → staging → prod).
Data provenance and access control are often bolted on, not integrated.

We’ve been experimenting with an approach where datasets are treated like deployable artifacts, with APIs and metadata layers to handle both human and machine access, kind of like “DevOps for data.”
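
For comparison, the Git-plus-DVC flavour of "dataset as artifact" (DVC comes up in the questions below) looks roughly like this; the remote bucket and file names are placeholders:

# rough sketch of dataset-as-artifact versioning with DVC
dvc init
dvc remote add -d storage s3://example-bucket/datasets
dvc add data/training_set.parquet          # writes a small .dvc pointer file next to the data
git add data/training_set.parquet.dvc .gitignore
git commit -m "data: track training_set v1"
dvc push                                   # uploads the actual data to the S3 remote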

Curious:

How do your teams manage dataset versioning and deployment?
Are you using internal tooling, DVC, DataHub, or custom pipelines?
How do you handle proprietary data access or licensing in CI/CD?

(For context, I’m part of a team building *OpenDataBay*, a data repository for humans and AI. Mentioning it only because we’re exploring DevOps-style approaches for dataset delivery.)

https://redd.it/1o88e4g
@r_devops
Best AI red teaming for LLM vulnerability assessment?

Looking for AI red teaming service providers to assess our LLMs before production. Need comprehensive coverage beyond basic prompt injection, things like jailbreaks, data exfiltration, model manipulation, etc.

Key requirements:

Detailed reporting with remediation guidance
Coverage of multimodal inputs (Text, image, video)
False positive/negative rates documented
Compliance artifacts for audit trail

Anyone have experience with providers that deliver actionable findings? Bonus if they can map findings to policy frameworks.

https://redd.it/1o87qk0
@r_devops
Observability cost ownership: chargeback vs. centralized control?

Hey community,


Coming from an Observability Engineering perspective, I’m looking to understand how organizations handle observability spend.

Do you allocate costs to individual teams/applications based on usage, or does the Observability team own a shared, centralized budget?

I’m trying to identify which model drives better cost accountability and optimization outcomes.
If your org has tried both approaches, I’d love to hear what’s worked and what hasn’t.

https://redd.it/1o8ckmq
@r_devops
Raft Protocol Basic Question that trips up EVERYONE!

If a leader replicates an entry from its current term to a quorum of other servers that accept it, must that entry eventually be committed, even if the leader crashes before committing it?

https://redd.it/1o8fdaw
@r_devops
Is it possible to combine DevOps with C#?

I am a support specialist in fintech (Asia). As part of an internal training program, I was given the choice between two paths: C# or DevOps.

My knowledge of C# (.NET) and DevOps is very limited, but I would like to learn more. A developer friend of mine says the two can be studied together within a narrow field (Azure), which has only increased my doubts.

https://redd.it/1o8ieuc
@r_devops