Reddit DevOps – Telegram
Does your company run staging servers?

I'm curious to know how you guys work with staging servers in the real world (not my hobbyist world). At work we have a mix: some teams are small enough that testing locally suffices, while at the opposite end others keep a 64GB staging server running 24/7.

Do you share one staging server between teams (if your org is big enough for that)? Do you get per-PR staging environments? Does your staging env run on a schedule? Or do you have no staging server at all: review the code and deploy to prod?

Genuinely curious, thanks! There's a poll if you don't want to leave a comment :)

https://redd.it/1o7a8eo
@r_devops
I created a DevOps newsletter/blog for solo developers to give back to the community

Hey everyone

I’m a DevOps engineer with over 7 years of experience, and I recently started working on a side project that combines two things I really enjoy — technical writing and giving back to the community.

Over the years I’ve received (as you probably have too) tons of questions from solo developers and small teams:

“Can you help me deploy this?”
“Why is infra so complicated?”
“Which is better, AWS or GCP?”
“How do I do this, how do I do that?”

After repeating the same explanations many times, I decided to turn them into something useful for more people: a blog/newsletter called **IndieDevOps**.

It’s all about practical DevOps. Simple, hands-on guides on how to deploy, monitor, and scale without the complexity of traditional infrastructure.

The project is still very new, so please don’t be too harsh if something doesn’t work perfectly 😅 . I’m still experimenting and finding the best format.

If you like the topic and want to follow along, you’re very welcome to subscribe or just check it out.
https://indiedevops.com

Would love to hear your thoughts.

https://redd.it/1o7c1j7
@r_devops
[Guide] Implementing Zero Trust in Kubernetes with Istio Service Mesh - Production Experience

I wrote a comprehensive guide on implementing Zero Trust architecture in Kubernetes using Istio service mesh, based on managing production EKS clusters for regulated industries.

**TL;DR:**

* AKS clusters get attacked within 18 minutes of deployment
* Service mesh provides mTLS, fine-grained authorization, and observability
* Real code examples, cost analysis, and production pitfalls

**What's covered:**

✓ Step-by-step Istio installation on EKS

✓ mTLS configuration (strict mode)

✓ Authorization policies (deny-by-default)

✓ JWT validation for external APIs

✓ Egress control

✓ AWS IAM integration

✓ Observability stack (Prometheus, Grafana, Kiali)

✓ Performance considerations (1-3ms latency overhead)

✓ Cost analysis (\~$414/month for 100-pod cluster)

✓ Common pitfalls and migration strategies
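As a rough illustration of the strict mTLS and deny-by-default items above, here is a minimal sketch that builds the two Istio manifests as Python dictionaries and prints them as JSON, which kubectl accepts. It is not taken from the article: the `payments` namespace is a hypothetical name, `istio-system` is assumed to be the mesh root namespace, and the `security.istio.io/v1beta1` API version is an assumption that may need adjusting to the installed Istio release.

```python
import json


def strict_mtls(root_namespace: str = "istio-system") -> dict:
    """PeerAuthentication that applies mesh-wide when placed in the root namespace."""
    return {
        "apiVersion": "security.istio.io/v1beta1",  # assumed API version
        "kind": "PeerAuthentication",
        "metadata": {"name": "default", "namespace": root_namespace},
        "spec": {"mtls": {"mode": "STRICT"}},
    }


def deny_all(namespace: str) -> dict:
    """AuthorizationPolicy with an empty spec: nothing is allowed, so requests are denied."""
    return {
        "apiVersion": "security.istio.io/v1beta1",  # assumed API version
        "kind": "AuthorizationPolicy",
        "metadata": {"name": "deny-all", "namespace": namespace},
        "spec": {},
    }


def as_list(manifests: list) -> str:
    """Wrap the manifests in a v1 List so they can be applied in one go."""
    return json.dumps({"apiVersion": "v1", "kind": "List", "items": manifests}, indent=2)


if __name__ == "__main__":
    # "payments" is a hypothetical workload namespace.
    print(as_list([strict_mtls(), deny_all("payments")]))
```

Piping the output into `kubectl apply -f -` would apply both objects; explicit ALLOW policies per workload would then be layered on top of the deny-all baseline.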

Would love feedback from anyone implementing similar architectures!

Article is [here](https://medium.com/@heinancabouly/zero-trust-for-kubernetes-implementing-service-mesh-security-529adb66665a)

https://redd.it/1o7d35b
@r_devops
I created an external reporting tool for SonarQube Community Edition

Hello everyone!

As a frequent user of SonarQube Community Edition, both personally and professionally, I always have trouble distributing the results of a scan because of the lack of reporting mechanisms.

Therefore, I created a tool called ReflectSonar. It reads the data via API and generates a PDF report for general metrics, issues, security hotspots and triggered rules.

I’d be more than happy to see your opinions, ideas and contributions! If you have any questions, please do not hesitate to contact me.

Here is the GitHub link: https://github.com/ataseren/reflectsonar
You can also use: pip install reflectsonar

https://redd.it/1o7cs35
@r_devops
Getting my feet wet with DevOps at my day job

Hi there!

I'm the tech lead at a startup and I'm looking to grow our DevOps practices and bring IaC to help scale our server infrastructure.


Currently, we have two envs (Dev and Prod). Dev is in one region only, with plans to add a second region as part of this process so we can test things closer to prod. Prod is deployed to 3 geographic regions (Canada, US, and UK), with plans for more.


Our Go microservices app(s) run on GCP Cloud Run with a Postgres database.


I know running on a single DB defeats the purpose of microservices, but why I chose them is a whole other conversation.

I'm looking for feedback on project structure and tools I should be using.

We're very bootstrappy, so I'm trying to stick to open source tooling. My trust in corporate free tiers isn't high.


Current tool ideas:

\- OpenTofu

\- Atlantis

\- GitHub for PRs

I'm planning on deploying Atlantis on Cloud Run as well, in its own project.

Am I missing something critical?

As far as project structure, I'd love suggestions.

Thank you kindly!

https://redd.it/1o7gnxw
@r_devops
Creating a MongoDB collection on Azure using an OpenShift pipeline

Any idea how to automate creating a MongoDB collection on Azure Cosmos DB with specific RUs, the autoscale option selected, and indexes with a TTL of one week, using a pipeline on OpenShift?

The reason is that I have a pipeline that takes a backup of the collections, drops them, and uploads the data to Azure for later retrieval. Instead of recreating the collections manually, I want to automate it.
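One possible shape for such a pipeline step, as a minimal sketch only: it assumes the RU-based Azure Cosmos DB for MongoDB API, which exposes a `CreateCollection` extension command through `customAction`, and it assumes `db` is a pymongo `Database` already connected to the account. The function names and the 4000 RU/s ceiling are hypothetical.

```python
ONE_WEEK_SECONDS = 7 * 24 * 60 * 60  # TTL of one week, in seconds


def create_collection_command(name: str, max_throughput: int) -> dict:
    """Build the Cosmos DB extension command for an autoscale collection."""
    # "customAction" has to be the first key: pymongo uses it as the command name.
    return {
        "customAction": "CreateCollection",
        "collection": name,
        "autoScaleSettings": {"maxThroughput": max_throughput},
    }


def recreate_collection(db, name: str, max_throughput: int = 4000) -> None:
    """Create the collection with autoscale throughput, then add a one-week TTL index.

    `db` is assumed to be a pymongo Database; it is passed in so this module
    does not need a connection (or pymongo itself) to be imported.
    """
    db.command(create_collection_command(name, max_throughput))
    # In Cosmos DB's MongoDB API, TTL is configured via an index on the system field _ts.
    db[name].create_index([("_ts", 1)], expireAfterSeconds=ONE_WEEK_SECONDS)
```

In the OpenShift pipeline this could run as a small Python task right after the drop step, with the connection string injected from a secret; additional indexes would be extra `create_index` calls.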

https://redd.it/1o7ilvm
@r_devops
Is Chainguard missing an Ubuntu image?

Why don't I see a Chainguard Ubuntu image? I thought that was a basic one. Or should we not use Ubuntu at all?

https://redd.it/1o7igaf
@r_devops
Open source CLI and template for local Kubernetes microservice stacks

Hey all, I created kstack, an open source CLI and reference template for spinning up local Kubernetes environments.

It sets up a kind or k3d cluster and installs Helm-based addons like Prometheus, Grafana, Kafka, Postgres, and an example app. The addons are examples you can replace or extend.

The goal is to have a single, reproducible local setup that feels close to a real environment without writing scripts or stitching together Helmfiles every time. It’s built on top of kind and k3d rather than replacing them.

k3d support is still experimental, so if you try it and run into issues, please open a PR.

Would be interested to hear how others handle local Kubernetes stacks or what you’d want from a tool like this.

https://redd.it/1o7hrbt
@r_devops
Can a solo founder actually sell on cloud marketplaces (AWS, Azure, etc.)?

I’m 24, from Eastern Europe, with a few startup experiences but no enterprise background.

I’ve got some IaaS/SaaS tool ideas that could fit well on cloud marketplaces like AWS or Azure, but I’m wondering how realistic that is as a solo founder.

Most buyers there seem to be enterprise clients. Are they even open to buying from small indie vendors, or do they mostly stick with “big name” companies?

Basically: can one-person startups actually make money selling through these marketplaces, or is it too enterprise heavy to be worth it?

Would love to hear from anyone who’s tried it or seen it done successfully.

https://redd.it/1o7n613
@r_devops
senior sre who knew all our incident procedures just left, now we're screwed

had a p1 last night. database failover wasnt happening automatically. nobody knew the manual process. spent 45min digging through old slack messages trying to find the runbook

found a google doc from 2 years ago. half the commands dont work anymore. infrastructure changed but doc didnt. one step just says "you know what to do here"

finally got someone who worked with the senior sre on the phone at 11pm. they vaguely remembered the process but werent sure about order of operations. we got it working eventually but it took 3x longer than it should have

this person left 2 weeks ago and already we're lost. realized they were the only one who knew how to handle like 6 different critical scenarios

how do you actually capture tribal knowledge before people leave? documenting everything sounds great in theory but nobody maintains docs and they go stale immediately

https://redd.it/1o7p2bq
@r_devops
I’ve been offered a 50% pay hike to move from SRE to CSM. Should I switch or stay technical?

Hey guys,

I started working in tech in 2022 and have been doing mostly sre/devops work (Kubernetes, ansible, CI/CD, some bug fixes, and infra POCs). My current compensation is decent, but my team is going through reorgs and there’s talk of possible layoffs early next year.

I recently got an offer for a Customer Success Manager (it's a post-sales function) role with about a 50% hike. It’s not a hands-on technical role — more customer-facing and focused on account management.

Long term, I actually wanted to go deeper into SRE/Platform/DevOps, but I’m still early in my prep and not interview-ready yet. Still, this CSM offer seems tempting, especially considering the salary bump.

I researched it, and the CS function does seem a bit less stable (Twilio and Snowflake axed their entire CS departments), but this company seems to be growing (it just raised 200 mil). Maybe it's possible to make something good out of it?



The big question:
Do I take the CSM offer (better pay, but not aligned with what I originally wanted, though I'm happy to explore)?

Or stay on my current track, prep for 3–6 months, and aim for DevOps/SRE roles?

Also curious: if anyone has gone the CSM route in tech, how do the career ladder and compensation growth look long term? Is it a smart pivot or a trap?



TL;DR: SRE → CSM offer with 50% pay bump. Should I take it or double down on tech?

https://redd.it/1o7njof
@r_devops
One man dev, need nginx help

So I started coding some analytics stuff at work months ago. Ended up making a nice React app with a Flask and Node back end. I serve it from my desktop to about 20 users per day. I was provisioned a Linux dev server, but since I’m a one-man show, I don’t really get much help when I have an issue, like trying to get nginx to serve the app. It’s basically xyz.com/abc/, and I need to understand what the nginx config should look like, because I’m led to believe that when I build the front end, nginx has to point to certain files. Can anyone steer me in the right direction? Thanks!

https://redd.it/1o7qnu7
@r_devops
Arbitrary Labels Using Karpenter AWS

I'm migrating from Managed Nodegroups to Karpenter. With Managed Nodegroups, we used arbitrary labels to ensure no interference. I'm having difficulty with this in Karpenter.

I've created the following NodePool:

    apiVersion: karpenter.sh/v1
    kind: NodePool
    metadata:
      name: trino
    spec:
      disruption:
        budgets:
          - nodes: 10%
        consolidateAfter: 30s
        consolidationPolicy: WhenEmptyOrUnderutilized
      template:
        spec:
          expireAfter: 720h
          nodeClassRef:
            group: karpenter.k8s.aws
            kind: EC2NodeClass
            name: default
          requirements:
            - key: randomthing.io/dedicated
              operator: In
              values:
                - trino
            - key: kubernetes.io/arch
              operator: In
              values:
                - amd64
            - key: karpenter.k8s.aws/instance-category
              operator: In
              values:
                - m
            - key: karpenter.k8s.aws/instance-cpu
              operator: In
              values:
                - "8"
            - key: karpenter.k8s.aws/instance-memory
              operator: In
              values:
                - "16384"
          taints:
            - key: randomthing.io/dedicated
              value: trino
              effect: NoSchedule
          labels:
            provisioner: karpenter
            randomthing.io/dedicated: trino
      weight: 10


However, when I create a pod with the relevant tolerations and nodeSelectors, I see: "label "randomthing.io/dedicated" does not have known values". Is there something that I need to do to get this to work?
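For reference, the scheduling fields such a pod would carry can be sketched as below. This is an illustration built from the taint and label key in the NodePool above, not the poster's actual pod; the pod name and image are hypothetical. The manifest is built as a Python dictionary and can be dumped to JSON for kubectl.

```python
import json

# Label/taint key used by the NodePool in the post.
DEDICATED_KEY = "randomthing.io/dedicated"


def dedicated_pod(name: str, image: str, pool: str = "trino") -> dict:
    """Pod manifest that selects, and tolerates the taint of, the dedicated pool."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Must match a value listed for this key in the NodePool.
            "nodeSelector": {DEDICATED_KEY: pool},
            # Must match the NodePool taint's key, value and effect.
            "tolerations": [
                {
                    "key": DEDICATED_KEY,
                    "operator": "Equal",
                    "value": pool,
                    "effect": "NoSchedule",
                }
            ],
            "containers": [{"name": name, "image": image}],
        },
    }


def to_json(manifest: dict) -> str:
    """Serialize the manifest so it can be piped to `kubectl apply -f -`."""
    return json.dumps(manifest, indent=2)
```

Comparing these fields against the pod that triggers the error is a quick way to rule out a mismatch in key, value, or effect before looking at the NodePool itself.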


https://redd.it/1o7rg7s
@r_devops
Azure DevOps Pipeline Cost Analysis

Hey folks,

I’m looking for recommendations on open source tools (or partially open ones) to analyze the cost of Azure DevOps pipelines — both for builds and releases.

The goal is to give each vertical or team visibility into how much an implementation, build, or service deployment is costing. Ideally, something like OpenCost or any other tool that could help track usage and translate it into cost metrics.

Have any of you done this kind of analysis? What tools or approaches worked best for you?

https://redd.it/1o7qxi4
@r_devops
How do you keep knowledge from walking out the door with your senior SRE?

Our senior SRE left two weeks ago and we're already feeling the pain. Had a P1 last night: DB failover didn’t trigger, and nobody knew the manual steps. Spent 45 minutes digging through Slack until we found a 2-year-old Google Doc full of broken commands and “you know what to do here” notes.

We eventually got it working after calling someone who used to work with them, but it took way longer than it should have.

Docs always sound good in theory, but they rot fast and no one maintains them.
So how do you actually capture this kind of tribal knowledge before people leave? What’s actually worked for your team in real life, not just “we should document better”?

https://redd.it/1o7vgi4
@r_devops
How often does your team actually deploy to production?

Just curious how it looks across teams here
Once a day?
Once a week?
Once a quarter and you pray it works? 😅
Feel free to drop your industry too - fintech, SaaS, gov

https://redd.it/1o7zvlx
@r_devops
Efficient tagging in Terraform

Hi everyone,

I keep encountering the same problem at work. When I build infrastructure in AWS using Terraform, I first make sure that everything is running smoothly. Then I look at the costs and have to go back and add tagging logic to the infrastructure. Doing this manually takes a lot of time, and AI agents are quite inaccurate, especially for large projects. Am I the only one with this problem?

Do you have any tools that make this easier? Are there any best practices, or do you have your own scripts?
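One approach that might help, sketched here under stated assumptions: set baseline tags once via the AWS provider's `default_tags` block, then check the plan for stragglers. The script below reads the JSON produced by `terraform show -json` and reports resources missing required tags. The tag keys `team` and `cost-center` are hypothetical, and values that are only known after apply will not appear in the plan, so treat the result as a first pass rather than a guarantee.

```python
REQUIRED_TAGS = frozenset({"team", "cost-center"})  # hypothetical tag keys


def missing_tags(plan: dict, required=REQUIRED_TAGS) -> dict:
    """Map each taggable resource address to the sorted required tags it lacks."""
    report = {}
    for rc in plan.get("resource_changes", []):
        after = (rc.get("change") or {}).get("after") or {}
        # Skip resources being deleted and resource types without a tags attribute.
        if "tags" not in after and "tags_all" not in after:
            continue
        # tags_all includes provider default_tags when the provider reports it.
        present = set(after.get("tags_all") or after.get("tags") or {})
        absent = sorted(set(required) - present)
        if absent:
            report[rc["address"]] = absent
    return report


def format_report(report: dict) -> str:
    """Render one line per offending resource, sorted by address."""
    return "\n".join(
        f"{address}: missing {', '.join(tags)}" for address, tags in sorted(report.items())
    )
```

A CI step could save the plan with `terraform plan -out plan.out`, run `terraform show -json plan.out`, load the result with `json.load`, and fail the build whenever `missing_tags` returns a non-empty report.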

https://redd.it/1o82hs8
@r_devops
Backend dev learning DevOps - looking for a mentor

I'm a backend developer who recently joined a startup and realized I want to get into DevOps properly. We don't have a dedicated DevOps team, so I'm trying to learn and eventually become good at this.

I have some backend experience but I'm a complete beginner when it comes to DevOps. I'm learning through courses and documentation but would really value having someone experienced I could reach out to for guidance - someone who can point me in the right direction when I'm stuck or help me understand what to focus on.

Not expecting anyone to teach me everything, just looking for occasional guidance and advice as I learn. Happy to buy you coffee (virtual or IRL if you're in Bengaluru) or help with anything I can in return.

Thanks!

https://redd.it/1o82sg0
@r_devops
Built something to simplify debugging & exploratory testing — looking for honest feedback from fellow devs/testers

Hey everyone 👋

I’ve been building a side project to make debugging and exploratory testing a bit easier. It’s a Chrome extension + dashboard that records what happens during a browser session — clicks, navigation, console output, screenshots — and then lets you replay the entire flow to understand what really happened.

On top of that, it can automatically generate test scripts for Playwright, Cypress, or Selenium based on your recorded actions. The goal is to turn exploratory testing sessions into ready-to-run automated tests without extra effort.

This came from my own frustration trying to reproduce bugs or document complex steps after a session. I wanted something lightweight, privacy-friendly (no cloud data), and useful for both QA engineers and developers.

I’m now looking for a few people who actually do testing or front-end work to try it out and share honest feedback — what’s helpful, what’s missing, what could make it part of your real workflow.

If you’d be open to giving it a spin (I can offer free access for a year), send me a quick DM and I’ll share the details privately. 🙌

No pressure — just trying to make something genuinely helpful for the community.

https://redd.it/1o81kqo
@r_devops
Was misled into a data analyst role and reluctantly stayed due to lack of options. The projects I worked on helped me indirectly discover the company's salaries. Not sure if it’s better to apply internally or leave knowing this info?

About a year ago, I got a job as a software engineer at a major global tech company. The job description was for a software engineer working with DevOps tools like AWS, Terraform, Docker, and scripting. The interview process felt standard for tech roles, similar to companies like Amazon, but had 2 hiring managers present in each interview, which I thought was unusual. It was my first full-time corporate position too, and after facing a one-year gap post-graduation, I thought beggars can’t be choosers.

A few days after starting, however, I was informed that I’d actually be working under the other hiring manager. The original manager, who conducted most of my interviews, didn’t need anyone on his team; instead, my actual manager (the other hiring manager) was the one who needed me. They had posted the job under the original manager’s name because it was tied to his cost center, which had lower salary brackets and more resources for vacancies. I found this out on my own later down the line.

Initially, I didn’t think much of it and decided to see how things played out. At first, I was coding and doing cloud-related tasks. However, after six months, I realized my work was far from what was advertised: approximately 70% of my tasks involved Power Automate, Power BI, and Power Apps, with only 30% on actual dev and cloud work. Given they knew my goals and cloud-centric skills, I felt scammed.

As I came to terms with this, I pretty much lost motivation to learn Power Platform, often utilizing AI for most tasks. What was advertised as a software engineering role turned out to be more of a data analyst position working with upper management. Despite the lack of effort on my part, I still managed to meet deadlines, and my work received recognition, even leading to bonuses and a salary bump eight months in.

Anyways, I’m now 1 year into this job. You might wonder why I’ve stayed until now. Honestly, the role is quite easy. I work remotely and don’t need to exert much brain power on my projects, as most require only basic research because the company lags behind in current practices. Another big reason I've stuck around is the ability to apply for jobs abroad after staying 1.5 years with the company.

More importantly, however, I also unexpectedly hit the “jackpot” in one of my recent projects, where I was provided access to payroll data. By combining projects I worked on, I can indirectly figure out the highest-paying roles and the best countries, offices, and teams to work in. I discovered that my manager earns ten times my salary, with his N+1 earning three times as much and N+2 earning five times as much, respectively. I discovered my country consistently offers the shittiest salaries too, and that I need to get out of here if I ever have the chance.

As a result, I plan to apply internally to the best jobs in my company based on the salary knowledge I’ve now acquired. Since I’m coasting most of the day at my current job, I had initially decided to sharpen my cloud and system design skills and also focus on LeetCode. But I’ve been thinking that, since a lot of my actual work experience over the past year has been data-centric, I could combine that with my previous cloud and DevOps skills to pursue a Data Engineering role, at least marketing myself as such on my resume. I believe my current experience + newly acquired skills would give me a better chance of success in applying for data engineering roles rather than purely DevOps ones.

However, my main concern is needing to learn many more technologies in six months. Thus, my question is, which path is more realistic for my career? Is Data Engineering as future-proof as full-scale infrastructure/system design?

And more importantly, to those with years in the field, what is the smartest career path moving