Reddit DevOps – Telegram
how much time should seniors spend on reviews? trying to save time on manual code reviews

our seniors are spending like half their time reviewing prs and everyone's frustrated. Seniors feel like they're not coding anymore, juniors are waiting days for feedback, leadership is asking why everything takes so long.

I know code review is important and seniors should be involved but this seems excessive. We have about 8 seniors and 20 mid/junior engineers, everyone's doing prs constantly. Seniors get tagged on basically everything because they know the systems best.

trying to figure out what's reasonable here. Should seniors be spending 20 hours a week on reviews? 10? Less? And how do you actually reduce it without quality going to shit? We tried having seniors only review certain areas but then knowledge silos got worse.

https://redd.it/1pl6jj7
@r_devops
an open-source realistic exam simulator for CKAD, CKA, and CKS featuring timed sessions and hands-on labs with pre-configured clusters.

[https://github.com/sailor-sh/CK-X](https://github.com/sailor-sh/CK-X) - found a really neat thing

* open-source
* designed for **CKA / CKAD / CKS** prep
* **hands-on labs**, not quizzes
* built around **real k8s clusters** you interact with using `kubectl`
* capable of **timed sessions**, to mimic exam pressure



https://redd.it/1pl6rau
@r_devops
How in tf are you all handling 'vibe-coders'

This is somewhere between a rant and an actual inquiry, but how is your org currently handling the 'AI' frenzy that has permeated every aspect of our jobs? I'll preface this by saying, sure, LLMs have some potential use-cases and can sometimes do cool things, but it seems like plenty of companies, mine included, are touting it as the solution to all of the world's problems.

I get it, if you talk up AI you can convince people to buy your product and you can justify laying off X% of your workforce, but my company is also pitching it like this internally. What is the result of that? Well, it has evolved into non-engineers from every department in the org deciding that they are experts in software development, cloud architecture, picking the font in the docs I write, you know...everything! It has also resulted in these employees cranking out AI-slop code on a weekly basis and expecting us to just put it into production--even though no one has any idea of what the code is doing or accessing. Unfortunately, the highest levels of the org seem to be encouraging this, willfully ignoring the advice from those of us who are responsible for maintaining security and infrastructure integrity.


Are you all experiencing this too? Any advice on how to deal with it? Should I just lean into it and vibe-lawyer or vibe-c-suite? I'd rather not jump ship as the pay is good, but, damn, this is quickly becoming extremely frustrating.

*long exhale*

https://redd.it/1pl96e8
@r_devops
how long until someone runs prod from chrome?

scrolling reddit, I saw something… unsettling

https://labs.leaningtech.com/blog/browserpod-beta-announcement.html

It’s a tool that lets you run node, py, and other runtimes directly in the browser

a little more of this, and we’ll genuinely be running k8s nodes - or something very kuber-adjacent - inside the browser itself

https://redd.it/1pl8am5
@r_devops
What a Fintech Platform Team Taught Me About Crossplane, Terraform and the Cost of “Building It Yourself”

I recently spoke with a platform architect at a fintech company in Northern Europe.

They’ve been building their internal platform for about three years. Today, they manage **50-60 Kubernetes clusters in production**, usually **2-3 clusters per customer**, across multiple clouds (Azure today, AWS rolling out), with strong isolation requirements because of banking and compliance constraints.


In other words: not a toy platform.

What they shared resonated with a lot of things I see elsewhere, so I’ll summarize it here in an anonymized way. If you’re in DevOps / platform engineering, you’ll probably recognize parts of your own world in this.

# Their Reality: A Platform Team at Scale

The platform team is around **7 people** and they own two big areas:

**Cloud infrastructure automation & standardization**

* Multi-account, multi-cluster setup
* Landing zones
* Compliance, security, DR tests, audits
* Cluster lifecycle, upgrades, observability

**Application infrastructure**

* Opinionated way to build and run apps
* Workflow orchestration running on Kubernetes
* Standardized “packages” that include everything an app needs: cluster, storage, secrets, networking, managed services (DBs, key vault, etc.)

Their goal is simple to describe, hard to execute:

>“Our goal is to do this at scale in a way that’s easy for us to operate, and then gradually put tools in the hands of other teams so they don’t depend on us.”

Classic platform mandate.

# Terraform Hit Its Limits

They started with Terraform. Like many. It worked… until it didn’t. This is what they hit:

**State problems at scale**

* Name changes and refactors causing subtle side effects
* Surprises when applies suddenly behave differently

**Complexity**

* Multiple pipelines for infra vs app
* Separate workflows for clusters, cloud resources, K8s resources

**Drift and visibility**

* Keeping Terraform state aligned with reality became painful
* Not a good fit when you want continuous reconciliation
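To make the "continuous reconciliation" point concrete, here is a minimal sketch of what a reconciler does on every pass: compare declared state with observed state and act on the difference. It is an illustration only, not how Terraform or Crossplane is implemented; the function names and the resource dictionaries are hypothetical.

```python
def plan_reconcile(desired: dict[str, dict], actual: dict[str, dict]) -> dict[str, list[str]]:
    """Compare declared state with observed state and list the actions needed."""
    return {
        "create": sorted(desired.keys() - actual.keys()),
        "update": sorted(
            name for name in desired.keys() & actual.keys()
            if desired[name] != actual[name]
        ),
        "delete": sorted(actual.keys() - desired.keys()),
    }


def reconcile(desired: dict[str, dict], actual: dict[str, dict]) -> dict[str, dict]:
    """Apply the plan to a copy of the observed state and return the result."""
    plan = plan_reconcile(desired, actual)
    result = dict(actual)
    for name in plan["create"] + plan["update"]:
        result[name] = dict(desired[name])
    for name in plan["delete"]:
        del result[name]
    return result


# Hypothetical example data: what Git declares vs. what exists in the cloud.
desired_state = {"db": {"size": "small"}, "bucket": {"versioning": True}}
actual_state = {"db": {"size": "large"}, "old-queue": {}}

plan = plan_reconcile(desired_state, actual_state)
converged = reconcile(desired_state, actual_state)
```

A one-shot apply runs this once when someone triggers a pipeline. A controller runs it in a loop, so drift gets corrected without anyone noticing it first.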

Their conclusion:

>“We pushed Terraform to its limits for this use case. It wasn’t designed to orchestrate everything at this scale.”

That’s not Terraform-bashing. Terraform is great at what it does. But once you try to use it as **the control plane of your platform**, it starts to crack.

# Moving to a Kubernetes-Native Control Plane

So they moved to a **Kubernetes-native model**.

Roughly:

* **Crossplane** for cloud resources
* **Helm** for packaging
* **Argo CD** for GitOps and reconciliation
* A **hub control plane** managing all environments centrally
* Some custom controllers on top

Everything (clusters, databases, storage, secrets, etc.) is now represented as **Kubernetes resources**.

Key benefit:

>*“We stopped thinking ‘this is cloud infra’ vs ‘this is app infra’.*
*For us, an environment now is the whole thing: cluster + cloud resources + app resources in one package.”*

So instead of “first run this Terraform stack, then another pipeline for K8s, then something else for app config”, they think in **full environment units**. That’s a big mental shift.
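One way to picture a "full environment unit" is a single object that owns every layer. The sketch below is my own illustration, not their actual model: the `Environment` class, its fields, and the resource kinds are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Environment:
    """Hypothetical 'full environment unit': one object owns every layer."""
    name: str
    cluster: dict
    cloud_resources: list[dict] = field(default_factory=list)
    app_resources: list[dict] = field(default_factory=list)

    def manifests(self) -> list[dict]:
        """Flatten all layers into one list, each labelled with the environment name."""
        items = [self.cluster, *self.cloud_resources, *self.app_resources]
        return [{**item, "environment": self.name} for item in items]


# Hypothetical example: one customer environment described in one place.
env = Environment(
    name="customer-a-prod",
    cluster={"kind": "Cluster", "nodes": 3},
    cloud_resources=[{"kind": "Database", "engine": "postgres"}],
    app_resources=[{"kind": "Deployment", "image": "example/app:1.0"}],
)
rendered = env.manifests()
```

The point of the shape is that there is one thing to create, upgrade, or delete, rather than three pipelines to run in the right order.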

# UI vs GitOps vs CLI: Different Teams, Different Needs

One thing that came out strongly:

* Some teams **don’t want to touch infra at all**. They just want: *“Here’s my code, please run it.”*
* Some teams are **comfortable going deep into Kubernetes and YAML**.
* Others want a **simple UI** to toggle capabilities (e.g. “enable logging for this environment”).

So they’re building **multiple abstraction layers**:

* **GitOps interface** as the “middle layer” (already established)
* A **CLI** for teams comfortable with infra
* Experiments with **UI portals** on top of their control plane

They experimented with tools like **Backstage**, using them as thin UIs on top of their existing orchestration:

>*“We built a lot of the UI in a portal by connecting it to our control plane and CRDs. You go to an environment and say ‘enable logging’, it runs the GitOps changes in the background.”*

Because they already have the orchestration layer (Crossplane + Argo CD + custom controllers), portals can stay “just portals”: UI on top of an existing engine.
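As a rough sketch of what "just a portal" means here: the UI action only produces a change to the declared state plus a commit message, and the existing GitOps engine does the rest. The function name and the shape of the values dictionary are assumptions of mine, not their implementation.

```python
import copy


def enable_capability(env_values: dict, capability: str) -> tuple[dict, str]:
    """Return updated values plus a commit message; the input is left untouched."""
    updated = copy.deepcopy(env_values)
    updated.setdefault("capabilities", {})[capability] = {"enabled": True}
    message = f"Enable {capability} for {env_values['name']}"
    return updated, message


# Hypothetical declared state for one environment, as stored in Git.
current = {"name": "customer-a-prod", "capabilities": {"metrics": {"enabled": True}}}

# What the portal would do when someone clicks "enable logging".
new_values, commit_message = enable_capability(current, "logging")
```

The portal never talks to the cloud directly. It writes the new values to Git, and reconciliation applies them like any other change.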

This is important: a portal *without* a strong control plane becomes just a dashboard. A portal *with* a strong control plane becomes a real self-service platform.

# The Real Challenges Are Not (Only) Technical

The interesting part of the conversation wasn’t “we use Crossplane” or “we use GitOps”. That’s expected. The harder problems they described were:

# 1. Different maturity levels across teams

* Some teams want full control over infra
* Some don’t care and just want things to “work”
* Some like GitOps, others are allergic to it

>*“It’s very hard to build a single solution that makes everyone happy.*
*You end up making trade-offs and accepting you won’t please all teams.”*

Hence the multi-layer approach.

# 2. Doing this with a small team

Even with 7 people, running:

* 50-60 clusters
* strict isolation per customer
* multi-cloud
* compliance, security, DR tests
* audits

…is hard.

>*“We want to automate as much as possible. Manual operations at this scale just don’t work.”*

This is where the real cost of “build it yourself” shows up. Even a very strong team ends up spending a lot of time on **operations and glue**, not on differentiating features.

# 3. Third-Party Tools vs Banking Compliance

They tried to adopt third-party tools for observability (Datadog, Sumo Logic, etc.). Technically, this made sense. Organizationally, it became painful.

* Every external SaaS triggered **risk assessment** on the customer side
* Technical teams were fine
* Legal and risk teams often said “no”
* Out of several customers, **only a few** accepted standardized third-party observability tools

The result:

* No consistent, standardized third-party layer
* More pressure to build and operate internally

If you’re in a regulated environment, this probably sounds familiar.

# Build vs Buy: The Platform Engineer’s Dilemma

One thing I appreciated was how honest they were about the **trade-offs**. On one side, building your own platform means:

* you control everything
* you can shape it to your domain
* you avoid some vendor risks

On the other side:

* A 7-person platform team easily costs **\~900,000€/year** (or more)

Most of their time is not spent on “cool problems”. It’s spent on: upgrades, security and compliance obligations, DR testing, provider bugs, drift, documentation, keeping everything running.
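To put rough numbers on that, here is a back-of-the-envelope sketch. The team size and annual cost come from the figures above; the 60% maintenance share is purely an assumption for illustration, since "most of their time" was not quantified.

```python
TEAM_SIZE = 7
ANNUAL_TEAM_COST_EUR = 900_000
ASSUMED_MAINTENANCE_PERCENT = 60  # assumption for illustration, not a figure from the interview


def cost_per_engineer(team_cost_eur: int, team_size: int) -> float:
    """Average fully loaded cost of one engineer per year."""
    return team_cost_eur / team_size


def maintenance_cost(team_cost_eur: int, maintenance_percent: int) -> float:
    """Share of the yearly team cost that goes into keeping things running."""
    if not 0 <= maintenance_percent <= 100:
        raise ValueError("maintenance_percent must be between 0 and 100")
    return team_cost_eur * maintenance_percent / 100


per_engineer = cost_per_engineer(ANNUAL_TEAM_COST_EUR, TEAM_SIZE)
upkeep = maintenance_cost(ANNUAL_TEAM_COST_EUR, ASSUMED_MAINTENANCE_PERCENT)

print(f"Per engineer: ~{per_engineer:,.0f} EUR/year")
print(f"Upkeep at {ASSUMED_MAINTENANCE_PERCENT}%: ~{upkeep:,.0f} EUR/year")
```

Swap in your own share and the comparison with a vendor quote becomes much easier to reason about.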

As they said:

>*“Sometimes buying seems expensive, but people don’t account for the time cost. A lot of money is wasted in time spent building and maintaining everything.”*

And they’re right. The build vs buy decision is less about tools, more about **where you want your team’s energy to go**.

# What I Took Away From This Conversation

A few things I keep seeing across companies, and this call reinforced them:

1. **Terraform is fantastic, but not a silver bullet for platforms.** Using it as the main engine for a large-scale, multi-cluster, multi-tenant control plane is painful.
2. **Kubernetes-native control planes are powerful when you unify cloud infra + app infra.** Treating “an environment” as a single unit (cluster + cloud resources + app resources) is a big win.
3. **Teams need multiple interfaces.** CLI, GitOps, and UI all have their place. Different teams want different levels of abstraction.
4. **Platform teams underestimate how much they’ll have to build around UX, RBAC, audit, and self-service.** This is where a lot of hidden time goes.
5. **Regulated environments distort the tool landscape.** You can’t always just “adopt Datadog” or “plug in X SaaS”. Legal and risk vetoes matter as much as technical arguments.
6. **Build vs buy is not a one-time decision.** You might build a strong internal platform today and later decide to complement or replace parts of it with external platforms as constraints change.

# You’re Not the Only One Dealing With This

If you’re reading this and thinking:

* “We’re also fighting Terraform and drift at scale.”
* “We’re stuck between portal/UI and GitOps purists.”
* “Our platform team is spending too much time on plumbing.”
* “Compliance kills half of the tools we want to use.”

You’re not alone.

A lot of DevOps and platform teams are facing **exactly the same constraints**, just with slightly different shapes.

If you’d like to **learn from what other DevOps / platform engineers are doing in the real world**, I’m building a community where people share these kinds of stories, patterns, and scars openly. Feel free to subscribe to [my personal blog](https://romaricphilogene.substack.com/).

It’s not about tools first. It’s about:

* what you’re trying to build
* which trade-offs you chose
* what worked
* what hurt

If that sounds useful, come hang out, ask questions, and learn from others who are in the same situation.

https://redd.it/1pldcjy
@r_devops
Do you use curl? What's your biggest pain point?

Hey devs! I'm researching curl workflows and would love your input:



1. How often do you use curl?

2. What's the most annoying part?

3. Would AI-powered curl automation help?



Takes 2 minutes - really appreciate it! 🙏

https://redd.it/1plekev
@r_devops
IAM vs IGA: which one actually strengthens security more?

I often see IAM and IGA used interchangeably, but they solve slightly different security problems. IAM is usually focused on access: authentication, authorization, SSO, MFA, and making sure the right users can log in at the right time. It’s critical for preventing unauthorized access and handling day-to-day identity security.

IGA, on the other hand, feels more about control and visibility. It focuses on who should have access, why they have it, approvals, reviews, certifications, and audit readiness. From a security perspective, IGA seems stronger at reducing long-term risk like privilege creep, orphaned accounts, and compliance gaps.
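As a small illustration of the kind of check that sits on the IGA side, here is a sketch that flags orphaned accounts by comparing account owners with the list of active employees. The function name and the sample data are hypothetical; real IGA tools pull this from HR and directory systems.

```python
def find_orphaned_accounts(accounts: dict[str, str], active_employees: set[str]) -> list[str]:
    """Return accounts whose owner is no longer an active employee.

    `accounts` maps account name -> owner id.
    """
    return sorted(
        account for account, owner in accounts.items()
        if owner not in active_employees
    )


# Hypothetical sample data.
accounts = {
    "alice-admin": "alice",
    "bob-deploy": "bob",
    "legacy-ci": "carol",
}
active_employees = {"alice", "bob"}

orphaned = find_orphaned_accounts(accounts, active_employees)
```

IAM would happily authenticate `legacy-ci` if its credentials are valid; a review like this is what asks whether the account should still exist.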

Curious how others see it in practice. Do you treat IAM as the frontline security layer and IGA as the governance backbone? Or have you seen environments where one clearly adds more security value than the other? Would love to hear real-world experiences.

https://redd.it/1plfem1
@r_devops
Need guidance on how to learn devops

Hey guys, I'm a software developer and I know how to create backend and frontend and also how to manually deploy to AWS.

I want to upskill and learn DevOps so that I can automate application deployments.

I can't find good resources that actually cover industry practices; all I find are simple tutorials covering what I already know. I want to learn how deployment is actually done in companies: how to write production GitHub workflows, Dockerfiles, and so on.

Please let me know if you have any such resources, tutorials.

Thanks.

https://redd.it/1plejof
@r_devops
Getting Problem in Creating First VM | Please Help

Hi everybody,

I hope you all are doing well.

I just started learning Microsoft Azure and tried to create my first VM with my free trial.

But I am not able to create it, and I get the same error in every region: "This size is currently unavailable in westus3 for this subscription: NotAvailableForSubscription."
I changed regions as well, but I am still getting the same issue.

Please help

https://redd.it/1playgj
@r_devops
Released a tool I built and personally use a lot - Is it THAT risky??

Hi, I just released a tool I built in Go, which is an AI agent that can run system commands using the latest GPT-5.2. It helps me with automations and fast actions.

Honestly, it works great, and I use it a lot. Got initial feedback that it's unwise and that it shouldn't be used IN ANY CASE.

Is it that bad?
It's super convenient, and I want to start using it in remote environments.

https://github.com/matank001/OsDevil

https://redd.it/1pljy5c
@r_devops
DevOps Engineer trying to stay afloat after a layoff and a few bad decisions.

Hi everyone,

I’m posting here because I need to say this somewhere, and I don’t feel comfortable dumping it all on the people in my life.

I’m a DevOps / infrastructure engineer in Canada with several years of experience. I’ve worked across cloud, CI/CD, containers, automation, and I hold multiple certifications (AWS, Docker, Terraform, Kubernetes-related). On paper, I should be “fine.” That’s part of what makes this harder.

Earlier this year I was laid off, and it really broke something in me. Since then, my confidence hasn’t fully come back. I second-guess myself constantly, panic in interviews, and replay mistakes in my head over and over. I’ve fumbled questions I know I know. My brain just locks up under pressure.

Recently, in a state of anxiety, I left a job too quickly — a decision I regret. I’m about to start at a new org that, based on people already working there, is extremely micromanaging and heavy on interference. Even before day one, it’s triggering a lot of dread. I already feel like I’m bracing myself just to survive instead of grow.

I still have savings and insurance, so I’m not financially desperate, but mentally I feel exhausted all the time. There’s a constant low-grade tension in my body, like my nervous system is always switched on. I overthink every decision, beat myself up for past ones, and feel like I’m slowly shrinking as a person.

Sometimes my thoughts drift into very bleak, philosophical territory about life, purpose, and suffering, not because I want to harm myself (I don’t), but because I feel worn down by the constant effort of “keeping it together.” I want to be clear: I am safe. This is burnout, anxiety, and mental fatigue, not a crisis.

I’m trying to cope by:

* Focusing on small wins (certs, small goals, structure)
* Taking things one day at a time
* Continuing to apply for other roles quietly
* Reminding myself that jobs can be temporary, even if they’re bad

I guess I’m looking to hear from people who’ve been through something similar:
Has anyone else had anxiety completely hijack their decision-making? How did you rebuild confidence after layoffs or professional burnout? How do you survive a micromanaging environment without it destroying your mental health?


If you made it this far, thank you for reading. Writing this already helps me feel a little less alone.

https://redd.it/1plmm5f
@r_devops
A short whinge about the current state of the sub and lack of moderation

Hi,

As many readers are aware, this subreddit is a dump.

It is filled with posts that the majority of users do not want as evidenced by the downvotes the majority of posts receive.

Reporting the absolute garbage posted unfortunately doesn't result in a removal either.

A quick scan of posts finds:

* AI blogspam
* Vendor blogspam
* "I created X to solve Y (imaginary problem)"
* Product market research
* Covert marketing
* Problems that would be solved with less effort by using Google rather than making a Reddit post

Can the mods open up applications to people who actually want to moderate the sub and consult with the community on evolving the current ruleset?

https://redd.it/1plo7f0
@r_devops
Building a QEMU/KVM based virtual home lab with automated Linux VM provisioning and resource management with local domain control

I have been building and using an automation toolkit for running a complete virtual home lab on KVM/QEMU. I understand there are a lot of open-source alternatives available, but this was built for fun and for managing a custom lab setup.

The automated setup deploys a central lab infrastructure server VM that runs all essential services for the lab: DNS (BIND), DHCP (KEA), iPXE, NFS, and NGINX web server for OS provisioning. You manage everything from your host machine using custom built CLI tools, and the lab infra server handles all the backend services for your local domain (like .lab.local).

You can deploy VMs two ways: network boot using iPXE/PXE for traditional provisioning, or clone golden images for instant deployment. Build a base image once, then spin up multiple copies in seconds. The CLI tools let you manage the complete lifecycle—deploy, reimage, resize resources, hot-add or remove disks and network interfaces, access serial consoles, and monitor health. Your local DNS infrastructure is handled dynamically as you create or destroy VMs, and you can manage DNS records with a centralized tool.

Supports AlmaLinux, Rocky Linux, Oracle Linux, CentOS Stream, RHEL, Ubuntu LTS, and openSUSE Leap using Kickstart, Cloud-init, and AutoYaST for automated provisioning.

The whole point is to make it a playground to build, break, and rebuild without fear. Perfect for spinning up Kubernetes clusters, testing multi-node setups, or experimenting with any Linux-based infrastructure. Everything is written in bash with no complex dependencies. Ansible is utilized for lab infrastructure server provisioning.

GitHub: https://github.com/Muthukumar-Subramaniam/server-hub

Been using this in my homelab and made it public so anyone with similar interests or requirements can use it. Please have a look and share your ideas and advice if any.

https://redd.it/1plns7z
@r_devops
Automate KVM image creation for testing purposes

I'm trying to clean up the testing workflow for a project I'm working on, a database built on top of io_uring and NVMe.

Right now I'm using KVM and its NVMe device emulator to power the dev environment, but the developer experience is poor: I have a script to recreate the KVM image, but it requires some manual steps, and I don't want to commit the KVM image itself for obvious reasons.

My questions are:

- Is there an alternative to dockerfiles for KVM images?
- If not, what are my best options for my use case?
- What other options do I have to emulate NVMe devices?

Things I tried:

- Running an nvmevirt device emulator, but it's not suitable for my test environment because it requires loading a kernel module
- Mocking an NVMe device with some code and a memory backed file, but it's not real testing

https://redd.it/1pln6hj
@r_devops
Exposing Services on a KIND Cluster on Contabo VPS, MetalLB vs cloud-provider-kind?

I'm setting up a test Kubernetes environment on a Contabo VPS, using KIND to spin up the cluster.

I’m figuring out the least hacky way to expose services externally.

So far, I see two main options:

1. MetalLB

2. cloud-provider-kind

My goal isn’t production traffic, but I do want something that:

* Behaves close to real Kubernetes networking
* Doesn’t rely on NodePort hacks
* Is reasonable for CI/testing


For those who’ve run KIND on VPS providers like Contabo/Hetzner:

Which approach did you settle on?

Any gotchas with MetalLB on a single-node KIND cluster?


https://redd.it/1plv49u
@r_devops
GitHub - eznix86/kseal: CLI tool to view, export, and encrypt Kubernetes SealedSecrets.

I’ve been using *kubeseal* (the Bitnami sealed-secrets CLI) on my clusters for a while now, and all my secrets stay sealed with Bitnami SealedSecrets so I can safely commit them to Git.

At first I had a bunch of *bash* one-liners and little helpers to export secrets, view them, or re-encrypt them in place. That worked… until it didn’t. Every time I wanted to peek inside a secret or grab all the sealed secrets out into plaintext for debugging, I’d end up reinventing the wheel. So naturally I thought:

>“Why not wrap this up in a proper script?”

Fast forward a few hours later and I ended up with **kseal** — a tiny Python CLI that sits on top of kubeseal and gives me a few things that made my life easier:

* `kseal cat`: print a decrypted secret right in the terminal
* `kseal export`: dump secrets to files (local or from cluster)
* `kseal encrypt`: seal plaintext secrets using `kubeseal`
* `kseal init`: generate a config so you don’t have to rerun the same flags forever

You can install it with pip/pipx and run it wherever you already have access to your cluster. It’s basically just automating the stuff I was doing manually and providing a consistent interface instead of a pile of ad-hoc scripts. ([GitHub](https://github.com/eznix86/kseal/))

It is just something that *helped me* and maybe helps someone else who’s tired of:

* remembering kubeseal flags
* juggling secrets in different dirs
* reinventing small helper scripts every few weeks

Check it out if you’re in the same boat: [https://github.com/eznix86/kseal/](https://github.com/eznix86/kseal/)

https://redd.it/1plw3n7
@r_devops
Looking for Slack App Feedback - Slack --> Github/Linear Issues

As a systems engineer (clearly used to writing too many user stories), I tend to have many ideas that get lost in chat or that I need to copy pasta over to Github. I was playing around in Discord and got a pretty handy tool (for me at least) going, where I react to URLs or messages and port those over into Github. I refer to the process as Capture Clean Create.

**What it does:**

\- React with an emoji to any message with a URL → creates a GitHub issue or Linear ticket

\- Use `/idea capture` to summarize the last N messages into a structured issue

\- AI extracts title, summary, category, and key points automatically

Just looking for some feedback on if this is a useful tool for you, mostly for developers/PMs. Outside of Slack/Github it currently supports Linear, Discord. Jira and Teams are next up.

https://slack.com/oauth/v2/authorize?client\_id=9193114002786.10095883648134&scope=channels:history,channels:read,chat:write,reactions:read,users:read,team:read,commands&user\_scope=

https://redd.it/1pltrez
@r_devops
Multi region AI deployment and every country has different data residency laws, compliance is impossible.

We are expanding our AI product to Europe and Asia and thought we had compliance figured out, but Germany requires data to be processed in Germany, France has different rules, Singapore has different ones again, and Japan is even stricter. We tried regional deployments, but then we had data sync problems and model consistency issues; we tried to centralize, but that violates residency laws.

The legal team sent us a spreadsheet with 47 rows of different rules per country, and some contradict each other. How are companies with global AI products handling this? It feels like we need a different deployment per country, which is impossible to maintain.
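For discussion, here is a minimal sketch of one way to keep per-country rules as data rather than as separate deployments: a lookup table that routes each request to a processing region and refuses anything it has no rule for. The region names and the table contents are placeholders I made up, not legal guidance and not any provider's real region identifiers.

```python
# Placeholder mapping: country code -> region where that country's data is processed.
# These entries are hypothetical; the real table would come from the legal spreadsheet.
PROCESSING_REGION = {
    "DE": "region-de",
    "FR": "region-fr",
    "SG": "region-sg",
    "JP": "region-jp",
}


def route_request(country_code: str) -> str:
    """Return the processing region for a country, failing closed on unknown countries."""
    region = PROCESSING_REGION.get(country_code.upper())
    if region is None:
        raise LookupError(f"No residency rule for {country_code!r}; refusing to route")
    return region


german_region = route_request("de")
```

This does not solve the sync and model consistency problems, but it keeps the 47 rows in one reviewable place instead of scattered across deployment configs.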

https://redd.it/1plyiz1
@r_devops