When does Policy-as-Code become "The Slow Lane" for developers?
Hey r/devops,
I'm working on scaling up our internal developer platform (IDP) and one of the biggest points of friction is how we enforce DevSecOps and compliance policies without killing our velocity. We're trying to shift left, but it feels like we've just shifted all the pipeline friction right onto the developer's lap.
We moved from a few post-merge human approval tollgates to an aggressive Policy-as-Code strategy using tools like Open Policy Agent (OPA) with Rego on every pull request (PR).
The result? Our security posture is fantastic. Our IaC drift is near zero. But our average PR time is up 25%, and the team is starting to view the pipeline as an adversary, not an enabler.
The checks are running: SAST, SCA, Terrascan, custom checks for naming conventions, and resource tagging compliance. All before merge. The problem is that a failed low-severity SAST finding can hold up a critical patch that has a clean functional change.
My burning question to the community:
How are you balancing the enforcement of non-critical-but-mandatory policies (like resource tagging or specific naming conventions) in the pipeline?
1. Do you have an explicit 'fail fast/fail hard' policy only for critical security issues, and let minor compliance issues run through the main pipeline, alerting to a dashboard for follow-up? (i.e., making them blocking in pre-prod but non-blocking in the main CI?)
2. Are you using a separate, performance-optimized "compliance-only" pipeline that runs less frequently, thereby unblocking the core CI/CD flow?
I’m looking for actual tooling or architectural patterns that allow for selective blocking that doesn't rely on us writing custom logic in every single Jenkinsfile/GitHub Action workflow.
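For what it's worth, one pattern that avoids per-Jenkinsfile logic is a single shared gate step: every pipeline feeds its findings through one script that takes a stage name and blocks only at or above a stage-specific severity threshold, reporting everything else as advisory. A minimal sketch (the findings format, stage names, and thresholds here are all made up for illustration, not any real scanner's output):

```python
# Hypothetical shared severity gate: blocks only on findings at or above a
# per-stage threshold; everything below is reported but non-blocking.
SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3, "critical": 4}

# Main CI blocks only on high/critical; the pre-prod gate also blocks on
# medium findings (tagging, naming conventions, etc.).
STAGE_THRESHOLD = {"ci": 3, "preprod": 2}

def gate(findings, stage):
    """Split findings into blocking and advisory lists for the given stage.

    findings: iterable of (severity, rule_id) tuples.
    The pipeline fails iff the blocking list is non-empty.
    """
    threshold = STAGE_THRESHOLD[stage]
    blocking, advisory = [], []
    for severity, rule_id in findings:
        target = blocking if SEVERITY_RANK[severity] >= threshold else advisory
        target.append(rule_id)
    return blocking, advisory

if __name__ == "__main__":
    findings = [
        ("critical", "CVE-2024-0001"),
        ("low", "naming-convention"),
        ("medium", "resource-tagging"),
    ]
    # In CI only the critical finding blocks; in pre-prod the tagging
    # violation blocks too.
    print(gate(findings, "ci"))
    print(gate(findings, "preprod"))
```

Because the thresholds live in one place (they could just as well be an OPA data document), tightening or loosening a policy is a single change rather than an edit to every workflow file.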
https://redd.it/1p0z7dp
@r_devops
Antigravity IDE for DevOps: Any feedback on integrations & automation?
Anyone tried using Antigravity by Google for DevOps workflows? I noticed the AI can suggest fixes/refactors and the IDE supports agent-like automation (e.g., review agent, code agent). Integration with Gemini 3 and VS Code style interface helped me resurrect a legacy web app.
- Anyone tested Chrome extension/API or CI/CD integrations?
- How's the support for Docker, containerized dev flows, pipelines?
- Is the multi-agent system practical for DevOps use cases?
https://redd.it/1p10asi
@r_devops
I do not know what is going wrong and I am desperate for help. I cannot build an EKS Cluster for whatever reason and I cannot figure it out.
Hello,
I'm attempting to get into DevOps, and I'm trying to build a personal project as a way to learn and understand DevOps stuff.
My goal is to build an EKS cluster via Terraform, set up a prod and dev environment, and then slap in a dumb little website and load balance it.
I have followed EVERY TUTORIAL I COULD FIND and every single time, they give me code. I either download their code or set it up EXACTLY as they do (including the tutorial from Terraform themselves!) and for whatever reason, my ec2 instances NEVER JOIN AS NODES. It always always ALWAYS gives me the issue type of NodeCreationFailure.
I discovered that if I add the vpc-cni addon to the cluster, suddenly it works and everything is happy. So I thought maybe all I have to do in Terraform is specify that it should add the vpc-cni add-on before compute is built in the cluster and it solves everything.
BUT THEN I RAN INTO A NEW PROBLEM. The vpc-cni add-on ALWAYS finds conflicts, even on a new cluster, and will not install. I have tried every single thing I can try in Terraform to make it so that it will run with OVERRIDE on the conflicts, but it is not working. No matter which way I do it, I cannot set it to override, and therefore the vpc-cni addon can never be added to the cluster via Terraform.
I do not know what else I can do. I have tried everything and looked at every possible resource. This is driving me absolutely insane because I cannot find anything anywhere that solves my problem.
Please, if you know how to fix this, or at the very least, if you know how to help me troubleshoot this, please help me. I just want to get this project working so I can get experience. This is the first step and I'm already failing.
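In case it helps anyone hitting the same wall: recent versions of the AWS Terraform provider expose conflict-resolution arguments on `aws_eks_addon`, and ordering the add-on before compute usually takes an explicit `depends_on` from the node group. A hedged sketch (resource names are placeholders, and the argument names should be checked against your provider version):

```hcl
resource "aws_eks_addon" "vpc_cni" {
  cluster_name = aws_eks_cluster.this.name
  addon_name   = "vpc-cni"

  # Overwrite the default/self-managed config instead of failing on conflicts.
  resolve_conflicts_on_create = "OVERWRITE"
  resolve_conflicts_on_update = "OVERWRITE"
}

resource "aws_eks_node_group" "workers" {
  cluster_name = aws_eks_cluster.this.name
  # ... node group config ...

  # Ensure the CNI add-on is installed before instances try to join as nodes.
  depends_on = [aws_eks_addon.vpc_cni]
}
```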
https://redd.it/1p11mnv
@r_devops
Is there a way to create jobs that I can trigger with certain parameters in Github Actions?
I've used Jenkins for a while, and sometimes other teams we worked with needed to e.g. onboard a client, and we created a Jenkins job that takes parameters (relating to their details) and runs a certain number of tasks for them to automate the onboarding process.
Is such a thing possible in Github Actions?
I'm thinking of things such as: let's say I want to hook up two VPCs. I just go to the job, input the ID and CIDR range of VPC 1 and the ID and CIDR range of VPC 2, and it automatically makes the API calls to create a peering connection between the two and update their respective route tables.
Or I want to whitelist a client's IP in our AWS WAF, so you input the parameter and it runs the job. As far as I can see, there is no way to feed a parameter into a job in GitHub Actions?
Any advice would be much appreciated.
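GitHub Actions does support this: a `workflow_dispatch` trigger takes named inputs and renders a form in the Actions UI (it can also be triggered from the CLI with `gh workflow run -f key=value`). A rough sketch for the VPC-peering case, with made-up workflow and input names, and route-table updates omitted:

```yaml
name: peer-vpcs
on:
  workflow_dispatch:
    inputs:
      requester_vpc_id:
        description: "Requester VPC ID"
        required: true
      accepter_vpc_id:
        description: "Accepter VPC ID"
        required: true

jobs:
  peer:
    runs-on: ubuntu-latest
    steps:
      - name: Create VPC peering connection
        run: |
          aws ec2 create-vpc-peering-connection \
            --vpc-id "${{ inputs.requester_vpc_id }}" \
            --peer-vpc-id "${{ inputs.accepter_vpc_id }}"
```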
https://redd.it/1p131v1
@r_devops
Wrote a blog about things to focus on when starting a new DevEx role
Hey everyone! I've been working in the platform engineering/devex space for about 3 years now. Based on what I've heard from the community and my own experiences I put together a guide of things to focus on in the first 30 days of starting a new role. Hope this helps!
Read here: https://metalbear.com/blog/devex-engineer/
https://redd.it/1p131e9
@r_devops
Quarkus with Buildpacks and OpenShift Builds
How to build images for Quarkus apps with Cloud Native Buildpacks locally and in OpenShift: https://piotrminkowski.com/2025/11/19/quarkus-with-buildpacks-and-openshift-builds/
https://redd.it/1p12iih
@r_devops
Anyone else struggling because dev, devops and security never see the same context
I’m trying to understand how people are actually solving this, because in my environment it feels like we have one problem disguised as many:
Developers, DevOps, and Security all look at completely different versions of “reality.”
Developers only see issues if they show up in the build or during code review. Anything outside that path is invisible.
DevOps ends up maintaining integrations for every scanner/security tool under the sun, each with its own policies and YAML changes. Half the effort is just keeping the pipelines consistent.
Security gets flooded with findings that rarely map cleanly back to an owner, a commit, or a service. A good chunk of alerts conflict with each other or miss enough context to be useful.
The root problem seems simple:
no shared visibility across the pipeline, so every team ends up working in its own world.
I’m curious how other teams are handling this.
Are you using a single platform to unify everything? Stitching multiple tools together? Rolling your own visibility layer? Using something like Orca, Wiz, or something completely different?
https://redd.it/1p15v9t
@r_devops
Drowning in tools, saving nothing
Our team is using 5 different tools just to get one feature out the door: Jira for bugs, Asana for sprints, Notion for documentation, and then we still end up DMing each other on Slack because no one knows where anything actually lives. At this point, I genuinely think we spend more time searching for the right board than actually writing code. Every time we onboard someone new, we give them a tool map like it's a museum tour. I just want one place that doesn't make me jump tabs like I'm speedrunning a browser challenge. Something flexible, something that makes sense. What are teams using that connects planning + code + reporting?
https://redd.it/1p15q0p
@r_devops
Sentry to GlitchTip
We’re migrating from Sentry to GlitchTip, and we want to manage the entire setup using Terraform. Sentry provides an official Terraform provider, but I couldn’t find one specifically for GlitchTip.
From my initial research, it seems that the Sentry provider should also work with GlitchTip. Has anyone here used it in that way? Is it reliable and hassle-free in practice?
Thanks in advance!
https://redd.it/1p18kjk
@r_devops
Need advice on release versioning
Hi all,
I would like some guidance on our packaging workflow and some feedback on best practices.
We build several components as .deb packages using Jenkins and git-buildpackage. Application code lives on main, and the packaging files (debian/*) are on a separate ubuntu/focal branch. For a release, developers tag main as vX.Y. When we decide to release a component, the developer merges main into the ubuntu/focal branch, runs gbp dch --release --commit, and Jenkins builds the release .deb from ubuntu/focal.
For nightlies, if main is ahead of the ubuntu/focal branch, Jenkins checks out main, copies debian/* from ubuntu/focal on top of it, then generates a snapshot and builds a package with a version like X.Y-~<jenkins_build_number>.deb.
It "works", but honestly it feels a bit messy, especially the overlay of debian/* and the build-number suffix. I would like to move towards a more standard, automated approach to tag handling and to versioning for snapshots and releases.
How would you structure the branches and versioning? Any concrete patterns or examples to look at would be great. I feel there is a lot of error-prone, manual work in the current process.
Thank you
https://redd.it/1p1bhh3
@r_devops
OpenShift
In a lot of roles I see OpenShift skill requirements, mostly in traditional IT environments. Is formal OpenShift training worth pursuing, or is it easy to pick up from the documentation if you already know Kubernetes?
https://redd.it/1p1aw4d
@r_devops
None of this is fun anymore
I can't put my finger on it, but I'm just not interested in the work anymore. With everything going on with AI and how quickly things are changing, I feel like I should be more excited, but work just no longer interests me and instead feels like a burden.
Is it time to look for a new gig? I'm a staff level platform engineer.
https://redd.it/1p1dydv
@r_devops
Help please 😭
Hello everyone,
I hope you're all doing well.
I’m writing this because I genuinely feel lost, and I really need guidance from people who understand the tech field more than I do.
Life has been tough on me recently — debts, health issues, and personal struggles that completely knocked me off track. I lost focus on my studies for a long time, and now that I’m trying to rebuild my life, I’m overwhelmed and unsure where to begin.
What I truly want is to get back on the right path and become aligned with the fast-growing world of software and technology. I want to learn real, practical skills that can help me build a career — especially remote work, because I have difficulty leaving the house regularly, and working from home would be the ideal path for me.
I’m very interested in starting with DevOps, but I honestly don’t know how to build a proper learning plan. There are so many tools, so many directions, and I feel like I’m drowning in information.
If anyone here can guide me, share a roadmap, point me to reliable resources, or give me advice on how to move step by step — it would mean the world to me.
I’m not asking for someone to mentor me full-time, but any direction, even small pieces of advice, could make a huge difference.
Thank you so much to anyone who takes the time to respond.
Your help could truly change someone’s life.
https://redd.it/1p1dvr7
@r_devops
PMs please stop making up work with AI
Rant:
Product manager doesn't know what they are doing:
They use AI to generate a SOW (Statement of Work) with completely made up objectives,
Then they use AI to generate JIRA tasks based on the made up SOW.
Then they use AI to make subtasks for the made up JIRA tasks.
They _THINK_ they are helping.
Now there are 68 items in the backlog which make no sense and are just noise. They are now presenting it to the client as if we have so much work to do when the work doesn't match reality.
Example JIRAs:
- Automate MySQL database provisioning (Client uses Postgres)
- Migrate databases to cloud (Client is on prem with no plans to move to the cloud)
- Use Terraform to automate provisioning (Client wants to use Ansible Automation Platform, not Terraform)
https://redd.it/1p1hbsl
@r_devops
Monitoring infra cost for on-prem infrastructure(Not Cloud): which tool do you use?
Hi,
We need a tool to estimate the infra cost of deploying a new application hosted on-prem or in a local data center: the cost of vCPU, memory, storage, and databases, plus the labor cost to provision them.
Could you please tell me which tools you use for this?
Thank you
https://redd.it/1p1eblo
@r_devops
Building prod image with certificate
What's the best way to inject SSL certificates into a Docker build process? I'm currently copying the certs in as part of the Dockerfile, which works, but I'd rather only do it during the prod build.
Thanks
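One low-friction option, sketched below with placeholder image names and paths: put the cert `COPY` in a dedicated build stage and select it with `--target`, so dev builds never touch it. (If the certs include private keys, a runtime mount or BuildKit `--secret` is usually preferable to baking them into an image layer at all.)

```dockerfile
# Shared base stage; dev builds stop here:
#   docker build --target base -t app:dev .
FROM node:20-alpine AS base
WORKDIR /app
COPY . .

# Prod stage adds the certificates on top:
#   docker build --target prod -t app:prod .
FROM base AS prod
COPY certs/server.pem /etc/ssl/certs/server.pem
```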
https://redd.it/1p1mkrn
@r_devops
What are the best SAST tools for identifying security vulnerabilities?
What are the best SAST tools for identifying security vulnerabilities? We already use Snyk at work, so I was wondering if there are free tools I can use to find even more security issues.
https://redd.it/1p1nw0d
@r_devops
Has anyone ever felt burn out and found changes to really help?
Reading through this sub, I see I'm not too original in thinking about a side gig with manual labor or hands-on work. So maybe the better question is: did that help? Did you ultimately exit the industry, or did you just find balance with other interests?
https://redd.it/1p1qb40
@r_devops
Is maintaining a VPC/ rented servers really that much more effort than what the cloud providers offer?
Hey everyone,
I’m stuck trying to choose between going all-in on AWS or running everything on a Hetzner + K8s setup for 2 projects that are going commercial. They're low-traffic B2B/B2C products where a bit of downtime isn’t the end of the world, and after going in circles, I still can’t decide which direction makes more sense. I've used both approaches to some extent in the past, nothing too business critical, and had pleasant-ish experience with both approaches.
I am 99% certain I'd be fine with either choice and that we could migrate from one to the other if need be, but I am genuinely curious to hear people's opinions.
**AWS:**
I *want* to just pay someone else to deal with the operational headaches; that's the big appeal. But the price feels ridiculous for what we actually need. A "basic" setup ends up being ~$400/month, with $100 just for the NAT Gateway. And honestly, the complexity feels like overkill for a small-scale product that won't need half the stuff AWS provides. The numbers may be a bit off, but with proper subnets, endpoints, and the rest of the setup I'd consider necessary around the VPC, the cost really ramps up. I doubt we'd go over $400-600 even with prod and staging, but still.
**Hetzner:**
On the flip side, I love the bang for the buck. A small k3s cluster on Hetzner has been super straightforward, reliable, and mostly hands-off in my pet projects. Monitoring is simple, costs are predictable, and it feels like I'm actually in control. The turn-off is the self-hosted part: running my own S3-compatible storage, secrets manager, or registry. I've done it before, but I don't really want the ongoing babysitting.
Right now I’m leaning toward a hybrid: Hetzner for compute + database, and AWS (or someone else) for managed services like S3 and Secrets Manager.
**What I’d love feedback on:**
* If you’ve been in this exact 50/50 situation, what was the one thing that pushed you to choose one over the other?
* Is a hybrid setup actually a good idea, or do the hidden costs (like data transfer) ruin the savings?
* And if I *do* self-host, what are the lowest-maintenance, production-ready alternatives to S3/Secrets/ECR that really “just work” without constant hand-holding?
Maybe I am too much in my head and can't see things clearly, but my question boils down to: is self-hosting / running your own servers really that much hassle and effort? I've had single machines in a bare-bones Docker setup run for a year without any intervention. At the same time, I don't want to spend all my time on infra rather than on the product, but I don't feel like AWS would save me that much time in this regard.
Looking for that one insight to break the deadlock. Appreciate any thoughts!
https://redd.it/1p1fiw9
@r_devops
QA tests blocking our CI/CD pipeline 45min per run, how do you handle this bottleneck?
We've got about 800 automated tests in the pipeline and they're killing our deployment velocity. 45 min average, sometimes over an hour if resources are tight.
The time is bad enough, but the flakiness is even worse: 5 to 10 random test failures every run, different tests each time. So now devs just rerun the pipeline and hope it passes the second time, which obviously defeats the entire purpose of having tests.
We're trying to ship multiple times daily, but the QA stage has become the bottleneck, so we either wait for slow tests or start ignoring failures, which feels dangerous. We tried parallelizing more but hit resource limits; we also tried running only the relevant tests per PR, but then we miss regressions.
It feels like we're stuck between slow and unreliable. Has anyone actually solved this problem? We need tests that run fast, don't randomly fail, and catch real issues. I'm starting to think the whole approach might be flawed.
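One common answer to the "rerun and hope" problem is to distinguish flaky tests from real regressions automatically: track each test's recent pass/fail history across runs and quarantine only the ones that flip back and forth. A minimal sketch of that policy (names and thresholds are illustrative, not from any particular tool):

```python
# Sketch of a flaky-test quarantine policy: keep a short pass/fail history
# per test and classify a test as flaky if it has flipped between pass and
# fail several times recently. A consistently failing test stays blocking.

from collections import defaultdict

class FlakeTracker:
    def __init__(self, window: int = 10, flake_threshold: int = 2):
        self.window = window                    # how many recent runs to keep
        self.flake_threshold = flake_threshold  # pass<->fail flips to count as flaky
        self.history = defaultdict(list)        # test name -> recent pass/fail booleans

    def record(self, test: str, passed: bool) -> None:
        runs = self.history[test]
        runs.append(passed)
        if len(runs) > self.window:
            runs.pop(0)

    def is_flaky(self, test: str) -> bool:
        runs = self.history[test]
        flips = sum(1 for a, b in zip(runs, runs[1:]) if a != b)
        return flips >= self.flake_threshold

tracker = FlakeTracker()
for outcome in [True, False, True, False, True]:      # alternating -> flaky
    tracker.record("test_login", outcome)
for outcome in [False, False, False, False, False]:   # always failing -> real regression
    tracker.record("test_checkout", outcome)
print(tracker.is_flaky("test_login"), tracker.is_flaky("test_checkout"))  # True False
```

In practice the history would live in a shared store keyed by test ID, and quarantined tests would still run (non-blocking) and alert to a dashboard so they get fixed rather than forgotten.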
https://redd.it/1p1uh6c
@r_devops
Looking at how FaceSeek works made me think about the DevOps side of large scale image processing
I tried a face search tool called FaceSeek with an old photo just out of curiosity. The quick response time surprised me, and it made me think about the DevOps challenges behind something like that. It reminded me that behind every fast public-facing feature there is usually a lot of work happening with pipelines, caching strategies, autoscaling, and monitoring.
I started wondering how a system like FaceSeek handles millions of embeddings, how it manages indexing jobs, and how it keeps latency reasonable when matching images against large datasets. It also made me think about what the CI and CD setup for this kind of workload would look like, especially when updating models or deploying new versions that might change the shape of the data.
This is not a promotion for FaceSeek. It simply sparked a technical question.
For those experienced in DevOps work, how would you approach designing the infrastructure for a system that depends on heavy preprocessing tasks, vector search, and bursty user traffic? I am especially curious about how to structure queues, scale workers, and maintain observability for something that needs to handle unpredictable spikes.
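For context on the matching core such a system is built around: at small scale it is just nearest-neighbor search over an embedding matrix, and production systems swap the brute-force step for an ANN index (FAISS, HNSW, etc.) behind the same query-in, top-k-out interface. A minimal sketch with synthetic data:

```python
# Minimal sketch of embedding matching: brute-force cosine similarity over
# a matrix of stored vectors. All data here is synthetic; real deployments
# replace the linear scan with an approximate nearest-neighbor index.

import numpy as np

def top_k_matches(query: np.ndarray, index: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the k stored embeddings most similar to the query."""
    # Normalize rows so a dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    idx = index / np.linalg.norm(index, axis=1, keepdims=True)
    scores = idx @ q
    return np.argsort(scores)[::-1][:k]

rng = np.random.default_rng(0)
stored = rng.normal(size=(1000, 128))               # 1000 fake 128-dim embeddings
probe = stored[42] + 0.01 * rng.normal(size=128)    # near-duplicate of entry 42
print(top_k_matches(probe, stored, k=1))            # best match should be index 42
```

The ops questions in the post then attach around this core: preprocessing and embedding generation as queued worker jobs, the index rebuilt or versioned alongside model deploys (since a new model changes the vector space), and latency/recall tracked per index version.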
Would love to hear thoughts from people who have dealt with similar workloads.
https://redd.it/1p1v5vl
@r_devops