Reddit DevOps – Telegram
Funny how the worst DevOps bottlenecks have nothing to do with tools, and almost nobody brings them up.

Every time people talk about DevOps, the conversation somehow circles back to tools, CI/CD choices, Kubernetes setups, IaC frameworks, whatever. But the longer I’ve worked with different teams, the more I’m convinced the biggest bottlenecks aren’t usually the tools. 

It’s all the weird “in-between” stuff nobody ever brings up.

One thing I keep running into is just… messy handoffs. A feature is “done,” but the tests are half-missing, or the deploy requirements aren’t clear, or the local/staging/prod environments are all slightly different in ways that break everything at the worst possible moment. 

None of that shows up in a DevOps guide, but it slows things down more than any actual infrastructure issue.

Another one: slow feedback loops. When a pipeline takes 20-30 minutes per commit, people won’t say anything, but they silently start pushing code less often.

It completely changes how the team works, even if the pipeline is technically “fine.”

Anyway, I’m curious what other people have seen.

What’s a DevOps bottleneck you’ve dealt with that doesn’t really get talked about?

https://redd.it/1p7ynlq
@r_devops
Deployment to production: Docker containers

We have an automated CI/CD environment for dev, triggered by any changes to dev. Most of the artifacts are either React apps or Docker containers.


Now we need to move these containers to a prod environment. Assume AWS and a different region.

How do we deploy specific containers? Would it be manual, since the containers are already built and scripts only need to be written to deploy a given Docker image to a different region?
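
One way to picture the "promote an already-built image" step is below. It is only a sketch, and it assumes the images live in Amazon ECR, which the post does not actually say. The account ID, repository name, and tag are placeholders, and the registry login step is left out. It simply builds the pull/tag/push commands that copy one image tag from the dev region's registry to the prod region's registry, so nothing gets rebuilt.

```python
def ecr_image_uri(account_id: str, region: str, repo: str, tag: str) -> str:
    """Build an ECR image URI for a given account, region, repository and tag."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repo}:{tag}"


def promotion_commands(account_id: str, src_region: str, dst_region: str,
                       repo: str, tag: str) -> list[str]:
    """Return the docker commands that copy one image tag between regions.

    The image is not rebuilt: the exact artifact tested in dev is retagged
    and pushed to the registry in the target region.
    """
    src = ecr_image_uri(account_id, src_region, repo, tag)
    dst = ecr_image_uri(account_id, dst_region, repo, tag)
    return [
        f"docker pull {src}",
        f"docker tag {src} {dst}",
        f"docker push {dst}",
    ]


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    for cmd in promotion_commands("123456789012", "us-east-1", "eu-west-1",
                                  "my-service", "1.4.2"):
        print(cmd)
```

A pipeline job that takes the image tag as a parameter and runs these commands would make the promotion a deliberate, repeatable action rather than a manual copy.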

https://redd.it/1p7z1ij
@r_devops
QA/Dev AI testing tool

Hey everyone! I’m working on a new AI-powered QA tool called Sentinel that’s still in development, but we’ve got a few features ready to test out and I’d love to get some real-world feedback. Basically, it helps with things like self-healing tests, AI-driven dashboards, and visual regression comparisons, and I’m looking for a couple of companies or teams who might want to give it a spin and let me know what they think. If you’re interested in trying it out and giving some feedback, just let me know!

P.S.

It’s not a magic AI tool that claims it’s going to take over your testing. It’s more of a dev-focused tool that provides insights and gives suggestions.

https://redd.it/1p82kdi
@r_devops
SWE with 7 yoe, thinking about applying to an internal devops/kubernetes role. Advice?

Hello everyone. I’ve been thinking about making a move into a DevOps/kubernetes role at my company, and wanted to hear from people with real experience in the field.

A bit about my background:
- 7 yoe in big data/software development/data engineering, including about 4 years of Python and general scripting
- 4 yoe working directly with Kubernetes. Writing Helm charts, deploying and maintaining internal apps, debugging, etc.
- 4 yoe managing multiple EKS clusters, handling upgrades with terraform, maintaining monitoring stacks, etc.

Reasons for wanting to make the jump:
- I enjoy managing our EKS infrastructure. I enjoy working with kubernetes.
- I’ve become a bit disinterested in coding. Particularly the CRUD apps. With how much AI can handle now, it’s honestly demotivating, and I really dislike the typical software engineering interview process.
- Maybe this is naïve, but DevOps feels like one of the more AI-safe areas. Much of my software development work can be heavily automated, but the debugging and fire-fighting we do in our current infrastructure feels a lot harder for AI to replace anytime soon.

Reasons I’m hesitant:
- It’s a new domain. I think I have a leg up with my current k8s experience, but I really lack networking/linux expertise.
- Stress level. I’m certainly no stranger to late night fire fighting and upgrades. But I’m not sure how much I can handle in the long term.
- Long term outlook. Is this field going to have a future as AI grows?
- Maybe I’m in a bit of a “grass is greener” scenario?


Just seeking some advice/opinions from more experienced folk.

https://redd.it/1p86b77
@r_devops
Developed a tool for instant, local execution of AI-generated code — no copy/paste.

Create more bad code! Do more vibe coding with fully automated degeneration with Auto-Fix!

People hate AI Reddit posts, so I'll keep it real: the project was, of course, vibe coded.

But it's fully working and tested. You can use it with Ollama or any API (Google, Claude, OpenAI or your mother).

You have a vibe, you tell it, the AI codes it and executes it locally on your machine (you're fucked). But no, it runs in a Docker container, so not yet, and you can even export that container. If there is an error, it sends the error back and generates new code that hopefully works.


Since you're prompting like a monkey, it doesn't matter; someday the Auto-Fix will fix it for you. You have no idea what just happened, but things are working?


Great, now you can export the whole Docker container with the program inside and ship it to production ASAP. What a time to be alive!


https://github.com/Ark0N/AI-Code-Executor


Below the "serious" information:

https://redd.it/1p87pub
@r_devops
Manage cultural change

Hello,
Coming from a technical background, I’ve recently been offered the opportunity to become an observability advocate at my current organization, within a team that promotes DevOps and manages the so-called “DevOps” tools (closer to platform engineering).
The current situation is the result of a legacy, highly siloed structure: developers are not very engaged in observability. They either lack time, interest, or feel it isn’t their responsibility. Operations are still handled by dedicated teams using older processes and tools, and developers or application managers are only involved when incidents are escalated through tickets.
A new observability platform has been purchased, but it hasn’t yet been fully integrated into existing processes.
I’m curious to hear about your experience: how would you approach cultural change in this situation? How can we encourage people to invest in observability and take more ownership of their applications (“you build it, you run it”)?
I’m also open to any resources you can share on driving cultural change, as this is still relatively new to me.

Thank you all for reading, and for any help you can provide.

https://redd.it/1p88ao8
@r_devops
I built a "Portable" Postgres/FastAPI stack with baked-in DR, Connection Pooling, and Load Testing

https://github.com/Selfdb-io/SelfDB-mini

We all know moving stateless containers is trivial, but moving stateful workloads (databases) usually involves a manual checklist of pg_dump, scp, volume mounting, and re-aligning environment variables.

I built SelfDB-mini to make the "stateful" part as portable as the container itself, specifically for self-hosted or on-prem environments where you don't have managed RDS.

The "Disaster Recovery" Approach:
Instead of relying on external backup agents, the system treats the database state and the runtime configuration as a single portable unit. It bundles the SQL dump and the .env config into a .tar.gz artifact.

Migration: Spin up a fresh `docker-compose` stack on a new server, upload the artifact via the UI (or CLI), and the system restores the DB and injects the config automatically.
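
To make the "single portable unit" idea concrete, here is a rough sketch of bundling a SQL dump and a .env file into one .tar.gz and unpacking it again. This is not SelfDB-mini's actual code. The function names and the archive layout (`dump.sql` plus `.env`) are assumptions for illustration, and producing the dump itself (for example with pg_dump) is left out.

```python
import tarfile
from pathlib import Path

# Hypothetical archive layout: exactly these two members.
EXPECTED_MEMBERS = ("dump.sql", ".env")


def bundle_snapshot(dump_path: str, env_path: str, out_path: str) -> str:
    """Pack the SQL dump and the .env config into one gzip-compressed tarball."""
    with tarfile.open(out_path, "w:gz") as tar:
        tar.add(dump_path, arcname="dump.sql")
        tar.add(env_path, arcname=".env")
    return out_path


def restore_snapshot(artifact_path: str, dest_dir: str) -> list[Path]:
    """Unpack only the expected members of the artifact into dest_dir.

    Reading members by name (instead of extracting everything) avoids
    writing unexpected paths from an untrusted archive.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    restored = []
    with tarfile.open(artifact_path, "r:gz") as tar:
        for name in EXPECTED_MEMBERS:
            member = tar.extractfile(name)  # raises KeyError if missing
            if member is None:
                raise ValueError(f"{name} is not a regular file in the archive")
            target = dest / name
            target.write_bytes(member.read())
            restored.append(target)
    return restored
```

After a restore like this, the new stack would still need to load `dump.sql` into PostgreSQL and start with the restored `.env`; that part depends on the deployment and is not shown.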

The Architecture (Batteries Included):
I didn't want a toy setup, so I included the infrastructure needed for stability:

Connection Pooling: PgBouncer is pre-configured in front of PostgreSQL 18. (Essential for async Python apps to prevent connection exhaustion).
Observability/Testing: I baked in Locust for load testing and Schemathesis for API contract testing, so you can validate the stack immediately after deployment.
Backend: FastAPI (Python 3.11) running on uv.

It’s open-source and fully Dockerized. I’d love to hear your thoughts on this "snapshot" approach for smaller deployments versus traditional streaming replication.

https://redd.it/1p8bke3
@r_devops
I built a Python ingestion pipeline to archive Reddit data locally.

I needed a way to archive and analyze large volumes of text data (specifically engineering career discussions) from Reddit without relying on the heavy overhead of Selenium, but using PRAW, cuz duh.

It's an ingestion pipeline (ORION) that runs locally.

The Architecture:

Ingestion: Python requests hitting Reddit's JSON endpoints directly rather than parsing HTML.
Rate Limiting: Implemented a custom delay logic to handle HTTP 429 backoffs without getting the IP blacklisted.
Transformation: Parses the raw nested JSON tree, cleans the data (removes stickies/automod spam), and structures it into linear text/PDF reports.
Resource Usage: Runs on minimal resources (no headless browser required).
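
For anyone wondering what the rate-limiting step can look like, below is a minimal sketch of backing off on HTTP 429. It is not ORION's actual logic. The `fetch` callable, its return shape, and the parameter names are all made up for illustration; in a real tool `fetch` would wrap the HTTP call and pass along the server's Retry-After value when there is one.

```python
import time
from typing import Callable, Optional, Tuple

# Hypothetical response shape: (status_code, retry_after_seconds_or_None, body)
Response = Tuple[int, Optional[float], str]


def fetch_with_backoff(fetch: Callable[[], Response],
                       max_attempts: int = 5,
                       base_delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> Tuple[int, str]:
    """Call fetch() until it stops returning 429 or attempts run out.

    Waits for the server-provided Retry-After value when available,
    otherwise for an exponentially growing delay (base_delay, 2x, 4x, ...).
    """
    delay = base_delay
    for attempt in range(max_attempts):
        status, retry_after, body = fetch()
        if status != 429:
            return status, body
        if attempt == max_attempts - 1:
            break  # no point sleeping after the final attempt
        sleep(retry_after if retry_after is not None else delay)
        delay *= 2
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")
```

Passing `sleep` in as a parameter keeps the function easy to test without real waiting.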

It’s a specific tool for a specific job, but I thought the approach to handling the JSON endpoints might be interesting to anyone looking to build lightweight things.

Source Code: https://mrweeb0.github.io/ORION-tool-showcase/

It's non-promotional and fully open source; munch through it.

Feedback on the error handling logic is welcome.

https://redd.it/1p8aujq
@r_devops
What’s the right way to deal with a QA team that slows down your workflow?

I am a dev and I’m running into some issues with my QA team. I’m trying to get a clear picture of what’s actually causing them, because we keep seeing vague bug reports, inconsistent coverage, and build/test mismatches, and it slows things down more than it should. Don't get me wrong, I’m not looking to blame anyone here; I’ve worked with brilliant QA teams before and know how important the role is.

I just want to understand where these breakdowns usually start, how to address them without creating internal conflict, and what a healthy QA–dev process actually looks like. I appreciate everyone's feedback.

Small PS: please be respectful and contribute productively to the thread.

https://redd.it/1p8em0j
@r_devops
Devops Job Titles question

I used to work for an AWS Ops Center, where we mostly monitored and tracked/recorded alerts through CloudWatch.

After 2 years with the company they gave me AWS admin rights. The developers were not able to trigger the cards in Jenkins themselves since they were not admins, so they trusted me to do so. The admin rights also let me grant/deny developers access to instances/databases for a certain amount of time (while they deployed their code).

Since I do not have any coding background, I see that I'm not qualified to apply to DevOps positions. However, would there be any other positions I could apply to? Are there more job titles out there that are responsible for monitoring? Maybe I can learn how to create these alerts?

Is there a job title for what I was doing? Or would it be worthwhile to learn coding, since I now have experience of how CI/CD works?

https://redd.it/1p8f7wb
@r_devops
Which is the most popular CI/CD tool used nowadays?

So, there are many CI/CD tools like Jenkins, Azure Pipelines, GitHub Actions, etc. Which one is the most widely used in the current market? I guess it would be GitHub Actions, based on its ease of use and flexibility. Is there any other tool apart from these that you can mention here? Thank you.

https://redd.it/1p8glxi
@r_devops
Intel SGX alternative needed since they're killing attestation service

Earlier this year Intel announced it was killing SGX IAS, the attestation service for its older trusted execution tech. The April 2025 deadline sounded far off, but migrations always take forever. Anyone who had built on SGX started scrambling to migrate to Intel TDX or AMD SEV. The problem was that these aren't drop-in replacements: the APIs are different and the security models work differently.

I recently dug deep enough to find an old post complaining about it and started thinking about it. Back then companies were posting about this everywhere; lots of production workloads were still on SGX because it had been the most mature option for years, and suddenly everyone was rebuilding. The silver lining is that the newer stuff is actually better, with improved performance and fewer memory restrictions. Thankfully it wasn't just another migration for the sake of it. Still annoying, though, when you build critical infrastructure on vendor hardware and they discontinue it. Makes you think twice about single-vendor dependence.

Now that some time has passed, I'm wondering how widespread the impact of this was. How many production systems using SGX attestation needed migration?

https://redd.it/1p8ilv5
@r_devops
How do you do CI/CD when you're not allowed to implement any automation

I'm currently looking into CI/CD options for a project I'm on. However, automated CI/CD is blocked indefinitely (even on a local machine not accessible to the Internet). I don't think I'd get approval for a simple PowerShell automation either.

What are some ways to do CI/CD-like practices when automation is blocked indefinitely? I can't call it CI/CD or automation or it'll be blocked.

https://redd.it/1p8j5t0
@r_devops
Repository Firewall alternatives needed

Hi all,

I am evaluating repository firewalls for a self-hosted company (because npm).

The alternatives so far are:

Sonatype Repository Firewall
JFrog Curation: this might be the better option capability-wise, but it is also more expensive.

Do you use any other tools? Or have anything to say for/against them?


https://redd.it/1p8pee6
@r_devops
Zero downtime deployments without Kubernetes

Hey guys,


One of the nicest features of Kubernetes is zero-downtime deployments.


In general, thinking beyond Kubernetes, to have this for a web-based app that responds to requests, we must have some kind of proxy in front of it. Why? Because at the very moment of deploying a new app version, we cannot take down the previous one immediately; it must be up and running until the new one is ready.

What do you guys use to have zero-downtime deployments when you do not use Kubernetes?
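
To make the proxy argument concrete, here is a toy sketch of the switch-over order described above: start the new version, wait until it is healthy, repoint the proxy, and only then stop the old one. It is an illustration, not any particular proxy's API. The `proxy` dict, `is_healthy`, and `stop` are hypothetical stand-ins for a real reverse proxy config, a health check, and a process/container stop.

```python
from typing import Callable, Dict


def switch_upstream(proxy: Dict[str, str],
                    new_upstream: str,
                    is_healthy: Callable[[str], bool],
                    stop: Callable[[str], None]) -> bool:
    """Move traffic to new_upstream without a gap in service.

    The old version keeps serving until the new one passes its health
    check. If the check fails, the new version is stopped and the proxy
    is left untouched.
    """
    old_upstream = proxy["upstream"]
    if not is_healthy(new_upstream):
        stop(new_upstream)       # roll back: old version never went away
        return False
    proxy["upstream"] = new_upstream  # traffic now goes to the new version
    stop(old_upstream)                # safe to retire the old one
    return True


if __name__ == "__main__":
    proxy = {"upstream": "app-v1:8000"}
    stopped = []
    ok = switch_upstream(proxy, "app-v2:8000", lambda addr: True, stopped.append)
    print(ok, proxy, stopped)
```

A real setup would also let in-flight requests on the old version finish (connection draining) before stopping it, which this sketch skips.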

https://redd.it/1p8vyxi
@r_devops
Seeking Advice

I have a network administrator degree and want to get into DevOps. I've been watching videos on YouTube and getting some experience with my homelab, but it is only taking me so far. I would like to find a bootcamp that either has live instruction or at least has someone who can answer questions if I get stuck. If anyone in the community can point me in the right direction, that would be great.

https://redd.it/1p8wb9i
@r_devops
Nexus choked to death

A funny incident happened today at my workplace. For context, our company enforces pulling from public repos strictly through a Nexus proxy.

I had finished hardening an AL2023 minimal image with the Nexus proxy configured. Who would've thought that installing 3 packages with DNF during the build stage would bring down our Nexus server, something that had never happened until today? The platform guy thought, "Huh, guess it's time to scale up." He did, to 16 vCPUs and 64 GB of memory. Same thing happened. He couldn't believe it and was like, "Aight, imma get Sonatype support for this."

Not long after, a DevOps guy called me, wanting to see it live and trying to blame my Dockerfile. He noticed there's a for-loop, though it only disables every repo list except Nexus's. I built the image to prove it to him and, lo and behold, the Nexus server died in front of his eyes. He laughed in disbelief for a good minute there.

In the end, he asked me to rebuild again so he could record it and show it to support.

Not sure what happened tbh, but it's pretty funny ngl.

https://redd.it/1p8zgrd
@r_devops
Anyone here taken the CNPE (Cloud Native Platform Engineer) certification?

Hey all,


The CNPE certification is now available, and I’m curious, has anyone here taken it yet?
What was your experience? Difficulty level? Worth it for platform engineers?

Would love to hear your thoughts before I go for it.

https://redd.it/1p918pk
@r_devops