Reddit DevOps – Telegram
DevOps Eng Looking for Collaboration: Exchange High-Perf US-East Infra for Project Ideas

Hey y'all,
I know the pain of launching a project on cheap, distant infrastructure. I've currently got a high-spec, low-latency VPS with CloudPanel in Ashburn, VA (US-East) that is sitting underutilized and screaming for a purpose.

I'm looking to partner with other engineers, developers, or product people who have solid Micro-SaaS or AI-powered app ideas but need a high-performance, cost-free environment to launch and test.

The Proposition: I provide the optimized infrastructure and ongoing maintenance/scaling; you provide the project concept and handle the development/marketing. We agree on a fair profit-split. Thinking specifically about projects where latency matters (e.g., real-time tools, high-traffic APIs).

If you have an idea that needs a rock-solid US-East foundation, hit me up!

https://redd.it/1owmfis
@r_devops
Introduction to Docker Image Optimization — practical steps and pitfalls for smaller, faster containers

Hi all — I recently wrote a blog post that walks through how to **optimize Docker container images**, focusing on common mistakes, layering strategies, build cache nuances, and how to reduce runtime footprint.

Some of the things covered:

* What makes a Docker image “bloated” and why that matters in CI/CD or production.
* Techniques like multi-stage builds, minimizing base images, proper layer ordering.
* Real-world trade-offs: speed vs size, security vs size, build complexity vs maintainability.
* A checklist you can apply in your next project (even if you’re already comfortable with Docker).
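As a minimal sketch of the multi-stage build and layer-ordering ideas above (assuming a hypothetical Go service; the paths, module layout, and image tags are illustrative, not from the post):

```dockerfile
# Build stage: full toolchain, never shipped to production
FROM golang:1.22 AS build
WORKDIR /src
# Copy dependency manifests first so this layer stays cached
# until go.mod/go.sum actually change
COPY go.mod go.sum ./
RUN go mod download
COPY . .
# Static binary so the runtime image needs no libc
RUN CGO_ENABLED=0 go build -o /app ./cmd/server

# Runtime stage: only the binary on a minimal base image
FROM gcr.io/distroless/static-debian12
COPY --from=build /app /app
ENTRYPOINT ["/app"]
```

The ordering matters: putting `COPY . .` before `go mod download` would invalidate the dependency cache on every source change, which is one of the "bloat and slow builds" pitfalls the post describes.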

I’d love feedback from fellow devs/ops folks:

* Which techniques do you use that weren’t covered?
* Have you run into unexpected problems when trying to shrink images?
* In your environment (cloud, on-prem, edge) what did image size actually cost you (time, storage, cost)?

Here’s the link: [https://www.codetocrack.dev/introduction-to-docker-image-optimization](https://www.codetocrack.dev/introduction-to-docker-image-optimization)

I’m not just dropping a link — I’m here to discuss, clarify, expand on any bit you find interesting. Happy to walk through any part of the post in more depth if you like.

https://redd.it/1owq0t5
@r_devops
Hiring dev / cloud help

I'm trying to set up code in the cloud. I'm doing it on Azure and it doesn't load right: the website is blank and it shouldn't be. It might be a code issue or a setup issue, I don't know. I've asked AI and it doesn't know what to do. I'll pay $100 or more for the fix, which should take about 2 hours ($50/h): you'll take a look, tell me what the issue is, and fix it. I want it done now, so send me a DM and let me know if you can do it.

https://redd.it/1owsoxk
@r_devops
Context aware AI optimization for Spark jobs

Trying to optimize our Spark jobs using AI suggestions, but it keeps recommending changes that would break the job. The recommendations don't seem to take our actual data or cluster setup into account. How do you make sure AI suggestions actually fit your environment? Looking for ways to get more context-aware optimization that doesn't just break everything.

https://redd.it/1owthpv
@r_devops
Anyone in Europe getting more than 100K?

Hello all,

I'm looking for a job, as the US client I'm currently working for didn't like that I took paternity leave.

I'm wondering how difficult it is to find a remote job where I can make more than 100K. Is this realistic?

Any advice from those who managed to do so? I've thought about creating an LLC in the US and then trying to find clients over there, but that's going to be hard as hell, plus the bureaucracy.

Another option I've considered is going niche. Taking advantage of my past in embedded software, I've thought about going into eBPF or something like that. Any recommendations? There are many paths (Kubernetes development, AI, security, etc.), so I'm a bit lost about this option.

For those interested in pointing me in the right direction, my CV is here: https://www.swisstransfer.com/d/a438c72f-e4b3-4ee8-a114-09d177118015 and feel free to connect on LinkedIn.

Thank you in advance.

https://redd.it/1owt72p
@r_devops
Implementing a Telemetry Agent in 2025

If you were redesigning a telemetry agent (something like Fluent Bit) in 2025, what would you focus on?

https://redd.it/1owx9a3
@r_devops
Choosing dev products between GCP and Cloudflare

I'm considering using Google Cloud Platform and Firebase for my next SaaS project.

Since GCP doesn't offer a domain registrar, I'm also looking at Cloudflare, because they provide a lot of interesting products beyond domains that I might want to use in the future.

Here's what I have so far:

* Database — Google Cloud SQL (Postgres)
* Compute — Google Cloud Run
* Auth — Firebase Authentication
* Domains — Cloudflare Registrar

And now I need to decide on:

* Storage — Google Cloud Storage vs Cloudflare R2
* Hosting — Firebase Hosting vs Cloudflare Pages

I initially wanted to keep everything within GCP, but Cloudflare R2 has lower pricing and no egress fees.

If you were in my shoes, what would you choose? Is there anything else I should consider?

https://redd.it/1owyt2d
@r_devops
Integrated AI for bug detection into our CI/CD and it's catching bugs but also creating new problems

Was skeptical about AI test tools but our manual QA process was becoming a bottleneck. Every deploy meant waiting 4-6 hours for the QA team to run through test cases and half the time they'd miss something anyway.

Added Spur to our pipeline last sprint. It runs through critical user flows automatically which is great, but we're still dealing with some false positives and figuring out how to write tests that don't break with every UI change.

Did catch a real bug yesterday in staging that would have taken down checkout in production. The AI noticed that a form validation change broke the submit button for users with certain browser extensions. Not something we would have tested manually.

Still figuring out the right balance between test coverage and build time. And writing effective test scenarios is more art than science. Anyone else integrating AI testing into their pipeline? What's your experience been?

https://redd.it/1owzo13
@r_devops
I built a free AWS certs practice platform – introducing CLOUD.VERSE

Earlier this year I shared a simple single-file HTML quiz for AWS certifications here. It worked, but it was very limited: one page, one flow, no real structure.

I’ve now rebuilt it from the ground up as CLOUD.VERSE, focused on a more realistic exam experience and better feedback for people seriously preparing for AWS certs.

Entirely done w/ CC and Codex in VS.

Link in the comments (free, no login required):

What's inside (current version)

**Certs covered**

* AWS Cloud Practitioner (CLF-C02)
* AWS Solutions Architect Associate (SAA-C03)
* AWS AI Practitioner (AIF-C01)

**Practice modes**

* Quick mode: 35 questions / 40 minutes
* Full mode: 65 questions / 130 minutes
* Domain-focused practice
* Review mode

**Exam-like UX**

* Timer
* Question grid navigation
* "Mark for review"
* Multi-select questions with required selection counts enforced

**Feedback and scoring**

* Detailed explanations
* "Why the other options are wrong", not only which one is correct
* AWS-style score range (100–1000)
* Donut-style analytics by domain instead of just a final percentage

**General experience**

* Questions filtered by certification, domain, tier, and seed
* Responsive layout, fast navigation, and a UI designed to stay out of the way so you can focus on thinking
* Optional Ko-fi support for anyone who wants to help, but no paywall on the practice itself

Why I built this (and why it’s free)

I’ve seen how much a single AWS certification can change someone’s career, and I’ve also seen how the price of courses and practice exams quietly excludes a lot of people.

CLOUD.VERSE is my attempt to lower that barrier: serious, exam-style practice that feels close to the real thing, but without locking access behind a payment page. The basic principle is simple: access first, funding second. Donations help with hosting/maintenance and keep me motivated, but they’re never required to study.

What I’d like from the community

* Try a mode for the cert you're studying (CLF-C02, SAA-C03, or AIF-C01)
* Let me know:
  * If the difficulty feels close to your experience with the real exam
  * If the scoring and feedback are useful
  * What's missing for this to be part of your regular study routine

I’d recommend using this alongside hands-on practice in AWS and the official docs/whitepapers, not as your only resource. But if you need structured, realistic questions to pressure-test your knowledge before exam day, CLOUD.VERSE is there to help.

https://redd.it/1ox2cdq
@r_devops
Open-source local (air-gapped) Claude-Code alternative for DevOps - seeking beta feedback

Been working on a small open-source project - a local Claude-Code-style assistant built with Ollama.

It runs entirely offline and uses a locally trained model optimised for speed, aimed at practical DevOps tasks: reading/writing files, running shell commands, checking env vars, etc.

Core points:

* **Local model:** Qwen3 1.7B via Ollama (~1.1 GB RAM), small enough for CI/CD or air-gapped hosts
* **Speed-optimised:** after the initial load, responses arrive in ~7–10 seconds (similar to ChatGPT or Claude)
* **No data leaking:** no APIs, telemetry, or subscriptions — everything stays on your machine

The goal is a fast, transparent automation layer for DevOps teams, not a chat toy.

Repo: [github.com/ubermorgenland/devops-agent](https://github.com/ubermorgenland/devops-agent)

It’s early-stage but functional - would love a few beta testers to try it locally and share feedback or ideas for new integrations.

https://redd.it/1ox297t
@r_devops
Kubernetes just announced the retirement of the community Ingress-NGINX controller — here’s how to check if you’re affected

Kubernetes maintainers have officially announced that the *community* `ingress-nginx` controller is being retired.
After **March 2026**, there will be:

* no new releases
* no bug fixes
* no security patches

A lot of folks don’t realize this, but there are actually *two different* NGINX controllers with very confusing names:

1. **ingress-nginx** → community (this one is being retired)
2. **kubernetes-ingress (nginxinc)** → vendor-backed (not impacted)

If you installed ingress from the Kubernetes docs, you’re likely affected.
If you installed using the NGINX/F5 docs, you’re probably not.

I wrote a breakdown covering:

* how to check what your cluster is running
* the retirement timeline
* migration options (Gateway API, Traefik, Kong, vendor NGINX)
* a simple 4-week migration plan
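For the "check what your cluster is running" step, here's a quick sketch (install methods and namespaces vary, so treat it as a starting point). The community project ships its controller image under `registry.k8s.io/ingress-nginx/`, while the vendor controller uses `nginx/nginx-ingress` (or F5's private registries):

```shell
# Print namespace + container images for every pod, then filter for nginx.
# registry.k8s.io/ingress-nginx/controller -> community controller (retiring)
# nginx/nginx-ingress                      -> vendor controller (not affected)
kubectl get pods --all-namespaces \
  -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.spec.containers[*].image}{"\n"}{end}' \
  | grep -i nginx
```

This needs a live cluster and your kubeconfig, obviously; the image names are the reliable signal, not the namespace, since people install these controllers in all sorts of places.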

Sharing it here in case it helps others avoid surprises:
👉 [https://deepakkumar2o.hashnode.dev/ingress-nginx-retirement-migrate-to-gateway-api](https://deepakkumar2o.hashnode.dev/ingress-nginx-retirement-migrate-to-gateway-api)

Not trying to self-promote — I just saw a lot of confusion in my team and thought this might help someone preparing for migration.

https://redd.it/1ox5wxx
@r_devops
Looking for resources to help with a NetDevOps automation project (books, articles, papers, projects)

Hey everyone,
I’m working on a NetDevOps project for my internship, and I’m looking for good resources to guide me. The project involves things like network automation, CI/CD for network configurations, traffic generation for testing, and possibly some AI for self-healing.

If you know any useful books, articles, research papers, GitHub projects, or even full learning paths, I’d appreciate your recommendations.

Thanks in advance!

https://redd.it/1ox44ov
@r_devops
Discussions/guidelines about AI generated code

We all know that there’s a push for using AI tools and certainly some appetite from engineers to use them. What guidelines have you put in place with regard to more junior folks pushing very obviously generated code?

What discussions have you had to have with those individuals about the quality of the code they're pushing that is obviously generated?

Really not trying to take a side here on using or not using generally, but in some ways it feels like Cursor et al are motorbikes and some engineers have just shed their training wheels. And that maybe some engineers don’t have enough experience to know if the generated code should ever be committed or if it could use some massaging.

Do you see this problem where you’re at? Do you take the policy route and document best practices? Are you having individual conversations with folks? Is this just me? 😂

https://redd.it/1ox8yft
@r_devops
Just discovered something crazy on my website

I’ve been testing a new analytics setup and I can literally watch a video of what users do on my site.
Seeing real sessions changed everything… I noticed a small issue I had never caught before.

People would scroll, hesitate, and then completely miss the main CTA because it was slightly below the fold on mobile.

Do you use anything similar to analyze user behavior?

https://redd.it/1oxa2gf
@r_devops
Help Wanted

Help Wanted: Full-Time Developer for Social App MVP

We’re seeking an experienced developer (3+ years) to join us full-time and help launch our social app MVP within the next 1-3 months. We have the wireframes and UI/UX plans ready, and we need someone dedicated to bring this vision to life. If you’re passionate and ready to dive in, we’d love to connect!

https://redd.it/1ox9yd4
@r_devops
What was the tool that gave you your “big break”

I'm interested in what tool or specialty allowed you to transition into DevOps. Did you transfer from SWE or sysadmin, did you get really good with Kubernetes, or did you move over from cloud? What's everyone's story?

https://redd.it/1oxg94i
@r_devops
I built an open source, code-based agentic workflow platform!

Hi r/OpenSourceAI,

We are building Bubble Lab, a TypeScript-first automation platform that lets devs build code-based agentic workflows! Unlike traditional no-code tools, Bubble Lab gives you the visual experience of platforms like n8n, but everything is backed by real TypeScript code. Our custom compiler generates the visual workflow representation through static analysis and AST traversals, so you get the best of both worlds: visual clarity and code ownership.

Here's what makes Bubble Lab different:

1/ Prompt to workflow: TypeScript means deep compatibility with LLMs, so you can build/amend workflows with natural language. An agent can orchestrate our composable bubbles (integrations, tools) into a production-ready workflow at a much higher success rate!

2/ full observability & debugging: every workflow is compiled with end-to-end type safety and has built-in traceability with rich logs, you can actually see what's happening under the hood

3/ Real code, not JSON blobs: Bubble Lab workflows are written in TypeScript. This means you can own them, extend them in your IDE, add them to your existing CI/CD pipelines, and run them anywhere. No more being locked into a proprietary format.

we're also open source :) https://github.com/bubblelabai/BubbleLab

We are constantly iterating Bubble Lab so would love to hear your feedback!!

https://redd.it/1oxgdgg
@r_devops
How do you implement tests and automation around those tests?

I'm in a larger medium-sized company and we have a lot of growing pains currently. One such pain is a lack of testing just about everywhere. I'm currently trying to foster an environment where we encourage, and potentially enforce, testing, but I'm not some super big expert. I try to read about different approaches and have played with a lot of things, but I'm curious what opinions others have around this.

We have a big web of API calls between apps and a few backend processing services that consume queues. I'm trying to focus on the API portion first, because a big problem is that feature development in one area breaks another when we didn't know another app needed a given API.


Here's a quick sketch of what I'm thinking (these will all be automated):

* PR build/test
  * Run unit tests
  * Run integration tests
  * Run consumer contract tests
  * Spin up the app with mocked dependencies in a container and run Playwright tests against it (unsure if this should be done here or after deployment to a dev environment)
* Contract testing
  * When a consumer contract changes, kick off tests against the provider
  * Gate deployments if contract testing does not pass
* After stage deployment
  * Run smoke tests and full E2E tests against the live stage environment
* After prod deployment
  * Run smoke tests
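For the consumer contract step in a sketch like the one above, the core idea can be boiled down to a few lines (no Pact tooling here, just stdlib; the endpoint shape and field names are made up for illustration): the consumer checks in what it actually reads from a response, and the provider's build replays that expectation against its own handler, so a removed or retyped field fails the provider's pipeline instead of breaking the consumer in production.

```python
import json

# Contract the consumer team checks in: the fields (and types) it
# actually reads from GET /orders/{id}. Hypothetical example shape.
CONTRACT = {
    "endpoint": "GET /orders/{id}",
    "required_fields": {"id": str, "status": str, "total_cents": int},
}

def provider_handler(order_id: str) -> str:
    """Stand-in for the provider's real endpoint (hypothetical)."""
    return json.dumps({"id": order_id, "status": "shipped",
                       "total_cents": 1250, "courier": "acme"})

def verify_contract(contract: dict, raw_response: str) -> list[str]:
    """Return a list of violations; empty means the provider honours the contract."""
    body = json.loads(raw_response)
    problems = []
    for field, expected_type in contract["required_fields"].items():
        if field not in body:
            problems.append(f"missing field: {field}")
        elif not isinstance(body[field], expected_type):
            problems.append(f"wrong type for {field}")
    return problems

# Extra fields the consumer ignores (like 'courier') are fine;
# removing or retyping a required field is what gates the deploy.
print(verify_contract(CONTRACT, provider_handler("o-123")))  # []
```

Real tools (Pact, Spring Cloud Contract) add contract brokering and versioning on top, but this is the check that actually gates the deploy.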


I'm sure once we have things implemented for a time we'll find what works and what doesn't, but I would love to hear what others are doing for their testing setup and possibly get some ideas on where we're lacking

https://redd.it/1oxg37a
@r_devops
Group, compare and track health of GitHub repos you use

Hello,

Created this simple website gitfitcheck.com where you can group existing GitHub repos and track their health based on their public data. The idea came from working as a Sr SRE/DevOps on mostly Kubernetes/Cloud environments with tons of CNCF open-source products. There are usually many competing alternatives for the same task, so I started creating static markdown docs about these GitHub groups with basic health data (how old the tool is, how many stars it has, what language it's written in), so I could compare them and keep a mental map of their quality, lifecycle, and where's what.

Over time, whenever I hear about a new tool I can use for my job, I update my markdown docs. I've found this categorization/grouping useful for mapping the tool landscape, comparing tools in the same category, and seeing trends as certain projects get abandoned while others catch attention.

The challenge was that the doc I created was static and the data I recorded was point-in-time manual snapshots, so I thought I'd create an automated, dynamic version that keeps the health stats up to date. That tool became gitfitcheck.com. Later I realized I could add further facets, not just comparison within the same category; for example, I have a group for the core Python packages I bootstrap all of my Django projects with. Using this tool I can see when a project is getting less love lately and search for an alternative, maybe a fork or a completely new project. Also, all groups we/you create are public, so whenever we search for a topic/repo, we'll see how others grouped them as well, which helps discoverability too.
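The kind of "health stat" involved here can be computed from the public GitHub REST API alone. A small sketch: the field names below (`stargazers_count`, `pushed_at`, `archived`) are what `GET /repos/{owner}/{repo}` really returns, but the thresholds and labels are my own invention, not gitfitcheck's actual logic.

```python
from datetime import datetime, timezone

def health_label(repo: dict, now: datetime) -> str:
    """Rough health label from public GitHub /repos/{owner}/{repo} fields.

    Uses real API field names (archived, pushed_at, stargazers_count);
    the thresholds are arbitrary illustration.
    """
    if repo.get("archived"):
        return "abandoned"
    # GitHub returns ISO 8601 timestamps ending in "Z"
    pushed = datetime.fromisoformat(repo["pushed_at"].replace("Z", "+00:00"))
    months_idle = (now - pushed).days / 30
    if months_idle > 12:
        return "stale"
    if months_idle > 3 and repo.get("stargazers_count", 0) < 100:
        return "watch"
    return "active"

now = datetime(2025, 11, 1, tzinfo=timezone.utc)
print(health_label({"archived": False, "pushed_at": "2025-10-20T12:00:00Z",
                    "stargazers_count": 5000}, now))  # active
```

In practice you'd fetch the repo JSON on a schedule and diff the labels over time, which is roughly the "dynamic snapshot" idea the post describes.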

I've found this process useful in the frontend and ML space as well, as both depend heavily on open-source GitHub projects.

Feedback is welcome. Thank you for taking the time to read this, and maybe even give it a try!

Thank you,

sendai

PS: I know this isn't the next big thing; it has no AI in it, nor is it vibe-coded. It's just a simple tool I believe is useful for SRE/DevOps/ML/frontend or any other job that depends on GH repos a lot.

https://redd.it/1oxnhq6
@r_devops