Reddit DevOps – Telegram
Is support in the same time zone important to you?

Have you ever dropped (or avoided) a tool because the vendor was on the ‘wrong’ side of the world for your team?

I've had quite an interesting discussion with a buddy of mine who works as a CTO (based in Germany). He said he prefers to work with European vendors because their customer support is in the same time zone. Of course, AI bots are reducing this friction, but still.

Would you choose a US-based vendor over an Australian or European one? Or does the time zone difference not have any impact at all?

https://redd.it/1nhk60j
@r_devops
Feeling stuck in DevOps career after 2 years, not sure how to prepare for interviews

Hey folks,

I have been working as a DevOps engineer for 2 years. My current company's future is completely uncertain and nobody knows what will happen or when, so I am applying for a job switch. I have some good accomplishments, like scaling Kubernetes workloads and automating a mobile build pipeline from scratch, but I haven't mastered any one thing: I have left footprints across the whole tech stack and worked on demand, researching as I went.

Recently I interviewed with ZETA for an SRE 2 role. They asked me the questions below:
1. Jenkinsfile stages, like checkout, build, push and deploy, so I wrote the skeleton.


2. Python question (the two-sum problem). I solved it, but I was asked for the time complexity of a 5-line Python solution 🙂. Why do DevOps engineers need time complexity, when we mostly use Python to automate tasks?


3. Python script for archiving files older than 10 days and pushing them to S3. I wrote a pseudocode script with the flow.



4. Among 3 replicas, 1 pod is in CrashLoopBackOff. I answered with possibilities: OOMKilled, or the PVC and the node being in different regions.
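On the time-complexity question in item 2, here is a minimal sketch of the usual hash-map approach to two-sum. This is not necessarily what was written in the interview; the function name and sample input are only illustrative. It runs in O(n) time and O(n) extra space, compared with O(n²) for checking every pair with nested loops.

```python
def two_sum(nums, target):
    """Return indices (i, j) with nums[i] + nums[j] == target, or None."""
    seen = {}  # value -> index where that value was first seen
    for j, value in enumerate(nums):
        # Dict lookup is O(1) on average, so the whole loop is O(n).
        i = seen.get(target - value)
        if i is not None:
            return i, j
        seen.setdefault(value, j)
    return None


print(two_sum([2, 7, 11, 15], 9))  # (0, 1)
```

The same reasoning shows up in automation work: a nested loop over two large inventories (say, instances and volumes) is quadratic, while indexing one of them in a dict first keeps it linear.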

But I think they expected textbook answers. They asked nothing about the work I mentioned in my resume; they just came up with the questions and shared them in a Google Doc.

Can anyone please guide me on how to prepare and become interview-ready?

Thank you in advance

https://redd.it/1nhlgn0
@r_devops
Our AWS bill is getting insane (>95k/mo), I'm going insane, how do we even start to lower it?

Our company's AWS bill has been steadily climbing for the past few months and it's starting to get out of control.



We don't even fully understand why. We have all the usual monitoring tools and dashboards, which tell us what services are costing the most (EC2, RDS, S3, of course), and when usage spikes. But things are still unpredictable.



It feels like we're constantly reacting. We see a spike, we investigate, maybe we find an obvious runaway process or an unoptimized query, we fix it, and then another cost center pops up somewhere else. It's getting rly fkn annoying.



We don't know which teams are contributing most to the increases in a meaningful way. We can see service usage, but translating that into "Team A's new feature" or "Team B's analytics pipeline" is a manual, time-consuming nightmare involving cross-referencing dashboards and asking around.



We don't know why specific architectural decisions or code deployments are leading to cost increases before they become a problem.



Our internal discussions about cost optimization often go in circles because everyone has anecdotal evidence, but we lack a clear, synthesized understanding of the underlying drivers. Is it dev environments? Is it staging? Is it that new batch job? Is it just general growth? We have no way to validate any of these.



We're trying to implement FinOps principles, but without a clear way to attribute costs and understand the why behind usage patterns, it's incredibly difficult to foster a culture of cost awareness and ownership among our engineering teams. We need something that can connect the dots between our technical metrics and the actual human decisions and activities driving them.



Any advice or tips would be greatly appreciated. Also open to third party tools as long as they won't take over our account or billing.
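A common starting point for the attribution problem described above is tagging resources and grouping spend by tag. AWS supports cost allocation tags, and Cost Explorer can group by them. Here is a minimal sketch of the grouping step only. It assumes billing line items have already been exported into dicts; the `cost` and `tags` field names and the `team` tag key are hypothetical, not an AWS schema.

```python
from collections import defaultdict


def cost_by_tag(line_items, tag_key="team"):
    """Sum cost per tag value; anything without the tag lands in 'untagged'."""
    totals = defaultdict(float)
    for item in line_items:
        owner = item.get("tags", {}).get(tag_key, "untagged")
        totals[owner] += item["cost"]
    # Largest spenders first, so the biggest drivers are at the top.
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


# Made-up sample data, for illustration only.
items = [
    {"service": "EC2", "cost": 120.0, "tags": {"team": "analytics"}},
    {"service": "RDS", "cost": 80.0, "tags": {"team": "analytics"}},
    {"service": "S3", "cost": 50.0, "tags": {"team": "checkout"}},
    {"service": "EC2", "cost": 300.0, "tags": {}},
]
print(cost_by_tag(items))
```

The size of the `untagged` bucket is a useful number in itself: it shows how much of the bill cannot yet be attributed to anyone.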

https://redd.it/1nhlsz5
@r_devops
How often do you actually use scalability models (like the Universal Scalability Law) in DevOps practice?

I’ve been studying the Universal Scalability Law (USL) introduced by Neil J. Gunther, which models throughput with factors for resource contention (σ) and coordination overhead (κ).
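For reference, the USL is usually written as C(N) = λN / (1 + σ(N − 1) + κN(N − 1)), where λ is the throughput of a single node. A minimal sketch of evaluating it is below; the parameter values are made up for illustration and not fitted to any real system.

```python
import math


def usl_throughput(n, lam, sigma, kappa):
    """Predicted throughput at n nodes under the Universal Scalability Law."""
    return lam * n / (1 + sigma * (n - 1) + kappa * n * (n - 1))


def usl_peak(sigma, kappa):
    """Node count where throughput peaks: N* = sqrt((1 - sigma) / kappa)."""
    return math.sqrt((1 - sigma) / kappa)


# Illustrative parameters: 5% contention, 0.1% coordination overhead.
lam, sigma, kappa = 100.0, 0.05, 0.001
for n in (1, 8, 16, 32, 64):
    print(n, round(usl_throughput(n, lam, sigma, kappa), 1))
print("peak near N =", round(usl_peak(sigma, kappa), 1))
```

In practice σ and κ would be fitted from load-test measurements at a few node counts. Past the peak, the model predicts that adding nodes reduces total throughput.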

On paper it feels like a great way to reason about when adding servers stops giving you linear gains. But in real SRE/DevOps practice, I rarely see people talk about it explicitly.

For example: do you ever use USL (or similar models) to guide capacity planning, cluster sizing, or cost/performance trade-offs? Or is it more common to rely purely on load testing and dashboards?

Curious to hear how much theory like this actually makes it into day-to-day operations, and if you’ve seen cases where it helped (or failed) in real-world systems.

Reference for USL: https://cran.r-project.org/web/packages/usl/vignettes/usl.pdf?

https://redd.it/1nhltik
@r_devops
Migrating GKE Dataplane V1 → V2 (PVC Backup + Terraform state questions)

Hi everyone,

I’m currently testing a migration from GKE Dataplane V1 to V2 and decided to use GKE Backup for the process. I’ve run into two issues and would love some advice from people with more experience:


1. PVC Backup stuck in Pending
• Whenever I try to back up PVCs, the restore ends up stuck in Pending.
• I also noticed that the StorageClass changes automatically (from standard-rwo → gce-pd-gkebackup-de).
• Is this expected? Do I need to adjust snapshot config or handle StorageClass mapping differently?



2. Terraform state management after upgrade
• My cluster and resources are managed with Terraform (state stored in GCS).
• After upgrading, I thought about running terraform import on existing resources to re-sync them with state.
• Is that the right approach, or would you recommend another strategy (e.g. terraform state mv, or letting Terraform recreate)?


I’m still learning, so I’d really appreciate best practices or lessons learned from anyone who’s been through a Dataplane V1 → V2 migration 🙏

https://redd.it/1nhry7f
@r_devops
DNS server on macOS

Hey,

I am a DevOps engineer and the company for some reason gave me a Mac (not my initial choice, btw).
I want a DNS server tool where I can manage DNS servers and Microsoft AD. Anyone?

https://redd.it/1nhoz4p
@r_devops
How to get DevOps job

Hello everyone, I am relatively new to DevOps. I just graduated from college and I am looking for DevOps jobs, but I can't seem to find any I qualify for. They are all looking for 5+ years of experience, and there aren't many entry-level roles in this field.
Can you tell me how to get started? I am applying nonstop, using ChatGPT Premium to tailor my resume to each job and even lying in some areas, but I am still getting rejection emails.
I have a very good understanding of my field. I have AWS, RHCSA (almost finished with RHCE now) and Terraform certifications, and I have done multiple personal projects (Terraform, Ansible, EC2, Kubernetes, EKS). I have no prior DevOps work experience, just 1 year of software development experience in my home country, not here.
Any leads or ideas on how to get a job would be appreciated.
Thank you.
This is my resume, if anyone wants to see it: https://docs.google.com/document/d/1db9Q4XpLDRNKhTeN0RUa4Ff9rhNBuT3nVlyyHbCjekA/edit?usp=sharing

https://redd.it/1nhyall
@r_devops
Is it a good idea to upgrade to macOS Tahoe 26 now?

Are there any bugs or issues that you have encountered or know of so far while doing Flutter dev?

https://redd.it/1ni50oy
@r_devops
Same Docker image behaving differently

I have a Docker container running in a Kubernetes cluster. It's a Java app that does video processing using ffmpeg and ffprobe. I ran into a weird problem: it was running fine until last week, but recently a dev pushed something and it stopped working at the ffprobe command. I did a git hard reset to the old commit and built an image, still no luck. So I used the old image and it works. Also, the same Docker image works in one cluster but not in a different cluster. Please help, I am running out of ideas for what to check.

https://redd.it/1ni9dym
@r_devops
I NEED A MOBILE PAGER

I’ve been banging my head against this for a while and can’t quite land on the best solution, so hoping someone here can point me in the right direction.

I’ve got CloudWatch + SSM set up on my EC2 instances to monitor CPU, memory, and disk. The alerting part works fine, but the way I receive the alerts is the problem. SMS is too costly in the long run, while emails end up buried and don’t really grab my attention.


What I’d really like is some kind of free pager-style app for Android that AWS can push notifications to (via HTTP/HTTPS API) — something loud and impossible to ignore, like a siren on my phone.
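One way the AWS side of this could be wired up is: CloudWatch alarm, then SNS topic, then a small Lambda that forwards the alarm to whatever HTTP push endpoint the phone app exposes. This is a sketch under assumptions, not a recommendation of a specific app. `PUSH_URL` is a hypothetical placeholder, and the plain-text POST body is an assumption about what the receiving service accepts.

```python
import json
import urllib.request

PUSH_URL = "https://push.example.com/my-alerts"  # hypothetical endpoint


def build_push_request(sns_event, url=PUSH_URL):
    """Turn an SNS-delivered CloudWatch alarm into an HTTP POST request."""
    record = sns_event["Records"][0]["Sns"]
    alarm = json.loads(record["Message"])  # alarm payload arrives as a JSON string
    body = "{} is {}: {}".format(
        alarm["AlarmName"], alarm["NewStateValue"], alarm["NewStateReason"]
    )
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


def lambda_handler(event, context):
    """Lambda entry point: forward the alarm and report the HTTP status."""
    with urllib.request.urlopen(build_push_request(event), timeout=5) as resp:
        return {"status": resp.status}
```

SNS can also deliver to an HTTP/HTTPS subscription directly, without a Lambda. The function is only needed when the push service expects a different payload shape than the one SNS sends.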

Does anyone have a solid recommendation for this kind of setup? Ideally free, reliable, and works well with AWS alarms.

Appreciate any tips or personal experiences


gpt enhanced for clarity

https://redd.it/1niaqhe
@r_devops
AWS ECS ( CI / CD )

Which CI/CD are you guys using, and which is better?

Note: it needs to be self-hosted.

https://redd.it/1niapty
@r_devops
DevOps team set up 15 different clusters 'for testing.' That was 8 months ago and we're still paying $87K/month for abandoned resources.

Our dev team spun up a bunch of AWS infra for what was supposed to be a two-week performance testing sprint. We had EKS clusters, RDS instances (provisioned with GP3/IOPS), ELBs, EBS volumes, and a handful of supporting EC2s.

The ticket was closed, everyone moved on. Fast forward eight and a half months… yesterday I was doing some cost exploration in the dev account and almost had a heart attack. We were paying $87k/month for environments with no application traffic, near-zero CloudWatch metrics, and no recent console/API activity for eight and a half months. No owner tags, no lifecycle TTLs, lots of orphaned snapshots and unattached volumes.

Governance tooling exists, but the process to enforce it doesn’t. This is less about tooling gaps and more about failing to require ownership, automated teardown, and cost gates at provision time. Anyone have a similar story to make me feel better? What guardrails do you have to prevent this?
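As an illustration of the "ownership plus automated teardown" guardrail, here is a minimal sketch of a periodic check that flags resources with missing tags or an expired TTL. It assumes an inventory has already been collected into dicts. The `owner` and `expires_on` tag names and the record layout are hypothetical conventions, not an AWS feature.

```python
from datetime import date


def flag_for_teardown(resources, today, required_tags=("owner", "expires_on")):
    """Return (resource_id, reason) pairs for untagged or expired resources."""
    flagged = []
    for res in resources:
        tags = res.get("tags", {})
        missing = [t for t in required_tags if t not in tags]
        if missing:
            flagged.append((res["id"], "missing tags: " + ", ".join(missing)))
        elif date.fromisoformat(tags["expires_on"]) < today:
            flagged.append((res["id"], "expired on " + tags["expires_on"]))
    return flagged


# Made-up inventory, for illustration only.
inventory = [
    {"id": "eks-perf-01", "tags": {}},
    {"id": "rds-perf-01", "tags": {"owner": "team-a", "expires_on": "2025-01-15"}},
    {"id": "ec2-build-01", "tags": {"owner": "team-b", "expires_on": "2099-01-01"}},
]
for resource_id, reason in flag_for_teardown(inventory, today=date(2025, 9, 1)):
    print(resource_id, "->", reason)
```

A report like this only helps if something acts on it, for example by notifying the owner first and then stopping or deleting the resource after a grace period.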

https://redd.it/1nieqfn
@r_devops
Pod requests are driving me nuts

Anyone else constantly fighting with resource requests/limits?
We’re on EKS, and most of our services are Java or Node. Every dev asks for way more than they need (like 2 CPU / 4Gi mem for something that barely touches 200m / 500Mi). I get they want to be on the safe side, but it inflates our cloud bill like crazy. Our nodes look half empty and our finance team is really pushing us to drive costs down.


Tried using VPA but it's not really an option for most of our workloads. HPA is fine for scaling out, but it doesn’t fix the “requests vs actual usage” mess. Right now we’re staring at Prometheus graphs, adjusting YAML, rolling pods, rinse and repeat…total waste of our time.
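The manual loop described above (look at usage graphs, pick a number, edit YAML) can at least be made repeatable. Here is a minimal sketch that derives a suggested request from usage samples using a nearest-rank percentile plus headroom. The 95th percentile and the 20% headroom are arbitrary assumptions, and the samples are expected to come from whatever metrics export is already in place.

```python
import math


def recommend_request(samples, percentile=95, headroom=1.2):
    """Suggest a request: nearest-rank percentile of usage, times headroom."""
    if not samples:
        raise ValueError("need at least one usage sample")
    ordered = sorted(samples)
    # Nearest-rank method: smallest value with at least `percentile`% of
    # the samples at or below it.
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return math.ceil(ordered[rank - 1] * headroom)


# Made-up CPU usage in millicores for a service requesting 2000m.
cpu_millicores = [150, 180, 200, 190, 170, 160, 210, 175, 185, 195]
print("suggested CPU request (m):", recommend_request(cpu_millicores))
```

Memory usually deserves a more conservative percentile than CPU, since running over the limit gets the container OOM-killed rather than throttled.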


Has anyone actually solved this? Scripts? Some magical tool?
I keep feeling like I’m missing the obvious answer, but everything I try either breaks workloads or turns into constant babysitting.
Would love to hear what’s working for you.

https://redd.it/1niec2z
@r_devops
We deploy our app on an EC2 instance with docker-compose. How do we get more observability of the Docker containers with AWS-native tooling? I’m unable to use config.json to scrape Docker metrics in the CloudWatch agent (cwagent)

https://redd.it/1nigaft
@r_devops
Any AI code review tools for GitHub PRs?

my agency’s been using cursor to ship features faster (seriously insane how much time it saves). BUT once code hits github prs… cursor doesn’t help. we still do manual reviews and end up missing dumb stuff. been going through this whole list of tools (coderabbit, qodo, codium, greptile, etc) and honestly i’m CONFUSED AF. every site says “best ai code review” but half of it feels like hype demos. currently following this list - https://www.codeant.ai/blogs/best-github-ai-code-review-tools-2025 but i think there is a lot missing here too?

all i really want is something that can act like a second pair of eyes before merge. doesn’t need to be magical, just catch obvious things humans miss. open source would be cool too, but i’m fine with paid IF IT ACTUALLY WORKS in production. anyone here using these daily? what’s worth the setup?

https://redd.it/1niif8l
@r_devops
Interacting with a webpage during tests

I'm implementing some features for a Docker Compose based application. Two of those features are backup and restore.

I'd like to add some tests for this.

The steps would be something like the below

docker compose up

# Assert the instance is actually working by logging in
# Change username, profile image and update/install some apps

make backup

docker compose down --remove-orphans --volumes

docker compose up

make restore

# Assert the changes previously made are all still there


I'm having a hard time finding a good way to interact with the web page and do the steps prefixed with #. Do I have better options than adding scripts based on Playwright, Selenium or Cypress?

https://redd.it/1niji8v
@r_devops
Resources for learning OpenShift for someone who's already experienced in Kubernetes?

I have 5 years of Kubernetes experience. I have a technical interview coming up for a job I'm determined to get, though it's an OpenShift job.

What are the best resources for learning OpenShift when you already understand Kubernetes?

https://redd.it/1niizpl
@r_devops
Which AI coding assistant is best for building complex software projects from scratch, especially for non-full-time coders?

Hi everyone,

I’m an embedded systems enthusiast with experience working on projects using Raspberry Pi, Arduino, and microcontrollers. I have basic Python skills and a moderate understanding of C, C++, and C#, but I’m not a full-time software developer. I have an idea for a project that is heavily software-focused and quite complex, and I want to build at least a prototype to demonstrate its capabilities in the real world — mostly working on embedded platforms but requiring significant coding effort.

My main questions are:

Which AI tools like ChatGPT, Claude, or others are best suited to help someone like me develop complex software from scratch?
Can these AI assistants realistically support a project of this scale, including architectural design, coding, debugging, and iteration?
Are there recommended workflows or strategies to effectively use these AI tools to compensate for my limited coding background?
If it’s not feasible to rely on AI tools alone, what are alternative approaches to quickly build a functional prototype of a software-heavy embedded system?

I appreciate any advice, recommendations for specific AI tools, or general guidance on how to approach this challenge.

Thanks in advance!

https://redd.it/1ninjyc
@r_devops
I may be over relying on AI and I’m not sure how to stop

I understand that similar questions might have been asked before but most of the answers assume the person is thinking of ditching AI entirely and people say it’s only a tool and should be used.

My problem is I’m still basically at the first levels of DevOps and I can’t for the life of me learn with a deadline. I understand the concepts and what almost everything does, but writing those scripts? Almost every time I have a project with a deadline, even a personal one, I use AI, and since the scripts and stuff are generally easy and simple, it does it in a single message.

I then assume I’ll finish everything, submit, and then take the time to understand it. And while I do actually understand it, I wouldn’t be able to replicate or write some of those scripts completely on my own.

What did everyone do at the start? How did you start studying and understanding without relying much on AI? And when do you mix AI into your work? I know that maybe in the future we won’t be writing scripts, but I’d like to at least know how to write them, and then I can hand that off to the AI.

https://redd.it/1nir0ap
@r_devops
Basic tool for small tasks during the day using pomodoro technique for focus

I have difficulty jumping between tools, projects and languages, and you can't really track time with project management tools. After some courses and books on Go, I started writing a tool. It works on Linux/WSL/macOS, but not on Windows, because I still have some issues there.

You just start a task in your terminal like:
Pomo-cli start --task "write post in reddit" --time 15 --background

Then a background process starts and a local DB in your home directory (.pomo-cli) is updated. When the timer finishes, you get a message in the terminal and the session is added to the DB. You can also view statistics and pause the task. It helps me focus and take short breaks when switching between repos or tools.

If anyone wants to use it:
https://github.com/arushdesp/pomo-cli

https://redd.it/1niqnir
@r_devops