Reddit DevOps – Telegram
How much of this AWS bill is a waste?

I started working with a big telecom provider here in Canada, and these guys are wasting so much on useless shit that it boggles my mind.

The monthly bill for their cutting-edge "tech innovation department" (the in-house tech accelerator) clocks in at $30k.

The department is supposed to be leading the charge on using AI to reduce costs, using the best stuff AWS can offer, and "deliver best experience for the end user".

First day observations.

EC2 is over-provisioned by 50%: the current 50 instances could be cut in half, to 25. No CloudWatch, no logging, no monitoring enabled, and no one can answer "do we need it?" questions.
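A rough way to sanity-check a "cut it in half" claim, once CloudWatch CPU metrics are actually enabled, is to size the fleet so the observed peak load lands at a target utilization. This is a minimal sketch assuming same-size instances, load that spreads evenly and scales with CPU, and made-up numbers (35% peak, 70% target); it ignores memory, network, and burst headroom.

```python
def instances_needed(current_count: int, peak_util_pct: int, target_util_pct: int) -> int:
    """Estimate how many same-size instances would carry the observed peak load
    at the target utilization (all inputs are hypothetical placeholders)."""
    if current_count < 0 or not 0 < target_util_pct <= 100:
        raise ValueError("need a non-negative count and a target in (0, 100]")
    # Total peak load, measured in "percent of one instance" units.
    total_load = current_count * peak_util_pct
    # Ceiling division in integers: round up so no instance exceeds the target.
    return -(-total_load // target_util_pct)


# 50 instances peaking at 35% CPU, resized to run at ~70% at peak.
print(instances_needed(50, 35, 70))  # 25
```

Integer percentages are used on purpose so the rounding is exact rather than subject to floating-point drift.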

No one has done any usage analysis in the past 18 months, let alone followed the best practice of reviewing every 3-6 months.

There's no performance baseline and no SLAs for any of the services. No uptime guarantee (and they wonder why everyone hates them), no load/response-time monitoring, no cost impact analysis.

NO infrastructure as code (e.g. Terraform), no auto scaling policies, and definitely no red teaming or resilience testing.

I spoke to a handful of architects, and no one could point me in the direction of a FinOps team in charge of cost optimization. So basically the budget keeps growing and they keep getting sold to.

I honestly don't know why I'm here.

https://redd.it/1o5toxi
@r_devops
Do homelabs really help improve DevOps skills?

I’ve seen many people build small clusters with Proxmox or Docker Swarm to simulate production. For those who have tried it, which homelab projects actually improved your real-world DevOps work, and which ones were just fun experiments?

https://redd.it/1o5w3sv
How do you keep IaC repositories clean as teams grow?

Our Terraform setup began simple but now every microservice team adds their own modules and variables. It’s becoming messy with inconsistent naming and ownership. How do you organize large IaC repos without forcing everything into a single centralized structure?
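One option that avoids a single centralized structure is to keep modules team-owned but enforce a shared naming convention with a small CI check. The convention below (`<team>-<component>-<env>`) and the team list are hypothetical; the same check could run over module directory names or resource name prefixes.

```python
import re

# Hypothetical convention: <team>-<component>-<env>, lowercase, env from a fixed set.
NAME_PATTERN = re.compile(
    r"^(?P<team>[a-z][a-z0-9]*)-(?P<component>[a-z][a-z0-9-]*)-(?P<env>dev|staging|prod)$"
)


def check_names(names, known_teams):
    """Return (name, problem) pairs for names that break the convention."""
    problems = []
    for name in names:
        match = NAME_PATTERN.match(name)
        if match is None:
            problems.append((name, "does not match <team>-<component>-<env>"))
        elif match.group("team") not in known_teams:
            problems.append((name, f"unknown owning team '{match.group('team')}'"))
    return problems
```

Failing the pipeline whenever `check_names` returns anything keeps naming consistent while each team still owns its modules; a CODEOWNERS file alongside it would make ownership explicit too.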

https://redd.it/1o5w3di
Anyone else experimenting with AI-assisted on-call setups?

We started testing a workflow where alerts trigger a small LLM agent that summarizes logs and suggests a likely cause before a human checks it. Sometimes it helps a lot, other times it makes mistakes. Has anyone here tried something similar or added AI triage to their DevOps process?
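One way to reduce the agent's mistakes is to hand it a smaller, deduplicated slice of the logs instead of raw output, while the human still gets the full logs. A minimal pre-processing sketch, assuming plain-text logs with level keywords; the severity list and masking rule are guesses to adapt, and no LLM call is shown:

```python
import re
from collections import Counter

# Hypothetical severity filter; adjust to your log format.
SEVERITY = re.compile(r"\b(ERROR|FATAL|CRITICAL|Traceback)\b")
# Mask volatile tokens (numbers, hex ids) so repeated errors collapse together.
VOLATILE = re.compile(r"\b(0x[0-9a-fA-F]+|\d+)\b")


def condense_logs(lines, max_items=20):
    """Keep error-like lines, normalize volatile tokens, and count duplicates."""
    counts = Counter()
    for line in lines:
        if SEVERITY.search(line):
            counts[VOLATILE.sub("<n>", line.strip())] += 1
    # Most frequent distinct errors first, each prefixed with its count.
    return [f"{n}x {msg}" for msg, n in counts.most_common(max_items)]
```

Feeding the model these counted, normalized lines makes its summary easier to check against the source than a free-form read of thousands of raw lines.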

https://redd.it/1o5w30f
Who is responsible for owning the artifact server in the software development lifecycle?

So the company I work at is old, but brand new to internal software development. We don't even have a formal software engineering team, but we have a Sonatype Nexus artifact server. Currently, we can pull packages from all of the major repositories (PyPI, npm, NuGet, Docker Hub, etc.).

Our IT team doesn't develop any applications, but they are responsible for the "security" of this server, and I feel like they have the settings cranked as high as possible. For example, all Linux Docker images (slim bookworm, alpine, etc.) are quarantined for things like glibc vulnerabilities where "a remote attacker can do something with the stack"… Python's pandas is quarantined for deserializing remote pickle files, SQLAlchemy for its loads methods, and everything related to AI, like LangChain… All of npm is quarantined because it is a package that allows you to "install malicious code". I'll reiterate: we have no public-facing software. Everything is hosted on-premises and inside our firewalls.

Do all organizations with an internal artifact server just have to deal with this? Find other ways to do things? Who typically creates the policies that say package x or y should be allowed? If you have had to deal with a situation like this, what strategies did you implement to create a more manageable developer experience?
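On "who creates the policies": one way to frame the conversation with IT is a written, risk-based rule where the severity threshold depends on how exposed the consuming app is, plus time-boxed waivers that a named team requests and owns. The sketch below only illustrates that shape; it is not Nexus's policy engine or API, and the thresholds, exposure labels, and waiver format are all hypothetical. Internal-only does not mean risk-free; the point is that the threshold becomes an explicit, reviewable decision.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Finding:
    package: str
    cvss: float  # highest CVSS base score reported for this component version


# Hypothetical thresholds: stricter for anything reachable from the internet.
THRESHOLDS = {"internet-facing": 7.0, "internal-only": 9.0}


def decide(finding, exposure, waivers, today):
    """Return 'allow', 'allow (waiver)', or 'quarantine' for one scanner finding.

    waivers maps package name -> expiry date agreed with the security owner.
    """
    expiry = waivers.get(finding.package)
    if expiry is not None and today <= expiry:
        return "allow (waiver)"
    if finding.cvss >= THRESHOLDS[exposure]:
        return "quarantine"
    return "allow"


print(decide(Finding("glibc", 7.5), "internal-only", {}, date(2025, 1, 1)))  # allow
```

Expiring waivers force a periodic re-review instead of a permanent exception nobody remembers granting.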

https://redd.it/1o5zv57
Is a self-hosted AI analytics tool useful? (Docker + BYO-LLM)

I'm the founder of Athenic AI (a tool to explore and analyze data with natural language). I'm toying with the idea of a self-hosted community edition and wanted to get input from people who work with data...

The community edition would be:

Bring-Your-Own-LLM (use whichever model you want)
Dockerized, self-contained, easy to deploy
Designed for teams who want AI-powered insights without relying on a cloud service

If you're interested, please let me know:

Would a self-hosted version be useful?
What would you actually use it for?
Any must-have features or challenges we should consider?

https://redd.it/1o5voxu
Rundeck Community Edition

It's been a while since I've looked at Rundeck and, not to my surprise, PagerDuty is pushing people to purchase a commercial license. Looking at the comparison chart, I wonder if CE is useless. I don't care about support and HA, but not being able to schedule jobs is a deal breaker for us. Is anyone using Rundeck who can vouch that the free edition is still useful? Are plugins available?

What we need:
- self-service center for ad hoc jobs
- scheduled jobs
- retry failed jobs
- fire off multiple worker nodes (ECS containers) to run multiple jobs independently of one another
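For the "retry failed jobs" requirement, whatever scheduler ends up running the jobs, the usual pattern is bounded retries with exponential backoff. This is a tool-agnostic sketch, not Rundeck's API or configuration; `job` is any callable, and `sleep` is injectable so the logic can be tested without actually waiting.

```python
import time


def run_with_retry(job, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call job() until it succeeds, waiting base_delay, 2*base_delay, ... between tries."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return job()
        except Exception:
            if attempt == attempts:
                raise  # out of retries: surface the last failure to the scheduler
            sleep(base_delay * 2 ** (attempt - 1))
```

Re-raising on the final attempt matters: the scheduler still sees a failed job and can alert, instead of the retry wrapper swallowing the error.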

https://redd.it/1o6344v
Need advice — Should I focus on Cloud, DevOps, or go for Python + Linux + AWS + DevOps combo?

Hey everyone,

I’m currently planning my long-term learning path and wanted some genuine advice from people already working in tech.

I’m starting from scratch (no coding experience yet), but my goal is to get into a high-paying and sustainable tech role in the next few years. After researching a bit, I’ve shortlisted three directions:
1. Core Cloud Computing (AWS, Azure, GCP, etc.)
2. Core DevOps (CI/CD, Docker, Kubernetes, automation, etc.)
3. A full combo path — Python + Linux + AWS + basic DevOps

I’ve heard that the third path gives the best long-term flexibility and salary growth, but it’s also a bit longer to learn.
What do you guys think?
• Should I specialize deeply in Cloud or DevOps?
• Or should I build the full foundation first (Python + Linux + AWS + DevOps) even if it takes longer?
• What’s best for getting a high-paying, stable job in 4–5 years?

Would love to hear from professionals already in these roles.

https://redd.it/1o64ct8
DevOps experts: What’s costing teams the most time or money today?

What’s the biggest source of wasted time, money, or frustration in your workflow?
Some examples might be flaky pipelines, manual deployment steps, tool sprawl, or communication breakdowns — but I’m curious about what you think is hurting productivity most.


Personally, coming from a software background and recently joining a DevOps team, I find the cognitive load of learning all the tools overwhelming — but I’d love to hear if others experience similar or different pain points.

https://redd.it/1o672nn
Need advice — Physics grad but confused between DevOps, ML, or CFA

Hey everyone,
I graduated this year with a degree in Physics from a good college. I’ve been into coding since childhood — used to mess around on XDA Developers about 10 years ago, making random projects and tinkering with stuff.

I took this year off to work on a startup with my friends. We're building a VM provisioning system, and I wrote most of the backend and part of the frontend. Before that, around 3 years ago, I even tried starting something in cybersecurity.

Now I’m kind of stuck deciding where to go next. A few options I’ve been thinking about:
• Doing a Master’s in Physics from IIT (I actually love the subject).
• Doing BCA again, just to strengthen my theoretical CS fundamentals.
• Getting deeper into DevOps, because I really enjoyed working with stuff like Firecracker and Kubernetes during our project.
• Going into Machine Learning, since I already have a good math background and love problem-solving.
• Or maybe even pursuing CFA, because I’ve always been interested in finance and markets too.

I know these fields are pretty different, but they all genuinely interest me in different ways.
What do you guys think — where should I focus next or double down?


https://redd.it/1o67ka8
Migrating from Lightsail to EC2 for Terraform experience?

Hey everyone! I’m currently handling DevOps for our company, and we’ve been using AWS Lightsail for most of our projects. It’s been great in terms of simplicity and cost savings, but as the number of projects and servers grows, it’s getting harder to manage.

We use Docker Swarm to deploy stacks (1 stack = 1 app), and we host dev/test/prod environments together on some servers.

I'm planning to slowly migrate to EC2 so I can adopt Terraform for infrastructure management, and I also want to grow personally and learn it. But EC2 is more expensive, and since we're a startup, I need to justify the cost difference before suggesting it to management.

Would it be possible to do this without increasing our server costs, or even save money? Has anyone here gone through this transition? Would love to hear your insights. Thanks!



https://redd.it/1o6a77i
Tool for productivity: notes, links, passwords

Hi

Do you use any tool to track notes, links, credentials, files, etc. for your work?

I am working on multiple projects that are vastly different and have multiple sources of notes. Some things are in Git, some are online in Jira, some notes from development are in text files, and there are scripts everywhere. It's like this for every project, and I'm having a hard time finding relevant info.

I would like a tool where I can create main 'folders' with subfolders that can hold a password manager, links to system files, notes, etc.

Also, I only use Linux. Any ideas?

https://redd.it/1o6a22f
HackerRank devops assessment of Arcesium

Hi everyone! I have been shortlisted for the SSE Infrastructure role at Arcesium. HR has shared a HackerRank assessment link that needs to be completed within the next 48 hours. Can anyone share what kind of questions are usually asked? This will be my first time attempting a HackerRank test. Has anyone taken this one? It would be very helpful if someone who has attempted it could share their experience.

https://redd.it/1o6c7hu
KubeGUI - release v1.8

🎉[Release] KubeGUI v1.8: lightweight desktop client for managing Kubernetes clusters

Hey folks 👋

Just released KubeGUI v1.8, a free desktop app for visualizing and managing Kubernetes clusters without server-side or other dependencies. You can use it for any personal or commercial needs.

Highlights:

🤖Now possible to configure an AI provider (such as Phind or any OpenAI-compatible API) to get fix suggestions directly inside the application, based on the error message text.

🩺Live resource updates (pods, deployments, etc.)

📝Integrated YAML editor with syntax highlighting and validation.

💻Built-in pod shell access directly from the app.

👀Aggregated (multiple or single containers) live log viewer.

🍱CRD awareness (example generator).

Faster UI and lower memory footprint.

Runs locally on Windows & macOS - just point it at your kubeconfig and go.

👉 Download: https://kubegui.io

🐙 GitHub: https://github.com/gerbil/kubegui (your suggestions are always welcome!)

💚 To support the project: https://ko-fi.com/kubegui

Would love to hear your thoughts or suggestions — what’s missing, what could make it more useful for your day-to-day ops?

https://redd.it/1o6e7om
Best solution to automate Docker bundle backups?

Hi. I've been scratching my head over this one for a while, with a lot of back and forth with AI too, but in the end I can never decide. I thought asking DevOps folks might be better...

My OS is Ubuntu 24.04 Pro.
I'm using Docker to self-host a bunch of services, with a mix of named volumes and bind mounts for persistent storage. Some services use Postgres/Supabase, and n8n handles automations, so generally speaking it's better not to interrupt them for too long (or at all).

I'm basically unsure what the most straightforward solution is for periodic automatic backups of everything (the data for all containers), in case my server dies (it's an old PC I use for experimenting).

I'd like the backups to be uploaded to the cloud automatically.

I initially thought I'd use Ubuntu's "Online Accounts" feature, which integrates a Google account, so I could just use Déjà Dup backups with bind mounts only for containers, and upload a folder of everything to Google Drive weekly.

The problem is that this isn't acceptable for a Postgres database; I should do a proper pg_dump first instead. I haven't even installed the Supabase CLI or the pg_dump/pg_restore tools yet.
Copying and pasting a folder with all the bind mounts isn't a correct way to do it.
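A minimal sketch of the "proper dump first" step, assuming the official `postgres` image (which ships `pg_dump`, so nothing needs to be installed on the host) and hypothetical container, user, and database names. `pg_dump` reads a consistent snapshot while the database keeps running, so services aren't stopped. The output folder is assumed to be whatever the cloud upload (Déjà Dup, rclone, etc.) already syncs; scheduling via cron or a systemd timer is not shown, and whether Supabase's own Postgres container behaves identically is not covered here.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def dump_command(container, db_user, db_name):
    """Build a `docker exec` call that runs pg_dump inside the Postgres container.

    -Fc selects pg_dump's custom archive format, which pg_restore can read.
    """
    return ["docker", "exec", container, "pg_dump", "-U", db_user, "-Fc", db_name]


def backup(container, db_user, db_name, out_dir):
    """Stream a dump into a UTC-timestamped file in out_dir and return its path."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = Path(out_dir) / f"{db_name}-{stamp}.dump"
    try:
        with open(target, "wb") as fh:
            subprocess.run(dump_command(container, db_user, db_name), stdout=fh, check=True)
    except (subprocess.CalledProcessError, OSError):
        target.unlink(missing_ok=True)  # don't keep a truncated dump around
        raise
    return target


def prune(out_dir, db_name, keep=7):
    """Delete all but the `keep` newest dumps for db_name and return the removed paths."""
    # Timestamped names sort chronologically, so the oldest come first.
    dumps = sorted(Path(out_dir).glob(f"{db_name}-*.dump"))
    stale = dumps[:-keep] if keep > 0 else dumps
    for path in stale:
        path.unlink()
    return stale
```

For example, a nightly job might call `backup("postgres", "postgres", "app", "/srv/backups")` and then `prune("/srv/backups", "app", keep=7)`; every name there is a placeholder. Bind-mounted files for non-database services can still be copied as before; only the database needs the dump.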

-------

I've recently discovered and installed Coolify. Would you recommend leveraging its features for this, or is there an even better way?

I have no formal engineering degree, by the way. I'm keen to dig into the technical details, but generally speaking, I obviously prefer a solution that involves less complexity.

Thanks in advance



https://redd.it/1o6gbyv
I have a DevSecOps intern interview tomorrow. What to expect?

As the title suggests, I have a DevSecOps intern interview tomorrow, and I'd really like to secure this internship. Considering it's an internship, what do you think I'm expected to know? They did say my resume caught their attention, which is why I was given the interview opportunity.

https://redd.it/1o6he7h
Are you running your tests in Argo CD? If so, how are you getting the reports out?

We're running applications with GitOps using Argo CD and are looking at post-sync test jobs for running E2E tests.

I got my POC running before realizing I have no good way of getting the report out and in front of devs.

How are you exposing test results from jobs run by Argo CD?
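One way to get results in front of devs, sketched under assumptions: have the PostSync hook Job (a Job annotated with `argocd.argoproj.io/hook: PostSync`) run the suite with a JUnit XML reporter, then post a compact summary somewhere devs already look and upload the full report elsewhere. The parser below covers only the summary step; where it gets sent (chat webhook, PR comment, object storage) is left out, and the `E2E ...` output format is made up.

```python
import xml.etree.ElementTree as ET


def summarize_junit(xml_text: str) -> str:
    """Turn a JUnit-style XML report into a short, chat-friendly summary."""
    root = ET.fromstring(xml_text)
    # Reports are either a single <testsuite> or a <testsuites> wrapper.
    suites = [root] if root.tag == "testsuite" else root.findall("testsuite")
    total, skipped, failed = 0, 0, []
    for suite in suites:
        for case in suite.iter("testcase"):
            total += 1
            if case.find("failure") is not None or case.find("error") is not None:
                failed.append(f"{case.get('classname', '')}.{case.get('name', '')}")
            elif case.find("skipped") is not None:
                skipped += 1
    passed = total - skipped - len(failed)
    status = "FAILED" if failed else "PASSED"
    header = f"E2E {status}: {passed}/{total} passed, {skipped} skipped"
    # One bullet per failing test so devs can jump straight to it.
    return header + "".join(f"\n- {name}" for name in failed)
```

Keeping the summary short and linking to the full report avoids dumping large XML into a chat channel.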

https://redd.it/1o6k6px
Looking for some roadmap advice

I've been working in a DevOps-like role at a small company for about two or three years now (my work includes CI/CD babysitting, Terraform modules written by others, basic Kubernetes operations, and a lot of Bash). But I feel like my progress has slowed down. I'm mostly busy with maintenance and handling tickets.

I'm wondering what else I can do in the future, because DevOps is so overwhelming and I'm a bit lost. I'm currently focusing on:
- Systems and networking fundamentals (Linux internals, TCP, DNS, TLS)
- Terraform (module design, state management, multi-account/organization setups)
- Cloud architecture (proper IAM implementation, organizational guardrails, landing zones)

I'm familiar with Linux, Git, and writing small Python/Bash utilities. I can read Terraform and fix issues, but designing from scratch still requires improvement. Lately, I've been browsing YouTube, LeetCode, and the IQB interview question bank for insights. But I'd rather hear real, everyday experiences.

If you were in my shoes, what would you focus on to improve your competence over the next year? Which resources would you choose, and which would be truly helpful? Books, labs, real projects, and practical examples are all very welcome, as I currently don't know what keywords to search for. TIA.

https://redd.it/1o6fgc6
After more than a decade in DevOps, I’ve realized I’m more of a developer at heart

I’ve been in the DevOps/SRE space for over a decade now, working across different roles and organizations. But one thing I’ve consistently noticed throughout my career — I genuinely love coding far more than working on infrastructure, operations, or even IaC.

Whenever I’m writing code — automating something, building tools, or creating something new — I get completely absorbed. I never feel tired or bored. But when it comes to the “Ops” side of things — maintaining infra, monitoring, or writing Terraform/Ansible — I start feeling drained pretty quickly.

People often say there’s a lot of scope for coding and automation in DevOps/SRE, and while that’s true to some extent, it still feels much less fulfilling compared to a traditional development role.

This has always been my realization, and I just wanted to share it here. Has anyone else felt something similar — that maybe your real strength lies in the “Dev” part of DevOps? How did you deal with that realization? Did you shift towards development, or find a balance that kept you happy while staying in DevOps/SRE?

Would really love to hear your experiences and perspectives.

https://redd.it/1o6n0im
AI Providers, Security and Compliance

Hey Folks,

So as part of my "continuing education", I've put my skepticism aside and have been using AI in and on some of my side projects, meaning I use AI to help code, and I am making "AI Wrapper" projects. I am starting to see some value, but I'm also finding some...infra-smells? Gaps? I don't know.

First, I'm sort of ignoring the big enterprise side of things. It seems like for that you just pick Vertex or Bedrock and go all in, vendor lock-in and cost be damned. Bully for them.

But on the smaller side, you've got all these scrappy startups using neat tools like OpenRouter, Not Diamond, LiteLLM, etc., which is great if you want to use the latest models, do some cost optimization, have dynamic routing and all that. But I found myself instinctively wanting to build all sorts of infrastructure JUST FOR ME to do this safely.

I still have PTSD from when the guy sitting next to me checked his AWS access keys into GitHub and we lost $40k over the weekend. And now we're back to passing around API keys willy-nilly. How are people managing this? Am I being too paranoid? What happens when you scale up and hire an offshore team: are you really just sending keys over to them via Slack?
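On the key-leak worry specifically, a cheap guardrail is a pre-commit or CI check that rejects anything shaped like an AWS access key ID (`AKIA` or `ASIA` followed by 16 uppercase letters or digits). This is a minimal, single-pattern sketch; dedicated scanners such as gitleaks or GitHub's secret scanning cover far more providers, and nothing here helps with keys already shared over Slack.

```python
import re

# AWS access key IDs: "AKIA" (long-term) or "ASIA" (temporary) + 16 uppercase letters/digits.
AWS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")


def find_key_ids(text):
    """Return (line_number, match) pairs for strings shaped like AWS access key IDs."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        hits.extend((lineno, m.group()) for m in AWS_KEY_ID.finditer(line))
    return hits


def scan_files(paths):
    """Report suspicious lines and return 1 if anything was found, else 0 (hook exit code)."""
    found = False
    for path in paths:
        with open(path, encoding="utf-8", errors="ignore") as fh:
            for lineno, key in find_key_ids(fh.read()):
                # Print only a prefix so the scanner itself doesn't echo the full key into logs.
                print(f"{path}:{lineno}: possible AWS access key ID {key[:8]}...")
                found = True
    return 1 if found else 0
```

A hook script would end with `sys.exit(scan_files(sys.argv[1:]))`; the pre-commit framework passes the staged file names as arguments by default.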

Also, most of these providers make no guarantees of compliance or any kind of privacy, so a lot of us are probably sending our deepest, darkest secrets directly to China (DeepSeek) or Google (Gemini). And I don't know which is worse.

How are people actually managing this in the "middle tier" of companies? Say, an AI wrapper startup that is scaling and now needs to be both cost-effective and HIPAA-compliant? Are there libraries or services to help with this? Or are you just rolling your own?

Go easy on me: I literally JUST finished SOC 2 for one of my clients, so I'm in that headspace anyway. It just seems like nobody is really talking about this.

https://redd.it/1o6p384
What tools do you use to stay organized?

As a DevOps engineer, there are many things to keep track of:

tasks you're working on
discussions and meetings you've had
code snippets and/or CLI commands you frequently use
links to company wikis, docs, etc.
personal notes about how you solved a particular problem
personal notes about people you work with
information about different systems you need to log in to (usernames, passwords, ways of logging in)
etc.

What do you use for that? Obsidian? Notion? Plain markdown files? Hand written notes? I'd be interested in hearing about the tools you use, and if you're using a specific system to make sense of it all.

https://redd.it/1o6nhlz