Reddit DevOps – Telegram
Switching from Data Science to DevOps/Cloud Engineering — need advice as a fresher

Hey everyone,

I’m a fresher who initially started preparing for Data Science, but recently I realized that almost every other person around me is going into ML/DS, and fresher entry into real Data Scientist roles is very limited (most start as Data Analysts).

After researching and discussing with mentors, I feel DevOps + Cloud Engineering suits me better since it’s more of a pure engineering role, in high demand, and has a clearer entry path for freshers. I also like the idea that later I can pivot into MLOps if I want to connect with ML.

My plan right now:

Month 1: Linux, Networking, Git, Bash/Python scripting (+ Oracle Cloud Foundations cert in parallel)
Month 2–3: AWS/OCI core services, Docker, CI/CD, Terraform, Kubernetes basics
Month 4: Hands-on projects + cert + portfolio (GitHub)

👉 I’d love to hear from folks in the industry:

Does this switch make sense long-term compared to chasing Data Science?
For a fresher, is Cloud/DevOps a better entry point?
Any tips on what not to waste time on in the beginning?

Thanks in advance 🙏

https://redd.it/1nrqz5e
@r_devops
How do you guys handle cluster upgrades?

I am currently managing 30+ enterprise workload clusters and it's upgrade time again. The clusters are mostly on AWS and have one managed node group for Karpenter, while the other node groups are managed by Karpenter, so upgrades take comparatively less time.

But I still have a few clusters with self-managed node groups (some created with Terraform and some with eksctl, but both the Terraform code and the eksctl YAML are lost), so upgrades are hectic for these.

How do you guys handle it? Do you all keep the corresponding Terraform handy every time, or do you have some generic automation script written to handle such things?

If it's a script, I am also trying to write one, so some advice would be much appreciated.
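
Whatever the tooling, one constraint shapes any such script: the EKS control plane can only be upgraded one minor version at a time, so a cluster several versions behind needs sequential hops, with the node groups kept within the supported kubelet version skew along the way. Below is a minimal Python sketch of just that planning step. `upgrade_path` is a hypothetical helper, and the real AWS/kubectl work (control-plane update, cordon/drain, node replacement) is left as a placeholder comment.

```python
def upgrade_path(current: str, target: str) -> list[str]:
    """Return the minor versions to step through, e.g. 1.27 -> 1.30."""
    cur_major, cur_minor = (int(p) for p in current.split("."))
    tgt_major, tgt_minor = (int(p) for p in target.split("."))
    if cur_major != tgt_major:
        raise ValueError("only minor-version upgrades within one major are handled")
    if tgt_minor < cur_minor:
        raise ValueError("downgrades are not supported")
    # One hop per minor version: the control plane cannot skip versions.
    return [f"{cur_major}.{m}" for m in range(cur_minor + 1, tgt_minor + 1)]


if __name__ == "__main__":
    for version in upgrade_path("1.27", "1.30"):
        # Placeholder: update the control plane to `version`, then roll the
        # self-managed node groups (cordon, drain, replace) before the next hop.
        print(f"upgrade control plane to {version}, then roll nodes")
```

Keeping the plan separate from the actions makes it easy to dry-run across 30+ clusters before touching anything.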

https://redd.it/1nrwbvy
@r_devops
New to devops, any feedback / suggestion for my IaC setup?

Hi!
I previously had a Kubernetes cluster that I was managing myself, and I decided to convert to IaC.

My setup now consists of:
- a Terraform project to bootstrap a k3s cluster on Hetzner servers, using the amazing terraform-hcloud-kube-hetzner Terraform module (this sets up the hardware and the really basic Kubernetes resources like the CNI, etc.)
- an ArgoCD project that manages additional resources I want available in my cluster, like cert-manager ClusterIssuers, etc.

I think the Terraform part is OK, but I'm really unsure about the ArgoCD setup.
I'm new to it and it's kind of overwhelming, so I have no idea whether what I'm doing is good practice.
(Also, I've read about ways to structure the repo for different environments like prod, staging, QA, etc., but since this is my own cluster and basically production-only, I didn't go all the way to implement that environment structure.)

Roast me! Here is the link to my repo: https://github.com/Giuliopime/gport

https://redd.it/1nrxvfq
@r_devops
What's your CI setup and do you like it?

Hey everyone,

I'm currently the only DevOps engineer at my company, and I'm looking for new solutions for my CI/CD setup, as the current one is reaching its limits.
We're on GitHub Actions, using two self-hosted runners and one remote BuildKit instance. Those three instances are on Hetzner, so disturbingly cheap. We handle around 35 concurrent users with that, and we run around 300k minutes/month.
The limits of this system are obvious: concurrency is not very high, maintenance on those machines is super manual, and we need to manage the machines' disk sizes, etc.
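
To put the 300k figure in perspective: assuming "minutes" means job-minutes (time jobs spend running) and a 30-day month, it corresponds to roughly seven jobs running around the clock. Real load is likely concentrated in working hours, so peaks would be well above that average. The arithmetic, as a small Python sketch (the constant and function names are illustrative):

```python
# 300k comes from the post; the 30-day month is an assumption.
JOB_MINUTES_PER_MONTH = 300_000
WALL_MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200 minutes in 30 days


def average_concurrency(job_minutes: float, wall_minutes: float = WALL_MINUTES_PER_MONTH) -> float:
    """Average number of jobs running at once if the load were spread perfectly evenly."""
    return job_minutes / wall_minutes


if __name__ == "__main__":
    print(f"{average_concurrency(JOB_MINUTES_PER_MONTH):.1f} jobs running on average")  # 6.9
```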

What is your current setup, approximately how many minutes do you run per month, and how happy are you with your CI system?

I've looked at things like ARC, Philips Terraform and blacksmith.io, but they all feel like they solve some issues while creating others (managing another EKS cluster, high cost, scalability, etc.).

Cheers!

https://redd.it/1nryntt
@r_devops
DevOps Colors: how's your experience been?

To provide some context to the title:

The idea of DevOps can be explained, to some degree, as "Devs who care where 'their' code runs, and Ops who care what code runs on 'their' infra", both seeking the best efficiency and the fewest issues possible. If we assigned a color based on background, there would be Dev-heavy DevOps and Ops-heavy DevOps.

For example, I'm clearly an Ops person: I can handle my scripts, IaC and Bash snippets, and I know my way around a K8s environment. The Dev side takes me longer and I probably do it worse than some of my colleagues, but that's fine; we complement each other and split our duties.

What I've come to realize is that, as time goes by, more Dev-heavy folks are coming into the game, and I'm seeing a change in how most DevOps teams approach their work. I know my coding side is weaker, so I try to work out the how before thinking about code: analyze the problem and find the easiest, lowest-friction path so that I spend as little time in the IDE as possible. Over time, though, I feel that most of my colleagues jump straight into the IDE and start coding, trying to find that path among VSCode lines.

I'm curious about others' ideas and thoughts, and whether others share my feeling that Ops-heavy people (think SysAdmins, Support, Solutions Architects) are becoming rarer and rarer in the DevOps space.

https://redd.it/1ns05is
@r_devops
After Python, Which Path Should I Choose?

I have been learning Python day and night, but now I’m confused between two areas: AI development or DevOps/Cloud.

To be honest, I don’t love either or even programming. I’m just doing it to get paid. I’m the kind of person who gets things done, even if I hate them.

So, if you were only focused on making money and solving problems at a large scale, what would you choose?

https://redd.it/1ns11pd
@r_devops
What do other people use besides kubernetes?


I began my career working directly with Kubernetes, but I’ve noticed that not all companies adopt it; they often say it’s too complex. Are there real alternatives to Kubernetes? Personally, I can’t imagine managing a company’s infrastructure without it.

So what do those companies use instead to handle scaling, self-hosting, and similar needs?

https://redd.it/1ns30sw
@r_devops
How I stopped cron jobs from silently failing

I used to think cron jobs were “set it and forget it.” Then one quietly failed for three days before anyone noticed, and we only found out because an upstream pipeline broke. I’ve since learned to never trust a one-liner script in production.

I wrote a breakdown of how I now write cron jobs: logs with rotation, alerts when things fail, lockfiles to avoid overlaps, and set -euo pipefail so failures don’t go unnoticed. Would love to hear what reliability tricks other DevOps folks add.
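
The practices listed above map fairly directly onto a small wrapper. This is a minimal Python sketch, not the article's code: it assumes a Unix host (it uses `fcntl.flock`), rotates its log with the standard library's `RotatingFileHandler`, takes a non-blocking lock so an overlapping run is skipped rather than stacked, and propagates the wrapped command's exit code instead of swallowing it, which is the closest Python analogue of `set -e`. The names `run_exclusive` and `setup_logging` and the choice of exit code 75 (`EX_TEMPFAIL` from `sysexits.h`) are illustrative; alerting is reduced to an error log line plus a non-zero status.

```python
import fcntl
import logging
import subprocess
from logging.handlers import RotatingFileHandler

EX_TEMPFAIL = 75  # sysexits.h code for "temporary failure, try again later"


def setup_logging(path: str) -> logging.Logger:
    """Log to a size-rotated file so a chatty job cannot fill the disk."""
    logger = logging.getLogger("cronjob")
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = RotatingFileHandler(path, maxBytes=1_000_000, backupCount=5)
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


def run_exclusive(cmd: list[str], lock_path: str, logger: logging.Logger) -> int:
    """Run cmd under a non-blocking lock and return a meaningful exit code."""
    # The lock is released automatically when the file is closed.
    with open(lock_path, "w") as lock_file:
        try:
            # Fails immediately if a previous run still holds the lock.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            logger.warning("previous run still active, skipping: %s", cmd)
            return EX_TEMPFAIL
        try:
            result = subprocess.run(cmd, capture_output=True, text=True)
        except OSError as exc:  # e.g. the command does not exist
            logger.error("could not start %s: %s", cmd, exc)
            return 127
        if result.returncode != 0:
            # Never swallow a failure: log stderr and propagate the exit code.
            logger.error("%s failed (%d): %s", cmd, result.returncode, result.stderr.strip())
        else:
            logger.info("%s succeeded", cmd)
        return result.returncode
```

In practice, a thin entry point would pass the returned code to `sys.exit()`, so cron (or whatever monitors the job) actually sees the failure.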

You can read it here: https://medium.com/@subodh.shetty87/the-developers-guide-to-robust-cron-job-noscripts-5286ae1824a5?sk=c99a48abe659a9ea0ce1443b54a5e79a

https://redd.it/1ns4q5n
@r_devops
I built a CLI to detect env var mismatches after spending hours debugging a non-bug

Last Monday we spent almost 3 hours debugging a bug that wasn’t even a bug. The admin panel kept failing with broken API calls and weird errors that gave us zero clues about what was really happening. We dug into the logs, double-checked the backend, reviewed auth, routing, everything… nothing seemed off.

The real problem turned out to be ridiculously simple: the frontend used environment variables to define the base paths for the APIs it called, and a new variable had been added to .env.example but never made its way into the actual .env, the docker-compose file, or the Dockerfile. Because of that, the panel was building with an undefined base URL and sending requests to garbage endpoints.

That tiny mistake cost us hours of time. And maybe that’s on us, but it also made me realize how easy it is for something this trivial to break everything, especially when you’re dealing with multiple services and a growing list of environment variables.

I looked around for a tool that could catch this kind of mismatch early, something that would compare .env files, Dockerfiles and docker-compose configs, and warn you when a variable is missing or out of sync. I couldn’t find anything that actually did that well. So I built one, in my beloved language, Go.

It’s called EnvQuack. It’s a CLI tool meant to run in CI pipelines and stop these kinds of errors before they happen. It checks for differences between .env and .env.example, audits docker-compose and Dockerfile variables, and flags anything that looks like it could break your build. It’s still alpha (v0.1.0), but even now it’s already saving us from stupid, time-wasting mistakes like this one.
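
The core check described above (keys documented in .env.example but absent from .env) is simple enough to sketch. This is not EnvQuack's code, which is written in Go; it is a minimal Python illustration that assumes plain `KEY=VALUE` dotenv files and deliberately ignores docker-compose/Dockerfile parsing, quoting rules and multi-line values. The function names are hypothetical. Returning a non-zero code on any mismatch is one possible answer to the "fail the build by default?" question.

```python
from pathlib import Path


def parse_env_keys(text: str) -> set[str]:
    """Collect variable names from dotenv-style text (KEY=VALUE lines)."""
    keys = set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("export "):
            line = line[len("export "):]
        name, sep, _ = line.partition("=")
        if sep:  # only lines that actually assign something
            keys.add(name.strip())
    return keys


def missing_keys(example_text: str, actual_text: str) -> set[str]:
    """Keys documented in .env.example but absent from .env."""
    return parse_env_keys(example_text) - parse_env_keys(actual_text)


def check_files(example_path: str = ".env.example", actual_path: str = ".env") -> int:
    """Print each missing key and return a CI-friendly exit code."""
    missing = missing_keys(Path(example_path).read_text(), Path(actual_path).read_text())
    for key in sorted(missing):
        print(f"missing in {actual_path}: {key}")
    return 1 if missing else 0  # non-zero so the CI step fails
```

In the incident described above, a check like this would have flagged the new base-URL variable before the panel was ever built.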

I’d love to hear what others think. Are there more checks you’d want to see? Should the tool fail a build by default when a mismatch is found? And how do you deal with this kind of environment drift in your own projects?

GitHub: https://github.com/DuckDHD/EnvQuack

https://redd.it/1nrycww
@r_devops
46M, 17 YOE: A Senior Idiot in Need of Help

I go by SeniorIdiot online - a reminder not to assume I'm the smartest person in the room. Yet, despite many years of experience, I'm still conflicted and wrestle with the same challenges. I'm not even sure what I'm asking for. I just got back to 100% after many years of being sick and feel I have a new purpose and energy in life, but got kneecapped pretty fast - it's the same slog as it's always been. I'm out of patience with BS and other shenanigans.

As an "all over the place" INF*-T, my head tends to run on patterns, connections, and nuance. When I try to express an important idea, I often find myself "shaping it in thin air" or "chopping the air" - as if I'm sketching the abstract into existence with my hands. I visualize concepts midair long before I can pin them down in words. To me, these gestures feel like anchors for thought, but of course, only I (the mad wizard) can see what I'm thinking. I sometimes expect others to read between the lines and "get it" instinctively, when in reality I've left them with abstract words and motions that make sense only in my own head. This habit bridges thought and speech for me, but it also fuels my tendency to ramble or let "bluntness" slip in where nuance was intended.

I've led teams, tried to drive change and shape processes, but clarity and empathy don't always flow together for me. I want my directness to convey clarity and insight without making others feel dismissed. I want to champion progress without triggering defensiveness. And, maybe most of all, I want to channel my frustration into productive energy rather than letting it linger as irritation or judgment.

Dan North once said, "People don't remember what you said, they remember how you made them feel." That's my biggest flaw - how do I speak hard truths without leaving people feeling bruised? How do I inspire and drive initiatives forward while keeping people aligned and engaged? And how do I cultivate patience when "inefficiencies" that seem glaring to me appear unreasonable or incomprehensible to others?

For some reason people tend to like and respect me even though I tend to come off as harsh. I have no idea why. I'm just as lost now as when I was 25. I want to become a better person and stop fighting stupid and make more awesome.

PS. Not neurodivergent - just CPTSD so I tend to over-analyse and see patterns in everything.
PS2. Previous post: https://www.reddit.com/r/cscareerquestions/comments/1n02kl3/help_how_do_i_take_the_next_step_without_breaking/

https://redd.it/1ns7hm6
@r_devops
What are you learning these days? Any cool recent discoveries you can share with the community?

I just want insight into what’s new that I may not be up to date on. I think we should do something like this every now and then.

https://redd.it/1nsam46
@r_devops
Zig + TypeScript deployed to Lambda using a “connections compiler”

By “connections compiler” I mean a build tool that combines infra and runtime code into one app.

I’ve been working on this tool for a few years now and have had a really hard time explaining it to people. So I’ve been making different small projects using the tool just to see how well it works.

Here’s one of them where I call a Zig function from TypeScript on multiple architectures inside AWS Lambda:

https://github.com/JadenSimon/multi-arch-zig-lambda

My tool has an integration specifically for Zig though it’d be possible to extend this to other languages while still supporting infrastructure in TypeScript.

Any thoughts on this sort of tech? I’m aware of other projects that also have aims of more streamlined development though they tend to be more focused on specific pain points rather than generalized.



https://redd.it/1nsb6bx
@r_devops
How Do I Know I’m Doing DevOps the Right Way?

I’ve recently started learning **DevOps** and deploying my apps to the cloud. But I keep running into the same challenge: there are so many ways to deploy the same app—single VM, Docker, Docker Compose, Kubernetes, CI/CD pipelines, and more.

I understand **why** each method exists, but when I actually start deploying, I get confused:

* In CI/CD, should I clone the repo, build, and deploy directly?
* Or should I build, push to a container registry, and pull?
* Should I use Dockerfiles, Docker Compose, or something else entirely?
* How do I safely manage secrets?

The thing is, I **do deploy my apps**, but I’m never sure if I’m doing it in the **most efficient or safest way**. Efficiency matters because inefficient deployments are expensive. Safety matters because insecure deployments can be a disaster.

I **love DevOps**—managing these systems excites me—but after the beginner tutorials, YouTube only goes so far. I’ve tried paid courses, but real learning in computer science often comes from **making mistakes, reflecting on them, and iterating**. I can figure things out, but I’m never sure I’m following best practices.

The bigger problem is I’m mostly on my own. I’m in a Tier-3 college, and my peers usually only focus on programming languages. I don’t have anyone nearby who knows as much as I do—or close. I can improve by myself, but it takes a lot of time, and I need to **be precise and make informed decisions**.

So here’s my question:

**How can I check if I’m doing DevOps “the right way”?**

* Should I study how other projects are deployed on GitHub?
* Are there ways to evaluate my deployments against best practices?
* How do I know if my setup is safe, efficient, and maintainable?

Any guidance, resources, or frameworks to **self-evaluate and improve in DevOps** would be immensely helpful.

https://redd.it/1ns2wmy
@r_devops
Devops and Cloud consultancy - Need advice

I have a fair amount of experience (more than a decade) working in the corporate sector and handling DevOps and cloud infra for customers in various domains like banking, healthcare, hospitality, retail, etc.
If I want to offer consultancy to small firms or IT companies, how can I do it as an individual?
Is there any demand for architects who can help with DevOps and cloud consulting and infrastructure design?
Also, how can they leverage AI in this field?

I am looking for some clue on where and how to start. I am an introvert and don't have a network apart from a few folks from my previous organizations.

https://redd.it/1nsnj88
@r_devops
5 Interviews down and I can't take it anymore

About me: I have about 3 years of experience in DevOps. I worked at an SBC for a client. My tech stack includes Azure (mostly VMSS, App Gateway, LB), GitHub Actions, and a bit of Python, Bash and PowerShell; I also worked on AKS briefly, so I know it at a high level. Apart from that, I've also started learning Terraform and AWS on my own.

In the last 3 months I have given 5 interviews, from SBCs to PBCs. The thing is, they were all totally different. In one I was asked for deep knowledge of Python... like, seriously? Some ask about CI/CD, some stick with cloud scenarios, and some focus on Kubernetes.

Honestly, I find it difficult to prepare for interviews. I try to prepare according to the JD, but I can't cover everything. Feeling very low. In my current role I am doing very well; through my contributions I've earned the trust of the people around me. Every day one thing bugs me: I am the least-paid person on the team while I contribute more than the others :(
Watching my peer devs switch jobs for hefty pay just makes me sadder.

Just wanted to rant about my struggle. If you have any advice for me please give it.

https://redd.it/1nsp9e7
@r_devops
How do you deploy to production once a month?

In lower environments everything is deployed via GitHub Actions, but in production only our SRE team is allowed to push to prod. Currently we use a bunch of Ansible scripts to deploy both EC2 and various ECS apps: an engineer fires off a bunch of scripts from their machine. I'm interested in moving this to GitHub Actions, but since we could be deploying anywhere between 15 and 20 apps (each with its own GitHub repo), clicking buttons within Actions is a pain. Each month, a different set of apps goes out. Anyone with similar pain points?

Edit: I wanted to add that we can't change the monthly deploy cadence. Those rules are set by upper management.
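
One pattern that may fit a monthly multi-repo release (a sketch, not something from the post): keep a single release manifest listing which apps and versions go out this month, and have one job trigger each repo's deploy workflow through GitHub's `workflow_dispatch` REST endpoint (`POST /repos/{owner}/{repo}/actions/workflows/{workflow_id}/dispatches`). The owner, the `deploy.yml` file name, the `version` input and the helper names below are hypothetical; each target workflow would need to declare that input.

```python
import json
import urllib.request

API = "https://api.github.com"


def dispatch_requests(owner: str, manifest: dict[str, str],
                      workflow: str = "deploy.yml", ref: str = "main") -> list[tuple[str, dict]]:
    """Turn this month's {repo: version} manifest into one workflow_dispatch call per app."""
    calls = []
    for repo, version in sorted(manifest.items()):
        url = f"{API}/repos/{owner}/{repo}/actions/workflows/{workflow}/dispatches"
        # Assumes each repo's deploy workflow declares a `version` workflow_dispatch input.
        calls.append((url, {"ref": ref, "inputs": {"version": version}}))
    return calls


def send(url: str, body: dict, token: str) -> int:
    """POST one dispatch request and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Usage would look like `for url, body in dispatch_requests("my-org", manifest): send(url, body, token)`. If that release job runs in a GitHub environment with required reviewers, the SRE team could remain the only approvers for production while nobody has to click through 15-20 repos by hand.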

https://redd.it/1nspglg
@r_devops
Automating Nexus OSS EULA Acceptance on Ubuntu Server


Hey folks,

I’m trying to automate the acceptance of the EULA for Nexus OSS (running on an Ubuntu server).

I first tried writing a Selenium script, but it fails with errors related to user data. I checked and confirmed that I don’t have any other Chrome processes running.

I’d prefer not to rely on extra binaries like chromedriver, since I want to keep the setup lightweight on the server side.

I also attempted to hit the API directly, but it returns 400 Bad Request because of missing/invalid headers (things like CSRF tokens and cookies seem to be required).


So my questions are:

1. Is there a clean way to accept the Nexus OSS EULA programmatically (via API or config) without having to go through the web UI?


2. If the API requires CSRF/cookie headers, is there a recommended approach to handle this in a headless/server-only environment?



Any guidance or alternative solutions would be super appreciated.

https://redd.it/1nswwbo
@r_devops
I feel stuck learning DevOps

Hey guys, I’ve been learning DevOps for more than 5 months now. I’ve gained some knowledge of CI/CD, some AWS cloud tools, Linux commands for DevOps operations, monitoring with Grafana, Prometheus and Nagios, Kubernetes, Docker, etc. Although I’m not a master of any of them yet, I have basic knowledge. The problem now is that I’m confused about how to grow from here. I feel like I need real-life application of my knowledge, but I can’t seem to find that in my country right now.

I feel stuck, unmotivated and without direction. I’ve contemplated quitting already, but this is really what I want to do. I just need to feel that my knowledge is useful, because when I learn something and don’t use it, I tend to forget it! Please, guys, I need help, as this is becoming frustrating.

https://redd.it/1nsy3lz
@r_devops
Looking for collaborators to build a security project

I’m starting a project around security automation and want to form a team. The goal is to shape it into a product, service, or at least a solid project. If you’re interested in collaborating, DM me or drop a comment. (BTW, I'm a final-year CS student from India.)
Thanks.

https://redd.it/1nsxxwh
@r_devops
The easiest way to keep code and docs synced

Drift AI

One problem with coding and documentation is keeping your docs up to date; no developer likes documentation. Or, even worse, knowing which parts of thousands of docs to update.

We are launching Drift AI soon. With every push to your main branch, we retrieve relevant documents, highlight and suggest edits to outdated parts, and tag the right engineer to approve the edits.

No new platforms: we integrate directly with Confluence, and everything is done in Confluence.

You can grab your early access spot if you find this useful for you or your team.

https://redd.it/1nsywko
@r_devops
Cloud vs. On-Prem Cost Calculator

Every "cloud pricing calculator" I’ve used is either from a cloud provider or a storage vendor. Surprise: their option always comes out cheapest.

So I built my own tool that actually compares **cloud vs on-prem** costs on equal footing:

* Includes hardware, software, power, bandwidth, and storage
* Shows breakeven points (when cloud stops being cheaper, or vice versa)
* Interactive charts + detailed tables
* Export as CSV for reporting
* Works nicely on desktop & mobile, dark mode included

It gives a full yearly breakdown without hidden assumptions.
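
The breakeven idea reduces to simple arithmetic: with a yearly cloud cost C, an on-prem upfront cost U and a yearly on-prem running cost O (and C > O), cumulative costs cross after U / (C - O) years. The Python sketch below finds the first whole year in which on-prem is no more expensive. It is not the tool's code; it assumes flat yearly costs with no hardware refresh, price changes or discounting, and the figures in the example are hypothetical.

```python
from typing import Optional


def breakeven_year(cloud_per_year: float, onprem_upfront: float,
                   onprem_per_year: float, horizon: int = 10) -> Optional[int]:
    """First whole year in which cumulative on-prem cost <= cumulative cloud cost."""
    for year in range(1, horizon + 1):
        cloud_total = cloud_per_year * year
        onprem_total = onprem_upfront + onprem_per_year * year
        if onprem_total <= cloud_total:
            return year
    return None  # cloud stays cheaper for the whole horizon


if __name__ == "__main__":
    # Hypothetical: 120k/yr cloud vs 200k hardware up front + 50k/yr to run it.
    print(breakeven_year(120_000, 200_000, 50_000))  # 3
```

Here 200k / (120k - 50k) is about 2.9 years, so the loop reports year 3, matching the closed form rounded up.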

I’m curious about your workloads. Have you actually found cloud cheaper in the long run, or does on-prem still win?

[https://infrawise.sagyamthapa.com.np/](https://infrawise.sagyamthapa.com.np/)


https://redd.it/1nt2ib6
@r_devops