Reddit DevOps – Telegram
What should I focus on to switch to devops

Hi everyone,

I've been working as an SRE for a few months, but it's just an ops role in a large organisation where I'm siloed.

I also have a few years of experience as a cloud sysadmin with a focus on AWS, plus other sysadmin and support roles, but I feel like I'm losing my skillset in my current role.

So I'd like to ask for advice on tools, areas, or projects I could focus on to improve my chances of having a shot at a DevOps role.



https://redd.it/1o367ou
@r_devops
I inherited a problem and need your advice

The company I work for has 6 custom websites that are hosted by a relatively small hosting company (~10 employees). This company also serves as our DevOps. They control everything after our GitHub account. This includes managing Cloudflare, which is used to help with security and performance, particularly its firewall and caching.

A decision was made before I got involved that this vendor would own the Cloudflare account. I'm honestly not sure what the reason was, but our websites' Cloudflare licenses are within their company-wide account. We've been told that we cannot have visibility into the account or share access for security reasons, partly because we would see the instances of their other clients, but also because it's a safety precaution to not allow devs to meddle in DevOps. Our devs have no interest in doing DevOps, but often need to look at logs to debug issues, which they can't do right now. I'm also concerned about portability if our relationship with this vendor sours.

So, I'm stepping into this situation thinking we should absolutely own and control the Cloudflare account that contains the licenses our websites depend on. We don't have control of or visibility into this part of our stack. I'm looking for advice on whether I'm looking at this from the right perspective. I'm also interested in hearing what the industry best practices are for a client/vendor relationship in terms of ownership, control, and visibility. Thank you

https://redd.it/1o37sv7
@r_devops
Requesting Recommendations: AI CLI Agent for DevOps/SRE Workflow (Warp/Gemini-CLI alternatives?)

Hey everyone, I'm trying to level up my terminal game with an AI CLI agent and I'm a total noob. I'm a DevOps/SRE guy, so my job is basically a mix of:

* **25% Coding:** Python, Go, shell scripts.
* **50% CLI Hell:** Heavy `kubectl`, `aws cli`, `terraform`, and diving into logs/configs to troubleshoot.
* **25% Think Tank:** Architecting stuff, writing docs, and runbooks.

I've been playing with `gemini-cli` and `Warp`, and they're clutch for troubleshooting—the ability for the AI to read a giant `kubectl describe` or a tricky log file to diagnose an issue is a lifesaver.

But I know I'm barely scratching the surface. I need the community's brainpower!


**Quick Questions for the Experts:**

1. **What else is out there?** Besides `gemini-cli`, `qwen`, and `Warp`, what other **agentic CLI tools** are you using? Any good open-source or local-first options (`Aider`, `Claude Code CLI`, etc.) that crush it for infrastructure work?
2. **Multi-Model Setup:** I hate vendor lock-in. I assume `gemini-cli` is Google-only. What are the best CLI agents that let you swap models easily (Gemini, xAI, Claude, OpenAI, or even Ollama for local models)?
3. **VSCode Terminal Flow:** Can I get this same deep, context-aware utility using something like **Cline in VSCode**? Or is a dedicated terminal like Warp still better for the full experience?
4. **Warp Pro:** I saw a thread (link in comments/PM) mentioning a $56/year deal for Warp Pro. Is that likely to be a scam? What do you think?

Thanks in advance for any insights.

https://redd.it/1o39bur
@r_devops
Career Advice for junior platform engineer

I'm fresh out of college and landed a platform engineering role.
I was completely new to the "ops" side of the development cycle.
I was trained for 2 months on AWS, K8s, Linux and Docker.
Six months into the job, I still find I have a lot of learning to do, but I cannot find the time to do it.
I'm still expected to finish tasks, which sometimes involve a technology or framework I'm completely unfamiliar with.

And to solve an issue, most times you need knowledge of the application and how the infra is set up to support it.
While I can understand the infra side, I don't know the application side, and I find myself asking my seniors silly questions, which feels dumb to be doing 6 months into the job.


So I overthink simple tasks and take too long to complete them, since I spend a lot of time trying to learn or understand the tech or the task itself.


FYI, the product I'm working on is complex, and fully getting to know how it works might take me months.


Any advice on how I can do my job better from here on?
What should I focus on, and what is a realistic goal at this point?

I still want to be useful to my team and wish to get over this HUGE learning curve ASAP.

https://redd.it/1o3cx5a
@r_devops
Anyone else feel like the AI crowd is mostly JS ppl?

Every conference I watch, like OpenAI's etc., has ppl showcasing stuff in TypeScript. Every training I participated in had ppl showcasing how fast you can bootstrap a JS project, whether React, Angular or Vue.

All of them sitting in VSCode pumping out the next 4000-star GH project that does as much as a single command in the terminal.

Moving so fast that none of them even asks the question "does it even make sense?" Who cares, ship it, let's make some money.

In DevOps I'm struggling to find a real use case for non-deterministic agents. We had one for monitoring, but once in a blue moon it decided it was a good idea to restart services while the issue was transient, causing more harm than good.

Any time I bootstrap a k8s operator, I have to refactor the whole project, even when using a pretty strict instructions.md.

When refactoring I still get calls to methods that don't even exist. That's with GPT-5.

Dunno if I'm too old and stupid or the hype is too much, driven by ppl who don't even care Oo

https://redd.it/1o3c2fd
@r_devops
Need some advice regarding role change

I am a system admin working mostly on Linux, the Citrix suite, and a little bit of networking and WebSphere. I am trying to move to DevOps or cloud ops. I have some course-level knowledge of DevOps tools. I'm getting a few interview calls for roles that require only Linux and networking, but they sound like totally customer-facing roles where I would troubleshoot issues that customers encounter. Right now, my role involves deployments, app support and on-call rotations. Would it be bad for my career to move to a supposedly customer-facing support role? The pay would definitely be 2x or 3x what I'm making currently, as I'm still a junior. Thoughts, please.

https://redd.it/1o3c10r
@r_devops
5 Years of Development Experience... to Write YAML?

It's surprising how many DevOps/SRE roles require 5+ years of software development experience and include LeetCode-style interviews, when in reality you're most likely going to be writing YAML, Terraform or Python scripts.

Would love to hear others' experiences. Do people actually do professional software development in these roles? At that point, doesn’t the role just become a standard software engineering position?

P.S. On a side note, would you count writing custom glue code and TypeScript/Python scripts as software development experience?

P.P.S. The title may read as sarcastic, but I'm just trying to navigate the job market and am frustrated with the job requirements.

https://redd.it/1o3fwe1
@r_devops
Homelabs and DevOps related experience.

Hello everyone. I've been browsing this sub for similar questions. I gathered some valuable information but want to dig a little deeper.

Basically I just want to know which projects could be great to have in your own home lab so you can practice and even show in your GitHub account.

What can reinforce sysadmin/SRE/DevOps-related knowledge? Or… is it even worth it in the professional world?

I have some sysadmin experience, but it was so long ago that I do not even feel comfortable in Linux tech interviews.

I'm from Colombia and not sure how similar things are in your countries. Anyway, any information will be appreciated.

https://redd.it/1o3id7q
@r_devops
Finding git base branch

While coding, I often wonder: from which base branch did I create this feature branch? This bash script helps me answer that question instantly, and it's pretty useful in automation as well as in my daily dev workflow.

What can be improved further?

Link to the script code

Author Credit: Abhishek, SDE II at RudderStack
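
One common heuristic for this question, sketched here in Python rather than bash: for each candidate branch, find its merge base with the feature branch and count the commits the feature branch has on top of it; the candidate with the fewest such commits is the likeliest base. This is not the linked script, only an assumed approach, and `commits_since_fork`, `guess_base_branch`, and the candidate branch names are illustrative.

```python
import subprocess


def commits_since_fork(candidate: str, feature: str = "HEAD") -> int:
    """Count commits on `feature` that come after its merge base with `candidate`."""
    merge_base = subprocess.run(
        ["git", "merge-base", candidate, feature],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    count = subprocess.run(
        ["git", "rev-list", "--count", f"{merge_base}..{feature}"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    return int(count)


def guess_base_branch(distances: dict[str, int]) -> str:
    """Pick the candidate with the fewest commits between fork point and feature tip.

    Ties are broken alphabetically so the result is deterministic.
    """
    if not distances:
        raise ValueError("no candidate branches given")
    return min(sorted(distances), key=lambda name: distances[name])


def main() -> None:
    # Hypothetical candidates: list the long-lived branches of your own repo here.
    candidates = ["main", "develop"]
    distances = {name: commits_since_fork(name) for name in candidates}
    print(guess_base_branch(distances))
```

Calling `main()` inside a repository prints the best guess. The heuristic can be fooled by rebases or by branches that were later merged into each other, so treat the answer as a hint.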

https://redd.it/1o3j66q
@r_devops
laptop for Devops

Cloud services cost a lot, and the worst part is, you don’t even own the machine.

Initially, building a desktop PC appeared to be a cost-effective option. However, after accounting for additional expenses such as a UPS (due to frequent power outages), a monitor, and other peripherals, a laptop proves to be a better value in my situation.

The second-hand market is a trap in Nepal.

Earlier I had a 7th-generation i5 laptop with 16GB RAM. It would start to cry whenever I ran more than three virtual machines. The host OS was Windows 10 and the guest OS was Rocky Linux minimal inside Hyper-V/VirtualBox. And I would like to keep it that way.

Thus I will require 32GB RAM.

And a solid processor should be non-negotiable. But I am not sure which processor would be the best value for money, i.e. give me the highest ROI for the smallest jump in budget.

My budget is around 500 US dollars or 65000 INR. That is 100K NPR (the Nepal price after tax and shit like that, not the conversion value). I cannot go beyond that because I have no further savings. (Currently unemployed)


https://redd.it/1o3mwiz
@r_devops
Every Monday our dev server dies and I have to ping DevOps to restart 😩 — anyone else deal with this?

I’m working at a small SaaS startup.
Our dev & staging environments (on AWS EC2) randomly go down — usually overnight or early morning.

When I try to test something in the morning, I get the lovely “This site can’t be reached”.

Then I Slack our DevOps guy — he restarts the instance, and it magically works again.

It happens like 3–4 times a week, wasting 20–30 mins each time for me + QA.

I was thinking of building a small tool to automatically detect and restart instances (via AWS SDK) when this happens.

Before I overthink —
👉 does anyone else face this kind of recurring downtime in dev/staging?
👉 how do you handle it? (auto scripts, CloudWatch, or just manual restart?)

Curious if it’s common enough that a small self-healing tool could actually be useful.
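
A minimal sketch of such a watchdog in Python, assuming `boto3` is installed and AWS credentials are configured; the function names, the default region, and the three-strikes threshold are all illustrative. It only acts after several consecutive failed probes, so a transient blip does not trigger a restart, and it does nothing about why the instances go down in the first place.

```python
import urllib.request


def site_is_up(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL can be fetched without a network or HTTP error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except OSError:  # URLError and HTTPError are both OSError subclasses
        return False


def next_action(instance_state: str, failed_probes: int, threshold: int = 3) -> str:
    """Decide what to do: 'none', 'start' (stopped) or 'reboot' (running but unreachable)."""
    if failed_probes < threshold:
        return "none"
    if instance_state == "stopped":
        return "start"
    if instance_state == "running":
        return "reboot"
    return "none"  # pending, stopping, terminated, etc.: leave it alone


def instance_state(instance_id: str, region: str = "us-east-1") -> str:
    """Look up the current EC2 state name, e.g. 'running' or 'stopped'."""
    import boto3  # imported lazily so the decision logic stays testable without AWS

    ec2 = boto3.client("ec2", region_name=region)
    reservations = ec2.describe_instances(InstanceIds=[instance_id])["Reservations"]
    return reservations[0]["Instances"][0]["State"]["Name"]


def apply_action(instance_id: str, action: str, region: str = "us-east-1") -> None:
    """Carry out the decision made by next_action."""
    import boto3

    ec2 = boto3.client("ec2", region_name=region)
    if action == "start":
        ec2.start_instances(InstanceIds=[instance_id])
    elif action == "reboot":
        ec2.reboot_instances(InstanceIds=[instance_id])
```

A scheduler (cron, a Lambda on a timer) would call `site_is_up`, keep a running count of consecutive failures, and pass it to `next_action`. If the state turns out to be `stopped` every morning, something is stopping the instances on purpose, and that is worth finding before automating around it.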

https://redd.it/1o3nzcs
@r_devops
How can monday dev help run daily standups without meetings?

We set up boards and automations so updates happen asynchronously. What strategies have other dev teams used to make standups faster and more effective?

https://redd.it/1o3psa8
@r_devops
Trixter: A Chaos Proxy for Simulating Network Faults

Hey folks 👋

I’ve just published a post about **Trixter** — a high-performance chaos proxy written in Rust for simulating unreliable networks in CI/CD or staging environments.

Unlike Linux tc netem, it runs entirely in user space (no root, no kernel modules), and you can tweak network faults dynamically via a REST JSON API: latency, throttling, loss, terminations, corruption, etc.

Example use:

$ docker run --network host ghcr.io/brk0v/trixter \
    --listen 0.0.0.0:8080 \
    --upstream 127.0.0.1:3000 \
    --api 127.0.0.1:8888 \
    --delay-ms 300 \
    --slice-size-bytes 128 \
    --terminate-probability-rate 0.01

💡 Run tests with random seeds, and if something fails — extract the seed from logs and reproduce the chaos locally.
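
The idea behind seed-based reproduction can be illustrated in a few lines of Python. This is not Trixter's implementation; `fault_plan` and its parameters are hypothetical, and the sketch only shows that a fixed seed yields the same sequence of injected faults on every run.

```python
import random


def fault_plan(seed: int, connections: int, terminate_probability: float) -> list[bool]:
    """Return, per connection, whether a seeded RNG decides to terminate it."""
    rng = random.Random(seed)  # private generator, unaffected by global random state
    return [rng.random() < terminate_probability for _ in range(connections)]
```

Logging the seed at the start of a test run is what makes a failing run replayable: feeding the same seed back in reproduces the same fault decisions in the same order.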

Full post with architecture, comparison to tc netem, and reproducible chaos setup here: https://biriukov.dev/posts/trixter-chaos-proxy/

https://redd.it/1o3rkri
@r_devops
Anyone changed careers from DevOps to Data Science/ Engineering

I've been working as a DevOps Engineer for like 3 years now. I loved DevOps initially when I learned about Kubernetes and Cloud computing. I also liked System Design.

But with the actual work, it feels like a high-pressure job where you're responsible for the underlying platform all the time. Constant context switching and never-ending tasks with a broad scope are sometimes overwhelming. I really feel that development is a less stressful role compared to this.

I have a strong mathematical and engineering background. With that background I feel that data science / data engineering could be a much better field for me.

Has anyone made the switch? Would love to hear your advice.

TIA

https://redd.it/1o3swdy
@r_devops
Top choice for agile project management in 2025?

I’ve been using monday dev for a while and it feels like a smoother experience than jira. Curious to hear how others use it for their dev teams.

https://redd.it/1o3t8ni
@r_devops
Why their response feels like a joke | shouldn’t they be restricting users from doing such things

Response from their team.

I’ve been using this e-learning platform for quite some time for Azure sandboxes, and out of curiosity, I tried editing the RBAC roles, and guess what? I actually could! I believe that’s the platform’s fault for not disabling such actions. I did end up doing things that were outside my allowed scope, which led to my account being suspended.

I contacted their support team about it, and while I understand their point that I wasn’t supposed to do it, I still think their response wasn’t ideal. Instead of investigating how I was able to make those changes and fixing the loophole to prevent others from doing the same, they simply expect me to refrain from doing it again. That doesn’t seem like the right way to handle the situation.

I also asked (before doing this) if there were any perks for reporting such platform issues, and they replied that no such program currently exists.

https://redd.it/1o3yui3
@r_devops
Cost of Secret Management - Don't let devs bother you

# The Hidden Cost of Secret Management: Developer Productivity

Day 1, New Developer:

PM: "Connect to the staging database"
Dev: "What's the connection string?"
PM: "Ask DevOps"
Dev: Opens Slack "Hey DevOps, need staging DB credentials"
DevOps: "Check the wiki"
Dev: Finds 3-year-old wiki page
DevOps: "That's outdated, I'll DM you"
DevOps: "Wait, I'm sure I've created a Vault in a specific account/sub for that, let me send a ticket to assign you roles/permissions"
3 hours later, developer can finally start working

This happens every sprint. For every new feature. For every environment.

# The Real Problem

It's not about where secrets are stored. It's about:

No traceability - Who changed the API key? When? Why?
No collaboration - PM can't see what configs exist, DevOps doesn't know what developers need
No audit trail - Compliance asks "who accessed prod secrets?" → checks Slack history
No versioning - Which version of the app needs which secrets?
Lost productivity - 2 hours per developer per sprint hunting for credentials

# What OneSeal Changes

Treat platform outputs like code:

# DevOps: Generate from infrastructure
oneseal generate terraform.tfstate --name @company/platform-staging

# Commit to git (encrypted)
git add platform-staging/
git commit -m "feat: add new S3 bucket for uploads"
git push

# Developer: Install like any dependency
npm install @company/platform-staging

In code:

import { State } from '@company/platform-staging';

const config = await new State().initialize();
console.log(config.s3.uploadBucket); // TypeScript knows this exists
console.log(config.database.host); // Autocomplete works

# What This Enables

For Developers:

Onboarding: `npm install` instead of 2-hour credential hunt
No typos: config.database.host instead of process.env.DATABSE_HOST
Offline work: No VPN needed for config access
Self-service: No waiting on DevOps for every environment
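
The "no typos" point is not specific to one tool. As an illustration in Python (not OneSeal's generated SDK, whose internals are not shown in this post), a typed config object turns a misspelled key into an immediate error instead of a silently missing value; `DatabaseConfig` and `load_database_config` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class DatabaseConfig:
    host: str
    port: int


def load_database_config(env: Mapping[str, str]) -> DatabaseConfig:
    """Build a typed config, failing fast if a required key is missing."""
    missing = [key for key in ("DATABASE_HOST", "DATABASE_PORT") if key not in env]
    if missing:
        raise KeyError(f"missing config keys: {', '.join(missing)}")
    return DatabaseConfig(host=env["DATABASE_HOST"], port=int(env["DATABASE_PORT"]))
```

With this shape, a `DATABSE_HOST` typo fails at load time, and a typo on the attribute side (`cfg.hots`) is caught by a type checker or raises `AttributeError` rather than passing `None` along.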

For DevOps:

Infrastructure as code → config as code (same workflow)
No more "what's the bucket name?" Slack messages
Deploy new infrastructure → regenerate SDK → developers get updates
Revoke access: Remove public key, regenerate

For Product/Management:

Git history shows what changed, when, and by whom
PR reviews for configuration changes
Rollback configs like code: `git revert`
Audit trail: Every secret change is logged in git

For Compliance/Security:

Complete audit trail (who, what, when)
Environment isolation (dev keys can't decrypt prod)
Asymmetric encryption (each person has own key)
No shared secrets

# The Workflow

DevOps sets up once:

# Generate keypairs for team
oneseal generate-key # Per developer
oneseal generate-key --output ci.key # For CI/CD

# Generate SDK with multiple recipients
oneseal generate terraform.tfstate \
--public-key alice.pub \
--public-key bob.pub \
--public-key ci.pub \
--name @company/platform-infra

Developers consume:

// No Slack messages
// No wiki hunting
// No waiting on DevOps
import { State } from '@company/platform-infra';
const config = await new State().initialize();

Product tracks changes:

git log platform-infra/
# See exactly what changed between releases
git diff v1.0.0 v1.1.0
# Compare configurations across versions

# Security Model

Each environment has different encryption keys
Developer with staging key cannot decrypt prod secrets
Production keys only in CI/CD and production infrastructure
Cryptographic isolation, not trust-based access control

# The Result

Before OneSeal:

New feature → 2 hours getting credentials
Environment broken → hunt through Slack for config
Compliance audit → reconstruct timeline from memory
Secret rotation → update 10 places manually

After OneSeal:

New feature → `npm install` → start coding
Environment broken → git log shows what changed
Compliance audit → export git history
Secret rotation → regenerate SDK → bump version

Think of it as bringing GitOps practices to configuration management.

Built OneSeal to solve this: github.com/oneseal-io/oneseal

Terraform/Vault → encrypted SDK → version control → developer productivity

What's your onboarding time for new developers? How do you handle config/secret distribution across teams?

https://redd.it/1o40aq1
@r_devops
Will DevOps teams become smaller because of AI?

What are your thoughts? Any prior experiences from work would also be really appreciated...

https://redd.it/1o44drt
@r_devops
What category of software am I looking for?

The requirement from the business is:

As part of our running software we want to be able to 'send events' to a central place, and have other software consume them.

These 'events' might be informational or an error that has been hit.

Not huge volume, but important and very specific info about what has happened.

Like data processing of X data item from Y provider failed because Z reason.

We then want downstream services and GUIs to be able to subscribe to these 'events'.

Like in the above example, we might care about some providers more than others.

Originally we thought this sounds like a logging problem, but I'm having my doubts about that. Realtime/push/apis being the main thing.

The more I dig, the more it sounds like this should be a solved problem and my googling is not helping.

I google event software and get random software to help organise events.

Is this a solved problem? Maybe something that sits on top of a logging platform.
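
What is described above resembles the publish/subscribe pattern, and the category usually goes by names like message broker, event bus, or event streaming. A minimal in-process sketch in Python, with hypothetical names (`Event`, `EventBus`), shows the shape: producers publish structured events to a central place and consumers subscribe with a filter, for example on provider. A real deployment would put a networked broker in place of this class.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Event:
    level: str     # e.g. "info" or "error"
    provider: str  # the "Y provider" in the example above
    item: str      # the "X data item"
    reason: str    # the "Z reason"


class EventBus:
    """Central place that fans events out to interested subscribers."""

    def __init__(self) -> None:
        self._subscribers: list[tuple[Callable[[Event], bool], Callable[[Event], None]]] = []

    def subscribe(self, wants: Callable[[Event], bool], handler: Callable[[Event], None]) -> None:
        """Register a handler together with a predicate selecting the events it cares about."""
        self._subscribers.append((wants, handler))

    def publish(self, event: Event) -> int:
        """Deliver the event to every matching subscriber; return how many received it."""
        delivered = 0
        for wants, handler in self._subscribers:
            if wants(event):
                handler(event)
                delivered += 1
        return delivered
```

A subscriber that only cares about errors from one provider would register `lambda e: e.provider == "acme" and e.level == "error"` as its predicate. The difference from logging is the direction of flow: events are pushed to subscribers as they happen instead of being stored and queried later.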

https://redd.it/1o44ng1
@r_devops