Reddit DevOps – Telegram
github-ci: Lint your GitHub Actions workflows and auto-upgrade to latest versions

https://github.com/reugn/github-ci



I've been spending time managing GitHub Actions workflows manually across different projects. I built this tool to automate some of that and make it less tedious. If you find it useful, let me know - I'm planning to add more features over time, so contributions are welcome.

https://redd.it/1pu3beq
@r_devops
Gitea actions - multi repo

Hello all,
I am working on a multi-repo project, and at the moment I am struggling to unify the local build with the build in Gitea Actions.

The main problem is access to other repos from Gitea Actions.
For the local build, CMake with FetchContent works, but it cannot work in Gitea Actions because all the repos are private and the runner's SSH public key is not in the list of approved keys.

At the moment I have a solution that I don't like, but I had to unblock others: use multiple checkout steps to download all the needed repos. The main problem is that the versions of the other repos must be maintained in two places. That is OK for now, but it will become a problem in the future.

Can anyone help me find a better solution?
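
One direction that might be worth exploring is to keep the versions only in CMake and have the CI job read them from there before doing its checkouts. The sketch below is an assumption-heavy illustration, not a tested Gitea setup: it assumes dependencies are declared with `FetchContent_Declare` using literal `GIT_REPOSITORY` and `GIT_TAG` values, with no comments or parentheses inside the call. The helper names and the sample URLs are hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

# Matches the argument list of each FetchContent_Declare(...) call.
# Assumption: no nested parentheses or comments inside the call.
DECLARE_RE = re.compile(r"FetchContent_Declare\s*\((.*?)\)", re.DOTALL)


def _value_after(tokens: List[str], key: str) -> Optional[str]:
    """Return the token that follows `key`, or None if it is absent."""
    if key in tokens:
        i = tokens.index(key)
        if i + 1 < len(tokens):
            return tokens[i + 1]
    return None


def parse_pins(cmake_text: str) -> Dict[str, Tuple[str, str]]:
    """Map dependency name -> (repository URL, pinned tag or commit)."""
    pins: Dict[str, Tuple[str, str]] = {}
    for body in DECLARE_RE.findall(cmake_text):
        tokens = body.split()
        if not tokens:
            continue
        repo = _value_after(tokens, "GIT_REPOSITORY")
        tag = _value_after(tokens, "GIT_TAG")
        if repo and tag:
            pins[tokens[0]] = (repo, tag)
    return pins


# Hypothetical CMake snippet used only to demonstrate the parser.
SAMPLE = """
include(FetchContent)
FetchContent_Declare(
  libfoo
  GIT_REPOSITORY git@gitea.example.com:team/libfoo.git
  GIT_TAG        v1.4.2
)
FetchContent_Declare(libbar GIT_REPOSITORY git@gitea.example.com:team/libbar.git GIT_TAG 3f2c1ab)
FetchContent_MakeAvailable(libfoo libbar)
"""

if __name__ == "__main__":
    for name, (repo, tag) in parse_pins(SAMPLE).items():
        print(f"{name} {repo} {tag}")
```

A CI step could loop over that output and check out each repo at the listed ref, so the checkout steps no longer carry their own copy of the versions. Runner access to the private repos still has to be solved separately.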

https://redd.it/1pu4evt
@r_devops
Is ELK Stack still relevant?

I have been learning Docker for the past month or so. My learning resource has been The Ultimate Docker Container Book. For the most part it is okay, but some of its content is outdated, one example being the part that covers ELK. I have been struggling to find recent resources that explain shipping logs and monitoring containers with the ELK stack.

Is it not used in the industry anymore? What are you guys using?

https://redd.it/1pu28v7
@r_devops
Senior Salesforce DevOps (8 yrs) planning transition to AWS/Kubernetes DevOps — what depth is expected?

I have 8 years of total experience, 5 of them in Salesforce DevOps (GitLab CI/CD, Copado, shell scripting).

With Salesforce budgets tightening in the Indian market, I’m planning a transition toward core platform DevOps roles involving AWS, Kubernetes, and infrastructure automation.

What I’m trying to understand from people who’ve made a similar move in India:

• What level of AWS + Kubernetes depth was actually evaluated in interviews?

• What kind of infra or platform projects helped you stand out?

• What knowledge gaps surprised you during the transition?

I’m planning to spend 6 months building real systems (not tutorial-level setups) and want to align my learning with what hiring managers in India actually value.



https://redd.it/1ptvqk6
@r_devops
Migrating from C# CDKTF to Native TF

One of our goals is to migrate from our existing C# CDKTF to native TF. With the deprecation of CDKTF, and given the massive amount of drift that we have, this is likely to be a large undertaking.

For those who have migrated: what was your experience using cdktf synth, and what are your thoughts on using that as a starting point versus having some AI, like Claude, do the analysis and conversion?

Am I correct in understanding that with cdktf synth --hcl we can continue to use the existing state files without importing all our resources manually, or is that incorrect?

https://redd.it/1pua22y
@r_devops
AI testing tools keep popping up, which ones actually reduced toil in your pipeline?

Every now and then there’s another wave of AI-powered testing tools claiming they’ll fix flaky tests, speed up feedback, or reduce manual QA. In isolation, a lot of them look fine. Once they hit a real pipeline with real apps, real UIs, and real failure modes… not so much.

For teams running CI/CD at any reasonable scale: did any AI-assisted testing tools actually reduce operational pain for you? As in fewer broken builds, less babysitting, and better signal.

Also, where did things fall apart? Was it maintenance overhead, flaky UI stages, tools that worked on APIs but fell over on real workflows, or just another thing to keep running?

https://redd.it/1puc9g7
@r_devops
Dear Tenable: Please get your shit together

The amount of time I have to spend talking to our internal compliance team and fixing your shitty audit files is too damned high. The bash script provided for a STIG audit check goes out of its way to look for port numbers to verify that a config file contains "^Banner /etc/issue.net" ... I'm sorry... Were you paying the person who wrote that by the character? 'Cause they shit out a turd that just makes my life miserable. Don't overcomplicate your damned checks.


Also, whoever came up with the idea of putting bash scripts in XML... please just... fire them. They're a horrible person. Or if it was a team effort, shit-can the lot of them. That whole idea is damn near a war-crime committed on the entirety of the infosec community.


Signed by a person who just wants his pipelines to stop failing because of Tenable being ass.

https://redd.it/1pudn1w
@r_devops
Why is SMS so hard now?

We’re trying to fix tier 0 alerts because Slack is too noisy at 3am, but the carrier red tape for SMS is insane. Our "low volume" 10DLC campaigns keep getting stuck in manual review for weeks.

I’m testing an API that handles the compliance on its end so we can just pipe alerts through instantly.

How are you guys routing priority alerts to your team in 2026? Are you fighting carriers or looking for a way to outsource the compliance?

https://redd.it/1puffay
@r_devops
My learning path stopped being linear

I'm currently at a stage where my DevOps learning is no longer a "pick a tool → master it → move on" pattern. Early in my career, progress was obvious. Learn Docker. Learn Terraform. Improve CI/CD skills. Handle on-call duties confidently. Each step had clear signals that you were "leveling up." But the longer I've been in this industry, the weaker those signals have become.

Most of my growth now comes from ambiguous situations. Design reviews with unclear requirements. Stakeholders changing priorities mid-quarter. Post-mortems where no individual made a mistake, yet the system still crashed. These moments force you to articulate the reasons behind your choices.

This is also where AI is starting to appear in my workflow; I use it to help me with reviews, because more and more situations aren't solved simply by mastering a skill. It ultimately comes down to soft skills. I'm becoming the kind of manager I used to dislike, haha. I interact with people more than I use tools every day. I'm currently preparing for a job change, and I've noticed my preparation process is different this time. While I still use resources like Indeed or IQB interview question banks and GPT or Beyz coding assistant for mock interviews, the goal this time is to slow down and make my reasoning process clearer. AI can speed up execution, but I feel that senior engineers need slower, clearer thinking for growth. This isn't something that can be easily quantified by how many problems you've solved or how many projects you've led. Even the feedback is much more ambiguous than when learning a new tool.

I'm still unsure what the "correct" learning path looks like at this stage. It feels like becoming a sponge, absorbing and disseminating information. The influencing factors and things to balance have become much more numerous than before. Where are the boundaries of this career development/promotion script? I recently saw an interesting analogy: we are a collection of cells constantly controlling the influx and efflux of new and old matter. So how do we determine "new" and "old" in our growth?

https://redd.it/1puffao
@r_devops
Help with OS Orchestration

I’m interested in building a malware analysis sandbox. For each analysis run, I need to automatically provision a fresh virtual machine, execute a malware sample, collect results, and then fully destroy the environment. The sandbox should support multiple operating systems such as Windows, Linux, macOS, and Android.

My main focus is on the orchestration layer, specifically, which technologies or tech stacks can be used to automate the deployment, execution, isolation, and teardown of these environments efficiently and securely.
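
To make the lifecycle concrete, here is a minimal sketch of the provision, execute, collect, destroy loop, independent of any particular hypervisor or cloud. Everything in it is hypothetical: `FakeProvider` only records calls and stands in for whatever real backend ends up being used, and `fresh_vm` and `analyze` are illustrative names. The one design point it shows is that teardown sits in a `finally` block, so the VM is destroyed even when the sample run raises.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Any, Callable, Iterator, List


@dataclass
class FakeProvider:
    """Stand-in for a real VM backend; it only records the calls it gets."""
    events: List[str] = field(default_factory=list)

    def provision(self, os_name: str) -> str:
        vm_id = f"{os_name}-vm"
        self.events.append(f"provision:{vm_id}")
        return vm_id

    def destroy(self, vm_id: str) -> None:
        self.events.append(f"destroy:{vm_id}")


@contextmanager
def fresh_vm(provider: FakeProvider, os_name: str) -> Iterator[str]:
    """Provision a VM for one run and always destroy it afterwards."""
    vm_id = provider.provision(os_name)
    try:
        yield vm_id
    finally:
        # Runs on success and on failure, so no environment is left behind.
        provider.destroy(vm_id)


def analyze(provider: FakeProvider, os_name: str,
            run_sample: Callable[[str], Any]) -> Any:
    """Run one analysis in a fresh VM and return whatever it collected."""
    with fresh_vm(provider, os_name) as vm_id:
        return run_sample(vm_id)


if __name__ == "__main__":
    provider = FakeProvider()
    report = analyze(provider, "linux", lambda vm_id: {"vm": vm_id})
    print(report, provider.events)
```

A real provider would replace the two methods with calls to the chosen virtualization layer; the per-OS differences would live behind that interface.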

https://redd.it/1pugxh0
@r_devops
Finally solved my "which port is this app on" problem with a simple Caddy trick

If you work on multiple projects (or a monorepo), you know the pain:
- "Was the API on 4323 or 4321?"
- "Is the marketing site 5400 or 5450?"
- opens 5 browser tabs with different ports
- checks package.json for the 10th time

I finally got tired of this and set up Caddy with wildcard localhost routing. Now I just type my-app.dev.localhost and I'm there. No ports. Real HTTPS. Green padlock.

The key insight that took me a few failed attempts: *.localhost doesn't work (browsers reject the wildcard cert), but *.dev.localhost does. Adding one subdomain level makes it work.



Basic setup:
    *.dev.localhost {
        tls internal

        @api host my-api.dev.localhost
        handle @api {
            reverse_proxy localhost:4323
        }

        @app host my-app.dev.localhost
        handle @app {
            reverse_proxy localhost:5400
        }

        handle {
            respond "Not configured" 404
        }
    }


Run caddy trust once, then sudo caddy start --config Caddyfile. Done.
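
If the list of apps keeps growing, the site block could be generated from a name-to-port map instead of edited by hand. This is only a sketch under assumptions: it uses Caddy's named matchers (`@name host ...` paired with `handle @name`), and `render_caddyfile` plus the example map are hypothetical names, not part of Caddy. Hyphens in app names are turned into underscores for the matcher name as a cautious choice.

```python
from typing import Dict


def render_caddyfile(apps: Dict[str, int], domain: str = "dev.localhost") -> str:
    """Render one wildcard site block with a named host matcher per app."""
    lines = [f"*.{domain} {{", "\ttls internal", ""]
    for name, port in sorted(apps.items()):
        matcher = "@" + name.replace("-", "_")
        lines += [
            f"\t{matcher} host {name}.{domain}",
            f"\thandle {matcher} {{",
            f"\t\treverse_proxy localhost:{port}",
            "\t}",
            "",
        ]
    # Fallback for hostnames that have no entry in the map.
    lines += ["\thandle {", '\t\trespond "Not configured" 404', "\t}", "}"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_caddyfile({"my-api": 4323, "my-app": 5400}), end="")
```

Redirecting the output into a file named Caddyfile gives the same shape as the hand-written config, with one matcher and one handle block per entry.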

Bonus: also solves OAuth callback URL issues since your local URLs look like production (https://my-app.dev.localhost/auth/callback).



I wrote up the full thing with gotchas and a dev dashboard if anyone wants: https://thesashka.com/blog/posts/technical/taming-localhost-ports-with-caddy/

https://redd.it/1puid36
@r_devops
Feeling Like an Outsider a Few Months into Job

Hey everyone!

I'm relatively new to my job, just a few months full time. I did intern with my team before, so I knew what to expect going in.

During my internship, I felt so incredibly confused the entire time. During the time between my internship and starting full time, I did some personal projects and filled in some gaps with containerization and other things.
Now that I am full time, I feel like I somewhat know what I'm doing, but I think what gets me is that my team is able to come up with new things to automate, find gaps in things that I don't see, and come up with better solutions with new technologies. I work for a good company, and my team is really smart, so I know if they are willing to have me, I must be okay.

I think what gets me sometimes is the vast amount of knowledge about tons of different things that DevOps involves, and not having much of a background in anything else. There is so much to learn, and only over the past few months have I REALLY worked with RHEL, containerization, CI/CD, AWS, and of course the systems we have created. On top of this, I sometimes get so invested in the tasks themselves that I overlook small details in PRs, or forget to keep up with marking my Jira stories in progress or closing them out.

My team is also extremely organized, and although I find myself to be a very organized person, I feel like I make so many small mistakes during my work. I know I'm only a few months in, but things still take me time and even then, there are so many comments on my PRs. I want to be really good at this, and I really do enjoy it.

If anyone has any tips as far as organization, dealing with imposter syndrome in this field, and/or gaining confidence in my skills and knowledge, I would love to hear it.

Thank you!



https://redd.it/1puc8te
@r_devops
How do you prevent PowerShell scripts from turning into a maintenance nightmare?

In many DevOps teams, PowerShell scripts start as quick fixes for specific issues, but over time more scripts get added, patched, or duplicated until they become hard to maintain and reason about. I’m curious how teams handle this at scale: how do you keep PowerShell scripts organized, maintainable, and clean as they pile up? Do you eventually turn them into proper modules or tools, enforce standards through CI/automation, or replace them with something else altogether? Interested in hearing what’s actually worked in real-world environments.


https://redd.it/1puabzv
@r_devops
Help resolving connection refused between two sites cert manager

I have 3 nodes at one site and one node at another; it has only private IPs, and the 3 nodes are under the same VIP. I ran kubeadm init with the VIP and joined the 3 nodes as control plane; the node at the other location is a worker.

The worker has ICMP and TCP connectivity to the 3 nodes, with all ports open between the two sites.

I deployed cert manager in worker 3.
When I try applying a YAML, it says https://svc:443 connection refused.

I have all ports open, to the best of my knowledge.

Can you help me resolve this issue?
I've been stuck on it for the past 3 days.

https://redd.it/1pugpce
@r_devops
EnvX-UI: Local, Encrypted & Editable .env

EnvX-UI was built to manage and edit .env files across multiple projects, including encrypted ones. A clean, intuitive interface for developers who need secure and centralized environment variable management.


https://github.com/litepacks/envx-ui

https://redd.it/1pulboq
@r_devops
Got actions/flows you swear by?

Just wondering what people have as defaults when they start a repo?

We have linters and code stylers on production code repos.
Just wondering if there are others out there that may be handy?

https://redd.it/1punbp6
@r_devops
State backend on AWS

How do you deal with the “chicken and egg” situation when creating the backend for your infra on AWS? I’ve seen people use a bootstrap directory that deploys an S3 bucket and a DynamoDB table, and I have grown accustomed to it as well. I’m wondering how others approach it, especially with DynamoDB being deprecated for state locking.

https://redd.it/1pum2l9
@r_devops
About stack in 2026

I have 4 years of experience in full-stack development with PHP, Node, Python, MySQL, MongoDB, and Redis, plus Vue and React on the frontend.

I have knowledge of Linux, nginx, Apache, AWS, Docker, Terraform, Ansible, GitHub and GitLab pipelines, and a little bit of Prometheus and Grafana.

I have done some infra deployments on AWS and DigitalOcean, but I feel I'm not good enough yet.

Next month I have an interview for a mid/senior DevOps engineer job, and I really want to do this right.

What stack do you guys recommend I learn or review to do well in the interview?

I really love DevOps engineering much more than coding, and I really want to move into this role, but I feel very insecure because it's a mid/senior job. I was referred for this job by a friend, the same friend who taught me a lot about DevOps.

https://redd.it/1pult38
@r_devops
Zero-trust inside an early LLM platform: did you implement it from day one?

We’re building an internal LLM platform and compared two access models:

Option A - strict zero-trust between microservices (mTLS/JWT per call, sidecars, IdP).
Option B - a trusted boundary at the Docker network level (no per-request auth inside, strong boundary controls)

Current choice: Option B for the MVP. Context: single operator domain, no external system callers to the LLM service.

Why now
• Lower inference latency, faster delivery, lower integration cost

Main risk
• Lateral movement if a node inside the boundary is compromised

Compensators we use
• Network isolation/firewall, minimal images, read-only secrets with rotation, CI dependency scans, centralized logs/alerts, audit of outbound calls to external LLM APIs, isolated job containers without internal network

What we actually measure
• LLM service latency under load
• Secret rotation cadence
• Vulnerability scan score/drift
• Anomaly rate on outbound calls

Switch criteria to zero-trust later
• External integrations, multi-tenant mode, third-party operators/contractors, regulatory pressure

Questions to the community

1. On small teams: which mTLS/JWT pattern kept ops simple enough (service mesh vs per-service libs)?
2. What was the real latency/complexity tax you observed when going zero-trust inside the boundary?
3. Any “gotchas” with token management between short-lived jobs/containers?

https://redd.it/1puloxu
@r_devops
Where do you start when automating things for a series-A/B startup, low headcount?

Hey all

I’m curious how others approach this:

I’m working with a startup, they’re 2 years in and have some solid customers, and a dev team of about 8.

Software assets
- spring boot/react typical web app for a UI, a bunch of LLM interactions, and data management
- admin app where prompt engineers work with poorly/manual git versioned workflow

Testing
- no unit
- no integration
- limited selenium coming online now
- thousands of manual test cases, regression takes 5 days (!)

Deploy:
- everything is non-CI, some shell scripts
- liquibase rolls into schema JARs

Infra:
- stale terraform, likely significant config drift

Envs:
- AWS
- dev/qa/preprod/prod, but also a handful of “prod v1.x” instances where customers are being migrated from

Git:
- trunk based, release branches, feature branches

Your reply could be from any experience; I’m just setting the level a little bit here so that we’re on the same page in terms of where they are in dev maturity. I have my thoughts, too, and a plan, and I'm curious how other folks see it. There's always something to learn.

Cheers!

https://redd.it/1pus3sb
@r_devops
Google Cloud CDN vs CloudFront: help me decide?

Hey guys
I'm building a video-heavy app with long-form content, around 30 minutes each, and trying to figure out which CDN to use as a backup.
I use Cloudflare as my main right now, but after the recent outages I really need a solid secondary. I'm torn between Google Cloud CDN and AWS CloudFront.
GCP seems faster because of their private fiber network, but AWS is just everywhere. For anyone who has actually used both for video streaming or large files: which one was less of a headache to set up? And how is the caching for long videos?
I'm not really looking for marketing fluff; I just want to know from someone who’s been in the trenches which one is more reliable when things go south.
Cheers

https://redd.it/1puutzv
@r_devops