Reddit DevOps – Telegram
Vouch: earn the right to submit a pull request (from Mitchell Hashimoto)

Mitchell Hashimoto got tired of watching open-source maintainers drown in AI-generated pull requests. So he built Vouch, a contributor trust management system. The concept is almost absurdly simple: before you can submit a PR to a project using Vouch, someone already trusted has to vouch for you.

The whole thing lives in a single text file inside the repo. One username per line. A minus sign means denounced. You can parse it with grep.
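The format described above is small enough to parse in a few lines. Here is a minimal Python sketch that assumes only what the post states (one username per line, a leading minus sign for denounced). Skipping blank lines and `#` comments is my own assumption, and `parse_trust_file` is a hypothetical helper, not part of Vouch. The real file format may carry more detail.

```python
def parse_trust_file(text):
    """Split a trust list into (vouched, denounced) sets of usernames.

    Assumed format: one username per line, "-name" means denounced.
    Blank lines and "#" comments are skipped (an assumption, not from Vouch).
    """
    vouched, denounced = set(), set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("-"):
            denounced.add(line[1:].strip())
        else:
            vouched.add(line)
    # A denouncement always wins over a vouch for the same name.
    return vouched - denounced, denounced


def may_submit_pr(username, text):
    """True only if the user is vouched for and not denounced."""
    vouched, _ = parse_trust_file(text)
    return username in vouched
```

Letting a denouncement override a vouch is a design choice made for this sketch. The post does not say how Vouch resolves conflicting entries.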

Sigstore verifies artifacts. SLSA verifies builds. Dependabot checks dependencies. None of them answer the question of whether a given person should be contributing to a project at all. That's the gap Vouch fills: contributor trust, not artifact trust.

Hashimoto designed it the same way he designed Terraform. Declarative. Human-readable. Version-controlled. Instead of .tf files for infrastructure, you get .td files for trust. Same brain, different domain.

The xz-utils backdoor is the elephant in the room. "Jia Tan" spent two years earning trust through legitimate contributions before planting a CVSS 10.0 backdoor. Vouch wouldn't have stopped that attack. But the vouch record would've been visible in the git history (who vouched for them, and when), and the denouncement would propagate to every project subscribing to that vouch list. Less of a lock, more of a security camera.

Ghostty is already integrating it. The repo picked up 600 stars in three days. A GitHub staff member commented on the HN thread saying they'd ship changes "next week."

The concerns are real though. Gatekeeping is the obvious one. Open source is supposed to be open, and Vouch creates an explicit barrier where there wasn't one before. One HN commenter called it "social credit on GitHub." The persona gaming problem hasn't gone away either; someone could still spend months building trust before going rogue.

Hashimoto himself flags it as experimental. But it's the first serious attempt at making contributor trust visible and version-controlled.

I wrote up the full breakdown, including how Vouch compares to PGP's web of trust, Advogato, and Debian's maintainer process, here if you want the deep dive.

https://redd.it/1qzgoao
@r_devops
State of OpenTofu?

Has OpenTofu gained anything on Terraform? Has it proven itself as an alternative?


I unfortunately don't use IaC in my current deployment but I'm curious how the landscape has changed.



https://redd.it/1qz67sq
@r_devops
Need advice: trying to document an installation guide for production

Hey guys, I recently open-sourced a pretty huge self-hosted project. I've set up a docker-compose.yaml that worked fine for local deployments, but I suppose I made a lot of rookie mistakes for a production deployment guide.

I don't have much experience in DevOps except for small services and deploying websites with nginx+letsencrypt, and when people started coming to me for advice on why their setup failed, I was a bit overwhelmed.

For the last three evenings I've been trying to come up with a default installation guide for a reverse proxy that would work fine for production.

So, the current setup is pretty standard:

- docker-compose.yaml with setup on localhost by default
- pretty much a default Go backend container
- frontend container that builds the frontend with baked in nginx that serves the static files on / and sets up a localhost reverse proxy on /api

 

My initial prod setup directed people to build the images manually and to edit the frontend/nginx.conf.template that the frontend container uses, so that people change their servername/adjust their IP address and so on.

Well, after debugging a couple environment-specific problems that people faced trying to deploy it this way, I realized that I need to adjust the guide ASAP.

At first, I thought that I needed to remove the baked-in nginx from the frontend container and move it up to `docker-compose.yaml`, but then I read a suggestion on the internet that I can just put another reverse proxy in front of the frontend-internal nginx one.

 

So, my current thinking process is:

1. adjust nginx.conf.template to accept the DOMAIN and BACKEND PORT, so that they're provided by docker-compose, not changed by the user (or should the baked in nginx.conf be left untouched, without accepting those env vars, staying localhost-only?)
2. add a new container in docker-compose for prod setups - caddy with a reverse proxy in front (maybe as an override file)

Also, is it fine to mix caddy and nginx this way? or am I better off overhauling the setup entirely? If so, what's the best course of action for me?
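To make option 1 concrete: having docker-compose supply the values, instead of users editing the template by hand, boils down to placeholder substitution, similar in spirit to what `envsubst` does. Below is a rough Python sketch of that idea. The template text, the `DOMAIN` and `BACKEND_PORT` variable names, and the `backend` hostname are all assumptions for illustration, not taken from the Neohabit repo.

```python
from string import Template

# Hypothetical template; "$$uri" is how string.Template escapes a literal
# "$uri", which nginx itself needs to see.
NGINX_TEMPLATE = Template("""\
server {
    listen 80;
    server_name ${DOMAIN};

    location / {
        root /usr/share/nginx/html;
        try_files $$uri /index.html;
    }

    location /api {
        proxy_pass http://backend:${BACKEND_PORT};
    }
}
""")


def render_nginx_conf(env):
    """Fill the template from a mapping such as os.environ.

    A missing variable raises KeyError, so a half-configured
    deployment fails at startup instead of serving a broken config.
    """
    return NGINX_TEMPLATE.substitute(
        DOMAIN=env["DOMAIN"],
        BACKEND_PORT=env["BACKEND_PORT"],
    )
```

With this shape the user only touches `.env`, and the question of whether Caddy sits in front becomes independent of how the inner nginx config is generated.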

In case someone wants to take a look: https://github.com/Vsein/Neohabit (the setup files are docker-compose.yaml, .env.example, frontend/nginx.conf.template; all of them are mentioned in the installation guide "building manually from source")

And here's what I've been trying to do: https://github.com/Vsein/Neohabit/pull/110

Anyway, sorry if this post is amateurish, I just genuinely feel like I'm wasting my time trying to do something that might be a wrong direction entirely.

https://redd.it/1qzmth6
@r_devops
How do devs secure their notebooks?

Hi guys,
How do devs typically secure/monitor the hygiene of their notebooks?
I scanned about 5000 random notebooks on GitHub and ended up finding almost 30 AWS/OpenAI/Hugging Face/Google keys (frankly, they were inactive, but still).
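For anyone who wants to try the same kind of scan on their own notebooks, here is a minimal Python sketch. It is not the scanner used for the numbers above. It reads the `.ipynb` JSON and checks cell sources and text outputs against a few key-shaped regexes. The AWS access key ID and Google API key shapes are the commonly cited ones. The Hugging Face pattern is an assumption, and OpenAI keys are left out because their format has changed over time.

```python
import json
import re

# Key-shaped patterns; treat matches as candidates, not confirmed secrets.
KEY_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "google_api_key": re.compile(r"AIza[0-9A-Za-z_\-]{35}"),
    "huggingface_token": re.compile(r"\bhf_[A-Za-z0-9]{30,}\b"),  # assumed shape
}


def _as_text(value):
    """Notebook fields may be a string or a list of line strings."""
    return "".join(value) if isinstance(value, list) else (value or "")


def scan_notebook(nb_json):
    """Return (cell_index, kind, match) for every key-like string found."""
    notebook = json.loads(nb_json)
    findings = []
    for index, cell in enumerate(notebook.get("cells", [])):
        texts = [_as_text(cell.get("source"))]
        # Printed output is a common place for leaked keys to hide.
        for output in cell.get("outputs", []):
            texts.append(_as_text(output.get("text")))
        for text in texts:
            for kind, pattern in KEY_PATTERNS.items():
                for match in pattern.finditer(text):
                    findings.append((index, kind, match.group(0)))
    return findings
```

Running something like this in a pre-commit hook, together with stripping outputs before committing, would catch the obvious cases. Dedicated secret scanners cover far more formats.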



https://redd.it/1qzn7f2
@r_devops
Priority Dilemma: Academic GPA vs. Personal Projects in DevOps

​Hi everyone,

​I’m a first-year Computer Science student, and I’m currently facing a dilemma that I’d love to get your take on (especially from the recruiters and hiring managers here).

​On one hand, a high GPA is often seen as a critical resource and a primary screening tool for many companies.

​On the other hand, I feel that the DevOps world is highly practical.
A project that demonstrates a complete End-to-End Pipeline (using tools like GitHub Actions, AWS, Docker, K8s, Terraform, Ansible, etc.)
shows hands-on toolchain knowledge and real-world application—qualities that are hard to measure through a GPA alone.

​I’d like to ask about your priorities:

1. When screening for a Junior or Student position, what would make you stop and look at my CV: a 90 GPA with no projects, or an 80 GPA with a portfolio that demonstrates a deep understanding of CI/CD and IaC?

2. Do you have any tips on how to properly present such projects on a CV or in an interview to effectively reflect architectural understanding?

​Thanks in advance for your insights! 🙏

https://redd.it/1qzoupy
@r_devops
What should I prepare / learn in detail before a DevOps / Cloud Engineer internship? (GitLab, Terraform, AWS)

Hi everyone,

I have a **DevOps / Cloud Engineer internship** coming up (about **4–5 months long**), and the main tools used are **GitLab, Terraform, and AWS**.

For context, I already have:

* **AWS Solutions Architect Associate**
* **Terraform Associate**
* **CKA (In progress)**

So I’m familiar with the **concepts and theory**, but I don’t have much **real hands-on / production-style experience yet**, which I’d like to work on before the internship starts.

I’d really appreciate advice from people in DevOps / cloud roles on:

* What **hands-on skills** I should focus on with:
    * **GitLab** (CI/CD pipelines, runners, YAML, etc.)
    * **Terraform** (state management, modules, best practices?)
    * **AWS** (which services matter most at intern level?)
* Any **common gaps interns usually have**, even with certs
* Things you wish you had practiced *before* your first DevOps / cloud role

I’m not trying to master everything, just want to be **useful quickly and not completely lost** on day one 😅

Any advice, learning priorities, or “focus on this, ignore that” tips would be really appreciated. Thanks!

https://redd.it/1qzrs6y
@r_devops
What decides where to run the build: Git runners or cloud build machines? Which is better in the long run if you may have multiple clouds?

Currently using AWS CI/CD, but the new DevOps guy is using Git runners.

No idea what the right strategy is.


https://redd.it/1qzw544
@r_devops
[Weekly/temp] Built a tool? New idea? Seeking feedback? Share in this thread.

This is a weekly thread for sharing new tools, side projects, github repositories and early stage ideas like micro-SaaS or MVPs.

What type of content may be suitable:

* new tools solving something you have been doing manually all this time
* something you have put together over the weekend and want to ask for feedback
* "I built X..."

etc.

If you have built something like this and want to show it, please post it here.

Individual posts of this type may be removed and redirected here.

Please remember to follow the rules and remain civil and professional.

*This is a trial weekly thread.*

https://redd.it/1qzyfzf
@r_devops
Weekly/temp DevOps ENTRY LEVEL - internship / fresher & changing careers

This is a weekly thread to ask questions about getting into DevOps.

If you are a student, or want to start a career in DevOps but do not know how, ask here.

Changing careers but do not have basic prerequisites? Ask here.

Before asking:

* try to search if your question was asked and answered
* try these resources:
    * [https://roadmap.sh/devops](https://roadmap.sh/devops)
    * (please suggest more)

_____________

Individual posts of this type may be removed and redirected here.

Please remember to follow the rules and remain civil and professional.

This is a trial weekly thread.

https://redd.it/1qzzvku
@r_devops
SSL/TLS explained (newbie-friendly): certificates, CA chain of trust, and making HTTPS work locally with OpenSSL

I kept hearing “just add SSL” and realized I didn’t actually understand what a certificate proves, how browsers trust it, or what’s happening during verification—so I wrote a short “newbie’s log” while learning.

In this post I cover:

* What an “SSL certificate” (TLS, really) is: issuer info + public key + signature
* Why the signature matters and how verification works
* The chain of trust (Root CA → Intermediate CA → your cert) and why your OS/browser already trusts certain roots
* A practical walkthrough: generate a local root CA + sign a localhost cert (SAN included), then serve a local site over HTTPS with a tiny Python server + import the root cert into Firefox
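For a feel of the last step, here is a minimal sketch of a tiny Python HTTPS server using only the standard library. It is not the blog's code. The file names `localhost.crt` and `localhost.key`, the port, and the helper name `make_https_server` are assumptions, and the certificate and key are expected to already exist (for example, produced by the OpenSSL steps the post describes).

```python
import http.server
import ssl


def make_https_server(certfile, keyfile, host="127.0.0.1", port=4443):
    """Build an HTTPS server that serves the current directory.

    certfile/keyfile: PEM files for the localhost certificate and its key.
    """
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Load the cert first so a wrong path fails before any port is bound.
    context.load_cert_chain(certfile=certfile, keyfile=keyfile)

    httpd = http.server.HTTPServer(
        (host, port), http.server.SimpleHTTPRequestHandler
    )
    # Wrap the listening socket so every connection speaks TLS.
    httpd.socket = context.wrap_socket(httpd.socket, server_side=True)
    return httpd
```

Calling `make_https_server("localhost.crt", "localhost.key").serve_forever()` would then serve the site at `https://localhost:4443/`. The browser only stops warning once the root CA that signed the cert has been imported, which is the chain-of-trust point from the list above.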

Blog Link: https://journal.farhaan.me/ssl-how-it-works-and-why-it-matters

https://redd.it/1r07ejx
@r_devops
Monitoring performance and security together feels harder than it should be

One thing I have noticed is how disconnected performance monitoring and cloud security often are. You might notice latency or error spikes, but the security signals live somewhere else entirely. Or a security alert fires with no context about what the system was doing at that moment.

Trying to manage both sides separately feels inefficient, especially when incidents usually involve some mix of performance, configuration, and access issues. Having to cross check everything manually slows down response time and makes postmortems messy.

I am curious if others have found ways to bring performance data and security signals closer together so incidents are easier to understand and respond to.

https://redd.it/1r0dbxa
@r_devops
When is it time to quit?

I wrapped up a tech panel for a Principal Azure Engineer role at an investment bank a couple of hours ago. This followed an interview with the hiring manager last Wednesday. We know each other from the past, i.e., I’ve interviewed for multiple roles at this firm over the last 5-6 years.

This role landed on my LinkedIn feed randomly. I commented on the post and emailed the hiring manager directly, we had a short back-and-forth, and his recruiter called me almost immediately. The process has been unusually smooth by modern standards.

Today’s panel felt strong. I’m confident I cleared the bar with both the Azure SME and the hiring manager. I saw visible agreement on several answers, got verbal acknowledgment more than once and handled questions from a junior panelist with ease. I was told that I’m “first in line” (not sure if that means FIFO or first on the shortlist), however, it seemed to be directionally positive.

Here’s the problem: I was laid off a little over six months ago and I am EXHAUSTED. It's like I've been on the hamster wheel of interviews since 8/4/2025. I’ve done the prep, the loops, the panels, the follow-ups. I know I’m good enough to be gainfully employed as a DevOps engineer.

If this role doesn’t turn into an offer, I’m seriously questioning whether I want to continue in tech at all. I don’t know if I have it in me to keep doing 5–7 round interview gauntlets, only to be rejected for vague reasons like “culture fit” or not smiling enough. I’ve given my adult life to STEM / engineering / corporate IT / tech and I am exhausted from having to engage with recruiters who want someone to take managerial roles for IC level pay.

I’m not bitter about rejection. I’m tired of dysfunction...hiring managers who don’t know the difference between EC2 and AWS Lambda, recruiters who can’t distinguish an AWS account from an Azure subscription and BS interview processes that ding candidates for being "too intense".

So I’m asking honestly: when is it time to walk away?
For those who’ve been at a similar crossroads...did you step back temporarily, change strategy or leave tech altogether?


TL;DR: Six months, countless interviews, strong signals in today's tech panel. If today's tech panel doesn’t result in an offer, I’m seriously considering being done with the tech interview industrial complex.

https://redd.it/1r0jghq
@r_devops
Security findings come in Jira tickets with zero context

Security scanner runs nightly and I wake up to 15 Jira tickets. Each one says fix CVE-2025-XXXX in dependency Y with no explanation of what the dependency does, where it's used, or why it matters.

I'm supposed to drop whatever sprint work I'm on, research the CVE, find where we use that package, assess actual risk, test the upgrade, and hope nothing breaks.

Meanwhile the ticket was auto-generated and the security team has no idea what they're asking me to fix. Just "scanner said critical, so here's a ticket."

Why can't these tools give actual context? Something like "this package is used in the auth flow, the vulnerability allows account takeover, here's how to fix it," instead of just screaming CVE numbers at me.

https://redd.it/1r4xpz9
@r_devops
Duplicate writes in multi-step automation: where do you enforce idempotency?

Genuine question.

We run multi-step automation that touches tickets, DB writes, API calls and emails.

A step partially failed or timed out. We restarted the run. A downstream write had already happened. Result: duplicate tickets, duplicate notifications.

This does not feel like a simple retry problem. It is about where step boundaries live and how side effects stay idempotent across an entire run.

Things we are trying:

* Treating write-capable steps differently from read-only steps
* Requiring idempotency keys or operation ids for side effects
* Making re-runs step-scoped instead of whole-run
* Keeping a durable per-step ledger with inputs, outputs and timestamps
* Adding manual pause or cancel before certain write steps
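To make the idempotency-key and ledger ideas concrete, here is a minimal Python sketch using SQLite as the durable store. `StepLedger`, `run_once`, and the `run-42:create-ticket` key format are hypothetical names for illustration, not our actual implementation.

```python
import sqlite3


class StepLedger:
    """Durable record of completed write steps, keyed by idempotency key."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS steps ("
            " idempotency_key TEXT PRIMARY KEY,"
            " output TEXT NOT NULL,"
            " recorded_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
        )

    def run_once(self, key, side_effect):
        """Run side_effect() unless this key already has a recorded output."""
        row = self.db.execute(
            "SELECT output FROM steps WHERE idempotency_key = ?", (key,)
        ).fetchone()
        if row is not None:
            return row[0]  # replay: reuse the stored result, skip the write

        output = side_effect()  # must return a string in this sketch
        with self.db:  # commit the ledger entry atomically
            self.db.execute(
                "INSERT INTO steps (idempotency_key, output) VALUES (?, ?)",
                (key, output),
            )
        return output
```

A restarted run that calls `ledger.run_once("run-42:create-ticket", create_ticket)` again gets the stored ticket id back instead of creating a second ticket. The gap this sketch leaves open is a crash after the side effect but before the ledger insert, which is why the same key usually also needs to be passed to the downstream API when it supports one.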

It still feels easy to get wrong.

Where do you enforce idempotency in practice?

* Application layer
* Workflow engine
* Middleware or sidecar
* Sagas or outbox pattern
* Approval gates

If you have shipped long-running automation with real side effects, what worked and what caused incidents?

https://redd.it/1r4u7zr
@r_devops
How are you handling AI agent inventory and compliance in your infrastructure?

With the EU AI Act enforcement date coming up (August 2026), we've been dealing with a problem that I think a lot of DevOps teams are going to hit: figuring out what AI agents are actually running in your infrastructure.

Our situation: we had n8n workflows calling OpenAI, LangChain agents deployed by different teams, random Zapier integrations making API calls to Claude — and nobody had a central view of all of it. Classic shadow AI problem.

The compliance angle made it urgent. The EU AI Act requires organizations to classify AI systems by risk level, maintain documentation, and demonstrate oversight. Can't do any of that if you don't even have an inventory.

What we ended up building was a scanner that walks through your infra and maps AI components — models, agents, API calls, data flows. We open-sourced it as AI-BOM (github.com/Trusera/ai-bom) since we figured other teams are hitting the same wall.

But I'm curious how others are approaching this:

- Do you have visibility into what AI/LLM integrations are running across your org?
- Is anyone tracking AI agents as part of their CMDB or asset inventory?
- How are you thinking about EU AI Act compliance from an infrastructure perspective?
- Anyone using SBOM-style approaches for AI components?

Would love to hear what other teams are doing — or if this just isn't on your radar yet.

https://redd.it/1r4y6b7
@r_devops
Any resources to help a senior backend engineer moving into a lead data platform engineering role? My DevOps knowledge is elementary at best and I don't know everything AWS but I'm the most qualified to do this.

For context, I'm a strong backend engineer and I've used Terraform to create my own services and whatnot but I've never done anything this in-depth like the SREs and lead platform engineers at my previous companies.

Establishing engineering best practices for the team, platform monitoring, observability, security/governance, failover, design patterns, architecture, and the whole 9 yards are going to be my main responsibility (this absolutely terrifies me). I'm going to be the main engineer that data/analytics engineers, ml engineers, and management can come to for advice.

My vision here is to build a boring but reliable and well-oiled machine. Ideally costs are optimized and we're not being idiots by leaving resources unattended. Everything's being built from scratch so I have the final say, but I'm worried about screwing it up and doing something stupid that'll cost the company thousands for no reason.

Tooling wise, it's mainly AWS, Snowflake, and I'm thinking of introducing GitLab instead of GitHub.

https://redd.it/1r50dcd
@r_devops
Need help preparing for internship

Hi, I was lucky enough to get a cloud/DevOps engineer internship, but I really only know the basics of the cloud, I don’t really know much about it.

Are there any resources/books you recommend to learn more about cloud technologies and be able to do well during the internship?

Thank you so much!

https://redd.it/1r52nkk
@r_devops
Book recommendation

What is the best book to learn networking? I have a general idea about DNS, firewalls, NAT, switches, hubs etc. But I still don’t feel confident about networking and want to dig deeper.

https://redd.it/1r4wpu8
@r_devops
Do you feel the Heat of AI in DevOps Roles?

As the title suggests, do you feel AI is after your DevOps job?

Have you seen it helping effectively in your role, or eliminating your role?

Helping --> generating IaC and Python code for automation, decision making when you're confused about using anything in DevOps, etc.

Eliminating --> AI can replace you in every possible way.

I can go first:

Helping --> I have seen juniors using it effectively and writing better code with faster turnaround time. My junior is nothing without AI, and he is so arrogant that he tells himself and others that he knows everything. On top of this, my manager supports him, as he fixes and provisions infra in no time. But he keeps us on calls for hours so that he can understand the requirement.

Eliminating --> I strongly feel our roles will vanish in the years to come, maybe 5 years at most. The reason I see is the bug, the startup bug. Everyone wants to do something, and they feel as if they are doing a favour to society. But no, they are satisfying their ego. They are looking very closely at all roles to see what can be automated, and targeting them. DevOps is no exception here. That's how Amazon also had to let go of many DevOps/cloud engineers.

https://redd.it/1r56d15
@r_devops
ACA autoscaling killing long running jobs — best practice?

Using Azure Container Apps with HTTP autoscaling (with 10 as the concurrent users threshold) for report generation. During scale up/down, replicas get terminated and reports fail mid-execution.

Questions:
• Is this the right pattern for long-running jobs on ACA?
• Any Service Bus lock timeout gotchas?

https://redd.it/1r4hkzu
@r_devops
Dual boot or VMware

I started learning DevOps a while ago. I used to practice on VMware, but sometimes the machine freezes, especially when I am learning k8s, so I started thinking about dual boot. But I don’t know if it is good enough for learning and practicing all the tools, or whether I should give the machine more specs.

https://redd.it/1r57rba
@r_devops