Reddit DevOps – Telegram
what does a DevOps engineer actually do day-to-day?

Hi everyone,

I’m currently getting into DevOps and had a few beginner questions that I’ve been thinking about.

From a real-world perspective, what does a DevOps engineer usually do on a daily basis?
Do you mostly write scripts and automation, or do you also write application code?

Another thing I’m curious about is command usage. As a beginner, it feels overwhelming to remember so many commands and configurations. In real jobs, do engineers memorize most commands, or is it normal to rely on documentation, notes, and previously written scripts?

Also, how different are interview expectations from actual on-the-job work?
I’m asking this genuinely to understand what I should focus on while learning.

https://redd.it/1puwcya
@r_devops
A little cookiecutter script to add logging and redirect to circusd

I've recently set up a home server slash IoT hub (router with three wifi access points, zigbee server, file server, a bunch of little web server apps) and ended up using circusd, mostly to keep services nicely separate from one another and from systemd. It lets me look at the pstree for an entire service, watch for restarts and look at all the logs together.

I have a pattern where each service gets its own user with files for running circus, rsyslog etc. I've done this enough times that I've set up a little cookiecutter script to set up the user, and I thought I might as well share it here. It's very much tuned for the "home network" setting (e.g. I am publishing services on mDNS using avahi). People probably want autoscaling container magic for things used in anger, but this works pretty well for single-user stuff.

`https://github.com/talwrii/cookiecutter-circus`

https://redd.it/1puyj6m
@r_devops
Why am I getting rejected from internships?

Hi there, DevOps community.
This year I'm looking for an internship, so I've been applying as a student should. There are actually a lot of offers, and I know that where I'm applying there aren't many DevOps students, so I expected to get responses quickly, but for some reason I get rejected right away.
I did notice a pattern, though: big companies take a long time and then reject me, while smaller companies take a moderate amount of time and then move me on to the next stage.
My resume is strong, and the few other DevOps students I know are facing the same issue.
Does anyone have an idea what's happening?

https://redd.it/1puyjs2
@r_devops
How are you handling CI/CD for AI Agents?

I’m a dev working on a tool to help audit and deploy AI agents. I realized that standard CI/CD breaks down with agents because a code rollback doesn't necessarily fix a "behavior" regression caused by a prompt drift or model update. If you are deploying LLMs in production: Do you treat prompts as config files (Helm charts/Env vars) or code? If an agent starts hallucinating in prod, does your current pipeline allow you to "hot swap" the prompt version without a full redeploy?
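
To make the "prompts as config" option concrete, here is a minimal Python sketch. It assumes a hypothetical in-memory `PromptRegistry` (the class, its methods and the sample prompts are illustrative, not taken from any tool mentioned in the post): prompt versions are published once, and the active version is just a pointer that can be moved at runtime without redeploying agent code.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PromptRegistry:
    """Hypothetical in-memory store; a real one might live in a config service."""
    versions: dict[str, str] = field(default_factory=dict)
    active: str | None = None

    def publish(self, version: str, text: str) -> None:
        # Published versions are never overwritten, so a rollback is reproducible.
        if version in self.versions:
            raise ValueError(f"version {version!r} already published")
        self.versions[version] = text

    def activate(self, version: str) -> None:
        # The "hot swap": only the pointer moves, no code is redeployed.
        if version not in self.versions:
            raise KeyError(f"unknown prompt version {version!r}")
        self.active = version

    def current(self) -> str:
        if self.active is None:
            raise RuntimeError("no prompt version is active")
        return self.versions[self.active]


registry = PromptRegistry()
registry.publish("v1", "You are a careful support agent.")
registry.publish("v2", "You are a concise support agent.")
registry.activate("v2")
registry.activate("v1")  # roll back after a behavior regression
```

This only covers regressions caused by a prompt change; a regression caused by a model update would need the model identifier pinned and versioned in the same way.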

https://redd.it/1pv11x0
@r_devops
Which AWS consulting partners in Europe are actually worth it? Top 10

Let’s be honest, browsing the AWS Partner Network directory feels like trying to find a needle in a haystack where every needle claims to be Premier. Everyone has badges, everyone promises seamless digital transformation, but how many actually deliver when production is on fire? Finding top AWS consultants who don't just bill you for hours but actually fix your cloud infrastructure is harder than it looks.

I’ve dealt with enough agencies to know that a shiny sales deck doesn't equal clean code. So this isn't a ranked leaderboard, but rather a curated list of companies that actually bring value to the table, depending on whether you need AWS managed services or deep engineering muscle:

1. Nordcloud: They are essentially the IBM of the cloud world in Europe now. If you are a massive enterprise needing standardized compliance and have the budget to match, they are a solid bet.
2. Beetroot: A strong choice if you need AWS certified developers but want them embedded in your team rather than just consulting from the outside. They specialize in building dedicated teams and handling complex DevOps pipelines. Their focus is big on the "human" side of tech, which helps when retention matters.
3. DoiT International: Go to them if your bill is bleeding you dry. They are absolute wizards at cost optimization and reselling, though less focused on building custom apps from scratch.
4. The Scale Factory: Great for SaaS businesses. They understand scalability and don't just throw hardware at problems.
5. Storm Reply: Very strong on the technical execution side, particularly in Germany and Italy. They handle heavy IoT and industrial cloud projects well.
6. AllCloud: If you are stuck between Salesforce and AWS, these guys bridge that gap better than most.
7. tecRacer: Another heavy hitter in the DACH region. Their training is top-tier, which usually translates to competent consultants.
8. SoftwareOne: Good for licensing and general management, though sometimes feels a bit corporate for agile startups.
9. Contino: Excellent for the transformation culture. They focus heavily on cloud-native adoption rather than just "lift and shift."
10. Caylent: While they have a heavy US presence, their European operations are growing and they are deep into AWS Lambda and serverless architectures.

When you interview these firms, ask about their DevOps culture. Do they automate security checks? Do they use Terraform or CloudFormation? If they stare blankly, run. You want partners who push for serverless where it saves money and containers where it makes sense, not just whatever is easiest for them to bill. If you just need hands, standard outsourcing works. But for architecture, you need top AWS consultants who will challenge your bad ideas. The best cloud migration services often involve telling the client that their legacy app shouldn't be migrated as-is. It makes a massive difference in the long run.

https://redd.it/1puy3t4
@r_devops
Vagrant SSH CTRL C Bug Workaround - Decoding DevOps

Hi everyone!

I'm new to my DevOps journey and following a Udemy course named Decoding DevOps, which I'm liking a lot so far. The one annoying thing was that the vagrant ssh command would exit the SSH client whenever I sent CTRL+C. I couldn't find a way around it apart from using the normal SSH client through Git Bash, so I made a simple, tidy script that automatically gets all the info needed from the VM and creates an alias for simple SSH connecting. Here is my repo. It's the first time I'm doing something like this, and I know it's really simple, but having it work on my end made me very happy and I just wanted to share it somewhere.

https://github.com/jovanjungic/vssh-sync
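
For readers wondering what "gets all the info needed from the VM" can look like, here is a minimal Python sketch. It is not the linked repo's implementation: it assumes the text comes from `vagrant ssh-config`, and the function name and sample path are made up for illustration.

```python
from __future__ import annotations


def ssh_command_from_config(ssh_config: str) -> list[str]:
    """Turn `vagrant ssh-config` style output into arguments for plain ssh."""
    opts: dict[str, str] = {}
    for line in ssh_config.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) == 2:
            # Keep the first value seen for each option; drop surrounding quotes.
            opts.setdefault(parts[0], parts[1].strip('"'))
    return [
        "ssh",
        "-p", opts.get("Port", "22"),
        "-i", opts["IdentityFile"],
        f'{opts.get("User", "vagrant")}@{opts["HostName"]}',
    ]


# Sample input for illustration only; the key path is hypothetical.
SAMPLE = """\
Host default
  HostName 127.0.0.1
  User vagrant
  Port 2222
  IdentityFile /home/me/project/.vagrant/machines/default/virtualbox/private_key
"""

command = ssh_command_from_config(SAMPLE)
```

The resulting list can be passed to `subprocess.run` or joined into a shell alias.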

https://redd.it/1puxrzo
@r_devops
Building a deterministic policy firewall for AI execution — would love infra feedback

I’m experimenting with a control-plane style approach for AI systems and looking for infra/architecture feedback.



The system sits between AI (or automation) and execution and enforces hard policy constraints before anything runs.



Key points:

- It does NOT try to reason like an LLM
- Intent normalization is best-effort and replaceable
- Policy enforcement is deterministic and fails closed
- Every decision generates an audit trail
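
As an illustration of "deterministic and fails closed" with an audit trail, here is a minimal Python sketch. It is not the repo's code: the `decide` function, the rule format and the reason strings are all assumptions made for the example.

```python
from __future__ import annotations

import time
from typing import Callable

# Hypothetical rule type: a rule must return exactly True to allow an action.
Rule = Callable[[dict], bool]

audit_log: list[dict] = []


def decide(action: dict, rules: dict[str, Rule]) -> bool:
    """Allow only when a rule for the action type exists and returns True."""
    rule = rules.get(action.get("type", ""))
    try:
        allowed = rule is not None and rule(action) is True
        if allowed:
            reason = "rule_allowed"
        else:
            reason = "no_rule" if rule is None else "rule_denied"
    except Exception as exc:
        # Fail closed: a rule that crashes is treated as a denial.
        allowed, reason = False, f"rule_error:{type(exc).__name__}"
    # Every decision, allowed or not, leaves an audit record.
    audit_log.append({"ts": time.time(), "action": action,
                      "allowed": allowed, "reason": reason})
    return allowed


# Example policy: transfers are allowed up to 1000; nothing else is defined.
rules = {"transfer": lambda a: a["amount"] <= 1000}
```

An unknown action type, a missing field, or a rule that raises all end in a denial, so the default outcome is "blocked" rather than "allowed".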



I’ve been testing it in fintech, health, legal, insurance, and gov-style scenarios, including unstructured inputs.



This isn’t monitoring or reporting — it blocks execution upfront.



Repo here: https://github.com/LOLA0786/Intent-Engine-Api



Genuinely curious:

- What assumptions would you attack?
- Where would this be hard to operate?
- What would scare you in prod?



https://redd.it/1pv6ox2
@r_devops
Mist: self-hostable PaaS for deploying apps on your own infrastructure

Over the past few months, a friend and I have been building Mist, a self-hostable PaaS aimed at people running their own VPS or homelab setups.
Mist helps you deploy and manage applications on infrastructure you control using a Docker-based workflow, while keeping things lightweight and predictable.

Current features:
- auto-deployments on git push
- Docker-based application deployments
- multi-user architecture
- domain and TLS management

The project is fully open source. There’s a fairly large roadmap ahead, and we’re actively looking for contributors and early feedback from people who self-host or build infra tools.

Docs / project site: https://trymist.cloud
Source code: https://github.com/corecollectives/mist

Happy to answer questions or hear suggestions.

We’re still relatively new to software development and are building this in the open while learning and iterating.

https://redd.it/1pv7pk9
@r_devops
DevOps or Development as a fresher

I don’t have much in-depth knowledge of web dev. I know only basic HTML and CSS, and I built some vibe-coded projects from scratch and deployed them on Vercel. Through that I learned how backend and frontend work, and picked up surface-level knowledge of different tech stacks: React, Angular, backend frameworks like Django and FastAPI, middleware and where it is used, build tools like Vue, runtime environments, CRUD databases, Supabase, SQL, hiding .env before pushing to git, different package managers, microservices, REST API integration and other API options, and the difference between tier 2 and tier 3 web architecture, all out of curiosity and with AI. If you tell me to code without AI, I will know which tech stack to use and what to build, but not how to build it: I don’t know the syntax of each language, but I understand the logic behind the structure of the project.

I am a confused 4th-semester BTech student (tier 3). I’m not much inclined towards learning web dev from scratch or writing long code, but I like the top-down, big-picture view: how different systems work and manage lots of interactions without breaking, and how they scale. Most importantly, I like automating tasks rather than writing long code. That is how I found DevOps, which fits my interests, as I know Linux, scripting, networking and YAML, and I am also interested in learning cloud computing.

So I wanted to ask whether I should go for pure DevOps instead of development, and whether I will get entry-level jobs and internships.

Your guidance will be much appreciated 🙏

https://redd.it/1pv6csu
@r_devops
I’m building runtime “IAM for AI agents” policies, mandates, hard enforcement. Does this problem resonate?

I’m working on an MVP that treats AI agents as **economic actors**, not just scripts or prompts, and I want honest validation from people actually running agents in production.



The problem I keep seeing

Agents today can:

* spend money (LLM calls, APIs)
* call tools (email, DB, infra, MCP servers)
* act repeatedly and autonomously

But we mostly “control” them with:

* prompts
* conventions
* code



There’s no real concept of:

* agent identity
* hard authority
* budgets that can’t be bypassed
* deterministic enforcement



If an agent goes rogue, you usually find out **after** money is spent or damage is done.



What I’m building

A small infra layer that sits **outside** the LLM and enforces authority mechanically.



Core ideas:

* **Agent** = stable identity (not a process)
* **Policy** = static, versioned authority template (what **could** be allowed)
* **Rule** = context-based selection (user tier, env, tenant, etc.)
* **Mandate** = short-lived authority issued per invocation
* Enforcement = allow/block tool/MCP + LLM calls at runtime



No prompt tricks. No AI judgment. Just deterministic allow / block.



Examples:

* Free users → agent can only read data, $1 budget
* Paid users → same agent code, higher budget + more tools
* Kill switch → instantly block all future actions
* All actions audited with reason codes
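
To make the Policy / Rule / Mandate split above concrete, here is a minimal Python sketch. It is not taken from the linked repo: the `Mandate` fields, the reason codes, the tool names and the $20 paid budget are all assumptions chosen for illustration.

```python
from __future__ import annotations

import time
from dataclasses import dataclass


@dataclass
class Mandate:
    """Hypothetical short-lived grant issued per invocation."""
    agent_id: str
    allowed_tools: frozenset[str]
    budget_usd: float
    expires_at: float
    spent_usd: float = 0.0


KILLED_AGENTS: set[str] = set()  # kill switch, checked on every call


def issue(agent_id: str, tier: str, ttl_s: float = 60.0,
          now: float | None = None) -> Mandate:
    """Rule step: pick limits from context (here only the user tier)."""
    now = time.time() if now is None else now
    if tier == "paid":
        return Mandate(agent_id, frozenset({"read_db", "send_email"}), 20.0, now + ttl_s)
    return Mandate(agent_id, frozenset({"read_db"}), 1.0, now + ttl_s)


def authorize(m: Mandate, tool: str, cost_usd: float,
              now: float | None = None) -> tuple[bool, str]:
    """Return (allowed, reason_code); any failed check blocks the call."""
    now = time.time() if now is None else now
    if m.agent_id in KILLED_AGENTS:
        return False, "KILL_SWITCH"
    if now >= m.expires_at:
        return False, "MANDATE_EXPIRED"
    if tool not in m.allowed_tools:
        return False, "TOOL_NOT_ALLOWED"
    if m.spent_usd + cost_usd > m.budget_usd:
        return False, "BUDGET_EXCEEDED"
    m.spent_usd += cost_usd  # charge only after every check has passed
    return True, "OK"
```

The same agent code runs for both tiers; only the mandate issued at invocation time differs, and the returned reason code is what an audit record would store.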



What this is NOT

* Not an agent framework
* Not AI safety / content moderation
* Not prompt guardrails
* Not model alignment



It’s closer to IAM / firewall thinking, but for agents.



Why I’m unsure

This feels **obvious** once you see it, but also very infra-heavy.

I don’t know if enough teams feel the pain **yet**, or if this is too early.



I’d love feedback on:

1. If you run agents in prod: what failures scare you most?
2. Do you rely on prompts for control today? Has that burned you?
3. Would you adopt a hard enforcement layer like this?
4. What would make this a “no-brainer” vs “too much overhead”?



I’m not selling anything, just trying to validate whether this is a real problem worth going deeper on.

github repo for mvp (local only): [https://github.com/kashaf12/mandate](https://github.com/kashaf12/mandate)

https://redd.it/1pvat3j
@r_devops
How do you automate license key delivery after purchase?

I’m selling a desktop app with one-time license keys (single-use). I already generated a large pool of unique keys and plan to sell them in tiers (1 key, 5 keys, 25 keys).

What’s the best way to automatically:

* assign unused keys when someone purchases, and
* email the key(s) to the buyer right after checkout?

I’m open to using a storefront platform + external automation, but I’m trying to avoid manual fulfillment and exposing the full key list to customers.

If you’ve done this before or have a recommended stack/workflow, I’d love to hear what works well and what to avoid.

Also, is this by chance possible on FourthWall?

https://redd.it/1pvhb5a
@r_devops
Seeking a Mentor in DevOps for Guidance on Projects, Production Environments, and Managing Complexity

Hello, fellow DevOps enthusiasts!

I am actively looking for a mentor who can guide me through the intricacies of DevOps, particularly when it comes to managing real-world production environments and tackling the complexities that come with them. I’ve been exploring DevOps tools and concepts, but I feel that having someone with hands-on experience would greatly accelerate my learning.

Specifically, I'm looking for guidance on:

* Managing production environments at scale
* Optimizing CI/CD pipelines for larger projects
* Understanding and mitigating the complexities of infrastructure
* Best practices for automation, monitoring, and security in production
* Working on and improving existing projects with a focus on reliability and efficiency

If you have experience in these areas and would be willing to help me navigate the challenges, I would greatly appreciate your mentorship. I'm eager to learn, share ideas, and work on real-world projects that will enhance my skills.

Feel free to message me if you’re open to a mentorship opportunity, and I look forward to connecting with some of you!

Thanks in advance!

https://redd.it/1pvdqry
@r_devops
Is there a book that covers every production-grade cloud architecture used or the most common ones?

Is there a recipe book that covers every production-grade cloud architecture or the most common ones? I stopped taking tutorial courses, because 95% of them are useless and cover things I already know, but I am looking for a book that features complete end-to-end IaC solutions you would find in big tech companies like Facebook, Google and Microsoft.

https://redd.it/1pvk0ni
@r_devops
Would you consider putting an audit proxy in front of Postgres/MySQL?

Lately I've been dealing with compliance requirements for an on-prem database (Postgres). One of those is providing audit logs, but logging every query through the slow query log (i.e. log_min_duration_statement=0) is not recommended for production databases, and pgAudit seems to consume too much I/O.

I'm writing a simple proxy which passes through all authentication and other setup, then parses every message and logs all queries. Since the proxy is stateless, it is easy to scale and it doesn't eat the precious resources of the primary database. The parsing/logging happens asynchronously from the proxying.
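
For anyone curious what "parse every message" involves on the Postgres side, here is a minimal Python sketch. It is not the author's proxy: `extract_queries` is a hypothetical helper that only looks at client-to-server bytes after startup, where each message is a one-byte type followed by a four-byte big-endian length, and it only pulls text out of simple Query (`Q`) and Parse (`P`) messages.

```python
from __future__ import annotations

import struct


def extract_queries(buf: bytes) -> tuple[list[str], bytes]:
    """Pull SQL text out of complete frontend messages.

    Returns (queries, leftover bytes that do not yet form a full message).
    """
    queries: list[str] = []
    pos = 0
    while len(buf) - pos >= 5:
        msg_type = buf[pos:pos + 1]
        # The length field counts itself (4 bytes) but not the type byte.
        (length,) = struct.unpack("!I", buf[pos + 1:pos + 5])
        end = pos + 1 + length
        if end > len(buf):
            break  # incomplete message: wait for more bytes
        body = buf[pos + 5:end]
        if msg_type == b"Q":  # simple query: one null-terminated string
            queries.append(body.split(b"\x00", 1)[0].decode("utf-8", "replace"))
        elif msg_type == b"P":  # Parse: statement name, then query text
            parts = body.split(b"\x00", 2)
            if len(parts) >= 2:
                queries.append(parts[1].decode("utf-8", "replace"))
        pos = end
    return queries, buf[pos:]
```

In a proxy, the raw bytes would be forwarded to the database immediately and a copy handed to a queue for this kind of parsing, which is one way to keep logging off the hot path.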

So far it is working well. I still need to hammer it with more load tests and do some edge-case testing (e.g. behavior when the database is extremely slow). I wrote the same thing for MySQL, with the idea of open-sourcing it.

I'm not sure if other people would be interested in using such a proxy, so here I am asking for your opinion.

Edit: Grammar

https://redd.it/1pvm6qv
@r_devops
Versioning cache keys to avoid rolling deployment issues

During rolling deployments, we had multiple versions of the same service running concurrently, all reading and writing to the same cache. This caused subtle and hard-to-debug production issues when cache entries were shared across versions.



One pattern that worked well for us was versioning cache keys: new deployments write to new keys, while old instances continue using the previous ones. This avoided cache poisoning without flushing Redis or relying on aggressive TTLs.
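
A minimal Python sketch of the idea, under stated assumptions: the version comes from a hypothetical `CACHE_SCHEMA_VERSION` environment variable set by the deploy pipeline, and a plain dict stands in for Redis. The linked article may structure its keys differently.

```python
import os

# Assumption: the deploy pipeline sets CACHE_SCHEMA_VERSION for each release.
CACHE_VERSION = os.environ.get("CACHE_SCHEMA_VERSION", "v1")


def cache_key(entity: str, entity_id: str, version: str = CACHE_VERSION) -> str:
    """Namespace every key by version so old and new code never share entries."""
    return f"{version}:{entity}:{entity_id}"


# A dict stands in for the shared cache during a rolling deployment.
cache = {}
cache[cache_key("user", "42", version="v1")] = {"name": "Ada"}                   # old instance
cache[cache_key("user", "42", version="v2")] = {"first": "Ada", "last": "L."}   # new instance

old_view = cache[cache_key("user", "42", version="v1")]
new_view = cache[cache_key("user", "42", version="v2")]
```

Each version reads back only the shape it wrote. The cost is a cold cache for the new version, and the old keys still need some way to expire once the rollout finishes.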



I wrote up the reasoning, tradeoffs, and an example here:

https://medium.com/dev-genius/version-your-cache-keys-to-survive-rolling-deployments-a62545326220



How are others handling cache consistency during rolling deploys? TTLs? blue/green? dual writes?

https://redd.it/1pvow88
@r_devops
🛡️ Built MCP Guard - a security proxy for Cursor/Claude agents (I'm the dev)

Hey everyone! 👋

I've been working on something for the past few weeks and wanted to share it here.

The problem I faced:
I use Cursor with MCP to interact with my databases. One day, I accidentally let my agent run with full read/write/delete access. I watched in horror as it started building queries... and I realized I had zero control over what it could do.

What if it runs DROP TABLE users instead of SELECT *?

What I built:
MCP Guard - a lightweight security proxy that sits between your AI agent and your MCP servers.

Features:

- Block dangerous commands (DROP, DELETE, TRUNCATE, etc.)
- Generate API keys with rate limits and RBAC
- Full audit logs of every agent interaction
- Sub-3ms latency

Why I'm posting here:
I'm launching the beta on Dec 28 and looking for feedback from actual users. Not trying to sell anything - the free tier gives you 1,000 requests/month with no credit card.

If you're using MCP with Cursor/Claude and have thoughts on security, I'd love to hear from you.

Link: https://mcp-shield.vercel.app

Happy to answer any questions! I'm the sole developer behind this, so AMA about how it works. 🔥

https://redd.it/1pvw25o
@r_devops
what actually helps once ai code leaves the chat window?



ai makes it easy to spin things up, but once that code hits a real repo, that’s where i slow down. most of my time goes into figuring out what depends on what and what i’m about to accidentally break.

i still use chatgpt for quick thinking and cosine just to trace logic across files. nothing fancy.

curious what others lean on once ai code is real.

https://redd.it/1pvwxey
@r_devops
Securing the frontend application and backend apis

Hi all,

I am looking for a reliable solution to secure the frontend URL and backend APIs so that they are only accessible to people connected to our VPN. Is it possible to do so? I am currently using AWS; how can I do that reliably? Please help!

https://redd.it/1pvwujj
@r_devops
Throwback 2025 - Securing Your OTel Collector

Hi there, Juraci here. I've been working with OpenTelemetry since its early days and this year I started Telemetry Drops - a bi-weekly ~30 min live stream diving into OTel and observability topics.

We're 7 episodes in since we started four months ago. Some highlights:

- AI observability and observability with AI (two different things!)
- The isolation forest processor
- How to write a good KubeCon talk proposal
- A special about the Collector Builder

One of the most-watched so far is this walkthrough of how to secure your Collector - based on a blog post I've been updating for years as the Collector evolves.

https://youtube.com/live/4-T4eNQ6V-A

New episodes drop ~every other Friday on YouTube. If you speak Portuguese, check out Dose de Telemetria, which I've been running for some years already!

Would love feedback on what topics would be most useful - what OTel questions keep you up at night?

https://redd.it/1pw15e0
@r_devops
Your localhost, online in seconds. Public URL for everyone

Tired of complex tunneling tools?

Portex CLI exposes your localhost to the internet in seconds. It features secure tunnels with end-to-end encryption, one-click PIN protection for client demos, and a built-in traffic inspector to debug webhooks instantly.

No forced logins, just portex start. Works on macOS, Windows & Linux.

https://portex.space/

https://redd.it/1pvyzeb
@r_devops
Cache npm dependencies

I am trying to cache my npm dependencies so every time my GitHub Actions runs, it pulls the dependencies from cache unless package-lock.json changes. I tried the code below, but it does not work (the npm install is still happening on every run):

    build:
      runs-on: ubuntu-latest
      needs: security

      steps:
        - uses: actions/checkout@v3

        - name: Set up Node.js version
          uses: actions/setup-node@v4
          with:
            node-version: '14.17.6'
            cache: 'npm'

        - name: Cache node modules
          uses: actions/cache@v3
          with:
            path: ~/.npm
            key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}
            restore-keys: |
              ${{ runner.os }}-node-

        - name: npm install and build
          run: |
            export NODE_OPTIONS="--max-old-space-size=4096"
            npm ci
            npm run build
          env:
            CI: false
            REACT_APP_ENV: dev

        - name: Zip artifact for deployment
          run: cd build && zip -r ../release.zip *

        - name: Upload artifact for deployment job
          uses: actions/upload-artifact@v4
          with:
            name: node-app
            path: release.zip

https://redd.it/1pw5ars
@r_devops