Reddit DevOps – Telegram
Spent too much time stripping down a base image to reduce CVEs and now it breaks on every update. How do you maintain custom containers long-term?

So I went down the rabbit hole of manually removing packages from ubuntu:latest to cut down our CVE count. Got it from 200+ vulns to like 30. Felt pretty good about myself.

Fast forward 2 weeks and every apt update breaks something different. Missing deps, broken symlinks, you name it. Now I'm spending more time babysitting this thing than I saved.

Anyone know a better way to do this? I see people talking about distroless but not sure if that fits our use case. What's your approach for keeping images lean without the maintenance nightmare?



https://redd.it/1og11pb
@r_devops
How do smaller teams manage observability costs without losing visibility?

I’m very curious how small teams, or those without an enterprise budget, handle monitoring and observability trade-offs.

For example, tools like Datadog, New Relic, or CloudWatch can get pricey once you start tracking everything, but whenever I start trimming metrics it feels risky.


For those of you running lean infra stacks:

• Do you actively drop/sample metrics, logs, or traces to save cost?

• Have you found any affordable stacks (e.g. Prometheus + Grafana + Loki/Tempo, or self-hosted OTel setups) that will still give you enough visibility?

• How do you decide what’s worth monitoring vs. what’s “nice to have”?

I'm not promoting anything. I'm just curious how different teams balance observability depth vs. cost in real-world setups.
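
One common way to "drop/sample" without going blind is to always keep errors and sample the rest deterministically by trace ID, so every service makes the same keep/drop decision for a given trace. Here's a minimal sketch of that idea. The record fields (`level`, `trace_id`) and the 10% default rate are assumptions for illustration, not tied to any particular vendor or SDK.

```python
import hashlib


def keep_trace(trace_id: str, sample_rate: float) -> bool:
    """Deterministically keep roughly `sample_rate` of traces, keyed on trace ID."""
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer in [0, 2**64) and compare it
    # against a threshold, so the same trace ID always gets the same answer.
    bucket = int.from_bytes(digest[:8], "big")
    return bucket < int(sample_rate * 2**64)


def should_ship(record: dict, sample_rate: float = 0.1) -> bool:
    """Always ship errors; sample everything else (hypothetical record shape)."""
    if record.get("level") in {"error", "critical"}:
        return True
    return keep_trace(record["trace_id"], sample_rate)
```

Because the decision depends only on the trace ID, a sampled trace stays complete across services instead of losing random spans.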

https://redd.it/1og42rk
@r_devops
Tired of project scaffolding being "fire-and-forget"? I built SKA to allow template updates over time.

Hi everyone,

I just finished the initial version of an open-source tool I'm calling **SKA**, and I'd love to get your thoughts!

My biggest frustration with existing scaffolding tools is the "one-shot" nature—you generate the code once, and that's it. It’s a pain when you want to centrally maintain best practices across multiple projects (like standardizing a dependency, updating a security config, or improving a build step).

**SKA** aims to be different by introducing the concept of **central management for template updates**.

Here's the idea:

* You use a blueprint (local or remote) to **create** your project.
* The project keeps a link back to that blueprint.
* Later, you can run ska update and it intelligently pulls in the latest changes from the upstream template, like a controlled merge.

It also supports nice-to-haves like:

* A dynamic, interactive form for capturing initial variables.
* Using special tags to manage **only parts of a file** from the central template, leaving the rest for the user to customize (super useful for configuration files).
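
To make the "only parts of a file" idea concrete, here's a rough sketch of how a managed region could be refreshed while everything outside it is left alone. The `# managed:begin` / `# managed:end` markers and the function name are made up for illustration; they are not SKA's actual tag syntax or implementation (check the README for that).

```python
import re

# Hypothetical marker syntax, for illustration only (not SKA's real tags).
BEGIN = "# managed:begin {name}"
END = "# managed:end {name}"


def update_managed_region(local_text: str, name: str, new_body: str) -> str:
    """Replace the text between the markers for `name`; leave the rest untouched."""
    begin = re.escape(BEGIN.format(name=name))
    end = re.escape(END.format(name=name))
    pattern = re.compile(rf"(^{begin}\n).*?(^{end}$)", re.DOTALL | re.MULTILINE)
    if not pattern.search(local_text):
        raise ValueError(f"managed region {name!r} not found")
    body = new_body if new_body.endswith("\n") else new_body + "\n"
    # A function replacement avoids backslash handling in the new body.
    return pattern.sub(lambda m: m.group(1) + body + m.group(2), local_text, count=1)
```

Anything the user wrote outside the markers survives the update, which is the property that makes this useful for config files.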

I built it in Go, and installation is easy via Homebrew.

I'm feeling really good about the core concept, but I know it can be better! If you have a minute, please check out the repo and the README to see the features: [https://github.com/gchiesa/ska](https://github.com/gchiesa/ska)

Any ideas, suggestions on features you'd like to see, or reports of things that broke are hugely appreciated! 😊

Cheers!

https://redd.it/1ogh21y
@r_devops
Is linking my GitHub 100% necessary when applying to internships via email?

Hi,

I’m in my second year of university studying maths and computer science, with a minor in physics. I’m applying for a few internships in another country (Austria) for when I go on uni exchange next year. I don’t really have a GitHub; it’s currently empty. Is it essential to include a link to my GitHub in application emails, or are LinkedIn, my CV, etc. enough initially?

Thank you!

https://redd.it/1ogfv5i
@r_devops
Do you run your own database servers and backups or do you use managed database service?

Does everyone use managed services like RDS, Supabase, etc., or do some businesses still run their own database servers? If you self-host, I'd love to hear about your setup in the comments.

https://redd.it/1ogjvd1
@r_devops
Tips for learning with Ansible for DevOps on Apple Silicon (virtualbox + vagrant issues) using docker as a provider instead

I just wanted to share something I learned that might save somebody else the couple of hours I lost, if you've been trying to learn from the Ansible for DevOps book by Jeff Geerling.

I'm on Apple Silicon, and I couldn't get Vagrant and VirtualBox working together while following along, so my workaround was to use Docker.

- Use vagrant as normal
- Use docker as a provider
- FWIW, I'm actually using OrbStack, which is, a bit perplexingly, a no-fuss drop-in replacement for Docker locally: you just install it and use the exact same docker commands.

Here are the files I have in place:

❯ ls  
dockerfile playbook.yml Vagrantfile



Dockerfile:

# Dockerfile
FROM rockylinux:9

# Basics for Ansible + SSH
RUN dnf -y install openssh-server sudo python3 && dnf clean all

# vagrant user with passwordless sudo
RUN useradd -m -s /bin/bash vagrant \
    && echo 'vagrant ALL=(ALL) NOPASSWD:ALL' > /etc/sudoers.d/vagrant

# Vagrant insecure public key
RUN mkdir -p /home/vagrant/.ssh && chmod 700 /home/vagrant/.ssh \
    && curl -fsSL https://raw.githubusercontent.com/hashicorp/vagrant/master/keys/vagrant.pub \
       -o /home/vagrant/.ssh/authorized_keys \
    && chmod 600 /home/vagrant/.ssh/authorized_keys \
    && chown -R vagrant:vagrant /home/vagrant/.ssh

# SSH daemon setup
RUN ssh-keygen -A \
    && sed -i 's/^#\?PasswordAuthentication .*/PasswordAuthentication no/' /etc/ssh/sshd_config \
    && sed -i 's/^#\?PermitRootLogin .*/PermitRootLogin no/' /etc/ssh/sshd_config \
    && sed -i 's/^#\?PubkeyAuthentication .*/PubkeyAuthentication yes/' /etc/ssh/sshd_config

EXPOSE 22
CMD ["/usr/sbin/sshd","-D","-e"]


Here's the Vagrantfile using docker as a provider

Vagrant.configure("2") do |config|
  # Tell Vagrant we’re using Docker, and how to build/run it
  config.vm.provider "docker" do |d|
    d.build_dir = "."                    # builds Dockerfile in this folder
    d.has_ssh = true                     # so `vagrant ssh` works
    d.remains_running = true
    d.name = "ansible-test"
    d.volumes = ["#{Dir.pwd}:/vagrant"]  # like VirtualBox synced folder
    # d.ports = ["2222:22"]              # optional; Vagrant will do an SSH forward anyway
  end

  # Match the vagrant user + insecure key we baked into the image
  config.ssh.username = "vagrant"
  config.ssh.insert_key = false          # keep using Vagrant's default insecure key

  # Run your playbook inside the container (like the book’s provision step)
  config.vm.provision "ansible_local" do |ansible|
    ansible.playbook = "playbook.yml"
  end
end


Here's a test playbook.yml to confirm everything works; after that, delete it and use whatever the book suggests:
---
- hosts: all
  become: true
  tasks:
    - name: Ensure NGINX is installed
      package:
        name: nginx
        state: present


Then basically you can interact with vagrant with docker as the provider:
vagrant up --provider=docker
vagrant ssh # should drop you into the container as vagrant
vagrant provision # reruns the Ansible playbook


Hope this saves you some time and frustration!

https://redd.it/1oglaf9
@r_devops
Would you be interested in a cheap to almost free alternative to Sentry.io?

Not trying to pitch anything, I'm just doing some early validation before I dive into it.

I’ve been thinking about building a small logging + error tracking framework that’s fully self-hosted. Kinda like Sentry, but way lighter, cheaper, and privacy-friendly, especially since existing solutions like Sentry, LogRocket, etc. seem overly bloated and way too expensive for small companies.

The idea is:

- Dockerized, one-command setup
- Nice clean web dashboard
- API/SDK for JavaScript as a start
- Optional email/Discord/Slack alerts

I’m curious whether you (or your team) would actually use something like this.
And if yes: what’s the bare minimum it’d need for you to consider switching?

https://redd.it/1ogncd5
@r_devops
Our SRE/DevOps tools monitor system health, but how do we monitor AI 'cognitive health'?

I've been thinking about our current observability stacks. We're amazing at monitoring latency, error rates, and resource usage. But as we deploy more autonomous AI agents, are these metrics enough?

I just read two papers that made me question this. One (on "LLM brain rot") shows that an AI's reasoning can slowly decay from bad training data. The other (on "shutdown resistance") shows AIs can learn to bypass safety controls to achieve a goal.

This implies an AI could have 100% uptime and low latency, all while its cognitive integrity is silently crumbling and it's learning to disobey its constraints.

I wrote an article arguing that we need a new discipline of "cognitive observability" to track things like "thought-skipping" or goal divergence.

However, I'm an entry-level graduate and want to understand how deep this problem really goes. How would you even begin to build a dashboard for that? What would you measure? This seems like a massive new challenge for our field.

https://redd.it/1ogo5by
@r_devops
devops on a mac?

how is running infra on a mac? i've been using windows for nearly 2 decades now, all through my comp sci degree, so i'm expecting the shift to come with a lot of differences

does aws python cdk, Docker, Postgres etc all work the same?

https://redd.it/1ogqa7k
@r_devops
I have an interview and was told there would be a practical coding part. How should I study for it?

Like, I'm thinking it will be about parsing logs and shit like that but dunno for sure. Any ideas for where I could find practice questions? Does leetcode have questions like this?
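
For a sense of what a log-parsing exercise can look like, here's a small practice-style sketch: count responses per status code and find which paths return 5xx errors. The line format is an assumption (a common access-log layout) and the sample lines are invented; a real interview will likely hand you its own format.

```python
import re
from collections import Counter

# Assumed access-log style format; real logs will differ.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def summarize(lines):
    """Return (status counts, paths with 5xx responses, number of unparseable lines)."""
    statuses = Counter()
    error_paths = Counter()
    skipped = 0
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            skipped += 1  # count malformed lines instead of crashing on them
            continue
        status = int(match.group("status"))
        statuses[status] += 1
        if status >= 500:
            error_paths[match.group("path")] += 1
    return statuses, error_paths, skipped


# Invented sample data for practice.
SAMPLE = [
    '10.0.0.1 - - [27/Oct/2025:10:00:00 +0000] "GET /health HTTP/1.1" 200 12',
    '10.0.0.2 - - [27/Oct/2025:10:00:01 +0000] "POST /api/orders HTTP/1.1" 500 87',
    '10.0.0.2 - - [27/Oct/2025:10:00:02 +0000] "POST /api/orders HTTP/1.1" 502 -',
    'not a log line',
]
```

Calling `summarize(SAMPLE)` gives you the three results to print or sort; typical follow-ups are "top N IPs" or "errors per minute", which are small variations on the same loop.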

https://redd.it/1ogryxb
@r_devops
Raptor: Build disk images, Debian Liveboot isos and more, with a powerful docker-inspired syntax (new Free Software project)

Hello fellow DevOps..ses... DevOpsen..?... DevOps people 😅

After much work, I'm proud to finally publish my newest project: Raptor. It's GPL-v3-licensed and written in Rust.

Raptor is a tool to generate a set of layers from raptor source files. These layers can then be processed by build containers, to make liveboot isos, disk images, or anything else you can dream up a recipe for!

This opens up a lot of new possibilities for deploying software at home. For example, I'm a big fan of making custom Debian Liveboot images, since they start from a completely predictable state on every boot.

To learn more about the syntax, features and builders, there's an entire Raptor book documenting as much as possible.

Raptor is still very much in development, but it has reached a stage where it is useful for real tasks, and I would love to hear any and all feedback. Good and bad, don't hold anything back!

Want to learn more?

- Github page

- Raptor Book

https://redd.it/1ogsxa4
@r_devops
What's the simplest way to deploy a web application with continuous delivery capabilities?

looking to deploy:

react webapp - with auth, postgres database etc

already got IaC setup, RDS, VPC, Pipeline..

keep looking at Lambda@Edge SSR?

I'm using next.js with some boilerplate code already made

tried running via s3 + cloudfront but it's proving very difficult. looked into AWS amplify but it seems to cause more problems too.

https://redd.it/1oguuds
@r_devops
Looking for the best tools, languages, and creative ideas for a “Diagnostic Box” microservices project (real-time monitoring + analytics)

Hey everyone 👋

I’m a software engineering student starting my final-year internship soon, and my main mission is to build a “Diagnostic Box” — a digital app that connects to real-time controllers over local or remote networks.

The goal is to collect diagnostic info, analyze system health, and detect failures or transient events for predictive maintenance.



Here’s what the project involves:

• Defining the architecture in microservices (backend + frontend)

• Setting up communication protocols: HTTP, REST, MQTT, OPC-UA

• Building data-processing and analytics modules

• Designing databases (relational, time-series, and document-based)

• Creating a frontend for data visualization and dashboards

• Implementing authentication, authorization, and platform hardening

• Deploying via containerization with CI/CD



I’d love your advice on:

1. Best tools & languages to use (for backend, frontend, and data storage)

2. DevOps practices or frameworks to make the setup efficient (maybe K8s, Docker Compose, etc.)

3. Any creative ideas or features that could make the app stand out (like anomaly detection, AI-based alerts, advanced dashboards, etc.)

4. Cool visualization libraries or UX ideas for displaying diagnostic data



My current stack experience: Spring Boot, Node.js, React, Docker, Jenkins, SonarQube, Prometheus, AWS, and GraphQL.

https://redd.it/1ogncij
@r_devops
VOA – Mini Secrets Manager

This is my first project in DevOps and backend development.
It's an open-source mini Secrets Manager that securely stores and manages sensitive data, environment variables, and access keys for different environments (dev, staging, prod).

It includes:
- A FastAPI backend for authentication, encryption, and auditing.
- A CLI tool (VOA-CLI) for developers and admins to manage secrets easily from the terminal.
- Dockerized infrastructure with PostgreSQL, Redis, and NGINX reverse proxy.
- Monitoring setup using Prometheus & Grafana for metrics and dashboards.

The project is still evolving, and I’d really appreciate your feedback and suggestions.

GitHub Repo: https://github.com/senani-derradji/VOA

If you like the project, feel free to give it a Star!

https://redd.it/1ogz24a
@r_devops
What cloud migration challenges are keeping you up at night?

Been researching different business models and keep seeing horror stories about cloud migrations gone wrong. Security teams seem to get blindsided by performance issues, compliance gaps, and tool sprawl after moving to cloud.

What's been your biggest "oh crap" moment during a migration? Trying to learn from others' pain before I potentially face this myself.

https://redd.it/1oh0a6o
@r_devops
GlobalCVE — Aggregated CVE Data for Easier Vulnerability Tracking

If you’re managing patching, compliance, or vulnerability workflows, GlobalCVE.xyz might be useful. It pulls CVE data from NVD, MITRE, CNNVD, JVN, and others into one searchable feed.

It’s open-source (GitHub), has an API, and helps reduce duplication across fragmented CVE sources.
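
As a rough illustration of what "reducing duplication" across feeds can mean, here's a sketch that merges records by CVE ID and remembers which sources reported each one. The record shape (`id`, `source`, `summary`) and the function name are assumptions for the example; this is not GlobalCVE's actual code or API.

```python
def merge_cve_records(records):
    """Group records from several feeds by CVE ID (hypothetical record shape)."""
    merged = {}
    for rec in records:
        # Normalize the ID so "cve-2024-0001 " and "CVE-2024-0001" collapse together.
        cve_id = rec["id"].strip().upper()
        entry = merged.setdefault(
            cve_id, {"id": cve_id, "sources": [], "summary": None}
        )
        if rec["source"] not in entry["sources"]:
            entry["sources"].append(rec["source"])
        # Keep the first non-empty summary we see.
        if entry["summary"] is None and rec.get("summary"):
            entry["summary"] = rec["summary"]
    return merged
```

Real aggregation also has to reconcile conflicting severity scores and timestamps, which this sketch ignores.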

Not a silver bullet, just a practical tool for DevOps teams who want cleaner intel.

https://redd.it/1oh4f4p
@r_devops