Reddit DevOps – Telegram
Tips for learning with Ansible for DevOps on Apple Silicon (virtualbox + vagrant issues) using docker as a provider instead

I just wanted to share something I learned that might save somebody else the couple of hours I lost, if they're trying to learn from the book Ansible for DevOps by Jeff Geerling.

I'm on Apple Silicon, and I just couldn't get Vagrant and VirtualBox working together while following along, so my workaround was to use Docker.

- Use vagrant as normal
- Use docker as a provider
- FWIW, I'm actually using OrbStack, which is a surprisingly no-fuss drop-in replacement for Docker locally: you just install it and use the exact same docker commands.

Here are the files I have in place:

❯ ls
Dockerfile playbook.yml Vagrantfile



Dockerfile:

# Dockerfile
FROM rockylinux:9

# Basics for Ansible + SSH
RUN dnf -y install openssh-server sudo python3 && dnf clean all

# vagrant user with passwordless sudo
RUN useradd -m -s /bin/bash vagrant \
    && echo 'vagrant ALL=(ALL) NOPASSWD:ALL' > /etc/sudoers.d/vagrant

# Vagrant insecure public key
RUN mkdir -p /home/vagrant/.ssh && chmod 700 /home/vagrant/.ssh \
    && curl -fsSL https://raw.githubusercontent.com/hashicorp/vagrant/master/keys/vagrant.pub \
       -o /home/vagrant/.ssh/authorized_keys \
    && chmod 600 /home/vagrant/.ssh/authorized_keys \
    && chown -R vagrant:vagrant /home/vagrant/.ssh

# SSH daemon setup
RUN ssh-keygen -A \
    && sed -i 's/^#\?PasswordAuthentication .*/PasswordAuthentication no/' /etc/ssh/sshd_config \
    && sed -i 's/^#\?PermitRootLogin .*/PermitRootLogin no/' /etc/ssh/sshd_config \
    && sed -i 's/^#\?PubkeyAuthentication .*/PubkeyAuthentication yes/' /etc/ssh/sshd_config

EXPOSE 22
CMD ["/usr/sbin/sshd","-D","-e"]


Here's the Vagrantfile using docker as a provider

Vagrant.configure("2") do |config|
  # Tell Vagrant we’re using Docker, and how to build/run it
  config.vm.provider "docker" do |d|
    d.build_dir = "."        # builds Dockerfile in this folder
    d.has_ssh = true         # so `vagrant ssh` works
    d.remains_running = true
    d.name = "ansible-test"
    d.volumes = ["#{Dir.pwd}:/vagrant"] # like VirtualBox synced folder
    # d.ports = ["2222:22"]  # optional; Vagrant will do an SSH forward anyway
  end

  # Match the vagrant user + insecure key we baked into the image
  config.ssh.username = "vagrant"
  config.ssh.insert_key = false # keep using Vagrant's default insecure key

  # Run your playbook inside the container (like the book’s provision step)
  config.vm.provision "ansible_local" do |ansible|
    ansible.playbook = "playbook.yml"
  end
end


Here's a test playbook.yml. Once it works, delete it and use what the book suggests:
---
- hosts: all
  become: true
  tasks:
    - name: Ensure NGINX is installed
      package:
        name: nginx
        state: present


Then you can drive Vagrant as usual, with Docker as the provider:
vagrant up --provider=docker
vagrant ssh # should drop you into the container as vagrant
vagrant provision # reruns the Ansible playbook


Hope this saves you some time and frustration!

https://redd.it/1oglaf9
@r_devops
Would you be interested in a cheap to almost free alternative to Sentry.io?

Not trying to pitch anything, I'm just doing some early validation before I dive into it.

I’ve been thinking about building a small logging + error tracking framework that’s fully self-hosted. Kinda like Sentry, but way lighter, cheaper, and privacy-friendly, especially since existing solutions like Sentry, LogRocket, etc. seem so bloated and way too expensive for small companies.

The idea is:

- Dockerized, one-command setup
- Nice clean web dashboard
- API/SDK for JavaScript as a start
- Optional email/Discord/Slack alerts

I’m curious whether you (or your team) would actually use something like this.
And if yes: what’s the bare minimum it’d need for you to consider switching?

https://redd.it/1ogncd5
@r_devops
Our SRE/DevOps tools monitor system health, but how do we monitor AI 'cognitive health'?

I've been thinking about our current observability stacks. We're amazing at monitoring latency, error rates, and resource usage. But as we deploy more autonomous AI agents, are these metrics enough?

I just read two papers that made me question this. One (on "LLM brain rot") shows that an AI's reasoning can slowly decay from bad training data. The other (on "shutdown resistance") shows AIs can learn to bypass safety controls to achieve a goal.

This implies an AI could have 100% uptime and low latency, all while its cognitive integrity is silently crumbling and it's learning to disobey its constraints.

I wrote an article arguing that we need a new discipline of "cognitive observability" to track things like "thought-skipping" or goal divergence.

However, I'm an entry-level graduate, so to understand how deep this problem goes I'd like to ask: how would you even begin to build a dashboard for that? What would you measure? This seems like a massive new challenge for our field.

https://redd.it/1ogo5by
@r_devops
devops on a mac?

how is running infra on a mac? i've been using windows for nearly 2 decades now, all through my comp sci degree, so i expect the shift to come with a lot of differences

does aws python cdk, Docker, Postgres etc all work the same?

https://redd.it/1ogqa7k
@r_devops
I have an interview and was told there would be a practical coding part. How should I study for it?

Like, I'm thinking it will be about parsing logs and shit like that but dunno for sure. Any ideas for where I could find practice questions? Does leetcode have questions like this?
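
For a sense of what a log-parsing exercise might look like, here is a small practice sketch. The log layout and the sample lines are made up for illustration and are not taken from any real interview; the task is simply "count responses per status code and list the paths that returned 5xx".

```python
import re
from collections import Counter

# Hypothetical access-log lines in a common web-server style layout.
SAMPLE_LOG = """\
10.0.0.1 - - [27/Oct/2025:10:00:01 +0000] "GET /health HTTP/1.1" 200 12
10.0.0.2 - - [27/Oct/2025:10:00:02 +0000] "POST /login HTTP/1.1" 500 87
10.0.0.1 - - [27/Oct/2025:10:00:03 +0000] "GET /orders HTTP/1.1" 200 431
10.0.0.3 - - [27/Oct/2025:10:00:04 +0000] "GET /orders HTTP/1.1" 502 0
this line is malformed and should be skipped
"""

# One named group per field we care about.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+)$'
)


def summarize(log_text):
    """Return (status code counts, counts of paths that answered with 5xx)."""
    status_counts = Counter()
    error_paths = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip lines that do not fit the expected layout
        status = int(match["status"])
        status_counts[status] += 1
        if status >= 500:
            error_paths[match["path"]] += 1
    return status_counts, error_paths


status_counts, error_paths = summarize(SAMPLE_LOG)
print(status_counts.most_common())
print(error_paths.most_common())
```

Variations on the same theme (top N client IPs, requests per minute, slowest endpoints) tend to reuse the same parse-then-aggregate shape.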

https://redd.it/1ogryxb
@r_devops
Raptor: Build disk images, Debian Liveboot isos and more, with a powerful docker-inspired syntax (new Free Software project)

Hello fellow DevOps..ses... DevOpsen..?... DevOps people 😅

After much work, I'm proud to finally publish my newest project: Raptor. It's GPL-v3-licensed and written in Rust.

Raptor is a tool to generate a set of layers from raptor source files. These layers can then be processed by build containers, to make liveboot isos, disk images, or anything else you can dream up a recipe for!

This opens up a lot of new possibilities for deploying software at home. For example, I'm a big fan of making custom Debian Liveboot images, since they start from a completely predictable state on every boot.

To learn more about the syntax, features and builders, there's an entire Raptor book documenting as much as possible.

Raptor is still very much in development, but it has reached a stage where it is useful for real tasks, and I would love to hear any and all feedback. Good and bad, don't hold anything back!

Want to learn more?

- Github page

- Raptor Book

https://redd.it/1ogsxa4
@r_devops
What's the simplest way to deploy a web application with continuous delivery capabilities?

looking to deploy:

react webapp - with auth, postgres database etc

already got IaC setup, RDS, VPC, Pipeline..

keep looking at Lambda@Edge SSR?

I'm using next.js with some boilerplate code already made

tried running via s3 + cloudfront but it's making things very difficult. looked into AWS amplify but it seems to cause more problems too.

https://redd.it/1oguuds
@r_devops
Looking for the best tools, languages, and creative ideas for a “Diagnostic Box” microservices project (real-time monitoring + analytics)

Hey everyone 👋

I’m a software engineering student starting my final-year internship soon, and my main mission is to build a “Diagnostic Box” — a digital app that connects to real-time controllers over local or remote networks.

The goal is to collect diagnostic info, analyze system health, and detect failures or transient events for predictive maintenance.



Here’s what the project involves:

• Defining the architecture in microservices (backend + frontend)

• Setting up communication protocols: HTTP, REST, MQTT, OPC-UA

• Building data-processing and analytics modules

• Designing databases (relational, time-series, and document-based)

• Creating a frontend for data visualization and dashboards

• Implementing authentication, authorization, and platform hardening

• Deploying via containerization with CI/CD



I’d love your advice on:

1. Best tools & languages to use (for backend, frontend, and data storage)

2. DevOps practices or frameworks to make the setup efficient (maybe K8s, Docker Compose, etc.)

3. Any creative ideas or features that could make the app stand out (like anomaly detection, AI-based alerts, advanced dashboards, etc.)

4. Cool visualization libraries or UX ideas for displaying diagnostic data



My current stack experience: Spring Boot, Node.js, React, Docker, Jenkins, SonarQube, Prometheus, AWS, and GraphQL.

https://redd.it/1ogncij
@r_devops
VOA – Mini Secrets Manager

This is my first project in DevOps and backend: an open-source mini Secrets Manager that securely stores and manages sensitive data, environment variables, and access keys for different environments (dev, staging, prod).

It includes:
- A FastAPI backend for authentication, encryption, and auditing.
- A CLI tool (VOA-CLI) for developers and admins to manage secrets easily from the terminal.
- Dockerized infrastructure with PostgreSQL, Redis, and NGINX reverse proxy.
- Monitoring setup using Prometheus & Grafana for metrics and dashboards.

The project is still evolving, and I’d really appreciate your feedback and suggestions.

GitHub Repo: https://github.com/senani-derradji/VOA

If you like the project, feel free to give it a Star!

https://redd.it/1ogz24a
@r_devops
What cloud migration challenges are keeping you up at night?

Been researching different business models and keep seeing horror stories about cloud migrations gone wrong. Security teams seem to get blindsided by performance issues, compliance gaps, and tool sprawl after moving to cloud.

What's been your biggest "oh crap" moment during a migration? Trying to learn from others' pain before I potentially face this myself.

https://redd.it/1oh0a6o
@r_devops
GlobalCVE — Aggregated CVE Data for Easier Vulnerability Tracking

If you’re managing patching, compliance, or vulnerability workflows, GlobalCVE.xyz might be useful. It pulls CVE data from NVD, MITRE, CNNVD, JVN, and others into one searchable feed.

It’s open-source (GitHub), has an API, and helps reduce duplication across fragmented CVE sources.

Not a silver bullet, just a practical tool for DevOps teams who want cleaner intel.

https://redd.it/1oh4f4p
@r_devops
Custom Internal Developer Portal IDP

I created a self-service Internal Developer Platform (IDP) dashboard that lets teams provision infrastructure and software components with ease. It's built with Next.js, Express.js, and PostgreSQL, and integrated with Terraform Cloud and GitHub. I'm still working on it, and I built it entirely using Cursor AI. I'd welcome suggestions on how I can improve it. If anyone here is already working as a platform engineer, I'd like to connect to get ideas. If you like the project, please leave a star. Thanks

https://github.com/sajjadkhan12/personal-idp-dashboard.git

https://redd.it/1oh78ja
@r_devops
How do you think your role will change over the next decade, and how are you preparing for it?

Hey everyone!

I’ve been having these thoughts lately that honestly give me a bit of anxiety. We’ve all seen how fast AI has evolved. It’s not perfect, but it’s improving at an unbelievable pace.

I work in DevOps, and I think I’ve been doing fairly well so far, but I can’t help wondering how sustainable this career really is in the long run.
The demand for DevOps engineers already feels lower compared to other tech roles, and with AI slowly taking over, I sometimes wonder how long this role will stay as relevant as it is today.

On top of that, tech jobs in general don’t feel very stable. It’s not like traditional careers where you can safely work till 60. Another thing I keep thinking about is what happens over the next decade, when a large cohort of younger engineers move into senior roles. There will be a lot of people competing for management and leadership positions, and we all know not everyone is going to get them. That makes the future feel even more uncertain.

Then there’s the financial angle. The world is more debt-driven than ever. Housing prices are through the roof, and for someone like me with no family backup, taking on a 15–20 year home loan feels risky.

So I wanted to get some honest perspectives from this community:
- How much can one really rely on a DevOps career (or tech in general) for the long term?
- How do you position yourself to stay relevant and employable as the industry keeps changing?
- What’s a realistic way to build a second stream of income as a hedge? I’ve looked into a few options, but nothing has really clicked with my skills or situation so far.

Would really appreciate hearing from others who’ve had similar thoughts, or from anyone who’s found a way to deal with this uncertainty better.

https://redd.it/1ohajid
@r_devops
How do you verify vulnerability deltas between provider hardened and official upstream images?

I started benchmarking some hardened base images against their official upstreams (Ubuntu, Alpine, Debian, etc.). In theory the CVE count drops dramatically, but scanner metadata doesn’t always align. Some vulnerabilities are silently patched by upstream backports that scanners don’t recognize. Others look fixed in the hardened version but are really just gone because the package was removed. How do you objectively measure the delta between a hardened image and the stock one?
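
One way to make the comparison depend less on raw CVE counts is to diff findings by CVE ID and then split the ones that disappeared by whether the affected package is still in the hardened image. A minimal sketch, assuming each scan has already been reduced to (CVE ID, package) pairs plus the hardened image's installed package list; that input shape, the function name, and the sample IDs are all hypothetical and not any scanner's real output format.

```python
def classify_delta(stock_findings, hardened_findings, hardened_packages):
    """Compare two scans of (cve_id, package_name) tuples.

    hardened_packages is the set of package names installed in the
    hardened image, taken from its package database or SBOM.
    """
    gone = stock_findings - hardened_findings
    return {
        # Reported in both images.
        "still_present": sorted(stock_findings & hardened_findings),
        # Only reported in the hardened image.
        "new_in_hardened": sorted(hardened_findings - stock_findings),
        # Finding vanished because the package itself is no longer installed.
        "package_removed": sorted(f for f in gone if f[1] not in hardened_packages),
        # Package is still installed but the finding vanished: either a real
        # fix or a scanner metadata gap, so these need manual verification.
        "package_kept": sorted(f for f in gone if f[1] in hardened_packages),
    }


# Made-up sample data for illustration only.
stock = {("CVE-0000-0001", "openssl"), ("CVE-0000-0002", "curl"), ("CVE-0000-0003", "bash")}
hardened = {("CVE-0000-0003", "bash"), ("CVE-0000-0004", "zlib")}
installed = {"openssl", "bash", "zlib"}

delta = classify_delta(stock, hardened, installed)
for bucket, findings in delta.items():
    print(bucket, findings)
```

The "package_kept" bucket is where backports and suppressed findings would show up, so that is the list worth checking against the distro's security tracker by hand.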

https://redd.it/1ohbi0p
@r_devops
which roadmap?

Hey, I'm starting to study to become a DevOps engineer and I found two roadmaps. This one:
Become A DevOps Engineer in 2025: A Practical Roadmap (https://devopscube.com/become-devops-engineer/)
And this one from roadmap.sh:
https://roadmap.sh/devops
I don't know which one to follow. Any help, please?

https://redd.it/1ohc5bp
@r_devops
Residency-first collaboration for regulated orgs: neutral notes on Gem Team

Regulated teams often need collaboration tools they can fully control. Gem Team is one example in this space - a secure B2B messenger that brings chat, voice, video, and file sharing together in one familiar workspace with enterprise-grade safeguards.

According to its docs, it supports meetings with up to 300 participants, including screen sharing, recording, and moderator roles. You also get presence indicators, message editing, delivery status, and native voice notes.

On the security side, it uses TLS 1.3, encryption at rest, and minimizes metadata. The platform runs on fail-safe clusters in Uptime Institute Tier III facilities. Deployment is flexible - on-prem, secure cloud, hybrid, or even fully air-gapped - with extras like IP masking and metadata shredding.

Data residency and lifecycle controls are customizable - you can choose where data is stored, set retention periods, and automate deletion on servers and endpoints. It aligns with ISO 27001, GDPR, and GCC regulations (including Qatar CRA).

Compared to cloud-only suites like Slack or Microsoft Teams, Gem Team focuses on data sovereignty, large meetings and recording out of the box, and no stated limits on message or file history.

https://redd.it/1ohee2r
@r_devops
Debugging LLM apps in production was harder than expected

I have been running an AI app with RAG retrieval, agent chains, and tool calls. Recently some users started reporting slow responses and occasionally wrong answers.

Problem was I couldn't tell which part was broken. Vector search? Prompts? Token limits? Was basically adding print statements everywhere and hoping something would show up in the logs.

APM tools give me API latency and error rates, but for LLM stuff I needed:

- Which documents got retrieved from the vector DB
- The actual prompt after preprocessing
- Token usage breakdown
- Where the bottlenecks are in the chain

My Solution:

Set up Langfuse (open source, self-hosted). Uses Postgres, Clickhouse, Redis, and S3. Web and worker containers.

The @observe() decorator traces the pipeline. Shows:

- Full request flow
- Prompts after templating
- Retrieved context
- Token usage per request
- Latency by step
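
To illustrate the kind of per-step data such a trace contains, here is a standard-library-only stand-in. This is not Langfuse's `@observe()` implementation or API; `trace_step`, `TRACE`, and the three pipeline functions are hypothetical names used only for this sketch.

```python
import functools
import time

TRACE = []  # collected spans, appended as each step finishes


def trace_step(func):
    """Record the name and wall-clock latency of every call to func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            TRACE.append({
                "step": func.__name__,
                "latency_ms": (time.perf_counter() - start) * 1000,
            })
    return wrapper


@trace_step
def retrieve(query):
    # Placeholder for the vector search.
    return ["doc about " + query]


@trace_step
def build_prompt(query, docs):
    # Placeholder for prompt templating.
    return "Context:\n" + "\n".join(docs) + "\n\nQuestion: " + query


@trace_step
def answer(query):
    docs = retrieve(query)
    prompt = build_prompt(query, docs)
    return prompt  # a real app would send the prompt to a model here


result = answer("vagrant")
for span in TRACE:
    print(f'{span["step"]}: {span["latency_ms"]:.3f} ms')
```

Inner steps finish first, so they appear in the list before the outer `answer` span; a real tracing tool additionally links spans into a tree and stores inputs and outputs.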

Deployment

Used their Docker Compose setup initially. Works fine for smaller scale. They have Kubernetes guides for scaling up. Docs: https://langfuse.com/self-hosting

Gateway setup

Added Anannas AI as an LLM gateway. Single API for multiple providers with auto-failover. Useful for hybrid setups when mixing different model sources.

Anannas handles gateway metrics, Langfuse handles application traces. Gives visibility across both layers. Implementation docs: https://langfuse.com/integrations/gateways/anannas

What it caught

Vector search was returning bad chunks - embeddings cache wasn't working right. Traces showed the actual retrieved content so I could see the problem.

Some prompts were hitting context limits and getting truncated. Explained the weird outputs.
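
A cheap guard against this is to estimate prompt size before sending and flag anything over budget. A rough sketch: the 4-characters-per-token ratio is only a crude heuristic for English text and the limits in the example are made-up placeholders, so a real check should use the model's actual tokenizer and documented context window. The function names are hypothetical.

```python
CHARS_PER_TOKEN = 4  # crude heuristic (assumption), not a real tokenizer


def estimate_tokens(text):
    """Very rough token estimate: characters divided by 4, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def check_prompt(prompt, context_limit, reserved_for_output):
    """Report whether the prompt likely fits once output tokens are reserved."""
    budget = context_limit - reserved_for_output
    estimated = estimate_tokens(prompt)
    return {
        "estimated_tokens": estimated,
        "budget": budget,
        "over_budget": estimated > budget,
    }


# Placeholder limits for illustration only.
report = check_prompt("x" * 4000, context_limit=1200, reserved_for_output=300)
if report["over_budget"]:
    print("prompt likely to be truncated:", report)
```

Logging this next to each trace makes truncation visible before it shows up as a weird answer.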

Stack

- Langfuse (Docker, self-hosted)
- Anannas AI (gateway)
- Redis, Postgres, Clickhouse

Trace data stays local since it's self-hosted.

If anyone is debugging similar LLM issues for the first time, this might be useful.

https://redd.it/1ohf70t
@r_devops
any self hostable alternatives for code rabbit??

as mentioned in the title, i'm looking for open-source, self-hosted alternatives to coderabbit that can be deployed in our own cloud and integrated with openai, claude, or other ai api keys. the reason is straightforward: we’re a startup with cloud startup credits, so rather than purchasing coderabbit, we’d prefer to use those existing credits to run a similar solution ourselves.

https://redd.it/1oheri0
@r_devops
what Git flow for a repo of Ansible playbooks?

Hello all! I started a new contract where I have to administer a Consul cluster, mainly with Ansible playbooks run through an AWX platform.

---

Currently there is one branch per environment and there is no difference between them.

So for each change we merge the feature branch into each environment branch, which seems cumbersome to me. On the AWX platform we have one deployment template per branch.

We are a team of 2, sometimes 3. I started to talk about tags and release/develop branches, but they don't know those concepts.

I was thinking of proposing a trunk-based approach with only one main branch plus feature branches, and with rc and release tags which will be linked to the AWX templates.

Our development environments could be linked to the main branch, the staging environment to an rc tag, and our production to a release tag.

Also, there is no pipeline today, so I also wanted to add a job that automatically updates the AWX templates so they point at the right tags.
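
Such a pipeline job could start from something like the following sketch, which picks the ref each environment should track from a list of tag names. The `vX.Y.Z` / `vX.Y.Z-rc.N` naming scheme, the environment names, and the function names are assumptions for illustration; actually pushing the result into the AWX templates is left out.

```python
import re

# Assumed naming scheme: v1.2.3 for releases, v1.2.3-rc.4 for release candidates.
TAG_RE = re.compile(r"^v(\d+)\.(\d+)\.(\d+)(?:-rc\.(\d+))?$")


def parse_tag(tag):
    """Return ((major, minor, patch), rc_number_or_None), or None if not a version tag."""
    m = TAG_RE.match(tag)
    if m is None:
        return None
    major, minor, patch, rc = m.groups()
    return (int(major), int(minor), int(patch)), (int(rc) if rc is not None else None)


def refs_for_environments(tags, main_branch="main"):
    """Map each environment to the git ref its template should point at."""
    releases, candidates = [], []
    for tag in tags:
        parsed = parse_tag(tag)
        if parsed is None:
            continue  # ignore tags outside the assumed scheme
        version, rc = parsed
        if rc is None:
            releases.append((version, tag))
        else:
            candidates.append((version, rc, tag))
    return {
        "dev": main_branch,
        "staging": max(candidates)[2] if candidates else None,
        "prod": max(releases)[1] if releases else None,
    }


tags = ["v1.0.0", "v1.1.0-rc.1", "v1.1.0-rc.2", "v1.0.1", "nightly"]
print(refs_for_environments(tags))
```

Comparing parsed integer tuples rather than raw strings avoids the classic mistake of sorting `v1.10.0` before `v1.9.0`.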

---

What do you think about it?
Do you have any advice, or another approach to suggest?

thanks!


https://redd.it/1ohcxo2
@r_devops