Reddit DevOps – Telegram
ANN - Simple: Observability

👋🏻 Hi folks,

I've created a simple observability dashboard that can be run via Docker and configured to check your healthz endpoints for some very simple and basic data.

Overview: Simple: Observability
Dashboard: Simple: Observability Dashboard

Sure, there are heaps of other apps that do this. This was mainly created because I wanted to easily see the version of a microservice in a large list of microservices. If one version is out of sync (because a team deployed over your code) then the entire pipeline might break. This gives an easy visual indication across environments.

The trick is that the healthz endpoint needs to return a very specific schema that my app can parse and read.
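The actual schema isn't shown in the post; a hypothetical payload of the kind such a healthz endpoint might return (all field names are my invention, not the project's) could look like:

```json
{
  "service": "orders-api",
  "version": "2.14.3",
  "commit": "a1b2c3d",
  "environment": "staging",
  "status": "healthy"
}
```

The dashboard would then only need to poll each endpoint and compare the `version` field across services.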

Hope this helps anyone wanting a simple way to keep track of their microservice versions through observability 🌞

https://redd.it/1pxej1d
@r_devops
I built a supervisor-like system using Rust

I run a few projects using supervisor to manage the small services my app needs. They run on tiny machines (e.g. 512 MB RAM). Supervisor has been a challenge in this case.

I started https://github.com/wushilin/processmaster

It is a daemon manager like supervisor (CLI, web UI). It typically uses less than 1 MB of memory and almost no CPU; it is purely event driven, with no busy loops anywhere.

Feature-wise I would say it is 1:1 comparable to supervisor, but a few things are worth highlighting:
1. cronjob support built in

2. supports one-time provisioning triggers (e.g. set the net_bind capability on your caddy binary so it can run as non-root and still bind to 443)

3. cgroup v2 based resource constraints per service (CPU, RAM, swap, IO ratio), or for all services together (the processmaster itself).

4. Supports launching your process any way you like: background, foreground, forking, whatever. We track your process by its cgroup id, so we can always find it and stop it when you ask.

5. It only runs on Linux with cgroup v2 support: Ubuntu 22.04 or newer, RHEL 8 or newer, or similar.
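For anyone unfamiliar with point 2: it refers to Linux file capabilities. The one-time provisioning step it automates is roughly the following (demonstrated here on a throwaway copy of /bin/true; substitute your real caddy path):

```shell
# Grant a binary the capability to bind ports below 1024 without running
# as root. setcap itself needs root (or CAP_SETFCAP), so in a sandbox
# we tolerate failure rather than abort.
cp /bin/true /tmp/demo-caddy
setcap 'cap_net_bind_service=+ep' /tmp/demo-caddy || echo "setcap requires root"
getcap /tmp/demo-caddy || true
```

After this, the binary can bind to 443 when launched by an unprivileged user.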

And I have been using it for a few weeks, so far so good. Appreciate any feedback!



https://redd.it/1pxm7jd
@r_devops
How I added LSP validation/autocomplete to FluxCD HelmRelease values

The feedback loop on Flux HelmRelease can be painful. Waiting for reconciliation just to find out there's a typo in the values block.

This is my first attempt at technical blogging, showing how we can shift-left some of the burden while still editing. Any feedback on the post or the approach is welcome!

Post: https://goldenhex.dev/2025/12/schema-validation-for-fluxcd-helmrelease-files/
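For anyone unfamiliar with the technique, the usual mechanism (which I assume the post builds on) is a yaml-language-server modeline that points the editor's LSP at a JSON Schema; the schema URL below is a placeholder:

```yaml
# yaml-language-server: $schema=https://example.com/helmrelease-values.schema.json
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: my-release
```

With that comment in place, any editor running yaml-language-server validates and autocompletes the file against the schema as you type, no reconciliation round trip needed.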

https://redd.it/1pxluyr
@r_devops
Beta testers needed - turning raw screen captures into documentation with annotated screenshots, using AI

Hi everyone,

Video Demo Link: https://streamable.com/n171if

I realize this sub sees a thousand "AI writes your docs" posts a week. Usually, they are either RAG platforms (chatting with a transcript) or browser extensions that use the browser DOM while you click.

2 things that I think are important with this that help both creators and end users.

1. **For the Creators (Raw Video / Zero Friction):** Unlike tools that force you to install a browser extension or record in a specific "mode," you just upload a **raw screen recording** (MP4/MOV). Use OBS, QuickTime, or whatever you want. The AI "watches" the video to handle the "Visual Tax"—smart cropping, zooming, and highlighting the clicks automatically.

2. **For the End User (The "Anti-Video" Experience):** Users hate scrubbing through a 5-minute video just to find a 10-second answer. This synthesizes that raw recording into a **precision visual guide** (HTML or GIF). They can skim, find the exact step they need, and get back to work without hitting play.

I’m looking for honest feedback:

* Does this specific workflow (Raw Video → Visual Guide) actually solve a pain point for you?
* Are there tools already doing this well? (If I'm reinventing the wheel, please tell me).

I’d be happy to hear if this is useful, or if you aren't interested at all. If you *are* curious, I'm looking for 5-10 beta testers to try breaking it.



DM me if interested.

Thanks!

https://redd.it/1pxpyl5
@r_devops
Any good cloud provider in Europe?

Hello devops
For you, is there a good cloud provider that offers the same services as Azure, GCP, and AWS, but in Europe (and that is not expensive as hell)?
(With the same uptime too, 99.99%.)
Thanks

https://redd.it/1pxql86
@r_devops
Gitlab CI GPG Signing

I have a self-hosted GitLab instance, and I want a series of jobs that sign tag/commit changes as part of the release process, but I am currently hitting an issue with `gpg: signing failed: Not a tty`. Does anyone know how to work around it?

I have created an Access token and assigned it a GPG Public Key via the API.

My Projects have a 'main' branch that is protected with only changes coming via merge request.

There is a series of jobs that trigger if a branch has the 'release' prefix; these perform the release process, which involves tagging the build and altering the project version.

I want the CI to sign its tags and commits and push them to the release branch. The last stage of the release process opens a merge request so a person can review the CI changes before they are pulled into main. This way the normal release process can complete, but every bot change has to undergo a review before it's merged.

I am trying to use language/alpine images as a base (e.g. maven:3.9.11-eclipse-temurin-25-alpine), using Alpine as a standard for scripting and trying to avoid specialised Docker images I have to maintain.

I have managed to get the GPG key imported via scripting, but when the Maven release process runs I am getting the following error:

[INFO] 11/17 prepare:scm-commit-release
[INFO] Checking in modified POMs...
[INFO] Executing: /bin/sh -c cd '/builds/devsecops/maven/maven-site-resources' && 'git' 'add' '--' 'pom.xml'
[INFO] Working directory: /builds/devsecops/maven/maven-site-resources
[INFO] Executing: /bin/sh -c cd '/builds/devsecops/maven/maven-site-resources' && 'git' 'rev-parse' '--show-prefix'
[INFO] Working directory: /builds/devsecops/maven/maven-site-resources
[INFO] Executing: /bin/sh -c cd '/builds/devsecops/maven/maven-site-resources' && 'git' 'status' '--porcelain' '.'
[INFO] Working directory: /builds/devsecops/maven/maven-site-resources
[WARNING] Ignoring unrecognized line: ?? .gitlab-ci.settings.xml
[WARNING] Ignoring unrecognized line: ?? .m2/
[INFO] Executing: /bin/sh -c cd '/builds/devsecops/maven/maven-site-resources' && 'git' 'commit' '--verbose' '-F' '/tmp/maven-scm-1813294456.commit'
[INFO] Working directory: /builds/devsecops/maven/maven-site-resources
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 53.857 s
[INFO] Finished at: 2025-12-27T23:51:34Z
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-release-plugin:3.1.1:prepare (default-cli) on project resources: Unable to commit files
[ERROR] Provider message:
[ERROR] The git-commit command failed.
[ERROR] Command output:
[ERROR] error: gpg failed to sign the data:
[ERROR] [GNUPG:] KEY_CONSIDERED <removed valid key> 2
[ERROR] [GNUPG:] BEGIN_SIGNING H10
[ERROR] [GNUPG:] PINENTRY_LAUNCHED 343 curses 1.3.1 - - - - 0/0 0
[ERROR] gpg: signing failed: Not a tty
[ERROR] [GNUPG:] FAILURE sign 83918950
[ERROR] gpg: signing failed: Not a tty
[ERROR]
[ERROR] fatal: failed to write commit object

Before Script logic currently used:

- apk add --no-cache curl git
- |-
  if [[ ! -z $SERVICE_ACCOUNT_NAME ]]; then
    git config --global user.name "${SERVICE_ACCOUNT_NAME}"
  else
    git config --global user.name "${GITLAB_USER_NAME}"
  fi
- |-
  if [[ ! -z $SERVICE_ACCOUNT_EMAIL ]]; then
    git config --global user.email "${SERVICE_ACCOUNT_EMAIL}"
  elif [[ ! -z $SERVICE_ACCOUNT_NAME ]]; then
    git config --global user.email "${SERVICE_ACCOUNT_NAME}@noreply.${CI_SERVER_HOST}"
  else
    git config --global user.email "${GITLAB_USER_EMAIL}"
  fi
- |-
  if [[ ! -z $SERVICE_ACCOUNT_GNUGP_PRIVATE_KEY ]]; then
    apk add --no-cache gnupg gpg-agent pinentry pinentry-tty keychain
    GPG_OPTS='--pinentry-mode loopback'
    gpg --batch --import "$SERVICE_ACCOUNT_GNUGP_PRIVATE_KEY"
    PRIVATE_KEY_ID=$(gpg --list-packets "$SERVICE_ACCOUNT_GNUGP_PRIVATE_KEY" | awk '$1=="keyid:"{print $2}' | head -1)
    git config --global user.signingkey "$PRIVATE_KEY_ID"
    git config --global commit.gpgsign true
    git config --global tag.gpgSign true
  fi
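For what it's worth, a common workaround for `Not a tty` in headless CI is to force loopback pinentry. A sketch (assumes the imported key has no passphrase; paths are illustrative):

```shell
# CI-friendly GPG setup: bake loopback pinentry into gpg's own config.
export GNUPGHOME="${GNUPGHOME:-$HOME/.gnupg}"
mkdir -p "$GNUPGHOME" && chmod 700 "$GNUPGHOME"
echo "pinentry-mode loopback" >> "$GNUPGHOME/gpg.conf"
echo "allow-loopback-pinentry" >> "$GNUPGHOME/gpg-agent.conf"

# git invokes gpg itself, so a GPG_OPTS shell variable never reaches it;
# point git at a wrapper that carries the flags instead.
mkdir -p "$HOME/bin"
cat > "$HOME/bin/gpg-ci" <<'EOF'
#!/bin/sh
exec gpg --batch --no-tty --pinentry-mode loopback "$@"
EOF
chmod +x "$HOME/bin/gpg-ci"
command -v git >/dev/null 2>&1 && git config --global gpg.program "$HOME/bin/gpg-ci" || true
```

Note the wrapper part in particular: setting `GPG_OPTS` in the before_script (as above) has no effect on the gpg process that git spawns during `git commit`.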

https://redd.it/1pxrhqg
@r_devops
I packaged a reusable observability stack (Grafana + Prometheus + Zabbix) to avoid rebuilding it every time

After rebuilding the same observability setup multiple times (Grafana dashboards, Prometheus jobs, Blackbox probes, Zabbix integration), I decided to package it into a reusable Docker Compose stack.



It includes:

- Grafana with practical dashboards (infra + web/API checks)
- Prometheus + Blackbox Exporter (HTTP and TCP examples)
- Zabbix integration for host monitoring
- Nginx reverse proxy with SSL



This is not a “click and forget” solution. You still configure datasources and import dashboards manually, but it gives you a clean, production-ready baseline instead of wiring everything from scratch.



I built it primarily for my own use and decided to share it.

Happy to answer technical questions or get feedback.



https://redd.it/1pxr4dc
@r_devops
Cold start VM timings, how far is it worth optimizing?

Hi folks, I’m replacing Proxmox/OpenStack with a custom-built cloud control plane and recently added detailed CLI timings. A cold VM start takes about 10s end to end (API accept ~300ms, provisioning ~1.2–3s, Ubuntu 25 boot ~7–8s). I understand this is highly workload- and use-case-dependent and everyone has different needs. I can probably optimize it further, but I’m unsure whether that’s actually useful or just work for the sake of work.


From your experience, how do major public clouds compare on cold starts, and where does further optimization usually stop making sense?

https://redd.it/1pxov5c
@r_devops
ClickOps vs IaC

I get the benefits of using IaC: you get to see who changed what, the change history, etc. With all those benefits, why do people still do ClickOps?

https://redd.it/1pxz8ge
@r_devops
How do you track IaC drifts by ClickOps?

I'm learning IaC right now, and I learned that IaC setups often face drift problems caused by ClickOps. How do you detect the drift? Or can you just not...?
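One common approach (not the only one) is a scheduled pipeline that runs `terraform plan -detailed-exitcode`, which exits 0 when state matches reality and 2 when drift exists. Sketched as a GitLab CI job, where the job name and image choice are illustrative:

```yaml
drift-check:
  image: hashicorp/terraform:latest
  rules:
    - if: $CI_PIPELINE_SOURCE == "schedule"
  script:
    - terraform init -input=false
    # -refresh-only compares recorded state to real infrastructure without
    # proposing config-driven changes; exit code 2 signals detected drift
    - terraform plan -refresh-only -detailed-exitcode -input=false
  allow_failure:
    exit_codes: [2]
```

Run it nightly and alert on the exit code, and ClickOps changes surface within a day instead of at the next apply.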

https://redd.it/1py07yg
@r_devops
How do you decide whether to touch a risky but expensive prod service?

I’m curious how this works on other teams.

Say you have a production service that you know is overprovisioned or costing more than it should. It works today, but it’s brittle or customer facing, so nobody is eager to touch it.

When this comes up, how do you usually decide whether to leave it alone or try to change it?

Is there a real process behind that decision, or does it mostly come down to experience and risk tolerance?

Would appreciate hearing how people handle this in practice.

https://redd.it/1py0ptl
@r_devops
My experiences on the best kinds of documentation, what are yours?

Like many of you, I've often felt that writing documentation didn't serve its stated purpose.

I've heard people say "what if you get hit by a bus?", but then whatever I write becomes irrelevant before anyone reads it. Tribal knowledge and personal experience seem to matter more.

That said, I've found a few cases where documentation actually works:

Architecture diagrams - Even when they're not 100% accurate, they help people understand the system faster than digging through config panels or code.

Quick reference for facts - URLs for different environments, account numbers, repo names. Things you need to recall but don't use every day.

Vision/roadmap documents - Writing down multi-year plans helps the team align on direction. Everyone reads the same thing instead of having different interpretations from meetings.

But detailed how-to guides or step-by-step procedures? Those seem to go stale immediately and nobody maintains them.

What's the most useful documentation you've seen, and what made it actually work?

https://redd.it/1py3xes
@r_devops
What is the most difficult thing you had to implement as a DevOps engineer?

I had to complete some DevOps tickets in the past, but I didn't do anything particularly difficult, as I am primarily a software developer. So I was interested to know what an actual DevOps engineer might do on a day-to-day basis, beyond just clearing basic infrastructure tickets.

https://redd.it/1py4eb0
@r_devops
Looking for a cheap Linux server for Spring Boot app + domain

Hi everyone,

I’m a beginner when it comes to deploying applications and servers, and I’m planning to deploy my first Spring Boot Application.

Right now I’m searching for a cheap Linux server / VPS to host a small project (nothing high-load yet). I’d appreciate recommendations for reliable low-cost providers.

I also have a few related questions:

- Where is the best place to buy a domain name?
- Is it reasonable to run the database on the same server as the API for a small project, or is it better to separate them from the start?

If you have any tips, warnings, or best practices to share - I’d be happy to hear them.
Thanks in advance!

https://redd.it/1py2546
@r_devops
Did DevOps Get Harder, or Did We Overdo the Tools?

Sometimes it feels like DevOps didn’t get harder; we just kept adding tools over time. One team is on ArgoCD, another on Jenkins or GitHub Actions, workflows in Prefect, infra split between Terraform and Pulumi, monitoring across Datadog and Prometheus, plus Cosine for code navigation in daily work.

Each tool is fine on its own. Together, every deploy feels like walking through old decisions and duct tape. When something breaks, we end up debugging the toolchain more than the product.

How do you deal with this? Standardize, let teams choose, or accept the chaos?

https://redd.it/1pyek0i
@r_devops
Macbook air or pro? Urgent!!

Hello,



I currently work in AWS with networking services, and I want to learn DevOps in the coming days so I can switch to a complete DevOps role, where learning involves setting up and running Kubernetes and Docker.

For this, I am buying a personal laptop where I need sufficient space to set up and run all of this. Performance-wise there’s no hard requirement, as this is purely for learning. Also, I am not sure what else I’ll need to set up during the learning phase, as I am unsure about DevOps things as of now.

Considering all this, would a MacBook Air with 256 GB suffice for this learning requirement?

Or should I buy pro?

The thing is, I am buying this in the US, and if I go for the Air 512 GB, it’s better to get a Pro by paying a little extra. So please help me choose between the MacBook Air 256 GB and a MacBook Pro.

Thanks in advance!

https://redd.it/1pyftgb
@r_devops
How does the Podman team expect people to learn it?

I've been instructed by our infra team that my proposed project should be deployed with Podman (and not Docker) because they are afraid of giving root access.

I said "no biggie", just another tool in my belt, but I am quite clueless about where to start. The docs are frighteningly sparse. It's even worse with Quadlets. The top 3 results on Google are a Reddit thread, Podman Desktop, and the podman-quadlet docs, which have even less info than the podman ones.

It feels like I'm not in on some joke. Sure, I can google tutorials (I prefer official documentation, as I find tutorials too ad hoc), but is that really all there is? I almost don't believe it. Does the Podman team expect tech influencers to write tutorials/books based on trial and error?
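For what it's worth, the Quadlet workflow is terser than the docs make it feel. A minimal unit (image and name are placeholders) looks something like:

```ini
# ~/.config/containers/systemd/myapp.container
[Unit]
Description=My app container

[Container]
Image=docker.io/library/nginx:alpine
PublishPort=8080:80

[Install]
WantedBy=default.target
```

Then `systemctl --user daemon-reload` followed by `systemctl --user start myapp`; Quadlet generates a systemd service from the `.container` file, so logs and restarts go through the usual systemd tooling.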

https://redd.it/1pyhwak
@r_devops
Is the "DevOps" role just becoming a fancy name for a 24/7 Support Engineer?

I’ve been in the industry for some time, and I’m starting to worry about the direction the "DevOps" role is taking in a lot of companies. Originally, it was supposed to be about breaking down silos and shared responsibility, but in many places, it has just turned into a dumping ground for everything the dev team doesn't want to deal with.

If a deployment fails, it’s a DevOps problem. If the cloud bill is too high, it’s a DevOps problem. If a database is slow, call DevOps. We’ve gone from "building platforms" to just being the people who get paged at 3 AM because a script we didn't write failed in a way we couldn't predict. We are spending so much time putting out fires that we don't have the bandwidth to actually automate the systems that prevent them.

I’ve been trying to document some better boundaries and automation patterns on OrbonCloud lately.
Are we just stuck as the "everything" engineers now?

https://redd.it/1pyi2tp
@r_devops
Chainguard vs Docker DHI

Docker releasing their hardened images for free - does that affect Chainguard at all or are people fully locked in?

https://redd.it/1pyjhc7
@r_devops