Reddit DevOps – Telegram
Do you think we need a CNPG open source restore manager?

I was wondering whether there is a need for an OSS alternative to Kasten or similar (well, in this limited sense at least) that can recover your CNPG cluster and perform automated DR drills. I asked something similar in a PostgreSQL community and got crickets. I personally envision something like a report being sent to me with a checkbox: yep, your org will survive this.
Every project I surveyed handles the backup; none guarantees the restore or runs automated DR drills.

https://redd.it/1pwt143
@r_devops
“Aspiring to Secretless Machine-to-Machine Authentication and Authorization” question

So I came across this article

https://medium.com/@jaredhatfield/aspiring-to-secretless-machine-to-machine-authentication-and-authorization-70df900cb1e1

I like the idea of a centralized authentication and authorization service. The article also floats the concept of a Workload Identity that issues keys to the workload, so a solution's microservices do not need to hold keys. But it doesn't explain exactly how such a system would be implemented. Wouldn't it just be another place where public/private keys need to be stored and rotated?

I think this is similar to what AWS does with IAM. But how would one translate this concept to on-premises machines?
My requirements are a purely on-prem setup using open-source tools as much as possible, such as Keycloak or similar for the OAuth server.

https://redd.it/1pwui9k
@r_devops
First job, no senior, already responsible for everything

I have just graduated and this is my first job ever. The company has just opened a branch in my country, so everything is barely established (HR, R&D team, infrastructure, etc.).

They handed me a project and paired me with another guy who’s also a fresher. The project is basically migrating the company's Windows app to the web. We are in charge of everything: setting up the database host machine and Git, writing APIs, designing the UI, testing, and delivery.

We have no senior engineer to review our code or show us how things should be done properly. The bright side is that I get to touch and learn a lot of things, but I am worried I will end up picking up lots of bad habits and practices.

I’m not sure if this is a great opportunity or a risky situation for someone at the very start of their career. How do I avoid building bad habits when there’s no senior guidance? What should I focus on to make sure I’m actually learning in the right direction? I’d really appreciate advice from you guys.

https://redd.it/1pwtnud
@r_devops
CosmosCost - unified cloud cost tracking for AWS, GCP & Azure

Hey everyone 👋

After testing it internally with some mid-to-large companies, today I'm launching [https://cosmoscost.com](https://cosmoscost.com), a cloud cost management platform I built after getting fed up with juggling separate billing dashboards for AWS, GCP, and Azure.

**The Problem**

If you run multi-cloud infrastructure, you know the pain:

* AWS calls them "EC2 Instances", GCP says "Compute Engine", Azure has "Virtual Machines" - same thing, zero clarity on comparative costs
* Surprise charges from idle resources every month
* Exporting to spreadsheets that go stale overnight

**What I Built**

* Unified dashboard across all three major cloud providers
* Unified terminology - EC2, Compute Engine, and VMs all show as "Compute Instances" so you can actually compare apples to apples
* Privacy-first AI insights - runs 100% locally in your browser using WebGPU (your data never leaves your device)
* Easy reporting
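To make the "unified terminology" idea concrete, here is a minimal sketch of normalizing provider-specific service names into shared categories before summing costs. It is not CosmosCost's implementation; the service labels and the `costs_by_category` function are hypothetical simplifications of real billing-export fields.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical lookup table: (provider, provider-specific service name) -> shared category.
# Real billing exports use longer, provider-specific strings; these are simplified placeholders.
SERVICE_CATEGORIES = {
    ("aws", "EC2"): "Compute Instances",
    ("gcp", "Compute Engine"): "Compute Instances",
    ("azure", "Virtual Machines"): "Compute Instances",
}


def costs_by_category(line_items):
    """Sum (provider, service, cost) line items per unified category.

    Anything not in the lookup table is grouped under "Other" so it still shows up.
    """
    totals = defaultdict(Decimal)  # Decimal avoids float rounding on currency amounts
    for provider, service, cost in line_items:
        category = SERVICE_CATEGORIES.get((provider, service), "Other")
        totals[category] += cost
    return dict(totals)


items = [
    ("aws", "EC2", Decimal("10.50")),
    ("gcp", "Compute Engine", Decimal("4.25")),
    ("azure", "Virtual Machines", Decimal("5.25")),
    ("aws", "S3", Decimal("1.00")),
]
print(costs_by_category(items))
```

Keeping the mapping as data rather than code means new providers or services only need new table entries.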

Would love feedback from anyone dealing with multi-cloud cost chaos. What features would make this a must-have for your stack?



🔗 [https://cosmoscost.com](https://cosmoscost.com)

https://redd.it/1pwvg5w
@r_devops
Building a new CLI that simplifies and enhances the official GCP, AWS, and Azure CLIs

I’m planning to build a new CLI that wraps the official APIs of GCP, AWS, and Azure to simplify and enhance the functionality of the official CLIs.

Things like better logs with easier filters, faster switching between projects or profiles, and more.

It will be written in Golang, so it will run faster than the official CLI tools, which are written in Python.

Any feedback, or features you’d like to see?

https://redd.it/1pwxock
@r_devops
3+ years DevOps experience, still underpaid — looking for blunt feedback

I’ve got 3+ years of DevOps experience. After a 6-month gap, I joined a startup where I worked on containerizing open-source apps, Docker/K8s deployments, and supervised services supporting AI agent training. That role didn’t last, and now I’m doing a mix of QA + some dev + infra work.

I’ll be upfront: I used ChatGPT to tighten the wording here, but the situation is 100% real.

I’m currently in an on-site role, around 42k/month, and working ~1000 km away from my hometown. The instability + pay mismatch is starting to wear me down. I keep seeing people with similar experience landing solid DevOps roles (including remote US-based ones), and I’m clearly missing something.

What I’d appreciate:

What should I fix first — skills, positioning, or proof of work?

What actually helped you move up in DevOps?

Any platforms or strategies that worked for landing remote roles?


Not looking for sympathy — just blunt, practical advice.

https://redd.it/1pwyx4a
@r_devops
What's the best free site to easily apply for remote devops positions?

By easily I mean you just upload your resume, click apply, and move on to the next job post, instead of having to sign up and fill in endless forms about your experience, only to be asked to upload your resume again.

https://redd.it/1pwymen
@r_devops
hetzner-k3s v2.4.4 is out - Open source tool for Kubernetes on Hetzner Cloud

For those not familiar with it, it's by far the easiest way to set up cheap Kubernetes on Hetzner Cloud. The tool is open source and free to use, so you only pay for the infrastructure you use. This new version improves network request handling when talking to the Hetzner Cloud API, as well as the custom local firewall setup for large clusters. Check it out! https://hetzner-k3s.com/

If you give it a try, let me know how it goes. If you have already used this tool, I'd appreciate some feedback. :)

If you have chosen other tools over hetzner-k3s, I would love to learn about them and why you chose them, so that I can improve the tool, the documentation, etc.

https://redd.it/1px0nqn
@r_devops
Is it normal to see KubeAstronaut-level candidates applying to junior DevOps roles, while experienced tech leads struggle to pass CKS?

Do certifications actually signal skill anymore, or are they just one narrow metric that doesn’t reflect seniority? And if they don't, how do you know whether a person is actually good at what they do?

https://redd.it/1px30ow
@r_devops
Secrets in Docker

I am deploying a web application whose backend (FastAPI) needs AWS credentials. I was using a .env file to store the credentials as environment variables, but my credentials got leaked on Docker Hub and now I have a bill for it. Since then, I added the .env file to a .dockerignore file and created the .env file on my EC2 instance after pulling the backend image; however, the container doesn't seem to use this file to set environment variables. More importantly, I would like to know how experienced cloud engineers deal with this problem!
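The usual pattern is to keep the `.env` file out of the build context (the `.dockerignore` step already does this) and inject values only when the container starts. A `.env` file sitting on the host is not read by the container automatically; it has to be passed explicitly, for example with `docker run --env-file .env <image>` or an `env_file:` entry in a Compose file. Below is a minimal sketch of the backend side, assuming secrets arrive either as environment variables or as files under `/run/secrets` (where Docker and Compose secrets are mounted by default); `read_secret` is a hypothetical helper, not a FastAPI API.

```python
import os
from pathlib import Path


def read_secret(name, secrets_dir="/run/secrets"):
    """Look up a secret at container runtime instead of baking it into the image.

    Order of lookup:
    1. An environment variable (e.g. set with `docker run --env-file .env ...`).
    2. A file named after the secret in `secrets_dir` (a Docker/Compose secrets mount).
    Returns None if neither exists, so the caller can fall back to other sources.
    """
    value = os.environ.get(name)
    if value:
        return value
    secret_file = Path(secrets_dir) / name
    if secret_file.is_file():
        # Secret files often end with a newline; strip it.
        return secret_file.read_text().strip()
    return None
```

On EC2 specifically, attaching an IAM role to the instance usually removes the need for stored keys altogether: boto3's default credential chain falls back to the instance profile when no keys are set.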

https://redd.it/1px4b0p
@r_devops
I built a tool for learning PromQL effectively using a scenario-based approach

My team recently moved from New Relic to OTel, so I decided to build a tool to help my team learn PromQL by working through several common scenarios.


Try it: https://promql-playground.vercel.app/
Github: https://github.com/rohitpotato/promql-playground

Appreciate any feedback.


https://redd.it/1px6rsx
@r_devops
Study group for DevOps/SRE/System Design

Is there a study group where you guys discuss topics related to DevOps/SRE/System Design, exclusively for interview preparation for senior roles?

https://redd.it/1pxahvr
@r_devops
I really need honest advice and help

Hi everyone,
I currently work as a Product Support Engineer (somewhat similar to SRE), and I’m trying to transition into DevOps. The amount of information out there is honestly overwhelming, and I sometimes wonder if I’m starting too late.

Background-wise, I studied Computer Science at university and did some freelance web development, though I wouldn’t call myself a strong coder. I can still build things and I’m familiar with Python and JavaScript, along with common frameworks (not heavy on algorithms). I recently passed the AWS Cloud Practitioner exam and I’m now studying for the Solutions Architect Associate. I’ve also learned Docker, GitHub Actions, and have hands-on exposure to cloud and tooling.

I feel like I’m doing bits of everything and not sure if I’m on the right path. Given my background, I’d really appreciate advice on what I should focus on first, what to strengthen, and how to move forward toward a solid DevOps role.

https://redd.it/1px3znb
@r_devops
Can NGINX support mTLS and Basic Auth in parallel for Prometheus API access?

In our AWS EKS cluster, NGINX is deployed in front of the Prometheus API.

Currently, access is protected using mTLS, where both the client and the server authenticate using certificates.

We want to support two parallel authentication methods on NGINX:

One specific team should authenticate only with username and password (Basic Auth),

While other teams should authenticate only with mTLS (client certificates).


Is it possible to configure NGINX so that both authentication methods work in parallel, without disabling mTLS, and without making Prometheus insecure?

If yes, what is the recommended and secure way to configure this in NGINX?
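On the server side, one commonly suggested pattern (worth verifying against the NGINX `ngx_http_ssl_module` and `ngx_http_auth_basic_module` docs before relying on it) is `ssl_verify_client optional;`, so the TLS handshake still succeeds without a client certificate, plus a `map` from `$ssl_client_verify` to the `auth_basic` realm that yields `off` when the value is `SUCCESS`. Clients with a verified certificate then skip Basic Auth, and everyone else has to pass it. The sketch below only shows the two client sides using Python's standard library; the URL and file names are placeholders.

```python
import base64
import ssl
import urllib.request


def basic_auth_header(username, password):
    """Authorization header value for the team that uses Basic Auth."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def basic_auth_request(url, username, password):
    """Request for the Basic Auth team: no client certificate, credentials in a header."""
    return urllib.request.Request(
        url, headers={"Authorization": basic_auth_header(username, password)}
    )


def mtls_context(ca_file, cert_file, key_file):
    """TLS context for the mTLS teams: verify NGINX's certificate and present a client one."""
    context = ssl.create_default_context(cafile=ca_file)
    context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return context


# Usage (hypothetical URL and file names):
#   url = "https://prometheus.example.internal/api/v1/query?query=up"
#   urllib.request.urlopen(basic_auth_request(url, "team-a", "..."))
#   urllib.request.urlopen(url, context=mtls_context("ca.pem", "client.pem", "client.key"))
```

Basic Auth credentials travel only inside the TLS session, so the Basic Auth path still depends on NGINX serving HTTPS.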

https://redd.it/1pxdxrg
@r_devops
ANN - Simple: Observability

👋🏻 Hi folks,

I've created a simple observability dashboard that runs via Docker and can be configured to check your healthz endpoints for some very basic data.

Overview: Simple: Observability
Dashboard: Simple: Observability Dashboard

Sure, there are heaps of other apps that do this. I mainly created it because I wanted to easily see the "version" of each microservice in a large list of microservices. If one version is out of sync (because a team deployed over your code), the entire pipeline might break. This gives an easy visual indication across environments.

The trick is that the healthz endpoint has to return a very specific schema, which my app parses and reads.
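The post doesn't include the schema, so this sketch assumes a hypothetical healthz body of the form `{"service": ..., "version": ..., "status": ...}` and shows the kind of version-drift check such a dashboard makes visible. Fetching the endpoints is left out, and `find_version_drift` is an illustrative name, not part of Simple: Observability.

```python
import json


def find_version_drift(healthz_bodies, expected_versions):
    """Compare reported service versions against the versions you expect.

    healthz_bodies: raw JSON strings, one per service, in the hypothetical schema
    {"service": "...", "version": "...", "status": "..."}.
    expected_versions: mapping of service name -> expected version.
    Returns {service: (expected, reported)} for every mismatch; `reported` is None
    when a service did not report at all.
    """
    reported = {}
    for body in healthz_bodies:
        data = json.loads(body)
        reported[data["service"]] = data.get("version")

    drift = {}
    for service, expected in expected_versions.items():
        actual = reported.get(service)
        if actual != expected:
            drift[service] = (expected, actual)
    return drift
```

Treating a missing service as drift (rather than ignoring it) surfaces endpoints that are down or misconfigured alongside wrong versions.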

Hope this helps anyone wanting a simple way to keep an eye on their microservice versions 🌞

https://redd.it/1pxej1d
@r_devops
I built a supervisor-like system using Rust

I run a few projects that use supervisor to manage small services my app needs. They run on tiny machines (e.g. 512 MB of RAM), and supervisor has been a challenge there.

I started https://github.com/wushilin/processmaster

It is a daemon manager like supervisor (CLI and web UI) that typically uses less than 1 MB of memory and almost no CPU. It is purely event-driven, with no busy loops anywhere.

Feature-wise I would say it is 1:1 comparable to supervisor, but I'd like to highlight:
1. Built-in cron job support.

2. One-time provisioning triggers (e.g. setting the net_bind flag on your caddy binary so it can run as non-root and still bind to 443).

3. cgroup v2 based: resource constraints (CPU, RAM, swap, IO ratio) can be set per service or for all services together (the processmaster itself).

4. Launch your process any way you like: background, foreground, forking, whatever you prefer. We track your process by its cgroup ID, so we can always find it and stop it when asked.

5. The catch: it only runs on Linux with cgroup v2 support. For Ubuntu that means at least Ubuntu 22; for RHEL or similar, 8 or newer.
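processmaster's source isn't shown here, but the general reason cgroup-based tracking survives forking is that a child process inherits its parent's cgroup, so even a double-forked daemon stays listed in that cgroup's `cgroup.procs` file. A minimal sketch, assuming a pure cgroup v2 host with the unified hierarchy mounted at `/sys/fs/cgroup`; the function names are illustrative.

```python
from pathlib import Path

CGROUP_ROOT = Path("/sys/fs/cgroup")


def cgroup_v2_path(proc_cgroup_text):
    """Extract the unified (v2) cgroup path from the contents of /proc/<pid>/cgroup.

    The v2 entry has the form "0::/some/path" (hierarchy ID 0, empty controller list).
    """
    for line in proc_cgroup_text.splitlines():
        if not line:
            continue
        hierarchy_id, controllers, path = line.split(":", 2)
        if hierarchy_id == "0" and controllers == "":
            return path
    raise ValueError("no cgroup v2 entry found")


def pids_in_cgroup(cgroup_path, root=CGROUP_ROOT):
    """List every PID in a cgroup, including children that forked or daemonized."""
    procs_file = Path(root) / cgroup_path.lstrip("/") / "cgroup.procs"
    return [int(pid) for pid in procs_file.read_text().split()]


def pids_sharing_cgroup(pid):
    """All PIDs in the same cgroup as `pid` (reads the live /proc and /sys/fs/cgroup)."""
    with open(f"/proc/{pid}/cgroup") as f:
        return pids_in_cgroup(cgroup_v2_path(f.read()))
```

Stopping a service then means signalling every PID in its cgroup rather than a single remembered PID.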

I have been using it for a few weeks, and so far so good. Appreciate any feedback!



https://redd.it/1pxm7jd
@r_devops
How I added LSP validation/autocomplete to FluxCD HelmRelease values

The feedback loop on Flux HelmRelease can be painful. Waiting for reconciliation just to find out there's a typo in the values block.

This is my first attempt at technical blogging, showing how we can shift some of that burden left, to editing time. Any feedback on the post or the approach is welcome!

Post: https://goldenhex.dev/2025/12/schema-validation-for-fluxcd-helmrelease-files/
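The typo problem can be illustrated with a tiny unknown-key check. This is not the approach from the linked post (which uses LSP-based schema validation in the editor); it only shows the class of mistake that a values schema rejecting undeclared keys would flag before Flux ever reconciles. The allowed-keys tree `CHART_KEYS` and the function name are hypothetical.

```python
def find_unknown_keys(values, allowed, prefix=""):
    """Return dotted paths in `values` that are not declared in `allowed`.

    `allowed` is a nested dict mirroring the chart's values structure; a value of
    None marks a leaf (or free-form block) whose contents are not checked further.
    """
    unknown = []
    for key, value in values.items():
        path = f"{prefix}{key}"
        if key not in allowed:
            unknown.append(path)
        elif isinstance(value, dict) and isinstance(allowed[key], dict):
            unknown.extend(find_unknown_keys(value, allowed[key], path + "."))
    return unknown


# Hypothetical subset of a chart's values structure.
CHART_KEYS = {
    "replicaCount": None,
    "image": {"repository": None, "tag": None},
    "resources": None,  # free-form block, not checked further
}

values = {"replicaCount": 2, "image": {"repositry": "nginx", "tag": "1.27"}}
print(find_unknown_keys(values, CHART_KEYS))
```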

https://redd.it/1pxluyr
@r_devops
Beta testers needed: turning raw screen captures into documentation with AI-annotated screenshots

Hi everyone,

Video Demo Link: https://streamable.com/n171if

I realize this sub sees a thousand "AI writes your docs" posts a week. Usually, they are either RAG platforms (chatting with a transcript) or browser extensions that use the browser DOM while you click.

Two things I think are important here, helping both creators and end users:

1. **For the Creators (Raw Video / Zero Friction):** Unlike tools that force you to install a browser extension or record in a specific "mode," you just upload a **raw screen recording** (MP4/MOV). Use OBS, QuickTime, or whatever you want. The AI "watches" the video to handle the "Visual Tax"—smart cropping, zooming, and highlighting the clicks automatically.

2. **For the End User (The "Anti-Video" Experience):** Users hate scrubbing through a 5-minute video just to find a 10-second answer. This synthesizes that raw recording into a **precision visual guide** (HTML or GIF). They can skim, find the exact step they need, and get back to work without hitting play.

**Honest opinions and feedback:**

I’m looking for honest feedback.

* Does this specific workflow (Raw Video → Visual Guide) actually solve a pain point for you?
* Are there tools already doing this well? (If I'm reinventing the wheel, please tell me).

I’d be happy to hear if this is useful, or if you aren't interested at all. If you *are* curious, I'm looking for 5-10 beta testers to try breaking it.



DM me if interested.

Thanks!

https://redd.it/1pxpyl5
@r_devops
Any good cloud provider in Europe?

Hello DevOps folks,
Is there a good cloud provider that offers the same services as Azure, GCP, and AWS, but in Europe (and that isn't expensive as hell)?
(With the same uptime too, 99.99%.)
Thanks

https://redd.it/1pxql86
@r_devops
GitLab CI GPG Signing

I have a self-hosted GitLab instance and want a series of jobs that sign tags/commits as part of the release process, but I am currently hitting `gpg: signing failed: Not a tty`. Does anyone know how to work around this?

I have created an access token and assigned it a GPG public key via the API.

My projects have a protected 'main' branch; changes only come in via merge requests.

A series of jobs triggers when a branch has the 'release' prefix; these perform the release process, which involves tagging the build and altering the project version.

I want the CI to sign its tags and commits and push them to the release branch. The last stage of the release process opens a merge request so a person can review the CI changes before they are pulled into main. This way the normal release process can complete, but every bot change has to undergo review before it's merged.

I am using language-specific Alpine images as a base (e.g. maven:3.9.11-eclipse-temurin-25-alpine), standardizing on Alpine for scripting and trying to avoid specialised Docker images that I have to maintain.

I have managed to get the GPG key imported via scripting, but when the Maven release process runs I get the following error:

[INFO] 11/17 prepare:scm-commit-release
[INFO] Checking in modified POMs...
[INFO] Executing: /bin/sh -c cd '/builds/devsecops/maven/maven-site-resources' && 'git' 'add' '--' 'pom.xml'
[INFO] Working directory: /builds/devsecops/maven/maven-site-resources
[INFO] Executing: /bin/sh -c cd '/builds/devsecops/maven/maven-site-resources' && 'git' 'rev-parse' '--show-prefix'
[INFO] Working directory: /builds/devsecops/maven/maven-site-resources
[INFO] Executing: /bin/sh -c cd '/builds/devsecops/maven/maven-site-resources' && 'git' 'status' '--porcelain' '.'
[INFO] Working directory: /builds/devsecops/maven/maven-site-resources
[WARNING] Ignoring unrecognized line: ?? .gitlab-ci.settings.xml
[WARNING] Ignoring unrecognized line: ?? .m2/
[INFO] Executing: /bin/sh -c cd '/builds/devsecops/maven/maven-site-resources' && 'git' 'commit' '--verbose' '-F' '/tmp/maven-scm-1813294456.commit'
[INFO] Working directory: /builds/devsecops/maven/maven-site-resources
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 53.857 s
[INFO] Finished at: 2025-12-27T23:51:34Z
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-release-plugin:3.1.1:prepare (default-cli) on project resources: Unable to commit files
[ERROR] Provider message:
[ERROR] The git-commit command failed.
[ERROR] Command output:
[ERROR] error: gpg failed to sign the data:
[ERROR] [GNUPG:] KEY_CONSIDERED <removed valid key> 2
[ERROR] [GNUPG:] BEGIN_SIGNING H10
[ERROR] [GNUPG:] PINENTRY_LAUNCHED 343 curses 1.3.1 - - - - 0/0 0
[ERROR] gpg: signing failed: Not a tty
[ERROR] [GNUPG:] FAILURE sign 83918950
[ERROR] gpg: signing failed: Not a tty
[ERROR]
[ERROR] fatal: failed to write commit object

Before Script logic currently used:

- apk add --no-cache curl git
- |-
if [[ ! -z $SERVICE_ACCOUNT_NAME ]]; then
apk add --no-cache git;
git config --global user.name "${SERVICE_ACCOUNT_NAME}"
else
git config --global user.name "${GITLAB_USER_NAME}"
fi
- |-
if [[ ! -z $SERVICE_ACCOUNT_EMAIL ]]; then
git config --global user.email "${SERVICE_ACCOUNT_EMAIL}"
elif [[ ! -z $SERVICE_ACCOUNT_NAME ]]; then
git config --global user.email "${SERVICE_ACCOUNT_NAME}@noreply.${CI_SERVER_HOST}"
else
git config --global user.email "${GITLAB_USER_EMAIL}"
fi
- |-
if [[ ! -z $SERVICE_ACCOUNT_GNUGP_PRIVATE_KEY ]]; then
apk add --no-cache gnupg keychain gpg-agent pinentry pinentry-tty
GPG_OPTS='--pinentry-mode loopback'
gpg --batch --import $SERVICE_ACCOUNT_GNUGP_PRIVATE_KEY
PRIVATE_KEY_ID=$(gpg --list-packets "$SERVICE_ACCOUNT_GNUGP_PRIVATE_KEY" | awk '$1=="keyid:"{print$2}' | head -1)
git config --global user.signingkey "$PRIVATE_KEY_ID"
git config --global commit.gpgsign true
git config --global tag.gpgSign true
fi
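Reading the log, `PINENTRY_LAUNCHED ... curses` suggests gpg-agent is trying to prompt for the key's passphrase, which cannot work without a TTY. `GPG_OPTS` in the script does not help because nothing passes it to gpg: git (invoked by the Maven release plugin) launches gpg itself. git's `gpg.program` setting can point at a wrapper that adds `--batch --pinentry-mode loopback` plus a passphrase file. A minimal sketch under these assumptions: python3 is installed in the job image (the stock Maven Alpine image may not include it), the passphrase comes from a hypothetical File-type CI variable `GPG_PASSPHRASE_FILE`, and the wrapper is saved as the hypothetical `/usr/local/bin/gpg-loopback` and made executable. Some older gpg-agent setups also need `allow-loopback-pinentry` in `gpg-agent.conf`, and if the CI key has no passphrase at all, pinentry should not be needed in the first place.

```python
#!/usr/bin/env python3
"""Hypothetical gpg wrapper for git's `gpg.program` setting (a sketch, not a tested recipe).

Register it with:  git config --global gpg.program /usr/local/bin/gpg-loopback
"""
import os
import sys


def build_gpg_argv(git_args, passphrase_file=None):
    """Prepend non-interactive flags to the arguments git passes to gpg."""
    argv = ["gpg", "--batch", "--pinentry-mode", "loopback"]
    if passphrase_file:
        # Read the passphrase from a file instead of asking pinentry for a TTY.
        argv += ["--passphrase-file", passphrase_file]
    return argv + list(git_args)


def main():
    # GPG_PASSPHRASE_FILE is a hypothetical variable name (e.g. a GitLab "File" CI variable).
    argv = build_gpg_argv(sys.argv[1:], os.environ.get("GPG_PASSPHRASE_FILE"))
    # Replace this process with gpg so stdin (the data to sign), stdout and the exit code pass through.
    os.execvp(argv[0], argv)


# git always passes signing arguments; running the file without any does nothing.
if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

Execing gpg, instead of running it as a subprocess, keeps the wrapper transparent to git's status-fd parsing.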