Reddit DevOps – Telegram
I have an interview and was told there would be a practical coding part. How should I study for it?

Like, I'm thinking it will be about parsing logs and shit like that but dunno for sure. Any ideas for where I could find practice questions? Does leetcode have questions like this?
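
For a rough idea of what a log-parsing exercise might look like, here is a practice sketch. The log format (a simplified access-log line), the sample lines, and the function name `count_status_codes` are all made up for practice and do not come from any actual interview.

```python
import re
from collections import Counter

# Assumed, simplified access-log format (hypothetical):
#   <ip> - - [<timestamp>] "<method> <path> <protocol>" <status> <bytes>
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+|-)$'
)


def count_status_codes(lines):
    """Return a Counter of HTTP status codes, skipping lines that do not match."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            counts[match.group("status")] += 1
    return counts


# Made-up sample input, including one malformed line.
SAMPLE = [
    '10.0.0.1 - - [27/Oct/2025:10:00:00 +0000] "GET /health HTTP/1.1" 200 12',
    '10.0.0.2 - - [27/Oct/2025:10:00:01 +0000] "POST /login HTTP/1.1" 500 0',
    '10.0.0.1 - - [27/Oct/2025:10:00:02 +0000] "GET /missing HTTP/1.1" 404 153',
    'not a log line',
    '10.0.0.1 - - [27/Oct/2025:10:00:03 +0000] "GET /health HTTP/1.1" 200 12',
]

for status, count in sorted(count_status_codes(SAMPLE).items()):
    print(status, count)
```

Typical follow-ups on a task like this are "top N client IPs" or "error rate per minute", which reuse the same named groups.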

https://redd.it/1ogryxb
@r_devops
Raptor: Build disk images, Debian Liveboot isos and more, with a powerful docker-inspired syntax (new Free Software project)

Hello fellow DevOps..ses... DevOpsen..?... DevOps people 😅

After much work, I'm proud to finally publish my newest project: Raptor. It's GPL-v3-licensed and written in Rust.

Raptor is a tool to generate a set of layers from raptor source files. These layers can then be processed by build containers, to make liveboot isos, disk images, or anything else you can dream up a recipe for!

This opens up a lot of new possibilities for deploying software at home. For example, I'm a big fan of making custom Debian Liveboot images, since they start from a completely predictable state on every boot.

To learn more about the syntax, features and builders, there's an entire Raptor book documenting as much as possible.

Raptor is still very much in development, but it has reached a stage where it is useful for real tasks, and I would love to hear any and all feedback. Good and bad, don't hold anything back!

Want to learn more?

- Github page

- Raptor Book

https://redd.it/1ogsxa4
@r_devops
What's the simplest way to deploy a web application with continuous delivery capabilities?

looking to deploy:

react webapp - with auth, postgres database etc

already got IaC setup, RDS, VPC, Pipeline..

keep looking at Lambda@Edge SSR?

I'm using next.js with some boilerplate code already made

tried running it via s3 + cloudfront but that is proving very difficult. looked into AWS amplify but it seems to cause more problems too.

https://redd.it/1oguuds
@r_devops
Looking for the best tools, languages, and creative ideas for a “Diagnostic Box” microservices project (real-time monitoring + analytics)

Hey everyone 👋

I’m a software engineering student starting my final-year internship soon, and my main mission is to build a “Diagnostic Box” — a digital app that connects to real-time controllers over local or remote networks.

The goal is to collect diagnostic info, analyze system health, and detect failures or transient events for predictive maintenance.



Here’s what the project involves:

• Defining the architecture in microservices (backend + frontend)

• Setting up communication protocols: HTTP, REST, MQTT, OPC-UA

• Building data-processing and analytics modules

• Designing databases (relational, time-series, and document-based)

• Creating a frontend for data visualization and dashboards

• Implementing authentication, authorization, and platform hardening

• Deploying via containerization with CI/CD



I’d love your advice on:

1. Best tools & languages to use (for backend, frontend, and data storage)

2. DevOps practices or frameworks to make the setup efficient (maybe K8s, Docker Compose, etc.)

3. Any creative ideas or features that could make the app stand out (like anomaly detection, AI-based alerts, advanced dashboards, etc.)

4. Cool visualization libraries or UX ideas for displaying diagnostic data



My current stack experience: Spring Boot, Node.js, React, Docker, Jenkins, SonarQube, Prometheus, AWS, and GraphQL.

https://redd.it/1ogncij
@r_devops
VOA – Mini Secrets Manager

This is my first project in DevOps and backend: an open-source mini Secrets Manager that securely stores and manages sensitive data, environment variables, and access keys for different environments (dev, staging, prod).

It includes:
- A FastAPI backend for authentication, encryption, and auditing.
- A CLI tool (VOA-CLI) for developers and admins to manage secrets easily from the terminal.
- Dockerized infrastructure with PostgreSQL, Redis, and NGINX reverse proxy.
- Monitoring setup using Prometheus & Grafana for metrics and dashboards.

The project is still evolving, and I’d really appreciate your feedback and suggestions

GitHub Repo: https://github.com/senani-derradji/VOA

If you like the project, feel free to give it a Star!

https://redd.it/1ogz24a
@r_devops
What cloud migration challenges are keeping you up at night?

Been researching different business models and keep seeing horror stories about cloud migrations gone wrong. Security teams seem to get blindsided by performance issues, compliance gaps, and tool sprawl after moving to cloud.

What's been your biggest "oh crap" moment during a migration? Trying to learn from others' pain before I potentially face this myself.

https://redd.it/1oh0a6o
@r_devops
GlobalCVE — Aggregated CVE Data for Easier Vulnerability Tracking

If you’re managing patching, compliance, or vulnerability workflows, GlobalCVE.xyz might be useful. It pulls CVE data from NVD, MITRE, CNNVD, JVN, and others into one searchable feed.

It’s open-source (GitHub), has an API, and helps reduce duplication across fragmented CVE sources.

Not a silver bullet — just a practical tool for DevOps teams who want cleaner intel

https://redd.it/1oh4f4p
@r_devops
Custom Internal Developer Portal IDP

I created a self-service Internal Developer Platform (IDP) dashboard that lets teams provision infrastructure and software components with ease. It is built with Next.js, Express.js, and PostgreSQL, and integrates with Terraform Cloud and GitHub. I am still working on it, and I built it entirely with Cursor AI. I would welcome suggestions on how to improve it. If anyone here already works as a platform engineer, I would like to connect to exchange ideas. If you like the project, please leave a star. Thanks

https://github.com/sajjadkhan12/personal-idp-dashboard.git

https://redd.it/1oh78ja
@r_devops
How do you think your role will change over the next decade, and how are you preparing for it?

Hey everyone!

I’ve been having these thoughts lately that honestly give me a bit of anxiety. We’ve all seen how fast AI has evolved. It’s not perfect, but it’s improving at an unbelievable pace.

I work in DevOps, and I think I’ve been doing fairly well so far, but I can’t help wondering how sustainable this career really is in the long run.
The demand for DevOps engineers already feels lower compared to other tech roles, and with AI slowly taking over, I sometimes wonder how long this role will stay as relevant as it is today.

On top of that, tech jobs in general don’t feel very stable. It’s not like traditional careers where you can safely work till 60. Another thing I keep thinking about is what happens over the next decade, when a large cohort of younger engineers move into senior roles. There will be a lot of people competing for management and leadership positions, and we all know not everyone is going to get them. That makes the future feel even more uncertain.

Then there’s the financial angle. The world is more debt-driven than ever. Housing prices are through the roof, and for someone like me with no family backup, taking on a 15–20 year home loan feels risky.

So I wanted to get some honest perspectives from this community:
- How much can one really rely on a DevOps career (or tech in general) for the long term?
- How do you position yourself to stay relevant and employable as the industry keeps changing?
- What’s a realistic way to build a second stream of income as a hedge? I’ve looked into a few options, but nothing has really clicked with my skills or situation so far.

Would really appreciate hearing from others who’ve had similar thoughts, or from anyone who’s found a way to deal with this uncertainty better.

https://redd.it/1ohajid
@r_devops
How do you verify vulnerability deltas between provider hardened and official upstream images?

I started benchmarking some hardened base images against their official upstreams (Ubuntu, Alpine, Debian, etc.). In theory the CVE count drops dramatically, but scanner metadata doesn’t always align. Some vulnerabilities are silently patched by upstream backports that scanners don’t recognize. Others look fixed in the hardened version but are really just suppressed by package removal. How do you objectively measure the delta between a hardened image and the stock one?
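
One way to make the comparison less dependent on raw counts is to diff findings per (CVE, package) pair and split the vanished ones by whether the package is still installed. The sketch below assumes each scanner report has already been reduced to dicts with the hypothetical keys `cve` and `package`, and that the installed package list of the hardened image is known. It does not parse any particular scanner's output, and the CVE IDs are placeholders.

```python
def classify_delta(stock_findings, hardened_findings, hardened_packages):
    """Group findings by what happened between the stock and hardened image.

    stock_findings / hardened_findings: iterables of {"cve": str, "package": str}
    hardened_packages: set of package names installed in the hardened image
    """
    stock = {(f["cve"], f["package"]) for f in stock_findings}
    hardened = {(f["cve"], f["package"]) for f in hardened_findings}
    gone = stock - hardened

    return {
        "still_present": sorted(stock & hardened),
        "new_in_hardened": sorted(hardened - stock),
        # The finding vanished because the package is no longer installed.
        "removed_with_package": sorted(
            pair for pair in gone if pair[1] not in hardened_packages
        ),
        # Package is still installed but no longer flagged: either a real
        # patch/backport or a scanner metadata gap, so verify these by hand.
        "not_flagged_package_present": sorted(
            pair for pair in gone if pair[1] in hardened_packages
        ),
    }


# Placeholder data for illustration only.
stock = [
    {"cve": "CVE-2024-0001", "package": "openssl"},
    {"cve": "CVE-2024-0002", "package": "curl"},
    {"cve": "CVE-2024-0003", "package": "bash"},
]
hardened = [{"cve": "CVE-2024-0003", "package": "bash"}]
installed = {"openssl", "bash"}  # curl was removed from the hardened image

result = classify_delta(stock, hardened, installed)
for bucket, pairs in result.items():
    print(bucket, pairs)
```

Only the last bucket needs manual checking against package changelogs, which keeps the "suppressed by removal" cases from inflating the apparent fix count.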

https://redd.it/1ohbi0p
@r_devops
which roadmap?

Hey, I'm starting to study to become a DevOps engineer and I found two roadmaps. This one:
Become A DevOps Engineer in 2025: [A Practical Roadmap](https://devopscube.com/become-devops-engineer/)
And this one from roadmap.sh:
https://roadmap.sh/devops
I don't know which one to follow. Any help, please?

https://redd.it/1ohc5bp
@r_devops
Residency-first collaboration for regulated orgs: neutral notes on Gem Team

Regulated teams often need collaboration tools they can fully control. Gem Team is one example in this space - a secure B2B messenger that brings chat, voice, video, and file sharing together in one familiar workspace with enterprise-grade safeguards.

According to its docs, it supports meetings with up to 300 participants, including screen sharing, recording, and moderator roles. You also get presence indicators, message editing, delivery status, and native voice notes.

On the security side, it uses TLS 1.3, encryption at rest, and minimizes metadata. The platform runs on fail-safe clusters in Uptime Institute Tier III facilities. Deployment is flexible - on-prem, secure cloud, hybrid, or even fully air-gapped - with extras like IP masking and metadata shredding.

Data residency and lifecycle controls are customizable - you can choose where data is stored, set retention periods, and automate deletion on servers and endpoints. It aligns with ISO 27001, GDPR, and GCC regulations (including Qatar CRA).

Compared to cloud-only suites like Slack or Microsoft Teams, Gem Team focuses on data sovereignty, large meetings and recording out of the box, and no stated limits on message or file history.

https://redd.it/1ohee2r
@r_devops
Debugging LLM apps in production was harder than expected

I have been running an AI app with RAG retrieval, agent chains, and tool calls. Recently some users started reporting slow responses and occasionally wrong answers.

Problem was I couldn't tell which part was broken. Vector search? Prompts? Token limits? Was basically adding print statements everywhere and hoping something would show up in the logs.

APM tools give me API latency and error rates, but for LLM stuff I needed:

Which documents got retrieved from vector DB
Actual prompt after preprocessing
Token usage breakdown
Where bottlenecks are in the chain

My Solution:

Set up Langfuse (open source, self-hosted). Uses Postgres, Clickhouse, Redis, and S3. Web and worker containers.

The @observe() decorator traces the pipeline. Shows:

Full request flow
Prompts after templating
Retrieved context
Token usage per request
Latency by step
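
To illustrate the kind of per-step data a tracing decorator collects, here is a stand-in written with only the standard library. `trace_step`, `TRACE`, and the three pipeline functions are hypothetical and are not Langfuse's API; they only mimic the idea of wrapping each step and recording its input, output, and latency.

```python
import functools
import time

TRACE = []  # hypothetical in-memory sink; a real tracer would export spans


def trace_step(func):
    """Record name, input, output, and latency of each call (stand-in only)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        TRACE.append({
            "step": func.__name__,
            "input": {"args": args, "kwargs": kwargs},
            "output": result,
            "latency_s": time.perf_counter() - start,
        })
        return result
    return wrapper


@trace_step
def retrieve(query):
    # Placeholder for the vector DB lookup.
    return ["chunk about " + query]


@trace_step
def build_prompt(query, chunks):
    # Placeholder for prompt templating.
    return "Context: " + " ".join(chunks) + "\nQuestion: " + query


@trace_step
def answer(query):
    chunks = retrieve(query)
    return build_prompt(query, chunks)


answer("redis eviction")
for span in TRACE:
    print(span["step"], span["output"])
```

Inner steps are recorded before the outer one finishes, so the retrieved chunks and the final prompt can be inspected separately, which is what makes a bad retrieval visible.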

Deployment

Used their Docker Compose setup initially. Works fine for smaller scale. They have Kubernetes guides for scaling up. [Docs](https://langfuse.com/self-hosting)

Gateway setup

Added Anannas AI as an LLM gateway. Single API for multiple providers with auto-failover. Useful for hybrid setups when mixing different model sources.

Anannas handles gateway metrics, Langfuse handles application traces. Gives visibility across both layers. [Implementation Docs](https://langfuse.com/integrations/gateways/anannas)

What it caught

Vector search was returning bad chunks - embeddings cache wasn't working right. Traces showed the actual retrieved content so I could see the problem.

Some prompts were hitting context limits and getting truncated. Explained the weird outputs.

Stack

Langfuse (Docker, self-hosted)
Anannas AI (gateway)
Redis, Postgres, Clickhouse

Trace data stays local since it's self-hosted.

If anyone is debugging similar LLM issues for the first time, this might be useful.

https://redd.it/1ohf70t
@r_devops
any self hostable alternatives for code rabbit??

as mentioned in the title, I'm looking for open-source, self-hosted alternatives to coderabbit that can be deployed in our own cloud and integrated with openai, claude, or other ai api keys. the reason is straightforward: we’re a startup with cloud startup credits, so rather than purchasing coderabbit, we’d prefer to leverage these existing credits to run a similar solution ourselves.

https://redd.it/1oheri0
@r_devops
what Git flow for a repo of Ansible playbooks?

Hello all! I started a new contract where I have to administer a Consul cluster, mainly with Ansible playbooks through an AWX platform.

---

Currently there is one branch per environment and there is no difference between them.

So for each change we merge the feature branch into each environment branch, which seems cumbersome to me. On the AWX platform we have a deployment template for each branch.

We are a team of 2, sometimes 3. I started to talk about tags and release/develop branches, but the others don't know about those concepts.

I was thinking of proposing a trunk-based approach with only one main branch plus feature branches, using rc and release tags which will be linked to the AWX templates.

Our development environments could be linked to our main branch, the staging environment to an rc tag, and our production to a release tag.

Also, there is no pipeline today, so I also wanted to add a job that automatically updates the AWX templates to point at the right tags.
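
The environment-to-ref mapping described above could be resolved by a small pipeline step like this sketch. The tag naming scheme (`vX.Y.Z` for releases, `vX.Y.Z-rcN` for release candidates) and the function names are assumptions, and the step that actually updates the AWX templates is left out.

```python
import re

# Assumed tag scheme (hypothetical): v1.4.0 for releases, v1.4.0-rc2 for candidates.
TAG_RE = re.compile(r"^v(\d+)\.(\d+)\.(\d+)(?:-rc(\d+))?$")


def parse_tag(tag):
    """Return ((major, minor, patch), rc_or_None), or None if the tag does not match."""
    match = TAG_RE.match(tag)
    if not match:
        return None
    version = tuple(int(part) for part in match.group(1, 2, 3))
    rc = int(match.group(4)) if match.group(4) else None
    return version, rc


def refs_for_environments(tags, main_branch="main"):
    """Pick the git ref each environment should track."""
    releases = []
    candidates = []
    for tag in tags:
        parsed = parse_tag(tag)
        if parsed is None:
            continue  # ignore tags outside the assumed scheme
        version, rc = parsed
        if rc is None:
            releases.append((version, tag))
        else:
            candidates.append(((version, rc), tag))
    return {
        "dev": main_branch,
        "staging": max(candidates)[1] if candidates else None,
        "production": max(releases)[1] if releases else None,
    }


tags = ["v1.2.0", "v1.10.0", "v1.11.0-rc1", "v1.11.0-rc2", "nightly"]
print(refs_for_environments(tags))
```

The tag list would come from something like `git tag --list` in the job. Versions are compared numerically, so `v1.10.0` correctly sorts above `v1.2.0`.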

---

what do you think about it?
do you have any advice or another approach to suggest?

thanks!


https://redd.it/1ohcxo2
@r_devops
Did you have to leetcode to get your DevOps role and was it worth it (i.e. financially)?

I have never had to leetcode for my DevOps jobs in the past 10 years. However, none of what I’ve ever done is more than 30% scripting/coding. I have learnt TypeScript and Go just to stay competitive but no one ever tested me on it. That being said, I’m working in a LCOL region of the US and I’m in the top percentile of this region. It’s not bad. I get envious of the FAANG income-earners from time to time but I largely can’t complain. Anybody else see benefits from learning leetcode for this field in particular?

https://redd.it/1ohk7dn
@r_devops
Monitoring Jenkins Nodes with Datadog

Hi Community,

We have a Jenkins controller connected to multiple build nodes.
I’d like to monitor the health and performance of these nodes using Datadog.

I’ve explored the available Jenkins metrics and events, but haven’t been able to find a clear way to capture node-level metrics (such as connectivity, availability, or job execution health) through Datadog.
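
One possible direction, sketched under assumptions: Jenkins exposes node state as JSON at `/computer/api/json`, and a Datadog Agent with DogStatsD enabled accepts custom gauges as UDP datagrams. The metric names (`jenkins.node.online` and so on), the function names, and the trimmed sample payload below are hypothetical, and fetching the JSON with credentials is left out.

```python
import socket


def node_metrics(computer_api_payload):
    """Turn a Jenkins /computer/api/json payload into DogStatsD gauge lines."""
    lines = []
    for node in computer_api_payload.get("computer", []):
        tag = "#node:" + node["displayName"].replace(" ", "_")
        online = 0 if node.get("offline") else 1
        busy = 0 if node.get("idle", True) else 1
        executors = node.get("numExecutors", 0)
        # Metric names are made up for this sketch.
        lines.append(f"jenkins.node.online:{online}|g|{tag}")
        lines.append(f"jenkins.node.busy:{busy}|g|{tag}")
        lines.append(f"jenkins.node.executors:{executors}|g|{tag}")
    return lines


def send(lines, host="127.0.0.1", port=8125):
    """Send each line as a UDP datagram to a local DogStatsD listener."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for line in lines:
            sock.sendto(line.encode("utf-8"), (host, port))


# Sample payload trimmed to the fields used above.
payload = {
    "computer": [
        {"displayName": "build-01", "offline": False, "idle": False, "numExecutors": 4},
        {"displayName": "build-02", "offline": True, "idle": True, "numExecutors": 2},
    ]
}

for line in node_metrics(payload):
    print(line)
```

Run on a schedule, gauges like these would allow a monitor on `jenkins.node.online` grouped by the `node` tag.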

Has anyone implemented Datadog monitoring for Jenkins nodes successfully?
If so, could you please share how you achieved it or point me toward relevant configuration steps or documentation?

Appreciate any guidance or best practices you can provide!

Thanks,

https://redd.it/1ohl2v1
@r_devops
AWS Apprunner - impossible to deploy with - how do you use it??

trying to develop on app runner, cdk, python etc. w/ a webapp react and nextjs and node server and docker

keep running into "An error occurred (InvalidRequestException) when calling the StartDeployment operation: Can't start a deployment on the specified service, because it isn't in RUNNING state. "

you would think you can just cancel the deployment, but it is fully greyed out - can't do anything and it's just hanging with very limited logging.
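
For the `InvalidRequestException` part, a guard like the following sketch avoids calling `StartDeployment` while the service is still busy. It assumes a boto3 App Runner client is passed in as `client`; the function name, polling interval, and timeout are hypothetical, and it does not unblock a service that is genuinely stuck.

```python
import time


def deploy_when_running(client, service_arn, timeout_s=600, poll_s=15, sleep=time.sleep):
    """Wait for the service to report RUNNING, then start a deployment.

    client is expected to behave like boto3.client("apprunner").
    """
    waited = 0
    while True:
        status = client.describe_service(ServiceArn=service_arn)["Service"]["Status"]
        if status == "RUNNING":
            return client.start_deployment(ServiceArn=service_arn)
        if status != "OPERATION_IN_PROGRESS":
            # For example CREATE_FAILED or PAUSED: waiting will not help.
            raise RuntimeError(f"service is in state {status}, cannot deploy")
        if waited >= timeout_s:
            raise TimeoutError(f"service still {status} after {timeout_s}s")
        sleep(poll_s)
        waited += poll_s
```

The `sleep` parameter is only there so the loop can be exercised without real waiting. A service that never leaves `OPERATION_IN_PROGRESS` ends in the `TimeoutError` branch rather than an opaque API error.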

how do you properly develop on this thing?

https://redd.it/1ohnrse
@r_devops