Reddit DevOps – Telegram
DevOps Vouchers Extension

Hi

I bought a DevOps foundation and SRE exam voucher from the DevOps institute back in 2022.
A few life events happened and I wasn't able to give the exam. I'd like to attempt the exams now.

The platform was Webassessor back then. Now I think it's PeopleCert.

I emailed their customer support, and the PeopleCert team responded stating they have no records of my purchase.

I can provide the receipt emails, voucher codes and my email id for proof of payments.

Anyone who has encountered such an issue before or knows how to resolve it?

I'd really appreciate any help, because it's around $400 of hard-earned money.



https://redd.it/1qllumj
@r_devops
From DevOps Engineer to Consultant

Has anyone in Europe gone from a DevOps engineer role to self-employed work? How easy or difficult is it? Any tips on how to make the change?

https://redd.it/1qlmufo
@r_devops
curl killed their bug bounty because of AI slop. So what’s your org’s “rate limit” for human attention?

curl just shut down their bug bounty program because they were getting buried in low-quality AI “vuln reports.”

This feels like alert fatigue, but for security intake. If it’s basically free to generate noise, the humans become the bottleneck, everyone stops trusting the channel, and the one real report gets lost in the pile.

How are you handling this in your org? Security side or ops side. Any filters/gating that actually work?

Source: https://github.com/curl/curl/pull/20312

https://redd.it/1qlqgnt
@r_devops
Udemy course recommendations for a graduate platform engineer

Hi all, I'll be starting my first job as a graduate platform engineer soon.

So I'd like to ask which Udemy courses you would recommend to get a graduate platform engineer up to speed as fast as possible, as there are too many courses on Udemy to choose from.

All recommendations and advice are greatly appreciated, thanks!

https://redd.it/1qlug67
@r_devops
A CLI to Tame OWASP Dependency-Track Version Sprawl in CI/CD

Like many of you, I struggled with automating Dependency-Track. Using curl was messy, and my dashboard was flooded with hundreds of "Active" versions from old CI builds, destroying my metrics.

I built a small CLI tool (Go) to solve this. It handles the full lifecycle in one command:

* Uploads the SBOM.
* Tags the new version as Latest.
* Auto-archives old versions (sets active: false) so only the deployed version counts toward risk scores.

It’s open source and works as a single binary. Hope it saves you some bash-scripting headaches!
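For context, here is roughly what the archiving step looks like if you script it yourself against the Dependency-Track REST API. This is a sketch, not the tool's actual code; endpoint paths and the `active` payload are assumed from the API docs:

```python
import json
import urllib.request

def versions_to_archive(versions: list[dict], keep_version: str) -> list[str]:
    """Pick the project versions that should be set inactive:
    everything still active except the freshly deployed one."""
    return [v["uuid"] for v in versions
            if v.get("active", True) and v.get("version") != keep_version]

def archive_old_versions(base_url: str, api_key: str,
                         project_name: str, keep_version: str) -> None:
    """Fetch all versions of a project and PATCH the stale ones
    to active: false so they stop counting toward risk scores."""
    req = urllib.request.Request(
        f"{base_url}/api/v1/project?name={project_name}",
        headers={"X-Api-Key": api_key})
    versions = json.load(urllib.request.urlopen(req))
    for uuid in versions_to_archive(versions, keep_version):
        patch = urllib.request.Request(
            f"{base_url}/api/v1/project/{uuid}",
            data=json.dumps({"active": False}).encode(),
            headers={"X-Api-Key": api_key,
                     "Content-Type": "application/json"},
            method="PATCH")
        urllib.request.urlopen(patch)
```

Even this small slice shows why a dedicated CLI beats ad-hoc curl: you need pagination, error handling, and the upload/tag steps on top of it.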

Repo: [https://github.com/MedUnes/dtrack-cli](https://github.com/MedUnes/dtrack-cli)

https://redd.it/1qm066u
@r_devops
Is there any useful tool that allows you to test your kubernetes configs without deploying or running it locally?

Is there any useful tool that allows you to test your kubernetes configs without deploying or running it locally? I am wondering if there's anything like that, because I have a large config with a lot of resources.
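For reference, one low-effort baseline (assuming you have kubectl configured) is kubectl's built-in dry-run: `--dry-run=client` validates locally, `--dry-run=server` also runs admission checks against a live API server without persisting anything. A sketch that batch-checks a manifest directory:

```python
import subprocess
from pathlib import Path

def dry_run_cmd(manifest: str, server_side: bool = False) -> list[str]:
    """Build a kubectl dry-run invocation for one manifest file.
    --dry-run=client validates locally; =server also exercises
    admission webhooks, without creating anything."""
    mode = "server" if server_side else "client"
    return ["kubectl", "apply", f"--dry-run={mode}", "-f", manifest]

def check_manifests(directory: str) -> bool:
    """Dry-run every YAML manifest under a directory; print failures."""
    ok = True
    for path in sorted(Path(directory).rglob("*.yaml")):
        result = subprocess.run(dry_run_cmd(str(path)),
                                capture_output=True, text=True)
        if result.returncode != 0:
            print(f"FAIL {path}: {result.stderr.strip()}")
            ok = False
    return ok
```

Dedicated schema validators exist too, but this is a cheap first gate for a large config with many resources.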

https://redd.it/1qm89l8
@r_devops
How should I pivot to DevOps without losing half my salary?

Hey guys,

Here’s my situation. I’m currently working as a Cloud Engineer, mostly with IaaS, PaaS and IaC. I’ve been in the cloud space for about a year now, and overall I have around 5–6 years of IT experience.

On the cert side, I have AZ-900, AZ-104, AZ-305, and AZ-400.

In my current role I worked my way up to a medior level, but my real goal is to move into DevOps. I know that means I need solid Docker and Kubernetes knowledge, so I’ve started learning and practicing them in my limited free time. I’ve even built some small projects already.

The problem is that my current salary is around standard market level, which is great, but when I apply for DevOps roles, I usually run into two outcomes:

1. I don’t even get invited to an interview,

2. I get an interview, but they offer me about half my current salary because they would hire me as a junior DevOps engineer due to my lack of hands-on experience with Docker and Kubernetes.

Right now I simply can’t afford to cut my salary in half. On top of that, my current company doesn’t really use Docker or Kubernetes, so I don’t have the chance to gain real work experience with them.

I know the market is shit for switching jobs right now, but living in a country where salaries are already much lower than in most of Europe makes this even more frustrating. Honestly, it’s hard to see a clear way forward.

What would you do in my situation? How would you successfully pivot into DevOps without taking such a big financial step back? Any advice would be really appreciated.

https://redd.it/1qmcemd
@r_devops
Ingress NGINX retires in March, no more CVE patches, ~50% of K8s clusters still using it

Talked to Kat Cosgrove (K8s Steering Committee) and Tabitha Sable (SIG Security) about this. Looks like a ticking bomb to me, as there won't be any security patches.

TL;DR: Maintainers have been publicly asking for help since 2022. Four years. Nobody showed up. Now they're pulling the plug.

It's not that easy to know if you are running it. There's no drop-in replacement, and a migration can take quite a bit of work.

Here is the interview if you want to learn more https://thelandsca.pe/2026/01/29/half-of-kubernetes-clusters-are-about-to-lose-security-updates/

https://redd.it/1qqkqzn
@r_devops
Observability is great but explaining it to non-engineers is still hard

We’ve put a lot of effort into observability over the years - metrics, logs, traces, dashboards, alerts. From an engineering perspective, we usually have good visibility into what’s happening and why.

Where things still feel fuzzy is translating that information to non-engineers. After an incident, leadership often wants a clear answer to questions like “What happened?”, “How bad was it?”, “Is it fixed?”, and “How do we prevent it?” - and the raw observability data doesn’t always map cleanly to those answers.

I’ve seen teams handle this in very different ways:

* curated executive dashboards
* incident summaries written manually
* SLOs as a shared language
* engineers explaining things live over Zoom
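SLOs in particular produce numbers that translate directly into the questions leadership asks. As an illustrative sketch (function and field names are mine, not from any specific tool), the translation is just arithmetic on the error budget:

```python
def error_budget_report(slo_target: float, total_requests: int,
                        failed_requests: int) -> dict:
    """Turn raw request counts into the numbers leadership asks about:
    how much unreliability was budgeted, and how much of it we burned."""
    # e.g. a 99.9% SLO over 10M requests budgets 10k failures
    allowed_failures = total_requests * (1 - slo_target)
    burned = failed_requests / allowed_failures
    return {
        "availability_pct": round(100 * (1 - failed_requests / total_requests), 3),
        "budget_burned_pct": round(100 * burned, 1),
        "budget_left_pct": round(100 * max(0.0, 1 - burned), 1),
    }
```

"The incident burned 40% of this quarter's error budget" lands with non-engineers in a way that a latency histogram never will.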

For those of you who’ve found this gap, what actually worked for you?

Do you design observability with "business communication" in mind, or do you treat that translation as a separate step after the fact?

https://redd.it/1qqfjzu
@r_devops
Yet another Lens / Kubernetes Dashboard alternative

The team at Skyhook and I got frustrated with the current tools - Lens, OpenLens/Freelens, Headlamp, Kubernetes Dashboard... we found all of them lacking in various ways. So we built yet another one and thought we'd share :)

Note: this is not what our company is selling, we just released this as fully free OSS not tied to anything else, nothing commercial.

Tell me what you think, takes less than a minute to install and run:

https://github.com/skyhook-io/radar

https://redd.it/1qqk10r
@r_devops
our ci/cd testing is so slow devs just ignore failures now

we've got about 800 automated tests running in our ci/cd pipeline and they take forever. 45 minutes on average, sometimes over an hour if things are slow.

worse than the time is the flakiness. maybe 5 to 10 tests fail randomly on each run, always different ones. so now devs just rerun the pipeline and hope it passes the second time. which obviously defeats the purpose.

we're trying to do multiple deploys per day but the qa stage has become the bottleneck. either we wait for tests or we start ignoring failures which feels dangerous.

tried parallelizing more but we hit resource limits. tried being more selective about what runs on each pr but then we miss stuff. feels like we're stuck between slow and unreliable.
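one common first step is quarantining flaky tests based on history rather than gut feel: any test that both passes and fails on runs of the same commit is a rerun-and-hope candidate. a minimal sketch of the detection (assuming you already log per-test results per pipeline run; names are illustrative):

```python
from collections import defaultdict

def find_flaky_tests(runs: list[dict[str, bool]], min_runs: int = 3) -> list[str]:
    """Given per-run results (test name -> passed) for the same code,
    flag tests that both passed and failed: the quarantine candidates."""
    outcomes: defaultdict[str, set[bool]] = defaultdict(set)
    seen: defaultdict[str, int] = defaultdict(int)
    for run in runs:
        for test, passed in run.items():
            outcomes[test].add(passed)
            seen[test] += 1
    # flaky = observed both outcomes, with enough runs to trust the signal
    return sorted(t for t, o in outcomes.items()
                  if o == {True, False} and seen[t] >= min_runs)
```

quarantined tests still run but don't block the pipeline, so the remaining red builds actually mean something again.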

anyone solved this? need tests that run fast, don't fail randomly, and actually catch real issues.

https://redd.it/1qr00b5
@r_devops
made one rule for PRs: no diagram means no review. reviews got way faster.

tried a small experiment on our repo. every PR needed a simple flow diagram, nothing fancy, just how things move. surprisingly, code reviews became way easier. fewer back-and-forths, fewer “wait what does this touch?” moments. seeing the flow first changed how everyone read the code.

curious if anyone else here uses diagrams seriously in dev workflows?



https://redd.it/1qr131v
@r_devops
Build once, deploy everywhere and build on merge.

Hey everyone, I'd like to ask you a question.

I'm a developer learning some things in the DevOps field, and at my job I was asked to configure the CI/CD workflow. Since we have internal servers, and the company doesn't want to spend money on anything cloud-based, I looked for as many open-source and free solutions as possible given my limited knowledge.

I configured a basic IaC setup with bash scripts to manage ephemeral self-hosted runners from GitHub (I should have used GitHub's Actions Runner Controller, but I didn't know about it at the time), the Docker registry to maintain the different repository images, and the workflows in each project.

Currently, the CI/CD workflow is configured like this:

A person opens a PR, Docker builds it, and that build is sent to the registry. When the PR is merged into the base branch, Docker deploys based on that built image.

But if two different PRs branch off the same base: when PR A is merged, the deployment happens with the changes from PR A. If PR B is merged later, the deployment happens with the changes from PR B but without the changes from PR A, because PR B's image was built earlier, from a base that didn't yet include PR A.

For the changes from PR A and PR B to appear in a deployment, a new PR C must be opened after the merge of PR A and PR B.

I did it this way because, researching it, I saw the concept of "Build once, deploy everywhere".

However, this flow doesn't seem very productive, so after researching again I came across the idea of "Build on Merge". But wouldn't Build on Merge go against the "build once, deploy everywhere" principle?
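One common reading is that the two are compatible: "build once, deploy everywhere" means the artifact that reaches production is built exactly once and then promoted unchanged through environments. It doesn't say when that build happens. Building on merge satisfies it, because the merge commit is what you actually deploy, so it naturally includes both PR A and PR B. A hypothetical GitHub Actions sketch of that split (registry host and deploy script are placeholders):

```yaml
# Sketch: PR builds validate only; the merge build is the deployable artifact.
on:
  pull_request:          # PR builds: catch breakage early, image is throwaway
  push:
    branches: [main]     # merge build: built once from the merge commit

jobs:
  build:
    runs-on: [self-hosted]
    steps:
      - uses: actions/checkout@v4
      - run: docker build -t registry.local/app:${{ github.sha }} .
      - if: github.ref == 'refs/heads/main'
        run: docker push registry.local/app:${{ github.sha }}

  deploy:
    needs: build
    if: github.ref == 'refs/heads/main'
    runs-on: [self-hosted]
    steps:
      # staging and prod both receive this exact tag:
      # built once on merge, deployed everywhere unchanged
      - run: ./deploy.sh registry.local/app:${{ github.sha }}
```

The key point is tagging by the merge commit SHA and promoting that exact image, never rebuilding per environment.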

What flow do you use and what tips would you give me?

https://redd.it/1qqhrbs
@r_devops
ECR alternative

Hey all,

We’ve been using AWS ECR for a while and it was fine, no drama. Now I’m starting work with a customer in a regulated environment and suddenly “just a registry” isn’t enough.

They’re asking how we know an image was built in GitHub Actions, how we prove nobody pushed it manually, where scan results live, and how we show evidence during audits. With ECR I feel like I’m stitching together too many things and still not confident I can answer those questions cleanly.

Did anyone go through this? Did you extend ECR or move to something else? How painful was the migration and what would you do differently if you had to do it again?

https://redd.it/1qr2zq2
@r_devops
What internal tool did you build that’s actually better than the commercial SaaS equivalent?

I feel like the market is flooded with complex platforms, but the best tools I see are usually the noscripts and dashboards engineers hack together to solve a specific headache.
Who here is building something on the side (or internally) that actually works?

https://redd.it/1qr4ipm
@r_devops
Argo CD Image updater with GAR

Hi everyone! I need help finding resources on the Argo CD Image Updater with Google Artifact Registry, ideally the whole setup. I read the official docs; they have detailed steps for ACR on Azure, but I couldn't find anything specifically for GCP. Can anyone suggest a good blog on this setup, or maybe lend a helping hand?

https://redd.it/1qr6j5n
@r_devops
AGENTS.md for tbdflow: the Flowmaster

I’ve been experimenting with something a bit meta lately: giving my CLI tool a Skill.

A Skill is a formal, machine-readable description of how an AI agent should use a tool correctly. In my case, I wrote a SKILL.md for tbdflow, a CLI that enforces Trunk-Based Development.

One thing became very clear very quickly:
as soon as you put an AI agent in the loop, vagueness turns into a bug.

Trunk-Based Development only works if the workflow is respected. Humans get away with fuzzy rules because we fill in gaps with judgement, but agents don't. They follow whatever boundaries you actually draw, and if you are not very explicit about what _not_ to do, they will do it...

The SKILL.md for tbdflow does things like:

* Enforce short-lived branches
* Standardise commits
* Reduce Git decision-making
* Maintain a fast, safe path back to trunk (main)

What surprised me was how much behavioural clarity and explicitness suddenly matters when the “user” isn’t human.

Probably something we should apply to humans as well, but I digress.

If you don’t explicitly say “staging is handled by the tool”, the agent will happily reach for git add.

And that is because I (the skill author) didn’t draw the boundary.

Writing the Skill forced me to make implicit workflow rules explicit, and to separate intent from implementation.

From there, step two was writing an AGENTS.md.

`AGENTS.md` is about who the agent is when operating in your repo: its persona, mission, tone, and non-negotiables.

The final line of the agent contract is:

>Your job is not to be helpful at any cost.

>Your job is to keep trunk healthy.

Giving tbdflow a Skill was step one, giving it a Persona and a Mission was step two.

Overall, this has made me think of Trunk-Based Development less as a set of practices and more as something you design for, especially when agents are involved.

Curious if others here are experimenting with agent-aware tooling, or encoding DevOps practices in more explicit, machine-readable ways.

SKILL.md:

https://github.com/cladam/tbdflow/blob/main/SKILL.md

AGENTS.md:

https://github.com/cladam/tbdflow/blob/main/AGENTS.md

https://redd.it/1qr76ye
@r_devops
Python Crash Course Notebook for Data Engineering

Hey everyone! Some time back, I put together a crash course on Python specifically tailored for data engineers. I hope you find it useful! I have been a data engineer for 5+ years and went through various blogs and courses, along with my own experience, to make sure I cover the essentials.

Feedback and suggestions are always welcome!

📔 Full Notebook: Google Colab

🎥 Walkthrough Video (1 hour): YouTube - Already has almost 20k views & 99%+ positive ratings

💡 Topics Covered:

1. Python Basics - Syntax, variables, loops, and conditionals.

2. Working with Collections - Lists, dictionaries, tuples, and sets.

3. File Handling - Reading/writing CSV, JSON, Excel, and Parquet files.

4. Data Processing - Cleaning, aggregating, and analyzing data with pandas and NumPy.

5. Numerical Computing - Advanced operations with NumPy for efficient computation.

6. Date and Time Manipulations - Parsing, formatting, and managing date-time data.

7. APIs and External Data Connections - Fetching data securely and integrating APIs into pipelines.

8. Object-Oriented Programming (OOP) - Designing modular and reusable code.

9. Building ETL Pipelines - End-to-end workflows for extracting, transforming, and loading data.

10. Data Quality and Testing - Using `unittest`, `great_expectations`, and `flake8` to ensure clean and robust code.

11. Creating and Deploying Python Packages - Structuring, building, and distributing Python packages for reusability.

Note: I have not considered PySpark in this notebook, I think PySpark in itself deserves a separate notebook!

https://redd.it/1qr93s8
@r_devops
Devops Project Ideas For Resume

Hey everyone! I’m a fresher currently preparing for my campus placements in about six months. I want to build a strong DevOps portfolio—could anyone suggest some solid, resume-worthy projects? I'm looking for things that really stand out to recruiters. Thanks in advance!

https://redd.it/1qr5t6q
@r_devops
How do you track and manage expirations at scale? (certs, API keys, licenses, etc.)

Hey folks,

I’m curious how other teams handle time-bound assets in real life. Things like:

* TLS certificates
* API keys and credentials
* Licenses and subnoscriptions
* Domains
* Contracts or compliance documents

In theory this stuff is simple. In practice, I’ve seen outages, broken pipelines, access loss, and last minute fire drills because something expired and nobody noticed in time.

I’ve worked in a few DevOps and SRE teams now, and I keep seeing the same patterns:

* spreadsheets that slowly rot
* shared calendars nobody owns
* reminder emails that get ignored
* “Oh yeah, X was supposed to renew that”
* "There are too many tools for that, and people don't communicate properly about new time-bound assets or the new places where they are used"
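For TLS certs at least, even a tiny scheduled script beats a rotting spreadsheet. A minimal sketch using only Python's stdlib (the warning threshold and host list are illustrative):

```python
import socket
import ssl
from datetime import datetime, timezone

def days_left(not_after: str, now: datetime) -> int:
    """Days until a cert's notAfter timestamp, in the format the
    ssl module returns (e.g. 'Jun  1 12:00:00 2026 GMT')."""
    expires = datetime.strptime(not_after, "%b %d %H:%M:%S %Y %Z")
    return (expires.replace(tzinfo=timezone.utc) - now).days

def check_cert(host: str, port: int = 443, warn_days: int = 30) -> int:
    """Fetch the live TLS cert for a host and return days to expiry,
    warning when it crosses the threshold."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=5) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    remaining = days_left(cert["notAfter"], datetime.now(timezone.utc))
    if remaining < warn_days:
        print(f"WARN {host}: cert expires in {remaining} days")
    return remaining
```

Run it from cron or a pipeline against your host inventory and you at least catch the loudest failure mode. The harder part, as the list above shows, is everything that doesn't expose itself over a socket.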

So I wanted to ask the community:

**How are you handling this today?**

Some specific questions I’m really interested in:

* Where do you store expiration info? Code, CMDB, wiki, spreadsheet, somewhere else?
* Do you track ownership or is it mostly implicit?
* How far in advance do you alert, if at all?
* Are expirations tied into incident response or ticketing?
* What’s broken for you today that you’ve just learned to live with?

I’m especially curious how this scales once you’re dealing with:

* multiple teams
* multiple cloud providers
* audits and compliance requirements
* people rotating in and out

If you’ve had a failure caused by an expiration, I’d love to hear what happened and what you changed afterward, if anything.

Context: I’m a DevOps engineer myself. After getting burned by this problem a few too many times, I ended up building a small tool focused purely on expiration lifecycle management. I won’t pitch it here unless people ask. The goal of this post is genuinely to learn how others are solving this today.

Looking forward to the war stories and lessons learned.

https://redd.it/1qrdfm8
@r_devops
Resources for Debugging Best Practices

Do you guys have any books, papers, videos or other resources to develop a more disciplined or systematic approach to debugging, either in the infrastructure / system space or just general software development? I feel like I spend a huge amount of time debugging, and while learning through experience is great, I’d love to know if there were any books that you found useful.

Edit: when I say debugging, I guess I should broaden it to also include troubleshooting - "debugging" suggests mostly code or Terraform files or something, but maybe there are more basic principles to think about.

https://redd.it/1qreise
@r_devops