Reddit DevOps – Telegram
How do I reduce wasted runs on GitHub Actions?

This is from one repo that has not been very active over the last 7 days:

- 39 total CI minutes

- 14 minutes were non-productive

- Biggest driver: failed/re-run workflows and duplicate runs for the same PR



We always assumed “this is normal”, but with billing changes, it adds up fast.

I am looking into some tools that could help with this, but I am curious how others are handling this...

- Do you actively cancel outdated PR runs?

- Or just accept the cost as the price of speed?
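
For cancelling outdated PR runs, GitHub Actions has a workflow-level `concurrency` setting with `cancel-in-progress: true`. Before turning it on, it can help to estimate how many minutes it would have saved. The sketch below is one rough way to do that in Python, assuming you have already exported run history into simple records; the `Run` fields and the `cancellable_minutes` name are hypothetical, not part of any GitHub API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Run:
    pr: int        # pull request number (hypothetical field name)
    start: float   # start time, in minutes since an arbitrary reference point
    end: float     # end time, same unit as start


def cancellable_minutes(runs: list[Run]) -> float:
    """Estimate minutes spent on runs after a newer run for the same PR had started."""
    by_pr: dict[int, list[Run]] = defaultdict(list)
    for run in runs:
        by_pr[run.pr].append(run)

    saved = 0.0
    for pr_runs in by_pr.values():
        pr_runs.sort(key=lambda r: r.start)
        # Compare each run with the next run started for the same PR.
        for current, newer in zip(pr_runs, pr_runs[1:]):
            if newer.start < current.end:
                # The older run kept going after it was already outdated.
                saved += current.end - newer.start
    return saved


history = [Run(pr=1, start=0, end=10), Run(pr=1, start=4, end=12), Run(pr=2, start=0, end=5)]
print(cancellable_minutes(history))
```

This only counts overlap between runs on the same PR; failed runs and manual re-runs would need their own accounting.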



https://redd.it/1pppfsd
@r_devops
My "just don't f***ing dance" moment: I just automated 90% of our L2 maintenance team workload and I'm keeping it to myself

I was an early adopter of the AI hype, LLMs and code assistants like Copilot, and I honestly believed the narrative that AI won't replace humans, that it's more of a productivity multiplier, something to augment devs and ops people and take over boring tasks like writing markdown or docstrings.
Then came the agentic stuff, which at first I didn't buy into much because I was skeptical about vendor lock-in. Then MCP came along and suddenly my entire workflow at work changed (at the time I was saying it evolved, but I'm not in that mood now): Terraform MCP, Context 7, k8s MCP. I was very impressed, and still very optimistic about what we could do with it and how it would improve our daily lives. It connected to our actual infrastructure; it wasn't a gimmick anymore.
Then came Opus 4.5 and Gemini 3 Pro. People called them beasts, so I did what I always do: I pushed the envelope to see how far it could go.

**And it went far ..**

I built an agentic app that monitors our nightly CI jobs, watches for failures and errors, and reruns jobs or pushes small fixes when necessary.
It also monitors certain apps on our k8s cluster and runs the necessary fixes. These fixes aren't magic; they're everything we documented as guidelines for our L2 maintenance team. The AI agent just .. does it. Sometimes better than humans, because it creates bug tickets with a level of detail I have never seen from a person.

I was beyond hyped, already planning my presentations and my demo to show everyone how great all of this is.

**Then it hit me.**

I kept thinking about the scene from The Big Short where Brad Pitt, as Ben, tells the young bankers who have started celebrating their bet against the housing market: *"If we're right, people lose homes. People lose jobs. People lose retirement savings, people lose pensions. You know what I hate about fucking banking? It reduces people to numbers. Here's a number - every 1% unemployment goes up, 40,000 people die, did you know that?"*

What I've made doesn't bet against the housing market, but against the L2 maintenance team: actual people in our workforce, people I enjoy having coffee with.

So I kept it to myself .. but here's the thing: if I don't share it, someone else will, or some other company will offer it as a service with the promise of cost reduction and whatnot. My silence won't help anyone; it just means that I'm not the one to pull the trigger.

I don't have a clear idea of how to feel about this and I'm not here to moralize at anyone. I'm just ... really, really confused.

This technology is genuinely amazing, but somehow, I don't feel like dancing.

**TL;DR:** Built an agentic SRE tool and it effectively was able to replace the whole L2 maintenance team. Realized the human cost and decided to keep it to myself.


UPDATE:
I will pull my resources together and push them to a GitHub repository. I did my tests in a GCP project, where I deployed a Cloud Run service with a service account that has IAM rights on one of our less important GKE clusters. It interfaces with Gemini and uses an already existing token for GitLab.
I use git MCP, kube MCP, and Google Cloud observability MCP, plus the GitLab Python SDK.
For the guidelines, I exported them as markdown files and added them to the app to use as context.
The bug tickets were test tickets sent via an HTTP request to Jira using my PAT.

https://redd.it/1ppr3qu
@r_devops
Terraform, Terragrunt ... and Terratest?

I'm tasked with figuring out how to integrate terratest (TT) into a moderately large terraform (TF) repo for AWS resources. The deployment and orchestration is all done with terragrunt (TG) (it passes in the variables, etc.). The organization itself has fully adopted using TG with TF.

My question to you all is about using terratest for integration testing of terraform modules that are themselves orchestrated via terragrunt. My searches for best practices, lessons learned, etc. have returned few useful results. Perhaps most telling, no reddit posts have surfaced that either promote or decry using TF+TG+TT. Even the terratest documentation from Gruntwork has zero mention of terragrunt, and there are zero examples in their provided repositories of using TG+TT.

I'm wondering if anyone has gone down this path before and has any lessons learned they could share (good or bad).

Thanks in advance

https://redd.it/1ppqsxv
@r_devops
Is this normal in DevOps?

I joined my organization last week as a DevOps intern. On the 2nd day I worked on someone's project and built a custom dashboard on CloudWatch. On the 3rd day I got assigned to a project and was given full access from stage to prod, plus a Mac for work and a 5-day working week. Is this the best life? 🤔 Or am I missing something....

https://redd.it/1ppuvza
@r_devops
Jr DevOps profile. Is it enough?

Hello guys,

I am trying to get my first job in DevOps, but I wonder whether my profile is even eligible for a company right now. I would really like the opinion of the pros on whether I am the kind of person you would hire for a jr role. My assets are:

I'm a Telecommunications Engineer from the biggest engineering university in Spain (Madrid). I also studied in Sweden for a year, in case that counts for you.

Focus on networking and programming. I know networking and troubleshooting with Wireshark, and languages like Java, Python, C...

I have only 1 year of experience as an engineer, at a very big tech company, doing things that are hardly related to DevOps. I have good referrals from my former colleagues at that job.

I just got the AWS Cloud Practitioner certificate.

Now, I know this is enough to be hired here, but I am trying to move to another country in the EU and I am not sure if this is enough to get interviews. I don't even care about the money right now, I just want to start.

In the meantime I am working on small projects on Linux and learning basic DevOps skills, to see if I can build myself a repository...






https://redd.it/1ppvrjk
@r_devops
At what headcount did you feel you lost the "Ground Truth" of your engineering org?

There seems to be a specific breaking point in engineering orgs.

When we were 20 people, I knew everyone’s name, their strengths, and exactly what they shipped yesterday.
Now that we are pushing 60+, I feel like I’m relying entirely on layers of management (Directors/EMs) to tell me what’s going on. It feels like a game of "Broken Telephone"—by the time the signal hits my desk, it’s polished, biased, and often late.

I’m trying to avoid hiring a Chief of Staff (feels expensive right now), but I need a way to get raw visibility without micromanaging or skipping levels.

How do you guys stay plugged into the raw data (Jira/Git signals) without becoming a micromanager?

https://redd.it/1ppwxli
@r_devops
Migrating from AppDynamics to Datadog

I'm wondering if anyone has done a migration from AppDynamics to Datadog and can provide some insight into best practices for scripting this. I need to parse existing AppDynamics agent config.xml files, pull relevant fields, and place those into the new Datadog agent yaml config file when it is installed.
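
A minimal sketch of the parse-and-emit step, using only the Python standard library. Everything about the input here is an assumption: the element names in `FIELD_MAP` (`application-name`, `tier-name`, `node-name`) and the tag keys they map to are hypothetical placeholders that would need to be replaced with whatever your actual AppDynamics files contain. On the output side it only writes the `tags` list of the Datadog agent config and leaves every other setting alone.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical mapping: element name in the AppDynamics XML -> Datadog tag key.
FIELD_MAP = {
    "application-name": "service",
    "tier-name": "tier",
    "node-name": "node",
}


def extract_tags(xml_text: str) -> list[str]:
    """Pull the mapped fields out of the XML and return them as key:value tags."""
    root = ET.fromstring(xml_text)
    tags = []
    for element_name, tag_key in FIELD_MAP.items():
        element = root.find(f".//{element_name}")
        if element is not None and element.text and element.text.strip():
            tags.append(f"{tag_key}:{element.text.strip()}")
    return tags


def render_tags_yaml(tags: list[str]) -> str:
    """Render a YAML fragment; json.dumps gives safely double-quoted scalars."""
    lines = ["tags:"]
    lines += [f"  - {json.dumps(tag)}" for tag in tags]
    return "\n".join(lines) + "\n"


sample = (
    "<controller-info>"
    "<application-name>shop</application-name>"
    "<tier-name> web </tier-name>"
    "</controller-info>"
)
print(render_tags_yaml(extract_tags(sample)))
```

Keeping the mapping in one dictionary makes it easy to review which fields are carried over before running this across many hosts.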

https://redd.it/1ppy748
@r_devops
Need help choosing a stack for a SaaS that has the potential to be a superapp; priority is performance and response speed, not animations and useless features that will slow down my app

I have an idea for a SaaS and I'm looking for technologies to build it and make it real, but I'm unsure about a few things. My priority is performance and user experience, because it has the potential to be a superapp. So which frontend tech should I use? For the backend, I want to use Node.js (Express), plus FastAPI for ML tasks. Is that the best option, with a REST API and JSON as the data format? For databases I will use PostgreSQL, MongoDB, and Redis.

https://redd.it/1ppx81x
@r_devops
What unfinished side-project are you hoping to finally finish over the holidays?

With the holidays coming up, I'm curious what side-projects everyone has sitting in the "almost done" (or "started... then life happened") pile.

It could be:

- A repo that's 80% complete
- An app missing "just one more feature"
- A tool you built for yourself that never got polished
- Something you want to open-source but haven't yet

What is it, and what's stopping you from finishing it?

Bonus points if you drop a link or explain what "done" actually looks like for you.

Hoping this thread gives some motivation (and maybe accountability) to finally ship something before the new year.

https://redd.it/1pq2zhu
@r_devops
The Future of Kubernetes Networking: Gateway API Explained

Hi All,

I put together a video explaining Gateway API purely from an architectural and mental-model perspective (no YAML deep dive, no controller comparison).

Video: The Future of Kubernetes Networking: Gateway API Explained

Feedback and comments (good and bad) are welcome :-)


Cheers

https://redd.it/1pq4vkq
@r_devops
Looking for Career Advice


Hello, everyone.

I don’t know where to begin, but I’ll try. I want to learn DevOps for the long term. However, there are programming courses in my city that also promise to hire you if you end up being the best student. The programming course has 3 phases and costs $110 per month; my salary is around $650 in my country.

Currently, I don’t know what to do: save money to learn DevOps ($210 per month), or go for the programming course, where I might end up getting hired if I perform the best.


https://redd.it/1pq6ng7
@r_devops
Inference is underpriced. Designing systems as if that’s permanent feels risky.

From an ops perspective, something about current AI system design feels off.



Inference for LLM-backed systems is often priced below marginal cost right now to accelerate adoption. The gap is being covered by venture capital.



That creates incentives that look fine short-term but feel risky operationally:

- Heavy fan-out and retry loops instead of tighter control

- Latency + quality prioritized over efficiency

- Deep coupling to a single provider’s API semantics

- Little pressure to build portability, guardrails, or eval infra



We’ve seen this movie before (fiber glut, early cloud pricing).



The interesting question isn’t “is AI overhyped?”

It’s "which systems survive when pricing and providers normalize?"



Curious how other teams are thinking about this from a durability and cost containment perspective?


Wrote up a clearer explanation + simple diagram here if helpful.

https://redd.it/1pq814z
@r_devops
Hiring JavaScript / React Developer (2+ Years Experience) | Long-Term Contract

We’re expanding our development team and are searching for a skilled JavaScript & React developer who’s interested in a long-term hourly engagement.



💼 Role Overview

Develop and enhance front-end features using React and modern JavaScript (ES6+)

Translate designs and requirements into clean, reusable components

Integrate APIs and handle dynamic data flows

Improve performance, fix bugs, and refactor existing code

Communicate progress clearly and meet agreed timelines



Requirements

Minimum 2+ years of professional experience with JavaScript and React

Strong understanding of hooks, component lifecycle, and state management

Experience working with RESTful APIs

Ability to work independently and take ownership of tasks

Clear communication and reliability

Bonus Skills

Next.js, TypeScript, Redux, or similar tools

Familiarity with Git and collaborative workflows

Eye for UI/UX details

💰 Compensation

Hourly rate: $35 – $42

Consistent workload with long-term potential

📩 To Apply

Please include:

A short intro about your experience

Relevant portfolio, GitHub, or live project links

Your availability (hours/week)

We’re looking for someone dependable who wants to grow with an ongoing project—not a short-term gig. If that’s you, let’s talk.

https://redd.it/1pqb4j2
@r_devops
Is Bare Metal Kubernetes Worth the Effort? An Engineer's Experience Report

I wrote an experience report on setting up a production-ready, high-availability k3s cluster on OVHcloud bare metal servers. My goal was to significantly reduce infrastructure costs compared to managed services like AWS EKS, and this setup costs just $178/month compared to $550+/month for a comparable cloud setup.

The post is a practical walk-through covering:

Provisioning servers and a private network with Terraform.
Building a resilient 3-node k3s control plane with HAProxy and Keepalived.
Using Cloudflare for cheap load balancing.
Securing the cluster with mTLS and Kubernetes Network Policies.

Here is the link: https://academy.fpblock.com/blog/ovhcloud-k8s/

https://redd.it/1pqby1b
@r_devops
What are some examples of devops/SRE/cloud projects to pin on GitHub?

Is having stuff on GitHub even necessary for us? I mean, what kind of stuff would be there? I just noticed that I had mostly front-end code (React), which probably made me look like a React developer, not the DevOps/SRE/cloud guy that I am. Anyway, I'm open for jobs and just wondering what works these days.

https://redd.it/1pqdsht
@r_devops
Built an open-source CLI to deterministically remove secrets from logs (no ML, no guessing)

Hi r/devops,

I’ve been working on a small open-source CLI called **LogShield**.
The idea was to explore whether **deterministic, rule-based log sanitization** can be safer than probabilistic masking when logs are shared or shipped.

Key characteristics:

* Reads from **stdin**, writes sanitized logs to **stdout**
* Explicit, inspectable rules (no ML, no heuristics)
* Same input → same output (deterministic)
* Designed to minimize false positives that break debugging
* Works as a drop-in filter in pipelines

Typical use cases I had in mind:

* Sanitizing logs before uploading CI/CD artifacts
* Preventing accidental secret leaks when logs are shared in tickets or Slack
* Pre-filtering logs before shipping to third-party services

Example:

    cat app.log | logshield scan --strict > safe.log


The ruleset is intentionally conservative and fully inspectable.
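
To illustrate the general idea of deterministic, rule-based redaction (this is not LogShield's code or ruleset; the rule names and patterns below are made up for the example), a stdin-to-stdout style filter can be sketched in a few lines of Python:

```python
import re
from typing import Iterable, Iterator

# Illustrative rules only; these are NOT LogShield's actual rules.
# Rules are applied in list order, so the same input always yields the same output.
RULES = [
    ("AWS_ACCESS_KEY", re.compile(r"\bAKIA[0-9A-Z]{16}\b")),
    ("BEARER_TOKEN", re.compile(r"(?i)\bbearer\s+[A-Za-z0-9._~+/-]+=*")),
    ("PASSWORD", re.compile(r"(?i)\b(?:password|passwd|pwd)=\S+")),
]


def sanitize(line: str) -> str:
    """Replace every match of every rule with a fixed, labelled placeholder."""
    for name, pattern in RULES:
        line = pattern.sub(f"<REDACTED:{name}>", line)
    return line


def sanitize_stream(lines: Iterable[str]) -> Iterator[str]:
    """Sanitize a stream of lines lazily, one line at a time."""
    for line in lines:
        yield sanitize(line)


for out in sanitize_stream(["login password=hunter2 done\n", "nothing secret here\n"]):
    print(out, end="")
```

To use it as a pipeline filter, one option is `sys.stdout.writelines(sanitize_stream(sys.stdin))`. Because the rules are an explicit list, anyone can read exactly what will and will not be redacted.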

I’d really appreciate feedback from a DevOps perspective on:

* Whether deterministic redaction is something you’d trust in pipelines
* Edge cases where this would break real-world workflows
* Cases where you’d prefer masking to *fail closed* vs *fail open*

Repo: [https://github.com/afria85/LogShield](https://github.com/afria85/LogShield)
Landing page: [https://logshield.dev](https://logshield.dev)

Thanks — looking forward to criticism.

https://redd.it/1pqep6a
@r_devops
Finding newbits & netnum in Terraform's cidrsubnet()

Does anyone have a quick way, either within TF or externally, to take the base_cidr and your "desired cidr" and spit out the needed newbits and netnum?

If the subnets are fairly simple I can usually just guess them and verify using the console. Anything more complex I calculate by hand.


So I'm hoping there's something more sophisticated available (short of writing my own tool).
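
For reference, the arithmetic is small enough to sketch with Python's standard `ipaddress` module: `newbits` is the difference between the two prefix lengths, and `netnum` is the desired network's offset from the base, counted in blocks of the desired size. The function name `cidrsubnet_args` is made up for this example.

```python
import ipaddress


def cidrsubnet_args(base_cidr: str, desired_cidr: str) -> tuple[int, int]:
    """Return (newbits, netnum) such that cidrsubnet(base, newbits, netnum) == desired."""
    base = ipaddress.ip_network(base_cidr)
    desired = ipaddress.ip_network(desired_cidr)
    if base.version != desired.version or not desired.subnet_of(base):
        raise ValueError(f"{desired_cidr} is not inside {base_cidr}")

    # Extra prefix bits added on top of the base prefix.
    newbits = desired.prefixlen - base.prefixlen
    # Host bits left in the desired subnet (works for both IPv4 and IPv6).
    host_bits = desired.max_prefixlen - desired.prefixlen
    # Offset from the base network, counted in subnets of the desired size.
    offset = int(desired.network_address) - int(base.network_address)
    netnum = offset >> host_bits
    return newbits, netnum


print(cidrsubnet_args("10.0.0.0/16", "10.0.2.0/24"))
```

The result can be checked in `terraform console` by feeding the two numbers back into `cidrsubnet()` with the same base.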


Thanks in advance.

https://redd.it/1pqfn36
@r_devops
Confusion about the “Plan” phase in DevOps, is it official and what is it based on?

Hi everyone,
I’m studying DevOps from an academic perspective, and I’m a bit stuck on the “Plan” phase that is often shown as the first phase of the DevOps lifecycle.

Many blogs and diagrams mention phases like Plan → Code → Build → Test → Release → Deploy → Operate → Monitor.
However, I’m struggling to find clear, authoritative references (papers, books, or standards) that explicitly define:
1. What the Plan phase in DevOps exactly is.
2. What it is based on (Agile planning? business requirements? product management?)
3. Whether it is an official DevOps concept or more of a conceptual/educational abstraction.
4. How it differs from planning in Agile/Scrum.

Most explanations online are high-level blog posts, and they don’t clearly cite academic or industry sources.
If you know of a book, research paper, or credible industry reference, or have practical experience with how planning actually works in real DevOps teams, I’d really appreciate your insights.

Thanks in advance!

https://redd.it/1pqfj53
@r_devops
How to get into cloud/devops within 2-3 years of experience in Infrastructure Administration (Virtualization)

I'm currently working at a service-based company, and my project is basically about virtualization using vSphere and Nutanix. I do find cloud computing interesting and I've been trying to self-learn, improving my bash scripting skills by doing projects and acquiring certifications. But the issue I face is: how can I transition from a Virtualization Engineer role to a cloud computing role without much hands-on experience? Would working on projects on my own count as experience, since every job opening requires 4+ years of experience? What are the best choices I could make? Switching internally to a cloud-based project and then trying to switch companies?

What would be a better roadmap to get into cloud? At times I feel like I'm just going around in circles without a definitive plan. It feels like I need to master bash, then move on to automating things with Python, and learn Docker, Kubernetes, Terraform, Jenkins, etc. Sometimes it feels overwhelming, but I really want to crack it. I just need some advice.

Could you please help me out?

https://redd.it/1pqf0tm
@r_devops
Where can I host an API for free so a friend can pentest it?

Hey guys, I want to ask something.

I have an API built using Golang, and I want to host it so my friend can test it. He’s a pen tester, and I want to give him access to the API endpoint rather than sharing my API folders and source files right away.

The problem is, I’m not sure where to host it for free, just for testing purposes. This is mainly for security testing, not production.

Do you have any recommendations for free platforms or setups to host a Go API temporarily for testing?

Thanks in advance!

https://redd.it/1pqi9aa
@r_devops
Who's responsible for contract testing on your team?

We are just starting off with contract testing in our organization and would love your inputs on which team typically owns the effort.


https://redd.it/1pqj775
@r_devops