Reddit DevOps – Telegram
Upgraded GitHub runner version - now facing build errors

We were forced to upgrade from v2.28 to v2.330 as the older one was deprecated.

However, now we have dependency issues:

>C:\Users\ContainerAdministrator\.nuget\packages\microsoft.build.sql\0.1.14-preview\Sdk\Sdk.targets(22,89): error MSB4226: The imported project "C:\Program Files (x86)\Microsoft Visual Studio\18\BuildTools\MSBuild\Microsoft\VisualStudio\v16.0\SSDT\Microsoft.Data.Tools.Schema.SqlTasks.targets" was not found. Also, tried to find "Microsoft\VisualStudio\v16.0\SSDT\Microsoft.Data.Tools.Schema.SqlTasks.targets" in the fallback search path(s) for $(MSBuildExtensionsPath) - "C:\Program Files (x86)\MSBuild" . These search paths are defined in "C:\Program Files (x86)\Microsoft Visual Studio\18\BuildTools\MSBuild\Current\Bin\msbuild.exe.Config". Confirm that the path in the <Import> declaration is correct, and that the file exists on disk in one of the search paths. [C:\home\runner\_work\Main\Main\src\Core.Database\Core.Database.sqlproj]

Any idea how I could fix this?

https://redd.it/1po2lvy
@r_devops
Do you actually trust K8s rightsizing recommendations?

Working at a bank, I've noticed teams straight up ignore cost optimization tools because the recommendations feel risky — cutting resources too aggressively can cause outages, and nobody wants to get paged at 3 am to save $50/month.

So the tools just... get ignored.

Got me thinking: would it help if a tool was explicitly asymmetric? Meaning it prioritizes "don't break anything" over "save maximum money" — recommending conservative cuts that won't cause OOMKills, even if it leaves some savings on the table.

For those managing K8s clusters:

* Do you actually follow rightsizing suggestions today?
* Would you trust a tool more if it guaranteed no under-provisioning risk?
* Or is the problem something else entirely?

Genuinely curious how others handle this tradeoff.
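To make the "asymmetric" idea concrete, here is a minimal sketch of what such a recommendation rule could look like. Everything in it is an assumption for illustration: the function name, the 30% headroom over the observed peak, and the 20% cap on any single cut are hypothetical defaults, not taken from any existing tool. It biases toward safety but cannot guarantee that no OOMKill ever happens, since future usage may exceed the observed peak.

```python
def conservative_memory_request(samples_mib, current_mib,
                                headroom_pct=30, max_cut_pct=20):
    """Suggest a memory request (MiB) that is cautious about cutting.

    samples_mib  -- observed memory usage samples, integers in MiB
    current_mib  -- the request currently configured
    headroom_pct -- safety margin kept above the observed peak (assumed 30%)
    max_cut_pct  -- largest reduction allowed in one step (assumed 20%)
    """
    if not samples_mib:
        # No data: never guess, keep what is already configured.
        return current_mib

    peak = max(samples_mib)
    # Integer ceiling of peak * (1 + headroom), avoiding float rounding.
    target = (peak * (100 + headroom_pct) + 99) // 100

    if target >= current_mib:
        # Increases are applied in full: under-provisioning is the costly error.
        return target

    # Decreases are limited to max_cut_pct of the current request per step.
    floor = (current_mib * (100 - max_cut_pct) + 99) // 100
    return max(target, floor)


if __name__ == "__main__":
    # Peak is 500 MiB but the request is 2000 MiB: cut only to 1600, not to 650.
    print(conservative_memory_request([400, 450, 500], 2000))
    # Peak is 900 MiB with a 1000 MiB request: raise to 1170 immediately.
    print(conservative_memory_request([900], 1000))
```

The asymmetry is in the two branches: raises go through at once, while cuts are applied in small steps so that a bad recommendation can be noticed before it causes an outage.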

https://redd.it/1po3lto
@r_devops
What's your note-taking system for tech learning?

I've been jumping between note apps trying to find the "perfect" system - Notion, Obsidian, Logseq, Inkdrop, Affine... you name it, I've probably tried it.

But here's my problem: I take all these notes and then never actually remember the stuff later. I'll write detailed notes about Docker or some AWS service, then 2 weeks later I'm googling the same thing again like I never learned it.

So I'm curious:
- What note-taking app/system do you actually use?
- More importantly, how do you take notes so you actually remember things later?
- Or do you just not bother with notes and learn by doing?

Feels like I'm spending more time organizing notes than learning. Maybe I'm overthinking this whole thing?

What works for you?

https://redd.it/1po3nok
@r_devops
need grafana alternatives

Hey, there's a good chance that I just don't know how to use Grafana, but is there a better "logs visualizer" than it?
For context, I come from Uptrace, which has an amazing frontend, but Grafana has been a PITA for getting logs, filtering, etc. My other backend is VictoriaLogs, which has vlogscli, but I was hoping for something simpler, like vmui for metrics. Please lmk if y'all know of anything.

Have a good one

https://redd.it/1po25gk
@r_devops
GitHub Actions introducing a per-minute fee for self-hosted runners

GitHub have just sent out an email announcing a $0.002/minute fee for self-hosted runners.

Just ran the numbers, and for us, that's close to $3.5k a month extra on our GitHub bill.
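For anyone wanting to run the same numbers, here is a rough sketch of the arithmetic. The $0.002/minute rate is from the announcement; the function names and the 30-day month are assumptions, and the sketch ignores any minutes included in a plan.

```python
# $0.002 per minute, kept as an integer number of thousandths of a dollar
# so the arithmetic stays exact.
RATE_MILLIDOLLARS_PER_MINUTE = 2

# Assumption for illustration: a 30-day month.
MINUTES_PER_30_DAY_MONTH = 60 * 24 * 30  # 43,200


def monthly_charge_dollars(minutes):
    """Platform charge in dollars for a number of self-hosted runner minutes."""
    return minutes * RATE_MILLIDOLLARS_PER_MINUTE / 1000


def minutes_for_charge(dollars):
    """Runner minutes that would produce a given whole-dollar charge."""
    return dollars * 1000 // RATE_MILLIDOLLARS_PER_MINUTE


if __name__ == "__main__":
    minutes = minutes_for_charge(3500)
    print(minutes)                                       # 1750000
    print(monthly_charge_dollars(minutes))               # 3500.0
    # Roughly how many runners busy around the clock that corresponds to.
    print(round(minutes / MINUTES_PER_30_DAY_MONTH, 1))  # 40.5
```

So an extra $3.5k a month corresponds to about 1.75 million self-hosted runner minutes, or roughly 40 runners busy 24/7.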

https://resources.github.com/actions/2026-pricing-changes-for-github-actions/

https://redd.it/1po8hj5
@r_devops
Pricing changes for GitHub Actions

On January 1, 2026, you will receive up to a 39% reduction in the net price of GitHub-hosted runners.
On March 1, 2026, we are introducing a new $0.002 per-minute GitHub Actions cloud platform charge that will apply to self-hosted runner usage. Any usage subject to this charge will count toward the minutes included in your plan.

"Please note the price for runner usage in public repositories will remain free, and there will be no changes in price structure for GitHub Enterprise Server customers"

source: https://resources.github.com/actions/2026-pricing-changes-for-github-actions/


P.S. Their email states 96% of users will see a cost reduction, but the actual linked page says 15%... draw your own conclusions.

https://redd.it/1po92bm
@r_devops
Working for a company where people maybe don’t have that much tech knowledge

I’m not sure because I haven’t started yet but it seems they may not be so knowledgeable about current technology but maybe I’m getting the wrong impression. I know for sure I’m the only one who knows the cloud we will be using.

What are the pros and cons of working in this kinda environment?

I’m excited for how much I can be involved in but a little nervous about how much might be on my plate right away and a potential lack of onboarding/time to understand the new environment I’m in. Any tips? Thank you!

https://redd.it/1poa3nv
@r_devops
All Pods memory for a service being utilised to max regardless of less traffic

Hi all,
We use kubernetes along with Jenkins for CI.
We have a service that currently has 4 pods running and for that service it has always had its memory utilised to max capacity (the k8s resource website literally shows the memory utilisation as red marks for the pod).
I have to analyse what the main cause for this is and resolve it.

Can you please help me out here explaining how I can at least get to know the root cause of this issue?

https://redd.it/1poaov2
@r_devops
People who do on-call: assuming no MDM, do you prefer 2 separate phones, on 2 eSIMs installed into your personal phone? Why?

Assuming no MDM is required, when you’re on-call, do you prefer to have 2 physically separate phones, or a 2nd SIM/eSIM installed into your personal phone?

EDIT: meant to say “or 2 eSIMs” instead of “on”.

https://redd.it/1po97bh
@r_devops
Amazon confirms a Russian GRU unit hacked Western energy and infrastructure networks for years

The threat wasn’t malware, it was silent credential theft from live traffic.

From 2021 to 2025, APT44 relied less on zero-days and more on exposed routers and VPN gateways.

source: https://thehackernews.com/2025/12/amazon-exposes-years-long-gru-cyber.html

https://redd.it/1po8v1p
@r_devops
What’s the hardest thing to actually “see”/observe in your system, and what incident misled you the most?

TL;DR: Curious about two things: what feels basically invisible in your system even though you have monitoring, and what is the most misleading incident you have dealt with.

1. What is the hardest thing to actually see in your system today?

I do not mean “we forgot to add a metric.” I mean the things that stay fuzzy even when you are staring at all the graphs. Maybe it is concurrency weirdness that only shows up under load. Maybe it is figuring out what really changed when you have multiple deploy paths and config surfaces. Maybe it is hidden dependencies that only show up when they are on fire. For you, what is that blind spot that always makes incidents messier than they should be?

2. What is the most misleading incident you have worked?

I love the stories where all the symptoms pointed at the wrong thing. CPU looked bad but the real issue was a retry storm. Latency screamed “network” but it was actually cache. Everyone blamed the database and it turned out to be some tiny config or feature flag. You know, the “we debugged the wrong thing for three hours and only then saw it” moments.

For me it is that “what actually changed” question. I have been in situations where everyone swore nothing changed, and then three tools later we find some “small” config tweak or background job rollout that no one thought counted as a real change. On paper everything was monitored. In reality we were just poking around until someone tripped over the real diff.

That experience is what made me curious about how people actually reason during incidents, not just which tool they use.



https://redd.it/1pocyvd
@r_devops
What’s the best way to practice DevOps tools? I built something for beginners + need your thoughts

A lot of people entering DevOps keep asking the same question:
“Where can I practice CI/CD, Kubernetes, Terraform, etc. without paying for a bootcamp?”

Instead of repeating answers, I ended up building a small learning hub that has:

* Free DevOps tutorials and blogs
* Hands-on practice challenges
* Simple explanations of complex tools
* Mini projects for beginners

If any of you are willing to take a look and tell me what’s good/bad/missing, I’d appreciate it:
**https://thedevopsworld.com**

Not selling anything — just trying to make a genuinely useful practice resource for newcomers to our field.
It will always remain free, with no intention of making money.

Would love your suggestions on features, topics, or improvements, if you've already tried it!
Future updates:
We will be adding a community mentoring feature.
We have signed a collaboration with an agentic AI cloud deployment company to provide a playground for our super.

#please don't sell anything or anyone's paid service. We respect you, but the community runs on a different funding model and none of it comes from users.

https://redd.it/1pog8iw
@r_devops
why is devops so hard😩

backend developer here trying to learn devops. is it just me who feels it is complex to understand devops as a beginner? isn't there an easy way to do this?

https://redd.it/1pooror
@r_devops
From C++ Terminal Tetris to Kubernetes and AI: My open source journey (60k+ stars total)

I have been writing code for many years. Recently, I looked back at my GitHub profile. The projects I led have accumulated over 60,000 stars.

I wanted to share my path and some thoughts.

The Journey

* In College: I started with C++. I wrote a Tetris game that runs entirely in the terminal. I had to handle cursor movement and color erasing manually. It was raw but fun. (Repo: `fanux/tetris`)
* Early Career: I switched to Go. I wrote lhttp, a websocket framework. (Repo: `fanux/lhttp`)
* Infrastructure Era: Later, I focused on Kubernetes. I built Sealos, a Kubernetes distribution. This was my first big project. (Repo: `labring/sealos`)
* Startup Founder: Then I started my own company. We built Laf (serverless) and FastGPT (AI knowledge base). (Repo: `labring/laf` and `labring/FastGPT`)
* Now: I am building Fulling, an AI coding tool. (Repo: `FullAgent/fulling`)

My Thoughts

Even though I am a CEO now, I still insist on doing open source. Here is what I learned:

1. The Drive: Open source is fun. Creating value for the developer community is my internal drive. It is the only reason I can keep doing this for so long.
2. The Challenge: Just pushing code to GitHub is meaningless. The hardest part is the start. You have to accumulate early users one by one. Promoting a project is a very long-term process.
3. No Shortcuts: After all these years, I still haven't found a shortcut. To make a project successful, I still have to do the "dumb" work: writing blogs, creating content, and explaining the value.

The Struggle

Honestly, it is sometimes painful. Every time I start a new project (like the current one), it feels like starting from zero. I often feel lonely because I have to do the promotion myself.

Writing code makes me happy and fulfilled. But writing code that no one uses makes me sad. So I have to force myself to do marketing, which I am not naturally good at. It is a conflict.

How do you balance the joy of coding with the pain of promotion?

https://redd.it/1poq7dn
@r_devops
MSP DevOps vs Product DevOps — I learned different things in each. How do you balance “new tech” and “deep domain”?

Hey folks,

I’m a Senior DevOps engineer and I’ve worked in both multinational managed services (MSP) companies and product-based companies. I’m not trying to start a war here 😄 — I’m genuinely curious how others handle this trade-off long term, especially if you’re thinking about business/networking in the future.

In MSPs:

I learned a lot fast (new tools, cloud stuff, CI/CD patterns, incident handling, “figure it out yesterday” mode).
Got certifications, touched many stacks, improved adaptability.
But the downsides were real: time zone work, pressure, and lots of context switching.
Projects were short or multiple projects at once, so I rarely got to learn the domain deeply. It was always “DevOps focus” more than understanding the business.

In a product company:

Much better work-life balance and personal time.
I work tasks end-to-end, and I’m finally learning the domain properly (what users need, why systems exist, how decisions affect business).
But I feel like I’m learning “new tech” slower because product teams don’t switch tools that often (which makes sense).

So I’m trying to balance:

1. staying current and sharp technically
2. building deep domain understanding
3. building relationships / networking (I want to do business in the future, and I think community matters)

Questions for you:

If you’ve done both MSP and product, did you feel the same trade-off?
How do you keep learning new tech without burning out or sacrificing family/personal time?
Any advice for networking in DevOps/infra in a genuine way (not “selling”)?

Would love to hear your experiences, especially from people who moved into consulting, freelancing, or started something on the side later.

https://redd.it/1popwk9
@r_devops
Why Kubernetes Ingress Confuses So Many Engineers (and the Mental Model That Finally Clicks)

Hi All,

I kept seeing the same confusion around Ingress:
“Is it a load balancer?”
“Is it a controller?”
“Why does it behave differently on every cluster?”

I put together a short breakdown focused on the mental model, not YAML.
It explains what Ingress really is, what it is not, and how traffic actually flows.

If this helps anyone, here’s the video: Kubernetes Ingress Deep Dive

Cheers

https://redd.it/1pos498
@r_devops
I built a local formatting workflow to stay in control of my code

I built a local VS Code formatting and cleanup pack for my own workflow.

Over time, I realized that most formatting tools were either:

– too automatic

– too intrusive

– or hard to control once they were enabled

I wanted something explicit and predictable.

So I built a setup that works fully locally, without extensions, and only runs when I decide to trigger it.

What it does:

– manual re-indentation (HTML, CSS, JS, JSON, Python)

– detection and cleanup of unnecessary margins (global / active file / custom selection)

– CRLF → LF normalization

– Python formatting on the active file only

– automatic timestamped backups on Ctrl+S

What it doesn’t do:

– no SaaS

– no background automation

– no forced formatting

– no Prettier or Black conflicts

– no external services

Everything runs locally through VS Code tasks and Python scripts.

Each action is explicit, documented, and reversible.
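As a rough illustration of what one such explicit, reversible action could look like, here is a minimal sketch of CRLF → LF normalization with a timestamped backup. This is not the pack's actual code: the function name, the backup naming scheme, and the choice to put the backup next to the original file are all assumptions.

```python
import shutil
import time
from pathlib import Path


def normalize_crlf(path):
    """Convert CRLF line endings to LF in place, keeping a timestamped backup.

    Returns the backup path, or None if the file had no CRLF and was left alone.
    """
    path = Path(path)
    data = path.read_bytes()
    if b"\r\n" not in data:
        # Nothing to do: do not touch the file or create a backup.
        return None

    # Hypothetical naming scheme: <name>.<YYYYmmdd-HHMMSS>.bak beside the file.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = path.with_name(f"{path.name}.{stamp}.bak")
    shutil.copy2(path, backup)  # the backup is what makes the action reversible

    path.write_bytes(data.replace(b"\r\n", b"\n"))
    return backup
```

A script like this could be wired to a VS Code task that passes `${file}` as the argument, so it only ever runs on the active file and only when triggered.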

I built this to spend less time fighting tooling and more time actually writing code.

Sharing the result here.

https://redd.it/1pot7ti
@r_devops
Kubernetes v1.35 - full guide testing the best features with RC1 code

My 1.33/1.34 posts got decent feedback for the practical approach, so here's 1.35. (Yeah, I know it's on a vendor blog, but it's all about covering and testing the new features.)

Tested on RC1. A few non-obvious gotchas:

- Memory shrink doesn't OOM, it gets stuck. Resize from 4Gi to 2Gi while using 3Gi? Kubelet refuses to lower the limit. Spec says 2Gi, container runs at 4Gi, resize hangs forever. Use resizePolicy: RestartContainer for memory.

- VPA silently ignores single-replica workloads. Default --min-replicas=2 means recommendations get calculated but never applied. No error. Add minReplicas: 1 to your VPA spec.

- kubectl exec broken after upgrade? It's RBAC, not networking. WebSocket now needs create on pods/exec, not get.
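For reference, the three workarounds above could look roughly like the manifests below, written here as Python dicts and dumped to JSON (which `kubectl apply -f` accepts). This is a sketch, not taken from the writeup: all object names, the namespace, and the image are placeholders, and the field placement should be checked against the API reference for your cluster version.

```python
import json

# 1. Pod whose memory resizes restart the container instead of hanging.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "resize-demo"},  # placeholder name
    "spec": {
        "containers": [{
            "name": "app",
            "image": "registry.example.com/app:latest",  # placeholder image
            "resources": {
                "requests": {"memory": "4Gi"},
                "limits": {"memory": "4Gi"},
            },
            # Per-resource resize behaviour: restart on memory changes.
            "resizePolicy": [
                {"resourceName": "memory", "restartPolicy": "RestartContainer"},
            ],
        }],
    },
}

# 2. VPA that also acts on a single-replica workload.
vpa = {
    "apiVersion": "autoscaling.k8s.io/v1",
    "kind": "VerticalPodAutoscaler",
    "metadata": {"name": "app-vpa"},  # placeholder name
    "spec": {
        "targetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": "app"},
        # minReplicas sits under updatePolicy and overrides --min-replicas.
        "updatePolicy": {"minReplicas": 1},
    },
}

# 3. Role granting the verb that kubectl exec now needs on pods/exec.
exec_role = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "Role",
    "metadata": {"name": "pod-exec", "namespace": "default"},  # placeholders
    "rules": [
        {"apiGroups": [""], "resources": ["pods/exec"], "verbs": ["create"]},
    ],
}


def render(manifests):
    """Bundle manifests into a single v1 List and return it as JSON text."""
    bundle = {"apiVersion": "v1", "kind": "List", "items": list(manifests)}
    return json.dumps(bundle, indent=2)


if __name__ == "__main__":
    print(render([pod, vpa, exec_role]))
```

The Role still needs a RoleBinding to whichever user or service account runs `kubectl exec`; that part is omitted here.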

Full writeup covers In-Place Resize GA, Gang Scheduling, cgroup v1 removal (hard fail, not warning), and more (including an upgrade checklist). Here's the link:

https://scaleops.com/blog/kubernetes-1-35-release-overview/

https://redd.it/1pou9ed
@r_devops
Docker just made hardened container images free and open source

Hey folks,

Docker just made **Docker Hardened Images (DHI)** free and open source for everyone.
Blog: https://www.docker.com/blog/a-safer-container-ecosystem-with-docker-free-docker-hardened-images/

Why this matters:

* Secure, minimal **production-ready base images**
* Built on **Alpine & Debian**
* **SBOM + SLSA Level 3 provenance**
* No hidden CVEs, fully transparent
* Apache 2.0, no licensing surprises

This means you can start with a hardened base image by default instead of rolling your own or trusting opaque vendor images. Paid tiers still exist for strict SLAs, FIPS/STIG, and long-term patching, but the core images are free for all devs.

Feels like a big step toward making **secure-by-default containers** the norm.

Anyone planning to switch their base images to DHI? Would love to know your opinions!

https://redd.it/1poxncf
@r_devops
Already 1.1 YOE in DevOps/SRE — Is Switching to SDE Worth It?

I have ~1.1 YOE as **DevOps/SRE** (first job). I didn’t “choose” it intentionally — this was the offer I got.
In college I did **web dev + some DSA**, but I’m not strongly inclined toward any single path.

My concern:

* How is **long-term growth for DevOps/SRE** in **top product-based companies**?
* I keep hearing that **DSA + coding rounds are still required** even for good DevOps/SRE roles.
* Given that, does it make sense to **revisit development**, or is it **better to stay in DevOps/SRE**, prepare DSA, and target top PBC SRE roles?

I am planning to switch and start learning again, but I'm stuck on whether to begin the development path while brushing up my DevOps skills, or to stay in a DevOps role and aim for top companies and career growth.

I’m not emotionally attached to SDE or DevOps/SRE — I just want **strong growth, good roles, and long-term optionality**.

Would love to hear from experienced folks who’ve been in SRE / DevOps / SDE roles.

https://redd.it/1povxbz
@r_devops