Reddit DevOps

What 2025 Taught Me About DevOps

As 2025 comes to an end, I’ve been reflecting on what actually changed for me as a DevOps Engineer.
- Not the tools I use.
- Not the certifications I hold.
But how I think about systems, failures, and tradeoffs.

This year brought up patterns I’ve now seen repeatedly across interviews, production incidents, and mentoring other engineers.

Here are the 10 lessons that stood out, starting with the one that matters most:

- Most outages come from small configuration changes, not big mistakes, e.g., the AWS and Cloudflare outages.

- Tools are just tools. Understanding systems is what separates engineers.

- Platform engineering is becoming the default way teams scale safely.

- GitOps is no longer optional in serious environments.

- Certifications stopped being reliable signals on their own.

- AI increases leverage, but only if fundamentals are solid.

- Boring and proven technology continues to outperform trendy/fancy alternatives.

- Cloud cost is now an engineering responsibility, not just finance.

- Fundamentals (Linux, networking, Git, communication) outlast trends.

- Protect your peace, or this field will eat you alive.

None of these lessons came from theory.
They came from actual systems, real interviews, real failures, and real conversations with engineers at different stages of their careers.

If you’re heading into 2026 trying to decide what to focus on next, my advice is simple: Understand fundamentals, understand systems and services before tools, learn to communicate well, and remember to protect your peace.

Happy holidays, and may your deployments be ever in your favour. 🎄🧑🏽‍🎄

https://redd.it/1pw4ury
@r_devops

From the devops community on Reddit

Explore this post and more from the devops community

7 views, 15:28

Reddit DevOps

What checks do you run before deploying that tests and CI won’t catch?

Curious how others handle this.

Even with solid test coverage and CI in place, there always seem to be a few classes of issues that only show up after a deploy: misconfigured env vars, expired certs, health endpoints returning something unexpected, missing redirects, or small infra or config mistakes.

I’m interested in what manual or pre-deploy checks people still rely on today, whether that’s scripts, checklists, conventions, or just experience.

What are the things you’ve learned to double check before shipping that tests and CI don’t reliably cover?
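
For illustration, two of the checks mentioned above (missing env vars and expiring certs) could be scripted roughly like this. It is a minimal Python sketch under stated assumptions, not something from the original post: the function names and the way a deploy script would call them are hypothetical, and only standard-library calls are used.

```python
import os
import socket
import ssl
from datetime import datetime, timezone


def missing_env_vars(required, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


def days_until_expiry(not_after, now=None):
    """Whole days until a certificate expires.

    `not_after` is the 'notAfter' string found in the dict returned by
    ssl.SSLSocket.getpeercert(), e.g. 'Jan  1 00:00:00 2030 GMT'.
    """
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    now = now or datetime.now(timezone.utc)
    return (expires - now).days


def fetch_not_after(host, port=443, timeout=5.0):
    """Open a TLS connection and return the peer certificate's notAfter string."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

A deploy script might abort when `missing_env_vars([...])` is non-empty or when `days_until_expiry(fetch_not_after(host))` drops below some threshold; the variable names and the threshold are left to the reader because the post does not specify any.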

https://redd.it/1pw5phr
@r_devops

6 views, 16:28

Reddit DevOps

Scaling beyond basic VPS+nginx: Next steps for a growing Go backend?

I come from a background of working in companies with established infrastructure where everything usually just works. Recently, I've been building my own SaaS and micro-SaaS projects using Go (backend) and Angular. It's been a great learning experience, but I’ve noticed that my backends occasionally fail—nothing catastrophic, just small hiccups, occasional 500 errors, or brief downtime.

My current setup is as basic as it gets: a single VPS running nginx as a reverse proxy, with a systemd service running my Go executable. It works fine for now, but I'm expecting user growth and want to be prepared for hundreds of thousands of users.

My question is: once you’ve outgrown this simple setup, what’s the logical next step to scale without overcomplicating things? I’m not looking to jump straight into Kubernetes or a full-blown microservices architecture just yet, but I do need something more resilient and scalable than a single point of failure.

What would you recommend? I’d love to hear about your experiences and any straightforward, incremental improvements you’ve made to scale your Go applications.

Thanks in advance!

https://redd.it/1pw5uwx
@r_devops

7 views, 17:28

Reddit DevOps

Migrating legacy GCE-based API stack to GKE

Hi everyone!

Solo DevOps looking for a solid starting point

I’m starting a new project where I’m essentially the only DevOps / infra guy, and I need to build a clear plan for a fairly complex setup.

Current architecture (high level)

* Java-based API services
* Running on multiple Compute Engine Instance Groups
* A dedicated HAProxy VM in front, routing traffic based on URL and request payload
* One very large MySQL database running on a GCE VM
* Several smaller Cloud SQL MySQL instances replicating selected tables from the main DB (apparently to reduce load on the primary)
* One service requires outbound internet access, so there’s a custom NAT solution backed by two GCE VMs (Cloud NAT was avoided due to cost concerns)

Target direction / my ideas so far

* Establish a solid IaC foundation using Terraform + GitHub Actions
* Design VPCs and subnetting from scratch (first time doing this for a high-load production environment)
* Build proper CI/CD for the APIs (Docker + Helm)
* Gradually migrate services to GKE, starting with the least critical ones

My concerns/open questions:

* What’s a cost-effective and low-maintenance NAT strategy in GCP for this kind of setup?
* How would you approach eliminating HAProxy in a GKE-based architecture (Ingress, Gateway API, L7 LB, etc.)?
* Any red flags in the current DB setup that should be addressed early?
* How would you structure the migration to minimize risk, given there’s no existing IaC?

If you’ve done a similar GCE → GKE migration or built something like this from scratch:

* What would you tackle first?
* Any early decisions you wish you had made differently?
* Any recommended starting point, reference architecture, or pitfalls to watch out for?

Appreciate any insights 🙏

https://redd.it/1pw7evz
@r_devops

5 views, 18:28

Reddit DevOps

Building my Open-Source 11labs Ops Tool: Secure Backups + Team Access

I am building an open-source, free tool to help teams manage and scale ElevenLabs voice agents safely in production.

I currently run 71 agents in production for multiple clients, and once you hit that level, some things become painful very fast: collaboration, QA, access control, backups, and compliance.

This project is my attempt to solve those problems in a clean, in-tenant way.

* Advanced workflow optimization: Let senior team members run staging versions of their workflows and agents, do controlled A/B testing with real conversation QA, compare production vs. staging, and deploy changes through a proper QA and approval process.
* Granular conversation access for teams: Filter and scope access by location, client, case type, etc. Session-backed permissions ensure people only see what they are authorized to see.
* Advanced workflow optimization and QA: Run staging versions of agents and workflows, replay real conversations, do controlled A/B testing, compare staging vs. production, and deploy changes with proper review.
* Incremental backups and granular restore: Hourly, daily, or custom schedules. Restore only what you need, for example the workflow or KB for a specific agent.
* Agent and configuration migration: Migrate agents between accounts or batch-update settings and KBs across many agents.
* Full in-tenant data sovereignty: Configs, workflows, backups, and conversation history stay in your cloud or infrastructure. No third-party egress.
* Flexible deployment options: Terraform or Helm/Kubernetes; self-hosted Docker (including bare metal with NAS backups); optional 100 percent Cloudflare Workers and Workers AI deployment.
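
To make the "incremental backups" idea concrete, one common approach is to fingerprint each agent's configuration and back up only what changed since the last run. The Python sketch below is an assumption for illustration only: the post does not describe how the tool detects changes, and `fingerprint`, `changed_since`, and the shape of the config dicts are hypothetical.

```python
import hashlib
import json


def fingerprint(config):
    """Stable SHA-256 hash of a JSON-serializable config (key order ignored)."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_since(previous_fingerprints, current_configs):
    """Return only the configs whose fingerprint differs from the last backup.

    previous_fingerprints: {agent_id: hex digest} saved by the previous run
    current_configs:       {agent_id: config dict} fetched now
    """
    changed = {}
    for agent_id, config in current_configs.items():
        if previous_fingerprints.get(agent_id) != fingerprint(config):
            changed[agent_id] = config
    return changed
```

A scheduled job would store the new fingerprints after each run, so a granular restore can pick a single agent's snapshot instead of a full dump.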

Demo (rough but shows the core inspector, workflow replay, permissions, backups, etc.):

Video: https://www.youtube.com/watch?v=Pzu2CVWnpl8

I'll push the code to GitHub early January 2026. Project name will change soon (current temp name conflicts with an existing "Eleven Guard" SSL monitoring company).

I am building this primarily for my own use, but I suspect others running ElevenLabs at scale may run into the same issues. If you have feature requests, concerns, or feel there are tools missing to better manage ElevenLabs within your company, I would genuinely love to hear about them. 😄

https://redd.it/1pwb44y
@r_devops

YouTube

11Guard - Voice AI Ops for ElevenLabs

Building an open-source, free tool new for Voice AI Ops.
• A command center for managing large-scale ElevenLabs voice agent operations
• Inspect conversations, workflows, and tool usage in real time with granular permissions
• Govern access with true in-tenant…

6 views, 19:28

Reddit DevOps

Guidance for my DevOps journey

Hello everyone, I'm interested in getting into DevOps but I don't know where to start. I'm currently studying for a bachelor's degree in computer science at a private university in Berlin, Germany. My studies started 3 months ago, and I want to get a head start on DevOps. My questions are:

1- Is there any master's field that's preferred for getting into DevOps?

2- I keep seeing people say it's hard to get junior DevOps jobs, so most try to get into other jobs first, like system administrator or cloud-related roles. Which ones would be the best stepping stones to DevOps?

3- Which languages are best for the DevOps field?

4- Do people work in DevOps-related jobs before getting promoted to DevOps engineer, or do they work in DevOps-related jobs and then apply to different companies, using those jobs as relevant experience?

5- Which skills would I need for DevOps?

6- Do I need certificates for every skill, or is job experience in a related field enough?

Any other advice would be helpful too.

https://redd.it/1pw998h
@r_devops

7 views, 20:28

Reddit DevOps

The State of DevOps Jobs in H2 2025

Hi guys, since I did a 2025 H1 report, a follow-up was in order for the H2 period.

I'm not an expert in data analysis and I'm only just getting into it, but I hope this benefits you a bit and gives you a sense of how the second half of this year went for the DevOps market.

https://devopsprojectshq.com/role/devops-market-h2-2025/

https://redd.it/1pwf717
@r_devops

DevOps Projects

DevOps Job Market Report H2 2025

832 jobs analyzed • $177,500 median salary • 70.6% remote work • AWS, Kubernetes, Python dominate

9 views, 21:28

Reddit DevOps

I'm creating a new app to help new DevOps developers better understand DevOps concepts and how they work

I'm a passionate developer based in Lithuania, and I'm starting my own project to help others better understand and use DevOps, CI/CD, and Docker instances.

The concept is ready! It's called PipeViz, and it will visualize your ideas, schemas, and CI/CD pipelines as they actually are. I'm also adding GitHub, GitLab, and Google auth for later features.

What would you add to the project? What ideas could I implement? I know the design may be rough, but I'm still at the beginning!

Right now I'm working on full end-to-end auth with GitHub/GitLab/Google/Apple as groundwork for the pipelines. I hope this project has a future and that you will love it!

I'd appreciate any ideas and fixes from the DevOps community! I hope this will be my first step into real-world programming!

https://redd.it/1pwesig
@r_devops

10 views, 23:28

Reddit DevOps

How to leverage HashiCorp Packer to automatically provision VM templates for Proxmox

Hey, my fellow engineers

I recently published a post (on Medium) about using HashiCorp's Packer to automatically provision VM templates for Proxmox. I would greatly appreciate your feedback.

Here is the link

Thank you, and happy holidays.

https://redd.it/1pwjehy
@r_devops

Medium

Phase II — Part 1: Automating VM Provisioning in Proxmox w/ Packer

Use some Packer for a better life as a cloud/devops engineer

9 views, 00:28

Reddit DevOps

Joined As Devops Engineer

Hi Everyone,

I hope you all are doing well.

I recently cleared the interview and joined a company as a DevOps Engineer Intern.

Please guide me:

How should I start my journey?
What should my day-to-day activities be?
Any suggestions?
Which mistakes should I avoid?
How do I get from intern to a good position in this field in the next 5 years?
How can I contribute to the company?

https://redd.it/1pwky8r
@r_devops

12 views, 01:28

Reddit DevOps

As a second-year student near Hinjawadi, Pune

I am a second-year student (currently in my 4th semester) who is very interested in DevOps, and I really want to land an internship by the end of this semester. I have already started with Linux, Git, and CI/CD, and I also have prior experience hosting and debugging a website that has real users. Please help me focus on the right things.

https://redd.it/1pwplbl
@r_devops

11 views, 05:28

Reddit DevOps

Built this DevOps game. Please review!

Hey guys,

I just built this simple DevOps Simulation Game: https://uptime9999.vercel.app/

Please check it out and share your feedback. I'm still thinking of ideas to make it more engaging and interactive, so suggestions are appreciated!

Play it on a laptop or PC though! I haven't worked on making it playable on mobile UI-wise.

You have to keep a software infrastructure system running within the funds you have.

https://redd.it/1pwquij
@r_devops

11 views, 06:28

Reddit DevOps

Supercheck.io - Built an open source alternative for running Playwright and k6 tests - self-hosted with AI features

Been working on this for a while and finally made it open source. It's a self-hosted platform for running Playwright and k6 tests from a web UI.

**What it does:**

* Write and run Playwright browser, API, and database tests
* Run k6 load tests with streaming logs
* Multi-region execution (US, EU, Asia Pacific)
* Synthetic monitoring - schedule Playwright tests to run on intervals
* AI can generate test scripts from plain English or fix failing tests
* HTTP/Ping/Port monitors with alerting (Slack, Discord, Email, etc.)
* Status pages for incidents

Everything runs on your own servers with Docker Compose.

Took inspiration from tools like Grafana k6 Cloud and BrowserStack but wanted something self-hosted without recurring costs.

GitHub: https://github.com/supercheck-io/supercheck

Happy to answer any questions.

https://redd.it/1pwpkgr
@r_devops

GitHub

GitHub - supercheck-io/supercheck: Open Source AI-Powered Test Automation & Monitoring Platform

11 views, 08:28

Reddit DevOps

Do you think we need a CNPG open source restore manager?

I was wondering whether there is a need for an OSS alternative to Kasten or similar (in this limited sense, at least) that can recover your CNPG cluster and perform automated DR drills. I asked something similar in a PostgreSQL community and got crickets. I personally envision something like a report being sent to me with a checkbox: yep, your org will survive this.
Every project I surveyed handles the backup; none guarantees the restore or runs automated DR drills.

https://redd.it/1pwt143
@r_devops

8 views, 09:28

Reddit DevOps

“Aspiring to Secretless Machine-to-Machine Authentication and Authorization” question

So I came across this article

https://medium.com/@jaredhatfield/aspiring-to-secretless-machine-to-machine-authentication-and-authorization-70df900cb1e1

I like the idea of a centralized authentication and authorization service. The article also floats the concept of a Workload Identity that issues keys for the workload, so a solution's microservices do not need to hold keys. But it fails to explain exactly how such a system would be implemented. Wouldn't it just be another place where public/private keys need to be stored and rotated?

I think this is similar to the IAM that AWS uses. But how would one translate this concept to on-premise machines?
My requirements are a purely on-prem setup, with as many open-source tools as possible, like Keycloak or another OAuth server.

https://redd.it/1pwui9k
@r_devops

Medium

Aspiring to Secretless Machine-to-Machine Authentication and Authorization

Secure authentication and authorization between microservices is straightforward in simple deployments but becomes difficult to scale…

9 views, 10:28

Reddit DevOps

First job, no senior, already responsible for everything

I have just graduated and this is my first job ever. The company has just opened a branch in my country, so everything is barely established (HR, R&D team, infrastructure, etc.)

They handed me a project and paired me with another guy who’s also a fresher. The project is basically migrating the company's Windows app to the web. We are in charge of everything, from setting up the database host machine, git, writing APIs to designing the UI, testing and delivery.

We have no senior engineer to review our code or show us how things should be done properly. The bright side is that I get to touch and learn a lot of things, but I am worried I will end up picking up lots of bad habits and practices.

I’m not sure if this is a great opportunity or a risky situation for someone at the very start of their career. How do I avoid building bad habits when there’s no senior guidance? What should I focus on to make sure I’m actually learning in the right direction? I’d really appreciate advice from you guys.

https://redd.it/1pwtnud
@r_devops

9 views, 11:28

Reddit DevOps

CosmosCost - unified cloud cost tracking for AWS, GCP & Azure

Hey everyone 👋

After testing it internally with some mid-size to large companies, today I'm launching https://cosmoscost.com - a cloud cost management platform I built after getting fed up with juggling separate billing dashboards for AWS, GCP, and Azure.

**The Problem**

If you run multi-cloud infrastructure, you know the pain:

* AWS calls them "EC2 Instances", GCP says "Compute Engine", Azure has "Virtual Machines" - same thing, zero clarity on comparative costs
* Surprise charges from idle resources every month
* Exporting to spreadsheets that go stale overnight

**What I Built**

* Unified dashboard across all three major cloud providers
* Unified terminology - EC2, Compute Engine, and VMs all show as "Compute Instances" so you can actually compare apples to apples
* Privacy-first AI insights - runs 100% locally in your browser using WebGPU (your data never leaves your device)
* Easy reporting
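
The "unified terminology" point can be pictured as a lookup from provider-specific service names to one shared category, followed by a sum per category. The Python sketch below is only an illustration of that idea, not CosmosCost's actual implementation: the mapping table, the `unify_costs` function, and the `(provider, service, cost)` tuple shape are all assumptions.

```python
from collections import defaultdict

# Hypothetical mapping: (provider, provider-specific name) -> shared category.
CATEGORY_BY_SERVICE = {
    ("aws", "EC2 Instances"): "Compute Instances",
    ("gcp", "Compute Engine"): "Compute Instances",
    ("azure", "Virtual Machines"): "Compute Instances",
}


def unify_costs(line_items):
    """Sum costs per shared category.

    line_items: iterable of (provider, service_name, cost) tuples.
    Services missing from the mapping are kept visible under an
    'Uncategorized' label instead of being silently dropped.
    """
    totals = defaultdict(float)
    for provider, service, cost in line_items:
        key = (provider.lower(), service)
        category = CATEGORY_BY_SERVICE.get(
            key, f"Uncategorized ({provider}: {service})"
        )
        totals[category] += cost
    return dict(totals)
```

Keeping unmapped services under their own label matters for the "surprise charges" problem above: an unknown line item should show up in the report rather than vanish.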

Would love feedback from anyone dealing with multi-cloud cost chaos. What features would make this a must-have for your stack?

🔗 https://cosmoscost.com

https://redd.it/1pwvg5w
@r_devops

Cosmoscost

CosmosCost - Cloud Cost Aggregator

Unified cloud cost management across AWS, GCP, and Azure

7 views, 12:28

Reddit DevOps

Building a new CLI that simplifies and enhances the functionality of the official GCP, AWS, and Azure CLIs

I’m planning to build a new CLI that wraps the official APIs of GCP, AWS, and Azure to simplify and enhance the functionality of the official CLIs.

Things like better logs with easier filters, faster switching between projects or profiles, and more.

It will be written in Go, so it should run faster than the official CLI tools, which are written in Python.

Any feedback, or features you’d like to see?

https://redd.it/1pwxock
@r_devops

9 views, 13:28

Reddit DevOps

3+ years DevOps experience, still underpaid — looking for blunt feedback

I’ve got 3+ years of DevOps experience. After a 6-month gap, I joined a startup where I worked on containerizing open-source apps, Docker/K8s deployments, and supervised services supporting AI agent training. That role didn’t last, and now I’m doing a mix of QA + some dev + infra work.

I’ll be upfront: I used ChatGPT to tighten the wording here, but the situation is 100% real.

I’m currently in an on-site role, around 42k/month, and working ~1000 km away from my hometown. The instability + pay mismatch is starting to wear me down. I keep seeing people with similar experience landing solid DevOps roles (including remote US-based ones), and I’m clearly missing something.

What I’d appreciate:

What should I fix first — skills, positioning, or proof of work?

What actually helped you move up in DevOps?

Any platforms or strategies that worked for landing remote roles?

Not looking for sympathy — just blunt, practical advice.

https://redd.it/1pwyx4a
@r_devops

12 views, 14:28

Reddit DevOps

What's the best free site to easily apply for remote devops positions?

By easily I mean you just upload your resume and click apply and move on to the next job post, instead of being required to sign up/register and fill in endless forms about my experience, only to be asked to upload my resume again.

https://redd.it/1pwymen
@r_devops

11 views, 15:28

Reddit DevOps

hetzner-k3s v2.4.4 is out - Open source tool for Kubernetes on Hetzner Cloud

For those not familiar with it, it's by far the easiest way to set up cheap Kubernetes on Hetzner Cloud. The tool is open source and free to use, so you only pay for the infrastructure you use. This new version improves network requests handling when talking to the Hetzner Cloud API, as well as the custom local firewall setup for large clusters. Check it out! https://hetzner-k3s.com/

If you give it a try, let me know how it goes. If you have already used this tool, I'd appreciate some feedback. :)

If you have chosen other tools over hetzner-k3s, I would love to learn about them and why you chose them, so that I can improve the tool, the documentation, etc.

https://redd.it/1px0nqn
@r_devops

Hetzner-K3S

hetzner-k3s — Production Kubernetes on Hetzner Cloud in Minutes

The easiest and fastest way to create production-ready Kubernetes clusters on Hetzner Cloud. No Terraform, no management cluster, complete control.

13 views, 16:28
