I built an open source AI agent for incident response
I worked on database infra at a big company and spent a lot of time on call. We had a ton of alerts and dashboards, and I hated jumping between a million tabs just to understand what was going on.
So I built an open source AI agent to help with that.
It runs alongside an incident and:
reads alerts, logs, metrics, and Slack
keeps a running summary of what’s happening
tracks what’s been tried and what hasn’t
suggests mitigations (like rolling back a deploy or drafting a fix PR), but a human has to approve anything before it runs
I used earlier versions during real incidents and it was useful enough that I kept working on it. This is the first open source release.
Repo: https://github.com/incidentfox/incidentfox
README has setup instructions and a demo you can run locally.
https://redd.it/1qkjqqf
@r_devops
Terraform AWS Infrastructure Framework (Multi-Env, Name-Based, Scales by Config)
🚀 Excited to share my latest open-source project: a Terraform framework for AWS focused on multi-environment infrastructure management.
After building and refining patterns across multiple environments, I open-sourced a framework that helps teams keep deployments consistent across dev / qe / prod.
The problem:
Managing AWS infra across dev / qe / prod usually leads to:
- Configuration drift between environments
- Hardcoded resource IDs everywhere
- Repetitive boilerplate when adding “one more” resource
- Complex dependency management across modules
The solution:
A workspace-based framework with automation:
- ✅ Automatic resource linking — reference resources by name, not IDs. The framework resolves and injects IDs automatically across modules.
- ✅ DRY architecture — one codebase for dev / qe / prod using Terraform workspaces.
- ✅ Scale by configuration, not code — create unlimited resources WITHOUT re-calling modules. Just add entries in a .tfvars file using plain-English names (e.g., “prod_vpc”, “private_subnet_az1”, “eks_cluster_sg”).
What’s included:
- VPC networking (multi-AZ, public/private subnets)
- Internet gateway, NAT gateway, route tables, EIPs
- Security groups + SG-to-SG references
- VPC endpoints (Gateway & Interface)
- EKS cluster + managed node groups
Real example:
# terraform.tfvars (add more entries, no new module blocks)
eks_clusters = {
  prod = {
    my_cluster = {
      cluster_version = "1.34"
      vpc_name    = "prod_vpc"                # name, not ID
      subnet_name = ["pri_sub1", "pri_sub2"]  # names, not IDs
      sg_name     = ["eks_cluster_sg"]        # name, not ID
    }
  }
}
# Framework injects vpc_id, subnet_ids, sg_ids automatically
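To make the "name, not ID" mechanism concrete, here is a hypothetical sketch (not taken from the repo) of how such resolution can be implemented in plain Terraform: create resources with `for_each` keyed by the plain-English name, build a name→ID map in `locals`, and index that map wherever a name appears in the config:

```hcl
# Hypothetical sketch of name-to-ID resolution; variable and module
# names are illustrative, not the framework's actual interface.
resource "aws_vpc" "this" {
  for_each   = var.vpcs              # keyed by plain-English name, e.g. "prod_vpc"
  cidr_block = each.value.cidr_block
  tags       = { Name = each.key }
}

locals {
  # name => ID map, built once, usable by every downstream module
  vpc_ids = { for name, vpc in aws_vpc.this : name => vpc.id }
}

module "eks" {
  source   = "./modules/eks"
  for_each = var.eks_clusters[terraform.workspace]

  cluster_name = each.key
  vpc_id       = local.vpc_ids[each.value.vpc_name]  # name resolved to ID here
}
```

Because HCL is declaration-order independent, the `locals` lookup and the resources it references can live in different files, which is what lets new `.tfvars` entries work without new module blocks.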
GitHub:
https://github.com/rajarshigit2441139/terraform-aws-infrastructure-framework
Looking for:
- Feedback from the community
- Contributors interested in IaC patterns
- Teams standardizing AWS deployments
Question:
What are your biggest challenges with multi-environment Terraform? How do you handle cross-module references today?
#Terraform #AWS #InfrastructureAsCode #DevOps #CloudEngineering #EKS #Kubernetes #OpenSource #CloudArchitecture #SRE
https://redd.it/1qkjko7
@r_devops
Shall we introduce a rule against AI-generated content?
We’ve been seeing an increase in AI generated content, especially from new accounts.
We’re considering adding a Low-effort / Low-quality rule that would include AI-generated posts.
We want your input before making changes, so please share your thoughts below.
https://redd.it/1qkliqo
@r_devops
Is specialising in GCP good for my career or should I move?
Hey,
Looking for advice.
I have spent nearly 5 years at my current devops job because it's ideal for me in terms of team chemistry, learning and WLB. The only "issue" is that we use Google Cloud- which I like using, but not sure if that matters.
I know AWS is the dominant cloud provider, so am I sabotaging my career development by staying longer at this place? Obviously you can say cloud skills transfer over, but loads of job descriptions say "2/3/4+ years of experience in AWS/Azure", which is a lot of roles I might just be screened out of.
Everyone is different but wondered what other people's opinion would be on this. I would probably have to move to a similar mid or junior level, should I move just to improve career prospects? Could I still get hired for other cloud roles with extensive experience in GCP if i showed I could learn?
I also want to add that I have already built personal projects in AWS, but I feel they only have value up to a certain point. Employers want production management and org-level administration experience, of which I have very little.
https://redd.it/1qkm8w5
@r_devops
When to use Ansible vs Terraform, and where does Argo CD fit?
I’m trying to clearly understand where Ansible, Terraform, and Argo CD fit in a modern Kubernetes/GitOps setup, and I’d like to sanity-check my understanding with the community.
From what I understand so far:
Terraform is used for infrastructure provisioning (VMs, networks, cloud resources, managed K8s, etc.)
Ansible is used for server configuration (OS packages, files, services), usually before or outside Kubernetes
This part makes sense to me.
Where I get confused is Argo CD.
Let’s say:
A Kubernetes cluster (EKS / k3s / etc.) is created using Terraform
Now I want to install Argo CD on that cluster
Questions:
1. What is the industry-standard way to install Argo CD?
Terraform Kubernetes provider?
Ansible?
Or just a simple `kubectl apply` / bash script?
2. Is the common pattern:
Terraform → infra + cluster
One-time bootstrap (`kubectl apply`) → Argo CD
Argo CD → manages everything else in the cluster?
3. In my case, I plan to:
Install a base Argo CD
Then use Argo CD itself to install and manage the Argo CD Vault Plugin
Basically, I want to avoid tool overlap and follow what’s actually used in production today, not just what’s technically possible.
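For what it's worth, the pattern in question 2 is common in practice: Terraform creates the cluster, a one-time `kubectl apply` (or Helm install) brings up Argo CD, and a single root Application then manages everything else, including Argo CD itself ("app of apps"). A minimal sketch of such a root Application; the repo URL and paths are placeholders:

```yaml
# Hypothetical "app of apps" root; Argo CD syncs everything under apps/
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/gitops-config.git
    targetRevision: main
    path: apps            # directory of child Application manifests
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true         # delete resources removed from Git
      selfHeal: true      # revert manual drift
```

After the initial bootstrap apply, changes land in Git rather than via kubectl, which is the tool-overlap boundary the question is really about.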
Would appreciate hearing how others are doing this in real setups.
---
Disclaimer:
Used AI to help write and format this post for grammar and readability.
https://redd.it/1qkn8vd
@r_devops
As an SWE, for your next greenfield project, would you choose Pulumi over OpenTofu/Terraform/Ansible for the infra part?
I'm curious about Pulumi's long-term viability and how future-proof an investment of time into it is. As someone currently looking at a fresh start, is it worth the pivot for a new project?
https://redd.it/1qkp531
@r_devops
ARM build server for hosting Gitlab runners
I'm in academia where we don't have the most sophisticated DevOps setup. Hope it's acceptable to ask a basic question here.
I want to deploy Docker images from our GitLab CI/CD to ARM-based Linux systems and am looking for a cost-efficient way to do so. Using our x86 build server to build for ARM via QEMU wasn't a good solution: it takes forever and the results differ from native builds. So I'm looking to set up a small ARM server specific to this task.
A Mac Mini appears to be an inexpensive yet relatively powerful solution to me. Any reason why this would be a bad idea? Would love to hear opinions!
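One pattern worth knowing here: Docker Buildx can drive a remote native arm64 machine over SSH alongside the existing x86 builder, so CI keeps one entry point but each platform builds natively. A sketch, with placeholder host and image names:

```sh
# Create a builder backed by the local x86 daemon...
docker buildx create --name ci --platform linux/amd64

# ...and append a native arm64 node reachable over SSH
docker buildx create --name ci --append --platform linux/arm64 \
  ssh://builder@arm-host.example.edu

# Buildx routes each platform to the matching native node
docker buildx build --builder ci \
  --platform linux/amd64,linux/arm64 -t registry.example.edu/app:tag --push .
```

On a Mac Mini the arm64 node would be the Linux VM inside Docker Desktop (or a Linux install via Asahi/UTM), which is still a native arm64 build, just with the VM layer in between.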
https://redd.it/1qkqbsw
@r_devops
59,000,000 People Watched at the Same Time. Here’s How This Company’s Backend Didn’t Go Down
During the Cricket World Cup, **Hotstar** (an Indian OTT) handled **\~59 million concurrent live streams**.
That number sounds fake until you think about what it really means:
* Millions of open TCP connections
* Sudden traffic spikes within seconds
* Kubernetes clusters scaling under pressure
* NAT Gateways, IP exhaustion, autoscaling limits
* One misconfiguration → total outage
I made a breakdown video explaining **how Hotstar’s backend survived this scale**, focusing on **real engineering problems**, not marketing slides.
Topics I covered:
* Kubernetes / EKS behavior during traffic bursts
* Why NAT Gateways and IPs become silent killers at scale
* Load balancing + horizontal autoscaling under live traffic
* Lessons applicable to any high-traffic system (not just OTT)
Netflix's Mike Tyson vs. Jake Paul fight drew 65 million concurrent viewers, and Jake Paul's iconic statement afterward was "We crashed the site." So even a company like Netflix has a hard time handling big loads.
If you’ve ever worked on:
* High-traffic systems
* Live streaming
* Kubernetes at scale
* Incident response during peak load
You’ll probably enjoy this.
[https://www.youtube.com/watch?v=rgljdkngjpc](https://www.youtube.com/watch?v=rgljdkngjpc)
Happy to answer questions or go deeper into any part.
https://redd.it/1qksl00
@r_devops
Actual blog: https://blog.hotstar.com/scaling-infrastructure-for-millions-from-challenges-to-triumphs-part-1-6099141a99ef
Our enterprise cloud security budget is under scrutiny. We’re paying $250K for current CNAPP, Orca came in 40% cheaper. Would you consider switching?
Our CFO questioned our current CNAPP (wiz) spend at $250K+ annually in the last cost review. Had to find ways to get it down. Got a quote from Orca that's 40% less for similar coverage.
For those who've evaluated both platforms: is the price gap justified for enterprise deployments? We're heavy on AWS/Azure with about 2K workloads. The current tool works, but the cost scrutiny is real.
Our main concerns are detection quality, false positive rates, and how well each integrates with our existing CI/CD pipeline. Any experiences would help.
https://redd.it/1qkwfrx
@r_devops
Incident management across teams is an absolute disaster
We have a decent setup for tracking our own infrastructure incidents but when something affects multiple teams it becomes total chaos. When a major incident happens we're literally updating three different places and nobody has a single source of truth. Post mortems take forever because we're piecing together timelines from different tools. Our on call rotation also doesn't sync well with who actually needs to respond. I wonder, how are you successfully handling cross functional incident tracking without creating more overhead?
https://redd.it/1qkzwlf
@r_devops
Advice: Failed SC
So I wanted to get some advice from anyone who's had this happen or been through anything similar.
For context today I've just failed my required SC which was a conditional part of the job offer.
Without divulging much info, it wasn't due to me or anything I did; it was just down to an association with someone (though I haven't spoken to them in years), so I was/am a bit blindsided by this, as I'm very likely to be terminated and left without a job.
Nothing has been fully confirmed yet, and my current lead/manager has said he does not want to lose me and will try his best to keep me, but it's not fully his decision and termination has not been taken off the table.
Any advice/guidance?
https://redd.it/1ql1oim
@r_devops
Kubernetes IDE options
Hey everyone, I'm currently using Lens as my k8s IDE, but it seems to consume too many resources, so I want to change it. I wonder what Kubernetes IDE you are using.
https://redd.it/1ql1ncy
@r_devops
I have tons of commits in my hands-on project just to verify the CI pipeline. How do professionals solve this problem?
I have a pipeline that tests my app and, if it passes, pushes the new image to GitHub, but GitHub Actions requires my secret key for one specific feature. I want to run the app in a Kubernetes StatefulSet, so I deactivated the feature that requires the secret key. But for every change I make to my YAML files or webapp code, I have to push to the GitHub repo, which triggers Actions; if the test step passes, it moves on to pushing the new image, and only then can my StatefulSet pull the latest image so I can see the change take effect.
So if I want to add a feature to my webapp, I have to think about running it locally, then about whether it will be a problem in GitHub Actions and the StatefulSet.
I'm just so tired of this cycle. Is there any way to test my GitHub Actions before I push to the repo? Or how do you test your YAML files?
Here are my candidate solutions:
1 - Instead of pulling the image from the repo, I can build the image locally and try it, but I won't know whether it would pass the test step of the pipeline
2 - I can create a fork of the main repo and push as many commits as I like; when I squash-merge into main, it will show up as one commit
3 - I found a tool named "act" to run GitHub Actions locally, but it doesn't pull variables from the GitHub repo
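On point 3: `act` indeed can't fetch secrets or variables from GitHub, but it accepts them locally. A sketch of the usual workaround (file names are placeholders; check `act --help` for the flags your version supports):

```sh
# .secrets holds KEY=value lines, dotenv-style; keep it out of Git.
act push --secret-file .secrets

# Or pass individual secrets and repo variables on the command line
# (--var requires a reasonably recent act release):
act push -s MY_SECRET_KEY=xxxx --var MY_VAR=value
```

This won't reproduce GitHub-hosted runners exactly (act runs jobs in its own Docker images), but it catches most YAML and step-ordering mistakes before a push.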
https://redd.it/1ql4fq1
@r_devops
How are you actually handling observability in 2026? (Beyond the marketing fluff)
Every year observability gets pitched as simpler and basically solved. Unified platforms, clean dashboards, smarter alerts.
In reality, when something breaks it still feels messy.
I am curious how people are actually handling this in 2026. What does observability look like for you, in practice, right now?
https://redd.it/1qlj4h7
@r_devops
DevOps Vouchers Extension
Hi
I bought a DevOps Foundation and SRE exam voucher from the DevOps Institute back in 2022.
A few life events happened and I wasn't able to take the exams. I'd like to attempt them now.
The platform was Webassessor back then; now I think it's PeopleCert.
I emailed their customer support, and the PeopleCert team replied stating they have no records of my purchase.
I can provide the receipt emails, voucher codes, and my email ID as proof of payment.
Has anyone encountered such an issue before, or does anyone know how to resolve it?
I will really appreciate any help, because it's around $400 of hard-earned money.
https://redd.it/1qllumj
@r_devops
From DevOps Engineer to Consultant
Has anyone in Europe gone from a DevOps engineer role to self-employed work? How easy or difficult is it? Any tips on how to make the change?
https://redd.it/1qlmufo
@r_devops
curl killed their bug bounty because of AI slop. So what’s your org’s “rate limit” for human attention?
curl just shut down their bug bounty program because they were getting buried in low-quality AI “vuln reports.”
This feels like alert fatigue, but for security intake. If it’s basically free to generate noise, the humans become the bottleneck, everyone stops trusting the channel, and the one real report gets lost in the pile.
How are you handling this in your org? Security side or ops side. Any filters/gating that actually work?
Source: https://github.com/curl/curl/pull/20312
https://redd.it/1qlqgnt
@r_devops
Blog post: https://daniel.haxx.se/blog/2026/01/26/the-end-of-the-curl-bug-bounty/
Udemy course recommendations for a graduate platform engineer
Hi all, I'll be starting my first job as a graduate platform engineer soon,
so I'd like to ask which Udemy courses you would recommend to get a graduate platform engineer up to speed as fast as possible, as there are too many courses on Udemy to choose from.
All recommendations and advice are greatly appreciated, thanks
https://redd.it/1qlug67
@r_devops
hi all, I'll be starting my first job as a graduate platform engineer soon
so i would like enquire about what udemy courses would you recommend to get a graduate platform engineer up to speed as fast as possible, as they are to many courses on udemy to choose from.
all recommendations and advice is greatly appreciated, thanks
https://redd.it/1qlug67
@r_devops
Reddit
From the devops community on Reddit
Explore this post and more from the devops community
A CLI to Tame OWASP Dependency-Track Version Sprawl in CI/CD
Like many of you, I struggled with automating Dependency-Track. Using curl was messy, and my dashboard was flooded with hundreds of "Active" versions from old CI builds, destroying my metrics.
I built a small CLI tool (Go) to solve this. It handles the full lifecycle in one command:
* Uploads the SBOM.
* Tags the new version as Latest.
* Auto-archives old versions (sets active: false) so only the deployed version counts toward risk scores.
It’s open source and works as a single binary. Hope it saves you some bash-scripting headaches!
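For anyone comparing against raw curl, the lifecycle such a tool wraps looks roughly like this against the Dependency-Track REST API (the endpoints are real; URL, key, and UUID values below are placeholders):

```sh
# Upload an SBOM, auto-creating the project/version if needed.
# The bom field is the base64-encoded CycloneDX document
# (base64 -w0 is GNU coreutils; use base64 without -w0 on macOS).
curl -X PUT "$DTRACK_URL/api/v1/bom" \
  -H "X-Api-Key: $API_KEY" -H "Content-Type: application/json" \
  -d "{\"projectName\":\"my-app\",\"projectVersion\":\"1.2.3\",
       \"autoCreate\":true,\"bom\":\"$(base64 -w0 bom.json)\"}"

# Deactivate an old version so it stops counting toward risk metrics
curl -X PATCH "$DTRACK_URL/api/v1/project/<old-version-uuid>" \
  -H "X-Api-Key: $API_KEY" -H "Content-Type: application/json" \
  -d '{"active": false}'
```

The messy part a CLI buys you is the middle step: listing a project's versions to find each old UUID before patching it, which is tedious to do reliably in bash.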
Repo: [https://github.com/MedUnes/dtrack-cli](https://github.com/MedUnes/dtrack-cli)
https://redd.it/1qm066u
@r_devops
Is there any useful tool that allows you to test your kubernetes configs without deploying or running it locally?
Is there any useful tool that allows you to test your kubernetes configs without deploying or running it locally? I am wondering if there's anything like that, because I have a large config with a lot of resources.
https://redd.it/1qm89l8
@r_devops
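A couple of common approaches, sketched below. Both assume the tools are installed locally; kubeconform is a third-party validator, and only the first two commands are fully offline:

```shell
# Client-side validation built into kubectl (no cluster needed, no changes applied):
kubectl apply --dry-run=client -f manifests/

# Offline schema validation against the Kubernetes OpenAPI schemas;
# -strict rejects unknown fields, -summary prints totals for large configs:
kubeconform -strict -summary manifests/

# If you do have access to a cluster, a server-side dry run also exercises
# admission controllers and webhooks without persisting anything:
kubectl apply --dry-run=server -f manifests/
```

For a large config with many resources, the kubeconform route tends to be the fastest feedback loop in CI.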
How should I pivot to DevOps without losing half my salary?
Hey guys,
Here’s my situation. I’m currently working as a Cloud Engineer, mostly with IaaS, PaaS and IaC. I’ve been in the cloud space for about a year now, and overall I have around 5–6 years of IT experience.
On the certification side, I have AZ-900, AZ-104, AZ-305, and AZ-400.
In my current role I worked my way up to mid-level, but my real goal is to move into DevOps. I know that means I need solid Docker and Kubernetes knowledge, so I’ve started learning and practicing them in my limited free time. I’ve even built some small projects already.
The problem is that my current salary is around standard market level, which is great, but when I apply for DevOps roles, I usually run into two outcomes:
1. I don’t even get invited to an interview, or
2. I get an interview, but they offer me about half my current salary because they would hire me as a junior DevOps engineer due to my lack of hands-on experience with Docker and Kubernetes.
Right now I simply can’t afford to cut my salary in half. On top of that, my current company doesn’t really use Docker or Kubernetes, so I don’t have the chance to gain real work experience with them.
I know the market is shit for switching jobs right now, but living in a country where salaries are already much lower than in most of Europe makes this even more frustrating. Honestly, it’s hard to see a clear way forward.
What would you do in my situation? How would you successfully pivot into DevOps without taking such a big financial step back? Any advice would be really appreciated.
https://redd.it/1qmcemd
@r_devops