On-call / Ops folks: what actually happens when a deployment breaks at 2 AM?
Hi everyone,
I’m doing research to better understand real on-call / operations workflows, especially around deployments, rollbacks, and incident handling.
This is not a product pitch and I’m not selling anything.
I’m trying to learn from people who actually handle production responsibility.
If you’re involved in:
- deployments
- rollbacks
- uptime monitoring
- on-call rotations
I’d really appreciate your input.
You can reply publicly or DM if you prefer.
❓ Questions
1. What happens when a deployment goes wrong in production?
(Step by step — alerts, decisions, actions)
2. Who usually decides to roll back, and how fast does that happen?
3. What tools are you actively using during an incident?
(CI/CD, monitoring, logs, scripts, manual steps)
4. What part of this process is the most stressful or error-prone?
5. What happens if the main on-call person is unavailable?
6. Is there anything you wish was automated — but currently isn’t? Why?
7. What would you never trust automation to do?
8. (Optional) How often does a bad deploy cause customer impact?
Thanks in advance — I’m genuinely trying to understand how this works in the real world.
https://redd.it/1q0lkb4
@r_devops
How do you prove Incident response works?
We have an incident response plan, on call rotations, alerts and postmortems. Now that customers are asking about how we test incident response, I realized we’ve never really treated it as something that needed evidence.
We handle incidents, and we do have evidence such as log files, hives, and history, but I want to know how to collect it faster and on a daily basis so it is more presentable.
What do I show besides screenshots, and does "the more the merrier" apply to this type of topic?
Any input helps ty!
https://redd.it/1q0nrdy
@r_devops
PostHog vs BetterStack
I'm moving off Sentry. Just underwhelmed with the value.
I'm an indie dev.
PostHog and Better Stack seem to be two of the best options under $50/mo.
Anyone tried both or either of them and have any insight they can share?
https://redd.it/1q0spin
@r_devops
Best DevOps roadmaps for 2025/26?
I’m a student who has been trying to get into DevOps for the past year or so, but I’m having a hard time getting started.
I’ve worked on a lot of projects with .NET mainly for school and whatnot, I’ve also had to learn some React and Flutter throughout my journey.
I’ve really liked the concept of DevOps for a while now, and usually I’ve learned a lot of the stuff I know about software engineering in general through courses, roadmaps and personal projects.
There is a really popular roadmap site which I like to browse through sometimes (not sure if mentioning it would be considered an ad, so I’d best avoid it), but it doesn’t feel complete.
I tried youtube tutorials, but most of them feel very forced in their way of teaching and are probably sponsored by a course provider anyway.
So my question to the community - is there a proven and tested source of an optimal DevOps roadmap in 2025 (heading into 2026)? So far I’ve peeped into Docker and I got comfortable with using Linux, but it’s not so easy for me to do project-based learning, since you need some general knowledge of what the problems are in DevOps. I don’t struggle with finding projects on technology I already know because I know what it can do and what it can’t do. But I’m barely touching the tip of the iceberg here! DevOps seems like such a huge rabbit hole, but it seems very interesting and I do want to learn more about it.
All help is much appreciated!
https://redd.it/1q0u9fg
@r_devops
Defensive CI/CD & IaC pre-commit scanner (Bash) — seeking abuse-case feedback
I built a defensive pre-commit security scanner in Bash focused on overlooked attack surfaces (static sites, IaC, CI/CD). Looking for threat-model and abuse-case review—not validation or promotion.
Zimara_v0.49.5
https://redd.it/1q0xtjr
@r_devops
GitHub
GitHub - oob-skulden/zimara: Comprehensive static site security scanner. 24+ automated checks for Hugo, Jekyll, Gatsby & more.…
Comprehensive static site security scanner. 24+ automated checks for Hugo, Jekyll, Gatsby & more. Finds misconfigurations, headers, secrets, and overlooked vulnerabilities before attackers ...
Intermediate DevOps Project Ideas looking for Suggestions to Tie My Skills Together (AWS, Docker, Jenkins, etc.)
Hey r/devops,
I've been diving deeper into DevOps over the past year and feel like I've got a solid grasp on a bunch of tools, but now I want to put them into a real-ish project to solidify everything and have something cool for my portfolio/learning.
Here's what I've learned/practiced so far:
- AWS: EC2, ECS (Fargate mostly), S3, IAM, RDS, VPC
- Linux shell scripting
- Docker (containerizing apps)
- Jenkins (pipelines, plugins)
- SonarQube (code quality)
- Trivy (image scanning)
- GitLab (repos, basic CI)
- Ansible (playbooks, config management)
I haven't touched Terraform or Kubernetes yet (planning to start Terraform soon), so ideally something that doesn't require those.
I'm thinking something like a full CI/CD pipeline for a simple web app (maybe a Flask/Node todo app with RDS backend): GitLab -> Jenkins build/scan/push to ECR -> Ansible to deploy/update ECS service, with proper IAM/VPC security, etc.
But I'm open to better/more realistic ideas! What projects have helped you level up at this stage? Bonus if it's something that mimics real-world workflows without being too basic (not just a "hello world" deploy).
Appreciate any suggestions, resources, or even "don't do X because Y" advice. Thanks in advance!
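Before wiring up real tools, the pipeline described above (GitLab -> Jenkins build/scan/push -> Ansible deploy) can be modeled as an ordered list of stages that stops at the first failure. The sketch below is only an illustration under that assumption: the stage names mirror the flow in this post, and every stage action is a hypothetical placeholder, not a real Jenkins, Trivy, or Ansible API.

```python
from typing import Callable, List, Tuple

# Each stage is a (name, action) pair; the action returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failure.

    Returns the names of the stages that completed successfully.
    """
    completed: List[str] = []
    for name, action in stages:
        if not action():
            # Fail fast: a failed scan must never be followed by a deploy.
            break
        completed.append(name)
    return completed


# Hypothetical placeholder stages; real ones would shell out to Docker, Trivy, etc.
example_stages: List[Stage] = [
    ("build", lambda: True),
    ("scan", lambda: False),  # pretend the image scan found a blocker
    ("push", lambda: True),
    ("deploy", lambda: True),
]

if __name__ == "__main__":
    print(run_pipeline(example_stages))
```

The point of the fail-fast loop is that later stages (push, deploy) are never reached once a quality or security gate fails, which is the behavior a real Jenkins pipeline would also be expected to enforce.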
https://redd.it/1q0zd6v
@r_devops
Built a CLI that auto-fixes CI build failures - is this useful?
I've been working on a side project and need a reality check from people who actually deal with CI/CD pipelines daily.
The idea: A build wrapper that automatically diagnoses failures, applies fixes, and retries - without human intervention.
# Instead of your CI failing at 2am and waiting for you:
$ cyxmake build
✗ SDL2 not found
→ Installing via apt... ✓
→ Retrying... ✓
✗ undefined reference to 'boost::filesystem'
→ Adding link flag... ✓
→ Retrying... ✓
Build successful. Fixed 2 errors automatically.
How it works:
- 50+ hardcoded error patterns (missing deps, linker errors, CMake/npm/cargo issues)
- Pattern match → generate fix → apply → retry loop
- Optional LLM fallback for unknown errors
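The "pattern match → generate fix → apply → retry" loop can be sketched in a few lines. This is not cyxmake's actual implementation (which is written in C); the two patterns, the fix descriptions, and the function names are assumptions made up for illustration.

```python
import re
from typing import Callable, List, Optional, Tuple

# (regex, fix template) pairs. Both entries are illustrative only.
PATTERNS: List[Tuple[str, str]] = [
    (r"(\w+) not found", "install package {0}"),
    (r"undefined reference to '(\w+)::", "add link flag for {0}"),
]


def suggest_fix(error: str) -> Optional[str]:
    """Return a fix description for the first matching pattern, or None."""
    for regex, template in PATTERNS:
        match = re.search(regex, error)
        if match:
            return template.format(*match.groups())
    return None


def build_with_retries(build: Callable[[], Optional[str]],
                       apply_fix: Callable[[str], None],
                       max_attempts: int = 5) -> Tuple[bool, List[str]]:
    """Run build(); on a known error apply a fix and retry.

    build() returns None on success or an error message on failure.
    Returns (succeeded, list of fixes applied).
    """
    applied: List[str] = []
    for _ in range(max_attempts):
        error = build()
        if error is None:
            return True, applied
        fix = suggest_fix(error)
        if fix is None:
            # Unknown error: stop instead of guessing.
            return False, applied
        apply_fix(fix)
        applied.append(fix)
    return False, applied
```

Two design choices here speak to the concerns raised in this post: the attempt count is bounded so a bad fix cannot loop forever, and every applied fix is recorded so a human can audit what the tool changed.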
My honest concerns:
1. Is this solving a real problem? Or do most teams just fix CI configs once and move on?
2. Security implications - a tool that auto-installs packages in CI feels risky
3. Scope creep - every build system is different, am I just recreating Dependabot + build system plugins?
What I think the use case is:
- New projects where CI breaks often during setup
- Open source projects where contributors have different environments
- That 3am pipeline failure that could self-heal instead of paging someone
What I'm NOT trying to do:
- Replace proper CI config management
- Be smarter than a human who knows the codebase
GitHub: https://github.com/CYXWIZ-Lab/cyxmake (Apache 2.0, written in C)
Honest questions:
- Would you actually use this, or is it a solution looking for a problem?
- What would make you trust it in a real pipeline?
- Am I missing something obvious that makes this a bad idea?
Appreciate any feedback, even "this is pointless" - rather know now than after another 6 months.
https://redd.it/1q122so
@r_devops
FAANG/MAANG devops?
Hi guys,
Is anybody here working as a DevOps engineer at a FAANG/MAANG company? If so, what does the interview look like? What rounds and questions do they have? Is DSA necessary?
https://redd.it/1q11j0t
@r_devops
A year of cost optimization resulted in 10% savings
This is mostly a venting post. It's my first year as a DevOps engineer at a medium sized b2b software company. I kind of took it upon myself to lower our cloud costs, even though no one else really cares that much. I turned it into a bit of a crusade (honestly, also thinking this was a low hanging fruit to show my worth and dedication, and also a learning experience). Even wrote here a few times about previous attempts.
After doing this for the better part of a year, got us to maybe 10% cost reduction. Rightsizing, killing idle capacity, requests/limits tuning, the usual janitorial work. After that every extra percent is a fight.
Our workloads are quite bursty, HPA driven, mostly stateless. Nothing exotic. Multiple instance types, multiple AZs, TTLs tuned, PDBs not insane, images pre pulled, startup times are reasonable.
We recently moved from Cluster Autoscaler to Karpenter and I really hoped this would finally let us drop baseline capacity.
Still doesn’t matter. We're not very well-utilized. Cluster utilization is mostly 20–50% for both CPU and memory. Min replicas are pretty high, but no one wants to touch those as they are our safety net.
Most solutions work very well on steady workloads that are polite enough to rise slowly and at constant intervals. That's not really the case for most people I think.
That's it. I don't really have a question here. If anyone is feeling this, you're welcome to reply.
https://redd.it/1q13gbs
@r_devops
Looking for a structured, free, hands-on DevOps / DevSecOps learning path
Hi everyone,
I work in information security, mainly in penetration testing and secure application development (Secure SDLC).
I’m now looking to learn DevOps and especially DevSecOps in a deep and practical way.
I recently followed a DevOps course on LabEx, which worked very well for me because it was lab-based, step-by-step, and structured.
What I’m specifically looking for now is a free, structured, hands-on learning path, not a collection of scattered tutorials or random resources.
Most lab-based DevOps / DevSecOps platforms I’ve found so far are paid, so I’d really appreciate recommendations for a clear, well-defined, free path that makes sense for someone with a security background.
Thanks in advance for any suggestions.
https://redd.it/1q14ux0
@r_devops
As a fresher
Hey guys, I haven't graduated yet; I'm in my 2nd year right now. I'm seriously thinking about going into DevOps and applying for those roles, since I've done one internship in that domain, or going into blockchain/web3 instead. I will graduate in 2028. What should I pick? I've heard that to seriously learn DevOps I have to spend money first. Experienced devs here, please guide me.
https://redd.it/1q15hmg
@r_devops
Pivot to DevOps: Have the skills and projects, but the resume isn't working. What am I missing?
Hello,
I am looking for a sanity check on my job search strategy.
I am trying to break into DevOps. I have built several projects involving k8s and terraform to bridge the gap between my past experience in cybersecurity and this new role.
I have tailored my resume to match ATS standards, but I am met with silence.
Prior to this, I worked in the cybersecurity domain for 1.7 years, and due to some family issues I had to drop out. I currently have a 1.3-year career gap.
https://redd.it/1q0ybpk
@r_devops
AWS Support → DevOps Engineer (Product/Startup) – Need Guidance
Hi all,
I’m working in an AWS cloud support role in India and preparing for the AWS Solutions Architect Associate exam.
My goal is to move into a DevOps Engineer role (product/startup, not support) by 2026.
I’m a complete beginner in DevOps and need realistic advice:
If I start now, how long does it realistically take to become job-ready for DevOps?
Which skills matter most for product/startup companies?
Should I focus more on hands-on projects or certifications after SAA?
Any honest guidance or roadmap would really help.
Thanks 🙏
https://redd.it/1q1asx0
@r_devops
The 8 Fallacies of Distributed Computing: All You Need To Know + Why It’s Still Relevant In 2026
https://lukasniessen.medium.com/the-8-fallacies-of-distributed-computing-all-you-need-to-know-why-its-still-relevant-in-2026-078b4d8a98f1
https://redd.it/1q1chjj
@r_devops
Medium
The 8 Fallacies of Distributed Computing: All You Need To Know + Why It’s Still Relevant In 2026
Back in 1994, Peter Deutsch at Sun Microsystems wrote down something that every distributed systems engineer eventually learns the hard…
Orion-Belt – Open-source SSH/SCP Bastion with Reverse Tunnels & ReBAC (Seeking Early Contributors)
Hey everyone,
I’ve spent the last few months building **Orion-Belt**, a secure SSH/SCP bastion system for teams that need to manage infrastructure **without opening a single inbound firewall port**.
The problem I wanted to solve: Traditional bastions are either too simple (no auditing) or too complex/expensive (enterprise PAM tools).
**How it works:**
* Your servers (behind firewalls) establish **Reverse SSH Tunnels** to the Orion-Belt gateway.
* Clients connect via `osh` (SSH) or `ocp` (SCP), and the gateway routes traffic through those tunnels.
* Everything is audited, controlled, and time-bound.
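The reverse-tunnel idea can be illustrated with plain OpenSSH, independent of Orion-Belt. The sketch below builds the command a server behind a firewall could run to expose its local SSH port on a gateway. The host name, user, and port numbers are hypothetical, and this is not a description of how Orion-Belt's own agent is implemented.

```python
from typing import List


def reverse_tunnel_command(gateway: str, user: str,
                           gateway_port: int, local_port: int = 22) -> List[str]:
    """Build an OpenSSH command that opens a reverse tunnel.

    -N  do not run a remote command, only forward ports
    -R  listen on gateway_port at the gateway and forward to local_port here
    """
    return [
        "ssh", "-N",
        "-R", f"{gateway_port}:localhost:{local_port}",
        f"{user}@{gateway}",
    ]


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(" ".join(reverse_tunnel_command("gateway.example.com", "tunnel", 2222)))
```

Because the connection is initiated outbound from the server, no inbound rule is needed on the server's firewall. With default OpenSSH settings the forwarded port is bound to the gateway's loopback interface unless `GatewayPorts` is enabled on the gateway.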
**Key Features:**
* **ReBAC** – Relationship-Based Access Control (fine-grained permissions, no “all-or-nothing”).
* **Session Recording** – Every keystroke is captured for audit and replay.
* **Temporary Access** – Request/approve workflow with automatic expiration.
* **No Inbound Rules** – Works in locked-down VPCs, home labs, or private networks.
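To make the ReBAC and temporary-access ideas concrete, here is a minimal sketch of a relationship store in which access is derived from relationships (a user is a member of a team, a team owns a server) and any relationship may carry an expiry. The tuple layout, relation names, and class are assumptions for illustration, not Orion-Belt's data model (the project itself is written in Go).

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional, Set, Tuple

# Relationship tuples: (subject, relation, object) -> optional expiry time.
Relation = Tuple[str, str, str]


class RelationStore:
    def __init__(self) -> None:
        self._relations: Dict[Relation, Optional[datetime]] = {}

    def grant(self, subject: str, relation: str, obj: str,
              expires_at: Optional[datetime] = None) -> None:
        """Record a relationship, optionally time-bound."""
        self._relations[(subject, relation, obj)] = expires_at

    def _active(self, key: Relation, now: datetime) -> bool:
        if key not in self._relations:
            return False
        expires_at = self._relations[key]
        return expires_at is None or now < expires_at

    def can_access(self, user: str, server: str, now: datetime) -> bool:
        """A user may access a server directly, or via a team that owns it."""
        if self._active((user, "can_ssh", server), now):
            return True
        teams: Set[str] = {
            obj for (subj, rel, obj) in self._relations
            if subj == user and rel == "member_of"
            and self._active((subj, rel, obj), now)
        }
        return any(self._active((team, "owns", server), now) for team in teams)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    store = RelationStore()
    store.grant("bob", "can_ssh", "db-1", expires_at=now + timedelta(hours=1))
    print(store.can_access("bob", "db-1", now))
```

Expiry is checked at decision time rather than by deleting records, so an approved temporary grant simply stops matching once its window has passed.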
It’s currently in **Alpha** (APIs and internals may change) and written in Go. I’m looking for **early adopters and contributors** to break it, give feedback, and help shape the architecture.
GitHub: [https://github.com/zrougamed/orion-belt](https://github.com/zrougamed/orion-belt)
I’d love to hear your thoughts on the approach and how you handle privileged access in your environments!
If this resonates, consider forking the repo, testing it in your setup, and sharing feedback or PRs — your input could directly shape Orion-Belt’s design and feature set!
https://redd.it/1q1dl3q
@r_devops
Securing a small production VPS by actually watching SSH and HTTP logs
I run a small production VPS (Docker, reverse proxy, SSH keys). Traffic is low, but after looking at the logs I saw constant SSH brute force and HTTP probing for .env, credentials, and random paths.
Nothing was compromised, but it made it clear I wasn’t really watching.
I documented how I approached this using log-based detection, temporary bans, and automation. CrowdSec wasn’t an obvious fit at first (especially with Kamal and container logs), but I got it working after some trial and error.
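As a rough illustration of the detection half of log-based banning, the sketch below counts failed SSH logins per source IP and returns the addresses that cross a threshold. It is a simplified stand-in, not CrowdSec's logic and not the code from the linked repository; the threshold and function name are assumptions, and a real setup would also expire bans after a fixed period.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Matches the usual OpenSSH "Failed password" log message.
FAILED_LOGIN = re.compile(
    r"Failed password for (?:invalid user )?\S+ from (\S+) port \d+"
)


def ips_to_ban(log_lines: Iterable[str], threshold: int = 3) -> List[str]:
    """Return source IPs with at least `threshold` failed SSH logins."""
    failures: Dict[str, int] = defaultdict(int)
    for line in log_lines:
        match = FAILED_LOGIN.search(line)
        if match:
            failures[match.group(1)] += 1
    return sorted(ip for ip, count in failures.items() if count >= threshold)
```

The same counting approach extends to HTTP probing, for example by matching reverse-proxy access-log lines that request `.env` paths, though the log format depends on the proxy in use.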
Article:
https://muthuishere.medium.com/securing-a-production-vps-in-practice-e3feaa9545af
Code / automation:
https://github.com/muthuishere/automated-crowdsec-kamal
Would be interested to hear how others handle this on small production servers.
https://redd.it/1q1d8lf
@r_devops
Medium
Securing a Production VPS in Practice
Let’s start with a simple assumption.