Is “EnvSecOps” a thing?
Been a while, folks... long-time lurker — also engineer / architect / DevOps / whatever we’re calling ourselves this week.
I’ve racked physical servers, written plenty of code, automated all the things, and (like everyone else lately) built a few LLM agents on the side — because that’s the modern-day “todo app,” isn’t it? I’ve collected dotfiles, custom zsh prompts, fzf scripts, shell aliases, and eventually moved most of that mess into devcontainers.
They’ve become one of my favorite building blocks, and honestly they’re wildly undersold in the ops world. (Don’t get me started on Jupyter notebooks... squirrel!) They make a great foundation for standardized stacks and keep all those wriggly little ops scripts from sprawling into fifteen different versions across a team. Remember when Terraform wasn’t backwards compatible with state? Joy.
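Since I keep evangelizing them: the whole environment is just a config you check in next to the code. A minimal sketch of what I mean (image, digest, and feature version are placeholders), pinned by digest so it can’t drift:

```jsonc
// .devcontainer/devcontainer.json: hypothetical sketch; image and digest are placeholders
{
  // Pin by digest, not a floating tag, so every clone gets the exact same toolchain
  "image": "ghcr.io/example/ops-base@sha256:0000000000000000000000000000000000000000000000000000000000000000",
  "features": {
    // Versioned features instead of "curl | bash" in a setup script
    "ghcr.io/devcontainers/features/terraform:1": { "version": "1.9.8" }
  },
  // Sanity check on create: the toolchain the repo expects is the one you got
  "postCreateCommand": "terraform version"
}
```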
Recently I was brushing up for the AWS Security cert (which, honestly, barely scratches real-world security... SLSA what? Sigstore who?), and during one of the practice tests something clicked out of nowhere. An itch I’ve been trying to scratch for years suddenly felt reachable.
I don’t want zero trust — I want zero drift. From laptop to prod.
Everything we do depends on where it runs. Same tooling, same policies, same runtime assumptions. If your laptop can deploy to prod, that laptop is prod.
So I’m here asking for guidance or abuse... actually both, from the infinite wisdom of the r/devops trenches. I’m calling it “EnvSecOps.” Change my mind.
But in all seriousness, I can’t unsee it now. We scan containers, lock down pipelines, version our infrastructure... but the developer environment itself is still treated like a disposable snowflake. Why? Why can’t the same container that’s used to develop a service also build it, deploy it, run it, and support it in production? Wouldn’t that also make a perfect sandbox for automation or agents — without giving them free rein over your laptop or prod?
Feels like we’ve got all the tooling in the world, just nothing tying it all together. But I think we can actually tie it together. A few hashes here, a little provenance there, a sprinkle of attestations… some layered, composable, declarative, and verified tooling. Now I’ve got a verified, maybe even signed environment.
No signature? No soup for you.
(No creds, either.)
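Concretely, the gate could be as dumb as this sketch (cosign is real; the image ref, key, and wiring are placeholders):

```python
# Hypothetical gate: no verified signature on the env image -> no credentials.
# Assumes cosign is on PATH; image ref and key path are made up.
import subprocess
import sys

IMAGE = "ghcr.io/example/dev-env@sha256:..."  # pinned digest, not a tag

result = subprocess.run(
    ["cosign", "verify", "--key", "cosign.pub", IMAGE],
    capture_output=True, text=True,
)
if result.returncode != 0:
    sys.exit("No signature? No soup (and no creds) for you.")
# ...only now mount secrets / hand the agent its scoped token...
```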
Yes, I know it’s not that simple. But all elegant solutions seem simple in hindsight.
Lots of thoughts here. Rein me in. Roast me. Work with me. But I feel naked and exposed now that I’ve seen the light.
And yeah, I ran this past GPT.
It agreed a little too quickly — which makes me even more suspicious. But it fixed all my punctuation and typos, so here we are.
Am I off, or did I just invent the next buzzword we’re all gonna hate?
https://redd.it/1okm1ih
@r_devops
"terraform template" similar to "helm template"
I use `helm template` to pre-render all my manifests, and it works beautifully for PR reviews.
I wish there were a similar tool for Terraform modules, so that I could run something like `terraform template` and have it output the raw HCL resources instead of the one-line git diff that could potentially trigger hundreds of resources during `terraform plan`.
I tried building it myself, but my skills aren't enough for the task.
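The furthest I've gotten is post-processing the plan's JSON output, roughly like this (a sketch built on the real `terraform plan -out` / `terraform show -json` commands, not the `terraform template` I'm wishing for):

```python
# Rough stand-in for "terraform template": list what a plan would change,
# one resource per line, by post-processing `terraform show -json`.
import json
import subprocess

subprocess.run(["terraform", "plan", "-out=tfplan"], check=True)
plan = json.loads(
    subprocess.run(
        ["terraform", "show", "-json", "tfplan"],
        capture_output=True, text=True, check=True,
    ).stdout
)

# One line per resource change instead of a one-line module diff.
for change in plan.get("resource_changes", []):
    actions = ",".join(change["change"]["actions"])
    print(f"{actions:10} {change['address']}")
```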
Does anyone else think this would be a great idea?
https://redd.it/1okn28g
@r_devops
DoubleClickjacking: Modern UI Redressing Attacks Explained
https://instatunnel.my/blog/doubleclickjacking-modern-ui-redressing-attacks-explained
https://redd.it/1okleal
@r_devops
What’s that one cloud mistake that still haunts your budget? [Halloween spl]
A while back, I asked the Reddit community to share some of their worst cloud cost horror stories, and you guys did not disappoint.
For Halloween, I thought I’d bring back a few of the most haunting ones:
* There was one where a DDoS attack quietly racked up $450K in egress charges overnight.
* Another where a BigQuery script ran on dev Friday night and by Saturday morning, €1M was gone.
* And one where a Lambda retry loop spiraled out of control, turning $0.12/day into $400/day before anyone noticed.
The scary part, obviously, is that these aren’t rare at all. They happen all the time, hidden behind dashboards, forgotten tags, or that one “testing” account nobody checks.
Check out the full list here: [https://amnic.com/blogs/cloud-cost-horror-stories](https://amnic.com/blogs/cloud-cost-horror-stories)
And if you’ve got your own such story, drop it below. I’m so gonna make a part 2 of these stories!!
https://redd.it/1okps2j
@r_devops
GlueKube: Kubernetes integration test with ansible and molecule
https://medium.com/@GlueOps/gluekube-kubernetes-integration-test-with-molecule-f88da7c41a34
https://redd.it/1okp4bi
@r_devops
Do developers actually trust AI to do marketing?
Developers definitely understand the pros and cons of AI better than most people. Do AI companies or developers actually trust AI tools when it comes to marketing?
I’ve noticed that a lot of so-called “AI-powered” marketing products are pretty bad in practice, and it sometimes feels like they’re just trying to ride the hype.
Would love to hear what others think.
https://redd.it/1okr3n5
@r_devops
Tangent: Log processing without DSLs (built on Rust & WebAssembly)
https://github.com/telophasehq/tangent/
Hey y'all – The problem I've been dealing with is that each company I work at implements many of the same log transformations. Additionally, LLMs are much better at writing Python and Go than DSLs.
WASM has recently made major performance improvements (with more exciting things to come like async!) and it felt like a good time to experiment to see if we could build a better pipeline on top of it.
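To give a feel for what I mean by "real languages": a transform is just a plain function you could compile to WASM, e.g. in Python (hypothetical shape, field names made up; not Tangent's actual SDK):

```python
# Hypothetical log transform: plain Python instead of a DSL.
# Drops debug noise, redacts secrets, normalizes a field name.
import json

def transform(line: str) -> str | None:
    event = json.loads(line)
    if event.get("level") == "debug":
        return None  # drop noisy records
    event.pop("password", None)  # redact sensitive fields
    event["service"] = event.pop("svc", "unknown")  # normalize naming
    return json.dumps(event)

print(transform('{"level":"info","svc":"api","password":"x","msg":"ok"}'))
```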
Check it out and let me know what you think :)
https://redd.it/1okv8ys
@r_devops
How do you get secrets into VMs without baking them into the image?
Hey folks,
I’m used to working with AWS, where you can just attach an instance profile and have the instance securely pull secrets from Secrets Manager or SSM Parameter Store without hardcoding anything.
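For example, at boot the instance can just do something like this (secret name is made up), with credentials coming from the instance profile automatically:

```python
# The AWS model I'm used to: the instance profile supplies credentials,
# so nothing is baked into the image or passed in user data.
import boto3

sm = boto3.client("secretsmanager", region_name="us-east-1")
secret = sm.get_secret_value(SecretId="prod/app/db-password")["SecretString"]
```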
Now I’m working in DigitalOcean, and that model doesn’t translate well. I’m using Infisical for secret management, but I’m trying to figure out the best way to get those secrets into my droplets securely at boot time — without baking them into the AMI or passing them as plain user data.
So I’m curious:
How do you all handle secret injection in environments like DigitalOcean, Hetzner, or other non-AWS clouds?
How do you handle initial authentication when there’s no instance identity mechanism like AWS provides?
https://redd.it/1okxnz4
@r_devops
do you guys still code, or just debug what ai writes?
lately at work i’ve been using ChatGPT, Cosine, and sometimes Claude to speed up feature work. it’s great: half my commits are ready in hours instead of days. but sometimes i look at the codebase and realize i barely remember how certain parts even work.
it’s like my role slowly shifted from developer to prompt engineer. i’m mostly reviewing, debugging, and refactoring what the bot spits out.
curious how others feel
https://redd.it/1okz9hc
@r_devops
Non-vscode AI agents
Hi guys, recently my claude sonnet 4 disappeared from vscode. Can anyone help me?
He literally wrote the code for me on the front-end, then I could calmly develop the back-end.
If anyone has an alternative agent that can write, update, edit, delete, etc. in vscode or another IDE, please share. Thanks
https://redd.it/1okyltc
@r_devops
Tell me if I'm in the wrong here
Context: I work on a very large contract. The different technical disciplines are broken up into authoritative departments. I'm on Platform Engineering. We're responsible for building application images and deploying them. There is also a Cybersecurity team, which largely sets policy and pushes out requests for patches and such.
Before I explain this process I offer this disclaimer: I know this process is crap. I hate it and I'm working very hard to change it. But as it stands now, this is what they ask me to do:
We are asked by the CSD team, about every 3 months, to take the newest CPU base image from WebLogic and run pipelines that build images for each of the apps on a specific cluster. You read that right - cluster. Why? Well, because instead of injecting the .ear file at runtime, they build an image with a very long-ass tag name that has the base image, the specific app, and the specific app version on it. These pipelines call a configuration management database that says "Here is the image name and version" and use that to make an individually tailored image for each app.
After that's done, they have a "mass deploy" pipeline which then deploys the snowflake images for dozens of applications into a Kubernetes cluster.
Now, this is where I get pissed.
I played nice and did the mass build pipeline. However, because it's a fucking convoluted process, I missed a step and had to re-run it. It takes like 3 hours every time it runs because it's Jenkins. (Another huge problem.) This delayed my timeline according to CSD, and they were already getting hot and bothered by it. However, after successfully building all those images, I decided this was where I take my stand. I said I would not deploy all these apps to our development cluster. Instead, I would rather we deploy a few apps and scream-test them with some dev teams. Why? Because we have NO FUCKING QA. We just expect it's gonna work. I am not gonna do that.
That didn't make CSD happy but they played along until I said I wasn't going to run the mass deploy pipeline on a Friday afternoon on Halloween. They wanted me to run it because "It's just dev" and "It's no big deal". To me, it is a big deal, because if we plan to promote to the test cluster on Monday, I want more time from the devs to give me feedback. I want testing of the pods and dependent services. I want some actual feedback that we have spot checked scenarios before they make their way up to prod. Dev would be the place to catch it before it gets out of hand because if we find something we promoted to test is wrong then we now have twice as many apps to rollback. The devs also have families too. I'm not going to put more stress on them because the CSD wanted to rush something out.
Anyway, CSD is now tussling with my boss because I unplugged my computer and went home. I am going to play video games the rest of the day and then go trick or treating with my kids. They can have some other sucker do their dirty work.
But am I wrong? Did I make a mountain out of a molehill? Or am I correct that this is a disaster waiting to happen and I need to draw the line in the sand here and now?
https://redd.it/1ol38s2
@r_devops
API first vs GUI for 3rd party services
Your team has decided to buy a new tool to solve a problem. You have narrowed down the options to:
Tool A:
Minimal UI, mainly API-driven, good docs and SDKs
Tool B:
Nearly all work is done inside the tool’s UI, either browser-based or a desktop app. Minimal APIs exposed, no SDKs
Assume all the features are the same; it’s just the way you interact with the tool that differs. Which one are you advocating for? Which one do you see your team adopting?
https://redd.it/1okyiqz
@r_devops
"Validate problems before rushing into tools, frameworks etc" quote
Weird question, and sorry that it's probably inappropriate for the sub, but someone posted an image of a lady at a (platform?) conference with a caption that went something like the title of this post.
To be honest, I can't even remember if it was posted here or in r/kubernetes. I did try to find it myself, but to no avail. Does it ring a bell for anyone? I would really like to watch the presentation myself, or at the very least find the image itself. Thanks!
https://redd.it/1ol5g1z
@r_devops
launching my new side project pipedash today - a desktop app for managing ci/cd pipelines from multiple providers
ideally we'd just use one ci/cd platform for everything and this wouldn't need to exist. but most of us deal with multiple platforms and i kept forgetting which pipeline was where. got tired of it so i built this.
it's new and still rough around the edges, so bugs will happen... if you run into any, just open an issue. drop a star if it helps :D
https://github.com/hcavarsan/pipedash
https://redd.it/1olc3q4
@r_devops
LDAP Injection: The Forgotten Injection Attack on Enterprise Authentication 🏢
https://instatunnel.my/blog/ldap-injection-the-forgotten-injection-attack-on-enterprise-authentication
https://redd.it/1olfb06
@r_devops
I need your advice/feedback on "webhooks as a service" platforms
Hello everyone,
About a year ago, I started a side project to create a "Webhook as a Service" platform. Essentially, it lets you create a proxy between your API and the services that send webhooks to it, like Stripe, GitHub, or Shopify, and redirects them to multiple destinations (your API, Slack, …).
All of this with automatic retries, filters, payload transformation with JavaScript, monitoring, and alerts.
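Under the hood, the delivery loop is conceptually simple; something like this simplified sketch (stdlib-only illustration, not the production code):

```python
# Simplified sketch of delivery-with-retries: forward a received payload
# to one destination, backing off exponentially on failure.
import time
import urllib.request

def deliver(url: str, payload: bytes, attempts: int = 5) -> bool:
    for attempt in range(attempts):
        try:
            req = urllib.request.Request(
                url,
                data=payload,
                headers={"Content-Type": "application/json"},
            )
            with urllib.request.urlopen(req, timeout=10) as resp:
                if 200 <= resp.status < 300:
                    return True  # delivered
        except Exception:
            pass  # network error or non-2xx; retry below
        time.sleep(2 ** attempt)  # backoff: 1s, 2s, 4s, ...
    return False  # exhausted retries: alert and park for replay
```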
Additionally, I built a webhook inspector, a tool to simply debug webhooks and visualise the headers, body, etc.
The problem is that the vast majority of users are only using the webhook inspector.
I know there are already some competitors in this sector, but, as developers or infrastructure engineers, do you see this as something useful? Or should I pivot Hooklistener to something else?
Thanks to everyone for the feedback.
https://redd.it/1olk3s4
@r_devops
The plane that crashed because of a light bulb - and what it teaches about DevOps focus
In December 1972, Eastern Air Lines Flight 401 was on final approach to Miami when a small green light failed to illuminate. That tiny bulb indicated whether the nose landing gear was locked. The crew couldn’t confirm if the landing gear was down, so they climbed to 2,000 feet to troubleshoot.
All three flight crew members became fixated on that light. While they pulled apart the panel to check the bulb, the captain accidentally nudged the control yoke, which disengaged the autopilot’s altitude hold. Slowly, silently, the aircraft began to descend. Nobody noticed. The warning chime sounded, the altimeter unwound, but everyone’s attention was still on the light. Minutes later, the wide-body jet slammed into the Everglades, killing more than 100 people [1].
The landing gear was fine. It was just the bulb. The crash happened because nobody was “flying the plane.”
For DevOps and SRE teams, this is a hauntingly familiar pattern. During incidents, we sometimes fixate on one metric, one alert, one suspicious log line, while the real problem is unfolding elsewhere. Flight 401’s lesson is simple but deep: when pressure mounts, someone must always keep an eye on the system’s overall health. In aviation, they call it “Aviate, Navigate, Communicate.” In operations, it’s “Stabilize, Observe, Diagnose.”
Have clear roles. Designate an incident commander whose job is to maintain situational awareness. Don’t let a small mystery consume all attention while the system degrades unnoticed. Above all, remember to fly the plane.
I’ve explored more incidents like this, and what software teams can learn from aviation’s culture of safety, in my book Code from the Cockpit (link below). But even if you never read it, I hope Flight 401’s story stays with you the next time an alert goes off.
Sources:
[1] National Transportation Safety Board, Aircraft Accident Report NTSB/AAR-73-14, “Eastern Air Lines L-1011, N310EA, Miami, Florida, December 29, 1972” (official investigation report)
Book reference: Code from the Cockpit – What Software Engineering Can Learn from Aviation Disasters (https://www.amazon.com/dp/B0FKTV3NX2)
https://redd.it/1oll8zb
@r_devops
Tooling price rises
Hey,
Who here runs a lab environment to practice coding/DevOps techs?
I have an environment with TeamCity, Octopus Deploy, Prometheus, k3s, etc.
However, has anyone noticed the constant price rises in tooling?
Octopus Deploy went up (there's threads here from a year or two ago).
TeamCity renewal licensing has changed.
And likewise for a lot of sysadmin tooling, e.g. Veeam and VMware.
It makes running a lab environment difficult.
https://redd.it/1olmixw
@r_devops
API Gateway horror stories?
Recently came across a post mentioning that if an API endpoint gets discovered by a mischievous bot, it may drain a lot of funds from your account. Could somebody explain please?
And maybe share stories from your own experience? Thanks all!
https://redd.it/1oljk5m
@r_devops
I created an open-source tool to fork Kubernetes environments. It's like "Git Fork" but for k8s.
Hi Folks,
I created an open-source tool that lets you create, fork, and hibernate entire Kubernetes environments.
With Forkspacer, you can fork your deployments while also migrating your data: not just the manifests, but the entire data plane as well. We support different modes of forking: by default, every fork spins up a managed, dedicated virtual cluster, but you can also point the destination of your fork at a self-managed cluster. You can even set up multi-cloud environments and fork an environment from one provider (e.g., AWS) to another (e.g., GKE, AKS, or on-prem).
You can clone full setups, test changes in isolation, and automatically hibernate idle workspaces to save resources, all declaratively, with GitOps-style reproducibility.
It’s especially useful for spinning up dev, test, pre-prod, and prod environments, and for teams where each developer needs a personal, forked environment from a shared baseline.
The license is Apache 2.0, and it's written in Go using the Kubebuilder SDK.
https://github.com/forkspacer/forkspacer - source code
Please give it a try and let me know what you think. Thank you!
https://redd.it/1olo2l4
@r_devops
Understanding Docker Multi-platform Builds with QEMU
https://cefboud.com/posts/qemu-virtualzation-docker-multi-build/
https://redd.it/1olod4p
@r_devops