What’s the best way to practice DevOps tools? I built something for beginners + need your thoughts
A lot of people entering DevOps keep asking the same question:
“Where can I practice CI/CD, Kubernetes, Terraform, etc. without paying for a bootcamp?”
Instead of repeating answers, I ended up building a small learning hub that has:
Free DevOps tutorial blogs
Hands-on practice challenges
Simple explanations of complex tools
Mini projects for beginners
If any of you are willing to take a look and tell me what’s good/bad/missing, I’d appreciate it:
**https://thedevopsworld.com**
Not selling anything — just trying to make a genuinely useful practice resource for newcomers to our field.
It will always remain free, with no intention of making money.
Would love your suggestions on features, topics, or improvements!
https://redd.it/1pjn6aj
@r_devops
Backblaze or AWS S3 or Google Cloud Storage for a photo hosting website?
I have a client who wants to build a photography hosting website. He has tons of images (~20MB/image), and those photos can be viewed from around the world. What is the best cloud option for storing the photos?
thx
https://redd.it/1pjn652
@r_devops
Anyone tried the Debug Mode for coding agents? Does it change anything?
I'm not sure if I can mention the editor's name here. Anyway, they've released a new feature called Debug Mode.
>Coding agents are great at lots of things, but some bugs consistently stump them. That's why we're introducing Debug Mode, an entirely new agent loop built around runtime information and human verification.
>How it works
>1. Describe the bug - Select Debug Mode and describe the issue. The agent generates hypotheses and adds logging.
>2. Reproduce the bug - Trigger the bug while the agent collects runtime data (variable states, execution paths, timing).
>3. Verify the fix - Test the proposed fix. If it works, the agent removes instrumentation. If not, it refines and tries again.
What do you all think about how useful this feature is in actual debugging processes?
I think debugging is definitely one of the biggest pain points when using coding agents. This approach stabilizes what was already being done in the agent loop.
But when I'm debugging, I don't want to describe so much context, and sometimes bugs are hard to reproduce. So, I previously created an editor extension that can continuously access runtime context, which means I don't have to make the agent waste tokens by adding logs—just send the context directly to the agent to fix the bug.
I guess they won't implement something like that, since it would save too much on quotas, lol.
https://redd.it/1pjo2zz
@r_devops
Inherited a legacy project with zero API docs: any fast way to map all endpoints?
I just inherited a 5-year-old legacy project and found out… there’s zero API documentation.
No Swagger/OpenAPI, no Postman collections, and the frontend is full of hardcoded URLs.
Manually tracing every endpoint is possible, but realistically it would take days.
Before I spend the whole week digging through the codebase, I wanted to ask:
Is there a fast, reliable way to generate API documentation from an existing system?
I’ve seen people mention packet-capture workflows (mitmproxy, Fiddler, Apidog’s capture mode, etc.) where you run the app and let the tool record all HTTP requests, then turn them into structured API docs.
Has anyone here tried this on a legacy service?
Did it help, or did it create more noise than value?
I’d love to hear how DevOps/infra teams handle undocumented backend systems in the real world.
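A minimal sketch of the capture workflow mentioned above, using mitmproxy plus the mitmproxy2swagger converter (hostnames, ports, and file names are placeholders; treat it as a starting point, not a recipe):

```bash
# Record every HTTP(S) request the app makes while you click through it.
mitmdump --listen-port 8080 -w legacy_api.flows

# Route traffic through the proxy, e.g. by pointing a browser or curl at it:
curl -x http://localhost:8080 http://legacy-app.internal/api/v1/users

# Turn the recorded flows into a draft OpenAPI spec, filtered to the API host.
mitmproxy2swagger -i legacy_api.flows -o openapi-draft.yml -p http://legacy-app.internal
```

The draft will only contain endpoints you actually exercised, so pair it with a grep over the frontend's hardcoded URLs to catch anything the click-through missed.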
https://redd.it/1pjqnqi
@r_devops
Protecting your own machine
Hi all. I've been promoted (if that's the proper word) to devops after 20+ years of being a developer, so I'm learning a lot of stuff on the fly...
One of the things I wouldn't like to learn the hard way is how to protect your own machine (the one holding the access keys). My passwords are in a password manager, my SSH keys are passphrase protected, I pull the repos in a virtual machine... What else can and should I do? I'm really afraid that some of these junior devs will download some malicious library and fuck everything up.
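Two cheap mitigations for the malicious-library fear, sketched below; both are standard npm/OpenSSH switches, but check how they interact with your team's workflow first:

```bash
# Refuse to run package lifecycle scripts (postinstall etc.), the usual
# execution vector for malicious npm packages.
npm config set ignore-scripts true

# Load SSH keys with -c so the agent asks for confirmation on every use,
# and with -t so they expire instead of staying resident all day.
ssh-add -c -t 3600 ~/.ssh/id_ed25519
```

ignore-scripts breaks packages that genuinely need a build step, so some teams enable it globally and re-enable it per project in a local .npmrc where it's actually needed.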
https://redd.it/1pjt2bj
@r_devops
Workload on GKE: Migrating from Zonal to Regional Persistent Disk for true Multi-Zone
Hey folks,
I'm running Jenkins on GKE as a StatefulSet with a 2.5TB persistent volume, and I'm trying to achieve true high availability across multiple zones.
**Current Setup:**
* Jenkins StatefulSet in `devops-tools` namespace
* Node pool currently in `us-central1-a`, adding `us-central1-b`
* PVC using `premium-rwo` StorageClass (pd-ssd)
* The underlying PV has `nodeAffinity` locked to `us-central1-a`
**The Problem:** The PersistentVolume is zonal (pinned to us-central1-a), which means my Jenkins pod can only schedule on nodes in that zone. This defeats the purpose of having a multi-zone node pool.
**What I'm Considering:**
Migrate to a regional persistent disk (replicated across us-central1-a and us-central1-b)
**Questions:**
* Has anyone successfully migrated a large PV from zonal to regional on GKE? Any gotchas?
* What's the typical downtime window for creating a snapshot and provisioning a ~2.5TB regional disk?
* Are there better approaches I'm missing for achieving HA with StatefulSets in GKE?
The regional disk approach seems cleanest (snapshot → create regional disk → update PVC), but I'd love to hear from anyone who's done this in production before committing to the migration.
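For the snapshot → regional disk step, a minimal gcloud sketch (disk and snapshot names are placeholders; double-check flags and regional-PD size limits against current GCP docs):

```bash
# 1. Scale Jenkins down, then snapshot the zonal source disk.
gcloud compute disks snapshot jenkins-home \
    --zone=us-central1-a \
    --snapshot-names=jenkins-home-premigration

# 2. Restore the snapshot as a regional pd-ssd replicated across both zones.
gcloud compute disks create jenkins-home-regional \
    --region=us-central1 \
    --replica-zones=us-central1-a,us-central1-b \
    --source-snapshot=jenkins-home-premigration \
    --type=pd-ssd \
    --size=2500GB
```

You'd then pre-provision a PV pointing at the new disk via a StorageClass with `replication-type: regional-pd` and rebind the PVC; the downtime window is roughly snapshot time plus restore time, both of which scale with used bytes rather than the provisioned 2.5TB.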
Thanks!
https://redd.it/1pju1b3
@r_devops
Droplets compromised!!!
Hi everyone,
I’m dealing with a server security issue and wanted to explain what happened to get some opinions.
I had two different DigitalOcean droplets that were both flagged by DigitalOcean for sending DDoS traffic. This means the droplets were compromised and used as part of a botnet attack.
The strange thing is that I had already hardened SSH on both servers:
SSH key authentication only
Password login disabled
Root SSH login disabled
So SSH access should not have been possible.
After investigating inside the server, I found a malware process running as root from the /dev directory, and it kept respawning under different names. I also saw processes running that were checking for cryptomining signatures, which suggests the machine was infected with a mining botnet.
This makes me believe that the attacker didn’t get in through SSH, but instead through my application — I had a Node/Next.js server exposed on port 3000, and it was running as root. So it was probably an application-level vulnerability or an exposed service that got exploited, not an SSH breach.
At this point I’m planning to back up my data, destroy the droplet, and rebuild everything with stricter security (non-root user, close all ports except 22/80/443, Nginx reverse proxy, fail2ban, firewall rules, etc.).
If anyone has seen this type of attack before or has suggestions on how to prevent it in the future, I’d appreciate any insights.
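One concrete piece of the rebuild plan, sketched as a starting point (user, paths, and service name are placeholders): default-deny the firewall and run the Node app as an unprivileged user under a hardened systemd unit.

```bash
# Firewall: deny everything inbound except SSH and the reverse proxy.
ufw default deny incoming
ufw allow 22/tcp
ufw allow 80/tcp
ufw allow 443/tcp
ufw enable          # port 3000 stays internal; Nginx proxies to it

# Dedicated unprivileged user for the app.
useradd --system --create-home --shell /usr/sbin/nologin webapp

# Hardened unit: no root, no privilege escalation, read-only OS.
cat >/etc/systemd/system/webapp.service <<'EOF'
[Unit]
Description=Next.js app (unprivileged)
After=network.target

[Service]
User=webapp
WorkingDirectory=/home/webapp/app
ExecStart=/usr/bin/node server.js
NoNewPrivileges=true
ProtectSystem=strict
ReadWritePaths=/home/webapp/app
PrivateTmp=true

[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload && systemctl enable --now webapp
```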
https://redd.it/1pjujj7
@r_devops
I am currently finishing my college degree in germany. Any advice on future career path?
Next month I will graduate, and I wanted to hear advice on what kind of field is advancing and preferably secure and accessible in Germany.
I am a decent student, not the best, but my biggest interests were in the theoretical and math-oriented classes. That said, I am willing to take my knowledge in any direction.
I don't know how much I should fear AI development in terms of job security, but I would like to hear some advice for the future if somebody has any to give.
https://redd.it/1pjvequ
@r_devops
Do you use Postman to monitor your APIs?
As a developer who recently started using Postman, primarily just to create collections and do some manual testing, I was wondering if it's also helpful for monitoring API health and performance.
https://redd.it/1pjxrbr
@r_devops
Rust and Go Malware: Cross-Platform Threats Evading Traditional Defenses 🦀
https://instatunnel.my/blog/rust-and-go-malware-cross-platform-threats-evading-traditional-defenses
https://redd.it/1pjz1f5
@r_devops
We got 30 api access tickets per week, platform team became the bottleneck
Three months ago a PM stood up in standup and said "we can't ship because we're waiting on platform for API keys, this has been 4 days." Everyone went quiet and I felt my face get hot. Checked Jira right after: 28 open tickets just for API access, average close time 4 days. We're 6 people supporting 14 product teams. Every ticket is the same and takes 2 hours spread over 3 days because everyone's in meetings. When I tracked it, manual API access was eating 60 hours of team time; that's more than one full person.
I told management we need to hire, but they said fix the process first. I tried some ideas like Confluence docs, but nobody reads them. Tried a spreadsheet with all the API keys, but it was out of date in 2 weeks. My lead engineer said "we need self service, like how Stripe does it." I said yeah, obviously, but we don't have time to build that. He said we don't have time NOT to build it.
So I did my research. ReadMe does docs but not key management. We could have built a custom portal in React, but gave up after realizing we'd be building an entire user management system. Looked at API management platforms, but most had insane enterprise pricing; found one with the developer portal built in. It took a month to completely set up Gravitee, but it was the only thing that wasn't $50k per year and had self service built in. Rolled it out to 2 teams first as a beta, they found bugs, we fixed them, and rolled it out to everyone. The full rollout took about a month and a half, and tickets dropped to about 5, mostly weird stuff like "my team was deleted, how do I recover it."
If your platform team is drowning in API access tickets, you have two options: hire more people to do manual work, or build self service. We're too small to hire so we had to build it. It took way longer than I wanted, but it worked.
https://redd.it/1pk17a7
@r_devops
A Production Incident Taught Me the Real Difference Between Git Token Types
We hit a strange issue during deployment last month. Our production was pulling code using a developer’s PAT.
That turned into a rabbit hole about which Git tokens are actually meant for humans vs machines.
Wrote down what I learned in case others find it useful.
Link : https://medium.com/stackademic/git-authentication-tokens-explained-personal-access-token-vs-deploy-token-vs-other-tokens-f555e92b3918?sk=27b6dab0ff08fcb102c4215823168d7e
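The short version of the machine-vs-human distinction, as a hedged example: CI pulls with a repo-scoped, read-only deploy token instead of anyone's PAT (the username below is GitLab's default deploy-token format; host and project are placeholders):

```bash
# Repo-scoped, read-only credential that identifies the deployment, not a
# person, and keeps working after the developer who set it up leaves.
git clone "https://gitlab+deploy-token-1:${DEPLOY_TOKEN}@gitlab.example.com/team/app.git"

# GitHub equivalent: a per-repository, read-only deploy key over SSH.
git clone git@github.com:team/app.git
```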
https://redd.it/1pk2nc7
@r_devops
I made a tool that lets AI see your screen and fix technical problems step-by-step
I got tired of fixing the same tech issues for people at work again and again.
Demo: **Creating an S3 bucket on Google Cloud**
So, I built Screen Vision. It’s an open source, browser-based app where you share your screen with an AI, and it gives you step-by-step instructions to solve your problem in real-time. It's like having an expert looking at your screen, giving you the exact steps to navigate software, change settings, or debug issues instantly.
Crucially: no user screen data is stored.
See the code: https://github.com/bullmeza/screen.vision
Would love to hear what you think! What frustrating problem would you use Screen Vision to solve first?
https://redd.it/1pk72bu
@r_devops
Manual SBOM validation is killing my team, what base images are you folks using?
Current vendor requires manual SBOM validation for every image update. My team spends 15+ hours weekly cross-referencing CVE feeds against their bloated Ubuntu derivatives. 200+ packages per image, half we don't even use.
Need something with signed SBOMs that work, daily rebuilds, and minimal attack surface. Tired of vendors promising enterprise security then dumping manual processes on us.
Considered Chainguard, but it became way too expensive for our scale. I've heard of Minimus, but my team is sceptical.
What's working for you? Skip the marketing pitch please.
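Not an answer to the vendor question, but the 15+ hours of manual cross-referencing is automatable with open tooling; a minimal sketch using syft and grype (image name is a placeholder, and signed-SBOM verification would be a separate cosign step on top):

```bash
# Generate your own SBOM from the image instead of trusting the vendor's.
syft registry.example.com/app:1.4.2 -o spdx-json > sbom.spdx.json

# Scan the SBOM against live vulnerability feeds and fail CI on criticals,
# replacing the manual CVE cross-reference.
grype sbom:./sbom.spdx.json --fail-on critical
```

Moving to a smaller base (distroless or Alpine-style images) attacks the other half of the problem: most of those 200+ packages can't trip a CVE scan if they're not in the image.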
https://redd.it/1pk9rgg
@r_devops
Meta replaces SELinux with eBPF
SELinux was too slow for Meta so they replaced it with an eBPF based sandbox to safely run untrusted code.
bpfjailer handles things legacy MACs struggle with, like signed-binary enforcement and deep protocol interception, without waiting for upstream kernel patches and without measurable performance regressions across any workload/host type.
Full presentation here: https://lpc.events/event/19/contributions/2159/attachments/1833/3929/BpfJailer%20LPC%202025.pdf
https://redd.it/1pkhl58
@r_devops
I didn't like that cloud certificate practice exams cost money, so I built some free ones
https://exam-prep-6e334.web.app/
https://redd.it/1pklpt2
@r_devops
Help troubleshooting Skopeo copy to GCP Artifact Registry
I wrote a small script that copies a list of public images to a private Artifact Registry repository. I used skopeo, and everything works on my local machine but fails when run in the pipeline.
The error I see is reported below. It seems to be related to the permissions of the service account used for skopeo, but that account has artifactRegistry.admin...
time="2025-12-11T17:06:12Z" level=fatal msg="copying system image from manifest list: trying to reuse blob sha256:507427cecf82db8f5dc403dcb4802d090c9044954fae6f3622917a5ff1086238 at destination: checking whether a blob sha256:507427cecf82db8f5dc403dcb4802d090c9044954fae6f3622917a5ff1086238 exists in europe-west8-docker.pkg.dev/myregistry/bitnamilegacy/cert-manager: authentication required"
https://redd.it/1pkmv3j
@r_devops
EKS CI/CD security gates, too many false positives?
We've been trying this security gate in our EKS pipelines. It looks solid, but it's not. A webhook pushes risk scores and critical findings into PRs, and if certain IAM or S3 issues pop up, merges get blocked automatically.
The problem is that medium-severity false positives keep breaking dev PRs. Old dependencies in non-prod namespaces constantly trip the gate. Custom Node.js policies help a bit, but tuning thresholds across prod, stage, and dev for five accounts is a nightmare. It feels like the tool slows devs down more than it protects production.
Anyone here running EKS deploy gates? How do you cut the noise? Ideally, you'd only block criticals for assets that are actually exposed. Scripts or templates for multi-account policy inheritance would be amazing. Right now we poll /api/v1/scans after a Helm dry-run (rough sketch below). It works, but it's clunky; it feels like we're bending CI/CD pipelines to fit the tool rather than the other way around. Any better approaches or tools that handle EKS pipelines cleanly?
https://redd.it/1pko996
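The polling step referenced above looks roughly like this; the /api/v1/scans endpoint and its JSON shape belong to the (unnamed) gate tool, so treat the whole thing as a hypothetical sketch:

```bash
# Render the chart without applying it, then ask the gate for a verdict.
helm upgrade --install app ./chart --dry-run > /dev/null

# Kick off a scan and poll until it finishes.
SCAN_ID="$(curl -sf -X POST https://gate.internal/api/v1/scans | jq -r '.id')"
while [ "$(curl -sf "https://gate.internal/api/v1/scans/${SCAN_ID}" | jq -r '.status')" = "running" ]; do
  sleep 10
done

# Block the pipeline only on critical findings.
CRITICALS="$(curl -sf "https://gate.internal/api/v1/scans/${SCAN_ID}" | jq -r '.criticals')"
[ "${CRITICALS}" -eq 0 ] || { echo "blocked: ${CRITICALS} critical finding(s)"; exit 1; }
```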
@r_devops
The agents I built are now someone else's problem
Two months since I left and I still get random anxiety about systems I don't own anymore.
Did I ever actually document why that endpoint needs a retry with a 3 second sleep? Or did I just leave a comment that says "dont touch this"? Pretty sure it was the comment.
Knowledge transfer was two weeks. The guy taking over seemed smart but had never worked with agents. Walked him through everything I could remember, but so much context just lives in your head. Why certain prompts are phrased weird. Which integrations fail silently. That one thing that breaks on Tuesdays for reasons I never figured out.
He messaged me once the first week asking about a config file and then nothing since. Either everything is fine, or he's rebuilt it all, or it's on fire and nobody told me. I keep checking their status page like a psycho.
I know some of that code is bad. I know the docs have gaps. I know there are at least two hardcoded things I kept meaning to fix. That's all someone else's problem now and I can't do anything about it.
Does this feeling go away, or do you just collect ghosts from every job?
https://redd.it/1pkrsm5
@r_devops
Buildstash - Platform to organize, share, and distribute software binaries
We just launched a tool I'm working on called Buildstash. It's a platform for managing and sharing software binaries.
I'd worked across game dev, mobile apps, and agencies, and found that no team had a real system for managing their built binaries. They were often just dumped in a shared folder (if someone remembered!)
No proper system for versioning, keeping track of who'd signed off on what and when, or what exact build had gone to a client, etc.
Existing tools for managing build artifacts are really focused on package repository management, and miss all the other types of software that aren't deployed that way.
That's the gap we'd seen and looked to solve with Buildstash. It's for organizing and distributing software binaries targeting any and all platforms, however they're deployed.
And we've really focused on the UX and making sure it's super easy to get set up - integrating with CI/CD or catching local builds, with a focus on making it accessible to teams of all sizes.
For mobile apps, it'll handle integrated beta distribution. For games, it has no problem with massive binaries targeting PC, consoles, or XR. Embedded teams who are keeping track of binaries across firmware, apps, and tools are also a great fit.
We launched open sign-up for the product on Monday, and then another feature every day this week.
Today we launched Portals: a custom-branded space you can host on your website and use to publish releases or entire build streams to your users. Think GitHub Releases but way more powerful. Any time you've seen a custom-built interface on a developer's website for finding past builds by platform, browsing nightlies, or viewing releases, Buildstash Portals can do all that out of the box, customizable in a few minutes.
So that's the idea! I'd really love feedback from this community on what we've built so far / what you think we should focus on next?
- Here's a demo video - https://youtu.be/t4Fr6M_vIIc
- landing - https://buildstash.com
- and our GitHub - https://github.com/buildstash
https://redd.it/1pkslis
@r_devops
Is the promise of "AI-driven" incident management just marketing hype for DevOps teams?
We are constantly evaluating new platforms to streamline our on-call workflow and reduce alert fatigue. Tools that promise AI-driven incident management and full automation are everywhere now, like MonsterOps and similar providers.
I’m skeptical about whether these AIOps platforms truly deliver significant value for a team that already has well-defined runbooks and decent observability. Does the cost, complexity, and setup time for full automation really pay off in drastically reducing Mean Time To Resolution compared to simply improving our manual processes?
Did the AI significantly speed up your incident response, or did it mainly just reduce the noise?
https://redd.it/1pku5b6
@r_devops