The Log Reading Commands That Save Me During On-call
Sharing a guide on the Ubuntu commands that help during log-heavy debugging sessions. These are the ones I use during outages or incident analysis. Might help someone on pager duty.
Link : https://medium.com/stackademic/the-15-ubuntu-commands-i-use-every-time-i-troubleshoot-logs-0858dd876572?sk=b7c55fa75369ceed88e9310a3c94456a
https://redd.it/1pj9u6j
@r_devops
Medium
The 15 Linux commands I use every time I troubleshoot logs
When a Java or Spring Boot service slows down, the first thing I touch is not the code. It is the log file. Logs tell you the story. They…
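For anyone who wants a taste before clicking through, a short sketch of the kind of commands that come up in these sessions (the unit name and log paths below are placeholders; the article's own list may differ):
```
# Follow one service and keep only the interesting lines
journalctl -u myapp.service -f | grep -iE 'error|exception|timeout'

# Everything logged in the last 30 minutes, paged
journalctl --since "30 min ago" --no-pager | less

# Count HTTP 5xx responses per minute from an nginx access log (combined format)
awk '$9 ~ /^5/ {print substr($4, 2, 17)}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head

# Don't forget rotated/compressed logs
zgrep -i 'out of memory' /var/log/syslog*
```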
Best way to create an offline Proxmox ISO with custom packages + ZFS
I have tried the Proxmox auto-installer and managed to create an ISO, but I have no idea how to make it include Python and Ansible and set up ZFS. Maybe there are better ways of doing it? I am installing 50 Proxmox servers physically.
https://redd.it/1pj6hxb
@r_devops
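Not a full answer, but a minimal sketch of the answer-file route, assuming the `proxmox-auto-install-assistant` workflow from Proxmox VE 8.2+ (field names follow the documented answer.toml format, so double-check them against your PVE version; hostnames, passwords, and the ISO filename are placeholders):
```
# answer.toml: unattended install with ZFS on two disks
cat > answer.toml <<'EOF'
[global]
keyboard = "en-us"
country = "us"
fqdn = "pve-node01.example.internal"
mailto = "ops@example.internal"
timezone = "UTC"
root_password = "change-me"

[network]
source = "from-dhcp"

[disk-setup]
filesystem = "zfs"
zfs.raid = "raid1"
disk_list = ["sda", "sdb"]
EOF

# Bake the answer file into the stock installer ISO
proxmox-auto-install-assistant prepare-iso proxmox-ve_8.4-1.iso \
  --fetch-from iso --answer-file answer.toml

# Extra packages (python3, ansible, ...) are usually easier to layer on afterwards,
# e.g. by running an Ansible playbook against the 50 freshly installed nodes,
# than by rebuilding the ISO contents to embed the .debs.
```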
I built a unified CLI tool to query logs from Splunk, K8s, CloudWatch, Docker, and SSH with a single syntax.
Hi everyone,
I'm a dev who got tired of constantly context-switching between multiple Splunk UIs, multiple OpenSearch instances, `kubectl logs`, the AWS Console, and SSHing into servers just to debug a distributed issue. I'd rather have everything in my terminal.
I built a tool written in Go called **LogViewer**. It’s a unified CLI interface that lets you query multiple different log backends using a consistent syntax, extract fields from unstructured text, and format the output exactly how you want it.
**1. What does it do?** LogViewer acts as a universal client. You configure your "contexts" (environments/sources) in a YAML file, and then you can query them all the same way.
It supports:
* **Kubernetes**
* **Splunk**
* **OpenSearch / Elasticsearch / Kibana**
* **AWS CloudWatch**
* **Docker** (Local & Remote)
* **SSH / Local Files**
**2. How does it help?**
* **Unified Syntax:** You don't need to remember SPL (Splunk), KQL, or specific AWS CLI flags. One set of flags works for everything.
* **Multi-Source Querying:** You can query your `prod-api` (on K8s) and your `legacy-db` (on VM via SSH) in a single command. Results are merged and sorted by timestamp.
* **Field Extraction:** It uses Regex (named groups) or JSON parsing to turn raw text logs into structured data you can filter on (e.g., `-f level=ERROR`).
* **AI Integration (MCP):** It implements the **Model Context Protocol**, meaning you can connect it to Claude Desktop or GitHub Copilot to let AI agents query and analyze your infrastructure logs directly.
[Link to github repo](https://github.com/bascanada/logviewer)
VHS Demo: [https://github.com/bascanada/logviewer/blob/main/demo.gif](https://github.com/bascanada/logviewer/blob/main/demo.gif)
**3. How to use it?**
It comes with an interactive wizard to get started quickly:
logviewer configure
Once configured, you can query logs easily:
Basic query (last 10 mins) for the prod-k8s and prod-splunk context:
logviewer -i prod-k8s -i prod-splunk --last 10m query log
Filter by field (works even on text logs via regex extraction):
logviewer -i prod-k8s -f level=ERROR -f trace_id=abc-123 query log
Custom Formatting:
logviewer -i prod-docker --format "[{{.Timestamp}}] {{.Level}} {{KV .Fields}}: {{.Message}}" query log
It’s open source (GPL3) and I’d love to get feedback on the implementation or feature requests!
https://redd.it/1pj6d9i
@r_devops
GitHub
GitHub - bascanada/logviewer: Terminal based log viewer with multiple datasource (OpenSearch, Splunk, Docker, K8S, SSH, Local Command)
Terminal based log viewer with multiple datasource (OpenSearch, Splunk, Docker, K8S, SSH, Local Command) - bascanada/logviewer
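A nice side effect of plain CLI output is that it composes with standard Unix tools. A sketch using only the flags shown in the post (the composition itself is not from the repo's docs):
```
# Most frequent ERROR lines across two contexts over the last hour
logviewer -i prod-k8s -i prod-splunk --last 1h -f level=ERROR query log \
  | sort | uniq -c | sort -rn | head -20

# Grep a merged stream for one request
logviewer -i prod-k8s -i legacy-db --last 30m query log | grep 'abc-123'
```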
How would you improve DevOps on a system not owned by the dev team
I work in a niche field and we work with a vendor that manages our core system. It's similar to Salesforce, but it's a banking system that allows us to edit the files and write scripts in a proprietary programming language. So far, no company I've worked for that uses this system has figured it out. The core software runs on IBM AIX, so containerizing is not an option.
Currently we have a single dev environment that every dev makes their changes on at the same time, with no source control used at all. When changes are approved to go live the files are simply manually moved from test to production.
Additionally there is no release schedule in our team. New features are moved from dev to prod as soon as the business unit says they are happy with the functionality.
I am not an expert in devops but I have been tasked with solving this for my organization. The problems I’ve identified that make our situation unique are as follows:
No way to create individual dev environments
The core system runs on an IBM PowerPC server running AIX. Dev machines are Windows or Mac, and from my research, there is no way to run locally. It is possible to create multiple instances on a single server, but the disk space on the server is quite limiting.
No release schedule
I touched on this above but there is no project management. We get a ticket, write the code, and when the business unit is happy with the code, someone manually copies all of the relevant files to production that night.
System is managed by an external organization
This one isn't too much of an issue but we are limited as to what can be installed on the host machines, though we are able to perform operations such as transferring files between the instances/servers via a console which can be accessed in any SSH terminal.
The code is not testable
I'd be happy to be told why this is incorrect, but the proprietary language is very bare-bones and doesn't even really have functions. It's basically SQL (but worse) if someone decided you should also be able to build UIs with it.
As said in my last point, I'd be happy to be told that nothing about this is a particularly difficult problem to solve, but I haven't been able to find a clean solution.
My current draft for devops is as follows:
1. Keep all files that we want versioned in a git repository - this would be hosted on ADO.
2. Set up 3 environments: Dev, Staging, and Production. These would be 3 different servers, or at least Dev would be a separate server from Staging and Production.
3. Initialize all 3 environments to be copies of production and create a branch on the repo to correspond to each environment
4. When a dev receives a ticket, they will create a feature branch off of Dev. This is where I'm not sure how to continue. We may be able to create a new instance for each feature branch on the dev server, but it would be a hard sell to get my organization to purchase more disk space to make this feasible. At a previous organization, we couldn't do it, and the way that we got around that is by having the repo not actually be connected to dev. So devs would pull the dev branch to their local, and when they made changes to the dev environment they would manually copy the changed files into their local repo after every change and push to the dev branch from there. People eventually got tired of doing that and our repo became difficult to maintain.
5. When a dev completes their work, push it to Dev and make a PR to Staging. At this point, is there a way for us to set up a workflow that would automatically update the Staging environment when code is pushed to the Staging branch? I've done this with git workflows in .NET applications, but we wouldn't want it to 'build' anything - just move the files and run AIX console commands depending on the type of file being updated (i.e. some files need to be 'installed', which is an operation provided by the aforementioned console). A sketch of what such a deploy step could look like follows at the end of this post.
6. Repeat 5 but Staging to Production
So essentially I am looking to answer two questions. First, how do I explain to the team that their current process is not up to standard? Many of them do not come from a technical background and have been updating these scripts this way for years, and they are quite comfortable in their workflow; I experienced quite a bit of pushback trying to do this at my last organization. Is implementing a DevOps process even worth it in this case? Second, does my proposed process seem sound, and how would you address the concerns I brought up in points 4 and 5 above?
Some additional info: If it would make the process cleaner then I believe I could convince my manager to move to scheduled releases. Also, I am a developer, so anything that doesn't just work out of the box, I can build, but I want to find the cleanest solution possible.
Thank you for taking the time to read!
https://redd.it/1pjd1mh
@r_devops
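On point 5: since the "pipeline" is really just copying changed files and running console commands, a small script run by an Azure DevOps pipeline over an SSH service connection can cover it. A hypothetical sketch - the host, path, file extension, and console command are all placeholders for whatever the vendor's tooling actually provides:
```
#!/usr/bin/env bash
# deploy.sh <git range>   e.g. deploy.sh origin/staging~1..origin/staging
set -euo pipefail

TARGET_HOST="staging-aix.example.internal"   # placeholder
TARGET_DIR="/opt/coresys/scripts"            # placeholder
RANGE="${1:?usage: deploy.sh <git range>}"

# Ship only the files that changed in this push
git diff --name-only --diff-filter=ACM "$RANGE" | while read -r f; do
  ssh "deploy@${TARGET_HOST}" "mkdir -p '${TARGET_DIR}/$(dirname "$f")'"
  scp "$f" "deploy@${TARGET_HOST}:${TARGET_DIR}/${f}"

  case "$f" in
    *.ins)   # placeholder: file types that need the console 'install' operation
      ssh "deploy@${TARGET_HOST}" "vendor-console install '${TARGET_DIR}/${f}'"   # placeholder command
      ;;
  esac
done
```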
Another Need feedback on resume post :))
Resume
It's been really hard landing a job, even for "Junior/Entry DevOps Engineer" roles. I don't know if it's because my resume screams red flags, or if the market is just tough in general.
1. Yes, I do have a 2-year work gap from graduation to now (traveling, haha). I am still trying to stay hands-on, though, through curated DevOps roadmaps and doing end-to-end projects.
2. Does my work experience section come off as "too advanced" for someone who only worked as a DevOps Engineer Intern?
I just feel like the whole internship might've been a waste now and that it left me kind of in a "grey" area. Maybe I should start off as a sysadmin/IT support guy? But even then, those are still hard to land lol.
https://redd.it/1pjfojr
@r_devops
Built a Visual Docker Compose Editor - Looking for Feedback!
Hey
I've been wrestling with Docker Compose YAML files for way too long, so I built something to make it easier: a visual editor that lets you build and manage multi-container Docker applications without the YAML headaches.
The Problem
We've all been there:
- Forgetting the exact YAML syntax
- Spending hours debugging indentation issues
- Copy-pasting configs and hoping they work
- Managing environment variables, volumes, and ports manually
The Solution
A visual, form-based editor that:
- ✅ No YAML knowledge required
- ✅ See your YAML update in real-time as you type
- ✅ Upload your docker-compose.yml and edit it visually
- ✅ Download your configuration as a ready-to-use YAML file
- ✅ No sign-up required to try the editor
What I've Built (MVP)
Core Features:
- Visual form-based configuration
- Service templates (Nginx, PostgreSQL, Redis)
- Environment variables management
- Volume mapping
- Port configuration
- Health checks
- Resource limits (CPU/Memory)
- Service dependencies
- Multi-service support
Try it here: https://docker-compose-manager.vercel.app/
Why I'm Sharing This
This is an MVP and I'm looking for honest feedback from the community:
- Does this solve a real problem for you?
- What features are missing?
- What would make you actually use this?
- Any bugs or UX issues?
I've set up a quick waitlist for early access to future features (multi-environment management, team collaboration, etc.), but the editor is 100% free and functional right now - no sign-up needed.
Tech Stack
- Angular 18
- Firebase (Firestore + Analytics)
- EmailJS (for contact form)
- Deployed on Vercel
What's Next?
Based on your feedback, I'm planning:
- Multi-service editing in one view
- Environment-specific configurations
- Team collaboration features
- Integration with Docker Hub
- More service templates
Feedback: Drop a comment or DM me!
TL;DR: Built a visual Docker Compose editor because YAML is painful. It's free, works now, and I'd love your feedback! 🚀
https://redd.it/1pjgrjx
@r_devops
docker-compose-manager.vercel.app
Docker Compose Manager - Visual Editor for Docker Compose Files
Stop wrestling with YAML. Build, manage, and deploy multi-container Docker applications with a visual editor that just works.
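If you want to sanity-check whatever file an editor like this produces without leaving the terminal, a minimal sketch (hypothetical two-service stack; `docker compose config` only parses and renders, it doesn't start anything):
```
cat > docker-compose.yml <<'EOF'
services:
  web:
    image: nginx:1.27
    ports:
      - "8080:80"
    depends_on:
      - db
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: example
    volumes:
      - db-data:/var/lib/postgresql/data
volumes:
  db-data:
EOF

docker compose config    # validate and print the fully resolved configuration
docker compose up -d     # then actually bring the stack up
```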
Argocd upgrade strategy
Hello everyone,
I’m looking to upgrade an existing Argo CD installation from v1.8.x to the latest stable release, and I’d love to hear from anyone who has gone through a similar jump.
Given how old our version is, I'm assuming a straight upgrade probably isn't safe, so I'm currently planning an incremental upgrade.
A few questions I have:
1) Any major breaking changes or gotchas I should be aware of?
2) Any other upgrade strategies you'd recommend?
3) Anything related to CRD updates, repo-server changes, RBAC, or controller behavior that I should watch out for?
4) Any tips for minimizing downtime?
If you have links, guides, or personal notes from your migration, I’d really appreciate it. Thanks!
https://redd.it/1pji3tv
@r_devops
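For the incremental route, the usual pattern is one minor version at a time, reading that version's upgrade notes before each hop. A sketch, assuming the manifest-based install (version numbers are examples only; on the 1.8 → 2.0 hop the application controller moves from a Deployment to a StatefulSet, and the old `argocd-util` CLI later becomes `argocd admin`):
```
# Rough backup of the argocd namespace before touching anything
kubectl -n argocd get deploy,sts,cm,secret,applications,appprojects -o yaml \
  > argocd-backup-$(date +%F).yaml

NEXT="v2.0.5"   # example: pick the latest patch of the *next* minor, one hop at a time
kubectl apply -n argocd -f \
  "https://raw.githubusercontent.com/argoproj/argo-cd/${NEXT}/manifests/install.yaml"

kubectl -n argocd rollout status deploy/argocd-server
kubectl -n argocd rollout status deploy/argocd-repo-server
# Repeat for each minor version until you reach the target release.
```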
Observability in a box
I always hated how devs don't have access to a production-like stack at home, so with help from my good friend Copilot I coded OIB - Observability in a Box.
https://github.com/matijazezelj/oib
With a single `make install` you'll get Grafana, OpenTelemetry, Loki, Prometheus, node exporter, Alloy..., all interconnected, with exposed OpenTelemetry endpoints, Grafana dashboards, and examples of how to implement those in your setup. Someone may find it useful, rocks may be thrown my way, but hey, it helped me :)
If you have any ideas, PRs are always welcome - or just steal from it :)
https://redd.it/1pjgqw4
@r_devops
GitHub
GitHub - matijazezelj/oib: Observability in a box
Observability in a box. Contribute to matijazezelj/oib development by creating an account on GitHub.
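If anyone wants to kick the tyres, the repo's entry point per the post is `make install`; the ports below are just the usual defaults for these components, not something the repo guarantees:
```
git clone https://github.com/matijazezelj/oib
cd oib
make install

# Typical defaults once a stack like this is up:
#   Grafana UI        -> http://localhost:3000
#   OTLP gRPC / HTTP  -> localhost:4317 / localhost:4318
# Point an instrumented app at the collector via the standard OTel env vars:
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4318"
export OTEL_SERVICE_NAME="my-local-service"
```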
What’s the best way to practice DevOps tools? I built something for beginners + need your thoughts
A lot of people entering DevOps keep asking the same question:
“Where can I practice CI/CD, Kubernetes, Terraform, etc. without paying for a bootcamp?”
Instead of repeating answers, I ended up building a small learning hub that has:
Free DevOps tutorial blogs
Hands-on practice challenges
Simple explanations of complex tools
Mini projects for beginners
If any of you are willing to take a look and tell me what’s good/bad/missing, I’d appreciate it:
**https://thedevopsworld.com**
Not selling anything — just trying to make a genuinely useful practice resource for newcomers to our field.
It will always remain free, with no intention of making money.
Would love your suggestions on features, topics, or improvements!
https://redd.it/1pjn6aj
@r_devops
DevOps Worlds
DevOps Worlds - Master DevOps with 800+ Practice Problems
Learn Kubernetes, Docker, Terraform, AWS, CI/CD, and more with hands-on tutorials. Free forever for all DevOps learners.
Backblaze or AWS S3 or Google Cloud Storage for a photo hosting website?
I have a client who wants to build a photography hosting website. He has tons of images (~20MB/image), and all those photos can be viewed around the world. What is the best cloud option for storing the photos?
thx
https://redd.it/1pjn652
@r_devops
Anyone tried the Debug Mode for coding agents? Does it change anything?
I'm not sure if I can mention the editor's name here. Anyway, they've released a new feature called Debug Mode.
>Coding agents are great at lots of things, but some bugs consistently stump them. That's why we're introducing Debug Mode, an entirely new agent loop built around runtime information and human verification.
>How it works
>1. Describe the bug - Select Debug Mode and describe the issue. The agent generates hypotheses and adds logging.
>2. Reproduce the bug - Trigger the bug while the agent collects runtime data (variable states, execution paths, timing).
>3. Verify the fix - Test the proposed fix. If it works, the agent removes instrumentation. If not, it refines and tries again.
What do you all think about how useful this feature is in actual debugging processes?
I think debugging is definitely one of the biggest pain points when using coding agents. This approach stabilizes what was already being done in the agent loop.
But when I'm debugging, I don't want to describe so much context, and sometimes bugs are hard to reproduce. So, I previously created an editor extension that can continuously access runtime context, which means I don't have to make the agent waste tokens by adding logs—just send the context directly to the agent to fix the bug.
I guess they won't implement something like that, since it would save too much on quotas, lol.
https://redd.it/1pjo2zz
@r_devops
Inherited a legacy project with zero API docs any fast way to map all endpoints?
I just inherited a 5-year-old legacy project and found out… there’s zero API documentation.
No Swagger/OpenAPI, no Postman collections, and the frontend is full of hardcoded URLs.
Manually tracing every endpoint is possible, but realistically it would take days.
Before I spend the whole week digging through the codebase, I wanted to ask:
Is there a fast, reliable way to generate API documentation from an existing system?
I’ve seen people mention packet-capture workflows (mitmproxy, Fiddler, Apidog’s capture mode, etc.) where you run the app and let the tool record all HTTP requests, then turn them into structured API docs.
Has anyone here tried this on a legacy service?
Did it help, or did it create more noise than value?
I’d love to hear how DevOps/infra teams handle undocumented backend systems in the real world.
https://redd.it/1pjqnqi
@r_devops
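One concrete version of the capture workflow mentioned above, using mitmproxy plus the community tool mitmproxy2swagger (the backend URL and port are placeholders; treat the generated spec as a draft to review, not finished docs):
```
pip install mitmproxy mitmproxy2swagger

# Record traffic: put mitmdump in front of the legacy backend as a reverse proxy,
# point the frontend (or your own clicking) at :8080, and exercise the app.
mitmdump --mode reverse:http://legacy-backend.internal:3000 -p 8080 -w flows.mitm

# Turn the recorded flows into a draft OpenAPI spec. The tool is typically run twice:
# the first pass lists detected paths in the output file, you un-ignore the ones you
# want, then the second pass fills in parameters and response shapes.
mitmproxy2swagger -i flows.mitm -o openapi-draft.yml \
  -p http://legacy-backend.internal:3000 -f flow
```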
Protecting your own machine
Hi all. I've been promoted (if that's the proper word) to devops after 20+ years of being a developer, so I'm learning a lot of stuff on the fly...
One of the things I wouldn't like to learn the hard way is how to protect your own machine (the one holding the access keys). My passwords are in a password manager, my SSH keys are passphrase-protected, I pull the repos in a virtual machine... What else can and should I do? I'm really afraid that some of these junior devs will download some malicious library and fuck everything up.
https://redd.it/1pjt2bj
@r_devops
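A few low-effort additions on top of what's already listed, as a sketch (key path and tooling assume a typical Linux/macOS dev box):
```
# Make the SSH agent ask for confirmation every time a loaded key is used
ssh-add -c ~/.ssh/id_ed25519

# Stop npm lifecycle scripts from running automatically on install -
# blunts a lot of malicious-dependency attacks
npm config set ignore-scripts true

# Sign commits so a leaked token can't quietly impersonate you
git config --global commit.gpgsign true

# Prefer short-lived cloud credentials over long-lived keys in dotfiles
aws sts get-session-token --duration-seconds 3600   # or better, SSO / role assumption
```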
Workload on GKE: Migrating from Zonal to Regional Persistent Disk for true Multi-Zone
Hey folks,
I'm running Jenkins on GKE as a StatefulSet with a 2.5TB persistent volume, and I'm trying to achieve true high availability across multiple zones.
**Current Setup:**
* Jenkins StatefulSet in `devops-tools` namespace
* Node pool currently in `us-central1-a`, adding `us-central1-b`
* PVC using `premium-rwo` StorageClass (pd-ssd)
* The underlying PV has `nodeAffinity` locked to `us-central1-a`
**The Problem:** The PersistentVolume is zonal (pinned to us-central1-a), which means my Jenkins pod can only schedule on nodes in that zone. This defeats the purpose of having a multi-zone node pool.
**What I'm Considering:**
Migrate to a regional persistent disk (replicated across us-central1-a and us-central1-b)
**Questions:**
* Has anyone successfully migrated a large PV from zonal to regional on GKE? Any gotchas?
* What's the typical downtime window for creating a snapshot and provisioning a ~2.5TB regional disk?
* Are there better approaches I'm missing for achieving HA with StatefulSets in GKE?
The regional disk approach seems cleanest (snapshot → create regional disk → update PVC), but I'd love to hear from anyone who's done this in production before committing to the migration.
Thanks!
https://redd.it/1pju1b3
@r_devops
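The snapshot → regional disk path in gcloud terms, as a sketch (disk and snapshot names are placeholders; scale Jenkins down first so the filesystem is quiet, and plan the PV/PVC swap as part of the same window):
```
kubectl -n devops-tools scale statefulset jenkins --replicas=0

gcloud compute disks snapshot JENKINS_ZONAL_DISK \
  --zone=us-central1-a \
  --snapshot-names=jenkins-pre-regional

gcloud compute disks create jenkins-regional \
  --region=us-central1 \
  --replica-zones=us-central1-a,us-central1-b \
  --type=pd-ssd \
  --size=2500GB \
  --source-snapshot=jenkins-pre-regional

# Then statically provision a new PV/PVC against the regional disk (CSI storage class
# with replication-type: regional-pd), point the StatefulSet at it, and scale back up.
kubectl -n devops-tools scale statefulset jenkins --replicas=1
```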
Droplets compromised!!!
Hi everyone,
I’m dealing with a server security issue and wanted to explain what happened to get some opinions.
I had two different DigitalOcean droplets that were both flagged by DigitalOcean for sending DDoS traffic. This means the droplets were compromised and used as part of a botnet attack.
The strange thing is that I had already hardened SSH on both servers:
SSH key authentication only
Password login disabled
Root SSH login disabled
So SSH access should not have been possible.
After investigating inside the server, I found a malware process running as root from the /dev directory, and it kept respawning under different names. I also saw processes running that were checking for cryptomining signatures, which suggests the machine was infected with a mining botnet.
This makes me believe that the attacker didn’t get in through SSH, but instead through my application — I had a Node/Next.js server exposed on port 3000, and it was running as root. So it was probably an application-level vulnerability or an exposed service that got exploited, not an SSH breach.
At this point I’m planning to back up my data, destroy the droplet, and rebuild everything with stricter security (non-root user, close all ports except 22/80/443, Nginx reverse proxy, fail2ban, firewall rules, etc.).
If anyone has seen this type of attack before or has suggestions on how to prevent it in the future, I’d appreciate any insights.
https://redd.it/1pjujj7
@r_devops
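Two pieces of that rebuild in concrete form, as a sketch (user, paths, and the start command are placeholders): a default-deny firewall, and running the Node/Next.js app as an unprivileged systemd service behind nginx instead of root on :3000.
```
# Firewall: default deny inbound, allow only SSH + HTTP(S)
ufw default deny incoming
ufw default allow outgoing
ufw allow 22/tcp
ufw allow 80/tcp
ufw allow 443/tcp
ufw enable

# Unprivileged service user for the app
useradd --system --create-home --shell /usr/sbin/nologin appuser

cat > /etc/systemd/system/nextjs-app.service <<'EOF'
[Unit]
Description=Next.js app (unprivileged)
After=network.target

[Service]
User=appuser
WorkingDirectory=/home/appuser/app
ExecStart=/usr/bin/npm run start
Restart=on-failure
NoNewPrivileges=true
ProtectSystem=full

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload && systemctl enable --now nextjs-app
```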
I am currently finishing my college degree in Germany. Any advice on future career path?
Next month I will graduate and wanted to hear advice on what kind of field is advancing and preferably secure and accessible in Germany?
I am a decent student, not the best, but my biggest interests were in theoretical and math-oriented classes. That said, I am willing to take my knowledge in any direction.
I don't know how much I should fear AI development in terms of job security, but I would like to hear some advice for the future if somebody has any to give.
https://redd.it/1pjvequ
@r_devops
Do you use Postman to monitor your APIs?
As a developer who recently started using Postman and primarily uses it only to create collections and do some manual testing, I was wondering if it is also helpful for monitoring API health and performance.
View Poll
https://redd.it/1pjxrbr
@r_devops
Rust and Go Malware: Cross-Platform Threats Evading Traditional Defenses 🦀
https://instatunnel.my/blog/rust-and-go-malware-cross-platform-threats-evading-traditional-defenses
https://redd.it/1pjz1f5
@r_devops
InstaTunnel
Rust and Go Malware: The Cross-Platform Threat Evading Modern
Learn why cybercriminals are adopting Rust and Go to build fast, cross-platform malware that bypasses traditional antivirus defenses.Discover detection challeng
We got 30 api access tickets per week, platform team became the bottleneck
Three months ago a PM stood up in standup and said "we can't ship because we're waiting on platform for API keys, this has been 4 days." Everyone went quiet and I felt my face get hot. Checked Jira right after: 28 open tickets just for API access. Average close time was 4 days. We're 6 people supporting 14 product teams. Every ticket is the same and takes 2 hours spread over 3 days because everyone's in meetings. When I tracked it, that was 60 hours of team time just doing manual API access - more than one full person.
I told management we need to hire, but they said fix the process first. I tried some ideas like Confluence docs, but nobody reads them. Tried a spreadsheet with all the API keys, but it got out of date in 2 weeks. My lead engineer said "we need self service, like how Stripe does it." I said yeah, obviously, but we don't have time to build that. He said we don't have time NOT to build it.
So I did my research: ReadMe does docs but not key management, and we could have built a custom portal with React but gave up after realizing we would be building an entire user management system. Looked at API management platforms, but most had insane enterprise pricing; found one that had the developer portal built in. Set it up and connected our APIs - it took a month to completely set up Gravitee, but it was the only thing that wasn't $50k per year and had self service built in. Rolled it out to 2 teams first as a beta, they found bugs, we fixed them, and then rolled out to everyone. That took about a month and a half, and the tickets dropped to about 5 per week, mostly weird stuff like "my team was deleted, how do I recover it?"
If your platform team is drowning in API access tickets you have two options: hire more people to do manual work, or build self service. We're too small to hire, so we had to build it. It took way longer than I wanted, but it worked.
https://redd.it/1pk17a7
@r_devops
A Production Incident Taught Me the Real Difference Between Git Token Types
We hit a strange issue during deployment last month. Our production was pulling code using a developer’s PAT.
That turned into a rabbit hole about which Git tokens are actually meant for humans vs machines.
Wrote down the learning in case others find it useful.
Link : https://medium.com/stackademic/git-authentication-tokens-explained-personal-access-token-vs-deploy-token-vs-other-tokens-f555e92b3918?sk=27b6dab0ff08fcb102c4215823168d7e
https://redd.it/1pk2nc7
@r_devops
Medium
Git Authentication Tokens Explained : Personal Access Token vs Deploy Token vs Other Tokens
A few months ago, I was helping a junior engineer fix a failing deployment. The app was running fine locally, but the production server…
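The practical takeaway in command form - a GitLab-style deploy token shown as the machine credential (names and token values are placeholders; GitHub's rough equivalent is a read-only deploy key or a fine-grained token scoped to one repo):
```
# What the incident looked like: production pulling with a person's PAT
git clone https://alice:ghp_XXXXXXXXXXXX@github.com/acme/app.git   # don't do this on servers

# Machine credential instead: a repo-scoped, read-only deploy token kept in host/CI secrets
git clone "https://${DEPLOY_TOKEN_USER}:${DEPLOY_TOKEN}@gitlab.example.com/acme/app.git"
```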