When a missing flag breaks your deploy: -D vs -P in Java builds
I once hit a weird deployment issue because I confused -Denv=prod with -Pprod. I wrote a short note to help newer devs understand what actually happens under the hood. It’s aimed at junior engineers working on CI/CD or build scripts who want to know when to use which flag.
Read it here -> https://medium.com/stackademic/two-tiny-flags-that-confuses-java-devs-d-and-p-in-java-and-maven-5dfd0e04455f?sk=6b0d660c1a031576b629d7979054fd88
https://redd.it/1omn1hr
@r_devops
Medium
Two Tiny Flags That Confuses Java devs: -D and -P in Java and Maven
A few years ago, I was working late, trying to fix a deployment issue. Everything worked fine on my machine but failed miserably on…
Looking for advice - I've built an AI-augmented Network Configuration and Troubleshooting Agent - worth it?
While it may look like self-promo, I'm looking for feedback from fellow network engineers who have hands-on experience with AI agents and their implementations.
To provide more context:
As we all know, network devices (routers, switches, firewalls) are configured via CLI over SSH, sometimes via REST APIs. Traditional automation (Ansible, Python scripts) requires predefined playbooks for every scenario. I wanted something that could:
* Reason about network problems dynamically
* Consult vendor documentation before acting
* Handle multi-vendor environments without rigid playbooks
* Operate safely with strong guardrails, lots of strong guardrails
* Work in a multi-tenant architecture
**Key parts:**
**RAG Implementation**
* AWS OpenSearch cluster with vendor documentation (Cisco, Juniper, Fortinet, etc.)
* Chunking strategy: per-command documentation + contextual sections
* Metadata tagging: device type, OS version, command category
* Retrieval: hybrid search (semantic + keyword) to find relevant docs before execution
* Challenge: Vendor docs are inconsistent in format/quality - had to build custom parsers per vendor
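The hybrid retrieval step above can be sketched roughly as follows. The post does not say how the keyword and semantic result sets are combined, so this assumes reciprocal rank fusion (RRF) as a stand-in. The chunk IDs, the `k=60` constant, and the function name are illustrative, not taken from the actual system.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk IDs into one ranking.

    Each chunk earns 1 / (k + rank) for every list it appears in, so chunks
    ranked well by both keyword and semantic search rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical result lists for one query, assumed to be already filtered
# by metadata (device type, OS version, command category).
keyword_hits = ["fortios-ztna-server", "fortios-vip", "fortios-policy"]
semantic_hits = ["fortios-policy", "fortios-ztna-server", "fortios-ssl-vpn"]

fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
print(fused)
```

RRF only needs ranks, not comparable scores, which is convenient when the keyword and vector scores live on different scales.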
**Tool Design**
* `ssh_execute`: Run commands with device context awareness
* `get_device_config`: Retrieve current configs for analysis
* `consult_docs`: RAG retrieval before any config change
* `validate_syntax`: Pre-check commands against vendor syntax rules
* `rollback`: Automatic config snapshots before changes
**Guardrails**
* Restricted command whitelist/blacklist per environment
* Read-only mode by default
* Required approval workflow for config changes
* Device type validation (won't run Cisco commands on Juniper)
* Rate limiting on CLI execution
* Automatic rollback on detected errors
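A minimal sketch of how a few of these guardrails (read-only by default, blocklist, device type validation, approval for changes) could fit into one pre-execution check. The pattern lists, device type names, and the `check_command` function are hypothetical and only illustrate the order of checks, not the real implementation.

```python
import re
from dataclasses import dataclass

# Hypothetical per-device-type patterns for commands considered read-only.
READ_ONLY_PATTERNS = {
    "cisco_ios": [r"^show\s", r"^ping\s"],
    "junos": [r"^show\s", r"^ping\s"],
}
# Hypothetical patterns that are refused even with approval.
BLOCKED_PATTERNS = [r"^reload\b", r"^write\s+erase\b", r"^request\s+system\s+zeroize\b"]


@dataclass
class Decision:
    allowed: bool
    reason: str


def check_command(command, device_type, command_vendor, approved=False):
    """Decide whether a CLI command may run on a device."""
    cmd = command.strip().lower()
    if device_type not in READ_ONLY_PATTERNS:
        return Decision(False, "unknown device type")
    # Device type validation: never send one vendor's syntax to another's device.
    if command_vendor != device_type:
        return Decision(False, "vendor mismatch")
    if any(re.search(p, cmd) for p in BLOCKED_PATTERNS):
        return Decision(False, "blocked command")
    # Read-only commands are allowed by default.
    if any(re.search(p, cmd) for p in READ_ONLY_PATTERNS[device_type]):
        return Decision(True, "read-only command")
    # Everything else is treated as a config change and needs approval.
    if approved:
        return Decision(True, "approved config change")
    return Decision(False, "approval required")


print(check_command("show ip route", "cisco_ios", "cisco_ios"))
print(check_command("interface Gi0/1", "cisco_ios", "cisco_ios"))
```

Putting the blocklist ahead of the approval check means an approval cannot override a blocked command.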
**Multi-Agent Pattern (Considering)**
Currently single-agent with tool use, but exploring:
* Planner agent: decides approach
* Execution agent: runs commands
* Validation agent: checks results
* Documentation agent: pure RAG queries
Not sure if the added complexity is worth it yet.
Here is a snippet of how it replies when asked about configuring a ZTNA server on a firewall device:
[https://imgur.com/a/dUjQrV3](https://imgur.com/a/dUjQrV3)
[https://imgur.com/a/fdIgr91](https://imgur.com/a/fdIgr91)
It first queries the devices, then searches through the docs for the info:
[https://imgur.com/a/PTqzTnN](https://imgur.com/a/PTqzTnN)
I picked two random products just to see how it responds when it comes to maintenance window recommendations.
[https://imgur.com/a/qbMpDfa](https://imgur.com/a/qbMpDfa)
[https://imgur.com/a/oPuhg1o](https://imgur.com/a/oPuhg1o)
**Where I would love your feedback:**
1. Which vendor tasks are the biggest time sinks: SR creation, RMA, firmware advisories, license renewals, config drift, SLA tracking, something else?
2. If you’ve used agents, where did they help/hurt (triage, enrichment, execution, hallucinations, RBAC/approvals)?
3. Integration realities: ConnectWise/Autotask, common RMMs/ITSMs, data residency, SSO, on-prem constraints.
4. What metrics would convince you this is worth it (MTTA/MTTR, SLA hit rate, case duration, renewal touch time, engineer hours saved)?
5. Any absolute non-starters (lock-in, privacy, vendor T&Cs, API rate limits)?
Not a pitch, just trying to be realistic about this thing. When we were building it, things like compliance and scalability were top of mind.
https://redd.it/1ommjxd
@r_devops
I made an Android app to manage my Docker containers on the go
Hello Everyone,
As someone who likes to self-host everything, from side-project backends to multiple arr apps for media hosting, it has always bugged me that to check logs, start containers, etc., I had to open my laptop and SSH into the server. And while solutions like SSHing from Termux exist, that is really hard to do on a phone screen.
Docker Manager solves that. It lets you manage your containers, images, networks, and volumes right from your phone. Do whatever you need on your server from your phone, all with a beautiful Material UI.
You can get it on the Play Store here: https://play.google.com/store/apps/details?id=com.pavit.docker
Key Features
- Add multiple servers with password or key-based SSH auth
- Seamlessly switch between multiple servers
- Manage containers: start, stop, restart, inspect, and view logs
- Get a shell inside containers or on the host itself (/bin/bash, redis-cli, etc.)
- Build or pull images from any registry, and rename/delete them easily
- Manage networks and volumes: inspect, rename, and remove
- View real-time server stats (CPU, memory, load averages)
- Light/Dark/System theme support
- Works over your phone’s own network stack (VPNs like Tailscale supported)
https://redd.it/1omoxnn
@r_devops
Google Play
Docker Manager - Apps on Google Play
Manage Docker Containers on your Mobile Phone!
Check out this cool VS Code extension I created
https://marketplace.visualstudio.com/items?itemName=AmitPatole.mcp-marketplace&ssr=false#overview
https://redd.it/1omtd40
@r_devops
Visualstudio
MCP Marketplace - Visual Studio Marketplace
Extension for Visual Studio Code - Browse, generate, and install MCP servers directly from VS Code
In a conundrum after a layoff. I feel like my experience is too broad and not specialized enough. Help?
I was recently laid off from a DevOps role I held for almost 4 years, and I'm struggling to understand what employers are actually looking for. My experience spans Jenkins, Nomad, AWS, ELK, DataDog, VMWare, Foreman, Kubernetes, Docker, Linux sys admin, and programming in Ruby, Python, and Bash. I thought this breadth would be an asset, but I'm starting to worry it's working against me.
Recent rejections have left me confused about my positioning:
* Rejected from a platform engineer role because I lacked traditional software engineering experience contributing directly to a product
* Rejected from an observability engineer position for insufficient DataDog experience (despite having used it)
* Likely about to be rejected from another role because my AWS experience apparently isn't deep enough
I don't consider myself a novice in these technologies; I'm confident I can handle most tasks they'd throw at me, with some research for the more complex scenarios. But that doesn't seem to be enough.
I'm genuinely at a loss. Is this just the current market allowing hiring managers to be incredibly selective? Or am I delusional in thinking my level of knowledge is sufficient? Should I have achieved complete mastery of each tool to the point where I can discuss intricate edge cases without preparation?
Any advice or perspective would be appreciated.
https://redd.it/1omvegl
@r_devops
DevOps IT Professional Program from the Linux Foundation
Did anyone try the DevOps IT Professional Program course from the Linux Foundation?
If so, how was it?
Worth it?
Hard?
Did you get certs at the end?
https://redd.it/1omvt5r
@r_devops
Any way to test mobile browsers with system-level permissions?
Need to test camera/mic access in mobile Safari + Chrome. Emulators fake it, real devices needed. Short of buying phones, any ideas?
https://redd.it/1omwyx0
@r_devops
Stuck at a service-based company as a DevOps engineer, seeking guidance!
Hey, I'm a 2025 fresher. I have done several internships and some good projects, but I'm stuck at a mid-size service-based company where the salary is too low and growth and opportunities are limited. To people working at MAANG or other good companies like Red Hat, Rubrik, Canonical, etc.: please guide me on how I can get there. My resume is cooked as of now because of this company, and I need to stay here for at least one year. The market is also cooked, and there are very few infra-related job postings for freshers. Please guide me.
https://redd.it/1on01vl
@r_devops
Looking for guidance or help with The Cloud Resume Challenge (Azure Edition)
I’ve noticed a few folks here completed The Cloud Resume Challenge (Azure Edition), which is really impressive! I’m planning to start the same challenge. If you’re comfortable with it, would you be willing to lend me your copy of the book for a short time?
https://redd.it/1on3b7n
@r_devops
How can I improve my Kubernetes and cloud skills
Basically, that’s it.
I have little to no experience with Kubernetes or cloud technologies. I wasn’t involved in any meaningful work with either of them in my previous roles. I’m currently unemployed and would love to gain some real, hands-on skills with both Kubernetes and AWS. Could you recommend any projects that would help me gain practical knowledge?
https://redd.it/1on5cjn
@r_devops
Which Azure cert should I begin with, and is it hard for someone who has 8 years of experience as a Data Engineer?
I'm looking to get an Azure cert just to have it and to make any future jobs that require Azure easier and less stressful, and these certs seem valuable af. My last job was trying to hire about 4 people with 5 years of general experience in data development, but they had to have an Azure cert. Oh man, our higher-ups put anyone who had one on a pedestal, and tbh when I was training them I could tell they did not have 5 years of data development.
But I'm pretty knowledgeable in everything data. I can confidently say I have already mastered SSIS, the predecessor of Azure Data Factory, since working as an ETL dev was my bread and butter for most of my career.
My question: do I have to take the Azure certs in order, or can I pick a mid-level one and start studying from there? What would you recommend?
Edit: they did not have 5 years of general experience
https://redd.it/1on6n95
@r_devops
Concentric AI - Devops engineer interview
I have an interview with Concentric AI for the role of DevOps Engineer. My profile shows 4+ years of experience in DevOps, but to be honest, most of my work has been around setting up simple CI/CD pipelines (built from scratch). I don’t have much hands-on experience with cloud technologies.
What should I expect from the interview, and how should I prepare?
Can someone please help?
https://redd.it/1on6frl
@r_devops
Clarity from an experienced cloud architect/DevOps engineer
How secure is path-based routing and is it industry standard for a 3-tier cloud native application that makes use of ECS and CodePipeline for CI/CD?
https://redd.it/1on8nuk
@r_devops
From Linux System Engineer to DevOps - Looking for Advice and Experiences
Hi everyone, I’ve wanted to transition into DevOps for a long time, but I only started seriously working toward it in February this year, building up the necessary skills. In the meantime, I received an offer to work as a Linux System Engineer, and I’ve been in that role for about four months now. I accepted it thinking it would help me transition to DevOps because of the skill similarities.
Before that, I completed a three-year System Administrator apprenticeship here in Germany (“Ausbildung zum Fachinformatiker für Systemintegration”), where I mainly worked with Windows servers until the company introduced a deployment pipeline for its software.
Unfortunately, the only overlapping skills in my current role are scripting and Linux. The rest (Ansible, Kubernetes, CI/CD pipelines, etc.) is not part of my job. I recently told my boss that I had expected more hands-on work with tools like Ansible and Terraform, and I asked whether there’s a way for me to transition internally to a DevOps position or possibly take on a new DevOps-focused role. Has anyone here gone through a similar transition? If so, I’d really appreciate hearing your detailed experience and any good tips you might have.
https://redd.it/1onacpn
@r_devops
We had perfect observability but still struggled during incidents. Here's what fixed it
We built a solid observability stack. OpenTelemetry pipelines, unified metrics, logs, traces. Beautiful Grafana dashboards. Everything instrumented. We could see everything.
But when incidents hit, we still struggled. Alerts fired, but we didn't know: is this severe? What do we do? Who should respond? Everyone had different opinions. "2% error rate is fine" vs "2% is catastrophic." We were improvising every time.
The missing piece wasn't technical. It was organizational. We needed SLOs to define what "working" means (so severity isn't subjective), runbooks to codify remediation steps (so response isn't improvisation), and post-mortems to learn from failures systematically (so we don't repeat mistakes).
Here's what actually worked for us:
SLOs: We use availability SLIs from OpenTelemetry span-metrics in Prometheus. We calculate the percentage of successful requests by comparing successful calls (2xx/3xx) against total calls for each service. This gives us availability. We set 99.5% as our SLO, which creates a 0.5% error budget (about 3.6 hours of downtime in a 30-day month). Now we know when something is actually broken, not just "different." When we're burning error budget faster than expected, we slow feature releases.
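A small sketch of the arithmetic behind the SLO paragraph, assuming a 30-day window and plain request counts as input. The function names are illustrative, and the counts would in practice come from the span-metrics queries. With a 99.5% target, the time-based budget is 0.5% of 720 hours, about 3.6 hours.

```python
def availability(success_count, total_count):
    """Share of successful requests (2xx/3xx) out of all requests."""
    if total_count == 0:
        return 1.0  # no traffic: treat as fully available
    return success_count / total_count


def error_budget_hours(slo, window_days=30):
    """Downtime allowed by the SLO over the window, in hours."""
    return (1.0 - slo) * window_days * 24


def budget_consumed(slo, success_count, total_count):
    """Fraction of the request-based error budget used so far (1.0 = exhausted)."""
    if total_count == 0:
        return 0.0
    allowed_failures = (1.0 - slo) * total_count
    actual_failures = total_count - success_count
    return actual_failures / allowed_failures


# Hypothetical month: 1,000,000 requests, 997,000 successful, 99.5% SLO.
print(availability(997_000, 1_000_000))
print(round(error_budget_hours(0.995), 2))
print(round(budget_consumed(0.995, 997_000, 1_000_000), 2))
```

A consumed fraction that grows faster than the elapsed share of the window is the signal to slow feature releases.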
Runbooks: We connect runbooks directly to alerts via PagerDuty. When an alert fires, the notification includes what's broken (service name, error rate), current vs expected (SLO threshold), where to look (dashboard link, trace query), and what to do (runbook link). The on-call engineer clicks the runbook and follows steps. No guessing, no Slack archaeology trying to remember what worked last time.
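The four parts of the alert notification can be sketched as a plain data structure. This is not PagerDuty's event schema; the field names, the `AlertContext` class, and the example URLs are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class AlertContext:
    service: str
    error_rate: float   # observed error rate as a fraction (0.02 == 2%)
    slo_target: float   # availability target, e.g. 0.995
    dashboard_url: str
    trace_query: str
    runbook_url: str


def build_notification(ctx):
    """Bundle what is broken, current vs expected, where to look, and what to do."""
    allowed = 1.0 - ctx.slo_target
    return {
        "summary": f"{ctx.service}: error rate {ctx.error_rate:.2%} exceeds allowed {allowed:.2%}",
        "whats_broken": {"service": ctx.service, "error_rate": ctx.error_rate},
        "current_vs_expected": {"current": ctx.error_rate, "allowed": allowed},
        "where_to_look": {"dashboard": ctx.dashboard_url, "trace_query": ctx.trace_query},
        "what_to_do": {"runbook": ctx.runbook_url},
    }


# Hypothetical alert for a service named "checkout".
ctx = AlertContext(
    service="checkout",
    error_rate=0.02,
    slo_target=0.995,
    dashboard_url="https://grafana.example.com/d/checkout",
    trace_query='service.name = "checkout" AND status = error',
    runbook_url="https://wiki.example.com/runbooks/checkout-errors",
)
notification = build_notification(ctx)
print(notification["summary"])
```

Keeping the runbook link as a required field makes it hard to ship an alert that nobody knows how to act on.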
Post-mortems: We use a simple template: Impact (users affected, SLO impact), Timeline, Root Cause, What Went Well/Poorly, Action Items (with owners, priorities P0-P2, and due dates). The key is prioritizing action items in sprint planning. Otherwise post-mortems become theater where everyone nods, writes "we should monitor better" and changes nothing.
After implementing these practices, our MTTR dropped by 60% in three months. Not because we collected more data, but because we knew how to act on it.
I wrote about the framework, templates, and practical steps here: From Signals to Reliability: SLOs, Runbooks and Post-Mortems
What practices have helped your team move from reactive firefighting to proactive reliability?
https://redd.it/1ona979
@r_devops
Fatih Koç
From Signals to Reliability: SLOs, Runbooks and Post-Mortems
Build reliability with SLOs, runbooks and post-mortems. Turn observability into systematic incident response and learning. Practical examples for Kubernetes environments.
How are you enforcing code-quality gates automatically in CI/CD?
Right now our CI just runs unit tests. We keep saying we’ll add coverage and complexity gates, but every time someone tries, the pipeline slows to a crawl or throws false positives. I’d love a way to enforce basic standards - test coverage > 80%, no new critical issues - without babysitting every PR.
https://redd.it/1onb20l
@r_devops
Combining code review and SAST results - possible?
Security runs their scans separately, devs review manually, and we’re constantly duplicating effort. Ideally, reviewers should see security warnings inline with the code diff. Has anyone achieved that?
https://redd.it/1ona5yo
@r_devops
Anyone using AI for pull-request reviews yet?
Copilot is fine for writing code, but it doesn’t help during reviews. I’m wondering if anyone has used AI that can actually review a PR - like summarize changes, highlight risky logic, or point out missing edge cases.
https://redd.it/1onfv66
@r_devops
PostMessage Vulnerabilities: When Cross-Window Communication Goes Wrong 📬
https://instatunnel.my/blog/postmessage-vulnerabilities-when-cross-window-communication-goes-wrong
https://redd.it/1onbbod
@r_devops
InstaTunnel
PostMessage Vulnerabilities: Cross-Window Security Risks
Learn how improper postMessage usage enables XSS and token exfiltration, plus strict origin checks and mitigation recipes to secure cross-window communication.
AI is a Corporate Fad where I work
The title says it all. In my workplace (big company) we have non-technical decision makers asking for integrations of technology that they don't understand with existing technologies that they don't understand. What could go wrong financially?
My only hope is that this fad replaces the existing fad of hiring swaths of inexpensive out of town engineers to provide "top notch" solution design that falls flat at the implementation phase.
What's your experience?
https://redd.it/1onilgi
@r_devops
Gprxy: Go based SSO-first, psql-compatible proxy
https://github.com/sathwick-p/gprxy
Hey all,
I built a PostgreSQL proxy for AWS RDS. I wrote it because the current way to access and run queries on RDS is through DB users, and in bigger organizations it is impractical to maintain separate DB users for each user or team. Yes, IAM authentication exists in RDS for this same reason, but I personally did not find it the best option, as it would require a bunch of configuration and changes in RDS.
The idea is that, by connecting through this proxy, you just run the login command, which performs an SSO-based login that authenticates you through an IdP like Azure AD before connecting to the DB. It also helps me with user-level audit logs.
I had been looking for an open-source solution but could not find one, so I rolled my own. It is currently deployed and in use on k8s.
Please check it out and let me know if you find it useful or have feedback, I’d really appreciate hearing from y'all.
Thanks!
https://redd.it/1oni3df
@r_devops
GitHub
GitHub - sathwick-p/gprxy: Go based SSO-first, psql-compatible proxy
Go based SSO-first, psql-compatible proxy. Contribute to sathwick-p/gprxy development by creating an account on GitHub.