PM to DevOps
Worked 15 years as an IT project manager and recently got laid off. Thinking of shifting to the DevOps domain. Is it a good decision? Where do I start, and how do I get a foot in the door?
https://redd.it/1ph62bm
@r_devops
[Tool] Anyone running n8n in CI? I added SARIF + JUnit output to a workflow linter and would love feedback
Hey folks,
I’m working on a static analysis tool for n8n workflows (FlowLint) and a few teams running it in CI/CD asked for better integration with the stuff they already use: GitHub Code Scanning, Jenkins, GitLab CI, etc.
So I’ve just added SARIF, JUnit XML and GitHub Actions annotations as output formats, on top of the existing human-readable and JSON formats.
TL;DR
* Tool: FlowLint (lints n8n workflows: missing error handling, unsafe patterns, etc.)
* New: `sarif`, `junit`, `github-actions` output formats
* Goal: surface workflow issues in the same places as your normal test / code quality signals
Why this exists at all
The recurring complaint from early users was basically:
"JSON is nice, but I don't want to maintain a custom parser just to get comments in PRs or red tests in Jenkins."
Most CI systems already know how to consume:
* SARIF for code quality / security (GitHub Code Scanning, Azure DevOps, VS Code)
* JUnit XML for test reports (Jenkins, GitLab CI, CircleCI, Azure Pipelines)
So instead of everyone reinventing glue code, FlowLint now speaks those formats natively.
What FlowLint outputs now (v0.3.8)
* stylish – colorful terminal output for local dev
* json – structured data for custom integrations
* sarif – SARIF 2.1.0 for code scanning / security dashboards
* junit – JUnit XML for test reports
* github-actions – native workflow commands (inline annotations in logs)
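For readers who have not looked inside a SARIF file, here is a minimal sketch of what a SARIF 2.1.0 log could look like when built by hand. The findings, rule IDs, severity names and the `example-linter` tool name are hypothetical and are not taken from FlowLint's real output; only the overall SARIF structure (`runs`, `tool.driver`, `results`, `locations`) follows the standard.

```python
import json

# Hypothetical findings; a real linter's rule IDs and fields may differ.
FINDINGS = [
    {"rule": "R1", "severity": "must", "message": "HTTP node has no error handling",
     "file": "workflows/orders.json", "line": 12},
    {"rule": "R7", "severity": "nit", "message": "Node uses a default name",
     "file": "workflows/orders.json", "line": 40},
]

# SARIF 2.1.0 result levels are "error", "warning", "note" or "none".
# The severity names on the left are an assumption for this sketch.
LEVELS = {"must": "error", "should": "warning", "nit": "note"}


def to_sarif(findings, tool_name="example-linter"):
    """Build a minimal SARIF 2.1.0 log containing a single run."""
    results = []
    for finding in findings:
        results.append({
            "ruleId": finding["rule"],
            "level": LEVELS.get(finding["severity"], "warning"),
            "message": {"text": finding["message"]},
            "locations": [{
                "physicalLocation": {
                    "artifactLocation": {"uri": finding["file"]},
                    "region": {"startLine": finding["line"]},
                }
            }],
        })
    return {
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "version": "2.1.0",
        "runs": [{"tool": {"driver": {"name": tool_name}}, "results": results}],
    }


if __name__ == "__main__":
    print(json.dumps(to_sarif(FINDINGS), indent=2))
```

Because each result carries a file and a line, a consumer such as a code scanning dashboard can attach the finding to a precise location without any custom parsing.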
Concrete CI snippets
GitHub Code Scanning (persistent PR annotations):
    - name: Run FlowLint
      run: npx flowlint scan ./workflows --format sarif --out-file flowlint.sarif
    - name: Upload SARIF
      uses: github/codeql-action/upload-sarif@v3
      with:
        sarif_file: flowlint.sarif
GitHub Actions annotations (warnings/errors in the log stream):
    - name: Run FlowLint
      run: npx flowlint scan ./workflows --format github-actions --fail-on-error
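As background on what "native workflow commands" means: GitHub Actions turns specially formatted log lines such as `::error file=...,line=...::message` into annotations. The sketch below shows how such lines could be produced. The severity names and the `annotation` helper are hypothetical and say nothing about how FlowLint implements its formatter; the command syntax and the escaping rules are the ones GitHub documents for workflow commands.

```python
def _escape_data(text):
    """Escape the message part of a workflow command."""
    return text.replace("%", "%25").replace("\r", "%0D").replace("\n", "%0A")


def _escape_property(text):
    """Escape a property value (file, title, ...), which must also avoid ':' and ','."""
    return _escape_data(text).replace(":", "%3A").replace(",", "%2C")


# Hypothetical severity names mapped to the three annotation commands.
COMMANDS = {"must": "error", "should": "warning", "nit": "notice"}


def annotation(severity, message, file, line):
    """Return one log line that GitHub Actions renders as an annotation."""
    command = COMMANDS.get(severity, "warning")
    props = f"file={_escape_property(file)},line={line}"
    return f"::{command} {props}::{_escape_data(message)}"


if __name__ == "__main__":
    print(annotation("must", "HTTP node has no error handling",
                     "workflows/orders.json", 12))
```

Since the annotation is just a line in the job log, nothing needs to be uploaded, which is also why it disappears with the run.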
Jenkins (JUnit + test report UI):
    sh 'flowlint scan ./workflows --format junit --out-file flowlint.xml'
    junit 'flowlint.xml'
GitLab CI (JUnit report):
    flowlint:
      script:
        - npm install -g flowlint
        - flowlint scan ./workflows --format junit --out-file flowlint.xml
      artifacts:
        reports:
          junit: flowlint.xml
Why anyone in r/devops should care
* It’s basically “policy-as-code” for n8n workflows, but integrated where you already look: PR reviews, test reports, build logs.
* You can track “workflow linting pass rate” next to unit / integration test pass rate instead of leaving workflow quality invisible.
* For GitHub specifically, SARIF means the comments actually stick around after merge, so you have some audit trail of “why did we ask for this change”.
Caveats / gotchas
* GitHub Code Scanning SARIF upload needs the `security-events: write` permission on the job. Code scanning itself is free on public repos; private repos need GitHub Advanced Security.
* JUnit has no real concept of severity levels, so MUST / SHOULD / NIT all show as failures.
* GitHub Actions log annotations are great for quick feedback but don’t persist after the run (for history you want SARIF).
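One possible workaround for the JUnit severity caveat, sketched under assumptions: report only the highest severity as a `<failure>` and record the rest as `<skipped>` so they stay visible without turning the build red. The findings, rule IDs and the `to_junit` helper are hypothetical; this is not how FlowLint's `junit` formatter is documented to behave.

```python
import xml.etree.ElementTree as ET

# Hypothetical findings; a real linter's fields may differ.
FINDINGS = [
    {"rule": "R1", "severity": "must", "message": "HTTP node has no error handling"},
    {"rule": "R4", "severity": "should", "message": "Retry count not set"},
    {"rule": "R7", "severity": "nit", "message": "Node uses a default name"},
]


def to_junit(findings, failing=("must",)):
    """Render findings as JUnit XML; only severities in `failing` become failures."""
    failures = sum(1 for finding in findings if finding["severity"] in failing)
    suite = ET.Element(
        "testsuite",
        name="workflow-lint",
        tests=str(len(findings)),
        failures=str(failures),
        skipped=str(len(findings) - failures),
    )
    for finding in findings:
        case = ET.SubElement(suite, "testcase",
                             classname="workflow-lint", name=finding["rule"])
        # Lower severities are reported as skipped so they do not fail the build.
        tag = "failure" if finding["severity"] in failing else "skipped"
        label = finding["severity"].upper()
        ET.SubElement(case, tag, message=f"[{label}] {finding['message']}")
    return ET.tostring(suite, encoding="unicode")


if __name__ == "__main__":
    print(to_junit(FINDINGS))
```

The trade-off is that "skipped" is a slight abuse of the format, so teams that want SHOULD-level findings to block merges would pass a wider `failing` tuple instead.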
Questions for you all
1. If you’re running n8n (or similar workflow tools) in CI: how are you currently linting / enforcing best practices? Custom scripts? Nothing?
2. Any CI systems where a dedicated output format would actually make your life easier? (TeamCity, Bamboo, Drone, Buildkite, something more niche?)
3. Would a self-contained HTML report (one file, all findings) be useful for you as a build artifact?
If this feels close but not quite right for your setup, I’d love to hear what would make it actually useful in your pipelines.
Tool: [https://flowlint.dev/cli](https://flowlint.dev/cli)
Install:
    npm install -g flowlint
    # or
    npx flowlint scan ./workflows
Current version: v0.3.8
https://redd.it/1phb3tq
@r_devops
Authorization breaks when B2B SaaS scales - role explosion, endless support tickets for access requests, blocked deployments every time permissions change. How policy-as-code fixes it (what my team and I have learned).
If you're running B2B SaaS at scale, you might have experienced frustrating things like authorization logic being scattered across your codebase, every permission change requiring deployments, and no clear answer to who can access what. Figured I'd share an approach that's been working well for teams dealing with this (this is from personal experience at my company, helping users resolve the above issues).
So the operational pain we keep seeing is that teams ship with basic RBAC. Works fine initially. Then they scale to multiple customers and hit the multitenant wall - John needs Admin at Company A but only Viewer at Company B. Same user, different contexts.
The knee-jerk fix is usually to create tenant-specific roles: Editor_TenantA, Editor_TenantB, Admin_TenantA, etc.
Six months later they've got more roles than users, bloated JWTs, and authorization checks scattered everywhere. Each customer onboarding means another batch of role variants. Nobody can answer "who can access X?" without digging through code. Worse for ops: when you need to audit access or update permissions, you're touching code across repos.
Here's what we've seen work:
Moving to tenant-aware authorization where roles are evaluated per-tenant. Same user, different permissions per tenant context. No role multiplication needed.
Then layering in ABAC for business logic: the policy checks attributes instead of creating roles. Things like resource.owner_id, tenant_id, department, amount, status.
The big shift, though, is externalizing to a policy decision point (PDP). Decouple authorization from application code entirely. The app asks "is this allowed?" and the PDP responds based on policy. You can test policies in isolation, get consistent enforcement across your stack, have a complete audit trail in one place, and change rules without touching app code or redeploying.
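To make the three ideas concrete (per-tenant roles, an attribute check, and a single decision function the app calls), here is a minimal in-process sketch. Every name in it (`Principal`, `Resource`, `ROLE_ACTIONS`, `is_allowed`) is hypothetical, and the owner rule is only an illustration; a real PDP such as the one described in the linked write-up would be a separate service with its own policy language.

```python
from dataclasses import dataclass


@dataclass
class Principal:
    user_id: str
    roles_by_tenant: dict  # tenant_id -> set of role names (hypothetical shape)


@dataclass
class Resource:
    kind: str
    tenant_id: str
    owner_id: str


# Generic roles; no Editor_TenantA / Editor_TenantB variants are needed.
ROLE_ACTIONS = {
    "admin": {"read", "update", "delete"},
    "editor": {"read", "update"},
    "viewer": {"read"},
}


def is_allowed(principal, action, resource):
    """Decide one request: roles are looked up in the resource's tenant only."""
    roles = principal.roles_by_tenant.get(resource.tenant_id, set())
    if not roles:
        return False  # the user has no role in this tenant at all
    if any(action in ROLE_ACTIONS.get(role, set()) for role in roles):
        return True
    # ABAC-style rule (illustrative): members may update resources they own.
    return action == "update" and resource.owner_id == principal.user_id


if __name__ == "__main__":
    john = Principal("john", {"company-a": {"admin"}, "company-b": {"viewer"}})
    print(is_allowed(john, "delete", Resource("doc", "company-a", "alice")))
    print(is_allowed(john, "delete", Resource("doc", "company-b", "alice")))
```

The application never inspects roles itself; it only passes the principal, action and resource to the decision function, which is what makes it possible to move that function out of the app later.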
Now the policy-as-code part :) Policies live in Git with version control and PR reviews. Automated policy tests run in CI/CD; we've seen teams with 800+ test cases that execute in seconds. Policy changes become reviewable diffs instead of mysteries, and you can deploy policy updates independently from application deployments.
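A rough idea of why hundreds of policy tests can run in seconds: when the policy is plain data and the decision is a pure function, each test case is a table row. The sketch below assumes a toy policy expressed as a set of `(role, action)` pairs; the `POLICY`, `decide` and `run_policy_tests` names are hypothetical and do not mirror any particular policy engine's test format.

```python
# Toy policy as data, the kind of thing that could live in Git and be reviewed in a PR.
POLICY = {
    ("viewer", "read"),
    ("editor", "read"), ("editor", "update"),
    ("admin", "read"), ("admin", "update"), ("admin", "delete"),
}


def decide(role, action):
    """Pure decision function: no I/O, so it is cheap to call thousands of times."""
    return (role, action) in POLICY


# Each row: (role, action, expected decision).
CASES = [
    ("viewer", "read", True),
    ("viewer", "update", False),
    ("editor", "delete", False),
    ("admin", "delete", True),
]


def run_policy_tests(cases):
    """Return the cases whose decision differs from the expectation."""
    return [(role, action, want) for role, action, want in cases
            if decide(role, action) != want]


if __name__ == "__main__":
    failed = run_policy_tests(CASES)
    print("policy tests failed:" if failed else "policy tests passed", failed)
```

In CI, a non-empty result would fail the job, so a policy change that accidentally grants `delete` to editors shows up as a red check on the PR.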
What this means is that authorization becomes observable and auditable, policy updates don't require application deployments, you get a centralized decision point with a single audit log, you can A/B test authorization rules, and compliance teams can review policy diffs in PRs.
Wrote up the full breakdown with architecture diagrams here if it's helpful: https://www.cerbos.dev/blog/how-to-implement-scalable-multitenant-authorization
Curious what approaches others are using.
https://redd.it/1phaszc
@r_devops
Looking for real DevOps project experience. I want to learn how the real work happens.
Hey everyone,
I’m a fresher trying to break into DevOps. I’ve learned and practiced tools like Linux, Jenkins, SonarQube, Trivy, Docker, Ansible, AWS, shell scripting, and Python. I can use them in practice setups, but I’ve never worked on a real project with real issues or real workflows.
I’m at a point where I understand the tools but I don’t know how DevOps actually works inside a company — things like real CI/CD pipelines, debugging failures, deployments, infra tasks, teamwork, all of that.
I’m also doing a DevOps course, but the internship is a year away and it won’t include real tasks. I don’t want to wait that long. I want real exposure now so I can learn properly and build confidence.
If anyone here is working on a project (open-source, startup, internal demo, anything) and needs someone who’s serious and learns fast, I’d love to help and get some real experience.
https://redd.it/1phde65
@r_devops
Spotting early reliability issues when standard observability metrics remain stable
All available dashboards indicated stability. CPU utilization remained low, memory usage was steady, P95 latency showed minimal variation, and error rates appeared insignificant. Despite this, users continued to report intermittent slowness: not outages or outright failures, but noticeable hesitation and inconsistency. Requests completed successfully, yet the overall system experience proved unreliable. No alerts were triggered, no thresholds were exceeded, and no single indicator appeared problematic when assessed independently.
The root cause became apparent only under conditions of partial stress: minor dependency slowdowns, background processes competing for limited shared resources, retry logic subtly amplifying system load, and queues recovering more slowly following small traffic bursts. This exposed a meaningful gap in our observability strategy. We were measuring capacity rather than runtime behavior. The system itself was not unhealthy; it was structurally imbalanced.
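Two of the behaviors described above, retry amplification and slow queue recovery, can be turned into numbers. The sketch below is one hedged way to do it; the function names, the sample format and the choice of baseline are assumptions for illustration, not metrics from the original system.

```python
def retry_amplification(attempts_sent, requests_received):
    """Downstream attempts per original request; 1.0 means no retries at all."""
    if requests_received == 0:
        return 1.0
    return attempts_sent / requests_received


def drain_time_seconds(samples, baseline):
    """Time from peak queue depth until depth first returns to `baseline`.

    `samples` is a list of (timestamp_seconds, queue_depth) in time order.
    Returns None if there are no samples or the queue never recovers.
    """
    if not samples:
        return None
    peak_index = max(range(len(samples)), key=lambda i: samples[i][1])
    peak_time = samples[peak_index][0]
    for timestamp, depth in samples[peak_index:]:
        if depth <= baseline:
            return timestamp - peak_time
    return None


if __name__ == "__main__":
    # 1,000 user requests produced 1,300 downstream attempts.
    print(retry_amplification(1300, 1000))
    burst = [(0, 5), (10, 80), (20, 60), (30, 20), (40, 4)]
    print(drain_time_seconds(burst, baseline=5))
```

Both values can look alarming while CPU, memory and P95 latency stay flat, which is the gap between measuring capacity and measuring runtime behavior.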
Which indicators do you rely on beyond standard CPU, memory, or latency metrics to identify early signs of reliability issues?
https://redd.it/1phbgy4
@r_devops
What do you think is the most valuable or important to learn?
Hey everyone, I’m trying to figure out what to focus on next and I’m kinda stuck. Out of these, what do you think is the most valuable or important to learn?
* Docker
* Ansible
* Kubernetes
* Databases / DB maintenance
* Security
My team covers all of these and I have an opportunity to become the POC (point of contact) for a few, but I'm not sure which would benefit me the most since I'm interested in all of them. I would like to learn and get hands-on experience with the ones that would help me find another job.
https://redd.it/1phe3fb
@r_devops
React2Shell: new remote code execution vulnerability in React
New React vulnerability ("React2Shell", CVE-2025-55182 and CVE-2025-66478) that allows remote code execution
https://jfrog.com/blog/2025-55182-and-2025-66478-react2shell-all-you-need-to-know/
https://redd.it/1phhzy1
@r_devops
PAM Implementation tool
Hey everyone, my friend and I created this: https://github.com/gateplane-io
It is a just-in-time privileged access management (PAM) tool, from us for the community. If anyone wants to try it out and give us feedback, feel free!
https://redd.it/1phj5jh
@r_devops
Manager in C-suite meeting tries to “fix error costs” by renaming HTTP status codes and thinks 200 means £200 earned
I just watched the funniest career disaster I think I have ever seen; actually, I challenge anyone to find another one. Big meeting. Full C-suite. This is for a real product used by more than forty thousand people every month. The engineering project manager running part of the presentation isn't technical and prides himself on saying "I am not technical" as many times as he can. It's sort of his badge of honor; you know the type. You could tell he’d copied something from ChatGPT, hallucinations in all their abject glory, or from some equally bad nonsense LinkedIn post.
He did a whole section about “reducing the cost of errors.” Sounded normal at first. Everyone assumed he meant improving reliability or fixing failure paths. Then he started explaining his logic. He honestly believed an HTTP 200 status code meant the company earned money, like “200” meant £200 for a successful request. And he thought 400s, 500s, and everything else meant we were losing that amount of money each time. He had built a dashboard that totalled these numbers. Charts. Graphs. Sums. He spoke with total confidence, like he’d uncovered some hidden financial leak. Then he proposed a “fix.” He wanted to change all OK responses to status code 1000, and all errors to tiny numbers like 1, 2, 3. He said this would “reduce the cost of errors.” It looked like something scraped from a bad LinkedIn influencer post, but he stood there presenting it to executives as if he’d discovered a new engineering principle.
He wasn’t joking. Not even slightly. He even went as far as to claim some developers were being “difficult” because they didn’t want to implement the system he invented.
The room went silent. Then someone said, very carefully, “Let’s park this and talk after the meeting.” He genuinely thought he’d revolutionised API design by renaming status codes. It was the purest form of second-hand embarrassment. A man so confident he never thought to ask what a status code actually is.
https://redd.it/1phktdd
@r_devops
Here's My Go ASDF plugin for 60+ Tools
Both Mise and ASDF can be tricky to bootstrap from scratch. I perceive scattered repositories with distributed admin permissions as a ticking bomb. It only amplifies the long-term ownership risks.
https://github.com/sumicare/universal-asdf-plugin
So, I developed an ASDF plugin in Go that consolidates all installations into a single binary.
Added:
* self-update for `.tool-versions`
* hash-sum management for downloaded tools in `.tool-sums`
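As a rough illustration of what checksum management for downloaded tools involves, the sketch below verifies a file against a sums file. It assumes a `sha256sum`-style layout (`<hex digest>  <file name>` per line); the actual `.tool-sums` format used by the plugin is not described in the post, and the plugin itself is written in Go, so treat every name here as hypothetical.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large downloads do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_sums(text):
    """Parse '<hex digest>  <file name>' lines (assumed format) into a dict."""
    sums = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        digest, name = line.split(None, 1)
        sums[name.strip()] = digest.lower()
    return sums


def verify(path, name, sums):
    """True only if `name` has a recorded digest and the file on disk matches it."""
    expected = sums.get(name)
    return expected is not None and sha256_of(path) == expected
```

A tool with no recorded digest fails verification here, which is the conservative choice: an unknown download is treated the same as a tampered one.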
At this stage, it's a bit of an over-refactored AI-slop kitchen sink...
Took about three days, roughly 120 Windsurf queries, and 300K lines of code condensed down to 30K. Not exactly a badge of honor, but it works.
Hopefully, someone finds this useful.
Next, I'll be working on consolidating Kubernetes autoscaling and cost reporting.
This time in Rust, leveraging aya eBPF for good measure.
https://redd.it/1phmfpd
@r_devops
Artifactory borked?
Can anyone help me confirm that the latest self-hosted Artifactory-OSS 7.125 is broken?
No matter how I install it, the front end is inaccessible. The API seems to work, but you can’t log in to the web app.
For the life of me, I can’t figure it out. It seems like portions of the webapp are just…missing.
This applies to all 7.125 OSS versions.
https://redd.it/1phiug8
@r_devops
Question on the stack for blog/mobile app
I'm setting up the infrastructure for a news and contest blog (and a future React Native app). The focus is on maximum optimization and low operating costs at scale (aiming for 200k+ users).
I'd like a reality check on my stack:
• Frontend Web: Next.js (Vercel Hosting + Cloudflare CDN).
• Mobile: React Native.
• CMS/Backend API: Strapi, hosted on Fly.io.
• Database: PostgreSQL via Neon (Serverless DB).
• Authentication/Users: Firebase.
Is this combination the best possible to ensure efficiency and low infrastructure costs in the long run, or is there any bottleneck (mainly in the Strapi/Fly.io/Neon trio) that I should correct before launching the app?
https://redd.it/1phr5jk
@r_devops
Developers, pls stop treating Datadog like ur personal diary
I’m slowly losing my mind here because some of our devs refuse to filter their logs.
Our Datadog bill is skyrocketing, and for what? To store masterpieces like:
Process starting...
Process started...
Process REALLY started...
Plus 300 lines of “not-an-error” stack traces.
Every time I ask them to log less, or say "let me create a filter for that", I get “we might need it later” or “it’s only a few lines”. Sure, times 3 million.
Anyone else fighting the “log everything forever” cult? How do you handle this type of battle? I need them to agree to drop much of that spend, but I also respect that they may need some of it...
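One compromise that sometimes helps in this kind of argument is filtering at the source: drop known-noisy INFO lines before they are shipped, and never touch warnings or errors. The sketch below uses Python's standard `logging.Filter`; the `DropNoise` class, the `build_logger` helper and the pattern are hypothetical, and it assumes the services log through Python's `logging` module, which the post does not say.

```python
import logging
import re


class DropNoise(logging.Filter):
    """Drop low-severity records whose message matches a known-noisy pattern."""

    def __init__(self, patterns):
        super().__init__()
        self._patterns = [re.compile(pattern) for pattern in patterns]

    def filter(self, record):
        # Warnings and errors always pass, so "we might need it later" still holds.
        if record.levelno >= logging.WARNING:
            return True
        message = record.getMessage()
        return not any(pattern.search(message) for pattern in self._patterns)


def build_logger(name="app"):
    """Attach the filter to the handler that feeds the log shipper."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler()
    handler.addFilter(DropNoise([r"^Process (REALLY )?start(ing|ed)"]))
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = build_logger()
    log.info("Process starting...")        # dropped
    log.info("Order 42 shipped")           # kept
    log.warning("Process started slowly")  # kept: warnings always pass
```

Keeping the pattern list in the repo means each dropped line is a reviewable decision, which may be easier to agree on than a blanket "log less".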
https://redd.it/1phs661
@r_devops
Amateur Docker mistake
Hello all,
VERY much an amateur here, just now learning Docker and things. I have been working on a small project to learn using Nexus and Docker.
Since I have a new Mac, I was informed running Nexus via Docker was best due to some OS limitations. Well, everything worked fine until I made one dumb rookie mistake.
I created a repo named “docker hosted” on Nexus and needed to add port 8083. So I stopped my container, removed it, and recreated it with the additional port. What my uneducated amateur brain didn’t realize was that doing this would generate a new admin password and lose all the users, roles, blob stores and rules I had previously created.
If you ask about backups, the project I’ve been following along with didn’t do that or hadn’t talked about that yet. So no backups. I looked for the volumes on my machine and unfortunately the previous one wasn’t there.
All this to say.. when you were first learning.. did you make any silly mistakes like this?
I feel real dumb. lol thankfully this is just for learning experience and not for work.
https://redd.it/1phs7xc
@r_devops
6 years in devops — do i need to study dsa now?
hey folks,
i’ve been a devops engineer for about 6 years, mostly working with kubernetes and cloud infra. my role hasn’t really involved much coding.
now i’m aiming for bigger companies in India, and i keep hearing that they ask dsa in the first round even for devops roles. i don’t mind learning dsa if it’s actually needed, but i’m wondering if it’s worth the time.
for those who’ve interviewed recently, is dsa really required for devops/sre roles at big companies, or should i focus more on system design, cloud, and infra instead?
thanks in advance!
https://redd.it/1phfe4o
@r_devops
Need Suggestions
I've completed my DevOps learning journey, at least as much as a fresher needs to get a job.
I've started applying, and I know it takes time to get a job right now, because I'm a fresher from a non-IT background without an IT degree.
So I need to be patient.
Along with applying, I need to practice regularly so that I don't forget anything.
So my question is: how should I divide my time between the two? I have 3.5 hours in total each day.
Consider these points as well before answering:
I need a job; it's very important for me.
But I also need to be patient.
Revision and regular practice are important too.
Note: just divide the time between applying and practicing.
https://redd.it/1phweuz
@r_devops
What’s an AI tool you tried recently that actually earned a permanent spot in your workflow?
Lately it feels like there’s a new “game-changing” AI tool dropping every 10 minutes, slick websites, big claims, and then… I use it once and never open it again.
I keep finding myself going back to the same few tools, so I’m genuinely curious:
Has anything you’ve tried recently stuck enough to become part of your daily or weekly routine?
Not talking about hype or one-off demos, I mean a tool that genuinely surprised you and proved useful long-term.
Always looking for real recommendations from people who actually use this stuff, not marketing pages.
https://redd.it/1phyeoo
@r_devops
Need your opinions
Hey DevOps folks, I have a question that I think every dev runs into: what do you think about DSA? I've seen many resumes in this sub, but none of them mention DSA. Isn't DSA important in DevOps?
I asked my cousin, who works in Japan. He started in SRE and later moved to DevSecOps; he used to work at Rakuten and is now at a different company. He told me that without coding skills no company will even consider you, and that knowledge of DSA matters too. He said tools like Docker, k8s, AWS, Linux, etc. are common and show up on every resume. He's not wrong, and he has 7+ YOE.
What do you think? I've seen DevOps courses from Scalar and GeeksforGeeks that include DSA in their curriculum. Please share your opinions, because I think 90%+ of students go into DevOps either to avoid DSA and coding or for a better package. I haven't seen any YouTuber discuss DSA in DevOps. So is DSA as important as the other tools we learn in DevOps?
https://redd.it/1phziiz
@r_devops
GitLab CI trigger merge request pipeline on push to target branch
Is there any way to trigger a merge request pipeline on push/merge to the TARGET (aka main) branch? The default behavior of `if: $CI_PIPELINE_SOURCE == 'merge_request_event'` does not provide this.
Maybe there is another way to handle it? It's important to retrigger tests on MRs after any change in the main branch, as their results may no longer be valid.
Right now I'm looking into server hooks, or just restarting MR test jobs via the API from an additional job that runs on merge/push to main.
https://redd.it/1pi16io
@r_devops
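The "restart via API" idea from the post could be sketched roughly as follows. This assumes the GitLab REST API endpoints for listing merge requests and for creating a merge request pipeline, a token with API scope, and a `.gitlab-ci.yml` that already has merge request pipeline rules. The function names are hypothetical, and pagination beyond the first 100 open MRs is deliberately left out.

```python
import json
import urllib.parse
import urllib.request


def open_mr_url(api_url: str, project_id: str, target_branch: str) -> str:
    """URL listing open MRs that target the given branch (first page only)."""
    query = urllib.parse.urlencode(
        {"state": "opened", "target_branch": target_branch, "per_page": 100}
    )
    return f"{api_url}/projects/{project_id}/merge_requests?{query}"


def mr_pipeline_url(api_url: str, project_id: str, mr_iid: int) -> str:
    """URL that creates a new merge request pipeline when POSTed to."""
    return f"{api_url}/projects/{project_id}/merge_requests/{mr_iid}/pipelines"


def call(url: str, token: str, method: str = "GET"):
    """Send an authenticated request and decode the JSON response."""
    data = b"" if method == "POST" else None  # empty body so Content-Length is sent
    request = urllib.request.Request(
        url, data=data, method=method, headers={"PRIVATE-TOKEN": token}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def retrigger_open_mrs(api_url: str, project_id: str, token: str,
                       target_branch: str = "main") -> list:
    """Start a fresh MR pipeline for every open MR targeting target_branch."""
    triggered = []
    for mr in call(open_mr_url(api_url, project_id, target_branch), token):
        call(mr_pipeline_url(api_url, project_id, mr["iid"]), token, method="POST")
        triggered.append(mr["iid"])
    return triggered
```

In a job restricted to the main branch, this could be called as `retrigger_open_mrs(os.environ["CI_API_V4_URL"], os.environ["CI_PROJECT_ID"], os.environ["GITLAB_API_TOKEN"])`, where `GITLAB_API_TOKEN` is an assumed CI/CD variable name. With many open MRs this can burn a lot of runner minutes, so filtering by label or draft status may be worth adding.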
Is it possible to run iOS CI/CD from a Jenkins Linux build node? (Mac agents aren't an option) - Anyone used xtool?
I'm trying to set up CI/CD for an iOS app, but we cannot use Jenkins macOS agents (no EC2 Mac, no on-prem Mac minis; Mac-based EC2 instances are crazy-ass costly).
Our entire pipeline runs on Linux-based Jenkins nodes, and we’d prefer to keep it that way.
I came across xtool, which claims to let you run iOS builds from Linux by offloading the actual Xcode build to their cloud macOS environment: https://github.com/xtool-org/xtool
Has anyone here:
1. Run iOS CI/CD entirely from Linux Jenkins using something like xtool?
2. Used xtool in production? How reliable is it?
3. Faced any limitations (signing, keychain handling, test runners, caching, build times)?
Basically:
Is xtool a viable alternative to running a Jenkins Mac node?
Or am I missing something fundamental in the iOS build pipeline that still requires macOS locally?
Any guidance or real-world experience would be super helpful :)
https://redd.it/1pi1gwj
@r_devops
Is anyone using feature flags to implement chaos engineering techniques?
I'm thinking of failure injections like additional latency, API timeouts, dependency errors, etc.
It sounds useful to have a deploy-free way to inject chaos using a flag. But you also have automatic circuit breakers and other mechanisms in place to remediate issues. Is there an overlap?
How do you integrate feature flags and kill switches with chaos experiments, circuit breakers, and so on?
https://redd.it/1pi4x79
@r_devops
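To make the question concrete, here is a minimal sketch of flag-driven failure injection with a global kill switch. Everything in it is hypothetical: the in-memory `FLAGS` store stands in for a real feature flag service, and `ChaosFlag`, `InjectedFailure` and `with_chaos` are illustrative names rather than any vendor's API.

```python
import functools
import random
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ChaosFlag:
    enabled: bool = False
    latency_seconds: float = 0.0  # extra delay added before the call
    error_rate: float = 0.0       # probability in [0, 1] of an injected error


class InjectedFailure(RuntimeError):
    """Raised in place of the real call when the flag injects an error."""


# Hypothetical in-memory stores; a real setup would query a flag service.
FLAGS: Dict[str, ChaosFlag] = {}
KILL_SWITCH = {"chaos_disabled": False}


def with_chaos(flag_name: str,
               rng: Callable[[], float] = random.random,
               sleep: Callable[[float], None] = time.sleep):
    """Wrap a dependency call so a flag can inject latency or errors."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            flag = FLAGS.get(flag_name)
            # The kill switch overrides every experiment at once.
            if flag and flag.enabled and not KILL_SWITCH["chaos_disabled"]:
                if flag.latency_seconds > 0:
                    sleep(flag.latency_seconds)
                if rng() < flag.error_rate:
                    raise InjectedFailure(f"chaos flag '{flag_name}' injected an error")
            return func(*args, **kwargs)
        return wrapper
    return decorator
```

If the wrapper sits inside whatever the circuit breaker observes, injected failures trip the breaker exactly like real ones, which is arguably the point of the experiment: the flag creates the fault, the breaker is the thing being tested, and the kill switch ends the experiment without a deploy.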