Bitbucket to GitHub + Actions (self-hosted) Migration
Our engineering department is moving our entire operation from Bitbucket to GitHub, and we're struggling with a few fundamental changes in how GitHub handles things compared to Bitbucket projects.
We have about 70 repositories in our department, and we're looking for real-world advice on how to manage this scale, especially since we aren't organization-level administrators.
Here are the four big areas we're trying to figure out:
# 1. Managing Secrets and Credentials
In Bitbucket, secrets were often stored in Jenkins/our build server. Now that we're using GitHub Actions, we need a better, more secure approach for things like cloud provider keys, database credentials, and Artifactory tokens.
Where do you store high-value secrets? Do you rely on GitHub organization secrets (which feel a bit basic), or do you integrate with a dedicated vault like HashiCorp Vault or AWS/Azure Key Vault?
How do you fetch them securely? If you use an external vault, what's the recommended secure, passwordless way for a GitHub Actions workflow to grab a secret? We've heard about OIDC; is this the standard, and how hard is it to set up?
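For what it's worth, the OIDC flow to a cloud provider is fairly compact on the workflow side. A minimal sketch for AWS (the role ARN, region, and secret name below are placeholders, not anything from this post; the AWS account also needs an IAM OIDC identity provider for `token.actions.githubusercontent.com` and a role trust policy scoped to your org/repo):

```yaml
permissions:
  id-token: write   # lets the job request a short-lived OIDC token
  contents: read

jobs:
  deploy:
    runs-on: [self-hosted]
    steps:
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/gha-deploy  # placeholder
          aws-region: eu-west-1
      # Temporary credentials are now in the environment; e.g. fetch a secret:
      - run: aws secretsmanager get-secret-value --secret-id my-app/db-password
```

The appeal is that no long-lived cloud key ever lives in GitHub; the token is minted per job and expires with it.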
# 2. Best Way to Use JFrog
We rely heavily on Artifactory (for packages) and Xray (for security scanning).
What are the best practices for integrating JFrog with GitHub Actions?
How do you securely pass Artifactory tokens to your build pipelines?
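One pattern that avoids storing long-lived Artifactory tokens at all is JFrog's own OIDC integration with the `setup-jfrog-cli` action. A rough sketch (the platform URL and provider name are placeholders; the matching OIDC integration has to be configured on the JFrog Platform side first):

```yaml
permissions:
  id-token: write
  contents: read

steps:
  - uses: jfrog/setup-jfrog-cli@v4
    env:
      JF_URL: https://mycompany.jfrog.io    # placeholder
    with:
      oidc-provider-name: github-oidc       # name configured in JFrog Platform
  - run: jf rt ping   # verify the exchanged token actually works
```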
# 3. Managing Repositories at Scale (70+ Repos)
In Bitbucket, we had a single "project" folder for our entire department, making it easy to apply the same permissions and rules to all 70 repos at once. GitHub doesn't have this.
How do you enforce consistent rules (like required checks, branch protection, or team access) across dozens of repos when you don't control the organization's settings?
Configuration as Code (CaC): is using Terraform (or similar tools) to manage our repository settings and GitHub rulesets the recommended way to handle this scale and keep things in sync?
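The Terraform route scales reasonably well even without org admin, as long as you have admin on the repos themselves. A sketch using the `integrations/github` provider with `for_each` so one definition covers all 70 repos (repo and check names below are placeholders):

```hcl
variable "repos" {
  type    = set(string)
  default = ["service-a", "service-b"]  # placeholder repo names
}

resource "github_branch_protection" "main" {
  for_each      = var.repos
  repository_id = each.value
  pattern       = "main"

  required_status_checks {
    strict   = true
    contexts = ["build", "test"]  # placeholder check names
  }
  required_pull_request_reviews {
    required_approving_review_count = 1
  }
}
```

Adding a repo then becomes a one-line change to the list, and drift is visible in `terraform plan`.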
# 4. Tracking Build Health and Performance
We need to track more than just whether a pipeline passed or failed. We want to monitor the stability, performance, and flakiness of our builds over time.
What are the best tools or services you use to monitor and track CI/CD performance and stability within GitHub Actions?
Are people generally exporting this data to monitoring systems, or using specialized GitHub-focused tools?
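Whichever tool you pick, the raw material is available from the Actions REST API (`GET /repos/{owner}/{repo}/actions/runs`). As an illustration of how little glue is needed, here's a sketch that turns that payload shape into pass-rate and duration metrics (the field names match the API; the function itself is illustrative, not a published tool):

```python
from datetime import datetime

def build_stats(runs):
    """Summarize workflow runs shaped like the GitHub Actions API payload:
    each run has 'conclusion', 'run_started_at', and 'updated_at'."""
    # Ignore cancelled/skipped runs so they don't distort the pass rate.
    finished = [r for r in runs if r["conclusion"] in ("success", "failure")]
    if not finished:
        return {"pass_rate": None, "avg_minutes": None}

    passes = sum(1 for r in finished if r["conclusion"] == "success")
    durations = []
    for r in finished:
        start = datetime.fromisoformat(r["run_started_at"].rstrip("Z"))
        end = datetime.fromisoformat(r["updated_at"].rstrip("Z"))
        durations.append((end - start).total_seconds() / 60)

    return {
        "pass_rate": passes / len(finished),
        "avg_minutes": sum(durations) / len(durations),
    }
```

Feed the result into whatever time-series store you already run (Prometheus pushgateway, Datadog, etc.) and flakiness trends fall out of the history.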
Any advice, especially from those who have done this specific migration, would be incredibly helpful! Thanks!
https://redd.it/1pghkmk
@r_devops
Do tools like Semgrep or Snyk Upload Any Part of My Codebase?
Hey everyone, quick question. How much of my codebase actually gets sent to third-party servers when using tools like Semgrep or Snyk? I’m working on something that involves confidential code, so I want to be sure nothing sensitive is shared.
https://redd.it/1pgkwq3
@r_devops
Anyone else hit by Sha1-Hulud 2.0 transitive NPM infections in CI builds?
My team got hit months ago, three different Node.js microservices pulling malicious packages through transitive deps we didn't even know existed. Our SBOM tooling caught it but only after images were already built and tagged.
The bottleneck is we're running legacy base images with hundreds of CVEs each, so when the real threat shows up it gets buried in noise. Spent hours last week mapping which services were affected because our dependency graphs are a mess. We have never recovered.
Anyone found a clean way to block these at build time without breaking your CI pipeline? We don’t want a repeat ever.
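Not a full answer, but two build-time controls that come up repeatedly for this class of npm worm (offered as common practice, not something from this post) are pinning installs strictly to the lockfile and disabling dependency lifecycle scripts, e.g. in the project's `.npmrc`:

```ini
# .npmrc — refuse preinstall/postinstall scripts from dependencies,
# which is how most of these npm supply-chain worms execute
ignore-scripts=true
```

Combined with `npm ci` (which fails the build if `package.json` and `package-lock.json` disagree) and a registry proxy with a quarantine/cooldown policy so brand-new package versions can't reach builds immediately, this blocks the common infection paths without changing application code.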
https://redd.it/1pglm8j
@r_devops
Final Year Project in DevOps
Hi Guys,
I am in the final year of my BSc and am clear that I want to pursue a career in DevOps. I already have the AWS Cloud Practitioner and Terraform Associate certifications. I would like suggestions on what my final-year project should be. I want it to help me stand out from other candidates when applying for jobs in the future. I would really appreciate your thoughts.
https://redd.it/1pgl52u
@r_devops
GWLB, GWLBe, and Suricata setup
Hi, I would like to ask for insights on setting up a GWLBe and GWLB. I tried following the diagram in this guide to implement inspection in a test setup; my setup is almost the same as the diagram, except that my servers are in an EKS cluster. I'm not sure what I did wrong: I followed the diagram closely, but I'm not seeing GENEVE traffic on my Suricata instance (port 6081), and I'm not quite sure how to check whether my GWLBe is routing traffic to my GWLB.
Here's what I've tried so far:
1.) Reachability analyzer shows my IGW is reaching the GWLBe just fine.
2.) My route tables are as shown in the diagram: the app route table has 0.0.0.0/0 > GWLBe and the app VPC CIDR > local; the Suricata EC2 instance's route table (security VPC) has the security VPC CIDR > local.
3.) I have 2 GWLBe, both pointed at my VPC endpoint service, while the VPC endpoint service points at my 2 GWLBs in the security VPC (all in available and active status).
4.) The target group of my GWLB is properly attached; it shows my Suricata EC2 instance (I only have 1 instance) registered and healthy, on port 6081.
5.) systemctl status suricata shows it running with 46k rules successfully loaded.
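One gotcha that matches these symptoms (a guess, not a diagnosis): GWLB health checks are plain TCP/HTTP, so a target can show healthy while the instance's security group still blocks UDP 6081, which is what actually carries the GENEVE-encapsulated traffic. Two quick checks, with the interface name and route table ID as placeholders:

```shell
# On the Suricata instance: is ANY GENEVE traffic arriving at all?
sudo tcpdump -ni ens5 udp port 6081

# From your workstation: confirm the app route table really points at the GWLBe
aws ec2 describe-route-tables --route-table-ids rtb-0123456789abcdef0 \
  --query 'RouteTables[].Routes'
```

If tcpdump shows nothing while the target stays healthy, the security group on UDP 6081 is the first thing to rule out.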
Any tips/advice/guidance regarding this is highly appreciated.
For reference here are the documents/guides I've browsed so far.
https://forum.suricata.io/t/suricata-as-ips-in-aws-with-gwlb/2465
https://aws.amazon.com/blogs/networking-and-content-delivery/introducing-aws-gateway-load-balancer-supported-architecture-patterns/
https://www.youtube.com/watch?v=zD1vBvHu8eA&t=1523s
https://www.youtube.com/watch?v=GZzt0iJPC9Q
https://www.youtube.com/watch?v=fLp-W7pLwPY
https://redd.it/1pglwlr
@r_devops
Focus on DevSecOps or Cybersecurity?
I am currently pursuing my Masters in Cybersecurity and have a Bachelor's in CSE with a specialisation in Cloud Computing. I am unsure whether to focus my career solely on Cybersecurity or on DevSecOps; I can only fully commit to one stream right now. I have mediocre knowledge of both fields but going forward want to focus on one only. Any advice would be much appreciated.
https://redd.it/1pgpng5
@r_devops
Workflow challenges
Curious to hear from others: what’s a challenge you've been dealing with lately in your workflow that feels unnecessary or frustrating?
https://redd.it/1pgpkh7
@r_devops
Small Change, Big Result.
I was working with someone recently who had been applying for months with almost no responses. Good experience, solid projects, strong recommendations — but her application kept disappearing into the void.
We reviewed everything together, and nothing major stood out. But there was one tiny thing:
Her resume opened with a generic line — “Experienced software engineer seeking opportunities to contribute and grow.”
It didn’t say who she was, what she was great at, or why someone should care.
I asked her to rewrite just the first 3 lines. That’s it. No full overhaul. No template change. Just a sharper intro.
She changed it to:
“Software engineer with 6+ years in backend systems, reduced API latency 40% at my last company, and passionate about building reliable, scalable infrastructure.”
That tiny change repositioned her instantly.
Within 20 hours of apply4u, she started getting callbacks.
It reminded me how often job search breakthroughs come from small clarifications, not big reinventions.
Curious — what’s one small change you made in your career or job search that created a surprisingly big result?
https://redd.it/1pgttlm
@r_devops
AI for monitoring systems automatically
I've been thinking about using AI to monitor and predict what could cause issues across my whole company's systems.
Any solution advice? Thanks so much!
https://redd.it/1ph1rcm
@r_devops
Looking to migrate company off GitHub. What’s the best alternative?
I’m exploring options to move our engineering org off GitHub. The main drivers are pricing, reliability and wanting more control over our code hosting.
For teams that have already made the switch:
* Which platforms did you evaluate?
* What did you ultimately choose (GitLab, Gitea, Bitbucket, something else)?
* Any major surprises during the migration?
Looking for practical, experience-based input before we commit to a direction.
https://redd.it/1ph3dca
@r_devops
AI Is Going To Run Cloud Infrastructure. Whether You Believe It Or Not.
There it is. Another tech change where people inside the system (including many of the folks here) insist their jobs are too nuanced, too complex, too “human-required” to ever be automated.
Right up until the day they aren't. Cloud infrastructure is next. Not partially automated, not “assistive tooling,” but fully AI-operated.
Provisioning cloud resources isn’t more complex than plenty of work AI already handles. Even coordinating and ordering groceries is a mess of constraints, substitutions, preferences, inventory drift, routing, and budgets... And AI can already manage that today.
In 2010, a Warner Bros exec dismissed Netflix, saying "the American army is not preparing for an Albanian invasion." This week, Netflix basically bought them...
But you are smarter. Nothing can replace you... right?
Cloud infrastructure will be AI-run.
https://redd.it/1ph5bzp
@r_devops
Setting up a Linux server for production. What do you actually do in the real world?
Hey folks,
I’d like to hear how you prepare a fresh Linux server before deploying a new web application.
Scenario:
A web API, a web frontend, background jobs/workers, and a few internal-only routes that should be reachable from specific IPs only (though I’m not sure how to handle IP rotation reliably).
These are the areas I’m trying to understand:
---
1) Security and basic hardening
What are the first things you lock down on a new server?
How do you handle firewall rules, SSH configuration, and restricting internal-only endpoints?
2) Users and access management
When a developer joins or leaves, how do you add/remove their access?
Separate system users, SSH keys only, or automated provisioning tools (Ansible/Terraform)?
3) Deployment workflow
What do you use to run your services: systemd, Docker, PM2, something else?
CI/CD or manual deployments?
Do you deploy the web API, web frontend, and workers through separate pipelines, or a single pipeline that handles everything?
4) Monitoring and notifications
What do you keep an eye on (CPU, memory, logs, service health, uptime)?
Which tools do you prefer (Prometheus/Grafana, BetterStack, etc.)?
How do you deliver alerts?
5) Backups
What exactly do you back up (database only, configs, full system snapshots)?
How do you trigger and schedule backups?
How often do you test restoring them?
6) Database setup
Do you host the database on the same VPS or use a managed service?
If it's local, how do you secure it and handle updates and backups?
7) Reverse proxy and TLS
What reverse proxy do you use (Nginx, Traefik, Caddy)?
How do you automate certificates and TLS management?
8) Logging
How do you handle logs? Local storage, log rotation, or remote logging?
Do you use ELK/EFK stacks or simpler solutions?
9) Resource isolation
Do you isolate services with containers or run everything directly on the host?
How do you set CPU/memory limits for different components?
10) Automatic restarts and health checks
What ensures your services restart automatically when they fail?
systemd, Docker health checks, or another tool?
11) Secrets management
How do you store environment variables and secrets?
Simple .env files, encrypted storage, or tools like Vault/SOPS?
12) Auditing and configuration tracking
How do you track changes made on the server?
Do you rely on audit logs, command history, or Git-backed config management?
13) Network architecture
Do you use private/internal networks for internal services?
What do you expose publicly, and what stays behind a reverse proxy?
14) Background job handling
On Windows, Task Scheduler caused deployment issues when jobs were still running.
How should this be handled on Linux?
If a job is still running during a new deployment, do you stop it, let it finish, or rely on a queue system to avoid conflicts?
15) Securing tools like Grafana and admin-only routes
What’s the best way to prevent tools like Grafana from being publicly reachable?
Is IP allowlisting reliable, or does IP rotation make it impractical?
For admin-only routes, would using a VPN be a better approach—especially for non-developers who need the simplest workflow?
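On the restart question (point 10), the baseline answer on a plain Linux host is usually one systemd unit per service, which also gives you the cgroup resource limits from point 9 for free. A minimal sketch (the unit name, paths, and user are placeholders):

```ini
# /etc/systemd/system/myapp-api.service  (hypothetical name)
[Unit]
Description=My app API
After=network.target

[Service]
User=appuser
ExecStart=/opt/myapp/bin/api        # placeholder path
Restart=on-failure
RestartSec=5
MemoryMax=512M                      # cgroup memory cap for this service

[Install]
WantedBy=multi-user.target
```

`systemctl enable --now myapp-api` then handles both boot-time start and crash restarts without any extra supervisor.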
---
I asked ChatGPT these questions as well, but I'm more interested in how people actually handle these things in the real world.
https://redd.it/1ph5lut
@r_devops
Hybrid Multi-Tenancy DevOps Challenge: Managing Migrations & Deployment for Shared Schemas vs. Dedicated DB Stacks (AWS/GCP)
We are architecting a Django SaaS application and are adopting a hybrid multi-tenancy model to balance cost and compliance, relying entirely on managed cloud services (AWS Fargate/Cloud Run, RDS/Cloud SQL).
Our setup requires two different tenant environments:
1. Standard Tenants (90%): Deployed via a single shared application stack connected to one large PostgreSQL instance using Separate Schemas per Tenant (for cost efficiency).
2. Enterprise Tenants (10%): Must have Dedicated, Isolated Stacks (separate application deployment and separate managed PostgreSQL database instance) for full compliance/isolation.
The core DevOps challenge lies in managing the single codebase across these two fundamentally different infrastructure patterns.
We're debating two operational approaches:
A) Single Application / Custom Router: Deploy one central application that uses a custom router to switch between:
The main shared database connection (where schema switching occurs).
Specific dedicated database connections defined in Django settings.
B) Dual Deployment Pipeline: Maintain two separate CI/CD pipelines (or one pipeline with branching logic):
Pipeline 1: Deploys to the single shared stack.
Pipeline 2: Automates the deployment/migration across all N dedicated tenant stacks.
Key DevOps Questions:
Migration Management: Which approach is more robust for ensuring atomic, consistent migrations across N dedicated DB instances and all the schemas in the shared DB? Is a custom management command sufficient for the dedicated DBs?
Cost vs. Effort: Does the cost savings gained from having 90% of tenants on the schema model outweigh the significant operational complexity and automation required for managing Pipeline B (scaling and maintaining N isolated stacks)?
We're looking for experience from anyone who has run a production environment managing two distinct infrastructure paradigms from a single codebase.
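For concreteness, option A's router can stay quite small, since Django's database-router hooks are just three or four methods. A sketch under stated assumptions: `TENANT_DB_MAP` and `get_current_tenant` are illustrative names (in practice the tenant would be resolved per request by middleware), and the shared DB's per-schema switching still needs something like django-tenants on top:

```python
# Illustrative: maps enterprise tenants to their dedicated DB alias
# (aliases defined in settings.DATABASES); everyone else -> "default".
TENANT_DB_MAP = {"acme-corp": "acme_dedicated"}

def get_current_tenant():
    # Placeholder: in practice, read from a request-scoped context set
    # by middleware (subdomain, header, JWT claim, ...).
    return "acme-corp"

class TenantRouter:
    """Django DB router: dedicated alias for enterprise tenants,
    shared instance for everyone else."""

    def db_for_read(self, model, **hints):
        return TENANT_DB_MAP.get(get_current_tenant(), "default")

    db_for_write = db_for_read  # same routing rule for writes

    def allow_migrate(self, db, app_label, **hints):
        # No opinion: lets `migrate --database=<alias>` target any alias
        # explicitly, which is how the N dedicated DBs get migrated.
        return None
```

The migration story is then a loop over aliases (`manage.py migrate --database=...`) for the dedicated DBs, plus the schema-aware migration for the shared one, regardless of whether that loop lives in approach A's codebase or approach B's second pipeline.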
https://redd.it/1ph787v
@r_devops
Need help in a devops project
Can some skilled DevOps engineers help me with a project? I am new to DevOps, and your help would be much appreciated.
https://redd.it/1ph79lh
@r_devops
PM to DevOps
Worked 15 years as an IT project manager and recently got laid off. Thinking of shifting to the DevOps domain. Is it a good decision? Where do I start, and how do I get started?
https://redd.it/1ph62bm
@r_devops
[Tool] Anyone running n8n in CI? I added SARIF + JUnit output to a workflow linter and would love feedback
Hey folks,
I’m working on a static analysis tool for n8n workflows (FlowLint) and a few teams running it in CI/CD asked for better integration with the stuff they already use: GitHub Code Scanning, Jenkins, GitLab CI, etc.
So I’ve just added SARIF, JUnit XML and GitHub Actions annotations as output formats, on top of the existing human-readable and JSON formats.
TL;DR
* Tool: FlowLint (lints n8n workflows: missing error handling, unsafe patterns, etc.)
* New: `sarif`, `junit`, `github-actions` output formats
* Goal: surface workflow issues in the same places as your normal test / code quality signals
Why this exists at all
The recurring complaint from early users was basically:
"JSON is nice, but I don't want to maintain a custom parser just to get comments in PRs or red tests in Jenkins."
Most CI systems already know how to consume:
* SARIF for code quality / security (GitHub Code Scanning, Azure DevOps, VS Code)
* JUnit XML for test reports (Jenkins, GitLab CI, CircleCI, Azure Pipelines)
So instead of everyone reinventing glue code, FlowLint now speaks those formats natively.
What FlowLint outputs now (v0.3.8)
* stylish – colorful terminal output for local dev
* json – structured data for custom integrations
* sarif – SARIF 2.1.0 for code scanning / security dashboards
* junit – JUnit XML for test reports
* github-actions – native workflow commands (inline annotations in logs)
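For reference, the `github-actions` format boils down to GitHub's workflow-command syntax printed to stdout; a finding might look like this (the file path and message here are made up for illustration):

```
::warning file=workflows/sync.json,line=12::Missing error handling on node "HTTP Request"
```

GitHub's runner parses these `::warning`/`::error` commands from the log stream and turns them into inline annotations on the run.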
Concrete CI snippets
GitHub Code Scanning (persistent PR annotations):
- name: Run FlowLint
  run: npx flowlint scan ./workflows --format sarif --out-file flowlint.sarif
- name: Upload SARIF
  uses: github/codeql-action/upload-sarif@v2
  with:
    sarif_file: flowlint.sarif
GitHub Actions annotations (warnings/errors in the log stream):
- name: Run FlowLint
  run: npx flowlint scan ./workflows --format github-actions --fail-on-error
Jenkins (JUnit + test report UI):
sh 'flowlint scan ./workflows --format junit --out-file flowlint.xml'
junit 'flowlint.xml'
GitLab CI (JUnit report):
flowlint:
  script:
    - npm install -g flowlint
    - flowlint scan ./workflows --format junit --out-file flowlint.xml
  artifacts:
    reports:
      junit: flowlint.xml
Why anyone in r/devops should care
* It’s basically “policy-as-code” for n8n workflows, but integrated where you already look: PR reviews, test reports, build logs.
* You can track “workflow linting pass rate” next to unit / integration test pass rate instead of leaving workflow quality invisible.
* For GitHub specifically, SARIF means the comments actually stick around after merge, so you have some audit trail of “why did we ask for this change”.
Caveats / gotchas
* GitHub Code Scanning SARIF upload needs `security-events: write`, and Code Scanning itself is free only on public repos (private repos need GitHub Advanced Security).
* JUnit has no real concept of severity levels, so MUST / SHOULD / NIT all show as failures.
* GitHub Actions log annotations are great for quick feedback but don’t persist after the run (for history you want SARIF).
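On the first caveat: if your workflow's default token permissions are read-only, the SARIF upload step needs an explicit grant. A minimal sketch of the permissions block (everything else in the job stays as in the snippets above):

```yaml
permissions:
  contents: read          # needed to check out the repo
  security-events: write  # required by codeql-action/upload-sarif
```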
Questions for you all
1. If you’re running n8n (or similar workflow tools) in CI: how are you currently linting / enforcing best practices? Custom scripts? Nothing?
2. Any CI systems where a dedicated output format would actually make your life easier? (TeamCity, Bamboo, Drone, Buildkite, something more niche?)
3. Would a self-contained HTML report (one file, all findings) be useful for you as a build artifact?
If this feels close but not quite right for your setup, I’d love to hear what would make it actually useful in your pipelines.
Tool: [https://flowlint.dev/cli](https://flowlint.dev/cli)
Install:
npm install -g flowlint
# or
npx flowlint scan ./workflows
Current version: v0.3.8
https://redd.it/1phb3tq
@r_devops
Authorization breaks when B2B SaaS scales - role explosion, endless support tickets for access requests, blocked deployments every time permissions change. How policy-as-code fixes it (what my team and I have learned).
If you're running B2B SaaS at scale, you might have experienced frustrating things like authorization logic being scattered across your codebase, every permission change requiring deployments, and no clear answer to who can access what. Figured I'd share an approach that's been working well for teams dealing with this (this is from personal experience at my company, helping users resolve the above issues).
So the operational pain we keep seeing is that teams ship with basic RBAC. Works fine initially. Then they scale to multiple customers and hit the multitenant wall - John needs Admin at Company A but only Viewer at Company B. Same user, different contexts.
The knee-jerk fix is usually to create tenant-specific roles: Editor_TenantA, Editor_TenantB, Admin_TenantA, etc.
Six months later they've got more roles than users, bloated JWTs, and authorization checks scattered everywhere. Each customer onboarding means another batch of role variants. Nobody can answer "who can access X?" without digging through code. Worse for ops: when you need to audit access or update permissions, you're touching code across repos.
Here's what we've seen work ->
Moving to tenant-aware authorization where roles are evaluated per-tenant. Same user, different permissions per tenant context. No role multiplication needed.
Then layering in ABAC for business logic, policy checks attributes instead of creating roles. Things like resource.owner_id, tenant_id, department, amount, status.
Big shift though is externalizing to a policy decision point. Decouple authorization from application code entirely. App asks is this allowed?, PDP responds based on policy. You can test policies in isolation, get consistent enforcement across your stack, have a complete audit trail in one place, and change rules without touching app code or redeploying.
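The tenant-aware + ABAC ideas above can be sketched in a few lines; all names here are illustrative, not any specific PDP's API:

```python
# Sketch: tenant-aware role lookup plus an ABAC-style attribute check.
# Roles are stored per tenant, so one user can be admin in one tenant
# and viewer in another without Editor_TenantA-style role variants.

def check(user: dict, action: str, resource: dict) -> bool:
    tenant_roles = user["roles"].get(resource["tenant_id"], set())
    if action == "expense:approve":
        # ABAC: resource attributes decide the outcome, not a new role
        return "approver" in tenant_roles and resource["amount"] <= 10_000
    # Default: same user, different answer depending on tenant context
    return "admin" in tenant_roles

# John is Admin at Company A but only Viewer at Company B
john = {"roles": {"tenant-a": {"admin"}, "tenant-b": {"viewer"}}}
```

In the externalized-PDP setup described above, this logic lives in versioned policy files behind a decision endpoint rather than in application code, but the evaluation shape is the same.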
The policy-as-code part now :) Policies live in Git with version control and PR reviews. Automated policy tests run in CI/CD, we've seen teams with 800+ test cases that execute in seconds. Policy changes become reviewable diffs instead of mysteries, and you can deploy policy updates independently from application deployments.
What this means is that authorization becomes observable and auditable, policy updates don't require application deployments, you get a centralized decision point with a single audit log, you can A/B test authorization rules, and compliance teams can review policy diffs in PRs.
Wrote up the full breakdown with architecture diagrams here if it's helpful: https://www.cerbos.dev/blog/how-to-implement-scalable-multitenant-authorization
Curious what approaches others are using.
https://redd.it/1phaszc
@r_devops
Looking for real DevOps project experience. I want to learn how the real work happens.
Hey everyone,
I’m a fresher trying to break into DevOps. I’ve learned and practiced tools like Linux, Jenkins, SonarQube, Trivy, Docker, Ansible, AWS, shell scripting, and Python. I can use them in practice setups, but I’ve never worked on a real project with real issues or real workflows.
I’m at a point where I understand the tools but I don’t know how DevOps actually works inside a company — things like real CI/CD pipelines, debugging failures, deployments, infra tasks, teamwork, all of that.
I’m also doing a DevOps course, but the internship is a year away and it won’t include real tasks. I don’t want to wait that long. I want real exposure now so I can learn properly and build confidence.
If anyone here is working on a project (open-source, startup, internal demo, anything) and needs someone who’s serious and learns fast, I’d love to help and get some real experience.
https://redd.it/1phde65
@r_devops
For early reliability issues when standard observability metrics remain stable
All available dashboards indicated stability: CPU utilization remained low, memory usage was steady, P95 latency showed minimal variation, and error rates appeared insignificant. Despite this, users continued to report intermittent slowness: not outages or outright failures, but noticeable hesitation and inconsistency. Requests completed successfully, yet the overall system experience proved unreliable. No alerts were triggered, no thresholds were exceeded, and no single indicator appeared problematic when assessed independently.
The root cause became apparent only under conditions of partial stress: minor dependency slowdowns, background processes competing for limited shared resources, retry logic subtly amplifying system load, and queues recovering more slowly following small traffic bursts. This exposed a meaningful gap in our observability strategy: we were measuring capacity rather than runtime behavior. The system itself was not unhealthy; it was structurally imbalanced.
Which indicators do you rely on beyond standard CPU, memory, or latency metrics to identify early signs of reliability issues?
https://redd.it/1phbgy4
@r_devops
What do you think is the most valuable or important to learn?
Hey everyone, I’m trying to figure out what to focus on next and I’m kinda stuck. Out of these, what do you think is the most valuable or important to learn?
* Docker
* Ansible
* Kubernetes
* Databases / DB maintenance
* Security
My team covers all of these and I have an opportunity to become the point of contact for a few, but I'm not sure which one would benefit me the most since I am interested in all of them. I would like to learn and get hands-on experience with the ones that would best help me find another job.
https://redd.it/1phe3fb
@r_devops
React2shell: new remote code execution vulnerability in React
New React vulnerability that allows remote code execution.
https://jfrog.com/blog/2025-55182-and-2025-66478-react2shell-all-you-need-to-know/
https://redd.it/1phhzy1
@r_devops