Are there any good Infra/DevOps events in Berlin?
I’ve been trying to find more local events around infra and DevOps. I came across something called Infra Night Berlin, happening in mid-October with Grafana, Terramate, and NetBird. Is anyone from here going, or do you have other similar events you’d recommend? It's always nice to exchange ideas with fellow engineers.
https://redd.it/1nysazb
@r_devops
Monitoring AWS Instances in US region using my Raspberry Pis at home in Europe
Hello. I wanted to ask a question about monitoring my application servers on a budget. I am planning to run applications on AWS EC2 instances in `us-east-2`, but in the beginning I want to save some money on infrastructure and just run Prometheus and Grafana on some Raspberry Pis I have at home. However, I am located in Europe, so I imagine the latency will be bad when Prometheus scrapes the data from instances located in the United States. Later, when the budget increases, I plan to move the monitoring to AWS.
Is this a bad solution? I have some unused Raspberry Pis and want to put them to use.
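For context on the latency worry: a roughly 100 ms transatlantic round trip is usually harmless for scraping; what matters more is a generous scrape_timeout and encrypting traffic that crosses the public internet. A hedged sketch (job name, ports, and targets are placeholders):

```yaml
# Illustrative only: scraping us-east-2 targets from Europe mainly needs a
# scrape_timeout well above the WAN round trip, plus TLS (or a VPN/SSH
# tunnel), since the metrics traverse the public internet.
scrape_configs:
  - job_name: "ec2-us-east-2"   # placeholder name
    scheme: https
    scrape_interval: 60s
    scrape_timeout: 30s
    static_configs:
      - targets: ["app1.example.com:9100", "app2.example.com:9100"]
```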
https://redd.it/1nythdb
@r_devops
I built a tool to automate phone verification testing — here’s what I learned about scaling reliability
When we were building and testing signup flows for our web apps, we kept hitting the same pain point — phone verification.
Every test deployment meant waiting for OTPs manually, retyping them, and dealing with flaky SMS deliveries.
So I started automating it. I built a small internal setup that spins up temporary test numbers and polls for incoming SMS automatically, validating codes during CI/CD runs.
That little internal project slowly grew into something bigger — a complete testing and verification layer that we now use (and share) to help other teams test safely without exposing personal numbers.
Along the way, I learned a few things about scaling this kind of service:
* Reliable delivery isn’t just about carrier coverage — routing logic matters.
* Privacy by design needs to be baked in early.
* Most “temporary number” providers don’t play well with automated test environments.
I’m curious if other founders here have run into the same problem while building verification-based systems — did you automate your own testing too, or rely on third-party APIs?
*(Happy to share what worked for us if anyone’s designing similar flows.)*
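A minimal sketch of the polling step described above; `fetchMessages` stands in for whichever inbox API you use, and the 4-8 digit OTP regex is an assumption about typical code formats:

```javascript
// Hypothetical CI helper: poll an inbox for a new message and pull the
// verification code out. fetchMessages is an injected async function that
// returns [{ body: "..." }, ...] from your SMS test-number provider.
async function pollForOtp(fetchMessages, { tries = 20, delayMs = 3000 } = {}) {
  for (let i = 0; i < tries; i++) {
    const messages = await fetchMessages();
    for (const msg of messages) {
      const match = /\b(\d{4,8})\b/.exec(msg.body); // typical OTP lengths
      if (match) return match[1];
    }
    await new Promise((r) => setTimeout(r, delayMs)); // wait before retrying
  }
  throw new Error("no OTP received in time");
}
```

Injecting the fetch function keeps the helper provider-agnostic, which also makes it trivial to stub in tests.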
https://redd.it/1nyv8oh
@r_devops
How to ensure Sentry errors always include traces without setting tracesSampleRate to 1?
Hi guys. Hopefully this is an appropriate subreddit to post to.
I’m currently using Sentry with both Performance Monitoring (Tracing) and Session Replay enabled.
My goal is to have complete traces automatically attached to every error event for better debugging context — for example, when an error occurs in production, I’d like to see the trace that led to it and ideally a session replay as well.
Right now, I have the following configuration:
tracesSampleRate = 1; // in production
replaysOnErrorSampleRate = 1; // so every error includes a replay
This works functionally, but I’m concerned that tracesSampleRate = 1 will generate too many transaction events and quickly burn through my performance quota.
I’d like to know:
• What’s the best way to ensure traces are captured whenever an error occurs, without tracing every transaction?
• Is there any best-practice pattern or recommended configuration from Sentry for this setup?
My ideal outcome:
• Errors always include a linked trace + replay
• Non-error requests are sampled at a lower rate (e.g., 10%)
• Quota remains under control in production
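One commonly suggested pattern is to swap the fixed tracesSampleRate for a tracesSampler callback (field names below assume a recent JS SDK; route names and rates are illustrative). One caveat worth knowing: sampling is decided head-first when the trace starts, so a sampled-out trace cannot be kept retroactively just because an error occurs later:

```javascript
// Hypothetical tracesSampler: low baseline for normal traffic, noisy
// endpoints dropped entirely. Errors in unsampled traces won't carry a
// trace, so pick the baseline with that trade-off in mind.
function tracesSampler(samplingContext) {
  // Respect the parent's decision so distributed traces stay consistent.
  if (samplingContext.parentSampled !== undefined) {
    return samplingContext.parentSampled;
  }
  const name = samplingContext.name || "";
  if (name.includes("/health")) return 0; // never trace health checks
  return 0.1;                             // 10% baseline for everything else
}

// Sentry.init({ tracesSampler, replaysOnErrorSampleRate: 1.0, ... });
```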
https://redd.it/1nyxgd2
@r_devops
Reduce time spent changing MySQL table structure with a large amount of data
I have a table with 5 million records and a column of ENUM type, and I need to add a new value to that enum. As usual, I use Sequelize to change it from code. For a small number of records that is fine, but at this scale it takes a long time or errors out. I have also looked at tools like gh-ost (from GitHub), but even a small change takes a lot of time there. Is there any solution for this? I use MySQL 8.0.
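If the new value can go at the end of the list, MySQL 8.0 can often apply this as an instant, metadata-only change with no table rebuild, which is worth trying before reaching for an online-schema-change tool. The table and column below are made up; note that you must restate the full existing list plus the new member at the end:

```sql
-- Instant only if the value is APPENDED and the column's storage size does
-- not change; inserting a value in the middle forces a slow rebuild. With
-- an explicit ALGORITHM=INSTANT, MySQL errors out instead of silently
-- falling back to a table copy.
ALTER TABLE orders
  MODIFY COLUMN status ENUM('pending', 'paid', 'shipped', 'refunded') NOT NULL,
  ALGORITHM = INSTANT;
```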
https://redd.it/1nyz1s8
@r_devops
Zero downtime deployments
I wanted to share a small script I've been using to do near-zero downtime deployments for a Node.js app, without Docker or any container setup. It's basically a simple blue-green deployment pattern implemented with PM2 and Nginx.
Idea: two directories, subwatch-blue and subwatch-green. Only one is live at a time. When I deploy, the script figures out which one is currently active, then deploys the new version to the inactive one. The script:
1. Detects the active instance by checking PM2 process states.
2. Pulls the latest code into the inactive directory and does a clean reset.
3. Installs dependencies and builds using pnpm.
4. Starts the inactive instance with PM2 on its assigned port.
5. Runs a basic health check loop with curl to make sure it's actually responding before switching.
6. Once ready, updates the Nginx upstream port and reloads Nginx gracefully.
7. Waits a few seconds for existing connections to drain, then stops the old instance.
Not fancy, but it works. No downtime, no traffic loss, and it rolls back if the Nginx config test fails.
* Zero/near-zero downtime
* No Docker or Kubernetes overhead
* Runs fine on a simple VPS
* Rollback-safe
So I'm just curious if anyone knows other good ways to handle zero-downtime or atomic deployments without using Docker.
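For anyone curious, the switch-over core (steps 5-7) can be sketched roughly like this; paths, ports, process names, and the upstream include file are placeholders, not the poster's actual script:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of steps 5-7: health-check the new instance, flip the
# Nginx upstream, drain, then stop the old PM2 process. Adapt names/paths.
set -euo pipefail

NEW_PORT="${1:-3001}"
OLD_NAME="${2:-subwatch-blue}"
UPSTREAM_FILE="/etc/nginx/conf.d/subwatch_upstream.conf"  # assumed include

# (5) Poll until the new instance answers, or give up.
wait_for_healthy() {
  local url=$1 tries=${2:-30}
  for ((i = 1; i <= tries; i++)); do
    curl -fsS --max-time 2 "$url" > /dev/null 2>&1 && return 0
    sleep 1
  done
  return 1
}

# (6) Rewrite the upstream, validate the config, reload gracefully.
switch_upstream() {
  printf 'upstream subwatch { server 127.0.0.1:%s; }\n' "$NEW_PORT" > "$UPSTREAM_FILE"
  nginx -t && nginx -s reload
}

# (7) Let in-flight connections drain, then stop the old instance.
drain_and_stop() {
  sleep 5
  pm2 stop "$OLD_NAME"
}
```

The key property is that `nginx -t` gates the reload, so a broken config never takes over traffic.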
https://redd.it/1nywxv3
@r_devops
No shop? No problem — launch your online business today with full e-commerce, payments & analytics
I built an all-in-one platform to help shop owners and freelancers sell online without renting a shop. Happy to set up demo stores for free — check computelabs.in
https://redd.it/1nzaiy3
@r_devops
Production Support Engineer - Guidance needed
I've been working in the Production Support area for the past 3 years. Apart from managing applications in production, resolving incidents, change deployment, monitoring, etc., I've also been involved in a couple of application server migrations (on-premises Windows servers).
The most closely related next domain for me is Site Reliability Engineering. The organisation has also recently started an SRE working group, and I'm included, but our task is limited to monitoring with Dynatrace: enabling alerts, optimising them, taking care of the problems, etc.
DevOps is one career path that has always excited me. What would be the ideal career path for me, considering my current role?
https://redd.it/1nzaru6
@r_devops
How do you keep risk assessments in sync when a new product or feature launches mid-quarter?
Fast-moving product teams can introduce new risks before the next assessment cycle.
What’s a practical way to keep risk evaluations aligned with product or feature changes throughout the quarter?
https://redd.it/1nzbuza
@r_devops
A little something.
Everybody says to create side projects that matter, so here is one I'm proud of. As an aspiring DevOps engineer, the job is to make things simpler and more efficient, so I created a small automation using Bash shell scripting.
So, I have been learning Linux, AWS, etc. (the basics).
While learning, I had to turn on the instance, wait for the new IP, connect to the instance, do my work, and then stop it manually. Now it is automated:
https://github.com/Jain-Sameer/AWS-EC2-Automation-Script It's nothing much, but honest work. Let's connect!
https://redd.it/1nzct43
@r_devops
New to DevOps.
Any DevOps-related pages on Twitter to follow, for someone who is starting to get into DevOps? I have created a page where I will be sharing all my learnings, and I'm hoping to connect with people.
https://redd.it/1nzcp1z
@r_devops
Is HTTPS the Best Protocol for Agent - Orchestrator Communication?
Hey everyone, I need some advice, knowledge, or debate on what to use for a project I'm building.
The context is that I'm developing an event-based automation platform, something like a mix of Jenkins, n8n, and Ansible (it takes inspiration from all of them). Its core components are agents. These agents consume very few resources on the host VM and communicate unidirectionally with an agent orchestrator to avoid exposing dangerous ports (like 22). The communication only goes one way: from the agent host → the agent orchestrator.
Now, the problem (or not) is that I'm using HTTPS for the orchestrator to tell the agent its next instruction (agents poll for instructions), but after seeing this image I don't know if HTTPS is really the best protocol for this.
Should I choose another protocol for the communication or is HTTPS still the most optimal and secure choice for this use case?
A sample workflow for multiple orchestrators to follow is this one.
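HTTPS with long-polling is a common fit for exactly this shape of one-way, agent-initiated communication. A rough sketch of the agent side, with the endpoint path, auth scheme, and backoff numbers all invented for illustration:

```javascript
// Hypothetical agent poll loop: the orchestrator holds the HTTPS request
// open until an instruction is ready (long-polling), so the agent never
// listens on any port. Exponential backoff caps reconnect storms.
function nextBackoff(ms) {
  return Math.min(ms * 2, 60000); // double, capped at 60 s
}

async function pollLoop(baseUrl, token, handle) {
  let backoffMs = 1000;
  for (;;) {
    try {
      const res = await fetch(`${baseUrl}/v1/next-instruction`, {
        headers: { Authorization: `Bearer ${token}` },
      });
      if (res.ok) {
        await handle(await res.json());
        backoffMs = 1000; // healthy again: reset backoff
        continue;         // immediately ask for the next instruction
      }
    } catch {
      // network error: fall through to backoff below
    }
    backoffMs = nextBackoff(backoffMs);
    await new Promise((r) => setTimeout(r, backoffMs));
  }
}
```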
https://redd.it/1nzf7j9
@r_devops
As a junior engineer, do I need to become good at non-DevOps languages?
Hey everyone,
I’m a junior software engineer straight out of university, currently working at a company that’s given me a good opportunity, I get to choose whether I want to focus more on traditional software engineering or DevOps.
Over the past few months, I’ve naturally gravitated toward DevOps and I’ve been loving it. I find it way more interesting, and I genuinely want to get good at it. Most of the work I’ve been doing involves a lot of Terraform and a good amount of YAML for CI/CD pipelines, and I enjoy it more than writing application code.
I spoke to one of my coworkers and told him I'm considering going all-in on DevOps here. He mentioned that I should still keep practicing and getting involved in projects with Java and JavaScript, since that's what most of the company uses. That seems understandable, but at the same time he is really good at his job while having the same, if not worse, proficiency in those other languages as I do now, since he never got good at them.
For context, I know Java and JS to a decent graduate level, and I like them, but I don’t love them the same way I enjoy working with infra and tooling.
So I wanted to get some opinions from people with more experience:
If I want to pursue DevOps seriously, how important is it to keep up with languages like Java/JS?
Should I split my time between both, or is it okay to focus on DevOps and becoming really good and only maintain a basic level of application coding skill?
Any general advice for someone early in their career choosing this path?
I would also like to hear experiences from people who went down a similar route.
https://redd.it/1nzftih
@r_devops
How do you handle cloud cost optimization without hurting performance?
Cost optimization is a constant challenge: between right-sizing, reserved instances, and autoscaling, it's easy to overshoot or under-provision.
What strategies have actually worked for your teams to reduce spend without compromising reliability?
https://redd.it/1nzgfv8
@r_devops
Gitlab Best Practices
Hello everyone,
We recently moved from GitHub to GitLab (not self-hosted) and I’d love to hear what best practices or lessons learned you’ve picked up along the way.
Why am I not just googling this? Because most of the articles I find are pretty superficial: do not leak sensitive info in your pipeline, write comments, etc. I am not looking for specific CI/CD best practices, but for best practices for GitLab as a whole, if that makes sense.
For example, using a service account so it doesn’t eat up a seat, avoiding personal PATs for pipelines or apps that need to keep running if you leave or forget to renew them, or making sure project-level variables are scoped properly so they don’t accidentally override global ones.
What are some other gotchas or pro tips you’ve run into?
Thanks a lot!
https://redd.it/1nzgo9n
@r_devops
How to connect different AI tools across an organization to avoid silos?
Our data science team uses one set of tools, engineering uses another, and everything is starting to feel disconnected. How do you create a cohesive AI architecture where models from different frameworks can actually work together and share data? Are we doomed to a mess of point-to-point integrations?
https://redd.it/1nzift8
@r_devops
Backstage VS Other Developer Portals
I’m in a situation where I inherited a developer portal designed to be a deployment UI for data scientists who need a lot of flexibility on GPUs, CPU architecture, memory, volumes, etc., but don't really have the cloud understanding to ask for it or write their own IaC. Hence templates and a UI.
However, it’s a bit of an internal monster, with a lot of strange choices. While the infra side is handled decently in terms of integrating with AWS, K8s scheduling, and so forth, the UI is pretty half-baked: slow refreshes, logs and graphs that don't display properly, and, well, it’s clear it was made by engineers whose personal opinions on design are not intuitive at all. Like optional extra Docker runtime commands for a custom image being buried six selection windows deep.
While I’m not a front-end or UI expert either, I find maintaining or improving the web portion of this portal to be a lost cause beyond basic upkeep.
I was thinking of exploring Backstage because it is very similar to our in-house solution in terms of coding our own plugins to work with the infra, but I wouldn’t have to manage my own UI elements as much. That said, I’ve also heard mixed reviews elsewhere.
TLDR:
For anyone who has had to integrate or build their own developer portals for people without an engineering background who still need deeply configurable K8s infra, what do you use? Especially for an infra team of 1-2 people at the moment.
https://redd.it/1nzlg8o
@r_devops
Need advice
Hi folks, over the years I have held various roles:
- desktop support (2 years)
- sysadmin (almost 3 years)
- cloud sysadmin with a focus on AWS and automation (3 years)
- and now SRE at a huge enterprise (a little over half a year)
The thing is, I have this feeling that I never really pushed myself in any of these roles to get good and gain depth, and now, working as an SRE, I deal with completely new tech and constantly struggle.
It feels like in each of those roles I gained only one year of experience despite being in the role for three. Then, when a better opportunity appeared, I left without gaining any depth.
Now I find myself struggling to interview for mid-level DevOps or other roles, while on a CV I'm too senior for junior positions. Age may not be helping either, as I'm in my mid-30s.
How would you proceed? I have AWS SAA and RHCA certs, I wrote automations in Python, and I actively worked on internal Python tooling used to manage infrastructure on AWS, plus infrastructure as code with CloudFormation and containers on ECS. I have limited experience with GitLab CI/CD. I also feel that because of the new role I am forgetting old skills.
https://redd.it/1nzn8w9
@r_devops
The State of CI/CD in 2025: Key Insights from the Latest JetBrains Survey
JetBrains just published the results of a recent survey about the CI/CD tools market. A few major takeaways:
1) most organizations use more than one CI/CD tool
2) GitHub Actions rules personal projects, but Jenkins and GitLab still dominate in companies.
3) AI in CI/CD isn't really happening yet (which surprised me). 73% of respondents said they don't use it at all for CI/CD workflows.
Here's the full blog post. Does your team use AI in CI/CD in any way?
https://redd.it/1nzoaww
@r_devops
How to go back to W2
I’ve been working for myself for the last 6 years. Built a small B2B SaaS and have strong relationships with my customers.
I’m tired of consulting and ready to wind that part of the business down. I still have high-margin subscription revenue (low six-figure ARR) and maintain the infrastructure, though it’s low effort these days.
Now, I’m interested in working for a large company. Something 9-5 where I can work with smart, driven people. I miss working with passionate peers. I only have a couple employees now who work 95% independently day to day.
I want to work on something new and exciting, but without killing myself or sinking all my money into it (I have young kids).
Am I even employable in my situation? I have no clue. I’m not in a rush, just looking for advice. Thank you!
https://redd.it/1nzpn8i
@r_devops
How are you scheduling GPU-heavy ML jobs in your org?
From speaking with many research labs over the past year, I’ve heard ML teams usually fall back to either SLURM or Kubernetes for training jobs. They’ve shared challenges with both:
SLURM is simple but rigid, especially for hybrid/on-demand setups
K8s is elastic, but manifests and debugging overhead don’t make for a smooth researcher experience
We’ve been experimenting with a different approach and just released Transformer Lab GPU Orchestration. It’s open-source and built on SkyPilot + Ray + K8s. It’s designed with modern AI/ML workloads in mind:
All GPUs (local + 20+ clouds) are abstracted into a unified pool that researchers can reserve
Jobs can burst to the cloud automatically when the local cluster is fully utilized
Distributed orchestration (checkpointing, retries, failover) handled under the hood
Admins get quotas, priorities, utilization reports
I’m curious how DevOps folks here handle ML training pipelines, and whether you’ve run into any of the challenges we’ve heard about.
If you’re interested, please check out the repo (https://github.com/transformerlab/transformerlab-gpu-orchestration) or sign up for our beta (https://lab.cloud). Again, it’s open source and easy to pilot alongside your existing SLURM setup. Appreciate your feedback.
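For reference, the SkyPilot layer this builds on declares a job as a small task file, which is what makes the unified-pool and cloud-burst behavior possible: the scheduler, not the researcher, decides where the GPU comes from. A hedged sketch of a standard SkyPilot task (the script name, GPU type, and flags are placeholders, not taken from the Transformer Lab repo):

```yaml
# train.yaml — a SkyPilot task; `sky launch train.yaml` provisions a matching
# GPU from the local cluster or any configured cloud, enabling cloud burst.
resources:
  accelerators: A100:1   # placeholder GPU type and count
num_nodes: 1

setup: |
  pip install -r requirements.txt

run: |
  python train.py --epochs 10
```

Compared with a raw Kubernetes manifest, a declaration like this hides node selectors, tolerations, and pod specs from the researcher, which is the "smoother researcher experience" argument in a nutshell.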
https://redd.it/1nzrmqx
@r_devops