Who owns GitHub/VCS policies and compliance at your company?
For example, specific GitHub settings such as which branches should be protected (when you have multiple orgs and those orgs all disagree on which branches should be protected).
https://redd.it/1qsp2jo
@r_devops
Reddit
From the devops community on Reddit
Explore this post and more from the devops community
Update to my “AI was implemented as a trial in my company, and it's scary.”
I made a [post](https://www.reddit.com/r/devops/s/rgLaBXNe7W) here a couple of months ago, when my company was experimenting with implementing AI. This post is an update on how it went and what happened.
The company stopped hiring any “infra personnel” and started utilizing AI to do things like create and configure some AWS machines and VPCs by just talking with the agent (using the CLI) with specific IAM policies just in case.
I thought this was just a problem with the company I am in, but everyone I know is experiencing almost exactly the same thing. I am not really working anymore: I either use AI, or when I start to use my brain, everyone around me answers with AI. I am not an angel; I am a junior who can’t learn properly because no one wants to. Everyone wants AI and less human error.
The only thing it failed at was deep architecture work like database migration and specific clustering. Everything else it simply does, and when it doesn’t, we only have to do maybe a single thing to fix it.
I am leaving DevOps as a field and getting into security (I was really interested in it before), but I genuinely feel like I was trolled and did nothing, and maybe soon even security will be replaced with AI.
This post may seem stupid to seniors, but for juniors and people starting out, this is reality. We don’t learn, we don’t grow, we are the ones getting replaced, and I see no field that is currently resistant to that. I will just get into moltbook and doom scroll.
Thank you to everyone who helped me pave my DevOps path. It is really one of the best fields I’ve ever been in, and I'm honored to have been here even if just for a short while. Hopefully where I live is the problem and not the entire planet.
https://redd.it/1qsy1ln
@r_devops
European infrastructure engineers - What's happening inside your companies regarding your dependency on US hyperscalers?
Everybody follows the news and sees what's going on.
In the Netherlands, this has sparked a debate on our dependence on US tech, specifically AWS, Azure, and GCP, for businesses and the government. Management at my workplace (a medium-sized SaaS business) has instructed the operations team to start planning an exit strategy.
We will probably stay with AWS for the time being but will slowly move everything towards OSS components as long as it's a feasible option. This shift was already initiated last year by moving towards Kubernetes, but we still use a dozen AWS services. It's going to take some time to move to a more portable architecture.
I'm wondering: what's going on in your company or team? Do you think this trend will last?
https://redd.it/1qsyjdw
@r_devops
Almost twice (2x) the salary but high workload. Should I accept the new offer?
I have around 4-5 years of experience, and I'm in my late 20s, not married. Recently, I got a job offer from a startup, and I’m wondering whether I should accept it. So let me be brief.
The new offer’s take-home salary is almost twice the current job’s take-home salary: an 80% increase. It’s a big jump as I see it. For my experience, I’m pretty sure this is above the market range in my country. It’s difficult to find this kind of job. The downsides are high workload and high risk.
So let me compare the current one and the new one.
Current job:
2 days per week in the office, with EPF, ETF, and OPD/insurance coverage.
I’m a permanent employee and have a 3-month notice period, so job security is high.
The current company is large and spread across multiple countries, with 1500+ employees.
The tech stack is good (Azure, ArgoCD, AKS, GitOps, LGTM stack, etc.).
The culture is a bit toxic and not supportive at all. I’ve actually been looking for a good job for a while.
Major releases happen 2 times per month.
New Job:
Fully Remote, USD salary, but no OPD/Insurance coverage.
The notice period is pretty short: 8 days during probation and 4 weeks after probation. So job security is pretty low as well.
It’s a startup with a Sri Lankan team and employees in other countries as well. It seems to be growing okay, with funding.
The tech stack is OK/good (AWS, ECS, GitHub Actions, CloudWatch, etc.).
Culture: I’m not so sure. It seems better than the current job.
Releases happen every week.
Both have a similar amount of weekend work, about once every 2 months.
What I know is that the salary increase is high (80%), and the workload is high as well. From what I've heard, a few days per week I may have to work 12+ hours per day, maybe even more, since this is a startup.
The current job’s workload also gets high sometimes, but I believe the new one will be pretty high. And job security at the new job is pretty low as well, with the shorter notice period.
For me it’s a high-risk, high-income, high-stress/workload job.
Should I accept the new offer? What’s your opinion? I’d like to hear from experienced people in the industry.
https://redd.it/1qt0aca
@r_devops
Linux packages - v2026.02.01 - Versions, files and directories
In operating systems with shared dependencies, we often don't know which program or version a particular file belongs to. This is a recurring problem in my daily work. That's why I created a public domain index of all the packages from the Arch Linux, Artix Linux, BlackArch Linux, and CachyOS Linux repositories.
It is in the public domain and is updated monthly.
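A minimal sketch of how an index like this can answer "which package ships this file?". The record layout (`name`, `version`, `files`), the sample packages, and the version strings are all assumptions made up for illustration; check the actual JSON in the archive before relying on any of it.

```python
import json
from collections import defaultdict

# Made-up sample in an ASSUMED layout; the real index may be structured differently.
SAMPLE = """
[
  {"name": "openssl", "version": "3.4.0-1",
   "files": ["/usr/lib/libssl.so.3", "/usr/bin/openssl"]},
  {"name": "curl", "version": "8.11.0-1",
   "files": ["/usr/bin/curl"]}
]
"""


def build_file_index(packages):
    """Map each file path to the (name, version) pairs that ship it."""
    index = defaultdict(list)
    for pkg in packages:
        for path in pkg.get("files", []):
            index[path].append((pkg["name"], pkg["version"]))
    return index


def owners(index, path):
    """Return the packages that contain `path`, or an empty list if none do."""
    return index.get(path, [])


packages = json.loads(SAMPLE)
file_index = build_file_index(packages)
print(owners(file_index, "/usr/bin/curl"))
```

Building the reverse map once makes each lookup a single dictionary access, which matters if the real index covers several repositories.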
https://archive.org/details/packages_202602
https://redd.it/1qsyygh
@r_devops
Internet Archive
Linux packages - v2026.02.01 - Versions, files and directories : Joaquin 'ShyanJMC' Crespo : Free Download, Borrow, and Streaming…
This JSON package includes the program's name, version, files and directories.
My team should be renamed to talkOps
Some days I spend more time talking about reliability than actually improving it.
Standups, syncs, postmortems, pre-mortems, planning, re-planning, alignment calls... and by the time I get a quiet hour, I'm already drained.
I get that communication matters, but at some point the work needs focus.
How do you protect deep work time without looking "unavailable"?
https://redd.it/1qvzhiv
@r_devops
Audits keep pulling senior engineers into work only they can explain
Growing tired of these audit cycles. We plan ahead, and just when we think we’re ready, senior engineers get dragged into explaining configs, workflows, and edge cases that technically exist but aren’t formally documented.
It’s not wrong, but it’s disruptive and hard to schedule around delivery. We want audits to be predictable, not ifs, buts, and maybes.
How do we relieve the eng team of this work?
https://redd.it/1qvtb82
@r_devops
Every AI code assistant assumes your code can touch the internet?
Getting really tired of this.
Been evaluating tools for our team and literally everything requires cloud connectivity. Cursor sends to their servers, Copilot needs GitHub integration, Codeium is cloud-only.
What about teams where code cannot leave the building? Defense contractors, finance companies, healthcare systems... do we just not exist?
The "trust our security" pitch doesn't work when compliance says no external connections. Period. Explaining why we can't use the new hot tool gets exhausting.
Anyone else dealing with this, or is it just us?
https://redd.it/1qwfo46
@r_devops
Currently using code-driven RAG for K8s alerting system, considering moving to Agentic RAG - is it worth it?
Hey everyone,
I'm building a system that helps diagnose Kubernetes alerts using runbooks stored in a vector database (ChromaDB). Currently it works, but I'm questioning my architecture and wanted to get some opinions.
**Current Setup (Code-Driven RAG):**
When an alert comes in (e.g., PodOOMKilled), my code:
1. Extracts keywords from the alert using a hardcoded list (`['error', 'failed', 'crash', 'oom', 'timeout']`)
2. Queries the vector DB with those keywords
3. Checks similarity scores against fixed thresholds:
* Score ≥ 0.80 → Reuse existing runbook
* Score ≥ 0.65 → Update/adapt runbook
* Score < 0.65 → Generate new guidance
4. Passes the decision to the LLM agent.
The agent basically just executes what the code tells it to do.
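A minimal sketch of the code-driven routing described above, just to make the decision logic concrete. The function and enum names are hypothetical, the vector DB query is left out, and it assumes the score has already been converted to a similarity where higher means a closer match.

```python
from enum import Enum


class Decision(Enum):
    REUSE = "reuse"        # existing runbook is a close match
    ADAPT = "adapt"        # runbook is related but needs updating
    GENERATE = "generate"  # nothing similar enough, write new guidance


# Keyword list and thresholds taken from the post.
KEYWORDS = ['error', 'failed', 'crash', 'oom', 'timeout']
REUSE_THRESHOLD = 0.80
ADAPT_THRESHOLD = 0.65


def extract_keywords(alert_text):
    """Return the hardcoded keywords found in the alert (case-insensitive)."""
    text = alert_text.lower()
    return [kw for kw in KEYWORDS if kw in text]


def decide(best_score):
    """Map the best similarity score to the action passed to the LLM agent."""
    if best_score >= REUSE_THRESHOLD:
        return Decision.REUSE
    if best_score >= ADAPT_THRESHOLD:
        return Decision.ADAPT
    return Decision.GENERATE


print(extract_keywords("PodOOMKilled"), decide(0.72))
```

Everything here is deterministic and unit-testable, which is the property that gets traded away when the same choice moves into a prompt.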
**What I'm Considering (Agentic RAG):**
Instead of hardcoding the decision logic, give the agent simple tools (`search_runbooks`, `get_runbook`) and let IT:
* Formulate its own search queries
* Interpret the results
* Decide whether to reuse, adapt, or ignore runbooks
* Explain its reasoning
The decision-making moves from code to prompts.
**My Questions:**
1. Is this actually better, or am I just adding complexity?
2. For those running agentic RAG in production - how do you handle the non-determinism? My code-driven approach is predictable, agent decisions aren't.
3. Are there specific scenarios where code-driven RAG is actually preferable?
4. Any gotchas I should know about before making this switch?
I've been going back and forth on this. The agentic approach seems more flexible (the agent can craft better queries than my keyword list), but I lose the predictability of "score ≥ 0.80 = reuse".
Would love to hear from anyone who's made this transition or has opinions either way.
Thanks!
https://redd.it/1qwh5t2
@r_devops
Is this enough to target a DevOps / Cloud role without a degree?
I’ve been freelancing in infra, cloud, and ops work for 3–4 years. I also co-founded a private limited company, but I’m shutting that down due to compliance and sales fatigue.
I don’t have a degree.
My experience is mostly practical:
* Windows installations, configurations
* Security hardening for Windows
* Linux server installation (Ubuntu, Red Hat)
* Email security (SPF, DKIM, DMARC)
* DNS setup (Cloudflare, Route 53)
* SSL installation
* LAMP/LEMP stack setup, maintenance, and support
* Server administration (Hetzner, DigitalOcean, AWS, Azure)
* Peripherals connectivity issues, driver issues
* Windows applications error troubleshooting
* Dependency management
* MySQL / PostgreSQL administration
* Deployed applications using Docker Compose
* Odoo / ERPNext administration
* SES mail server setup
* AWS deployments using Lightsail, EC2, RDS, VPN, S3, CloudFront, Lambda
* Git source code management
* Deployed static sites using Hugo and Cloudflare Pages
* Protected against data theft and hotlinking using BunnyCDN CORS rules
* Troubleshot Android OS, increased performance by using dev tools
* Google Workspace & Microsoft Outlook for Business administration
* Identified and blocked phishing emails by diagnosing email headers
* Removed cryptojacking malware from multiple compromised servers
* Automated repetitive processes using AutoHotKey
* Created a Python script to fetch all uploaded videos and create WordPress posts in bulk
* Prevented bots and malicious traffic using Cloudflare Under Attack Mode
* Blocked traffic from restricted geos using Cloudflare WAF
* Filtered logs, JSON, and other data using basic regex
* Right-sized EC2 instances based on historic usage to save costs
Provisioned basic cloud infrastructure using Terraform (EC2, VPC, CIDR configuration) and worked with local Kubernetes environments (Minikube, KIND) to deploy and validate Nginx workloads based on official docs.
**Question:**
Does this map to DevOps / Cloud Engineer roles, or is it still sysadmin-heavy?
What skills would you expect before hiring someone with this background?
I’m currently pursuing IT support roles because I’ve heard that’s where most people start. If possible, I’d also appreciate some resume tips.
https://redd.it/1qwhhxe
@r_devops
Restricting external egress to a single API (ChatGPT) in Istio Ambient Mesh?
I'm working with Istio Ambient Mesh and trying to lock down a specific namespace (ai-namespace).
The goal: Apps in this namespace should only be allowed to send requests to the ChatGPT API (api.openai.com). All other external systems/URLs must be blocked.
I want to avoid setting the global outboundTrafficPolicy.mode to REGISTRY_ONLY because I don't want to break egress for every other namespace in the cluster.
What is the best way to "jail" just this one namespace using Waypoint proxies and AuthorizationPolicies? Has anyone done this successfully without sidecars?
https://redd.it/1qwflgn
@r_devops
No love for Systemd?
So I'm a freelance developer and have been doing this for 4-5 years now, with half of my responsibilities typically in infra work. I've done all sorts of public/private sector stuff, for small startups to large multinationals. In infra, I administer and operate anything from a single AWS machine in a VPC + RDS to on-site HPC clusters. I also operate some Kubernetes clusters for clients, although I'd say my biggest blind spot is still org-scale platform engineering and large public-facing services with dynamic scaling, so take the following with a grain of salt.
Now that I've been doing this for a while, I've gained some intuition about which things are more important than others. Earlier, I was super interested in the best possible uptime, stability, and scalability. These things obviously require many architectural considerations and resources to guarantee success.
Now that I've been running some stuff for a while, my impression is that many of the services just don't have actual requirements for uptime, stability, and performance that would warrant the engineering effort and cost.
In my quest to simplify some of the setups I run, I found what the old schoolers probably knew all along: Systemd+Journald is the GOAT (even for containerized workloads). I can go into more detail on why I think this, but I assume this might not be news to many. Why is it, though, that nobody in this subreddit seems to talk about it? There are only a dozen or so threads mentioning it throughout recent years. Is it just a trend thing, or are there things that make you really dislike it that I might not be aware of?
https://redd.it/1qwl27q
@r_devops
Fixing Noisy Logs with OpenTelemetry Log Deduplication
Hi all, I wrote an article on reducing log volume using the OpenTelemetry Collector log deduplication processor.
It covers why duplicate logs happen in distributed systems and how to discard identical entries without sacrificing observability.
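To illustrate the general idea of collapsing identical entries without losing the fact that they occurred, here is a minimal sketch. It is not the Collector processor's implementation or configuration; the record fields (`ts`, `severity`, `body`) and output names (`count`, `first_ts`, `last_ts`) are assumptions for illustration, and a real pipeline would apply this per time window.

```python
def deduplicate(records, key_fields=("severity", "body")):
    """Collapse records with identical key fields into one entry with a count.

    Each output entry keeps the key fields plus how many times the record
    was seen and the first and last timestamps at which it was observed.
    """
    merged = {}  # dicts preserve insertion order, so first-seen order is kept
    for rec in records:
        key = tuple(rec[field] for field in key_fields)
        if key not in merged:
            entry = dict(zip(key_fields, key))
            entry.update(count=0, first_ts=rec["ts"], last_ts=rec["ts"])
            merged[key] = entry
        entry = merged[key]
        entry["count"] += 1
        entry["first_ts"] = min(entry["first_ts"], rec["ts"])
        entry["last_ts"] = max(entry["last_ts"], rec["ts"])
    return list(merged.values())


logs = [
    {"ts": 1, "severity": "ERROR", "body": "db connection refused"},
    {"ts": 2, "severity": "ERROR", "body": "db connection refused"},
    {"ts": 3, "severity": "INFO", "body": "request served"},
]
for entry in deduplicate(logs):
    print(entry)
```

Keeping a count and the first/last timestamps is what lets a log storm shrink to one line while still showing how often and for how long the problem occurred.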
Article: https://www.dash0.com/guides/opentelemetry-log-deduplication-processor
Would love feedback from anyone using OpenTelemetry in production
https://redd.it/1qwlj2s
@r_devops
Dash0
Fixing Noisy Logs with OpenTelemetry Log Deduplication · Dash0
Learn how the OpenTelemetry log deduplication processor collapses log storms without losing context, reduces noise, and keeps observability pipelines efficient.
I am building Conveyor CI: a lightweight headless CI/CD orchestration engine for building CI/CD platforms.
Hi everyone.
Just released Conveyor CI v0.5.0, a lightweight headless CI/CD orchestration engine for building CI/CD platforms. It's perfect for building internal developer platforms (IDPs) and custom platforms.
I am applying for the project to join the CNCF Sandbox and would appreciate any support, from a GitHub star to code contributions or even technical feedback (emphasis on the feedback; I want to know if this project is even viable in the broader community).
Check out the repo at https://github.com/open-ug/conveyor
https://redd.it/1qwnabk
@r_devops
conveyor.open.ug
Headless, cloud-native CI/CD orchestration engine | Conveyor CI
Conveyor CI is a headless, cloud-native CI/CD orchestration engine for building distributed CI/CD systems with ease.
GitHub introduces scaleset module for easier GHA scheduling on self-hosted runners
Written in Go. Available at https://github.com/actions/scaleset. Was extracted from ARC and looks like it can be a great replacement for webhook-based scheduling.
https://redd.it/1qwpd6y
@r_devops
GitHub
GitHub - actions/scaleset: Go client for GitHub Actions Runner Scale Set APIs - build custom autoscaling solutions for self-hosted…
Go client for GitHub Actions Runner Scale Set APIs - build custom autoscaling solutions for self-hosted runners - actions/scaleset
Career Advice For New Grad Platform Engineer Opportunity
I’m starting as a Junior New Grad platform engineer at a fast-moving startup this summer. I’ve shipped infra systems before, as I've had a previous internship that allowed me to work on k8s and observability issues, but I care a lot about business and product impact long-term. I like platform work, but I also would like to work on product issues as well.
For folks who started in platform roles:
Did starting off in platform pigeonhole you to being platform only? Is transitioning to product-facing roles in the future harder?
What skills mattered more than raw infra depth?
What would you do in the months before starting to be able to ship quickly? I'm kind of worried that I will need to be told what to do, since I won't know the system or the tools that could help.
How do I make sure that I do not work on just YAML and terraform configs? I know that's a huge part of the job, but in my previous internship, I felt like I did not grow much or learn much when I was working on configs.
Overall, I just feel unsure on whether I can land impact for system as a Junior engineer, and also want to ensure that I can keep growing technically. Will starting off my career on a Platform team still let me achieve these goals?
https://redd.it/1qwpyuu
@r_devops
Why do people from Eastern Europe always seem so smart?
In job interviews, I keep noticing the same thing: people from Eastern Europe (Russia, Ukraine, Belarus, Moldova, etc.) are often extremely knowledgeable and sharp. It happens so often that I’m starting to wonder if there’s a reason behind it or if it’s just my experience.
Has anyone else noticed this?
https://redd.it/1qwt1vh
@r_devops
82% K8s production adoption, 86% of CIOs planning cloud repatriation
Two data points that seem contradictory but probably aren't:
1. CNCF 2025 survey: K8s hits 82% production adoption, 66% use it for AI inference workloads
2. IDC: 86% of CIOs planned to repatriate some workloads in 2025/2026 — highest rate ever
Meanwhile the hyperscalers are spending >$600B in capex this year (36% increase), with 75% of that going to AI infrastructure. But AI services only generated ~$25B in revenue. That's a hell of a bet.
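To put those figures side by side, here is a rough back-of-the-envelope sketch. It uses only the numbers quoted above, treats ">$600B" as a lower bound and "~$25B" as approximate, and the function and variable names are just illustrative.

```python
# Back-of-the-envelope check using only the figures quoted in the post.
CAPEX_TOTAL_B = 600.0   # ">$600B" capex this year, treated as a lower bound
CAPEX_GROWTH = 0.36     # "36% increase" year over year
AI_SHARE = 0.75         # "75% of that going to AI infrastructure"
AI_REVENUE_B = 25.0     # "~$25B" in AI services revenue (approximate)


def capex_summary(total_b: float, growth: float, share: float, revenue_b: float) -> dict:
    """Derive the implied AI capex, prior-year capex, and revenue-to-capex ratio."""
    ai_capex_b = total_b * share             # portion of capex aimed at AI
    prior_year_b = total_b / (1.0 + growth)  # capex implied for the previous year
    revenue_ratio = revenue_b / ai_capex_b   # AI revenue per dollar of AI capex
    return {
        "ai_capex_b": ai_capex_b,
        "prior_year_b": prior_year_b,
        "revenue_ratio": revenue_ratio,
    }


summary = capex_summary(CAPEX_TOTAL_B, CAPEX_GROWTH, AI_SHARE, AI_REVENUE_B)
print(f"AI capex:          ${summary['ai_capex_b']:.0f}B")
print(f"Prior-year capex:  ${summary['prior_year_b']:.0f}B")
print(f"Revenue / AI capex: {summary['revenue_ratio']:.1%}")
```

On these numbers, roughly $450B of the capex is AI-related, and the quoted AI revenue covers a bit under 6% of it.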
Are we heading toward a messy hybrid whether we like it or not?
Are you seeing repatriation actually happening at your org, or is it still just "CIO slide deck" talk?
For those running GPU workloads — cloud, on-prem, or hybrid? What drove the decision?
Reference in case you are interested: https://www.cncf.io/announcements/2026/01/20/kubernetes-established-as-the-de-facto-operating-system-for-ai-as-production-use-hits-82-in-2025-cncf-annual-cloud-native-survey/
https://redd.it/1qwtj8p
@r_devops
CNCF
Kubernetes Established as the De Facto ‘Operating System’ for AI as Production Use Hits 82% in 2025 CNCF Annual Cloud Native Survey
New CNCF Annual Cloud Native Survey reveals near-universal adoption of Kubernetes Key highlights: SAN FRANCISCO, CA, January 20, 2026 —The Cloud Native Computing Foundation® (CNCF®)…
Where are you guys looking for jobs nowadays?
I'm on Indeed and LinkedIn and trying my luck here on Reddit too, but aside from that, where are you guys getting your hits from?
I need to find work and am spreading my effort; can't depend on only two vectors for HA to happen :D
C1 (or C2-ish) English level, 6 years of experience in DevOps, 20 years of overall experience, based in LATAM (Brazil). Willing to relocate, but I don't have a visa for anywhere, so I would need sponsorship for that.
Thanks for any ideas I can try!
https://redd.it/1qwuass
@r_devops
Cool write-up about running a small $5M training cluster
Description of comma's on-prem data center including a bunch of technical details: https://blog.comma.ai/datacenter/
https://redd.it/1qwt8yn
@r_devops
comma.ai blog
Owning a $5M data center
Data centers are cool, everyone should have one.
How do you handle IaC drift when auto-remediation changes resources?
We use AWS Config/Security Hub with auto-remediation rules, things like enabling S3 default encryption or fixing security group rules. It works, but it creates a headache: Terraform doesn't know about the change, so the next plan either tries to revert it, or you're stuck doing manual state surgery.
Curious how other teams deal with this:
- Do you accept the drift and fix Terraform manually?
- Do you avoid auto-remediation entirely and handle findings through your normal IaC pipeline instead?
- Something else?
Had an interesting conversation in the CloudPosse Slack where the take was that auto-remediation is fundamentally at odds with IaC, and the better approach is to ingest compliance findings and open PRs to fix Terraform directly. Curious if that matches what people are seeing in practice.
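For anyone who wants to see which remediated resources the next apply would undo, here is a minimal sketch. It assumes the plan has been exported as JSON and relies on the `resource_drift` and `resource_changes` arrays of Terraform's JSON plan representation. The function name, the sample plan, and the resource addresses are hypothetical, and how much drift Terraform reports can vary by version.

```python
def find_reverting_drift(plan: dict) -> dict:
    """Split out-of-band changes into 'drifted' and 'would be reverted by apply'."""
    # Resources Terraform detected as changed outside of Terraform.
    drifted = {rc["address"] for rc in plan.get("resource_drift", [])}
    # Resources the plan intends to touch (anything other than a no-op).
    pending = {
        rc["address"]
        for rc in plan.get("resource_changes", [])
        if rc["change"]["actions"] != ["no-op"]
    }
    return {
        "drifted": sorted(drifted),
        # Drifted AND planned for change: apply would likely undo the remediation.
        "would_revert": sorted(drifted & pending),
    }


# Hypothetical, heavily trimmed plan document for illustration only.
sample_plan = {
    "resource_drift": [
        {"address": "aws_s3_bucket.logs", "change": {"actions": ["update"]}},
        {"address": "aws_security_group.web", "change": {"actions": ["update"]}},
    ],
    "resource_changes": [
        {"address": "aws_s3_bucket.logs", "change": {"actions": ["update"]}},
        {"address": "aws_security_group.web", "change": {"actions": ["no-op"]}},
        {"address": "aws_iam_role.ci", "change": {"actions": ["create"]}},
    ],
}

report = find_reverting_drift(sample_plan)
print(report)
```

In a pipeline, the input could come from `terraform plan -out=plan.out` followed by `terraform show -json plan.out`, parsed with `json.load`. Anything listed under `would_revert` is a candidate for a PR that updates the Terraform code to match the remediated state.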
https://redd.it/1qwzd1i
@r_devops