Feedback
We’re two founders building an AI system that automatically detects, predicts and fixes website/app errors in real time, think Tesla Autopilot for debugging in DevOps.
We’d love to learn from you, engineers, founders or DevOps folks for 10 minutes about how you currently debug issues.
Not selling anything, just trying to validate whether this could save teams a significant amount of time.
Happy to share a summary of what we learn + offer early access!
https://calendly.com/aarittaparia/30min
If you don’t have time, we would appreciate it if you could fill out this form: https://rc60edu0zkd.typeform.com/to/YixyC7S7
Thanks so much!
https://redd.it/1ooyk0s
@r_devops
Any tips on places where I can train as an aspiring DevOps engineer?
Hi, I'm currently working at a small company and finishing my college degree in a few months.
I got interested in DevOps around half a year ago and have practiced Linux, Git, GitHub, GitHub Actions + Jenkins, and Docker Hub. I built pipelines on simple projects and even did some tests.
I also got my hands on deployment with kubectl, but there is a lot I have yet to learn.
Back to the question. Coders have Codewars and LeetCode. I wonder if there is any site like that for DevOps?
I found Qwiklabs for GCP, but I was wondering about the rest. Something like solving problems, or using part of the knowledge to try fixing something more difficult?
I kind of want commercial experience.
https://redd.it/1oozs5r
@r_devops
Live coding session for the community. Who is in? (Beginner friendly)
Wanted to give something back to the tech community, so I’ll be hosting a live coding session with cameras and mics on. Been coding for 12+ years, and the last 3 fully into AI.
We’ll code together, learn, talk about workflows, answer questions, and just have fun with it.
Tech stack (most probably):
n8n
Airtable
Apify
OpenRouter
Interested in joining?
Drop a comment saying interested or whatever you want <3
=> We’re organizing everything in a WhatsApp group to pick the best time.
Oh and yeah… the call is FREE of course.
P.S. - yesterday’s session was f****ing amazing and super fun :-)
Talk soon,
GG
https://redd.it/1op040q
@r_devops
Strategies to reduce CI/CD pipeline time that actually work? Ours is 47 min and killing us!
Need serious advice because our pipeline is becoming a complete joke. Full test suite takes 47 minutes to run which is already killing our deployment velocity but now we've also got probably 15 to 20% false positive failures.
Developers have started just rerunning failed builds until they pass, which defeats the entire purpose of having tests. Some are even pushing directly to production to avoid the CI wait time, which is obviously terrible, but I also understand their frustration.
We're supposed to be shipping multiple times daily but right now we're lucky to get one deploy out because someone's waiting for tests to finish or debugging why something failed that worked fine locally.
I've tried parallelizing the test execution but that introduced its own issues with shared state and flakiness actually got worse. Looked into better test isolation but that seems like months of refactoring work we don't have time for.
Management is breathing down my neck about deployment frequency dropping and developer satisfaction scores tanking. I need to either dramatically speed this up or make the tests way more reliable, preferably both.
How are other teams handling this? Is 47 minutes normal for a decent sized app or are we doing something fundamentally wrong with our approach?
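On the reliability side, one cheap first step is to measure which tests are actually flaky before refactoring anything. A minimal sketch, assuming CI history can be exported as (commit, test name, passed) records: a test that both passed and failed on the same commit cannot be explained by a code change. The record shape and the name `find_flaky_tests` are hypothetical, not any CI system's API.

```python
from collections import defaultdict


def find_flaky_tests(runs):
    """Return tests that both passed and failed on the same commit.

    `runs` is an iterable of (commit, test_name, passed) tuples. This record
    shape is an assumption for illustration, not a real CI export format.
    """
    outcomes = defaultdict(set)
    for commit, test_name, passed in runs:
        outcomes[(commit, test_name)].add(bool(passed))
    # Two distinct outcomes for one (commit, test) pair means the result
    # changed without the code changing.
    flaky = {test for (_, test), seen in outcomes.items() if len(seen) == 2}
    return sorted(flaky)


# Made-up history for demonstration only.
history = [
    ("abc123", "test_checkout", False),
    ("abc123", "test_checkout", True),   # same commit, different result
    ("abc123", "test_login", True),
    ("def456", "test_login", False),     # different commit: may be a real failure
]
print(find_flaky_tests(history))
```

Tests flagged this way could be quarantined or fixed first, so a red build starts to mean something again.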
https://redd.it/1op2qri
@r_devops
Building control planes is part of devops
Hi all,
I'm a developer who loves operations. My take on DevOps is that any GitOps solution based on Terraform or Ansible could become a control plane. I think we should write our own control planes instead of gluing together off-the-shelf products, and DevOps engineers are developers with a broader understanding compared to backend engineers.
I've written a library in Clojure to prove my point, and this blog article outlines it.
https://bigconfig.it/blog/demystifying-the-control-plane-the-easy-upgrade-path-from-gitops-with-bigconfig/
https://redd.it/1oozepi
@r_devops
BigConfig
Demystifying the control plane: the easy upgrade path from GitOps with BigConfig
For many engineering teams, GitOps has been a game-changer, providing a declarative way to manage infrastructure and applications. But as complexity grows, you may find your processes hitting a ceiling. The natural next step? Upgrading to a control Plane.…
Terraform AWS "Bootstrap" Project
So I've seen a few people recommend a module or separate project that handles "bootstrapping" Terraform. I'm still new to TF, but from my understanding this would start with local state and create the resources that you then migrate the local state to.
What would a minimal example of this need? I'm trying to create a sort of "base" bootstrap project for Terraform and AWS.
It seems like for a "base" level module I would only need the S3 resource for storing state, but I am sure there is more I am missing that would be "good to have".
I haven't really used modules, but I am guessing I could use them in some fashion to have a sort of "template" for different AWS resources? (e.g., I have 4-5 different .NET projects that could use the same module?)
Thanks
https://redd.it/1op6xci
@r_devops
Terraform code review tool for GitHub
Hi Experts,
Are you using any tool that automatically reviews Terraform code? Since our team is growing and a lot of changes are coming in daily, I am looking for a free tool that can be integrated with GitHub Actions to automatically review and comment on my PRs.
Right now I am trying the Windsurf bot, since it's already being used by our developers. It works OK, but it's not the best.
If you are using any, which ones?
https://redd.it/1op7h8b
@r_devops
PyPIPlus.com 2.0 — explore Python packages better: full dependency trees, reverse dependents, OSV CVEs, licenses, offline bundles
I built **PyPIPlus.com**, a tool to explore Python packages in depth, and I’d love your feedback. In the past, two of my posts about this project went viral, and the feedback from the community helped shape it into what it is today.
Below is what the site currently does: **PyPIPlus.com** can be used to check a Python package's dependencies (incl. extras), reverse dependents, OSV CVEs, licenses, health score, and purity, and to generate offline, ready-to-install bundles.
Dependency tree: direct + transitive deps, extras, env markers
Reverse dependents: what other packages use this package
Security: OSV CVEs per version, affected/fixed ranges, CSV exports/copy
Licenses: per package and each sub-dependency in a full tree view
Health score: 0–100 + A–F (last updates, security vulns, docs, etc.)
Purity: pure-Python vs compiled, via analysis of wheel tags/build metadata (only marked pure Python if the package and all dependencies are pure)
Offline bundles: all wheels + SBOM + licenses, reproducible and air-gapped
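To make the purity rule above concrete, here is a minimal sketch of how "pure only if the package and all dependencies are pure" can be evaluated over a dependency graph. This is an illustration on toy data, not the site's actual implementation; the package names, the `deps` and `own_purity` mappings, and the function `is_pure` are all hypothetical.

```python
def is_pure(package, deps, own_purity, _seen=None):
    """Return True only if `package` and every transitive dependency is pure.

    `deps` maps a package to its direct dependencies and `own_purity` maps a
    package to whether its own wheels are pure Python. Both mappings are
    hypothetical inputs for this sketch.
    """
    seen = set() if _seen is None else _seen
    if package in seen:
        # Already visited on this walk (shared dependency or cycle). Any
        # impure package would have ended the walk with False already.
        return True
    seen.add(package)
    if not own_purity[package]:
        return False
    return all(is_pure(dep, deps, own_purity, seen) for dep in deps.get(package, []))


# Toy graph with made-up package names.
deps = {
    "webapp": ["templating", "fastjson"],
    "templating": ["markup"],
    "markup": [],
    "fastjson": [],
}
own_purity = {"webapp": True, "templating": True, "markup": True, "fastjson": False}

print(is_pure("templating", deps, own_purity))  # every package in its tree is pure
print(is_pure("webapp", deps, own_purity))      # pulls in a compiled dependency
```

In this toy graph, `webapp` is itself pure Python but is not reported as pure, because one of its dependencies ships compiled code.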
Bundle contents:
wheels/ → all dependency wheels
requirements.txt → pinned versions
install.py → universal installer (Windows/macOS/Linux)
sbom.cdx.json → CycloneDX SBOM for security scans
LICENSES.md → license summary for all packages
NOTICE → attribution (when required)
Install: `python install.py`
Scan: `osv-scanner --sbom sbom.cdx.json`
Live: [https://pypiplus.com](https://pypiplus.com)
Example (flask v2.3.1): [https://pypiplus.com/project/flask/2.3.1/](https://pypiplus.com/project/flask/2.3.1/)
Previous Posts:
If you’re new to the project:
I made PyPIPlus.com — a faster way to see all dependencies of any Python package
P.S.: I hope I've added enough value to this project for it to be useful; my last attempt at sharing it in r/devops got a rough reception. Regardless, any feedback is better than no feedback.
https://redd.it/1op61jy
@r_devops
PyPIPlus
PyPIPlus - Python Package Explorer with Dependency Visualization
Search and explore over 500,000 Python packages from PyPI with interactive dependency visualization
Demo Day (feat. Murphy’s Law)
This happened to me mere hours ago. Three hours before a feature demo, I did the usual prep and deployed the app to our IDP-enabled namespace. IDP was down.
I pinged the teammate who owns it; they kicked off a fresh rollout. While that was happening, we found out another team had quietly added new namespace restrictions. Few extra steps we didn’t know about. So my teammate went hunting for the docs.
As a contingency plan, my lead shared a kubeconfig for another cluster with an IDP-enabled namespace. Switched over, tried again… IDP problems there too.
Forty-five minutes to go, and the original namespace came back up with the support services. I deployed immediately only for the deployment to fail. Same version I’ve shipped many times. Logs were of no help either. Quick triage and there it was: values drift. Someone had changed the deployment values. I reverted, redeployed, everything turned green. Ten minutes before the demo, I was finally ready.
Then the meeting got postponed.
Murphy’s Law didn’t write code today, but it definitely sat in on the stand-up.
https://redd.it/1opaw04
@r_devops
Anyone else drowning in static-analysis false positives?
We’ve been using multiple linters and static tools for years. They find everything from unused imports to possible null dereference, but 90% of it isn’t real. Devs end up ignoring the reports, which defeats the point. Is there any modern tool that actually prioritizes meaningful issues?
https://redd.it/1opdlhz
@r_devops
SCM to Devops?
Hello, I (24m) graduated from university last December with a supply chain and logistics management degree. I took a job with Enterprise as a Management Trainee purely for the money. I am looking to learn a new hard skill that I could pursue to specialize in some way. I graduated with the intention of getting into procurement, but a month after I graduated I took the AWS Cloud Practitioner Essentials course and cloud interested me. I read online that DevOps is like a digital supply chain, given that the main objective is making sure things run smoothly and efficiently.
I plan on taking the Essentials course again and then taking the Practitioner exam. Thoughts?
https://redd.it/1opkg3y
@r_devops
Looking for DevOps/SRE/Platform Engineer opportunities for the past 3 months
I'm a DevOps/SRE engineer who has been looking to switch organisations for the past 3 months, and there have been hardly any calls (2-3 at most). Those calls get turned away once they hear about my 90-day notice period, and the 2 interviews I cleared offered only a 30% hike, which I think is way below par for my current CTC. I have also seen that requirements have become very tool-specific, even when you explain that some other tool you know does the same thing.
Also, what should the average CTC be for DevOps, SRE, and Platform roles at 6 YOE?
My experience and expertise include AWS, Jenkins, GitHub Actions, Ansible, Python, Bash, and monitoring and dashboards with CloudWatch (plus self-study of Prometheus + Grafana). My K8s (ECS, EKS) experience is limited to 10-12 months.
I would be happy to share my resume anonymously for some reviews.
Are there no jobs in the market or am I following a wrong path? Need suggestions/guidance.
https://redd.it/1opmflj
@r_devops
Datadog question - split Jenkins job name on "/"?
I'm using the Jenkins plugin to feed Jenkins job data into Datadog. When I pull up a Jenkins log entry, there are attributes associated with it, one being jenkins.job_name. However, I want to split this into folder and job, as most of our Jenkins jobs are named like foo/baz and bar/baz.
It seems to me this should be a custom processor under the Jenkins pipeline configuration. But I've tried getting it to work with a Grok processor as well as a Category processor and I'm out of ideas. Anyone know how best to do this? Thank you!
PS: I plan to use this to build a status dashboard grouping by job type (in this example, baz).
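To pin down the result I'm after, here is the intended split written as plain Python rather than as a Datadog processor rule. It only describes the desired output and assumes that everything before the last "/" counts as the folder; the function name and the `folder`/`job` keys are made up for illustration.

```python
def split_job_name(job_name):
    """Split a Jenkins job path into folder and job at the last "/".

    Assumption for this sketch: everything before the final "/" is the
    folder (which may itself be nested), and the remainder is the job.
    """
    folder, _, job = job_name.rpartition("/")
    return {"folder": folder, "job": job}


print(split_job_name("foo/baz"))
print(split_job_name("bar/baz"))
```

Both example names end up with the job `baz`, which is the value the dashboard would group by.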
https://redd.it/1opo6q2
@r_devops
Where did RabbitMQ send our data?
Need some help from the community... We simply did a systemctl stop and start on our rabbitmq servers one at a time. After it came back up we lost nearly 200k messages from some but not all queues. All queues are set to persistent. Any clue what may have happened to the messages and where we can look to recover them?
We have tried all the common stuff: reboots, service restarts, tons of spelunking through logs and data files... The servers are up and running and processing fine, just missing a ton of data. Thanks so much for any help!
https://redd.it/1opmx3y
@r_devops
Blind XXE: Exfiltrating Data When You Can't See the Response 👁️
https://instatunnel.my/blog/blind-xxe-exfiltrating-data-when-you-cant-see-the-response
https://redd.it/1opoocl
@r_devops
InstaTunnel
Blind XXE: Exfiltrating Data Out-of-Band in 2025
Learn how blind XXE (OOB XXE) exfiltrates files and internal data when responses aren't returned. Detect, exploit responsibly, and defend with modern prevention
GitLab: Wait for other pipelines to finish?
Hi,
I just got asked whether it is possible for a pipeline to wait for another pipeline to finish. The idea is that there are several repositories (3 in this case) with pipelines that somewhat interfere with each other during one step (deploy to a server). The person would like the pipeline to know whether a certain other pipeline is running.
Is this possible in GitLab?
We would still like to have concurrent runners, so using a tag and having just one runner for that tag is not the ideal option.
https://redd.it/1opsmbq
@r_devops
What do you look for in node metrics?
Hey folks
I’m currently working on a little hobby project to get to know logging and observability - something us developers tend to ignore a lot.
When you’re looking at node/server metrics, what do you find most useful/required when it comes to your dashboards showing node health, resource utilisation etc?
I’m in the process of configuring my Prometheus stack and I don’t want to be bombarding myself with extra data I don’t need/isn’t really useful in the real world.
Thanks!
https://redd.it/1opv8og
@r_devops
What projects could I build to show that you can trust me as a junior cloud engineer in your company?
I am a WordPress developer transitioning to DevOps or cloud engineering. I am en route to the AWS Solutions Architect certification and am currently studying with Stephane Maarek's Udemy course. I built a serverless portfolio website on AWS with the help of AI. I switched my laptop OS to Ubuntu to get used to Linux commands. I am experimenting with pulling different projects from GitHub and testing them in Docker. So I'm trying to get familiar with the terms, tools, and anything else that can immerse me in the field. I am looking for a path of things to do and show to a future employer, ideally suggested by someone who is already in the industry.
https://redd.it/1opw5p4
@r_devops
Self-Hosting a Production Mobile Server: A Guide on How to Not Melt Your Phone
I don't know about everyone else, but I didn't want to pay for a server, and didn't want to host one on my computer. I have a flagship phone; an S25+ with Snapdragon 8 and 12 GB RAM. It's ridiculous. I wanted to run intense computational coding on my phone, and didn't have a solution to keep my phone from overheating. So. I built one. This is non-rooted using sys-reads and Termux (found on Google Play) and Termux API (found on F-Droid), so you can keep your warranty.
What my project does: Monitors core temperatures using sys reads and Termux API. It models thermal activity using Newton's Law of Cooling to predict thermal events before they happen and prevent Samsung's aggressive performance throttling at 42° C.
Target audience: Developers who want to run an intensive server on an S25+ without rooting or melting their phone.
Comparison: I haven't seen other predictive thermal modeling used on a phone before. The hardware is concrete and physics can be very good at modeling phone behavior in relation to workload patterns. Samsung itself uses a reactive and throttling system rather than predicting thermal events. Heat is continuous and temperature isn't an isolated event.
I didn't want to pay for a server, and I was also interested in the idea of mobile computing. As my workload increased, I noticed my phone would have temperature problems and performance would degrade quickly. I studied physics and realized that the cores in my phone and the hardware components were perfect candidates for modeling with physics. By using a "thermal bank" where you know how much heat is going to be generated by various workloads through machine learning, you can predict thermal events before they happen and defer operations so that the 42° C thermal throttle limit is never reached. At this limit, Samsung aggressively throttles performance by about 50%, which can cause performance problems, which can generate more heat, and the spiral can get out of hand quickly.
The hardware properties of modern mobile devices are perfect for modeling with physics. Here is what I have found.
Total predictions: 2142
Duration: 60 minutes
MAE: 1.51°C
RMSE: 2.70°C
Bias: -0.95°C
Within ±1°C: 58.2%
Within ±2°C: 75.6%
Per-zone MAE:
BATTERY: 0.27°C (357 predictions)
CHASSIS: 2.92°C (357 predictions)
CPU_BIG: 1.60°C (357 predictions)
CPU_LITTLE: 2.50°C (357 predictions)
GPU: 0.96°C (357 predictions)
MODEM: 0.80°C (357 predictions)
0.27°C on the hardware that matters, 30 seconds in advance.
On S25+, throttling decisions are made almost entirely based on battery status.
Predictive Modeling > Reactive Throttling.
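For anyone who wants to reproduce this kind of summary, the reported error metrics (MAE, RMSE, bias, share within a tolerance) can be computed as in the sketch below. The sign convention for bias (predicted minus actual, so a negative value means under-prediction) is an assumption, and the sample temperatures are made up for illustration, not measurements from the phone.

```python
import math


def error_metrics(predicted, actual, tolerance_c=1.0):
    """Summarize prediction error for paired temperature lists (in °C).

    Errors are predicted - actual, so a negative bias means the model
    under-predicts. That sign convention is an assumption of this sketch.
    """
    errors = [p - a for p, a in zip(predicted, actual)]
    n = len(errors)
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
        "bias": sum(errors) / n,
        "within_tolerance": sum(abs(e) <= tolerance_c for e in errors) / n,
    }


# Made-up sample values, not data from the device.
predicted = [40.0, 41.0, 39.0, 38.0]
actual = [41.0, 41.5, 38.0, 40.5]
print(error_metrics(predicted, actual))
```

RMSE comes out larger than MAE whenever a few predictions miss by a lot, which is consistent with the gap between the two figures reported above.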
By combining Newton's Law of Cooling with measured estimates based on hardware constraints and adaptive damping for your specific device, you can predict thermal events before they happen and respond in stages: defer inexpensive operations, pause expensive operations, and shut operations down as an emergency measure in danger territory. This prevents us from ever reaching the 42°C throttle limit.
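The staged response described above can be sketched as a small decision function. Only the 42°C limit comes from the post; the margins (3.0, 1.5 and 0.5°C) and all names are placeholders I picked for illustration, not values from the repo.

```python
from enum import Enum

THROTTLE_LIMIT_C = 42.0  # Samsung throttle limit quoted in the post


class Action(Enum):
    RUN = "run everything"
    DEFER_CHEAP = "defer inexpensive operations"
    PAUSE_EXPENSIVE = "pause expensive operations"
    EMERGENCY_STOP = "emergency shutdown of operations"


def choose_action(predicted_c, defer_margin=3.0, pause_margin=1.5, stop_margin=0.5):
    """Map a predicted temperature to a response tier.

    The margins are hypothetical headroom thresholds in degrees C.
    """
    headroom = THROTTLE_LIMIT_C - predicted_c
    if headroom <= stop_margin:
        return Action.EMERGENCY_STOP
    if headroom <= pause_margin:
        return Action.PAUSE_EXPENSIVE
    if headroom <= defer_margin:
        return Action.DEFER_CHEAP
    return Action.RUN
```

Because the input is the predicted temperature rather than the current one, each tier triggers before the phone actually gets that hot.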
Mathematical Model
Core equation (Newton's law of cooling):
T(t) = T_amb + (T₀ - T_amb)·exp(-t/τ) + (P·R)·(1 - exp(-t/τ))
Where:
T₀ = temperature at t = 0 (the current reading)
t = time elapsed since that reading
τ = thermal time constant (zone-specific)
R = thermal resistance (°C/W)
P = power dissipation (W)
T_amb = ambient temperature
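The core equation translates almost directly into code. Here is a minimal sketch; the function and parameter names are mine, and it assumes power and ambient temperature stay constant over the prediction horizon.

```python
import math


def predict_temp(t_now_c, t_amb_c, power_w, r_c_per_w, tau_s, horizon_s):
    """Temperature after horizon_s seconds for a single thermal zone.

    Implements T(t) = T_amb + (T0 - T_amb)*exp(-t/tau) + P*R*(1 - exp(-t/tau)),
    assuming constant power and ambient temperature over the horizon.
    """
    decay = math.exp(-horizon_s / tau_s)
    return t_amb_c + (t_now_c - t_amb_c) * decay + power_w * r_c_per_w * (1.0 - decay)
```

Two sanity checks fall out of the formula: with a horizon of zero you get the current reading back, and with a very long horizon the result settles at T_amb + P·R.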
Per-zone constants (measured from S25+ hardware):
Battery: τ=540s, C=45 J/K (massive thermal mass)
CPU cores: τ=6-9s, C=0.025-0.05 J/K (fast response)
GPU/Modem: τ=9s, C=0.02-0.035 J/K
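Note that the constants above are given as τ and C, while the core equation uses R. In the usual lumped (single-node) thermal model the two are linked, so here is a short derivation, assuming that model applies to each zone. The R value at the end is only what this assumption implies, not a number stated in the post.

```latex
% Energy balance for one zone with heat capacity C and thermal resistance R
C \frac{dT}{dt} = P - \frac{T - T_{amb}}{R}

% With constant P and T_{amb}, and T(0) = T_0, the solution is
T(t) = T_{amb} + (T_0 - T_{amb})\, e^{-t/\tau} + P R \left(1 - e^{-t/\tau}\right),
\qquad \tau = R\,C

% So R can be recovered from the listed constants, e.g. for the battery:
R = \frac{\tau}{C} = \frac{540\ \text{s}}{45\ \text{J/K}} = 12\ \text{K/W}
```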
Prediction horizon: 30s at 10s sampling intervals
Adaptive damping: Prediction error feedback loop
damping = f(bias, confidence, sample_count)
T_predicted_adjusted = T_predicted - damping·ΔT
Maintains per-zone error history with confidence weighting. Damping strength scales inversely with thermal time constant (battery gets minimal damping due to high predictability, CPU gets aggressive damping).
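The post does not spell out f(bias, confidence, sample_count), so the following is only one way such a feedback loop could look, not the repo's implementation. My assumptions: ΔT is the predicted change (T_predicted minus the current reading), confidence grows with the number of stored samples, damping is clamped to [0, 1], and the inverse scaling with τ is done relative to a hypothetical reference of 9 s. All names and default values are invented for illustration.

```python
from collections import deque


class ZoneDamper:
    """Per-zone error history that shrinks predicted changes when errors are large."""

    def __init__(self, tau_s, max_history=50, bias_scale_c=2.0, tau_ref_s=9.0):
        self.errors = deque(maxlen=max_history)  # predicted - actual, in degrees C
        self.bias_scale_c = bias_scale_c         # hypothetical: bias that maps to full damping
        # Inverse scaling with tau: slow zones (battery) get little damping
        self.strength = min(1.0, tau_ref_s / tau_s)

    def record(self, predicted_c, actual_c):
        self.errors.append(predicted_c - actual_c)

    def damping(self):
        n = len(self.errors)
        if n == 0:
            return 0.0
        bias = sum(self.errors) / n
        confidence = n / self.errors.maxlen      # more samples, more trust in the bias
        return self.strength * confidence * min(1.0, abs(bias) / self.bias_scale_c)

    def adjust(self, predicted_c, current_c):
        # T_predicted_adjusted = T_predicted - damping * dT
        return predicted_c - self.damping() * (predicted_c - current_c)
```

With identical error histories, a zone with τ=540s ends up with a far smaller correction than one with τ=6s, which matches the battery-versus-CPU behaviour described above.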
Result: 0.27°C MAE on battery.
My solution is simple: never reach 42° C.
https://github.com/DaSettingsPNGN/S25_THERMAL-
Please take a look and give me feedback.
Thank you!
https://redd.it/1opvxum
@r_devops
Cost optimization teams, is that a thing?
Hi
For the last year I have been heavily focused on cost reduction for our cloud infrastructure (and sometimes non-cloud services). Although it isn't the most exciting thing in the world to be the person who goes around trying to save money, it is needed.
In general, engineering is unaware of, or uninterested in, how much the resources they consume cost. So controlling the waste tends to fall to a random person on the team, who does it in a short-term, tactical manner once red lights start flashing.
I am wondering if there are teams that specialize in this cost optimization work for technology infrastructure. Is this a thing? Is management willing to invest money to be able to cut percentage points from their infrastructure bill?
I feel this is a need because the skills required for this work sit somewhere between accounting, procurement and engineering. That seems like someone who is hard to find.
https://redd.it/1opvgsz
@r_devops