From Today: Microsoft 365 Admin Center Demands MFA
Starting today, access to the Microsoft 365 admin center will be blocked for any account that does not have Multi-factor Authentication enabled.
Stay ahead: If you haven’t enabled MFA yet, set it up right away to avoid any sign-in issues once mandatory MFA enforcement is rolled out in your organization.
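For admins wanting to get ahead of this, the Graph reports endpoint shows who still hasn't registered. A minimal sketch, assuming an app registration with UserAuthenticationMethod.Read.All consent and a token already in hand (the token value is a placeholder):

import requests

TOKEN = "eyJ..."  # placeholder; acquire via MSAL or your usual flow
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

url = ("https://graph.microsoft.com/v1.0/reports/"
       "authenticationMethods/userRegistrationDetails")
params = {"$filter": "isMfaRegistered eq false"}

while url:
    resp = requests.get(url, headers=HEADERS, params=params)
    resp.raise_for_status()
    data = resp.json()
    for user in data.get("value", []):
        print(user["userPrincipalName"])  # still has no MFA method registered
    url = data.get("@odata.nextLink")  # absolute URL, carries the filter
    params = None  # nextLink already includes its own query string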
https://redd.it/1r016ba
@r_systemadmin
Datacenter costs through the roof
Hi all,
We're a Belgian-based company using the data centre of one of the biggest ISPs in Belgium.
We were recently pressured into changing our model from reserved to pay-as-you-use.
We were on the reserved model with 30 VMs; when we checked the pay-as-you-use model and saw that we were going to eliminate half of our VMs, it looked like a no-brainer, as the ISP stated that costs would be reduced by almost half.
Half a year later, our bill is exactly the same, but with half the resources.
Is this also fallout from the Broadcom acquisition, or have we been bamboozled?
(If this violates the guidelines, please tell me, as this keeps getting removed without a reason.)
https://redd.it/1r06bx9
@r_systemadmin
Exchange Online has broken almost every single month
One of those things that keeps surprising me is the general impression that moving email to Microsoft's cloud isn't a massive business risk. I hear all the time that people have "never experienced an outage".
If you look at Bleeping Computer's posts tagged with Exchange Online, it's pretty much monthly that Microsoft fails to correctly let people send blurbs of text to other people across the Internet: https://www.bleepingcomputer.com/tag/exchange-online/
https://redd.it/1r0cinv
@r_systemadmin
Our dev team is the weak point in our cyber security and they don't want to change
Tl;dr: the dev team is pushing back hard against giving up their privileges, which create a weak spot in our cyber security. Wondering how others handle this.
Our company does both manufacturing and software. About 150 desks, 45 of which are developers. We grew very quickly in the past few years, roughly 10x in size. This meant IT only became a thing when the dev team already had their own Linux devices with superuser access, a single shared password for the file shares, etc.
Last year I got the responsibility to streamline IT. I don't have a degree in IT, but I just became the 'sysadmin' because I was the only one taking on responsibility and answering questions about IT.
I worked diligently with an MSP to get everything in order: backups, redundancy, password policy, password manager, asset management, Intune, CA, standardized on- and offboarding, etc.
This year we came to the point where we wanted a clear view of the road ahead, so I made a cyber roadmap. We identified one major cyber security risk: our Linux endpoints are (basically) unmanaged. No endpoint protection, no encryption, full permissions, shared passwords, no patches or updates. And almost no options for managing them, except maybe by combining 5+ tools.
Looking at alternatives, a Unix OS seems to be a must for some AI/ML tools. And we have on-prem software that only runs on Windows, which some of the developers need in their workflow. So that left me with:
- Mac + Azure Virtual Desktop
- Windows + WSL
I've been dropping hints about the change that needs to happen, and that seems to have rubbed some people the wrong way. Some of the team members appear to have exaggerated this, claiming we want to force them onto Windows only.
I got approval for a one-desk pilot, but even setting that up got me some snarky comments. I feel like I'm walking a thin line. Management understands the need for security but also doesn't want to scare away our valuable dev team (and neither do I). I still have the green light, but it feels like it's turning orange.
What would you guys do?
https://redd.it/1r0eadi
@r_systemadmin
Weekly 'I made a useful thing' Thread - February 13, 2026
There is a great deal of user-generated content out there, from scripts and software to tutorials and videos, but we've generally tried to keep that off of the front page due to the volume and as a result of community feedback. There's also a great deal of content out there that violates our advertising/promotion rule, from scripts and software to tutorials and videos.
We have received a number of requests for exemptions to the rule, and rather than allowing the front page to get consumed, we thought we'd try a weekly thread that allows for that kind of content. We don't have a catchy name for it yet, so please let us know if you have any ideas!
In this thread, feel free to show us your pet project, YouTube videos, blog posts, or whatever else you may have and share it with the community. Commercial advertisements, affiliate links, or links that appear to be monetization-grabs will still be removed.
https://redd.it/1r3laua
@r_systemadmin
Patch Tuesday Megathread (2026-02-10)
Apologies, y'all - We didn't get the 2026 Patch Tuesday threads scheduled. Here's this month's thread temporarily while we get squared away for the year.
Hello r/sysadmin, I'm ~~u/automoderator~~ err, u/kumorigoe, and welcome to this month's Patch Megathread!
This is the (mostly) safe location to talk about the latest patches, updates, and releases. We put this thread into place to help gather all the information about this month's updates: What is fixed, what broke, what got released and should have been caught in QA, etc. We do this both to keep clutter out of the subreddit, and provide you, the dear reader, a singular resource to read.
For those of you who wish to review prior Megathreads, you can do so here.
While this thread is timed to coincide with Microsoft's Patch Tuesday, feel free to discuss any patches, updates, and releases, regardless of the company or product. NOTE: This thread is usually posted before the release of Microsoft's updates, which are scheduled to come out at 5:00PM UTC. Except today, because... 2026.
Remember the rules of safe patching:
Deploy to a test/dev environment before prod.
Deploy to a pilot/test group before the whole org.
Have a plan to roll back if something doesn't work.
Test, test, and test!
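A small sketch of verifying that the month's update actually landed on the pilot group before widening the ring, assuming remote WMI access from the admin box; the host names and KB number are hypothetical:

import subprocess

# Hypothetical pilot hosts and this month's (hypothetical) cumulative update.
PILOT_HOSTS = ["pilot-01", "pilot-02"]
KB = "KB5034763"

for host in PILOT_HOSTS:
    # Get-HotFix lists installed updates; -ComputerName queries over remote WMI.
    result = subprocess.run(
        ["powershell", "-NoProfile", "-Command",
         f"Get-HotFix -Id {KB} -ComputerName {host}"],
        capture_output=True, text=True,
    )
    status = "installed" if KB in result.stdout else "MISSING"
    print(f"{host}: {KB} {status}")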
https://redd.it/1r1hz0s
@r_systemadmin
PSA: Visual Studio (MSDN) subscriptions don't get license keys or Azure credits anymore
Microsoft has quietly changed their benefits.
No more ISOs and license keys for Windows Server, Windows client, Office, or any of their other on-premises products.
Download ISOs and keys while you can.
And Azure credits? They'll still be there, kinda. Now pooled centrally. Not sure yet how they're awarded.
Are you rocking a homelab? Did you want to test some Configuration Manager (SCCM) edge cases? Do you have an Entra and Intune tenant with M365 licenses? Did you want to showcase some awesome solution you created?
Well, Microsoft says fuck you, pay us for more licenses.
> Azure credits are now delivered through the partner program benefit packages at the organization level, rather than being bundled with individual IDE licenses. This pooled model enables partners to plan, share, and apply Azure credits across teams and projects more effectively, reducing unused credits and improving overall utilization.
> Legacy on-premises software downloads and transferable product keys (such as Windows, Office, and server products) are no longer included with Partner Program developer benefits. These products remain available through appropriate Microsoft licensing channels.
> Legacy developer tools that are no longer aligned with modern, cloud-first development workflows have been retired in favor of current tools, services, and learning resources.
https://learn.microsoft.com/en-us/partner-center/benefits/mpn-benefits-visual-studio#whats-changed
https://redd.it/1r4t9fu
@r_systemadmin
Does the Highest Ranking IT Person in Your Company Report to the CEO?
Do you think this matters in how IT is viewed and treated at your company?
https://redd.it/1r4jn1s
@r_systemadmin
How to approach SSL certificate automation in this environment?
We've been tasked with figuring out a way to automate our SSL certificate handling. Yes, I know we're at least 10 years late. However, for reasons I'll detail below, I don't believe any sane solution exists that fits our requirements.
Our environment
- ~700 servers, ~50/50 mix of Windows / Linux
- A number of different appliances (firewalls, load balancers etc)
- ~150 different domains
- Servers don't have outbound internet connectivity
- nginx, apache, IIS, docker containers, custom in-house software, 3rd party software
- We also use Azure and GCP and have certificates in different managed services there
- We require Extended Validation due to some customer agreements, meaning Let's Encrypt is out of the question and we need to turn to commercial service providers with ACME support
So far we have managed certificate renewals manually. Yes, it's dumb and takes time. Given the tightening certificate validity periods, we're now looking to switch to ACME-based automation. I've been driving myself insane thinking about this for the last few weeks.
The main issue we face is that we can't just set up certbot / any other ACME client on the servers using the certificates themselves, for multiple reasons:
- A large number of our services run behind load balancers, and the load balancers perform HTTP -> HTTPS redirects with no way to configure exceptions. This means our servers can't use the HTTP-01 ACME challenge.
- Our servers have no outbound internet access, meaning we can't reach our DNS provider's API for the DNS-01 challenge, for example.
- Even if we could, we have ~150 domains and our DNS provider doesn't offer per-zone permission management. That means all of our servers would have DNS edit access to all of our domains, which is a recipe for disaster if any of them gets breached. So client-side ACME + DNS-01 is out of the question as well.
Given that our servers can't utilize HTTP-01 or DNS-01 ACME challenges, the only viable option seems to be to set up a centralized certificate management server which loops through all of our certificates and re-enrolls them with ACME + DNS-01 challenge. This way we can solve certificate acquisition.
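A rough sketch of what that central renewal loop could look like, assuming certbot on the management server, a commercial ACME CA (directory URL hypothetical), and hypothetical hook scripts that create and remove the _acme-challenge TXT records via the DNS provider's API (commercial ACME endpoints typically also require external account binding, which certbot exposes as --eab-kid / --eab-hmac-key):

import subprocess

# Hypothetical inventory: one entry per certificate the central server manages.
DOMAINS = ["app.example.com", "portal.example.com"]

# Hypothetical ACME directory URL of the commercial CA.
ACME_SERVER = "https://acme.example-ca.com/directory"

for domain in DOMAINS:
    # The hook scripts (placeholders) talk to the DNS provider's API, so only
    # this one box ever holds the DNS credentials.
    subprocess.run(
        ["certbot", "certonly",
         "--manual", "--preferred-challenges", "dns",
         "--manual-auth-hook", "/opt/certmgmt/dns-add.sh",
         "--manual-cleanup-hook", "/opt/certmgmt/dns-del.sh",
         "--server", ACME_SERVER,
         "--non-interactive", "--agree-tos", "-d", domain],
        check=True,
    )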
If we go the route of a centralized certificate management server, we then need to figure out how to distribute the certificates to the clients. One possibility would be a push-based approach with Ansible, for example. However, we don't really have the infrastructure for that. Our servers don't have centralized user management in place, and creating local users for SSH / WinRM connections is quite the task, given that the user accounts' permissions would have to be tightened. We also run into the issue that, especially on Linux, we use such different distributions from different eras that there isn't a single Ansible release which would work with the different Python versions across our server fleet. Plus, a push-based approach would make the certificate management server a very critical piece of infrastructure: if an attacker got hold of it, they could easily gain local access to all of our servers through it. So a push-based approach isn't preferable.
If we look at pull-based distribution mechanisms, then we require server-specific authentication, since we want to limit the scope of a possible breach to as few certificates as possible: every server should only have access to the certificates it really needs. For this permission model, probably the best-suited choice would be SFTP. It's supported natively by both Linux and Windows and allows keypair authentication. This creates some annoying workflows of "create a user account per client server on the certificate management server with an accompanying chroot jail + permission shenanigans", but that's doable with Ansible, for example. In this case I imagine we'd symlink the necessary certificate files into the chrooted server-specific SFTP directories, and clients would poll the certificate management server for new certificates via cron jobs / scheduled tasks. OK, this seems doable, albeit annoying.
Then we come to handling the client-side automation. OK, let's imagine we have the cron jobs / scheduled tasks polling for new certificates from the certificate management server. We'd also need accompanying scripts for handling service restarts for the services utilizing these certificates. Maybe the poller script should invoke the service restart scripts when it detects that a new version of any of the certificate files is present on the cert mgmt server and downloads them.
Then we come to the issue that some servers may have multiple certificates and/or multiple services utilizing these certificates. One approach would be to have a configuration file with a mapping table along the lines of "certificate x is used by services y and z, certificates n and m are used by service i", etc. However, that sounds awful; maintaining such mapping tables does not spark joy. The alternative would be to just say "fuck it, when ANY certificate has changed, just run ALL of the service reload scripts". That way we wouldn't need any cert -> service mapping tables, but in some cases it'd lead to unnecessary downtime for specific services where a reload causes application downtime. Maybe that's an acceptable outcome, not sure yet.
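A minimal sketch of that pull side, assuming paramiko is available on the Python-capable clients, a chrooted per-server SFTP account on the cert management server (hostname and key path hypothetical), and a hypothetical certs.json mapping of certificate file to reload scripts:

import hashlib, json, subprocess
from pathlib import Path

import paramiko  # assumed installable on the clients with a modern Python

LOCAL_DIR = Path("/etc/cert-deploy")
# Hypothetical mapping, e.g. {"web.example.com.pem": ["/opt/reload/nginx.sh"]}
MAPPING = json.loads((LOCAL_DIR / "certs.json").read_text())

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

client = paramiko.SSHClient()
client.load_system_host_keys()
client.set_missing_host_key_policy(paramiko.RejectPolicy())
client.connect("certmgmt.example.internal", username="host01",
               key_filename="/etc/cert-deploy/id_ed25519")
sftp = client.open_sftp()

for name, reload_scripts in MAPPING.items():
    remote = sftp.open(name).read()
    local = LOCAL_DIR / name
    if not local.exists() or sha256(local.read_bytes()) != sha256(remote):
        local.write_bytes(remote)  # new or changed certificate
        for script in reload_scripts:
            subprocess.run([script], check=False)  # bounce dependent service

sftp.close()
client.close()

With the mapping file in place, the "reload everything" fallback is a one-line change: run the union of all the scripts whenever any file changes.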
But the biggest problem I see with this approach is actually managing the client-side automation scripts. As described earlier, we can't really rely on Ansible to deploy these scripts to target hosts due to the Python version mismatches across our fleet. But I'd still want some centralized way to deploy new versions of the client scripts, since it's not hard to imagine edge cases popping up every now and then that require us to push a new version of, say, an IIS reload script across the fleet. It'd also be nice to have a single source of truth telling us exactly where the different service reload scripts have been deployed (just relying on documentation for this will result in bad times).
So to combat that problem... more SFTP polling? This is where the whole thing starts to feel way too hacky. The best answer I've come up with is to also host the client-side scripts on the certificate server and deploy them to clients via the same symlink + client-side poller setup. That way we can see on the certificate server which servers use which service reload scripts, and updating them en masse is easy. But this also feels like something we really should not do...
Initially I thought we should just save the certificates to a predefined location like /etc/cert-deploy/ and configure all services to read their certificates from there, rather than deploying the certificates to custom locations on every server. However, I now realize that brings permission/ownership problems. How does the poller script know which user the certificates should be chowned to? It doesn't. So either we'd need local "ssl-access" groups to which we'd add the various generic www-data, apache, nginx, etc. accounts and chgrp the cert files to that group, or the service reload scripts would have to re-copy the certs to another location and chown them to the account they know the certs will be used by. Or another mapping table for the poller script. Yay, more brittle complexity regardless of the choice.
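If the group route wins, the poller never needs to know about service accounts at all. A minimal sketch, assuming a pre-created local group (the ssl-access name is hypothetical) that the www-data/apache/nginx accounts have already been added to:

import os, stat, shutil
from pathlib import Path

CERT_DIR = Path("/etc/cert-deploy")

for cert in CERT_DIR.glob("*.pem"):
    # root keeps ownership; the shared group only gets read access, so no
    # cert -> service-account mapping is needed on the poller side.
    shutil.chown(cert, user="root", group="ssl-access")
    os.chmod(cert, stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP)  # mode 0640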
At this point, if we go with an approach like this, I'd also want some observability into the whole thing. Some nice UI showing when the clients last polled their certificates: "Oh, this server hasn't polled its certificates for 10 days, what's up with that?" etc. Parsing that information from SFTP logs and displaying it on some web server is of course doable, but once again one starts to ask oneself, "are we out of our minds?".
I even went as far as drafting up a Python web server which would replace the whole SFTP-based approach. Instead, clients would send requests to the application, providing a unique per-client authentication token which must match the client token stored in a database. The application would then let the client download the certificates and service reload scripts through it. It'd make showing client connection statistics easier, etc. However, my coworker thankfully managed to convince me that this is a really bad idea from both a maintainability and an auditing POV.
So, to sum it all up: how should this problem actually be tackled? I'm at a loss. All solutions I can come up with seem hacky at best and straight-up horrible at worst. I can't imagine we're the only organization battling these woes, so how have others in a similar boat overcome them?
https://redd.it/1r4ttqo
@r_systemadmin
Google to Microsoft
I'm in the midst of migrating our Google Workspace to Microsoft. Our CEO sent the directive, and I have my own feelings about it, but whatever. Let me lay the situation out.
Our Google Workspace is connected via Okta SSO, so users go through Okta to get to their Gmail, Drive, Calendar, etc.
We moved the authoritative MX and TXT records from Google to Microsoft several hours ago, and we're hitting an issue when testing sign-in to Outlook: when I put in the email address, it first asks if I want to add a Gmail inbox to Outlook rather than adding it natively as an Exchange inbox. When you hit continue, it redirects to Okta to sign in, and then loads it as a Gmail inbox in the Outlook client.
My question is this: is it doing this because Okta claims the SSO, and once inside Okta, the Google Workspace assignment tile mistakenly points it to Google? We didn't delete the accounts in Google; we just re-pointed the records away from Google to Microsoft.
https://redd.it/1r4wlnq
@r_systemadmin
What's the most “obviously not the issue” root cause that actually was the issue?
Had a recent incident where everything pointed in one direction and the logs were screaming about it.
Naturally we chased that signal for hours. Packet captures looked fine. Monitoring showed nothing unusual. Hardware checks passed. It all looked clean.
Turned out the real issue was something we had mentally ruled out early because it "couldn’t possibly be that."
No fancy exploit. No obscure kernel bug. Just something simple that didn’t match the noise we were seeing.
It got me thinking how often confirmation bias creeps into troubleshooting, especially under pressure.
What’s the most convincing false lead you’ve chased in production before realizing the real culprit was something you dismissed early?
https://redd.it/1r55pv6
@r_systemadmin
Getting into IT before everything as a service
Does anyone else feel like those who started in IT pre cloud, before everything as a service, are way more skilled than those who did not?
My point being: if you got into IT when you had to take care of your own on-prem hardware and your own applications, you had to know how to troubleshoot. You had to know way more, learn way more, and you couldn't rely on AI. This has led me to have a very strong foundation that I can now use while working in the cloud and everything-as-a-service. But I never would have gotten this experience if I had started in 2025.
Now if something is down, simply blame the cloud provider and wait for them to fix it.
This leads to new IT workers not being the go-getters and self-starters you used to have to be to succeed in IT.
Stack Overflow, Reddit, Microsoft forums, hell even Quora for an answer sometimes.
We are the ones who make shit happen and don’t fill our days with useless meetings and bullshit.
Every other department is full of bullshit.
https://redd.it/1r47jab
@r_systemadmin
sporadic authentication failures occurring in exact 37-minute cycles. all diagnostics say everything is fine. i'm losing my mind.
yall pls help me
environment:
4 DCs running Server 2019 (2 per site, sites connected via 1Gbps MPLS)
\~800 Windows 10/11 clients (22H2/23H2 mix)
Azure AD Connect for hybrid identity
all DCs are GCs, DNS integrated
functional level 2016
for the past 3 months we've been getting tickets about "random" password failures. users swear their password is correct, they retry immediately, it works. this affects maybe 5-10 users per day across both sites.
i finally got fed up and started logging everything so i pulled kerberos events (4768, 4769, 4771), correlated timestamps across all DCs and built a spreadsheet.
the failures occur in exact 37-minute cycles.
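(for anyone curious, the spreadsheet trick boils down to folding timestamps modulo candidate periods; a rough sketch, assuming the 4771 events are exported to a CSV with an epoch-seconds column - file and column names hypothetical:)

import csv
from collections import Counter

# Hypothetical export: one epoch-seconds timestamp per 4771 failure event.
with open("failures.csv") as f:
    ts = [int(row["epoch"]) for row in csv.DictReader(f)]

# Fold timestamps modulo each candidate period; a real cycle piles the
# residues into a narrow band instead of spreading them uniformly.
for period in range(300, 3700, 60):  # 5 min .. ~61 min, 1-minute steps
    residues = [t % period for t in ts]
    buckets = Counter(r // 60 for r in residues)  # 1-minute bins
    if max(buckets.values()) > 0.8 * len(ts):  # most events in one bin
        print(f"candidate period: {period} s ({period / 60:.1f} min)")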
here's what i've ruled out:
time sync: all DCs within 2ms of each other, w32tm shows healthy sync to stratum 2 NTP
replication: repadmin /showrepl clean, repadmin /replsum shows <15 second latency
kerberos policy: default domain policy, 10 hour TGT, 7 day renewal, 600 min service ticket (standard)
DNS: forward/reverse clean, scavenging configured properly, no stale records
DC locator: nltest /dsgetdc returns correct DC every time
secure channel: Test-ComputerSecureChannel passes on affected machines
clock skew: checked every affected workstation, all within tolerance
GPO processing: gpresult shows clean processing, no CSE failures
37 minutes doesn't match anything i can find:
not kerberos TGT lifetime (10 hours = 600 minutes)
not service ticket lifetime (600 minutes)
not GPO refresh (90-120 minutes with random offset)
not machine account password rotation check (ScavengeInterval = 15 minutes by default)
not the netlogon scavenger thread (900 seconds = 15 minutes)
not OCSP/CRL cache refresh (varies by cert)
not any known windows timer i can find documentation for
the pattern started the exact day we added DC04 to the environment. i thought okay, something's wrong with DC04. i decommed it, migrated FSMO roles away, demoted it, removed DNS records, cleaned up AD metadata...the 37-minute cycle continued.
i'm three months into this like i've run packet captures, wireshark shows normal kerberos exchanges. the failure events just happen, and then don't happen, in a perfect 37-minute oscillation.
microsoft premier support escalated to the backend team twice. first response was "have you tried rebooting the DCs?" second response hasn't come in 6 weeks.
at this point i'm considering:
1. the universe is broken
2. i'm in a simulation and the devs are testing my sanity
3. there's some timer or scheduled task somewhere i haven't found
4. something in our environment is doing something every 37 minutes that affects auth
has anyone seen anything like this? any obscure windows timer that runs at 37-minute intervals? third party software that might do this?
i will pay money at this point srs not joking.
EDIT: SOLVEDDDDDDD
it was SolarWinds.
after someone mentioned backup infrastructure, i went down the storage rabbit hole. correlated Pure snapshot times against my failure timestamps - close but not exact. the 7-minute offset wasn't consistent enough, but it got me thinking about what ELSE runs on schedules that i don't control.
our monitoring team (separate group, different building, we don't talk much) uses SolarWinds SAM. i asked them to pull the probe schedules. there's an "Active Directory Authentication Monitor" probe. it performs a real LDAP bind + kerberos auth test against a service account to verify AD is responding.
the probe runs every 37 minutes. why 37 minutes? because years ago some admin set it to 2220 seconds thinking that's roughly every half hour but offset so it doesn't collide with our other probes. nobody documented it and that admin left in 2019.
why did it start when DC04 was added? because DC04's IP got added to the probe's target list automatically via their autodiscovery. the probe was already running against DC01-03 but the auth requests were
being load balanced and the brief lock wasn't noticeable. adding a fourth target changed the timing juuust enough that the probe's auth attempt started colliding with real user auth attempts on the same DC at the same millisecond.
why did it persist after DC04 removal? because the probe targets were never cleaned up. it was still trying to auth against DC04's old IP, timing out, then immediately hitting another DC - which shifted the timing window but kept the 37-minute cycle.
disabled the probe. cycle stopped immediately. haven't had a single 4771 in 72 hours. i just mass-deployed kerberos debug logging, built correlation spreadsheets, spent hours in wireshark, and mass-ticketed microsoft premier support twice to resolve a problem caused by a misconfigured monitoring checkbox.
this job is a meme.
thanks everyone for the suggestions - especially the lateral thinking about backup/storage timing. that's what got me looking at things that run on schedules that aren't mine.
https://redd.it/1r4b9qe
@r_systemadmin
our 'ai transformation' cost seven figures and delivered a chatgpt wrapper
six months of consulting, workshops, a 47 page roadmap deck. the first deliverable just landed on our desks for testing.
it's chatgpt with our company logo. literally a system prompt that says 'you are a helpful assistant for [company name]'. same hallucinations, same limitations, except now it confidently makes up internal policies that don't exist and everyone in leadership thinks the issue is that we need to 'prompt engineer better'.
the consultants are already pitching phase two.
https://redd.it/1r3wgjt
@r_systemadmin
"Best" printer manufacturer
Which printer manufacturer have you had the best experiences with for use in your company?
https://redd.it/1r4gr7w
@r_systemadmin
ASUS shut down their support portal in Germany and Austria
This is just terrible, imo. A court in Munich ruled that ASUS violated Nokia patents, and now their support portal is inaccessible. Should have saved all the drivers for company equipment when I had the chance. I need drivers for a few boards and there's just no way to grab them directly from ASUS (except via VPN, which would be a last resort).
One thing left to say: WTF.
https://redd.it/1r5bd3a
@r_systemadmin
pstop: terminal-based system monitor for Windows (htop clone with tree view, process kill, I/O monitoring)
Built a terminal system monitor for Windows that works like htop on Linux.
Why:
Task Manager is fine for GUI, but if you manage Windows servers or spend time in the terminal, having htop available makes life simpler. pstop runs in any terminal with ANSI support.
Install:
cargo install pstop
What it does:
- Per core CPU monitoring with usage bars
- Memory/Swap/Network bars
- Process table with sort by any column
- Tree view (process hierarchy)
- I/O tab (disk read/write rates per process)
- Network tab
- Kill process (F9), priority (F7/F8), CPU affinity
- Search (F3), filter (F4)
- Persistent config
- ~1 MB single binary, zero dependencies
Single ~1 MB binary. No installer. No runtime dependencies. Just run it.
GitHub: https://github.com/marlocarlo/pstop
https://redd.it/1r5evtz
@r_systemadmin
MDU Routers
Anyone out there doing MDU setups? Currently doing this for several properties using Ruckus APs, Ruckus SmartZone, and an off-prem Windows DHCP server. It's time to move away from this setup, and I'm curious what the recommendation might be for handling up to 100 VLANs per site and a DHCP server per subnet (just handing out about 30 hosts per VLAN).
And no, please don’t mention Nomadix.
Edit: Added clarity on the DHCP servers.
https://redd.it/1r5hpsf
@r_systemadmin