Reddit Sysadmin – Telegram
Why does identity in the Microsoft stack still feel so scattered?

Entra ID roles here.

Azure IAM there.

Intune permissions somewhere else.

Enterprise app settings in another menu.

CA policies in their own world entirely.

Every time I try to do a clean audit, I end up clicking through 10 different portals just to understand who can do what.

Is this just the permanent state of Microsoft cloud, or have any of you actually found a sane way to centralize identity governance?

https://redd.it/1p66n1h
@r_systemadmin
Genuinely curious - would you use AI more if your data actually stayed private?

Hey everyone, genuine and curious question here.

I've been talking to a bunch of people lately about AI at work - ChatGPT, Claude, Copilot, all that stuff. And I keep hearing the same thing over and over: "I'd use it way more, but I can't put client data into it" or "my compliance team would kill me."

So what happens? People either don't use AI at all and feel like they're falling behind, or they use it anyway and just... hope nobody finds out. I've even heard of folks spending 20 minutes scrubbing sensitive info before pasting anything in, which kind of defeats the whole point.

I've been researching this space trying to figure out what people actually want, and honestly I'm a bit confused.

Like, there's the self-hosting route (I saw a post about self-hosting services went viral recently). Full control, but from what I've seen the quality just isn't there compared to GPT-5 or Claude Opus 4.5 (which just came out and is damn smart!). And you need decent hardware plus the technical know-how to set it up.

Then there's the "private cloud" option - running better models but in your company's AWS or Azure environment. Sounds good in theory but someone still needs to set all that up and maintain it.

Or you could just use the enterprise versions of ChatGPT and hope that "enterprise" actually means your data is safe. Easiest option but... are people actually trusting that?

I guess I'm curious about two different situations:

If you're using AI for personal stuff - do you even care about data privacy? Are you fine just using ChatGPT/Claude as-is, or do you hold back on certain things?

If you're using AI at work - how does your company handle this? Do you have approved tools, or are you basically on your own figuring out what's safe to share? Do you find yourself scrubbing data before pasting, or just avoiding AI altogether for sensitive work?

And for anyone who went the self-hosting route - is the quality tradeoff actually worth it for the privacy?

I'm exploring building something in this space but honestly trying to figure out if this is a real problem people would pay to solve or if I'm just overthinking it.

Would love to hear from both sides - whether you're using AI personally or at work.

Thanks :)

https://redd.it/1p65x1m
@r_systemadmin
Data leakage is happening on every device, managed or unmanaged. What does mobile compliance even mean anymore? Be real, all our sensitive company data and personal info we shouldn’t type into AI tools is already there...

We enforce MDM.
We lock down mobile policies.
We build secure BYOD frameworks.
We warn people not to upload internal data into ChatGPT, Perplexity, Gemini, or whatever AI tool they use.
Emails, internal forms, sensitive numbers, drafts, documents... everything gets thrown into these AI engines because it's convenient.

The moment someone steals an employee’s phone…
or their laptop…
or even just their credentials…
all that AI history is exposed.

If this continues, AI tools will become the new shadow IT risk no one can control, and we're not ready.
And because none of this is monitored, managed, logged, or enforced...
we will never know what leaked, where it ended up, or who has it.
How are you handling mobile & AI data leakage?
Anything that actually works?

https://redd.it/1p6absr
@r_systemadmin
Windows DNS forwarders validation error

Hi!

I have a DC that is also a DNS server. I'm trying to set up forwarders to dns1.fortiguard.net. When I enter the DNS server's IP address, 96.45.45.45, the GUI shows: "An unknown error occurred while validating the server."

I checked name resolution with nslookup from the DC:

nslookup google.hu 96.45.45.45 succeeds. I also checked with PowerShell:

Test-NetConnection 96.45.45.45 -Port 53

That succeeds too.

Why does the GUI show a validation error?


Edit: The server's operating system is Windows Server 2022. I tried the same thing on Windows Server 2019 and 2016 on the same network, and validation succeeds there. Is this a Windows Server 2022 bug?
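For anyone wanting to rule out network problems independently of both nslookup and the GUI, a hand-rolled query can confirm the forwarder actually answers DNS on UDP 53. A minimal sketch (stdlib only; the server IP and query name are the ones from the post, and the packet layout follows RFC 1035):

```python
import socket
import struct

def build_dns_query(name: str, txid: int = 0x1234) -> bytes:
    """Build a minimal DNS query packet for an A record."""
    # Header: ID, flags (RD=1), QDCOUNT=1, ANCOUNT/NSCOUNT/ARCOUNT=0
    header = struct.pack(">HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: length-prefixed labels, terminated by a zero byte
    qname = b"".join(bytes([len(p)]) + p.encode() for p in name.split("."))
    # QTYPE=1 (A record), QCLASS=1 (IN)
    return header + qname + b"\x00" + struct.pack(">HH", 1, 1)

def forwarder_answers(server: str, name: str, timeout: float = 2.0) -> bool:
    """Return True if the server sends back a matching DNS response."""
    query = build_dns_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.sendto(query, (server, 53))
        try:
            reply, _ = s.recvfrom(512)
        except socket.timeout:
            return False
    # Same transaction ID, and the QR bit set (it's a response)
    return reply[:2] == query[:2] and bool(reply[2] & 0x80)
```

Run `forwarder_answers("96.45.45.45", "google.hu")` from the DC; if it returns True while the GUI still errors out, the problem is the 2022 validation logic, not the network path.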

https://redd.it/1p69vr5
@r_systemadmin
The original "Vibe Coding" wasn't AI. It was VisiCalc (1979)

I've been seeing the term "Vibe Coding" thrown around a lot lately regarding AI tools, and it sent me down a bit of a history rabbit hole.

I went back and looked at the launch of VisiCalc in 1979 and James Martin’s 1982 book Application Development Without Programmers. The parallels to what we are dealing with right now are actually kind of insane.

Back then, IT departments had multi-year backlogs. Managers started buying Apple IIs with their typewriter budgets just to run VisiCalc so they could bypass IT. That was the birth of "Shadow IT."

Everyone thinks macros were the start of user-gen coding, but VisiCalc didn't even have macros. It was just the sheer ability for a user to define logic without asking permission that broke the dam.

I wrote up a deeper dive on this, but the conclusion I came to is that we're trying to solve this the wrong way (again). In the 80s, IT tried to ban PCs. It failed. Then we tried to ignore spreadsheets. That failed. Eventually, we just accepted them.

We're currently in the "ban/ignore" phase with AI/Low-code tools. I think the only way out is what I'm calling "Governed Sandboxes"—basically giving users "IT-like" powers but inside a walled garden where we can still audit the data.

Curious if anyone here was around for the Lotus/Excel wars, or if you guys are seeing the exact same "Shadow IT" patterns popping up with things like Copilot or Power Platform right now?

https://redd.it/1p6ecnd
@r_systemadmin
Org goes all shadow IT

Anyone else find their org going all shadow IT? I get pulled in to fix stuff non-stop and never included from the start. Ready to jump off a roof.

https://redd.it/1p6eu8l
@r_systemadmin
Memory - Fair Warning

Folks, we've seen a few posts regarding Memory availability and pricing over the last week or two and just a quick update from what we are seeing on the VAR side.

Memory is becoming non-existent slowly, but surely.
The pricing since just August has more than doubled.
Anticipate system costs going up from here if they haven't already.

Dell, for example, will not sell certain modules unless it's part of a system build. I've seen this with servers and laptops so far.

3rd parties like Axiom/Kingston/Crucial are basically running out of stock.

I don't believe there's a good answer to "buy now" vs. "wait it out"; this is just what to expect if any of your partners come back with exceptionally high pricing or long lead times. Also expect your ETAs to be pushed out at any time.

Just fair warning friends.

https://redd.it/1p6fq4h
@r_systemadmin
Who's working on their last 10 years

Who's working on their theoretically last 10 years (retire at 65?), and what are your thoughts on your current position and future in the industry?

https://redd.it/1p6j5rr
@r_systemadmin
APC UPS eats up batteries

Hello, please let me know if this is the wrong sub.

SMB infra here. We bought a Smart-UPS SRT 8000 in 2017 along with 2 battery packs in addition to the internal one that comes with the UPS. Each battery pack has two cartridges, and each cartridge has 2 cells in it. Over the last three years we have had to replace both cartridges on one of the add-on battery packs twice: the first time the cartridges lasted a year, and the second time almost 2 years. We've also had to replace cartridges on the other add-on battery pack, but much less frequently. The curious thing is that when the batteries are first installed, they'll say the "Predicted Replacement Date" is like 4-5 years out.

Last week I got one of the alert messages saying that one of the cartridges in the problematic battery pack needs to be replaced soon (mid-December). Then this week, after the UPS ran a scheduled self-test, it came back saying that 3 cartridges in total needed replacing, one in each of the 3 battery packs. I am also getting messages saying that "The battery power is too low to support the load; if power fails, the UPS will be shut down immediately."

I'm curious, has anyone seen this behavior, where cartridges need replacing every 1 to 2 years? Is there a proper way to replace these that I am missing? Should I be replacing both cartridges in each pack at the same time, instead of just the one the UPS says needs replacing?

Also, I noticed that when the self-test ran I got messages saying "The battery power is too low to support the load; if power fails, the UPS will be shut down immediately." I know that the self-test is supposed to drain the battery to a certain level, but I never received those errors before.

What I don't want to happen is that we replace all 3 of these cartridges now (about $3K) and a year down the road we are in the same boat again without actually fixing what the real problem may be. I already have enough issues justifying other necessary IT purchases to management.

Any suggestions or insight on what may be going on would help a lot.

https://redd.it/1p6j516
@r_systemadmin
Anyone using Starlink as Internet backup?

Currently, we have a single Internet service for our office. 1000 meg download with a block of 15 static public IPs.

We are now looking into a redundant Internet service. Fiber is not yet fully available in our area; there's talk of early to mid 2026 though.

Anyway, anyone using Starlink as a backup internet service? If so, have you noticed if the connection is solid? Also, do they offer static IPs for businesses?

https://redd.it/1p6m48y
@r_systemadmin
EU customer wants a DPA before trial. Is GDPR technically unavoidable now?

We’re US only (7 ppl) with only US customers so far

Yesterday a potential client from Britain told us they need a signed DPA and to confirm GDPR compliance before they even test the product

My initial perception of GDPR was that it's something to deal with when we intentionally launch in Europe, not right now when a single European signs up (especially when they're treating it as non-negotiable). From what I've read, compliance includes DPAs, subprocessor lists, SCCs, and data mapping, which all together feels like too much to handle when the EU isn't even your current primary market.

Do small teams get ahead of this, or only do it once they actually close EU revenue? I don't want to just ignore it if we're LEGALLY required to comply, but we also can't afford to spend the next two months on nothing but compliance work.

https://redd.it/1p6vf9y
@r_systemadmin
Can I reserve/block 25 GB for Windows Updates?

Hi,

At work we sometimes have the problem that users fill every GB on their system drive. It does not matter if they have 256 GB, 512 GB or 1 TB: the drive is full and the Feature Upgrade cannot be installed.

In our SCCM task sequence we have some cleanup tasks (removing orphaned MSI packages, clearing the Temp folder, deleting the Windows Search index, etc.), but sometimes it is still not enough.

So my question is: can we reserve space up front that will be used only for Windows updates?
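Windows does ship a built-in Reserved Storage feature for exactly this (togglable with `DISM /Online /Set-ReservedStorageState /State:Enabled`, if I remember right), but it reserves roughly 7 GB, not 25. A complementary approach is a pre-flight check in the task sequence that fails loudly before the upgrade even starts. A minimal sketch (the 25 GB threshold and drive path are assumptions to adapt):

```python
import shutil

# Minimum free space we want before letting a Feature Upgrade start.
REQUIRED_FREE_GB = 25

def free_space_gb(path: str = "C:\\") -> float:
    """Free space on the drive holding `path`, in gigabytes."""
    return shutil.disk_usage(path).free / 1024**3

def precheck(path: str = "C:\\") -> bool:
    """Return False (and log why) when the drive is too full to upgrade."""
    free = free_space_gb(path)
    if free < REQUIRED_FREE_GB:
        print(f"BLOCKED: only {free:.1f} GB free on {path}, need {REQUIRED_FREE_GB} GB")
        return False
    print(f"OK: {free:.1f} GB free on {path}")
    return True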

Thanks

https://redd.it/1p70qjo
@r_systemadmin
How can we better protect ourselves from the recent npm supply chain attacks leaking secrets?

The recent wave of malware infecting hundreds of npm packages and leaking organizations' sensitive secrets on platforms like GitHub has shaken the developer community. These supply chain attacks exploit malicious post-install scripts and compromised maintainers, making it really challenging to trust the packages we depend on daily.

Many security best practices suggest disabling post-install scripts, implementing strict package version cooldowns, validating package provenance, and minimizing dependency trees. Yet, even with these, the leakage of secrets remains a critical risk, especially when malicious code executes inside containers or developer environments.

Has anyone explored or implemented strategies that go beyond traditional methods to reduce the attack surface within containerised or runtime environments? Ideally, approaches that combine minimal trusted environments with strong compliance and visibility controls could offer better containment of such threats. Curious to hear what the community is trying or thinking about as more organizations wrestle with these issues.
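One cheap layer on top of `ignore-scripts`: periodically audit which installed packages even declare auto-run lifecycle hooks, so new ones stand out in review. A minimal sketch, assuming a checked-out `node_modules` tree (the hook list covers the scripts npm runs automatically on install):

```python
import json
from pathlib import Path

# npm lifecycle hooks that run automatically during `npm install`
INSTALL_HOOKS = {"preinstall", "install", "postinstall", "prepare"}

def packages_with_install_scripts(node_modules: str):
    """Flag every installed package that declares an auto-run lifecycle script."""
    flagged = []
    for manifest in Path(node_modules).glob("**/package.json"):
        try:
            scripts = json.loads(manifest.read_text()).get("scripts", {})
        except (json.JSONDecodeError, OSError):
            continue
        hooks = INSTALL_HOOKS & scripts.keys()
        if hooks:
            flagged.append((manifest.parent.name, sorted(hooks)))
    return flagged
```

Diff the output against the previous run in CI; a dependency that suddenly grows a `postinstall` hook is exactly the signal these attacks produce.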

https://redd.it/1p6z3ar
@r_systemadmin
I’m tired of playing “where did this update go?”

Every sprint review turns into a hunt for missing updates. Devs update GitHub, PMs update Trello, leads update Google Sheets, and nothing matches. Half our delays come from misalignment, not actual coding issues. Is there anything that pulls GitHub info directly into the project boards and makes reporting automatic? I'm done manually chasing pull requests like they're stray cats

https://redd.it/1p71tuw
@r_systemadmin
Chainguard alternative?

hey anyone got cheaper (or free) alternatives to chainguard images that actually get rebuilt weekly with patches? chainguard is killing our budget and my manager is about to have a stroke over the invoice 😂

i just need tiny base images that stay mostly cve-free without costing a kidney. what are y’all using?

https://redd.it/1p765wt
@r_systemadmin
Has anyone ever actually fixed anything by updating drivers in Device Manager?

I’ve been in IT for 5 years now, and not once has “Search automatically for updated driver software” in Device Manager ever found any missing drivers. I get that it only pulls generic stuff and not the proper manufacturer drivers, but why this crap is still widely recommended as a first troubleshooting step is beyond me.

Yet I still try it every now and then out of pure desperation… only to confirm what I already know: it is never a solution. Has this ever actually solved anything for anyone?

https://redd.it/1p73k01
@r_systemadmin
FreeRADIUS in production: 10 practices that eliminated random delays and weird spikes

I manage FreeRADIUS in a real project (no sensitive details, of course) where it handles a significant flow of authentication and accounting requests.
In the early days we saw everything: random delays, ODBC stalls, unexpected request spikes, duplicate storms, and periodic “mystery slowdowns.”

After months of tuning, log analysis, and observation, these practices made the system far more stable and predictable.
Sharing them here — maybe useful to someone.



# 1. Database latency watchdog (every 5 seconds)

A tiny query like SELECT 1 through ODBC.
If latency goes above a threshold → log immediately.
Helps distinguish “DB is slow” from “RADIUS is slow.”
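The watchdog idea above boils down to timing a trivial round trip and logging when it crosses a threshold. A runnable sketch (sqlite3 stands in for the real ODBC connection here just to keep it self-contained; in production you'd pass a pyodbc connection and run this in a loop with a 5-second sleep, and the 50 ms threshold is an assumption to tune):

```python
import logging
import sqlite3
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("db-watchdog")

LATENCY_THRESHOLD_MS = 50.0  # tune to your environment

def probe_latency_ms(conn) -> float:
    """Time a trivial SELECT 1 round trip through the connection."""
    start = time.perf_counter()
    conn.execute("SELECT 1").fetchone()
    return (time.perf_counter() - start) * 1000

def watchdog_tick(conn) -> float:
    """One probe; log a warning the moment the DB path turns slow."""
    latency = probe_latency_ms(conn)
    if latency > LATENCY_THRESHOLD_MS:
        log.warning("DB probe slow: %.1f ms (threshold %.1f ms)",
                    latency, LATENCY_THRESHOLD_MS)
    return latency
```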

# 2. Proper ODBC pool tuning

These values worked extremely well:

min pool = 8
max pool = 32
connection lifetime = 3600
query timeout = 5–8 seconds
login timeout = 2 seconds

Without a lifetime limit, stale connections accumulate and eventually collapse the entire chain.

# 3. Duplicate-request control

We added a small duplicate counter + a soft-limit.
When a device floods identical Access-Requests, FreeRADIUS can behave strangely.
This made such issues instantly visible.
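The counter + soft-limit can be as small as a sliding window of timestamps per request key. A sketch of the idea (the window, limit, and the choice of key are assumptions; something like client IP + User-Name + packet ID usually identifies "the same" Access-Request):

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 10
SOFT_LIMIT = 5  # identical requests tolerated per window before flagging

_seen = defaultdict(deque)  # request key -> timestamps inside the window

def note_request(key, now=None) -> bool:
    """Record one request; return True once the key exceeds the soft limit."""
    now = time.monotonic() if now is None else now
    times = _seen[key]
    times.append(now)
    # Drop timestamps that have aged out of the window
    while times and now - times[0] > WINDOW_SECONDS:
        times.popleft()
    return len(times) > SOFT_LIMIT
```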

# 4. Log handling: only rotated .gz files

Never touch active logs.
Use logrotate → compress to `.gz` → process archives only.
Touching “live” RADIUS logs is an easy way to corrupt them silently.
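The "archives only" rule is easy to enforce in the processing code itself: logrotate compresses finished logs to `*.gz` while the active file keeps its plain name, so a `*.gz` glob naturally skips the live log. A small sketch:

```python
import gzip
from pathlib import Path

def iter_archived_lines(log_dir: str):
    """Yield lines from rotated .gz archives only - never the live log."""
    for archive in sorted(Path(log_dir).glob("*.gz")):
        # "rt" decodes text; errors="replace" survives the odd corrupt byte
        with gzip.open(archive, "rt", errors="replace") as fh:
            for line in fh:
                yield line.rstrip("\n")
```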

# 5. Weekly system-status snapshots

A single automated report containing:

RAM / SWAP usage
IO wait
Load average
SQL latency
ODBC pool state
log size growth
RADIUS response time

Week-to-week baselines make long-term patterns obvious.
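A snapshot like this doesn't need an agent; most of it is stdlib. A minimal sketch covering a few of the fields above (log directory path is a placeholder; SQL latency and pool state would come from your own probes, e.g. the watchdog from item 1):

```python
import os
import shutil
import time

def system_snapshot(log_dir: str = "/var/log/radius") -> dict:
    """Collect a small weekly baseline as a dict (append as JSON per week)."""
    load1, load5, load15 = os.getloadavg()
    disk = shutil.disk_usage("/")
    snap = {
        "taken_at": time.strftime("%Y-%m-%d %H:%M:%S"),
        "load_avg": {"1m": load1, "5m": load5, "15m": load15},
        "disk_free_gb": round(disk.free / 1024**3, 1),
    }
    if os.path.isdir(log_dir):
        # Total log size; week-over-week growth is the interesting number
        snap["log_bytes"] = sum(
            f.stat().st_size for f in os.scandir(log_dir) if f.is_file()
        )
    return snap
```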

# 6. RTT monitoring between nodes

Worth doing whether the servers sit in the same site or in different regions.
If two nodes show identical RTT spikes → it’s a systemic event, not a local issue.

# 7. Docker maintenance (if containerized)

We run FreeRADIUS in Docker, so we use:

cleaning overlay2 layers older than 7 days
truncating large container logs
weekly `docker system prune`
healthchecks + auto-restart

This removed several unexpected IO stalls.

# 8. Reject-peak detector

If rejects per second go above a threshold → log it as a separate event.
Helps detect anomalies in real time (DB slowdown, traffic bursts, etc.).
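A per-second bucket is enough for this detector. A sketch (the threshold of 20 rejects/second is an assumption; it fires once per peak second so one burst produces one event, not hundreds):

```python
import time

class RejectPeakDetector:
    """Count Access-Rejects per wall-clock second; flag bursts over a threshold."""

    def __init__(self, threshold_per_sec: int = 20):
        self.threshold = threshold_per_sec
        self._bucket_second = None
        self._count = 0

    def note_reject(self, now=None) -> bool:
        """Record one reject; return True the moment the current second's
        count reaches the threshold (exactly once per peak second)."""
        now = int(time.time() if now is None else now)
        if now != self._bucket_second:
            self._bucket_second, self._count = now, 0
        self._count += 1
        return self._count == self.threshold
```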

# 9. Accounting/session logs: gzip → archive

Never read or write active accounting files.
Compress → move → remove local copies once verified.
Keeps live directories clean and safe.

# 10. Lightweight RCA notes for every incident

5–6 lines:

timestamp
what happened
root cause
impact
fix
current state

This saved hours of analysis when something similar happened again.

# Result

After implementing all of this, random slowdowns dropped dramatically, and incident resolution time became much shorter.

If anyone wants it, I can share:

the system-status script
ODBC configs
logrotate templates
duplicate-request checker
my reject-peak detector
or the safe directory layout we use

Just ask.

https://redd.it/1p77epw
@r_systemadmin
Has anyone found any AI use cases that work and deliver value yet? Other than smarter helpdesk support article suggestions...

I'm not talking about something where a user starts to enter a ticket about needing to reset their password, and the help desk system can find and suggest a support page about ... resetting passwords. That stuff has been around for a long time.

I'm talking current AI, or "AIOps" (which surprisingly really started ticking up in the past year). Even if the AI isn't automatically taking actions ... if it's able to quickly triage and bring all sorts of information together so by the time you get involved there's already an assessment waiting to be reviewed ... would be helpful.

It'd be interesting to know of any real-world examples where this is taking place. You don't have to name specific vendors (unless you want to) but I'd like to believe that somewhere out there, someone has stumbled on a few things that make their daily lives easier (personally, I'm playing around a lot with n8n on that front but that's not directly "AI" even though you can call AI engines into workflows with it).

https://redd.it/1p77zo3
@r_systemadmin
Are there any reasons to support TLS versions lower than 1.3 nowadays?

I am configuring a new host on Cloudflare, and I noticed that all versions of TLS, from 1.0 onwards, are enabled by default.

After a quick check, it seems that all modern browsers now support TLS 1.3. So is there any valid reason to keep TLS 1.0/1.1/1.2 enabled?
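One thing to check before flipping the switch: TLS 1.2 is still the ceiling for plenty of older embedded devices and API clients, so look at your actual handshake stats first. If the client base allows it, enforcing a floor is a one-liner in most stacks; a sketch of the idea using Python's stdlib ssl module:

```python
import ssl

def tls13_only_context(server: bool = False) -> ssl.SSLContext:
    """Create an SSL context that refuses anything below TLS 1.3."""
    purpose = ssl.Purpose.CLIENT_AUTH if server else ssl.Purpose.SERVER_AUTH
    ctx = ssl.create_default_context(purpose)
    # Peers offering only TLS 1.0-1.2 now fail the handshake outright
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    return ctx
```

On Cloudflare the equivalent knob is the zone's "Minimum TLS Version" setting; the principle is the same either way.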

https://redd.it/1p78nnd
@r_systemadmin