Reddit Sysadmin – Telegram
How green am I?

I think what I'm looking to learn from this is where my current experience would normally land me on the totem pole in a larger company. I'm not quite 30 and currently work at a hardware startup of about 25 people. I have a degree in physics, started out at this company a few years ago as a mechanical engineer and machinist because of my hobbies, and now for about 6 months I've been the sole IT guy because we needed it and I have experience from my homelab. I have no certs in literally anything. That being said, here's what I've done and currently do:

* Set up and administer our Microsoft 365 tenant across Teams, Exchange, Entra, Intune, SharePoint, etc. I recently migrated a bunch of legacy systems using ForensiT profwiz, and set up a process to enroll new devices using Autopilot. Currently rolling out MAM for personal devices and doing the slow grind of getting all devices compliant so I can implement conditional access policies
* Purchased and installed some Supermicro servers for Proxmox and TrueNAS with replication between our two locations and a cloud storage provider, and put the rest of the rack together (UPS, switches, environmental sensor, etc.)
* Set up backups for all the things, e.g. CubeBackup for SharePoint, UrBackup for certain Windows and Linux devices. Trying to reduce cloud reliance (lol) and single points of failure
* Gutted our awful Eero routers and set up UniFi networking and Protect equipment. Made VLANs to segregate staff, servers, local services, and PLCs. Set up our security cams, will probably set up UniFi Access equipment soon
* Spin up and administer all of our local services like Grafana, Vaultwarden, aforementioned backups, Nextcloud, BookStack - in Debian VMs in Proxmox, with scheduled backups to Proxmox Backup Server. Much Ansible going on here
* In the process of evaluating traditional vs overlay VPNs like Tailscale/NetBird, evaluating SIEM/XDR like Wazuh, rolling out Admin By Request, working on a presentation to push KnowBe4 phishing prevention training (has been an issue...), and writing company policy for stuff like AI use, remote access, break glass accounts, privilege management, etc.

I feel like I've kind of been speed running stuff because we started from zero lol. My only real management experience comes from training and managing a junior CNC mill programmer. Because I've not been "in the industry", if I were to go to a theoretical new employer with this information, I don't even know where I'd land or what position I'd want to ask for.


EDIT: I should also mention a few more items:

* I have a homelab, a 3-node Proxmox cluster, which runs a lot of my self hosted services like Nextcloud, Immich, Home Assistant, etc. I have high availability set up with ZFS replication, and I've played around with Ceph.
* I've got some Traefik reverse proxies set up for both local DNS and externally exposing certain services with valid certs, and I'm using CrowdSec to ban IPs. I keep any service that doesn't NEED to be external internal, and certain services like Uptime Kuma are on a VPS. I was using Pi-hole as a DHCP server when we had the Eero router, but have since switched to UniFi.
* I have our backup strategies and dataflows mapped out using draw.io and BookStack, along with any other information that shouldn't live only in my brain.

https://redd.it/1p4nz9l
@r_systemadmin
What's the next step for you guys?

Just curious. What's next for you guys? Systems engineer, something else, or are you comfortable where you are?

https://redd.it/1p4u3j0
@r_systemadmin
Users receiving Microsoft MFA SMS code when they did not initiate a login

Hi everyone!

I have two users who, over the past 4 days, have received Microsoft MFA SMS codes at times when they had not attempted any Microsoft login. The codes also came from the same number that authentic text codes come from. The first time it occurred, I had both users change their passwords in case a bad actor had their login credentials, and I signed them out of all sessions through the 365 admin portal in case the bad actor had their session tokens. But last night one of the users received another SMS code. I looked all through Entra in the sign-in logs, audit logs, and multifactor authentication activity... but can't find anything from the time the codes came in!

I tested another account to see if a sign-in log entry appears in Entra when a user gets to the MFA prompt while signing in to Microsoft but does not know the code or types in a bad code, but nothing appeared in the logs.

Is there another place I should be looking? Could this just be SMS spoofing sending the codes to the users?

Thanks!

EDIT: Guys... I think I found the issue. Entra Admin Center > Authentication Methods > Policies > SMS > "Use for sign-in" is checked. The users were probably part of a Microsoft phone number login spray attack. When someone logs in to Microsoft with a phone number instead of an email address, it sends an SMS code to the user's phone to sign in.
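
For anyone who wants to check this setting outside the portal, here is a minimal Python sketch. It assumes you have already pulled the SMS authentication method configuration as JSON (for example from the Microsoft Graph authentication methods policy) and that each include target carries an `isUsableForSignIn` flag. The sample payload and the function name `sms_signin_targets` are made up for illustration, so verify the field names against your own tenant's output.

```python
import json

# Hypothetical sample of an SMS authentication method configuration.
# Field names are an assumption about what Microsoft Graph returns;
# check them against a real export before relying on this.
SAMPLE = """
{
  "id": "Sms",
  "state": "enabled",
  "includeTargets": [
    {"targetType": "group", "id": "all_users", "isUsableForSignIn": true},
    {"targetType": "group", "id": "11111111-2222-3333-4444-555555555555",
     "isUsableForSignIn": false}
  ]
}
"""


def sms_signin_targets(config: dict) -> list[str]:
    """Return the IDs of targets allowed to use SMS as a sign-in method."""
    if config.get("state") != "enabled":
        # The SMS method is switched off entirely, so nothing is exposed.
        return []
    return [
        target.get("id", "<unknown>")
        for target in config.get("includeTargets", [])
        if target.get("isUsableForSignIn", False)
    ]


if __name__ == "__main__":
    exposed = sms_signin_targets(json.loads(SAMPLE))
    if exposed:
        print("SMS sign-in is enabled for:", ", ".join(exposed))
    else:
        print("SMS is not usable as a sign-in method.")
```

If the list comes back non-empty, those targets can start a sign-in with just a phone number, which would match the unprompted codes described above.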

I am going to confirm with my team on Monday and at least get that box unchecked, if not get SMS MFA turned off and make the Authenticator app the primary method, as mentioned in the comments below.

Thanks for all your help everyone!

https://redd.it/1p4ryu4
@r_systemadmin
RAID 10 disk failure

I’ve had a disk failure on a Dell server running Server 2016.

I took the failed disk out and put it back in. The disk has gone from orange to green, but now the RAID configuration is asking if I want to clear the foreign configuration.

I’m guessing it’s not recognising the failed disk as part of the original RAID setup.

Windows wouldn’t boot with the failed disk and went into an auto repair cycle, but now the server doesn’t think it has a bootable drive.

How screwed am I?

If I take out the failed disk and put a clean one in will all be restored? 😩

https://redd.it/1p4rkdp
@r_systemadmin
What was your "Dream Sysadmin Job" back in the day vs. Now?

I used to dream of managing a cool server room, but after watching tech events, I realized the new goal is becoming an "AI Architect". So I wanna be ready for this future. And I wanna ask: what was your dream sysadmin job?

https://redd.it/1p4xy3p
@r_systemadmin
Small Business/Church IT setup

I’m looking for recommendations on an IT setup for my church. I have limited experience, but I’m a fast learner.

The current setup includes a 24-port managed Cisco switch on its last legs. We have a solid modem, but the router is old and I plan to replace it. I’ll need a good quality managed PoE switch, maybe 24-port, though I’m only using 16 ports now. All the WAPs are failing and will need to be replaced. We have 7, but I can get by with 4.

We currently have 7 Ethernet-connected computers and four laptops that can be connected via WiFi. We also run a livestream, so we need a strong VLAN setup to protect that signal. I want at least three separate VLANs that I can isolate (office, media, and guest), and I want good security (firewall?) to protect the network. We have a security camera setup that is separate from this network, is already managed, and needs only a single internet port. The camera just needs a PoE port and functions on NDI. We just replaced all the desktop computers with new HP Business profile Windows machines.

It is primarily our WiFi that is completely down. My IT guy thinks all the WAPs are just too old and their firmware is out of date and beyond updating.

Bottom line: I’m looking for the best recommendation for a high quality, cost effective router, a 24-port managed PoE switch (with VLANs, QoS, security), and 4 high quality WAPs (or whatever we are calling wireless access points now).

https://redd.it/1p50np7
@r_systemadmin
What's the politically correct/professional wording for telling a company that it's pushing its software to the cloud too aggressively? They are charging 8x the fee for an on-prem migration compared to their cloud solution, which isn't mature. We can't change supplier

And no it's not Broadcom (haha).
They have 5% of their clients on that cloud solution today. They will do major changes to how it works as well for the end-users in the coming months, which means retraining hundreds of users. Our current on-prem server is dying and it's a critical program (thanks to the previous sysadmin who never maintained it).
Edit: We don't mind paying the on-prem fee. The thing is, if we do, they will still force us to the cloud next year...

https://redd.it/1p5199l
@r_systemadmin
Backhaul Pain + Security Policies… is there a better way?

Been thinking about how companies secure remote users and cloud apps lately. One thing keeps bugging me: all that old school VPN and firewall backhaul. Seriously, forcing traffic through a central datacenter? Latency spikes, weird routing, and making policies stick everywhere feels impossible sometimes.

Some vendors promise “cloud-native secure access” or “unified network and security,” but from what I’ve seen, it’s not always obvious if it really fixes the problem or just slaps a modern label on old tech. And honestly, monitoring PoPs and ensuring consistent enforcement globally? That’s another headache entirely!!!

So I’m wondering, are people actually running setups that merge routing and security at the edge properly, keeping policies consistent and avoiding backhaul pain? Or is it still mostly VPN/firewall patchwork pretending to be SASE?



https://redd.it/1p54u0f
@r_systemadmin
A rather interesting take on “traditional” data centres vs cloud services.

I apologize if this is not the right place to ask, but I thought it best since there would be quite a few varying views.

I had an interesting conversation with a group of young learners entering the field of IT. It came about from a certification question that went something like this: “Which two of these things separate traditional data centres from cloud service providers?” The options were automation, load balancing, virtualization, and auto-scaling groups.

When I heard the question I was stumped for a bit. I’ve been in IT for a tad too long, and from my experience the only thing that stood out was auto-scaling groups. Here’s my reasoning: virtualization, automation, and load balancing are not cloud-native features, since they have been done in on-premises data centres since forever, though not as easily as in AWS, Azure, or whatever. I was even more stumped when I learned the answers were automation and virtualization.

I ask this here basically to see what everyone’s feedback is on that question.

https://redd.it/1p55xlp
@r_systemadmin
Moderating user content is breaking my team’s brain

Running a UGC platform in 2025 is like being a firefighter. One day it’s spam floods, next day coordinated harassment, next day someone tries to get an AI bot to generate borderline illegal stuff to test boundaries.

We can’t keep up manually and our in-house tools feel prehistoric. Is everyone else drowning too, or are we just bad at this?

https://redd.it/1p59de8
@r_systemadmin
Best office chair for back pain? Is Aeron really that good?

Hey all.. I’ve started dealing with lower-back pain from long hours at the desk, so I’m finally looking to upgrade my chair. I’m a sysadmin, so most days I’m sitting for long stretches with occasional bursts of activity, and my current cheap chair just isn’t cutting it.

What I’m looking for:

* Strong lumbar support (adjustable preferred)
* Mesh back
* Adjustable seat height/tilt
* Something durable that won’t fall apart in a year
* Budget: up to \~$500

I’ve seen a lot of people recommend things like the Aeron or other ergonomic mesh chairs, but I’m hoping to hear what’s actually worked for folks in IT who sit for long hours.

Any chair you’d recommend that genuinely helped with back pain?

https://redd.it/1p5a610
@r_systemadmin
Spark standalone executor failures take forever to recover

Running Spark on a standalone cluster and hitting a big problem. When an executor fails, recovery is painfully slow. Tasks sit there with executor lost errors and nothing moves for minutes. Other jobs on the cluster freeze too.

I tried tweaking spark.deploy.maxExecutorRetries and heartbeat intervals. It helps a little but not enough. One small failure still stalls the pipeline.
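
One thing worth ruling out before tuning further is that the timeouts contradict each other. The Spark documentation says `spark.executor.heartbeatInterval` should be significantly less than `spark.network.timeout`. Below is a minimal Python sketch that checks this for a set of settings. The values are placeholders, not recommendations, the `min_ratio` threshold is an arbitrary assumption, and `parse_millis` and `check_timeouts` are hypothetical helper names.

```python
import re

# Placeholder settings for illustration only; substitute your own values.
conf = {
    "spark.executor.heartbeatInterval": "10s",
    "spark.network.timeout": "120s",
}

# Milliseconds per unit. Durations must carry an explicit unit here,
# because Spark's default unit differs between settings.
_UNIT_MS = {"ms": 1, "s": 1_000, "m": 60_000, "min": 60_000, "h": 3_600_000}


def parse_millis(value: str) -> int:
    """Convert a duration such as '10s' or '500ms' to milliseconds."""
    match = re.fullmatch(r"(\d+)(ms|s|min|m|h)", value.strip().lower())
    if match is None:
        raise ValueError(f"unrecognised duration: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNIT_MS[unit]


def check_timeouts(settings: dict, min_ratio: int = 4) -> list[str]:
    """Warn when the heartbeat interval is too close to the network timeout.

    min_ratio is an assumed safety margin, not a value taken from Spark.
    """
    heartbeat = parse_millis(settings["spark.executor.heartbeatInterval"])
    timeout = parse_millis(settings["spark.network.timeout"])
    warnings = []
    if heartbeat * min_ratio > timeout:
        warnings.append(
            f"heartbeat interval {heartbeat} ms is not well below "
            f"network timeout {timeout} ms"
        )
    return warnings


if __name__ == "__main__":
    for message in check_timeouts(conf) or ["timeouts look consistent"]:
        print(message)
```

This only catches inconsistent settings. It does not explain why other jobs freeze, which may need a look at how the standalone master detects lost workers.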

Has anyone actually solved this? Do you break jobs into smaller stages, monitor executors differently, or use some trick to speed recovery?

https://redd.it/1p5cb7g
@r_systemadmin
Microsoft SQL Server 2025 Express edition limits database size to 50 GB

Hello,


On the official page https://learn.microsoft.com/en-us/sql/sql-server/what-s-new-in-sql-server-2025?view=sql-server-ver17 MS announced that SQL 2025 Express edition will support databases up to 50 GB (previous versions were limited to 10 GB).

Is there any trick behind that limit change or why would MS do something like that?

https://redd.it/1p5dfkh
@r_systemadmin
I am begging for something that doesn’t require admin training

Our current tool literally has a 52-page admin guide. To change one workflow, I need permission from the Jira Overlord (yes, that’s what he calls himself). Why can’t project tools be… normal?



https://redd.it/1p5drfa
@r_systemadmin
Thinking through the trade offs of DIY multi cloud networking

I've been doing a lot of soul searching on whether DIY cloud networking (policy, routing, and security) is actually the right move for us, or if we’re biting off more than we can chew.

Here’s what I’m weighing:

* Control vs Complexity: We want full control (routing, QoS, firewalls), but managing that across cloud regions is very complex. Multicloud networking inherently has high operational overhead.
* Visibility/Policy Consistency: It’s hard to maintain unified policies across clouds. According to network computing advice, without a central control plane your policies fragment.
* Performance Risk: Latency and jitter are no joke. If your QoS isn’t rock solid, real-time traffic suffers. Research shows even SDN-based routing needs careful queuing and policy routing to guarantee performance.
* Security: Cross-cloud security policy reconciliation is a major pain.

But on the flip side, if we get this right, we could avoid vendor lock-in and have a highly tailored policy model.

https://redd.it/1p5eaqp
@r_systemadmin
Daily drift is real

Noticed something recently.

Most tenants I see have small changes happening daily.

Role assignments.

Conditional Access toggles.

Intune settings.

App permissions.

One percent here.

Two percent there.

After six months the environment is unrecognizable.

How do you all track drift without manually comparing JSON dumps?
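
As a starting point, the manual comparison can at least be scripted. Here is a minimal Python sketch that walks two configuration snapshots and reports every path that changed. It assumes you already export snapshots as JSON on a schedule. The sample data, the `MISSING` marker, and the `diff_config` name are all hypothetical.

```python
MISSING = "<missing>"  # marker for a key present on only one side


def diff_config(old, new, path=""):
    """Yield (path, old_value, new_value) for every difference found.

    Dictionaries are compared key by key; anything else (including
    lists) is compared as a whole value.
    """
    if isinstance(old, dict) and isinstance(new, dict):
        for key in sorted(set(old) | set(new)):
            child = f"{path}.{key}" if path else str(key)
            if key not in old:
                yield (child, MISSING, new[key])
            elif key not in new:
                yield (child, old[key], MISSING)
            else:
                yield from diff_config(old[key], new[key], child)
    elif old != new:
        yield (path, old, new)


# Invented snapshots standing in for yesterday's and today's exports.
BASELINE = {
    "conditionalAccess": {"RequireMFA": {"state": "enabled"}},
    "roles": {"Global Administrator": ["alice"]},
}
CURRENT = {
    "conditionalAccess": {"RequireMFA": {"state": "disabled"}},
    "roles": {"Global Administrator": ["alice", "bob"]},
    "intune": {"compliancePolicy": "v2"},
}

if __name__ == "__main__":
    for changed_path, before, after in diff_config(BASELINE, CURRENT):
        print(f"{changed_path}: {before!r} -> {after!r}")
```

Run daily against the previous export, this turns "the environment is unrecognizable" into a short list of paths that changed since yesterday.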

https://redd.it/1p5avvw
@r_systemadmin
What did you know how to do before becoming a sysadmin?

I am on my journey to become a sysadmin. I have zero actual work experience. I'm 42. Been in manual labour since I was 16 and always felt my calling was working in IT. Finally decided to do it. I hadn't owned a PC in 10 years. I bought a PC 6 months ago. Took the CompTIA Tech+ a week later and passed. Took A+ the next month and passed. Took Network+ a month later and passed.

I've been doing everything I think I need to be able to get a junior role or 1st/2nd line support, but my end goal is sysadmin. I have a home lab set up and I do regular daily practice when I finish my job (my job is 9-10 hours a day).

I've learnt to use Linux and Windows Server to monitor and manage users/servers. Learnt SQL for some reason. PowerShell. Excel. I got an M365 business account a few weeks ago and just messed about adding old devices through Intune and made some policies.

My whole work life I've dealt with talking to the public and customers. I feel like I'm ready to get into the world of IT now. I've applied for tons of jobs but haven't even had an interview yet.

What did you guys know and do before becoming a sysadmin?

Edit: I appreciate all the great replies. This definitely feels like a sub where you're all just there for each other. Good stuff.

https://redd.it/1p5h3ua
@r_systemadmin
Are IT responsible for writing/owning the Business Continuity Plan?

I understand that IT input will be required at stages throughout the plan, but just wondering who is typically responsible for writing/owning an org’s BCP? Does it fall under IT Manager or a role under corporate/risk?



https://redd.it/1p5hqnq
@r_systemadmin
Quality of engineers is really going down

More and more people, even with 4-5 YOE, are just blind clickops zombies. They don't know anything about anything, and when it comes to troubleshooting any bigger issue it just goes over their heads. I was no master with 4-5 years in the field, but I knew how to search for stuff on the internet and sooner or later I would figure it out. Isn't the most important ability the ability to google stuff, or, even easier today, to use an AI tool? But even for that you need to know what to search for.

https://redd.it/1p5ki7i
@r_systemadmin
RDS Server 2025 - High WMI usage 30%-90%

hi guys (and girls)


I've been troubleshooting an issue for a few weeks now, and feel like I'm stuck.
So I finally decided to ask you guys for help :)



The Story

We recently upgraded a customer from an RDS 2016 farm to RDS 2025. The old 2016 servers suffered from very high CPU load for WMIPrvSE.exe.

When there were 0 users logged on, the problem was not there.
When there were \~5 users logged on, it was not that bad.
When there were \~20 users logged on, it was an absolute disaster... Like almost always 80% usage for this WMI process alone.

I was unable to find the cause on the 2016 farm, but ended up assigning only 1 CPU to this process, artificially limiting the CPU usage. This worked for years. Not the best way to handle the issue, to be honest.

Now I always assumed (my bad!) that whenever we replaced the 2016 server with a new server, this problem would just disappear. Boy was I wrong!

The new server, with a 32-core CPU (Hyper-V VM), is having the exact same issue!
WMIPrvSE.exe is using between 30% and 80% of the CPU, all day long.
But at the end of the day, when all users log out, it’s gone.

Now here is my big issue: I can't find out why! I have been reading logs and traces for days…
My gut feeling is telling me it’s specific to this customer's environment, because we had the same with Server 2016 and with Server 2025. I never saw this in any other environment. So I feel like I can rule out any of the generic software tools (antivirus, backup, etc.) that we run for all our customers. I feel like it must be client-specific software, or maybe a printer driver, for example.


I used Process Explorer to analyse WmiPrvSE.exe and this is the stack trace:

 

ntoskrnl.exe!KeSaveStateForHibernate+0x7d66

ntoskrnl.exe!KeQueryPerformanceCounter+0x1c20

ntoskrnl.exe!KeWaitForSingleObject+0x1a9d

ntoskrnl.exe!KeWaitForSingleObject+0x71f

ntoskrnl.exe!KeQueryUnbiasedInterruptTimePrecise+0x2167

ntoskrnl.exe!ExReleaseFastMutexUnsafe+0xc6d

ntoskrnl.exe!KiCheckForKernelApcDelivery+0x32

ntoskrnl.exe!ExAcquirePushLockSharedEx+0x4fb

ntoskrnl.exe!ExAcquirePushLockSharedEx+0x4b9

ntoskrnl.exe!ExUuidCreate+0x1ec9

ntoskrnl.exe!ExUuidCreate+0x1ace

ntoskrnl.exe!WmiQueryTraceInformation+0x2243

ntoskrnl.exe!NtQuerySystemInformation+0xf54

ntoskrnl.exe!NtQuerySystemInformation+0x3e

ntoskrnl.exe!setjmpex+0x9215

ntdll.dll!NtQuerySystemInformation+0x14

cimwin32.dll+0x2dbc0

cimwin32.dll+0x116b4

framedynos.dll!CWbemProviderGlue::CreateInstanceEnumAsync+0x426

wmiprvse.exe+0x8ca9

wmiprvse.exe+0x8338

RPCRT4.dll!NdrServerCallNdr64+0x1c63

RPCRT4.dll!NdrStubCall2+0x30d

combase.dll!CStdStubBuffer_Invoke+0xdf

RPCRT4.dll!CStdStubBuffer_Invoke+0x46

combase.dll!RoClearError+0xc4e2

combase.dll!RoClearError+0xba56

combase.dll!RoClearError+0xb0a1

combase.dll!HBITMAP_UserSize+0x25c6

combase.dll!CoWaitForMultipleHandles+0x101a

combase.dll!CoWaitForMultipleHandles+0x6488

combase.dll!HMONITOR_UserFree+0x2123

RPCRT4.dll!I_RpcFreeBuffer+0x107

RPCRT4.dll!NDRSContextUnmarshall2+0xa24

RPCRT4.dll!NDRSContextUnmarshall2+0x17ea

RPCRT4.dll!RpcExceptionFilter+0x27e4

RPCRT4.dll!RpcBindingFromStringBindingW+0x325c

RPCRT4.dll!RpcImpersonateClient+0x123c

RPCRT4.dll!RpcImpersonateClient+0x3c3

RPCRT4.dll!I_RpcGetBufferWithObject+0x678

ntdll.dll!RtlSetThreadSubProcessTag+0x3bae

ntdll.dll!RtlSetThreadSubProcessTag+0x1cd3

KERNEL32.DLL!BaseThreadInitThunk+0x17

ntdll.dll!RtlUserThreadStart+0x2c


If you guys have suggestions on how I can find the root cause of this, then please let me know!
I have been all over WMImon.exe and analysed logs for hours…
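
One approach that may help narrow this down: the stack trace above passes through cimwin32.dll and `CreateInstanceEnumAsync`, which suggests some client keeps enumerating instances of a WMI class. The Microsoft-Windows-WMI-Activity event log records the client process ID for WMI operations. Assuming you have exported those event messages to text, and that they contain `ClientProcessId = <pid>` and `Operation = ...` fields (verify this against your own export), a small Python sketch can tally which process and operation show up most. The sample lines and the function name `tally_wmi_clients` are invented for illustration.

```python
import re
from collections import Counter

# Invented sample messages; the field layout is an assumption about
# what WMI-Activity event text looks like on your servers.
SAMPLE_LINES = [
    "Id = {1}; ClientMachine = RDS01; User = CORP\\user1; ClientProcessId = 4321; "
    "Operation = Start IWbemServices::CreateInstanceEnum - root\\cimv2 : Win32_Process; ResultCode = 0x0",
    "Id = {2}; ClientMachine = RDS01; User = CORP\\user2; ClientProcessId = 4321; "
    "Operation = Start IWbemServices::CreateInstanceEnum - root\\cimv2 : Win32_Process; ResultCode = 0x0",
    "Id = {3}; ClientMachine = RDS01; User = CORP\\user1; ClientProcessId = 999; "
    "Operation = Start IWbemServices::ExecQuery - root\\cimv2 : SELECT * FROM Win32_Printer; ResultCode = 0x0",
]

# Capture the client PID and the operation text up to the next semicolon.
PATTERN = re.compile(r"ClientProcessId = (\d+);.*?Operation = (.*?);")


def tally_wmi_clients(lines):
    """Count how often each (client PID, operation) pair appears."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[(int(match.group(1)), match.group(2).strip())] += 1
    return counts


if __name__ == "__main__":
    for (pid, operation), count in tally_wmi_clients(SAMPLE_LINES).most_common(5):
        print(f"{count:5d}  PID {pid}  {operation}")
```

The PID at the top of the list can then be matched to a process name while the load is happening, which should show whether it is a per-user application started in every session.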


https://redd.it/1p5h75m
@r_systemadmin
Am I crazy?

So, I'm at another career crossroads. For the last decade or so, I've been a commercial truck driver. 12 weeks ago, I suffered an injury that almost took my eyesight and I'm not sure if I'm going to be getting back into the driver's seat.

Last week, a Linux for the Professional book bundle became available through Humble Bundle and I took the whole 22-book volume. I've been using Linux for years, keeping old desktops and laptops alive for much longer than the average person would think possible, and after starting on one of the books, I'm more into it than ever.

If I don't have a college degree and not a ton of money to work with, but I have a lot of work experience and the drive to learn everything I can, would there be a future in this industry for me?

TL;DR - I might need to find a new career and am wondering if I can teach myself enough to get into SysAdmin.

https://redd.it/1p5nzlx
@r_systemadmin