Is building a full centralized observability system (Prometheus + Grafana + Loki + network/DB/security monitoring) realistically a Junior-level task if doing it independently?
Hi r/devops,
I’m a recent grad (2025) with \~1.5 years equivalent experience (strong internship at a cloud provider + personal projects). My background:
• Deployed Prometheus + Grafana for monitoring 50+ nodes (reduced incident response \~20%)
• Set up ELK/Fluent Bit + Kibana alerting with webhooks
• Built K8s clusters (kubeadm), Docker pipelines, Terraform, Jenkins CI/CD
• Basic network troubleshooting from campus IT helpdesk
Now I’m trying to build a full centralized monitoring/observability system for a pharmaceutical company (traditional pharma enterprise, \~1,500–2,000 employees, multiple factories, strong distribution network, listed on stock exchange). The scope includes:
1. Metrics collection (CPU/RAM/disk/network I/O) via Prometheus exporters
2. Full logs centralization (syslog, Windows Event Log, auth.log, app logs) with Loki/Promtail or similar
3. Network device monitoring (switches/routers/firewalls: SNMP traps, bandwidth per interface, packet loss, top talkers – Cisco/Palo Alto/etc.)
4. Database monitoring (MySQL/PostgreSQL/SQL Server: IOPS, query time, blocking/deadlock, replication)
5. Application monitoring (.NET/Java: response time, heap/GC, threads)
6. Security/anomaly detection (failed logins, unauthorized access)
7. Real-time dashboards, alerting (threshold + trend-based, multi-channel: email/Slack/Telegram), RCA with timeline correlation
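To make the "threshold + trend-based" distinction in item 7 concrete, here is a minimal Python sketch, not a production rule. It fits a least-squares line through recent disk-usage samples and extrapolates it, which is roughly the idea behind PromQL's `predict_linear`. The function names, the 90% threshold, and the 4-hour horizon are illustrative assumptions, not part of the scope above.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, disk usage in percent)


def predict_linear(samples: Sequence[Sample], seconds_ahead: float) -> float:
    """Extrapolate a least-squares line to `seconds_ahead` past the newest sample."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one timestamp")
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / var_t
    intercept = mean_v - slope * mean_t
    last_t = max(t for t, _ in samples)
    return intercept + slope * (last_t + seconds_ahead)


def disk_alerts(
    samples: Sequence[Sample],
    threshold_pct: float = 90.0,   # assumed static threshold
    horizon_s: float = 4 * 3600,   # assumed look-ahead window
) -> List[str]:
    """Return which alert types would fire for these samples."""
    alerts = []
    latest_value = max(samples, key=lambda s: s[0])[1]
    if latest_value >= threshold_pct:
        alerts.append("threshold")  # already over the line
    if predict_linear(samples, horizon_s) >= 100.0:
        alerts.append("trend")      # projected to fill up within the horizon
    return alerts
```

A disk sitting at 82% but growing 6 points every 30 minutes trips only the trend alert, which is the case a pure threshold rule would miss until it is nearly too late.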
I’m confident I can handle the metrics part (Prometheus + exporters) and basic logs (Loki/ELK), but the rest (SNMP/NetFlow for network, DB-specific exporters with advanced alerting, security patterns, full integration/correlation) feels overwhelming for me right now.
My questions for the community:
• On a scale of Junior/Mid/Senior/Staff, what level do you think this task requires to do independently at production quality (scalable, reliable alerting, cost-optimized, maintainable)?
• Is it realistic for a strong Junior+/early-Mid (2–3 years exp) to tackle this solo, or is it typically a Senior+ (4–7+ years) job with real production incident experience?
• What are the biggest pitfalls/trade-offs for beginners attempting this? (e.g., alert fatigue, storage costs for logs, wrong exporters)
• Recommended starting point/stack for someone like me? (e.g., begin with Prometheus + snmp_exporter + postgres_exporter + Loki, then expand)
I’d love honest opinions from people who’ve built similar systems (open-source or at work). Thanks in advance – really appreciate the community’s insights
https://redd.it/1q88ulk
@r_devops
Reddit
From the devops community on Reddit
Explore this post and more from the devops community
SBOM generation for a .net app in a container
I'm trying to create a reliable way to track packages we use (for license and CVE issues). So far I'm using CycloneDX for .NET apps, and cyclonedx-npm for our React apps. This is working fine.
I'm now looking to make this work for a .NET app deployed via Docker, and I'm not sure how to proceed. Currently I'm generating two SBOMs:
1. CycloneDX for the .NET application code (captures NuGet packages with versions)
2. Syft for the container image (captures OS packages and other container dependencies)
My questions:
\- Should I merge these BOMs into one, or treat them as separate projects in Dependency-Track?
\- Syft doesn't seem to capture NuGet package versions properly - if I only use Syft's SBOM, I'm missing important .NET dependency details
\- Is there a better tool than Syft for .NET containers, or a way to make Syft scan the published app files properly?
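If merging turns out to be the route taken, the component lists of two CycloneDX JSON documents can be combined with a few lines of Python. This is a minimal sketch under stated assumptions: it only merges the top-level `components` array, it deduplicates on `purl` (falling back to name and version), and it lets the application BOM win on conflicts because that is the one with reliable NuGet versions here. The function names are hypothetical, and metadata, dependency graphs, and serial numbers are deliberately left alone.

```python
from typing import Any, Dict

Bom = Dict[str, Any]


def component_key(component: Dict[str, Any]) -> str:
    """Identify a component by purl, or by name@version when no purl is present."""
    purl = component.get("purl")
    if purl:
        return purl
    return f'{component.get("name")}@{component.get("version")}'


def merge_components(app_bom: Bom, image_bom: Bom) -> Bom:
    """Return a copy of app_bom whose components also include the image's components.

    The image BOM is applied first and the app BOM second, so an app entry
    replaces an image entry that has the same key.
    """
    merged: Dict[str, Dict[str, Any]] = {}
    for bom in (image_bom, app_bom):
        for component in bom.get("components", []):
            merged[component_key(component)] = component
    result = dict(app_bom)  # shallow copy: keeps the app BOM's metadata untouched
    result["components"] = list(merged.values())
    return result
```

Typical use would be loading both files with `json.load`, calling `merge_components(app, image)`, and writing the result back out with `json.dump` before uploading it.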
What approach do you use for tracking both application dependencies AND container dependencies for .NET apps in Docker?
https://redd.it/1q8erp9
@r_devops
Cosmic Rundown: How developers think about creativity, infrastructure, and perception
Interesting read on how developers approach infrastructure and system design. The article explores the intersection of creativity and logistics.
https://www.cosmicjs.com/blog/cosmic-rundown-how-will-the-miracle-happen-london-calcutta-bus-protest-perception
https://redd.it/1q8ha0z
@r_devops
Where are you keeping your LLM logs?
LLM logs are crushing my application logging system. We recently launched AI features on our app and went from \~100 MB/month of normal website logs to 3 GB/month of LLM conversation logs and growing. Our existing logging system was overwhelmed (queries timing out, etc.), and costs started increasing. We’re considering how to re-architect our LLM logs specifically so we can handle more users plus the increasing token use from things like reasoning models, tool calling, and multi-agent systems. I’m not selling any solutions here, genuinely curious what others are doing. Do you store them alongside APM logs? Dedicated LLM logging service? Build it yourself with open source tools?
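One build-it-yourself shape, sketched here in Python purely as an illustration: keep a small metadata event in the existing logging system and push the bulky conversation payload, compressed, to cheaper blob storage under a content hash. The record fields (`request_id`, `model`, `total_tokens`, `messages`) and the function name are assumptions for the example, and the actual upload step is left out.

```python
import gzip
import hashlib
import json
from typing import Any, Dict, Tuple


def split_llm_log(record: Dict[str, Any]) -> Tuple[Dict[str, Any], bytes]:
    """Split one LLM log record into (small metadata event, compressed payload).

    The metadata event is what would go to the normal log pipeline; the
    compressed blob would be stored elsewhere under meta["payload_key"].
    """
    payload = json.dumps(record["messages"], sort_keys=True).encode("utf-8")
    blob = gzip.compress(payload)
    meta = {k: v for k, v in record.items() if k != "messages"}
    meta["payload_key"] = hashlib.sha256(payload).hexdigest()  # content-addressed key
    meta["payload_bytes"] = len(payload)                        # size before compression
    return meta, blob
```

The point of the split is that queries and alerts keep running against small events, while full conversations are fetched by key only when someone actually needs to read one.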
https://redd.it/1q8g282
@r_devops
Recommendations for log monitoring tools
Hey everyone, hope you’re doing well.
I’m looking for recommendations for log monitoring tools with decent Webhook integration.
I currently use New Relic. I’ve set up Log + Alert Policies, but the best I could manage was getting generic alerts on Discord, like "Query result is > 0 on 'Error Log Detected'".
The problem is that this alert lacks context. It doesn't tell me what the error was. I’m forced to log into the New Relic dashboard, filter the time window, and manually hunt down the log just to see the stack trace. This is exactly the kind of manual toil I want to eliminate.
I need a tool that triggers a webhook and sends the actual log content (traceback/error message) directly in the notification body when my app throws an exception. I want to be able to glance at Discord and immediately know where the code broke.
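For a Python app, the behaviour described above can be approximated without any monitoring vendor by attaching a custom `logging.Handler` that posts the formatted record, traceback included, to a Discord webhook. This is a minimal sketch, assuming a standard Discord webhook URL that accepts a JSON body with a `content` field of at most 2000 characters. The class name, the injectable `send` function, and the custom User-Agent header are choices made for this example.

```python
import json
import logging
import urllib.request
from typing import Any, Callable, Dict

DISCORD_LIMIT = 2000  # maximum length of the "content" field


def post_json(url: str, payload: Dict[str, Any]) -> None:
    """POST a JSON payload using only the standard library."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        # Explicit User-Agent is a precaution (assumption: the default one may be rejected).
        headers={"Content-Type": "application/json", "User-Agent": "log-alert-sketch/0.1"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        response.read()


class DiscordWebhookHandler(logging.Handler):
    """Send ERROR-and-above records, with traceback, to a webhook."""

    def __init__(self, webhook_url: str,
                 send: Callable[[str, Dict[str, Any]], None] = post_json,
                 level: int = logging.ERROR) -> None:
        super().__init__(level=level)
        self.webhook_url = webhook_url
        self.send = send
        self.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))

    def emit(self, record: logging.LogRecord) -> None:
        try:
            text = self.format(record)  # the formatter appends the traceback if present
            # Keep the tail: the last traceback lines name the failing call and exception.
            self.send(self.webhook_url, {"content": text[-DISCORD_LIMIT:]})
        except Exception:
            self.handleError(record)
```

After `logger.addHandler(DiscordWebhookHandler(url))`, any `logger.exception("checkout failed")` inside an `except` block would put the stack trace straight into the channel. In a real service the network call should not block the request path, so a queue-based handler in front of it would be worth considering.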
Has anyone dealt with this? Any suggestions?
Thanks!
https://redd.it/1q897a8
@r_devops
Has anyone actually tried AWS DevOps Agent for incident response? Worth the setup effort?
Hey everyone,
I'm an SRE at a mid-sized company and we're drowning in incident response time. Our typical P1 takes 2-3 hours just to figure out what's actually broken - we're jumping between CloudWatch, Datadog, our deployment logs in GitHub, and trying to correlate what changed with what broke.
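The "what changed before it broke" step can be sketched independently of any tool. Below is a small Python illustration that takes change events gathered from wherever they live (deploys, config pushes, and so on) and lists those that landed shortly before an incident, newest first. The `ChangeEvent` type, the `suspects` function, and the 2-hour lookback are hypothetical and say nothing about how AWS DevOps Agent itself works.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List


@dataclass(frozen=True)
class ChangeEvent:
    source: str        # e.g. "github-deploy", "terraform"
    description: str
    at: datetime


def suspects(
    changes: Iterable[ChangeEvent],
    incident_start: datetime,
    lookback: timedelta = timedelta(hours=2),  # assumed window
) -> List[ChangeEvent]:
    """Return changes that happened within `lookback` before the incident, newest first."""
    window_start = incident_start - lookback
    hits = [c for c in changes if window_start <= c.at <= incident_start]
    return sorted(hits, key=lambda c: c.at, reverse=True)
```

Even a crude list like this, generated at page time, removes some of the tab-hopping described above, because the most recent change is often the first thing worth checking.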
I saw AWS announced DevOps Agent at re:Invent and it sounds almost too good to be true - like it automatically correlates all this stuff and investigates incidents for you? But I'm skeptical because:
1. We have a pretty complex setup (multiple AWS accounts, microservices, the usual mess)
2. I don't want to spend a week integrating something that gives me generic "have you tried turning it off and on again" advice
3. It's in preview so I'm worried about stability/support
For those who've actually used it:
How long did setup realistically take before it was actually useful?
Does it actually find root causes or just surface the same logs you'd find manually?
Is it useful for complex distributed system issues or just simple stuff?
Any gotchas with multi-account setups?
Our on-call rotation is brutal right now and management is asking why our MTTR is so high. If this tool actually works, it could be a game-changer. But if it's just AI hype, I'd rather spend my time improving our runbooks.
Thanks for any real-world experiences you can share!
https://redd.it/1q8k1ct
@r_devops
Transitioning into DevOps from Help Desk
Hi everyone! I've recently built my own home lab environment and I've thoroughly enjoyed the ups and downs of being able to host multiple services on my own. Currently not satisfied/no longer challenged with the work that I'm doing at my current job and I'm interested in transitioning into the DevOps industry but need some guidance as I'm unsure on what I should be focusing on first.
Background:
\- 27 yrs old
\- No degree. Dropped out in 2018. 1.6 GPA. School was never a strong suit for me growing up.
\- No certifications. Tried focusing on A+/Network+ a year ago, but I didn't have the passion that I have now to follow through with either certification. Will likely obtain one or the other this year.
\- 7 yrs of experience in IT at my current job. Started off as a part-time helpdesk tech and got promoted into various senior level help desk roles focusing on different parts of our product's support/installation efforts. Worked in a NOC environment, field service/product implementation support, led and managed a team of help desk techs and even had a year of experience as a project coordinator. Current role is senior field service operations engineer (leading a team that supports our technicians who are sent out to install and troubleshoot our product).
\- Absolutely despise inefficiencies. At my current job, if I see something that can either be automated or streamlined to assist my team and the customer, I try to pitch it to leadership and sometimes it's appreciated and it sticks. But honestly, most of the time I'm told to "get back to solving tickets".
\- I thrive in DIY/hands-on learning. Primarily self-taught IT through building PCs, configuring my home network (VLAN segmentation/tagging, IDS/IPS, subnetting, etc.), and now my home lab environment. I also like to be thrown into the fire and be forced to learn, but on my own terms (might be a bad habit?).
Why am I thinking about DevOps?:
\- Started building my home lab on bare metal early last year with Proxmox. Deploying, breaking and fixing my services is what's now filling my free time after work. I used to be a heavy PC gamer but the time I used to spend gaming is now spent maintaining and deploying new services. It's my primary driving point for trying to get into the DevOps world after successfully deploying multiple VMs and containers on my server. Currently hosting services such as a mail server, TrueNAS, Home Assistant, Portainer, Jellyfin, Nginx, Beszel and other niche services. Most of them have been deployed with Docker and I manage them in Portainer.
After lurking in this and other subreddits, I've heard that I should look into the following:
\- Understand the basics of CI/CD
\- Deploy and understand the uses of Grafana/Prometheus
\- Get comfortable with K8s/K3s
\- Learn Python/Go
\- Continue using Bash
I'm open to any and all suggestions on where I should go next with my journey. Perhaps I'm more suited for another industry? Feel free to ask questions. Thanks in advance, hope everyone's 2026 is starting off well :)
TL;DR - I'm a help desk grunt who wants more for his career than solving the same issues over and over. Found out about home labbing, enjoy deploying and maintaining Docker containers, need advice on how to enter the DevOps industry and land my first junior DevOps role or bridge role.
https://redd.it/1q8q1of
@r_devops
Huge e-commerce brands buckle under the pressure of high volume sales. Why?
Hello devops! So this past holiday season I had a job at a call center where we did customer service for a few worldwide beauty brands. I'm making this post because their sites could not handle the load for Cyber Monday and Black Friday sales. Irate almost-customers called in to complain that the ordering system didn't allow them to get through checkout. False order confirmations, items in their shopping cart not making it through to the backend ordering system, customers having their orders frozen at checkout... As customer service agents we all use Salesforce on the backend. How do huge companies like these have such crappy websites? Is it the fault of the developers of the sites themselves? Is it a problem in the backend between the website and the Salesforce ordering system? I welcome any and all opinions on the matter. You never see Amazon having trouble like this with their website. Why do these big brands (think Versace, Gap, etc.) have such sucky e-commerce systems?
https://redd.it/1q8r9tz
@r_devops
Planning a career transition, does my plan make sense? Pipeline TD -> DevOps -> MLOps
I am currently a Senior Pipeline Technical Director (Pipe TD for short) at a VFX/CG studio in Vancouver, BC with 7 YOE. Lately I've been feeling like I'm stagnating both in terms of learning new skills and salary (getting close to the cap at the senior level). Also, the VFX industry is declining and it's hard to find new pipe openings at other studios these days. I've been doing some research and found that the DevOps role is similar to my current role. My current responsibilities:
\- manage the render farm for failing jobs/efficiency of renders, stuck frames etc
\- make sure the pipeline outputs clean data between different departments (layout/anim/lighting etc)
\- troubleshoot artists' broken anim/lighting scenes
\- patch bugs in code for artists' tools
\- make plugins/scripts to make artists' lives easier
\- a lot of babysitting artists so that they can log off on time at 5pm and not have to worry about their things breaking
My plan to break into DevOps and eventually into MLOps:
1. study and pass the AWS Certified Solutions Architect - Associate Certificate
2. learn about IaC (Terraform)
3. learn Docker and Kubernetes
4. Apply for a DevOps role (after 6-7 months of study and personal projects)
5. If I get accepted, learn as much as I can
6. While employed, go through https://github.com/DataTalksClub/mlops-zoomcamp and apply it to personal projects
7. Get MLOps related certs
8. start applying to MLOps roles when I have \~2 years of DevOps experience
Is my plan feasible? Are there any gaping holes?
https://redd.it/1q8pug6
@r_devops
Community for DevOps/Cloud Jobs referral
Hi guys, I have been planning for a long time to create a WhatsApp group for job referrals for DevOps and cloud engineer roles, and today is the day.
The aim of this community is to allow easy job referrals for our members.
Anyone who is interested, please comment "interested" and leave your email for further communication.
I hope with this community we will be able to fulfill our dreams.
See you there, hustlers 🫡
https://redd.it/1q8u8ke
@r_devops
What do you think about the new emerging role: Forward Deployed Engineers?
What is your opinion on the newly emerging role of Forward Deployed Engineer (FDE)? Based on my reading and understanding, they are consultants/sales engineers. I am seeing this term everywhere; companies are hiring for it extensively, especially AI companies, which makes sense because AI is complex and new. Now I want to hear from real people who are FDEs, are transitioning into the role, or know someone close who is in it. What is your opinion of this job: is it a trend, or will it stay for a long time? What does their day-to-day look like? How are they making the transition? How are they dealing with clients and managing multiple stakeholders (the soft skills part)?
https://redd.it/1q8v5fr
@r_devops
Can do freelancing
Can do freelancing on AWS and GCP DevOps.
* Remote only.
I'm getting bored with no activities after office hours, and the pay is low, so I'm thinking about taking freelance DevOps jobs based on AWS or GCP.
Any reference is highly appreciated.
I'm already on Fiverr, but it hasn't been much help.
https://redd.it/1q8w4ct
@r_devops
AWS cost scanner - catches orphaned resources before they pile up (Python/open source)
Hey folks,
I've been learning AWS and kept forgetting to delete test resources.
My last bill had charges for 3 EBS volumes I'd completely forgotten about.
Built a Python script to help catch these before they accumulate:
* Scans all AWS regions
* Finds 6 types of common waste
* Shows exact costs and cleanup commands
It's free/open source. Still learning, it's not perfect but it works and so feedback is welcome!
GitHub: [AWS Waste Finder Tool](https://github.com/devopsjunctionn/AWS-WasteFinder)
Specifically checking for:
1. Orphaned EBS volumes
2. Unused Elastic IPs
3. Idle Load Balancers
4. Old snapshots
5. NAT Gateways
6. SageMaker notebooks
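For readers curious what the first two checks in this list might look like, here is a minimal Python sketch. It is not the linked tool's code. It assumes the response shapes of boto3's EC2 `describe_volumes` and `describe_addresses` calls, treats a volume in the `available` state as orphaned and an address without an `AssociationId` as unused, and the helper names are hypothetical. Pagination and cost estimation are left out.

```python
from typing import Any, Dict, List


def orphaned_volumes(describe_volumes_response: Dict[str, Any]) -> List[str]:
    """Volume IDs in the 'available' state, i.e. not attached to any instance."""
    return [
        volume["VolumeId"]
        for volume in describe_volumes_response.get("Volumes", [])
        if volume.get("State") == "available"
    ]


def unused_elastic_ips(describe_addresses_response: Dict[str, Any]) -> List[str]:
    """Allocation IDs (or public IPs) of addresses that are not associated with anything."""
    return [
        address.get("AllocationId") or address["PublicIp"]
        for address in describe_addresses_response.get("Addresses", [])
        if "AssociationId" not in address
    ]


def scan_region(region_name: str) -> Dict[str, List[str]]:
    """Run both checks against one region (requires boto3 and AWS credentials)."""
    import boto3  # third-party; imported lazily so the helpers above work without it

    ec2 = boto3.client("ec2", region_name=region_name)
    # Note: a single describe_volumes call may not return every volume in large accounts.
    return {
        "volumes": orphaned_volumes(ec2.describe_volumes()),
        "elastic_ips": unused_elastic_ips(ec2.describe_addresses()),
    }
```

Keeping the filtering in plain functions over response dictionaries makes the checks easy to unit test without touching a real AWS account.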
Has anyone else dealt with surprise AWS bills? What resources did you forget about?
https://redd.it/1q8xh1e
@r_devops
Career switch into cloud → DevOps: what actually matters in the first year?
I’m UK-based, mid-30s, researching a move into cloud with the intention of progressing into DevOps/platform work later.
Trying to sanity-check a few things with people actually doing the job:
• what skills genuinely separate juniors who get trusted vs those who don’t
• whether cloud roles are the cleanest entry point today
• what you’d focus on in the first 6–12 months if starting again
• what’s overhyped or unnecessary early on
Looking for practical answers rather than course recommendations.
https://redd.it/1q8u4fw
@r_devops
Is OAuth2/Keycloak justified for long-lived Kubernetes connector authentication?
I’m designing a system where a private Kubernetes cluster (no inbound access) runs a long-lived connector pod that communicates outbound to a central backend to execute kubectl commands.
The flow is: a user calls /cluster/register, the backend generates a cluster_id and a secret, creates a Keycloak client (client_id = conn-<cluster_id>), and injects these into the connector manifest. The connector authenticates to Keycloak using OAuth2 client-credentials, receives a JWT, and uses it to authenticate to backend endpoints like /heartbeat and /callback, which the backend verifies via Keycloak JWKS.
This works, but I’m questioning whether Keycloak is actually necessary if /cluster/register is protected (e.g., only trusted users can onboard clusters), since the backend is effectively minting and binding machine identities anyway. Keycloak provides centralized revocation and rotation, but I’m unsure whether it adds meaningful security value here versus a simpler backend-issued secret or mTLS/SPIFFE model.
Looking for architectural feedback on whether this is a reasonable production auth approach for outbound-only connectors in private clusters, or unnecessary complexity.
Any suggestions would be appreciated, thanks.
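To make the connector side of that flow concrete, here is a rough sketch using only the Python standard library. Several things in it are assumptions and not part of the design above: the realm name, the example hostnames, that /heartbeat accepts a POST with a JSON body, and that Keycloak serves its token endpoint under `/realms/<realm>/protocol/openid-connect/token` (older WildFly-based releases put an `/auth` prefix in front of that path).

```python
import json
import urllib.parse
import urllib.request


def build_token_request(keycloak_url, realm, client_id, client_secret):
    """Build the OAuth2 client-credentials request for Keycloak's token endpoint."""
    token_url = f"{keycloak_url.rstrip('/')}/realms/{realm}/protocol/openid-connect/token"
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,          # e.g. conn-<cluster_id>
        "client_secret": client_secret,  # injected into the connector manifest
    }).encode("ascii")
    return urllib.request.Request(
        token_url,
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def fetch_access_token(request, timeout=10):
    """Send the token request and return the JWT (needs a reachable Keycloak)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)["access_token"]


def heartbeat_request(backend_url, access_token):
    """Build the authenticated call to the backend's /heartbeat endpoint."""
    return urllib.request.Request(
        f"{backend_url.rstrip('/')}/heartbeat",
        data=b"{}",  # assumed empty JSON payload
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The connector would call `fetch_access_token(build_token_request(...))` on startup and again shortly before the token expires, then attach the token to every outbound call.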
https://redd.it/1q91siu
@r_devops
What are your learning goals for 2026? How would you approach job switching?
Context:
This year, I will cross the five-year experience milestone in the IT industry. The majority of this time has been spent in a DevOps/SRE-type role, where I mainly worked on Azure Pipelines templates and Terraform (I feel quite confident in Terraform now, I've already fixed a couple of tricky deadlock situations) for our AWS infrastructure (nothing crazy, basic services like S3, EC2, Lambda, and API Gateway). I rarely coded smaller parts of .NET applications or helper applications, and I also often automated tasks using PowerShell and Bash.
Actual post:
I haven’t received my salary update yet, but I doubt it will be anything more than a 10% raise at best, plus one additional salary as a bonus. The past six months have been really rough due to deadlines, management chaos, and the AWS migration from legacy servers.
I am considering switching jobs this year, as I have been with this company for almost four years. I have a good manager (he gives me exceptional performance notes), and I have a chill remote setup, but at the same time, I can see that, theoretically, I could earn 2–2.5x my current salary at my level of experience (according to the offers I see on job boards, at least in my area; I am not US-based). I know that the market is in a very rough state currently, even in my country, but somehow there are still job postings.
The point is that I suck at interviewing. I hate doing live coding challenges, my brain always goes blank, and I forget how to even create a basic loop.
I also want to upskill a bit, but I’m not sure what to focus on with all the AI hype these days. I wanted to:
- Read the Linux Bible: I want to organize my Linux knowledge. I use WSL and Bash, but I mainly work in Windows Server environments, which kind of sucks.
- Learn the material for AWS certs: In the past, I’ve bought a couple of courses on Udemy but haven’t actually completed them. I think this could help me organize my AWS knowledge better, especially for the Solutions Architect Associate and CloudOps Associate certifications, and maybe later the DevOps Engineer Professional, but that depends on how much time I have. (I don’t think I’ll actually take the exams; is it still worth it?)
- AI coding/agents, as my current company is pushing them really hard.
- Monitoring: I want to expand my knowledge in this area, but so far I only have experience with CloudWatch, which is a provider-locked solution. I’d like to learn other tools, but I don’t know where to start. Maybe OpenTelemetry, Grafana, or Prometheus? Could you suggest anything?
Final questions/thoughts:
What are your personal goals for 2026?
How would you approach it in my current position?
I feel like imposter syndrome is bigger than ever, especially with AI agents and recent revelations about their performance. It's hard to chill, to be honest. I've even started considering weekend university courses in psychology because of all this (studies in my country are free or low-fee).
https://redd.it/1q93ki6
@r_devops
Looking for feedback on my AWS TUI tool
I built a terminal UI for AWS resource management (think k9s but for AWS). Would love feedback from people who actually manage AWS infrastructure daily.
GitHub: https://github.com/clawscli/claws
Main features:
* Query multiple profiles × regions at once
* Vim-style navigation
* 60+ services, 160+ resource types
* Read-only mode for safe exploration
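For readers unfamiliar with the "profiles × regions" idea, the general pattern can be sketched like this. This is not how claws is implemented (the post does not describe its internals); `query_all` and `count_instances` are hypothetical names, and the second function assumes boto3 plus locally configured AWS profiles.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def query_all(profiles, regions, fetch, max_workers=8):
    """Call fetch(profile, region) for every combination and collect the results.

    Returns a dict keyed by (profile, region). An exception raised by fetch
    is re-raised when its result is collected.
    """
    pairs = list(product(profiles, regions))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda pair: fetch(*pair), pairs))
    return dict(zip(pairs, results))


def count_instances(profile, region):
    """Example fetch: count EC2 instances for one profile/region pair."""
    import boto3  # imported lazily; only needed for real AWS calls

    session = boto3.Session(profile_name=profile, region_name=region)
    ec2 = session.client("ec2")
    total = 0
    for page in ec2.get_paginator("describe_instances").paginate():
        for reservation in page["Reservations"]:
            total += len(reservation["Instances"])
    return total
```

Something like `query_all(["dev", "prod"], ["us-east-1", "eu-west-1"], count_instances)` would then issue the four lookups concurrently.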
Specifically interested in:
* What services/resources are missing that you'd actually use?
* Any UX pain points?
https://redd.it/1q92v5s
@r_devops
A practical 2026 roadmap for production observability & debugging
I kept seeing observability content that stops at “add metrics + dashboards” and still leaves teams blind during real incidents.
I put together a roadmap that reflects how production observability actually works in distributed systems:
– monitoring vs observability (signals vs symptoms)
– metrics, logs, traces as a system, not silos
– context propagation across async and service boundaries
– instrumentation strategy (what not to instrument)
– sampling & cost reality (debugging without full fidelity)
– latency without errors, errors without load, silent failures
– incident debugging playbooks
– cascading failure patterns & partial outages
– alerting, SLOs, and operational feedback loops
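To give one of these items a concrete shape, here is a small sketch of context propagation across an async boundary. It is only an illustration in Python (the roadmap itself is stack-agnostic), and the handler names, the `trace_id` variable, and the in-memory `events` list are all invented for the example. The idea is that an ID set at the start of a request follows the work into child tasks, so every log line can be tied back to the request that caused it.

```python
import asyncio
import contextvars
import uuid

# Holds the current request's trace ID; "-" means "no request in progress".
trace_id_var = contextvars.ContextVar("trace_id", default="-")
events = []  # stand-in for a real log sink


def log(message):
    """Record a message together with whatever trace ID is in scope."""
    events.append((trace_id_var.get(), message))


async def charge_card(order_id):
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    log(f"charged order {order_id}")


async def handle_request(order_id):
    trace_id_var.set(uuid.uuid4().hex)  # one ID per incoming request
    log(f"received order {order_id}")
    # The child task gets a copy of the current context, so it sees the same ID.
    await asyncio.create_task(charge_card(order_id))


async def main():
    # Two concurrent requests; each runs in its own task with its own context.
    await asyncio.gather(handle_request(1), handle_request(2))


asyncio.run(main())
```

Across a service boundary the same ID has to travel explicitly, typically in a request header such as the W3C Trace Context `traceparent` header, because a context variable only lives inside one process.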
The focus is how to think during production incidents, not tools or vendors.
Language- and stack-agnostic by design.
Roadmap image + interactive version here:
👉 https://nemorize.com/roadmaps/production-observability-from-signals-to-root-cause-2026
Curious what people think is missing, overkill, or ordered incorrectly.
https://redd.it/1q94jzi
@r_devops
I need advice on meaningful personal projects (developer + DevOps, tool-building focus)
I'm trying to decide what kind of personal project to build that will be meaningful for learning and possibly useful for job applications, though learning comes first. I've made many small projects while building my homelab, but I'm looking for something more like actually creating my own tools.
I'm aiming for something that sits between developer and DevOps.
I want to improve my coding skills and understand DevOps tools on a deeper level. I'm kind of sick of just using tools and not creating my own, if that makes sense.
Maybe I have the wrong take on this, but a comment I always get from older-generation engineers is how much they learned when they had to create their own tools, so I thought it would be cool to try.
I would be grateful for any guidance on this topic. If my thinking is off, I'm open to hearing what I should focus on instead.
Some additional context: I've been a DevOps engineer for 4 years and recently became unemployed. I want to start a project, but everything I've seen online feels like something I've already done a better version of in real production environments.
https://redd.it/1q99x3q
@r_devops
Why the hell are devs still putting passwords in AI prompts? It's 2026!
Writing this because I keep seeing devs hardcode API keys and passwords directly in prompts during code reviews. Your LLM logs everything. Your prompts get cached. Your secrets end up in training data.
Use environment variables. Use secret managers. Sanitize inputs before they hit the model.
This should be basic security hygiene by now but apparently it needs saying.
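A minimal sketch of what that advice can look like in practice, assuming a Python codebase. The patterns below are illustrative and far from exhaustive, the environment variable name `LLM_API_KEY` is made up, and a real setup would lean on a dedicated secret scanner or secret manager as well.

```python
import os
import re

# (pattern, replacement) pairs applied in order; deliberately simple examples.
REDACTIONS = [
    # AWS access key IDs: "AKIA" followed by 16 uppercase letters or digits.
    (re.compile(r"\bAKIA[0-9A-Z]{16}\b"), "[REDACTED_AWS_KEY_ID]"),
    # key = value / key: value pairs whose key looks like a credential.
    (
        re.compile(r"(?i)(password|passwd|secret|api[_-]?key|token)(\s*[:=]\s*)\S+"),
        r"\1\2[REDACTED]",
    ),
]


def redact(text):
    """Replace anything that looks like a credential before it reaches a prompt."""
    for pattern, replacement in REDACTIONS:
        text = pattern.sub(replacement, text)
    return text


def get_api_key(name="LLM_API_KEY"):
    """Read the model API key from the environment instead of hardcoding it."""
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"{name} is not set")
    return key
```

Calling `redact()` on any pasted config or log snippet before building the prompt catches the obvious cases; it will not catch every secret format, so treat it as a safety net and not as the only control.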
https://redd.it/1q9cw8r
@r_devops
Vendor selection: enterprise vs startup vs build your own?
Hey! Solopreneur here who just launched an observability SaaS.
Need honest feedback on how you make vendor decisions.
Three options with identical SLA and infrastructure:
* Enterprise with high prices ($$$)
* Small company/solo founder with moderate prices ($$)
* Build your own (Prometheus, Grafana, Loki) ($)
Which do you choose and why?
Key questions:
* How much does brand recognition matter (to you vs management)?
* Hard requirements on vendor stability/longevity?
* Is support team size important?
* Build vs Buy: what tips the scale - control/customization or time-to-market/maintenance?
* If self-hosted: how many FTEs maintaining your stack?
On integrations:
* Unified dashboard - deal breaker or nice-to-have?
* Alert integrations (PagerDuty, Slack)?
* API access?
Appreciate any feedback, especially recent vendor selection or migration experiences
https://redd.it/1q9bu87
@r_devops