Group, compare and track health of GitHub repos you use
Hello,
I created this simple website, gitfitcheck.com, where you can group existing GitHub repos and track their health based on their public data. The idea came from working as a Sr. SRE/DevOps engineer in mostly Kubernetes/cloud environments with lots of CNCF open-source products. There are usually many competing alternatives for the same task, so I started keeping static markdown docs about these GitHub groups with basic health data (how old the tool is, how many stars it has, what language it's written in), so I could compare them and build a mental map of their quality, lifecycle, and where everything fits.
Over time, whenever I hear about a new tool I could use for my job, I update my markdown docs. I found this categorization/grouping useful for mapping the tool landscape, comparing tools in the same category, and seeing trends as certain projects get abandoned while others catch attention.
The challenge was that the doc was static and the data I recorded were point-in-time manual snapshots, so I decided to build an automated, dynamic version that keeps the health stats up to date. That tool became gitfitcheck.com. Later I realized it supports further facets as well, not just comparison within the same category; for example, I have a group for the core Python packages I bootstrap all of my Django projects with. Using the tool, I can see when a project is getting less love lately and search for an alternative, maybe a fork or a completely new project. Also, all groups you create are public, so whenever you search for a topic/repo, you'll see how others grouped it as well, which helps discoverability too.
I found this process useful in the frontend and ML space as well, as both depend heavily on open-source GitHub projects.
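To make the "basic health data" idea concrete, here is a rough sketch of how such a snapshot can be computed from GitHub's public REST API. The field names are GitHub's; the snapshot fields are my own illustration, not necessarily how gitfitcheck.com works internally:

```python
import json
import urllib.request
from datetime import datetime, timezone

def fetch_repo(owner, name):
    """Fetch public repo metadata from the GitHub REST API."""
    url = f"https://api.github.com/repos/{owner}/{name}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def health_snapshot(repo, now=None):
    """Reduce raw repo metadata to a basic health snapshot."""
    now = now or datetime.now(timezone.utc)
    created = datetime.fromisoformat(repo["created_at"].replace("Z", "+00:00"))
    pushed = datetime.fromisoformat(repo["pushed_at"].replace("Z", "+00:00"))
    return {
        "age_years": round((now - created).days / 365.25, 1),
        "stars": repo["stargazers_count"],
        "language": repo["language"],
        "days_since_push": (now - pushed).days,  # staleness signal
        "archived": repo.get("archived", False),
    }
```

Comparing `days_since_push` across a group is a simple way to spot the "getting less love lately" trend mentioned above.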
Feedback is welcome. Thank you for taking the time to read this, and maybe even giving it a try!
Thank you,
sendai
PS: I know this isn't the next big thing; it has no AI in it, nor is it vibe coded. It's just a simple tool I believe is useful for SRE/DevOps/ML/frontend work, or any other job that depends on GitHub repos a lot.
https://redd.it/1oxnhq6
@r_devops
How do you handle Github Actions -> Slack notifications at your org?
I saw Slack has an example that uses users.lookupByEmail (the author.yml workflow under Technique_2_Slack_API_Method in slackapi/slack-github-action). If I can get the email, I can look up the user's Slack user ID and send them a message. However, that would require knowing the email of the ${GITHUB_ACTOR}.
I thought I could use gh api /users/$ACTOR, but testing it on myself I get null in the email field, so I'm not sure it's the correct way to go about this. Maybe it's a permissions issue.
Feels like I'm over-complicating something that must be standard in most companies, so maybe someone can share how they handle sending Slack messages from a GitHub Action in their org?
Thanks
https://redd.it/1oxnfkh
@r_devops
slack-github-action/example-workflows/Technique_2_Slack_API_Method/author.yml at main · slackapi/slack-github-action
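One common workaround for the null-email problem is a small mapping file committed to the repo, mapping GitHub usernames to work emails (the GitHub API only exposes an email the user has made public). A rough sketch, where the mapping, names, and fallback behavior are my assumptions rather than anything from the thread:

```python
import json
import urllib.parse
import urllib.request

# Hypothetical committed mapping from GitHub usernames to work emails;
# maintained by hand because /users/$ACTOR usually returns email: null.
ACTOR_EMAILS = {"octocat": "octocat@example.com"}

def slack_call(method, params, token):
    """POST form-encoded params to a Slack Web API method and decode the JSON reply."""
    req = urllib.request.Request(
        f"https://slack.com/api/{method}",
        data=urllib.parse.urlencode(params).encode(),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def dm_actor(actor, text, token):
    """Resolve a GitHub actor to a Slack user via email, then DM them."""
    email = ACTOR_EMAILS.get(actor)
    if email is None:
        return None  # unmapped actor: fall back to a shared channel instead
    user = slack_call("users.lookupByEmail", {"email": email}, token)
    return slack_call("chat.postMessage",
                      {"channel": user["user"]["id"], "text": text}, token)
```

The same two Slack calls could be done with curl in a workflow step; the point is only that the email has to come from somewhere you control, not from the GitHub user profile.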
How do I step up as the go-to DevOps person?
I have recently studied Docker, Kubernetes, and GitLab CI/CD from YouTube tutorials. The team I work in was restructured recently, and we don't have anyone who knows this stuff. We have to build our whole pipeline structure and cluster management from what remains. I feel like this is a golden opportunity for someone like me.
I just want to know how I can move on from the beginner YouTube material to building real, resilient systems and pipelines.
Maybe I can study some good repos as a reference, or use other methods. Any help is greatly appreciated. Thank you!
https://redd.it/1oxpq1x
@r_devops
Simple tool that automates tasks by creating rootless containers displayed in tmux
Description: A simple shell script that uses buildah to create customized OCI/Docker images and podman to deploy rootless containers, designed to automate compilation/building of GitHub projects, applications, and kernels, as well as any other containerized task or service. Pre-defined environment variables, various command options, native integration of all containers with apt-cacher-ng, live log monitoring with neovim, and the use of tmux to consolidate container access ensure maximum flexibility and efficiency during container use.
Url: https://github.com/tabletseeker/pod-buildah
https://redd.it/1oxpb5m
@r_devops
Open-source Azure configuration drift detector - catches manual changes that break IaC compliance
Classic DevOps problem: You maintain infrastructure as code, but manual changes through cloud consoles create drift. Your reality doesn't match your code.
Built this for Azure + Bicep environments:
**Features:**
🔍 Uses Azure's native what-if API for 100% accuracy
🔧 Auto-fixes detected drift with --autofix mode
📊 Clean reporting (console, JSON, HTML, markdown)
🎯 Filters out Azure platform noise (provisioningState, etags, etc.)
**Perfect for:**
• Teams practicing Infrastructure as Code
• Compliance monitoring
• CI/CD pipeline integration
• Preventing security misconfigurations
**Example output:**
❌ Drift detected in storage account:
Expected: allowBlobPublicAccess = false
Actual: allowBlobPublicAccess = true
Built with C#/.NET, integrates with any CI/CD system.
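For illustration only, the noise-filtering step above can be sketched against raw what-if JSON. The field names (`changes`, `changeType`, `delta`, `before`/`after`) follow the ARM what-if response shape as I understand it, and are assumptions rather than a description of AzureDriftDetector's internals:

```python
# Properties Azure mutates on its own, which would otherwise read as drift.
NOISE = {"provisioningState", "etag", "etags"}

def drift_changes(what_if_result):
    """Reduce a what-if result to real drift findings, skipping platform noise."""
    findings = []
    for change in what_if_result.get("changes", []):
        if change.get("changeType") != "Modify":
            continue
        for delta in change.get("delta", []):
            prop = delta.get("path", "")
            if prop.split(".")[-1] in NOISE:
                continue
            findings.append({
                "resource": change.get("resourceId"),
                "property": prop,
                "expected": delta.get("after"),   # what the Bicep template declares
                "actual": delta.get("before"),    # what is currently deployed
            })
    return findings
```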
**GitHub:** https://github.com/mwhooo/AzureDriftDetector
How do you handle configuration drift in your environments? Always curious about different approaches!
https://redd.it/1oxugx9
@r_devops
NPMScan - Malicious NPM Package Detection & Security Scanner
I built **npmscan.com** because npm has become a minefield. Too many packages look safe on the surface but hide obfuscated code, weird postinstall scripts, abandoned maintainers, or straight-up malware. Most devs don't have time to manually read the source every time they install something, so I made a tool that does the dirty work instantly.
What **npmscan.com** does:
Scans any npm package in seconds
Detects malicious patterns, hidden scripts, obfuscation, and shady network calls
Highlights abandoned or suspicious maintainers
Shows full file structure + dependency tree
Assigns a risk score based on real security signals
No install needed — just search and inspect
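For readers curious what one of these signals looks like in practice, here is a minimal sketch of the install-hook check (my own illustration, not NPMScan's actual detection code): npm runs `preinstall`/`install`/`postinstall` scripts automatically at install time, which is the classic malware entry point.

```python
import json

# Lifecycle scripts that execute automatically during `npm install`.
RISKY_HOOKS = {"preinstall", "install", "postinstall"}

def install_hooks(package_json_text):
    """Return any lifecycle scripts in a package.json that run at install time."""
    pkg = json.loads(package_json_text)
    scripts = pkg.get("scripts", {})
    return {name: cmd for name, cmd in scripts.items() if name in RISKY_HOOKS}
```

A non-empty result is not proof of malice (native modules legitimately compile in postinstall), which is why a scanner combines it with other signals before scoring.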
The goal is simple:
👉 Make it obvious when a package is trustworthy — and when it’s not.
If you want to quickly “x-ray” your dependencies before you add them to your codebase, you can try it here:
**https://npmscan.com**
Let me know what features you’d want next.
https://redd.it/1oy1pr5
@r_devops
Follow-up to my "Is logging enough?" post — I open-sourced our trace visualizer
A couple of months ago, I posted this thread asking whether logging alone was enough for complex debugging. At the time, we were dumping all our system messages into a database just to trace issues like a “free checked bag” disappearing during checkout.
That approach helped, but digging through logs was still slow and painful. So I built a trace visualizer—something that could actually show the message flow across services, with payloads, in a clear timeline.
I’ve now open-sourced it:
🔗 GitHub: softprobe/softprobe
It’s built as a high-performance Istio WASM plugin, and it’s focused specifically on business-level message flow visualization and troubleshooting. Less about infrastructure metrics—more about understanding what happened in the actual business logic during a user’s journey.
Feedback and critiques welcome. This community’s input on the original post really pushed this forward.
https://redd.it/1oy5jv2
@r_devops
Entire Domain run from Kube?
Good afternoon all,
I have been trying to experiment with running a 3 node Kube cluster inside a single node Nutanix HCI.
My goal was to try to create an entire domain, complete with IAM, DHCP, DNS, and a CA, and make it as redundant as possible. So I figured the best way to do that was to set up containers for each service inside a Kube cluster.
The cluster itself is configured, complete with Calico and the Nutanix CSI driver. I also set up a storage class that uses a volume group made in Nutanix. Now I'm at the part where I'm trying to set up the actual domain and the containers for it.
I'm currently stuck, because there doesn't seem to be an actual solution for creating a domain in Kube similar to how you would do it in AD. I was going to try running Samba 4 in the cluster, but its functionality there seems limited to SMB shares. I was also looking at FreeIPA, but there is very limited documentation of it actually working in Kube, and even less on how to set it up there.
I'm starting to question whether it's even a good idea to run an entire domain from Kube. Am I right to question this?
I know most enterprises just run their domain using VMs of Windows Server DCs, but there has to be another way of setting up an HA domain using cloud technology without going through Azure.
I have to admit that I'm not a DevOps engineer, I'm just a security analyst, so please go easy on me.
Thank you
https://redd.it/1oy5n6z
@r_devops
AWS Metadata Service Exploitation: The Cloud's Skeleton Key 🔑
https://instatunnel.my/blog/aws-metadata-service-exploitation-the-clouds-skeleton-key
https://redd.it/1oybkyj
@r_devops
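For context on what the linked article covers: IMDSv2 replaces IMDSv1's bare GET with a session flow, where a token is first fetched with an HTTP PUT and then presented on every metadata request. A simple SSRF usually can't issue the PUT or set the custom header, which is the protection. A sketch of the two request shapes (this only builds the requests; actually sending them works only from an EC2 instance):

```python
import urllib.request

IMDS = "http://169.254.169.254"  # link-local metadata endpoint

def imds_token_request(ttl_seconds=21600):
    """IMDSv2 step 1: PUT for a short-lived session token."""
    return urllib.request.Request(
        f"{IMDS}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )

def imds_get_request(path, token):
    """IMDSv2 step 2: GET metadata, presenting the session token header."""
    return urllib.request.Request(
        f"{IMDS}/latest/meta-data/{path}",
        headers={"X-aws-ec2-metadata-token": token},
    )
```

Under IMDSv1, `imds_get_request` would work with no token at all, which is exactly what makes it reachable through SSRF.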
InstaTunnel — AWS Metadata Service Exploitation: covers how SSRF attacks and AWS misconfigurations expose IMDSv1, leaking IAM credentials and metadata, plus IMDSv2 protections and real-world exploits.
Python for Automating stuff on Azure and Kafka
Hi,
I need some suggestions from the community here. I've been working with bash for scripting CI/CD pipeline jobs, with minimal exposure to Python in the automation pipelines.
I'm looking to start developing my Python skills and get some hands-on experience with the Azure Python SDK and Kafka libraries, so I can start using Python at my workplace.
I need some suggestions for online learning platforms and books to get started. I'm looking to invest about 10-12 hours each week in learning.
https://redd.it/1oyctub
@r_devops
Manage Vault in GitOps way
Hi all,
In my home cluster I'm introducing Vault and Vault operator to handle secrets within the cluster.
How do you guys manage Vault in an automated way? For example, I would like to create KV stores and policies in a declarative way, maybe managed with Argo CD.
Any suggestions?
https://redd.it/1oygbil
@r_devops
Productizing LangGraph Agents
Hey,
I'm trying to understand which option is better based on your experience. I want to deploy enterprise-ready agentic applications; my current agent framework is LangGraph.
To be production-ready, I need horizontal scaling and durable state so that if a failure occurs, the system can resume from the last successful step.
I’ve been reading a lot about Temporal and the Langsmith Agent Server, both seem to offer similar capabilities and promise durable execution for agents, tools, and MCPs.
I'm not sure which one is more recommended.
I did notice one major difference: in Langgraph I need to explicitly define retry policies in my code, while Temporal handles retries more transparently.
I’d love to get your feedback on this.
https://redd.it/1oyh93l
@r_devops
Trouble sharing a Windows Server 2022 AMI between AWS accounts (no RDP password, no SSM connection)
Hello everyone,
I've been trying for the last two days to share a custom Windows Server 2022 AMI from Account A to Account B, but without success.
The source AMI is based on the official Windows_Server-2022-English-Full-Base image, and I installed a few internal programs and agents on it.
After creating and sharing the AMI, I can successfully launch instances from it in the target account (Account B), but:
I cannot retrieve the Windows password via “Get Windows password” (it says “This instance was launched from a custom AMI...”);
The SSM Agent doesn’t start or connect to Systems Manager;
The instance shows 3/3 health checks OK, but remains inaccessible over RDP or SSM.
---
🔹 What I have tried so far
1. Standard AMI creation:
Created the image via EC2 console → Create image.
Shared both the AMI and its snapshot with the target AWS account (including Allow EBS volume creation).
2. First attempt (no sysprep):
The image worked but AWS couldn’t decrypt the Windows password.
Expected behavior, since Windows wasn’t generalized.
3. Second attempt (sysprep with /oobe /generalize /shutdown):
Ran from SSM:
Start-Process "C:\Windows\System32\Sysprep\sysprep.exe" -ArgumentList "/oobe /generalize /shutdown" -Wait
Result: instance stopped correctly, but when launching from this AMI the system got stuck on the “Hi there” screen (OOBE GUI), so no EC2Launch automation, no RDP, no SSM.
4. Third attempt (sysprep with /generalize /shutdown only):
Based on the AWS official documentation, /oobe should not be used — EC2LaunchV2 handles first boot automatically.
However, the AMI was based on an older image that had EC2Launch v1, not EC2LaunchV2, so I verified this via:
Get-Service | Where-Object { $_.Name -like "EC2Launch*" }
and confirmed it was the legacy EC2Launch service.
Started the service:
Set-Service EC2Launch -StartupType Automatic
Start-Service EC2Launch
Re-ran:
Start-Process "C:\Windows\System32\Sysprep\sysprep.exe" -ArgumentList "/generalize /shutdown" -Wait
The process completed and the instance shut down, but in the new account I still couldn’t decrypt the Windows password (AWS said custom AMI).
5. Tried reinstalling EC2LaunchV2 manually:
Using:
Invoke-WebRequest "https://ec2-launch-v2.s3.amazonaws.com/latest/EC2LaunchV2.msi" -OutFile "$env:TEMP\EC2LaunchV2.msi"
Start-Process msiexec.exe -ArgumentList "/i $env:TEMP\EC2LaunchV2.msi /quiet" -Wait
However, the service didn’t register, likely because the image is built on a base that doesn’t support EC2LaunchV2 natively (Windows Server 2022 + legacy AMI lineage).
https://redd.it/1oyh932
@r_devops
Is there a standard list of all potential metrics that one can / should extract from technologies like HTTP / gRPC / GraphQL server & clients? Or for Request Response systems in general?
We all deal with developing and maintaining servers and clients. With observability playing its part, I'm trying to figure out: shouldn't we have standardized metrics that one can use by default for such servers?
If so, is there actually a project/foundation/tool working on this?
E.g., for a server there can be Prometheus metrics for requests and responses; for a client it could be something similar. I mean, developers can choose the metrics they deem useful, but having a list of the potentially available metrics would be a much better strategy IMHO.
I don't know if OpenTelemetry solves this issue; from what I understand, it provides tools to obtain metrics, traces, and logs, but doesn't define a definitive set of what these standard models can provide.
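As a thought experiment, the "standard" server-side set for request/response systems usually reduces to a request counter and a latency histogram keyed by method, route, and status (the RED pattern: rate, errors, duration). A purely illustrative, library-free sketch, with names loosely following Prometheus/OpenTelemetry conventions:

```python
from collections import defaultdict

class RequestMetrics:
    """Minimal RED-style metric set for an HTTP/gRPC-like server."""

    def __init__(self):
        # (method, route, status) -> request count
        self.request_count = defaultdict(int)
        # (method, route) -> observed latencies in seconds
        self.request_duration_s = defaultdict(list)

    def observe(self, method, route, status, duration_s):
        self.request_count[(method, route, status)] += 1
        self.request_duration_s[(method, route)].append(duration_s)

    def error_rate(self, method, route):
        """Fraction of requests to (method, route) that returned 5xx."""
        total = err = 0
        for (m, r, status), n in self.request_count.items():
            if (m, r) == (method, route):
                total += n
                if status >= 500:
                    err += n
        return err / total if total else 0.0
```

A client-side set mirrors this (attempts, failures, call duration), which is why a shared list across server and client would mostly be the same few shapes with different label names.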
https://redd.it/1oylwuc
@r_devops
How do you handle infrastructure audits across multiple monitoring tools?
Our team just went through an annual audit of our internal tools.
Some of the audits we do are the following:
1. Alerts - We have alerts spanning CloudWatch, Splunk, Chronosphere, Grafana, and custom cron jobs. We audit for things like whether we still need the alert, whether it's still accurate, etc.
2. ASGs - We went through all the AWS ASGs we own and checked that they have appropriate resources (not too much or too little), whether our team still owns them, etc.
That’s just a small portion of our audit.
Often these audits require the auditor to go to different systems and pull some data to get an idea on the current status of the infrastructure/tool in question.
All of this data is put into a spreadsheet and different audits are assigned to different team members.
Curious on a few things:
- Are you auditing your infra/tools regularly?
- Do you have tooling for this? Something beyond simple spreadsheets.
- How long does it take you to audit?
Looking to hear what works well for others!
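Since the workflow described above ends in a spreadsheet anyway, one small automation that helps is flattening findings pulled from each system into one CSV before assigning rows out. A sketch with hypothetical field names, just to show the shape:

```python
import csv
import io

AUDIT_FIELDS = ["system", "item", "owner", "still_needed"]

def to_audit_csv(findings):
    """Flatten audit findings from multiple sources into one CSV string.

    findings: iterable of dicts like
        {"system": "cloudwatch", "item": "high-cpu-alert",
         "owner": "sre", "still_needed": "yes"}
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=AUDIT_FIELDS)
    writer.writeheader()
    for row in findings:
        writer.writerow(row)
    return buf.getvalue()
```

Each per-system collector (CloudWatch, Splunk, ASG inventory, ...) only has to emit dicts in this shape; the consolidation and assignment step stays the same year over year.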
https://redd.it/1oyomjm
@r_devops
FREE Security audit for your code in exchange for 10 min feedback
Hey everyone,
I'm building a security analyzer called CodeSlick.dev that detects OWASP Top 10 vulnerabilities in JavaScript, Python, Java, and TypeScript.
To improve it, I'm offering free security audits in exchange for honest feedback.
What you get:
- Instant security analysis (<3 seconds)
- AI-powered fix suggestions with one-click apply
- CVSS severity scoring
- Downloadable HTML report
What I need:
- 10-minute feedback survey after you see results
- Your honest thoughts on what worked/what didn't
Zero friction:
- No signup required
- No installation
- Just paste code → Get report → Share feedback
Interested? Please feel free to comment below or DM me.
https://redd.it/1oyq3gi
@r_devops
Offline Scalable CICD Platform Recommendations
Hello all,
I was wondering if anyone could recommend any scalable platforms for running CICD in an offline environment. At present we have a bunch of VMs with GitLab runners on them, but due to mixed use of the VMs (like users logging in to do other stuff) it’s quite hard to manage security and keep config consistent.
Unfortunately a lot of the VMs need to be Windows based because that’s the target environment. Most small jobs are Python; the larger jobs are Java, C++, etc. The Java stuff is super simple, but the other languages tend to be trickier. This network has about 40 proper devs and 60 python bandits.
We’re looking for a solution that can be purchased to run on an air-gapped network and that can do load balancing, re-baselining, etc. without much manual maintenance.
I’d suggested doing it with Kubernetes ourselves, but we are time restricted and have some budget to buy something. One of my colleagues saw a VMware Tanzu demo that looked good, but anyone with hands-on experience would be more useful than a conference sales pitch.
Any suggestions would be appreciated, and I can provide more info if needed. We have about £200k budget for both the compute and the management platform.
Just in case anyone tries to sell me something directly, I won’t be the one making the decision or purchase.
Thanks in advance
https://redd.it/1oytnsx
@r_devops
what’s the one type of alert that ruins your sleep the most?
just trying to understand how bad on-call life really is outside my bubble.
Last night a friend got woken up at 3AM… for an alert that turned out to be nothing.
Curious:
• What alert always turns out to be noise?
• What’s the dumbest 3AM wake-up you’ve had?
• If you could delete one alert type forever, which one would it be?
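One common lever against the 3AM-for-nothing page is requiring the condition to hold for a while before firing. A Prometheus-style sketch, where the metric names and the 5%/15m thresholds are illustrative, not from the post:

```yaml
groups:
  - name: noise-reduction
    rules:
      - alert: HighErrorRate
        # Metric names and thresholds here are made up for illustration.
        expr: rate(http_requests_errors_total[5m]) / rate(http_requests_total[5m]) > 0.05
        for: 15m  # condition must hold 15 minutes before firing — filters brief blips
        labels:
          severity: page
        annotations:
          summary: "Sustained error rate above 5% for 15 minutes"
```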
https://redd.it/1oyv1lx
@r_devops
System Design interview for DevOps roles
For about a year now, the system design interview has been part of the interview process for DevOps roles, at least in what I've been seeing.
In each interview, I was asked to design different systems (API design and database design) to meet different requirements. These interviews always seem to focus on the software itself rather than on infrastructure, operating systems, or cloud. Personally, I feel it's judging a fish on whether it can fly.
Have you seen the same? What’s your opinion?
https://redd.it/1oywe81
@r_devops
Want to build a microservice from a mixture of open source IAM and RBAC tools
I'm trying to build a microservice to handle auth and RBAC for a project I'm starting. I don't want to waste my time on it and would rather use open source solutions to cover the requirements:
Authentication:
- JWT + OAuth2 Password Flow
- Access tokens + Refresh tokens
- Token revocation, password reset, user invitations
- bcrypt password hashing
Multitenancy:
- Database-per-tenant architecture
- Shared schema (super_admins, entities) + Tenant schemas
- Complete data isolation between entities
RBAC:
- 3 fixed roles: Super Admin, Admin, User
- Profile-based permissions for Users
- Granular permissions: resource.action format (e.g., example.create, billing.*)
- Admin creates custom profiles with specific permissions
- Entity-level feature toggles
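The `resource.action` permission format with a trailing wildcard is small enough to sketch directly; this is one possible matcher, not the semantics of any particular library (the function name and the rule that `billing.*` covers everything under `billing.` are assumptions):

```python
def matches(permission: str, granted: set[str]) -> bool:
    """Check a dotted resource.action permission against granted patterns.

    Exact strings match themselves; a pattern ending in ".*" matches any
    permission under that prefix (e.g. "billing.*" covers "billing.create").
    """
    for pattern in granted:
        if pattern == permission:
            return True
        if pattern.endswith(".*") and permission.startswith(pattern[:-1]):
            return True
    return False
```

Tools like OpenFGA model this (and far more, like relationships between tenants and entities) declaratively, so a hand-rolled matcher is mainly useful for understanding what you're asking the engine to do.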
Initially I picked Hanko (a "great solution"), but it doesn't align with my system requirements and would need a lot of customization. Then I thought about using Keycloak, or Ory Kratos, with OpenFGA for RBAC.
But I wonder: what would be the best combination for these requirements, or am I on a completely wrong track?
https://redd.it/1oyw32n
@r_devops
CI/CD milestone reached for arkA (open video protocol)
We now have:
• Schema validation
• Automated builds
• Static deployments
• Zero-backend hosting model
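A schema-validation plus static-build pipeline of this shape is often wired up roughly like the following GitHub Actions sketch; the script names, Node version, and `dist/` path are assumptions for illustration, not taken from the arkA repo:

```yaml
# .github/workflows/ci.yml — hypothetical layout, not arkA's actual workflow
name: ci
on: [push, pull_request]
jobs:
  validate-and-build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - run: npm run validate-schemas   # assumed script name
      - run: npm run build              # assumed to produce the static site
      - uses: actions/upload-artifact@v4
        with:
          name: site
          path: dist/                   # assumed output directory
```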
Would love CI/CD feedback or contributors!
Repo: https://github.com/baconpantsuppercut/arkA
https://redd.it/1oyvmjn
@r_devops