Scaling Site Reliability Engineering (SRE) teams is crucial for maintaining high availability and performance as organizations grow. This article from DZone explores strategies for expanding SRE teams, including building scalable processes, leveraging automation, and fostering a culture of collaboration. Learn how to effectively scale your SRE practices to support the evolving needs of your organization.
https://dzone.com/articles/scaling-sre-teams
DZone
Scaling SRE Teams
Scaling teams of site reliability engineers comes with many challenges. Here, explore the challenges of scaling and review a successful scaling framework.
Manage multiple kubectl port-forward commands, with support for UDP, K8s proxy, and GitHub state sync.
https://github.com/hcavarsan/kftray
GitHub
GitHub - hcavarsan/kftray: kubectl port-forward manager and reverse tunnel (ngrok-like) for exposing local services publicly,…
kubectl port-forward manager and reverse tunnel (ngrok-like) for exposing local services publicly, with TLS termination, HTTP traffic inspection, UDP forwarding, multi-hop proxy routing through k...
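Each forward that a tool like kftray manages corresponds, at its simplest, to a `kubectl port-forward` session. The Python below is a minimal sketch of juggling several such sessions, not kftray's implementation. The `Forward` fields and helper names are hypothetical, and it covers only plain TCP forwards to a pod or service. It builds one `kubectl port-forward` command per target and refuses to launch anything if two targets claim the same local port.

```python
import subprocess
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Forward:
    """One forward target (hypothetical fields, not kftray's config schema)."""
    namespace: str
    target: str       # e.g. "svc/postgres" or "pod/api-0"
    local_port: int
    remote_port: int


def build_argv(fwd: Forward) -> list[str]:
    """Argument list for a single `kubectl port-forward` invocation."""
    return [
        "kubectl", "port-forward",
        "-n", fwd.namespace,
        fwd.target,
        f"{fwd.local_port}:{fwd.remote_port}",
    ]


def local_port_conflicts(forwards: list[Forward]) -> set[int]:
    """Local ports claimed by more than one forward (only one of them could bind)."""
    counts = Counter(f.local_port for f in forwards)
    return {port for port, n in counts.items() if n > 1}


def start_all(forwards: list[Forward]) -> list[subprocess.Popen]:
    """Launch one kubectl process per forward; call stop_all() to clean up."""
    clashes = local_port_conflicts(forwards)
    if clashes:
        raise ValueError(f"local ports used more than once: {sorted(clashes)}")
    return [subprocess.Popen(build_argv(f)) for f in forwards]


def stop_all(procs: list[subprocess.Popen]) -> None:
    """Terminate every kubectl process and wait for it to exit."""
    for p in procs:
        p.terminate()
        p.wait(timeout=5)
```

For example, `build_argv(Forward("db", "svc/postgres", 5432, 5432))` yields the command `kubectl port-forward -n db svc/postgres 5432:5432`.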
The OpenTelemetry Collector is a powerful tool for gathering, processing, and exporting telemetry data from various sources. This article by Frankel provides a deep dive into the OpenTelemetry Collector, explaining its architecture, key features, and how to set it up. Learn how to use the OpenTelemetry Collector to improve observability in your systems by centralizing and standardizing the collection of metrics, traces, and logs.
https://blog.frankel.ch/opentelemetry-collector/
A Java geek
Exploring the OpenTelemetry Collector
The OpenTelemetry Collector sits at the center of the OpenTelemetry architecture but is unrelated to the W3C Trace Context. In my tracing demo, I use Jaeger instead of the Collector. Yet, it’s ubiquitous, as in every OpenTelemetry-related post. I wanted to…
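The Collector's central idea is a pipeline. Receivers accept telemetry, processors transform or batch it, and exporters send it onward. The real Collector is written in Go and configured in YAML. The Python below is only a toy model of that flow, with hypothetical class names. The `send_batch_size` field borrows its name from the Collector's batch processor, but this is not that processor's code. It also flushes only when the batch is full and on shutdown, whereas the real processor also flushes on a timeout.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class Span:
    name: str
    duration_ms: float


@dataclass
class BatchProcessor:
    """Toy processor: buffer spans and hand them to the exporter in fixed-size batches."""
    send_batch_size: int
    export: Callable[[list[Span]], None]
    _buffer: list[Span] = field(default_factory=list, init=False, repr=False)

    def consume(self, span: Span) -> None:
        self._buffer.append(span)
        if len(self._buffer) >= self.send_batch_size:
            self.flush()

    def flush(self) -> None:
        if self._buffer:
            self.export(self._buffer)
            self._buffer = []  # start a fresh list; the exporter keeps the old one


class InMemoryExporter:
    """Toy exporter that records every batch it receives."""

    def __init__(self) -> None:
        self.batches: list[list[Span]] = []

    def export(self, batch: list[Span]) -> None:
        self.batches.append(batch)


def receive(spans: Iterable[Span], processor: BatchProcessor) -> None:
    """Toy receiver: push incoming spans into the pipeline, then flush on shutdown."""
    for span in spans:
        processor.consume(span)
    processor.flush()
```

Feeding three spans through a processor with `send_batch_size=2` produces two exports: a batch of two, then the remaining span at shutdown.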
Upgrading Amazon EKS worker nodes is crucial for maintaining security, performance, and access to new features. This AWS blog post explains how to use Karpenter's Drift feature to automate EKS worker node upgrades. When a node's configuration (for example, its AMI) no longer matches the desired spec, Karpenter replaces that node. Learn about the process and best practices to ensure smooth upgrades, minimize downtime, and maintain consistency in your Kubernetes environment.
https://aws.amazon.com/blogs/containers/how-to-upgrade-amazon-eks-worker-nodes-with-karpenter-drift/
Amazon
How to upgrade Amazon EKS worker nodes with Karpenter Drift | Amazon Web Services
[May, 2024 – This blog has been updated to reflect Karpenter v1beta1 API changes] Introduction Karpenter is an open-source cluster autoscaler that provisions right-sized nodes in response to unschedulable pods based on aggregated CPU, memory, volume requests…
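Conceptually, drift means a running node no longer matches the configuration that would be used to launch it today. A typical example is a node class that now resolves to a newer AMI. The Python toy below illustrates only that comparison plus a gradual, wave-by-wave rollout. It is not Karpenter's algorithm, which operates on its own NodePool and EC2NodeClass resources. The `Node` and `DesiredSpec` fields and the wave size are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    ami_id: str          # AMI the node was launched from (hypothetical field)
    instance_type: str


@dataclass(frozen=True)
class DesiredSpec:
    ami_id: str                          # AMI the node class resolves to now
    allowed_instance_types: frozenset    # instance types the pool still permits


def drifted(nodes: list[Node], desired: DesiredSpec) -> list[str]:
    """Names of nodes whose launch configuration no longer matches the desired spec."""
    return [
        n.name
        for n in nodes
        if n.ami_id != desired.ami_id
        or n.instance_type not in desired.allowed_instance_types
    ]


def replacement_waves(names: list[str], max_at_once: int) -> list[list[str]]:
    """Split drifted nodes into waves so only a few are disrupted at a time."""
    if max_at_once < 1:
        raise ValueError("max_at_once must be at least 1")
    return [names[i:i + max_at_once] for i in range(0, len(names), max_at_once)]
```

Replacing nodes in small waves is the design choice that keeps workloads schedulable while the fleet converges on the new AMI.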
A syntax-highlighting pager for git, diff, grep, and blame output
https://github.com/dandavison/delta
GitHub
GitHub - dandavison/delta: A syntax-highlighting pager for git, diff, grep, and blame output
A syntax-highlighting pager for git, diff, grep, and blame output - dandavison/delta
🥧 HTTPie CLI — modern, user-friendly command-line HTTP client for the API era. JSON support, colors, sessions, downloads, plugins & more.
https://github.com/httpie/cli
GitHub
GitHub - httpie/cli: 🥧 HTTPie CLI — modern, user-friendly command-line HTTP client for the API era. JSON support, colors, sessions…
🥧 HTTPie CLI — modern, user-friendly command-line HTTP client for the API era. JSON support, colors, sessions, downloads, plugins & more. - httpie/cli
Performance regressions in cloud environments can be challenging to diagnose and resolve. This article from DoltHub discusses a "spooky" performance regression issue encountered with AWS Elastic Block Store (EBS). It explores the investigative steps taken to identify the root cause, the lessons learned, and best practices for monitoring and mitigating similar issues in cloud storage systems.
https://www.dolthub.com/blog/2023-11-22-spooky-performance-regression-aws-ebs/
Dolthub
A Spooky Performance Regression in AWS EBS Volumes
Blog for DoltHub, a website hosting databases made with Dolt, an open-source version-controlled SQL database with Git-like semantics.
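One generic safeguard against regressions like this is to compare tail latency against a baseline, not just averages. The sketch below is not the method from the DoltHub post. The 1.2x tolerance and the function names are hypothetical choices. It uses Python's `statistics.quantiles` to estimate the 99th percentile, which needs at least two samples.

```python
import statistics


def p99(samples_ms: list[float]) -> float:
    """Estimate the 99th percentile; quantiles(n=100) returns 99 cut points."""
    return statistics.quantiles(samples_ms, n=100)[98]


def regressed(baseline_ms: list[float], current_ms: list[float],
              tolerance: float = 1.2) -> bool:
    """Flag a regression when current p99 exceeds baseline p99 by more than `tolerance` times."""
    return p99(current_ms) > tolerance * p99(baseline_ms)
```

A handful of slow I/O operations barely moves the mean but shows up clearly at p99. That is why tail percentiles are the usual choice for catching storage regressions.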
Detecting and managing infrastructure drift is crucial for maintaining the desired state of your AWS resources. This article from ShipMonk's Product Development blog explains how to implement drift checks in Terraform for AWS environments. Learn about tools and techniques to identify, monitor, and remediate drift, ensuring your infrastructure remains consistent and compliant with your configurations.
https://pd.shipmonk.com/terraform-aws-drift-checks/
ShipMonk Product Development
Terraform AWS Drift Checks » ShipMonk Product Development
IaC (or “Infrastructure as Code”) is an IT management style all members of the technical world have to deal with because working in code vs. manual processes, as IaC entails, can make processes more efficient. Everyone who works with IaC, though, has to sometimes…
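A common way to automate a drift check is to run `terraform plan -detailed-exitcode` on a schedule. With that flag, exit code 0 means no changes, 1 means an error, and 2 means the plan contains changes. The Python wrapper below is a minimal sketch of that idea, not ShipMonk's setup. The function and enum names are hypothetical, and it assumes `terraform init` has already been run in the working directory.

```python
import subprocess
from enum import Enum


class PlanResult(Enum):
    NO_CHANGES = "no_changes"
    ERROR = "error"
    DRIFT = "drift"


def interpret_exit_code(code: int) -> PlanResult:
    """Map `terraform plan -detailed-exitcode` exit codes: 0 = no diff, 1 = error, 2 = diff."""
    if code == 0:
        return PlanResult.NO_CHANGES
    if code == 2:
        return PlanResult.DRIFT
    return PlanResult.ERROR


def check_drift(workdir: str) -> PlanResult:
    """Run a non-interactive plan in `workdir` and classify the outcome."""
    proc = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return interpret_exit_code(proc.returncode)
```

A non-empty plan also appears when committed code changes have simply not been applied yet, so a DRIFT result still needs triage. Terraform additionally offers a `-refresh-only` planning mode focused on changes made outside Terraform. Check how it interacts with `-detailed-exitcode` in your Terraform version before relying on it.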
Wazuh - The Open Source Security Platform. Unified XDR and SIEM protection for endpoints and cloud
https://github.com/wazuh/wazuh
GitHub
GitHub - wazuh/wazuh: Wazuh - The Open Source Security Platform. Unified XDR and SIEM protection for endpoints and cloud workloads.
Wazuh - The Open Source Security Platform. Unified XDR and SIEM protection for endpoints and cloud workloads. - wazuh/wazuh
Managing access to Amazon RDS instances securely is vital for protecting your data and maintaining compliance. This article from SymOps discusses strategies for controlling and auditing access to RDS instances in AWS environments. Learn about best practices, tools, and techniques to enhance security, streamline access management, and ensure that only authorized users can interact with your databases.
https://blog.symops.com/post/rds-access
Symops
The Many Ways to Access RDS - Sym Blog
Learn more about the options available to you for managing access to Amazon’s Relational Database Service (AWS RDS).
Understanding the internals of GNU/Linux, including file descriptors, pipes, terminals, user sessions, process groups, and daemons, is essential for Site Reliability Engineers (SREs). This comprehensive guide by Biriukov covers these critical concepts, explaining how they function and interconnect within a Linux environment. Learn how these components work together to manage processes and sessions, providing a foundation for advanced system troubleshooting and performance optimization.
https://biriukov.dev/docs/fd-pipe-session-terminal/0-sre-should-know-about-gnu-linux-shell-related-internals-file-descriptors-pipes-terminals-user-sessions-process-groups-and-daemons/
Viacheslav Biriukov
GNU/Linux shell related internals
What every SRE should know about GNU/Linux shell related internals: file descriptors, pipes, terminals, user sessions, process groups and daemons. Last updated: Oct 2025.
Contents: File descriptor and open file description; Pipes; Process groups, jobs and sessions…
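Two of the guide's core ideas can be shown concretely. First, after `fork()` the child inherits copies of the parent's file descriptors. Second, a reader sees end-of-file on a pipe only once every copy of its write end is closed. The POSIX-only Python sketch below demonstrates both with `os.pipe()` and `os.fork()`. The function name is hypothetical.

```python
import os


def pipe_roundtrip(message: bytes) -> bytes:
    """Send `message` from a forked child to the parent through an anonymous pipe."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: it inherited both descriptors; close the read end it does not use.
        status = 1
        try:
            os.close(read_fd)
            view = memoryview(message)
            while view:  # os.write may write fewer bytes than requested
                written = os.write(write_fd, view)
                view = view[written:]
            os.close(write_fd)
            status = 0
        finally:
            os._exit(status)  # never return into the parent's code path
    # Parent: close its copy of the write end, or read() would never see EOF.
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 65536)
        if not chunk:  # empty read means EOF: no write end remains open
            break
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)  # reap the child so it does not linger as a zombie
    return b"".join(chunks)
```

If the parent skipped `os.close(write_fd)`, its own open write end would keep the pipe alive and the read loop would block forever. This is a classic shell-pipeline bug that the guide's descriptor model explains.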
Monitoring Redis metrics is crucial for maintaining optimal performance and ensuring system reliability. This article from Sematext outlines key Redis metrics to monitor, such as memory usage, latency, and command processing. Learn about best practices for tracking and analyzing these metrics to prevent issues, optimize performance, and ensure the smooth operation of your Redis instances.
https://sematext.com/blog/redis-metrics/
Sematext
Key Redis Monitoring Metrics You Should Measure - Sematext
Find out what key Redis metrics you should monitor to ensure the performance of your database. A complete list with all the monitoring statistics you need.
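Many of these numbers come straight from the Redis `INFO` command. Its output is plain `key:value` lines grouped under `# Section` headers. The Python sketch below parses that text and derives a cache hit ratio from `keyspace_hits` and `keyspace_misses`. The helper names are hypothetical. Fetching the text from a live server is left out, so the sketch works on any captured `INFO` output.

```python
from typing import Dict, Optional


def parse_info(raw: str) -> Dict[str, str]:
    """Flatten Redis INFO output into a key -> value dict."""
    fields: Dict[str, str] = {}
    for line in raw.splitlines():  # handles the \r\n line endings Redis uses
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers such as "# Stats"
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def hit_ratio(fields: Dict[str, str]) -> Optional[float]:
    """keyspace_hits / (hits + misses), or None if there were no lookups yet."""
    hits = int(fields.get("keyspace_hits", 0))
    misses = int(fields.get("keyspace_misses", 0))
    total = hits + misses
    return hits / total if total else None


def fragmentation_ratio(fields: Dict[str, str]) -> Optional[float]:
    """mem_fragmentation_ratio as reported by INFO, if present."""
    value = fields.get("mem_fragmentation_ratio")
    return float(value) if value is not None else None
```

Counters such as `keyspace_hits` are cumulative since server start. To track the ratio over time, compute it from the difference between two snapshots rather than from a single reading.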
Securing complex cloud environments depends on having the right cybersecurity strategy and tools. This article from Semaphore outlines three pillars for maximizing security potential in complex cloud environments.
https://semaphoreci.com/blog/security-cloud-environment
Semaphore
3 Pillars to Maximizing Security Potential in Complex Cloud Environments - Semaphore
With the right cybersecurity strategy and tools, you will be able to maximize security potential in your cloud environment.
Understanding different team types is crucial for structuring effective organizations and fostering collaboration. This article from IT Revolution outlines the four team types in modern software development: Stream-aligned, Enabling, Complicated-Subsystem, and Platform teams. Learn how each team type functions, their responsibilities, and how they can work together to deliver value efficiently and improve overall organizational performance.
https://itrevolution.com/articles/four-team-types/
IT Revolution
The Four Team Types from Team Topologies
According to Matthew Skelton and Manuel Pais, there are only four fundamental team types needed to build and run modern software systems.
Speeding up container image builds is vital for efficient CI/CD pipelines. This article from the CD Foundation discusses how to optimize container image builds using Tekton Pipelines. Discover strategies and best practices for reducing build times, enhancing build efficiency, and leveraging Tekton's capabilities to streamline your development workflow.
https://cd.foundation/blog/2023/10/12/speed-up-container-image-builds-tekton-pipelines/
CD Foundation
Speed Up Container Image Builds in Tekton Pipelines - CD Foundation
Use Kaniko caching capabilities to speed up builds in your Tekton Pipeline.
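Layer caching pays off because each layer's cache key depends on everything before it. Once one step changes, every later step must be rebuilt. The Python toy below models that chaining with SHA-256. It shows why copying rarely changing inputs, such as dependency manifests, before application source keeps more layers cached. This is a simplified illustration with hypothetical function names, not Kaniko's actual key format. In Kaniko itself, caching is switched on with flags such as `--cache=true` and `--cache-repo`.

```python
import hashlib


def layer_keys(base_digest: str, steps: list[tuple[str, str]]) -> list[str]:
    """One cache key per build step.

    Each step is (instruction text, digest of the files it copies, or "" if none).
    Every key also hashes the previous key, so one change invalidates all later layers.
    """
    keys = []
    prev = base_digest
    for instruction, inputs_digest in steps:
        prev = hashlib.sha256(f"{prev}|{instruction}|{inputs_digest}".encode()).hexdigest()
        keys.append(prev)
    return keys


def reusable_layers(cache: set[str], keys: list[str]) -> int:
    """Count leading layers found in the cache; the first miss forces a rebuild from there on."""
    hits = 0
    for key in keys:
        if key not in cache:
            break
        hits += 1
    return hits
```

With steps ordered as `RUN apt-get update`, `COPY requirements.txt .`, `RUN pip install ...`, `COPY . .`, the results depend on what changes. Editing only application source reuses the first three layers. Changing `requirements.txt` reuses just the first.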