How our data team handles incidents
https://incident.io/blog/how-our-data-team-handles-incidents
Historically, data teams have not been closely involved in the incident management process (at least, not in the traditional “get woken up at 2AM by a SEV0” sense). But as data (and therefore data teams) becomes more involved in core business processes, decision-making, and user-facing products, data-related incidents are becoming more common and more important than ever.
At incident.io, the Data team works across multiple areas of the business, enabling go-to-market and product teams alike to make data-driven decisions. Given our broad involvement, we’re no stranger to data incidents and are heavy users of our own product to monitor, triage, and respond to them. Here’s a quick run-through of how we’ve set this up.
What is an SLA?
https://uptimerobot.com/blog/what-is-an-sla
A Service Level Agreement (SLA) is a formal document that outlines the expectations, responsibilities, and performance metrics between a service provider and a customer.
Optimizing global message transit latency: a journey through TCP configuration
https://ably.com/blog/optimizing-global-message-transit-latency-a-journey-through-tcp-configuration
kubetrim
https://github.com/alexellis/kubetrim
kubetrim tidies up old and broken cluster and context entries from your kubeconfig file.
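To make "broken entries" concrete, here is a minimal Python sketch that prunes a kubeconfig already parsed into a dict (for example with a YAML loader). It is not kubetrim's implementation: it only drops contexts that reference a cluster or user missing from the file, then drops clusters no remaining context uses. The function name prune_kubeconfig is hypothetical.

```python
from typing import Any, Dict


def prune_kubeconfig(cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: remove dangling contexts and unused clusters."""
    cluster_names = {c["name"] for c in cfg.get("clusters", [])}
    user_names = {u["name"] for u in cfg.get("users", [])}

    # A context is "broken" here if its cluster or user entry does not exist.
    contexts = [
        ctx for ctx in cfg.get("contexts", [])
        if ctx.get("context", {}).get("cluster") in cluster_names
        and ctx.get("context", {}).get("user") in user_names
    ]
    used_clusters = {ctx["context"]["cluster"] for ctx in contexts}

    pruned = dict(cfg)  # shallow copy; the input dict is left unmodified
    pruned["contexts"] = contexts
    # Keep only clusters that a surviving context still points at.
    pruned["clusters"] = [
        c for c in cfg.get("clusters", []) if c["name"] in used_clusters
    ]

    # Reset current-context if it named a context we just removed.
    context_names = {ctx["name"] for ctx in contexts}
    if pruned.get("current-context") and pruned["current-context"] not in context_names:
        pruned["current-context"] = ""
    return pruned
```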
outline
https://github.com/outline/outline
The fastest knowledge base for growing teams. Beautiful, realtime collaborative, feature packed, and markdown compatible.
isaiah
https://github.com/will-moss/isaiah
Self-hostable clone of lazydocker for the web. Manage your Docker fleet with ease.
wush
https://github.com/coder/wush
wush is a command line tool that lets you easily transfer files and open shells over a peer-to-peer WireGuard connection.
How to secure Terraform code with Trivy
https://verifa.io/blog/how-to-secure-terraform-trivy
In this blog post we will look at securing an AWS Terraform configuration using Trivy to check for known security issues. We will explore different ways of using Trivy, integrating it into your CI pipelines, practical issues you might face and solutions to those issues to get you started with improving the security of your IaC codebases.
Proper setup of IAM federation in Multi-account AWS Organization for Terragrunt
https://dev.to/yyarmoshyk/proper-setup-of-iam-federation-in-multi-account-aws-organization-for-terragrunt-3ape
In this article I want to describe how to configure IAM relationships in a multi-account AWS organization with AWS SSO, so that infrastructure as code can be managed with Terragrunt/Terraform from both CI/CD runners and local PCs.
terraform plan -light
https://www.bejarano.io/terraform-plan-light
Add a terraform plan -light flag such that only resources modified in code are targeted for planning.
This would reduce the scope of the pre-plan refresh down to the set of resources we know changed, which reduces overall plan times without the consistency risk of -refresh=false.
For Terraform to know what resources were modified in code, it would store the hash of the serialized sorted attribute map for each resource successfully applied. This would allow diff’ing “last-applied code” vs. “new code”, the result of which is the scope of the next -light plan.
Basically, -light autogenerates the -target list from code changes.
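The proposal is easiest to follow as code. Below is a minimal Python sketch of the scoping step, assuming each resource's code can be represented as an attribute dict keyed by its address. Terraform has no -light flag; the names resource_hash, light_targets, and plan_args are hypothetical, and only -target is a real terraform plan option. Resources deleted from code are not handled here.

```python
import hashlib
import json
from typing import Any, Dict, List


def resource_hash(attrs: Dict[str, Any]) -> str:
    # sort_keys gives a canonical serialization, so attribute order in code is irrelevant.
    blob = json.dumps(attrs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode()).hexdigest()


def light_targets(last_applied: Dict[str, str], code: Dict[str, Dict[str, Any]]) -> List[str]:
    """Addresses whose code differs from the hash stored at the last successful apply."""
    targets = []
    for address, attrs in sorted(code.items()):
        # New resources (no stored hash) and edited ones both end up in scope.
        if last_applied.get(address) != resource_hash(attrs):
            targets.append(address)
    return targets


def plan_args(targets: List[str]) -> List[str]:
    # What a hypothetical -light would expand to: one -target per changed resource.
    return ["terraform", "plan", *(f"-target={t}" for t in targets)]
```

Because only the targeted resources are refreshed, plan time scales with the size of the change rather than the size of the state, which is the trade-off the proposal describes.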
10 Best Terraform Tools To Use In 2024
https://medium.com/@lognoroy2000/10-best-terraform-tools-to-use-in-2024-45fcd242b139
Terraform Modules and Grand Finale Project: Orchestrating Complex Azure Infrastructure
https://www.iamachs.com/p/azure-terraform/part-7-modules-grand-finale
Master Terraform modules for Azure infrastructure management. Learn to create, use, and optimize modules, build a multi-tier application, and implement best practices for large-scale projects. Ideal for DevOps engineers and cloud architects looking to enhance their Infrastructure as Code skills on Azure.
Monitoring Inter-Pod Traffic at the AZ Level with Retina (an eBPF based tool)
https://medium.com/@j.aslanov94/monitoring-inter-pod-traffic-at-the-az-level-with-ebpf-based-tool-retina-7a79818e305b
Recently, I was tasked with analyzing cross-AZ traffic in Kubernetes to identify opportunities for reduction, as it is a significant contributor to our AWS bill. The first step was to understand how traffic flows between services and what portion consistently crosses Availability Zones (AZs). To optimize cross-AZ traffic, I considered using topology-aware routing for services. However, before implementing this solution, I needed a method to effectively analyze inter-pod traffic at the AZ level.
To achieve this, monitoring network traffic at the pod level is necessary. I decided to use eBPF (Extended Berkeley Packet Filter) technology, as it allows us to observe network interactions with minimal performance overhead.
In this article, I will explain what eBPF is, explore the tools available for using it, and provide a step-by-step guide on implementing monitoring for inter-pod traffic using Retina, Kube State Metrics, Prometheus, and Grafana.
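The question the article starts from ("what portion consistently crosses AZs") reduces to a simple aggregation once flows are collected. A minimal Python sketch, assuming flow records of (source pod, destination pod, bytes) and a pod-to-zone map, for example derived from the nodes' topology.kubernetes.io/zone label. The Flow type and cross_az_report function are hypothetical and not part of Retina's output format.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Flow:
    src_pod: str
    dst_pod: str
    num_bytes: int


def cross_az_report(
    flows: Iterable[Flow], pod_zone: Dict[str, str]
) -> Tuple[float, Dict[Tuple[str, str], int]]:
    """Return (cross-AZ share of bytes, bytes per (src_zone, dst_zone) pair)."""
    total = 0
    cross = 0
    per_pair: Dict[Tuple[str, str], int] = defaultdict(int)
    for f in flows:
        src_zone = pod_zone.get(f.src_pod)
        dst_zone = pod_zone.get(f.dst_pod)
        if src_zone is None or dst_zone is None:
            continue  # pod has no known zone (e.g. already gone); skip it
        total += f.num_bytes
        per_pair[(src_zone, dst_zone)] += f.num_bytes
        if src_zone != dst_zone:
            cross += f.num_bytes
    share = cross / total if total else 0.0
    return share, dict(per_pair)
```

The per-pair breakdown points at which zone pairs to target first, for instance with topology-aware routing.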
Oops, I Deleted the AWS Auth Roles: My EKS Misadventure
https://blog.devops.dev/oops-i-deleted-the-aws-auth-roles-my-eks-misadventure-47688737ae1f
Mastering Terraform: Understanding Variable Precedence for Optimal Configuration Control
https://towardsaws.com/mastering-terraform-understanding-variable-precedence-for-optimal-configuration-control-59c98dcd1505
Understanding Terraform Variable Precedence: Which Value Wins?
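Terraform's documented order, from lowest to highest precedence, is: the variable's default, TF_VAR_ environment variables, terraform.tfvars, terraform.tfvars.json, *.auto.tfvars / *.auto.tfvars.json in lexical filename order, then -var and -var-file flags in the order given on the command line. A minimal Python model of "which value wins" follows; resolve_variable is a hypothetical name, and the model skips type conversion (environment values stay strings) and Terraform's interactive prompt for unset required variables, which is represented by a KeyError.

```python
from typing import Any, Mapping, Optional, Sequence

_MISSING = object()  # sentinel meaning "no default declared"


def resolve_variable(
    name: str,
    default: Any = _MISSING,
    env: Optional[Mapping[str, str]] = None,
    tfvars: Optional[Mapping[str, Any]] = None,
    tfvars_json: Optional[Mapping[str, Any]] = None,
    auto_tfvars: Optional[Mapping[str, Mapping[str, Any]]] = None,
    cli: Sequence[Mapping[str, Any]] = (),
) -> Any:
    """Simplified model of the value Terraform would use for `name`."""
    env = env or {}
    auto = auto_tfvars or {}
    env_key = f"TF_VAR_{name}"

    # Layers from lowest to highest precedence.
    layers: list = [{name: env[env_key]} if env_key in env else {}]
    layers.append(tfvars or {})
    layers.append(tfvars_json or {})
    layers.extend(auto[fname] for fname in sorted(auto))  # lexical filename order
    layers.extend(cli)  # -var / -var-file, in command-line order

    value = default
    for layer in layers:
        if name in layer:
            value = layer[name]  # a later source overrides an earlier one
    if value is _MISSING:
        raise KeyError(f"no value given for required variable {name!r}")
    return value
```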
Can a Kubernetes Pod have more than one Network attached?
https://medium.datadriveninvestor.com/can-a-kubernetes-pod-have-more-than-one-network-attached-6d78456dbeb2
Additional Networks on Kubernetes using Multus CNI.
How to calculate CPU for containers in k8s dynamically?
https://medium.com/@mathieuces/how-to-calculate-cpu-for-containers-in-k8s-dynamically-47a89e3886eb
It’s possible to dynamically resize CPU on containers in k8s with the feature gate “InPlacePodVerticalScaling”.
Before this feature gate, sizing CPU was error-prone, and in practice we would often set it too high to avoid latency problems.
Too much CPU wastes precious resources; too little slows the app down. Let’s explore the ways to dynamically resize CPU.
k8gb: The Best Open Source GSLB Solution for Cloud Native
https://oilbeater.com/en/2024/04/18/k8gb-best-cloudnative-gslb
Balancing traffic across multiple Kubernetes clusters and achieving automatic disaster recovery switching has always been a headache. We have explored public clouds and Karmada Ingress, and have also tried manual DNS solutions, but these approaches often fell short in terms of cost, universality, flexibility, and automation. It was not until we discovered k8gb, a project initiated by South Africa’s Absa Group to provide banking-level multi-availability, that we realized the ingenuity of using various DNS protocols to deliver a universal and highly automated GSLB solution. This blog will briefly discuss the problems with other approaches and how k8gb cleverly uses DNS to implement GSLB.
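The core trick of DNS-based GSLB is that the authoritative DNS answer for a global hostname only contains healthy clusters. A minimal Python sketch of that answer selection, assuming each cluster reports its ingress IPs and a health flag. This is generic GSLB logic, not k8gb's code; the ClusterEndpoint type, dns_answer function, and strategy strings are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class ClusterEndpoint:
    geo_tag: str            # hypothetical label for where the cluster runs
    ips: Tuple[str, ...]    # ingress addresses the cluster exposes
    healthy: bool           # outcome of that cluster's own health checks


def dns_answer(
    endpoints: Sequence[ClusterEndpoint],
    strategy: str = "roundRobin",
    primary_geo_tag: Optional[str] = None,
) -> List[str]:
    """Return the A records to serve for the global hostname."""
    healthy = [e for e in endpoints if e.healthy]
    if strategy == "roundRobin":
        # Every healthy cluster is returned; clients spread load across the records.
        chosen = healthy
    elif strategy == "failover":
        # Only the primary while it is healthy, otherwise every other healthy cluster.
        primary = [e for e in healthy if e.geo_tag == primary_geo_tag]
        chosen = primary or healthy
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return sorted(ip for e in chosen for ip in e.ips)
```

In practice such answers are served with a short TTL so that resolvers pick up a changed answer quickly after a cluster fails.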