DevOps&SRE Library
A library of articles on DevOps and SRE.

Advertising: @ostinostin
Content: @mxssl

RKN: https://www.gosuslugi.ru/snet/67704b536aa9672b963777b3
How to secure Terraform code with Trivy

In this blog post we will look at securing an AWS Terraform configuration using Trivy to check for known security issues. We will explore different ways of using Trivy, how to integrate it into your CI pipelines, and practical issues you might face along with their solutions, to get you started with improving the security of your IaC codebases.
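For a taste of what this looks like in practice, here is a minimal sketch (not from the article) that wraps Trivy's misconfiguration scanner from Python and summarizes findings; the directory path is a placeholder:

    import json
    import subprocess

    # Run Trivy's misconfiguration scanner against a Terraform directory
    # and emit JSON (assumes the `trivy` binary is on PATH).
    result = subprocess.run(
        ["trivy", "config", "--format", "json", "./terraform"],
        capture_output=True, text=True,
    )
    report = json.loads(result.stdout)

    # Summarize findings per scanned file.
    for target in report.get("Results", []):
        for finding in target.get("Misconfigurations", []):
            print(f'{finding["Severity"]}: {finding["ID"]} in {target["Target"]}')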


https://verifa.io/blog/how-to-secure-terraform-trivy
Proper setup of IAM federation in Multi-account AWS Organization for Terragrunt

In this article I want to describe how to configure the IAM relationships in a multi-account AWS organization with AWS SSO to allow managing infrastructure as code with terragrunt/terraform from both a CI/CD runner and local PCs.
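As a flavour of the underlying mechanism, here is an illustrative boto3 sketch of assuming a Terraform execution role in a target account, the pattern both CI/CD runners and SSO users ultimately rely on; the role ARN and session name are placeholders:

    import boto3

    # Assume a Terraform execution role in the target account via STS.
    sts = boto3.client("sts")
    creds = sts.assume_role(
        RoleArn="arn:aws:iam::123456789012:role/terraform-execution",
        RoleSessionName="terragrunt-plan",
    )["Credentials"]

    # Build a session with the temporary credentials and verify identity.
    session = boto3.Session(
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )
    print(session.client("sts").get_caller_identity()["Arn"])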


https://dev.to/yyarmoshyk/proper-setup-of-iam-federation-in-multi-account-aws-organization-for-terragrunt-3ape
Remotely Producing Terraform from an API

https://nitric.io/blog/terraform-api
terraform plan -light

Add a terraform plan -light flag such that only resources modified in code are targeted for planning.

This would reduce the scope of the pre-plan refresh down to the set of resources we know changed, which reduces overall plan times without the consistency risk of -refresh=false.

For Terraform to know what resources were modified in code, it would store the hash of the serialized sorted attribute map for each resource successfully applied. This would allow diff’ing “last-applied code” vs. “new code”, the result of which is the scope of the next -light plan.

Basically, -light autogenerates the -target list from code changes.
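A toy Python sketch of that idea (the data structures are hypothetical): hash each resource's serialized, sorted attribute map, diff the last-applied hashes against the new ones, and emit the -target list:

    import hashlib
    import json

    def resource_hash(attrs: dict) -> str:
        # Hash of the serialized, sorted attribute map.
        return hashlib.sha256(json.dumps(attrs, sort_keys=True).encode()).hexdigest()

    # Hashes stored at the last successful apply.
    last_applied = {
        "aws_s3_bucket.logs": resource_hash({"bucket": "logs", "acl": "private"}),
    }
    # Hashes computed from the current code.
    new_code = {
        "aws_s3_bucket.logs": resource_hash({"bucket": "logs", "acl": "public-read"}),
        "aws_instance.web": resource_hash({"ami": "ami-123", "instance_type": "t3.micro"}),
    }

    # Changed or new resources become the scope of the next -light plan.
    targets = [addr for addr, h in new_code.items() if last_applied.get(addr) != h]
    print(" ".join(f"-target={t}" for t in targets))
    # -target=aws_s3_bucket.logs -target=aws_instance.web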


https://www.bejarano.io/terraform-plan-light
Terraform Modules and Grand Finale Project: Orchestrating Complex Azure Infrastructure

Master Terraform modules for Azure infrastructure management. Learn to create, use, and optimize modules, build a multi-tier application, and implement best practices for large-scale projects. Ideal for DevOps engineers and cloud architects looking to enhance their Infrastructure as Code skills on Azure.


https://www.iamachs.com/p/azure-terraform/part-7-modules-grand-finale
Monitoring Inter-Pod Traffic at the AZ Level with Retina (an eBPF-based tool)

Recently, I was tasked with analyzing cross-AZ traffic in Kubernetes to identify opportunities for reduction, as it is a significant contributor to our AWS bill. The first step was to understand how traffic flows between services and what portion consistently crosses Availability Zones (AZs). To optimize cross-AZ traffic, I considered using topology-aware routing for services. However, before implementing this solution, I needed a method to effectively analyze inter-pod traffic at the AZ level.

To achieve this, monitoring network traffic at the pod level is necessary. I decided to use eBPF (Extended Berkeley Packet Filter) technology, as it allows us to observe network interactions with minimal performance overhead.

In this article, I will explain what eBPF is, explore the tools available for using it, and provide a step-by-step guide on implementing monitoring for inter-pod traffic using Retina, Kube State Metrics, Prometheus, and Grafana.
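As an illustration of the final step, here is a query against the Prometheus HTTP API that sums pod-to-pod traffic by zone; the metric and label names are assumptions for illustration, so check what your Retina deployment actually exports:

    import requests

    PROM_URL = "http://prometheus:9090/api/v1/query"
    # Assumed metric/label names; substitute what Retina exposes in your setup.
    query = 'sum by (source_zone, destination_zone) (rate(networkobservability_forward_bytes[5m]))'

    resp = requests.get(PROM_URL, params={"query": query}, timeout=10)
    for series in resp.json()["data"]["result"]:
        labels, (_, value) = series["metric"], series["value"]
        print(f'{labels.get("source_zone")} -> {labels.get("destination_zone")}: {value} B/s')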


https://medium.com/@j.aslanov94/monitoring-inter-pod-traffic-at-the-az-level-with-ebpf-based-tool-retina-7a79818e305b
perses

The CNCF candidate for observability visualisation.


https://github.com/perses/perses
Mastering Terraform: Understanding Variable Precedence for Optimal Configuration Control

Understanding Terraform Variable Precedence: Which Value Wins?
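For reference, Terraform's documented precedence runs, from lowest to highest: variable defaults, TF_VAR_* environment variables, terraform.tfvars, *.auto.tfvars files (in alphabetical order), then -var and -var-file flags in the order given. A toy resolver illustrating the rule (not how Terraform is implemented):

    # Sources ordered lowest to highest precedence; later entries win.
    sources = [
        ("default", {"region": "us-east-1"}),
        ("TF_VAR_ env", {"region": "eu-west-1"}),
        ("terraform.tfvars", {"region": "eu-central-1"}),
        ("-var flag", {"region": "ap-south-1"}),
    ]

    resolved = {}
    for name, values in sources:
        for key, value in values.items():
            resolved[key] = (value, name)

    value, winner = resolved["region"]
    print(f"region = {value} (from {winner})")  # region = ap-south-1 (from -var flag)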


https://towardsaws.com/mastering-terraform-understanding-variable-precedence-for-optimal-configuration-control-59c98dcd1505
Can a Kubernetes Pod have more than one Network attached?

Additional Networks on Kubernetes using Multus CNI.
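The short version of the mechanism: you define a NetworkAttachmentDefinition and reference it from a pod annotation. A hedged sketch of the manifest (the attachment name macvlan-conf is an example):

    import json

    pod = {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": "multi-net-pod",
            "annotations": {
                # Multus reads this annotation to attach extra interfaces.
                "k8s.v1.cni.cncf.io/networks": "macvlan-conf",
            },
        },
        "spec": {"containers": [{"name": "app", "image": "nginx"}]},
    }
    print(json.dumps(pod, indent=2))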


https://medium.datadriveninvestor.com/can-a-kubernetes-pod-have-more-than-one-network-attached-6d78456dbeb2
How to calculate CPU for containers in k8s dynamically?

It’s possible to dynamically resize CPU on containers in k8s with the feature gate “InPlacePodVerticalScaling”.

Before this feature gate, sizing CPU was error-prone and, in practice, we would often set it too high just to avoid dealing with latency.

Too much CPU and precious resources are wasted; too little CPU and the app slows down. Let’s explore the ways to dynamically resize CPU.
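A minimal sketch of an in-place resize with the official Python client, assuming the InPlacePodVerticalScaling feature gate is enabled and your cluster version supports it (newer releases route this through the pod's resize subresource); pod and container names are placeholders:

    from kubernetes import client, config

    config.load_kube_config()
    v1 = client.CoreV1Api()

    # Strategic merge patch raising CPU for the container named "app".
    patch = {
        "spec": {
            "containers": [
                {"name": "app",
                 "resources": {"requests": {"cpu": "500m"},
                               "limits": {"cpu": "1"}}}
            ]
        }
    }
    v1.patch_namespaced_pod(name="my-pod", namespace="default", body=patch)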


https://medium.com/@mathieuces/how-to-calculate-cpu-for-containers-in-k8s-dynamically-47a89e3886eb
k8gb: The Best Open Source GSLB Solution for Cloud Native

Balancing traffic across multiple Kubernetes clusters and achieving automatic disaster recovery switching has always been a headache. We have explored public clouds and Karmada Ingress, and have also tried manual DNS solutions, but these approaches often fell short in terms of cost, universality, flexibility, and automation. It was not until we discovered k8gb, a project initiated by South Africa’s Absa Group to provide banking-grade availability across multiple locations, that we realized the ingenuity of using various DNS protocols to deliver a universal and highly automated GSLB solution. This blog will briefly discuss the problems with other approaches and how k8gb cleverly uses DNS to implement GSLB.
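From the client's point of view the trick is plain DNS: the GSLB-managed hostname resolves only to the IPs of clusters that currently pass health checks, so failover is just a changed DNS answer. An illustrative check with dnspython (the hostname is a placeholder):

    import dns.resolver

    # Only clusters that are currently healthy appear in the answer set.
    answers = dns.resolver.resolve("app.cloud.example.com", "A")
    print("Healthy endpoints:", [rr.address for rr in answers])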


https://oilbeater.com/en/2024/04/18/k8gb-best-cloudnative-gslb
Benchmarking Hetzner's Storage Classes for Database Workloads on Kubernetes

TLDR: Running Kubernetes on Hetzner offers cost-effective options, but handling production workloads, especially stateful ones like databases, raises concerns. Hetzner provides instance and cloud volume storage options with significant differences in IOPS performance. Longhorn, a distributed block storage system, can be used to leverage local volumes, but benchmarks show a slowdown compared to raw local files. Consider hosting a database either on a dedicated host or using a managed option instead.
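A rough way to reproduce this kind of comparison yourself is to run fio against a path backed by each storage class and compare the random-I/O numbers; the mount path and job parameters below are examples, not the article's exact benchmark:

    import json
    import subprocess

    # 60s mixed random read/write job against the volume under test.
    result = subprocess.run(
        ["fio", "--name=randrw", "--directory=/mnt/volume-under-test",
         "--rw=randrw", "--bs=4k", "--size=1G", "--iodepth=32",
         "--ioengine=libaio", "--runtime=60", "--time_based",
         "--output-format=json"],
        capture_output=True, text=True, check=True,
    )
    job = json.loads(result.stdout)["jobs"][0]
    print("read IOPS:", job["read"]["iops"], "write IOPS:", job["write"]["iops"])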


https://sveneliasson.de/benchmarking-hetzners-storage-classes-for-database-workloads-on-kubernetes
CloudNativePG Recipe 5 - How to migrate your PostgreSQL database in Kubernetes with ~0 downtime from anywhere

Are you considering migrating your PostgreSQL database from a service provider into Kubernetes, but you cannot afford downtime? Recipe #5 details step-by-step instructions, leveraging CloudNativePG and logical replication, to seamlessly transition from PostgreSQL 10+ to 16 using an imperative method. Learn how to set up initial configurations, execute migrations, and handle various use cases, such as transitioning from DBaaS to Kubernetes-managed databases and performing version upgrades. Emphasizing testing, learning, and compliance with regulations like the Data Act, this guide empowers users to maintain control over their data by migrating to Kubernetes.
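Underneath the CloudNativePG tooling sit PostgreSQL's logical replication primitives: a publication on the source and a subscription on the destination. An illustrative psycopg2 sketch (connection strings are placeholders, and this is not the article's exact procedure):

    import psycopg2

    # On the source (e.g. the DBaaS instance): publish all tables.
    src = psycopg2.connect("host=old-provider dbname=app user=postgres")
    src.autocommit = True  # CREATE SUBSCRIPTION/PUBLICATION outside a txn block
    src.cursor().execute("CREATE PUBLICATION migration_pub FOR ALL TABLES;")

    # On the destination (the CloudNativePG cluster): subscribe to it.
    dst = psycopg2.connect("host=cnpg-cluster-rw dbname=app user=postgres")
    dst.autocommit = True
    dst.cursor().execute(
        "CREATE SUBSCRIPTION migration_sub "
        "CONNECTION 'host=old-provider dbname=app user=replicator' "
        "PUBLICATION migration_pub;"
    )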


https://www.gabrielebartolini.it/articles/2024/03/cloudnativepg-recipe-5-how-to-migrate-your-postgresql-database-in-kubernetes-with-~0-downtime-from-anywhere
kubetools

Curated List of Kubernetes Tools


https://github.com/collabnix/kubetools
ktail

A tool to easily tail Kubernetes container logs


https://github.com/atombender/ktail
kubevirt

KubeVirt is a virtual machine management add-on for Kubernetes. The aim is to provide a common ground for virtualization solutions on top of Kubernetes.


https://github.com/kubevirt/kubevirt
How We Migrated from StatsD to Prometheus in One Month

We recently migrated all of our infrastructure metrics from StatsD to Prometheus and are very pleased with the results. The migration was a ton of work and we learned a lot along the way. This post aims to shed some light on why we migrated to Prometheus, as well as outline some of the technical challenges we faced during the process.
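Not from the post, but to illustrate what changes at the instrumentation level: StatsD pushes UDP events to a daemon, while Prometheus scrapes an HTTP endpoint the process exposes itself:

    import statsd                                   # pip install statsd
    from prometheus_client import Counter, start_http_server

    # Before: push a counter increment to a StatsD daemon.
    statsd.StatsClient("localhost", 8125).incr("api.requests")

    # After: expose a counter for Prometheus to scrape on :8000/metrics.
    REQUESTS = Counter("api_requests_total", "API requests handled")
    start_http_server(8000)
    REQUESTS.inc()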


https://engineering.mixpanel.com/how-we-migrated-from-statsd-to-prometheus-in-one-month-fb973af124f5
Building On-call: Our observability strategy

At incident.io, we run an on-call product. Our customers need to be sure that when their systems go wrong, we’ll tell them about it—high availability is a core requirement for us. To achieve the level of reliability that’s essential to our customers, excellent observability (o11y) is one of the most important tools in our belt.


https://incident.io/hubs/building-on-call/building-on-call-our-observability-strategy