DevOps & SRE notes – Telegram
Helpful articles and tools for DevOps & SRE

WhatsApp: https://whatsapp.com/channel/0029Vb79nmmHVvTUnc4tfp2F

For paid consultation (RU/EN), contact: @tutunak


All the ways to support the channel: https://telegra.ph/How-support-the-channel-02-19
Managing infrastructure as code with Terraform pays off, especially at scale. In this post, Cloudflare describes how it uses Terraform to manage and automate its own infrastructure, including the best practices it follows, the challenges it faced, and the strategies it uses to keep its Terraform workflows scalable.

https://blog.cloudflare.com/terraforming-cloudflare-at-cloudflare/
Calico and Kubernetes together make a strong foundation for network policy in cloud-native environments. This article on FAUN explains how Calico extends Kubernetes' native networking with finer-grained control, stronger security, and more flexibility, and walks through configuring Calico to manage network policies with an eye on both security and performance in your clusters.

https://faun.pub/calico-and-kubernetes-a-perfect-pair-for-robust-network-policy-2b91eb4eec44
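Calico enforces standard Kubernetes NetworkPolicy objects as well as its own policy resources. As a minimal sketch (plain Python emitting JSON manifests; the namespace `shop` and the labels `app=api` and `app=frontend` are hypothetical), here is a common starting point: deny all ingress in a namespace, then allow one specific path.

```python
import json


def default_deny_ingress(namespace: str) -> dict:
    """NetworkPolicy that blocks all ingress to every pod in the namespace.

    An empty podSelector matches all pods; listing "Ingress" in policyTypes
    without any ingress rules means no inbound traffic is allowed.
    """
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-ingress", "namespace": namespace},
        "spec": {"podSelector": {}, "policyTypes": ["Ingress"]},
    }


def allow_from_label(namespace: str, target_app: str, source_app: str, port: int) -> dict:
    """Allow TCP traffic on one port to pods labeled app=<target_app> from pods labeled app=<source_app>."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": f"allow-{source_app}-to-{target_app}", "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": {"app": target_app}},
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    "from": [{"podSelector": {"matchLabels": {"app": source_app}}}],
                    "ports": [{"protocol": "TCP", "port": port}],
                }
            ],
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON, e.g.: python policies.py | kubectl apply -f -
    policies = [default_deny_ingress("shop"), allow_from_label("shop", "api", "frontend", 8080)]
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": policies}, indent=2))
```

Kubernetes network policies are additive, so the allow policy opens only that one path on top of the default deny.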
Understanding the kubeconfig file is essential for managing access to your Kubernetes clusters. This DevOpsCube guide covers the file's structure, how it is used, and best practices for configuring, managing, and securing it, so that access to your clusters stays both convenient and safe.

https://devopscube.com/kubernetes-kubeconfig-file/
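To make the file's structure concrete, here is a minimal sketch of how the current context resolves to a cluster, a user, and a namespace. It assumes the YAML has already been parsed into a Python dict (for example with a YAML library), and every name in `EXAMPLE` is made up.

```python
def resolve_current_context(kubeconfig: dict) -> dict:
    """Resolve current-context to the cluster server, user, and namespace kubectl would use.

    A kubeconfig links three named lists: `clusters` (where to connect),
    `users` (how to authenticate), and `contexts` (a cluster + user pair with an
    optional namespace). `current-context` selects the default context.
    """

    def by_name(section: str, name: str) -> dict:
        # Entries are looked up by their `name` field, not by list position.
        for entry in kubeconfig.get(section) or []:
            if entry.get("name") == name:
                return entry
        raise KeyError(f"{section} has no entry named {name!r}")

    ctx_name = kubeconfig.get("current-context")
    if not ctx_name:
        raise KeyError("current-context is not set")
    ctx = by_name("contexts", ctx_name)["context"]
    cluster = by_name("clusters", ctx["cluster"])["cluster"]
    by_name("users", ctx["user"])  # raises KeyError if the context names an undefined user
    return {
        "context": ctx_name,
        "server": cluster["server"],
        "user": ctx["user"],
        # kubectl falls back to the "default" namespace when the context sets none.
        "namespace": ctx.get("namespace", "default"),
    }


# Hypothetical, already-parsed kubeconfig (the real file is YAML).
EXAMPLE = {
    "apiVersion": "v1",
    "kind": "Config",
    "clusters": [{"name": "dev", "cluster": {"server": "https://dev.example.com:6443"}}],
    "users": [{"name": "alice", "user": {"token": "REDACTED"}}],
    "contexts": [
        {"name": "alice@dev", "context": {"cluster": "dev", "user": "alice", "namespace": "team-a"}}
    ],
    "current-context": "alice@dev",
}

if __name__ == "__main__":
    print(resolve_current_context(EXAMPLE))
```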
Managing Grafana as code keeps monitoring setups consistent and version-controlled across environments. This guide from Grafana Labs covers the tools, tips, and best practices for managing Grafana dashboards and configuration as code, and shows how to automate Grafana deployments so that your monitoring stays consistent everywhere.

https://grafana.com/blog/2022/12/06/a-complete-guide-to-managing-grafana-as-code-tools-tips-and-tricks/
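A minimal sketch of the idea, assuming you push dashboards through Grafana's HTTP API (`POST /api/dashboards/db`) rather than one of the dedicated tools the guide discusses. The dashboard title, uid, and PromQL expression are hypothetical, and the JSON model is trimmed to a handful of fields.

```python
import json
import urllib.request


def build_dashboard(title: str, uid: str, expr: str) -> dict:
    """Build a minimal dashboard JSON model with one time-series panel.

    Keeping this definition in Git gives every environment the same dashboard.
    """
    return {
        "uid": uid,  # a stable uid lets repeated uploads update the same dashboard
        "title": title,
        "tags": ["managed-as-code"],
        "panels": [
            {
                "id": 1,
                "type": "timeseries",
                "title": expr,
                "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0},
                "targets": [{"refId": "A", "expr": expr}],
            }
        ],
    }


def upload_dashboard(grafana_url: str, token: str, dashboard: dict) -> dict:
    """POST the dashboard to /api/dashboards/db, overwriting any existing version with the same uid."""
    body = json.dumps({"dashboard": dashboard, "overwrite": True}).encode()
    req = urllib.request.Request(
        f"{grafana_url.rstrip('/')}/api/dashboards/db",
        data=body,
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Because the dashboard has no numeric `id` and a fixed `uid`, running the upload from CI on every merge is idempotent: the same dashboard is updated rather than duplicated.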
Restricting cluster-admin permissions in Kubernetes is essential for security and for preventing unauthorized access. In this post, Marcus Noble shares best practices for limiting cluster-admin privileges and shows how to manage roles and permissions to harden your Kubernetes environment.

https://marcusnoble.co.uk/2022-01-20-restricting-cluster-admin-permissions/
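As an illustration only (a simple audit heuristic, not the method from the article), this sketch flags ClusterRole rules that are cluster-admin-like or that use the RBAC verbs Kubernetes documents as privilege-escalation risks (`escalate`, `bind`, `impersonate`), plus read access to Secrets. Roles are plain dicts shaped like the RBAC API objects; the function names are hypothetical.

```python
def is_wildcard_rule(rule: dict) -> bool:
    """True if a rule grants every verb on every resource (cluster-admin-like)."""
    return "*" in rule.get("verbs", []) and "*" in rule.get("resources", [])


def risky_rules(cluster_role: dict) -> list[dict]:
    """Return the rules of a ClusterRole that are broad enough to deserve review."""
    flagged = []
    for rule in cluster_role.get("rules") or []:
        verbs = set(rule.get("verbs", []))
        resources = set(rule.get("resources", []))
        if is_wildcard_rule(rule):
            flagged.append(rule)
        # These verbs allow privilege escalation even without wildcards.
        elif verbs & {"escalate", "bind", "impersonate"}:
            flagged.append(rule)
        # Reading Secrets often exposes credentials with wider access.
        elif "secrets" in resources and verbs & {"get", "list", "watch", "*"}:
            flagged.append(rule)
    return flagged
```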
Scaling Site Reliability Engineering (SRE) teams is key to maintaining high availability and performance as an organization grows. This DZone article explores strategies for growing SRE teams, including building scalable processes, leaning on automation, and fostering a culture of collaboration, so that your SRE practice keeps pace with your organization's needs.

https://dzone.com/articles/scaling-sre-teams
The OpenTelemetry Collector gathers, processes, and exports telemetry data from many sources. In this deep dive, Nicolas Fränkel explains the Collector's architecture, its key features, and how to set it up, so you can improve observability by centralizing and standardizing the collection of metrics, traces, and logs.

https://blog.frankel.ch/opentelemetry-collector/
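The Collector is configured as named receivers, processors, and exporters, wired together under `service.pipelines`. This minimal sketch checks that wiring on an already-parsed config dict; it is a hypothetical helper, not part of the Collector, and the component names in `EXAMPLE` (`otlp`, `batch`, `debug`, `prometheus`) are only examples.

```python
SIGNAL_TYPES = {"traces", "metrics", "logs"}


def validate_collector_config(config: dict) -> list[str]:
    """Return problems found in a Collector config's service.pipelines section.

    Each pipeline needs at least one receiver and one exporter, and every
    component it lists must be defined in the top-level receivers/processors/exporters.
    """
    problems = []
    pipelines = config.get("service", {}).get("pipelines", {})
    for name, pipeline in pipelines.items():
        # Pipeline ids look like "traces" or "traces/2"; the part before "/" is the signal type.
        signal = name.split("/", 1)[0]
        if signal not in SIGNAL_TYPES:
            problems.append(f"{name}: unknown signal type {signal!r}")
        for kind in ("receivers", "exporters"):
            if not pipeline.get(kind):
                problems.append(f"{name}: needs at least one entry in {kind}")
        for kind in ("receivers", "processors", "exporters"):
            defined = config.get(kind) or {}
            for component in pipeline.get(kind, []):
                if component not in defined:
                    problems.append(f"{name}: {kind[:-1]} {component!r} is not defined")
    return problems


EXAMPLE = {
    "receivers": {"otlp": {"protocols": {"grpc": {}, "http": {}}}},
    "processors": {"batch": {}},
    "exporters": {"debug": {}},
    "service": {
        "pipelines": {
            "traces": {"receivers": ["otlp"], "processors": ["batch"], "exporters": ["debug"]},
            "metrics": {"receivers": ["otlp"], "processors": ["batch"], "exporters": ["prometheus"]},
        }
    },
}

if __name__ == "__main__":
    for problem in validate_collector_config(EXAMPLE):
        print(problem)
```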
Keeping Amazon EKS worker nodes up to date matters for security, performance, and access to new features. This AWS blog post explains how to automate worker node upgrades with Karpenter's drift feature, which replaces nodes that no longer match their desired configuration. It also covers best practices for smooth upgrades, minimal downtime, and a consistent Kubernetes environment.

https://aws.amazon.com/blogs/containers/how-to-upgrade-amazon-eks-worker-nodes-with-karpenter-drift/
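A toy model of drift-driven upgrades, not Karpenter's controller logic: after you point the node class at a new AMI, nodes still on the old AMI count as drifted and get replaced in bounded waves. In Karpenter itself that bound comes from the NodePool's disruption budgets; here the hypothetical `max_percent` parameter merely imitates it, and the node names and AMI IDs are placeholders.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    ami_id: str


def drifted(nodes: list[Node], desired_ami: str) -> list[Node]:
    """Nodes still running an AMI other than the one the node class now selects."""
    return [n for n in nodes if n.ami_id != desired_ami]


def replacement_waves(nodes: list[Node], desired_ami: str, max_percent: int) -> list[list[str]]:
    """Group drifted nodes into waves, each at most max_percent of the whole fleet (minimum 1)."""
    to_replace = drifted(nodes, desired_ami)
    wave_size = max(1, len(nodes) * max_percent // 100)
    return [
        [n.name for n in to_replace[i : i + wave_size]]
        for i in range(0, len(to_replace), wave_size)
    ]


if __name__ == "__main__":
    fleet = [Node(f"node-{i}", "ami-old" if i < 6 else "ami-new") for i in range(10)]
    # Print each wave; in this toy model a wave finishes before the next one starts.
    for wave in replacement_waves(fleet, "ami-new", max_percent=20):
        print(wave)
```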
Performance regressions in the cloud can be hard to diagnose and fix. This DoltHub post walks through a "spooky" performance regression involving AWS Elastic Block Store (EBS): the investigation that uncovered the root cause, the lessons learned, and practices for monitoring and mitigating similar issues in cloud storage.

https://www.dolthub.com/blog/2023-11-22-spooky-performance-regression-aws-ebs/
Detecting and managing infrastructure drift is essential for keeping AWS resources in their desired state. This post from ShipMonk's Product Development blog explains how to implement drift checks for Terraform-managed AWS environments, with tools and techniques for identifying, monitoring, and remediating drift so that your infrastructure stays consistent with your configuration.

https://pd.shipmonk.com/terraform-aws-drift-checks/
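One common way to run a drift check, which may or may not match ShipMonk's setup: run `terraform plan -detailed-exitcode` on a schedule and act on the exit code (0 means no changes, 1 an error, 2 a non-empty plan). A non-empty plan can also come from committed but unapplied configuration changes, not only from drift. The sketch assumes `terraform init` has already run in the target directory; the function names are hypothetical.

```python
import subprocess
import sys

# Exit codes of `terraform plan -detailed-exitcode`:
#   0 = no changes, 1 = error, 2 = the plan contains changes
EXIT_MEANING = {0: "in sync", 1: "error", 2: "drift or pending changes"}


def classify_plan_exit_code(code: int) -> str:
    """Map a plan exit code to a status; anything unexpected is treated as an error."""
    return EXIT_MEANING.get(code, "error")


def check_drift(workdir: str) -> str:
    """Run a non-interactive plan in workdir and classify the result."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return classify_plan_exit_code(result.returncode)


if __name__ == "__main__":
    status = check_drift(sys.argv[1] if len(sys.argv) > 1 else ".")
    print(status)
    # A non-zero exit lets a scheduled CI job fail (and alert) when drift appears.
    sys.exit(0 if status == "in sync" else 1)
```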
Securing access to Amazon RDS instances is vital for protecting your data and staying compliant. This SymOps article discusses strategies for controlling and auditing access to RDS in AWS, with best practices, tools, and techniques for tightening security, streamlining access management, and ensuring that only authorized users can reach your databases.

https://blog.symops.com/post/rds-access
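One widely used building block for this, not necessarily the approach in the article, is IAM database authentication: an IAM policy grants `rds-db:connect` for a specific database user on a specific instance. This sketch builds such a policy, optionally time-boxed with the `aws:CurrentTime` condition key; the region, account ID, resource ID, and user name are placeholders.

```python
import json
from typing import Optional


def rds_connect_policy(
    region: str,
    account_id: str,
    dbi_resource_id: str,
    db_user: str,
    expires_at: Optional[str] = None,
) -> dict:
    """IAM policy that lets a principal connect to one RDS instance as one database user.

    The resource ARN uses the instance's DbiResourceId (not its name). If expires_at
    (an ISO 8601 UTC timestamp) is given, the Allow applies only before that time.
    """
    statement = {
        "Effect": "Allow",
        "Action": "rds-db:connect",
        "Resource": f"arn:aws:rds-db:{region}:{account_id}:dbuser:{dbi_resource_id}/{db_user}",
    }
    if expires_at:
        statement["Condition"] = {"DateLessThan": {"aws:CurrentTime": expires_at}}
    return {"Version": "2012-10-17", "Statement": [statement]}


if __name__ == "__main__":
    # All identifiers below are placeholders.
    policy = rds_connect_policy(
        "eu-west-1", "123456789012", "db-ABCDEFGHIJKL01234", "app_readonly",
        expires_at="2030-01-01T00:00:00Z",
    )
    print(json.dumps(policy, indent=2))
```

For this to work, the instance needs IAM database authentication enabled and the database user must be set up for it; clients then connect with a short-lived token (for example from boto3's `generate_db_auth_token`) instead of a static password.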