NEW BOT Телеграм, страница

DevOps & SRE notes

This blog post discusses the growing trend of Large Language Models (LLMs) and their impact on various use cases. One specific application discussed is K8sGPT, an AI-based Site Reliability Engineer (SRE) that runs inside Kubernetes clusters. It scans, diagnoses, and triages issues using SRE experience codified into its analyzers. LocalAI, another project, is a drop-in replacement API for local CPU inferencing. Combining K8sGPT and LocalAI enables powerful SRE capabilities without relying on expensive GPUs.
https://itnext.io/k8sgpt-localai-unlock-kubernetes-superpowers-for-free-584790de9b65

Medium

K8sGPT + LocalAI: Unlock Kubernetes superpowers for free!

As we all know, LLMs are trending like crazy and the hype is not unjustified. Tons of cool projects leveraging LLM-based text generation…

299 viewstutunak, 06:37

DevOps & SRE notes

This article explores Kubernetes Resource Manager and the Google Config Connector, comparing them to Terraform, a popular infrastructure orchestration tool. Kubernetes, an open-source container orchestration tool, has gained market dominance with its Custom Resource Definitions (CRDs), which allows managing Google Cloud resources through Kubernetes using CRDs. Config Connector, an add-on to Kubernetes, can potentially replace Terraform in some workflows. However, the author's experiment shows that while Config Connector can be used to deploy a Google Cloud landing zone, it has limitations compared to Terraform, particularly in handling interdependencies based on values unknown until a resource is created.

In conclusion, the author suggests a hybrid approach, with Terraform for platform-centric deployments and Config Connector for application-centric deployments. While Terraform's flexibility and provider support make it useful for organizations operating in multiple clouds, Config Connector has a compelling place in application-centric deployments where small amounts of infrastructure are deployed in support of Kubernetes-based services.

https://medium.com/cts-technologies/are-terraforms-days-numbered-a9a15ec0435a

Medium

Are Terraform’s days numbered?

An exploration of Kubernetes Resource Manager and the Google Config Connector

305 viewstutunak, 14:39

DevOps & SRE notes

https://medium.com/adevinta-tech-blog/managing-kubernetes-secrets-like-a-pro-93283fb4f06d

Medium

Managing Kubernetes secrets like a Pro

Learn about different use cases of Secrets management within the Kubernetes ecosystem

313 viewstutunak, 06:39

DevOps & SRE notes

K8sGPT gives Kubernetes Superpowers to everyone
k8sgpt is a tool for scanning your kubernetes clusters, diagnosing and triaging issues in simple english. It has SRE experience codified into it’s analyzers and helps to pull out the most relevant information to enrich it with AI.

https://k8sgpt.ai/

k8sgpt.ai

K8sGPT - Giving Kubernetes Superpowers to Everyone

K8sGPT is an AI-powered tool that helps diagnose and fix Kubernetes issues with intelligent insights and automated troubleshooting.

❤4

390 viewstutunak, 14:50

DevOps & SRE notes

This post provides a guide to configuring and installing a multi-cluster observability solution for cloud computing environments like AWS, Azure, and Google Cloud. The solution includes Grafana, Prometheus, Thanos, and Loki for monitoring applications and microservices in multi-cluster environments. The guide assumes prior experience with AWS S3, Policy, IAM, EKS, and Kubernetes. It covers the creation of IAM policies and roles, the installation of Helm, Bitnami's Helm charts, and EKS, AWS CLI, eksctl, and kubectl tools. The guide details the process of setting up multi-cluster observability with metrics monitoring using kube-prometheus and Thanos and log monitoring using Grafana Loki and Promtail.

https://medium.com/@bahungxt/multi-cluster-observability-solution-with-prometheus-thanos-loki-and-grafana-5d5be42635e8

Medium

MULTI-CLUSTER OBSERVABILITY SOLUTION WITH PROMETHEUS, THANOS, LOKI, AND GRAFANA

Background

336 viewstutunak, 06:41

DevOps & SRE notes

Nothing can be free forever or the story how Oracle took back a free cloud VMs
https://armin.su/oracle-cloud-and-loss-of-data-in-kubernetes-cluster-198d88181829

Medium

Oracle Cloud and Loss of all data

They offer 24GB RAM, 200GB SSD and 4 core cpu for free with a catch

291 viewstutunak, edited 14:27

DevOps & SRE notes

🔥 Open source static (serverless) status page. Uses hyperfast Go & Hugo, minimal HTML/CSS/JS, customizable, outstanding browser support (IE8+), preloaded CMS, read-only API, badges & more.

https://github.com/cstate/cstate

GitHub

GitHub - cstate/cstate: 🔥 Open source static (serverless) status page. Uses hyperfast Go & Hugo, minimal HTML/CSS/JS, customizable…

🔥 Open source static (serverless) status page. Uses hyperfast Go & Hugo, minimal HTML/CSS/JS, customizable, outstanding browser support (IE8+), preloaded CMS, read-only API, badges &amp...

291 viewstutunak, 06:43

DevOps & SRE notes

🎨 Diagram as Code for prototyping cloud system architectures

https://github.com/mingrammer/diagrams

GitHub

GitHub - mingrammer/diagrams: :art: Diagram as Code for prototyping cloud system architectures

:art: Diagram as Code for prototyping cloud system architectures - mingrammer/diagrams

318 viewstutunak, 14:44

DevOps & SRE notes

https://medium.com/@dotdc/prometheus-performance-and-cardinality-in-practice-74d5d9cd6230

Medium

Prometheus’ performance and cardinality in practice

In this article, I will explain how I analyzed and configured my Prometheus setup in order to significantly reduce its resource usage and…

312 viewstutunak, 06:44

DevOps & SRE notes

https://blog.palark.com/komodor-ui-for-kubernetes-overview/

Palark

Komodor overview: What it offers for Kubernetes monitoring and troubleshooting

Looking for a web interface to simplify troubleshooting and managing Kubernetes clusters? Consider Komodor as a powerful option, offered with paid and freemium plans.

310 viewstutunak, 14:46

DevOps & SRE notes

In the second part of the DevOps project, the focus is on deploying monitoring tools like ArgoCD, Prometheus, and Grafana to a Kubernetes cluster. The blog post covers installing ArgoCD, deploying Prometheus using Helm charts, setting up monitoring for ArgoCD, visualizing ArgoCD metrics using Grafana dashboards, and continuous deployment of applications using ArgoCD. A useful tool, K8sgpt, is recommended to analyze the cluster for errors and potential issues. The next blog post will discuss configuring Alert Manager for notifications, setting up Slack alerts, and installing Loki for logs, enhancing the monitoring solution.

https://blog.devgenius.io/optimizing-kubernetes-deployments-with-argocd-and-prometheus-aa86c11e2bba

Medium

Optimizing Kubernetes Deployments with ArgoCD and Prometheus

Welcome back to our DevOps project, where we demonstrate how to automate Kubernetes deployments using Terraform, ArgoCD, Prometheus, and…

345 viewstutunak, 06:45

DevOps & SRE notes

Don't forget about security

https://dzone.com/articles/container-security-top-5-best-practices-for-devops

DZone

Container Security: Top 5 Best Practices for DevOps Engineers

Container security ensures that your cloud-native applications are protected from cybersecurity threats associated with container environments.

290 viewstutunak, 14:46

DevOps & SRE notes

A new terraform version has been released. Import already existed infrastructure to the terraform state become easier.
https://www.hashicorp.com/blog/terraform-1-5-brings-config-driven-import-and-checks

307 viewstutunak, 06:16

DevOps & SRE notes

https://dev.to/danielepolencic/how-etcd-works-in-kubernetes-373l

DEV Community

How etcd works in Kubernetes

If you've ever interacted with a Kubernetes cluster in any way, chances are it was powered by etcd...

296 viewstutunak, 14:57

DevOps & SRE notes

Streaming alert evaluation offers better scalability than traditional polling time-series databases, overcoming high dimensionality/cardinality limitations. This enables engineers to have more reliable and real-time alerting systems. The transition to the streaming path has opened doors for supporting more exciting use-cases and has allowed multiple platform teams at Netflix to generate and maintain alerts programmatically without affecting other users. The streaming paradigm may help tackle correlation problems in observability and offer new opportunities for metrics and events verticals, such as logs and traces.

https://netflixtechblog.com/improved-alerting-with-atlas-streaming-eval-e691c60dc61e

Medium

Improved Alerting with Atlas Streaming Eval

Ruchir Jha, Brian Harrington, Yingwu Zhao

👍1

286 viewstutunak, 06:48

DevOps & SRE notes

https://www.techtarget.com/searchitoperations/tip/Stay-ahead-of-threats-with-DevOps-security-best-practices

IT Operations

Stay ahead of threats with DevOps security best practices

Learn how to safeguard your organization's IT infrastructure against cyberthreats by implementing these DevOps security best practices.

282 viewstutunak, 14:48

DevOps & SRE notes

In this post, the author discusses potential PostgreSQL pitfalls that may not affect small databases, but can cause issues when databases grow.
https://philbooth.me/blog/nine-ways-to-shoot-yourself-in-the-foot-with-postgresql

philbooth.me

Nine ways to shoot yourself in the foot with PostgreSQL

Personal website of Phil Booth

292 viewstutunak, 06:58

DevOps & SRE notes

https://community.ops.io/danielepolencic/learning-how-an-ingress-controller-works-by-building-one-in-bash-3fni

The Ops Community ⚙️

Learning how an ingress controller works by building one in bash

TL;DR: In this article, you will learn how the Ingress controller works in Kubernetes by building one...

291 viewstutunak, 15:01

DevOps & SRE notes

Pipedrive Infra manages numerous Kubernetes clusters across different clouds, including AWS and on-premise OpenStack. They had been experiencing intermittent failing pod health checks, which became more frequent over time. After an extensive investigation, the team discovered that Kubelet was initiating TCP sessions to pods using random source ports within the same range reserved by Kubernetes nodeports. This caused the TCP SYN-ACK to be redirected to other pods, leading to failed health checks. The solution was to disallow the use of the nodeport range as the source port for TCP sessions with a single line of code, effectively resolving the issue.

https://medium.com/pipedrive-engineering/solving-the-mystery-of-pods-health-checks-failures-in-kubernetes-55b375493d03

Medium

Solving the mystery of pods health checks failures in Kubernetes

The story of one troubleshooting case.

294 viewstutunak, 06:01

DevOps & SRE notes

https://heyryanw.medium.com/benchmark-metal-vs-ec2-vs-eks-vs-ecs-for-an-existing-stack-27a65df6db1f

Medium

Benchmark: Metal vs EC2 vs EKS vs ECS (for an existing stack)

A client I work with was recently contemplating a move from an EC2 deployment with some dedicated metal, to a self-hosted or EKS K8s…

304 viewstutunak, 15:01

DevOps & SRE notes

https://towardsdatascience.com/exploring-the-power-of-overlay-file-systems-in-linux-containers-d846724ec06d