In the second part of the DevOps project, the focus is on deploying ArgoCD together with monitoring tools like Prometheus and Grafana to a Kubernetes cluster. The blog post covers installing ArgoCD, deploying Prometheus using Helm charts, setting up monitoring for ArgoCD, visualizing ArgoCD metrics with Grafana dashboards, and continuously deploying applications with ArgoCD. A useful tool, K8sgpt, is recommended for analyzing the cluster for errors and potential issues. The next blog post will discuss configuring Alertmanager for notifications, setting up Slack alerts, and installing Loki for logs, enhancing the monitoring solution.
https://blog.devgenius.io/optimizing-kubernetes-deployments-with-argocd-and-prometheus-aa86c11e2bba
Medium
Optimizing Kubernetes Deployments with ArgoCD and Prometheus
Welcome back to our DevOps project, where we demonstrate how to automate Kubernetes deployments using Terraform, ArgoCD, Prometheus, and…
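The post's pattern of deploying a Helm chart through ArgoCD can be sketched as an ArgoCD `Application` manifest. This is an illustrative example, not taken from the article: the chart version and namespace names are assumptions.

```yaml
# Illustrative ArgoCD Application that deploys the kube-prometheus-stack
# Helm chart into the "monitoring" namespace. The targetRevision is a
# placeholder; pin it to whatever chart version you actually use.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: prometheus
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://prometheus-community.github.io/helm-charts
    chart: kube-prometheus-stack
    targetRevision: 48.1.1   # assumed version, for illustration only
  destination:
    server: https://kubernetes.default.svc
    namespace: monitoring
  syncPolicy:
    automated:
      prune: true      # delete resources removed from the chart
      selfHeal: true   # revert manual drift in the cluster
```

With `automated` sync enabled, ArgoCD keeps the deployed chart continuously reconciled against the declared state, which is what makes the "continuous deployment" part of the post work.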
Don't forget about security
https://dzone.com/articles/container-security-top-5-best-practices-for-devops
DZone
Container Security: Top 5 Best Practices for DevOps Engineers
Container security ensures that your cloud-native applications are protected from cybersecurity threats associated with container environments.
A new Terraform version has been released. Importing already-existing infrastructure into the Terraform state has become easier.
https://www.hashicorp.com/blog/terraform-1-5-brings-config-driven-import-and-checks
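The headline feature is config-driven import: instead of running `terraform import` per resource on the CLI, you declare imports in configuration. A minimal sketch (the bucket name and resource address are hypothetical):

```hcl
# Terraform >= 1.5: declare the import directly in configuration,
# then review it as part of a normal plan/apply cycle.
import {
  to = aws_s3_bucket.logs
  id = "my-existing-logs-bucket" # hypothetical existing bucket
}

resource "aws_s3_bucket" "logs" {
  bucket = "my-existing-logs-bucket"
}
```

Terraform 1.5 can also generate the matching resource block for you with `terraform plan -generate-config-out=generated.tf`, which is what makes bulk imports of existing infrastructure far less tedious.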
Streaming alert evaluation offers better scalability than traditional polling of time-series databases, overcoming high-dimensionality and high-cardinality limitations. This enables engineers to build more reliable, real-time alerting systems. The transition to the streaming path has opened doors for supporting more exciting use cases and has allowed multiple platform teams at Netflix to generate and maintain alerts programmatically without affecting other users. The streaming paradigm may help tackle correlation problems in observability and offer new opportunities for other verticals beyond metrics and events, such as logs and traces.
https://netflixtechblog.com/improved-alerting-with-atlas-streaming-eval-e691c60dc61e
Medium
Improved Alerting with Atlas Streaming Eval
Ruchir Jha, Brian Harrington, Yingwu Zhao
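To make the polling-vs-streaming distinction concrete, here is a toy sketch (not Atlas code, just an illustration of the idea): instead of periodically querying a time-series database, the evaluator updates its state incrementally as each data point streams in, so the condition is checked in O(1) per point.

```python
from collections import deque

class StreamingAlertEvaluator:
    """Toy streaming evaluator: checks an alert condition incrementally
    as data points arrive, rather than polling a TSDB on a schedule."""

    def __init__(self, threshold: float, window_size: int):
        self.threshold = threshold
        self.window = deque(maxlen=window_size)  # sliding window of values
        self.running_sum = 0.0

    def ingest(self, value: float) -> bool:
        # Maintain a rolling sum in O(1) per incoming point.
        if len(self.window) == self.window.maxlen:
            self.running_sum -= self.window[0]  # value about to be evicted
        self.window.append(value)
        self.running_sum += value
        # Fire when the rolling average breaches the threshold.
        return self.running_sum / len(self.window) > self.threshold

evaluator = StreamingAlertEvaluator(threshold=0.8, window_size=3)
alerts = [evaluator.ingest(v) for v in [0.5, 0.7, 0.9, 0.95, 1.0]]
print(alerts)  # [False, False, False, True, True]
```

The key property is that alert state lives with the stream, so cardinality of the underlying series never forces an expensive periodic query.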
In this post, the author discusses potential PostgreSQL pitfalls that may not affect small databases, but can cause issues when databases grow.
https://philbooth.me/blog/nine-ways-to-shoot-yourself-in-the-foot-with-postgresql
philbooth.me
Nine ways to shoot yourself in the foot with PostgreSQL
Personal website of Phil Booth
Pipedrive Infra manages numerous Kubernetes clusters across different clouds, including AWS and on-premise OpenStack. They had been experiencing intermittent failing pod health checks, which became more frequent over time. After an extensive investigation, the team discovered that Kubelet was initiating TCP sessions to pods using random source ports within the same range reserved by Kubernetes nodeports. This caused the TCP SYN-ACK to be redirected to other pods, leading to failed health checks. The solution was to disallow the use of the nodeport range as the source port for TCP sessions with a single line of code, effectively resolving the issue.
https://medium.com/pipedrive-engineering/solving-the-mystery-of-pods-health-checks-failures-in-kubernetes-55b375493d03
Medium
Solving the mystery of pods health checks failures in Kubernetes
The story of one troubleshooting case.
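The article does not quote its one-line fix here, but a common way to achieve exactly this on Linux is to reserve the NodePort range via sysctl, so the kernel never hands those ports out as ephemeral source ports. A sketch, assuming the default Kubernetes NodePort range of 30000-32767:

```
# /etc/sysctl.d/99-reserve-nodeports.conf
# Reserve the default Kubernetes NodePort range so the kernel never
# assigns these ports as ephemeral source ports for outgoing TCP
# sessions (e.g. kubelet health-check probes to pods).
net.ipv4.ip_local_reserved_ports = 30000-32767
```

With the reservation in place, a kubelet probe can no longer collide with a NodePort and have its SYN-ACK redirected to an unrelated pod.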
Efficient GPU utilization is crucial for minimizing infrastructure expenses, especially in large Kubernetes clusters running AI and HPC workloads. NVIDIA MIG enables partitioning GPUs into smaller slices, but using MIG in Kubernetes through the NVIDIA GPU Operator alone has limitations due to static configurations. Dynamic MIG Partitioning addresses these limitations by automating the creation and deletion of MIG profiles based on real-time workload requirements, ensuring optimal GPU utilization. The nos module works alongside the NVIDIA GPU Operator to implement dynamic MIG partitioning, simplifying the management of MIG configurations and reducing operational costs.
https://towardsdatascience.com/dynamic-mig-partitioning-in-kubernetes-89db6cdde7a3
Medium
Dynamic MIG Partitioning in Kubernetes
Maximize GPU utilization and reduce infrastructure costs.
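From the workload's point of view, a MIG slice is just an extended resource. A hedged sketch of what a pod requesting a slice looks like (the image tag is a placeholder; the resource name follows the NVIDIA device plugin's mixed-strategy naming, `nvidia.com/mig-<profile>`):

```yaml
# Illustrative pod requesting a single 1g.10gb MIG slice. With dynamic
# partitioning (e.g. via nos), the matching MIG profile is created on a
# node automatically when this request cannot otherwise be satisfied.
apiVersion: v1
kind: Pod
metadata:
  name: mig-inference
spec:
  containers:
    - name: inference
      image: nvcr.io/nvidia/pytorch:23.05-py3  # placeholder image tag
      resources:
        limits:
          nvidia.com/mig-1g.10gb: 1  # one 1g.10gb slice of an A100/H100
```

The point of the dynamic approach is that you only declare requests like this; the partitioning of physical GPUs into profiles is reconciled for you instead of being fixed in a static config.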
Roadmapper - a Roadmap as Code (RaC) Python library for generating professional roadmap diagrams from Python code.
https://github.com/csgoh/roadmapper
GitHub
GitHub - csgoh/roadmapper: Roadmapper - A Roadmap as Code (Rac) python library. Generate professional roadmap diagram using python…
Roadmapper - A Roadmap as Code (Rac) python library. Generate professional roadmap diagram using python code. - csgoh/roadmapper