DevOps & SRE notes – Telegram
Helpful articles and tools for DevOps & SRE

WhatsApp: https://whatsapp.com/channel/0029Vb79nmmHVvTUnc4tfp2F

For paid consultation (RU/EN), contact: @tutunak


All ways to support the channel: https://telegra.ph/How-support-the-channel-02-19
Railway’s latest narrative details their transition from relying on Google Cloud Platform to building their own physical infrastructure, highlighting the challenges and lessons learned in constructing a custom data center cage. This entry offers a behind-the-scenes look at selecting colocation options, managing power and cooling, and orchestrating the intricate cabling and network setup required for a resilient, high-performance platform.

https://blog.railway.com/p/data-center-build-part-one
This analysis explores how DeepSeek has reimagined the Transformer architecture to achieve greater efficiency and performance in large language models. The piece highlights innovations like Multi-Head Latent Attention and advanced Mixture-of-Experts routing that set DeepSeek apart from conventional approaches.

https://epoch.ai/gradient-updates/how-has-deepseek-improved-the-transformer-architecture
TerraConstructs is a library of classes and interfaces inspired by AWS CDK, but designed to leverage the power and flexibility of Terraform.

https://github.com/TerraConstructs/base
Efficient, disruption-free application updates are essential for modern cloud-native operations. This article on Semaphore explains how Kubernetes’ rolling update deployment strategy enables teams to maintain service continuity while incrementally rolling out new versions.

https://semaphore.io/blog/kubernetes-rolling-update-deployment
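
To make the incremental rollout concrete, here is a minimal Python sketch that simulates how a rolling update might swap old pods for new ones under the Deployment settings `maxSurge` and `maxUnavailable`. It is a simplification and not taken from the linked article: the function name `rolling_update` is hypothetical, and it assumes every new pod becomes ready instantly.

```python
def rolling_update(replicas: int, max_surge: int = 1, max_unavailable: int = 0):
    """Return the (old_pods, new_pods) counts after each step of a simulated rollout.

    Simplifying assumption: new pods are ready as soon as they are created.
    """
    if max_surge == 0 and max_unavailable == 0:
        # Kubernetes rejects this combination because the rollout could never progress.
        raise ValueError("max_surge and max_unavailable cannot both be 0")

    old, new = replicas, 0
    steps = [(old, new)]
    while old > 0 or new < replicas:
        # Scale up: total pods may not exceed replicas + max_surge.
        can_add = min(replicas - new, replicas + max_surge - (old + new))
        new += can_add
        # Scale down: at least replicas - max_unavailable pods must stay available.
        min_available = replicas - max_unavailable
        can_remove = min(old, (old + new) - min_available)
        old -= can_remove
        steps.append((old, new))
    return steps


if __name__ == "__main__":
    for old, new in rolling_update(3, max_surge=1, max_unavailable=0):
        print(f"old={old} new={new}")
```

With `max_surge=1` and `max_unavailable=0`, the simulated rollout never drops below the desired replica count, which is the service-continuity property the article describes.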
Understanding logical replication in PostgreSQL is crucial for anyone managing data across multiple Postgres instances. This blog post from EnterpriseDB introduces the basics of logical replication, explaining how it selectively replicates changes (inserts, updates, and deletes) between databases, even across different Postgres versions, and outlines the practical steps to set up publications and subscriptions for real-time data synchronization.

https://www.enterprisedb.com/blog/logical-replication-postgres-basics
Figma’s migration onto Kubernetes is a compelling case study in how a high-growth company can modernize its infrastructure for scalability, reliability, and developer productivity. This article recounts Figma’s decision to move from AWS ECS to Kubernetes (EKS), the challenges they faced with ECS—such as lack of support for StatefulSets, Helm charts, and advanced autoscaling—and the benefits they unlocked by embracing the broader CNCF ecosystem and Kubernetes’ popularity within the industry.

https://www.figma.com/blog/migrating-onto-kubernetes/
This newsletter explains the "hot shard" problem: a disproportionate amount of traffic targets a single shard, causing resource saturation and degraded performance. The post outlines practical strategies to address this, such as vertical scaling, adding read replicas or caches, distributing hot keys across more shards, choosing better sharding keys and algorithms, implementing load balancing and queueing, controlling traffic with backpressure, and monitoring the cluster for early detection of issues.

https://newsletter.scalablethread.com/p/how-to-handle-hot-shard-problem
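
One of the strategies listed above, distributing hot keys across more shards, can be sketched with key salting. This is an illustrative Python sketch under stated assumptions, not the newsletter's implementation: the shard count, fan-out, hot-key set, and all function names are hypothetical.

```python
import hashlib

NUM_SHARDS = 8            # hypothetical cluster size
HOT_KEY_FANOUT = 4        # hypothetical number of sub-keys per hot key
HOT_KEYS = {"user:42"}    # hypothetical set of keys already identified as hot


def shard_for(key: str) -> int:
    """Map a key to a shard with a stable hash (plain hash-mod sharding)."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def write_key(key: str, request_id: int) -> str:
    """Append a salt to hot keys so successive writes rotate across sub-keys."""
    if key in HOT_KEYS:
        return f"{key}#{request_id % HOT_KEY_FANOUT}"
    return key


def read_keys(key: str) -> list[str]:
    """A read of a hot key must gather every salted sub-key and merge the results."""
    if key in HOT_KEYS:
        return [f"{key}#{i}" for i in range(HOT_KEY_FANOUT)]
    return [key]


if __name__ == "__main__":
    for request_id in range(4):
        salted = write_key("user:42", request_id)
        print(salted, "-> shard", shard_for(salted))
```

The trade-off is that writes to a hot key spread out, while reads become a scatter-gather over all sub-keys, so this fits write-heavy hot keys better than read-heavy ones.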
Migrating from MetalLB to Cilium streamlines Kubernetes networking by consolidating load balancer, IP address management, and network advertisement features into a single tool. This article details how Cilium—starting with version 1.13—natively supports LoadBalancer IP management, BGP (Layer 3) announcements, and Layer 2 (ARP) announcements, eliminating the need for MetalLB in most self-managed clusters. Through practical YAML examples, it demonstrates configuring Cilium IP pools, service selectors, specific IP assignments, and both IPv4 and IPv6 support, as well as advertising service IPs to the network using BGP or ARP, offering a more integrated and simplified approach to Kubernetes networking.

https://isovalent.com/blog/post/migrating-from-metallb-to-cilium/
Dropbox has built a flexible messaging system model to support its evolving async platform. This blog post explores how the new architecture enhances decoupling and scalability across their infrastructure services.

https://dropbox.tech/infrastructure/infrastructure-messaging-system-model-async-platform-evolution
Sven Eliasson benchmarks Hetzner’s Kubernetes storage classes to evaluate their suitability for database workloads. This report highlights the significant performance differences between instance-attached NVMe storage and cloud volumes, offering practical insights for infrastructure planning.

https://sveneliasson.de/benchmarking-hetzners-storage-classes-for-database-workloads-on-kubernetes
Instant's engineering team shares their journey of upgrading an Aurora Postgres instance to version 16 with zero downtime. This experience report details the challenges faced, including performance bottlenecks and failed upgrade attempts, ultimately leading to a successful migration strategy.

https://www.instantdb.com/essays/pg_upgrade