Railway’s latest post details the company’s move from Google Cloud Platform to its own physical infrastructure, highlighting the challenges and lessons learned in building a custom data center cage. It offers a behind-the-scenes look at choosing colocation options, managing power and cooling, and orchestrating the cabling and network setup required for a resilient, high-performance platform.
https://blog.railway.com/p/data-center-build-part-one
Railway Blog
So You Want to Build Your Own Data Center
When it comes to infrastructure engineering, building a data center is probably closer to building a house than to deploying a Terraform stack.
This analysis explores how DeepSeek has reimagined the Transformer architecture to achieve greater efficiency and performance in large language models. The piece highlights innovations like Multi-Head Latent Attention and advanced Mixture-of-Experts routing that set DeepSeek apart from conventional approaches.
https://epoch.ai/gradient-updates/how-has-deepseek-improved-the-transformer-architecture
Epoch AI
How has DeepSeek improved the Transformer architecture?
This Gradient Updates issue goes over the major changes that went into DeepSeek’s most recent model.
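To make the routing idea concrete, here is a minimal top-k gating sketch. It shows only the generic Mixture-of-Experts mechanism (score every expert for a token, keep the k best, renormalise their weights, and mix only those experts' outputs). It is not DeepSeek's exact scheme and omits DeepSeek-specific refinements such as always-active shared experts and its load-balancing strategy. The function names, the scalar "token", and the toy experts are assumptions for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple


def top_k_route(scores: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Select the k highest-scoring experts for one token and return
    (expert_index, gate_weight) pairs whose weights sum to 1."""
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the number of experts")
    # Rank experts by router score (ties go to the lower index) and keep the top k.
    chosen = sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]
    # Softmax over the chosen scores only; subtracting the max avoids overflow.
    top = max(scores[i] for i in chosen)
    exps = {i: math.exp(scores[i] - top) for i in chosen}
    total = sum(exps.values())
    return [(i, exps[i] / total) for i in chosen]


def moe_forward(
    x: float,
    scores: Sequence[float],
    experts: Sequence[Callable[[float], float]],
    k: int,
) -> float:
    """Mix the outputs of only the routed experts; the others are never evaluated."""
    return sum(w * experts[i](x) for i, w in top_k_route(scores, k))


if __name__ == "__main__":
    # Four toy "experts" that just scale their input (hypothetical stand-ins for FFNs).
    experts = [lambda x, a=a: a * x for a in (1.0, 2.0, 3.0, 4.0)]
    scores = [0.1, 2.0, -1.0, 2.0]  # made-up router scores for one token
    print(top_k_route(scores, k=2))  # [(1, 0.5), (3, 0.5)]
    print(moe_forward(1.0, scores, experts, k=2))  # 3.0
```

The point of the design is sparsity: compute per token scales with k, not with the total number of experts, which is what lets MoE models grow parameter counts cheaply.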
TerraConstructs is a library of classes and interfaces inspired by AWS CDK, but designed to leverage the power and flexibility of Terraform.
https://github.com/TerraConstructs/base
GitHub
GitHub - TerraConstructs/base: TerraConstructs
Efficient, disruption-free application updates are essential for modern cloud-native operations. This article on Semaphore explains how Kubernetes’ rolling update deployment strategy enables teams to maintain service continuity while incrementally rolling out new versions.
https://semaphore.io/blog/kubernetes-rolling-update-deployment
Semaphore
Kubernetes Deployments: A Guide to the Rolling Update Deployment Strategy - Semaphore
The article elaborates on Kubernetes' rolling update deployment strategy, emphasizing incremental changes, adjustable speed, and pause/resume options.
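The pace of a rolling update is governed by the Deployment's `maxSurge` and `maxUnavailable` settings (both default to 25%). The sketch below reproduces how Kubernetes turns them into pod-count bounds: percentages are rounded up for `maxSurge` and down for `maxUnavailable`, and if both resolve to zero, one pod is allowed to be unavailable so the rollout can still progress. The helper names are hypothetical; this illustrates the arithmetic, not the controller's code.

```python
from typing import Tuple, Union

# A Deployment field such as maxSurge accepts either an int or a percentage string.
IntOrPercent = Union[int, str]


def resolve(value: IntOrPercent, replicas: int, round_up: bool) -> int:
    """Convert an int-or-percentage field into an absolute pod count."""
    if isinstance(value, int):
        return value
    if not value.endswith("%"):
        raise ValueError(f"expected an int or a percentage, got {value!r}")
    scaled = replicas * int(value[:-1])  # integer math avoids float rounding surprises
    return -(-scaled // 100) if round_up else scaled // 100


def rolling_update_bounds(
    replicas: int,
    max_surge: IntOrPercent = "25%",
    max_unavailable: IntOrPercent = "25%",
) -> Tuple[int, int]:
    """Return (max total pods, min available pods) allowed during the rollout."""
    surge = resolve(max_surge, replicas, round_up=True)  # maxSurge rounds up
    unavailable = resolve(max_unavailable, replicas, round_up=False)  # maxUnavailable rounds down
    if surge == 0 and unavailable == 0:
        # Both rounded to zero: allow one unavailable pod so the rollout can make progress.
        unavailable = 1
    return replicas + surge, replicas - unavailable


if __name__ == "__main__":
    print(rolling_update_bounds(10))  # (13, 8): up to 13 pods, at least 8 available
    print(rolling_update_bounds(3))   # (4, 3): one extra pod, none taken down early
```

Raising `maxSurge` speeds the rollout at the cost of temporary extra capacity; raising `maxUnavailable` speeds it at the cost of reduced serving capacity.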
Understanding logical replication in PostgreSQL is crucial for anyone managing data across multiple Postgres instances. This blog post from EnterpriseDB introduces the basics of logical replication, explaining how it selectively replicates row-level changes (inserts, updates, and deletes) between databases, even across different Postgres versions, and outlines the practical steps to set up publications and subscriptions for real-time data synchronization.
https://www.enterprisedb.com/blog/logical-replication-postgres-basics
EDB
Logical replication in Postgres: Basics
In this post we'll explore the basics of logical replication between two Postgres databases as both a user and a developer. Postgres first implemented physical replication where it shipped bytes on disk from one database A to another database B. Database…
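Setting up logical replication comes down to two SQL statements: a publication on the source database and a subscription on the target. The Python sketch below only builds those statements as strings, so the syntax is visible without a live server; the publication, subscription, and table names and the connection string are placeholders. It does not handle two prerequisites: the publisher must run with `wal_level = logical`, and the tables must already exist on the subscriber, because logical replication does not copy schema (DDL).

```python
import re
from typing import List

# Deliberately strict: lowercase, unquoted identifiers (tables may be schema-qualified).
_NAME = re.compile(r"[a-z_][a-z0-9_]*")
_TABLE = re.compile(r"[a-z_][a-z0-9_]*(\.[a-z_][a-z0-9_]*)?")


def _checked(name: str, pattern: "re.Pattern[str]") -> str:
    if not pattern.fullmatch(name):
        raise ValueError(f"unsupported identifier: {name!r}")
    return name


def _literal(text: str) -> str:
    """Quote a SQL string literal by doubling embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def publisher_sql(publication: str, tables: List[str]) -> str:
    """Statement to run on the source database (which needs wal_level = logical)."""
    if not tables:
        raise ValueError("a publication needs at least one table")
    names = ", ".join(_checked(t, _TABLE) for t in tables)
    return f"CREATE PUBLICATION {_checked(publication, _NAME)} FOR TABLE {names};"


def subscriber_sql(subscription: str, publication: str, conninfo: str) -> str:
    """Statement to run on the target database, after creating the same tables there."""
    return (
        f"CREATE SUBSCRIPTION {_checked(subscription, _NAME)} "
        f"CONNECTION {_literal(conninfo)} "
        f"PUBLICATION {_checked(publication, _NAME)};"
    )


if __name__ == "__main__":
    print(publisher_sql("app_pub", ["users", "public.orders"]))
    print(subscriber_sql("app_sub", "app_pub", "host=source-db dbname=app user=replicator"))
```

When the subscription is created, Postgres by default copies the existing table data first and then streams subsequent changes.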
Figma’s migration onto Kubernetes is a compelling case study in how a high-growth company can modernize its infrastructure for scalability, reliability, and developer productivity. This article recounts Figma’s decision to move from AWS ECS to Kubernetes (EKS), the challenges they faced with ECS—such as lack of support for StatefulSets, Helm charts, and advanced autoscaling—and the benefits they unlocked by embracing the broader CNCF ecosystem and Kubernetes’ popularity within the industry.
https://www.figma.com/blog/migrating-onto-kubernetes/
Figma
How We Migrated onto K8s in Less Than 12 months | Figma Blog
Migrating onto Kubernetes can take years. Here’s why we decided it was worth undertaking, and how we moved a majority of our core services.
This newsletter issue explains the "hot shard" problem, in which a disproportionate share of traffic targets a single shard, causing resource saturation and degraded performance. The post outlines practical strategies to address it, such as vertical scaling, adding read replicas or caches, distributing hot keys across more shards, choosing better sharding keys and algorithms, implementing load balancing and queueing, controlling traffic with backpressure, and monitoring the cluster to detect issues early.
https://newsletter.scalablethread.com/p/how-to-handle-hot-shard-problem
Scalablethread
How to Handle Hot Shard Problem?
Understanding Different Approaches to Address Hot Key/Partition Problem
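One of the listed remedies, distributing a hot key across more shards, is often done by "salting": writes for a known hot key get a random suffix so they hash to several shards, and reads fan out to all of those shards and merge the results. The sketch below assumes a simple hash-modulo sharding scheme and a hand-maintained set of hot keys; the class, method, and parameter names and the `key#salt` format are hypothetical, and the newsletter may describe the technique differently.

```python
import hashlib
import random
from typing import Iterable, List


def shard_for(key: str, num_shards: int) -> int:
    """Stable hash-modulo placement (built-in hash() is randomised per process)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class SaltedRouter:
    """Spread writes for known hot keys over up to `fanout` shards."""

    def __init__(self, num_shards: int, hot_keys: Iterable[str], fanout: int, seed: int = 0):
        self.num_shards = num_shards
        self.hot_keys = set(hot_keys)
        self.fanout = fanout
        self._rng = random.Random(seed)

    def _salted(self, key: str, salt: int) -> str:
        return f"{key}#{salt}"  # hypothetical salt format

    def write_shard(self, key: str) -> int:
        if key in self.hot_keys:
            # Pick one of `fanout` salted variants at random so write load spreads out.
            salt = self._rng.randrange(self.fanout)
            return shard_for(self._salted(key, salt), self.num_shards)
        return shard_for(key, self.num_shards)

    def read_shards(self, key: str) -> List[int]:
        if key in self.hot_keys:
            # Reads must visit every shard a salted variant could land on, then merge.
            candidates = {shard_for(self._salted(key, s), self.num_shards) for s in range(self.fanout)}
            return sorted(candidates)
        return [shard_for(key, self.num_shards)]
```

The trade-off sits on the read side: every read of a salted key becomes a scatter-gather across up to `fanout` shards (fewer if salted variants happen to collide), so salting suits write-heavy hot keys best.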
Migrating from MetalLB to Cilium streamlines Kubernetes networking by consolidating load balancer, IP address management, and network advertisement features into a single tool. This article details how Cilium natively supports LoadBalancer IP management (added in version 1.13), BGP (Layer 3) announcements, and Layer 2 (ARP) announcements (added in 1.14), eliminating the need for MetalLB in most self-managed clusters. Through practical YAML examples, it demonstrates configuring Cilium IP pools, service selectors, specific IP assignments, and both IPv4 and IPv6 support, as well as advertising service IPs to the network using BGP or ARP, offering a more integrated and simpler approach to Kubernetes networking.
https://isovalent.com/blog/post/migrating-from-metallb-to-cilium/
Isovalent
Migrating from MetalLB to Cilium
In this blog post, you will learn how to migrate from MetalLB to Cilium for local service advertisement over Layer 2.
Dropbox has built a flexible messaging system model to support the evolution of its async platform. This post explores how the new architecture improves decoupling and scalability across Dropbox’s infrastructure services.
https://dropbox.tech/infrastructure/infrastructure-messaging-system-model-async-platform-evolution
Sven Eliasson benchmarks Hetzner’s Kubernetes storage classes to evaluate their suitability for database workloads. This report highlights the significant performance differences between instance-attached NVMe storage and cloud volumes, offering practical insights for infrastructure planning.
https://sveneliasson.de/benchmarking-hetzners-storage-classes-for-database-workloads-on-kubernetes
Sven Eliasson
Hetzner Storage Classes Comparison on Kubernetes
Instant's engineering team shares their journey of upgrading an Aurora Postgres instance to version 16 with zero downtime. This experience report details the challenges faced, including performance bottlenecks and failed upgrade attempts, ultimately leading to a successful migration strategy.
https://www.instantdb.com/essays/pg_upgrade
Instantdb
A Major Postgres Upgrade with Zero Downtime
Oilbeater presents k8gb as a standout open-source GSLB (global server load balancing) solution that integrates with Kubernetes to manage cross-cluster domain names and traffic with minimal external dependencies. The post explains how k8gb uses DNS to achieve automated multi-cloud traffic routing and disaster recovery, positioning it as a top choice for cloud-native environments.
https://oilbeater.com/en/2024/04/18/k8gb-best-cloudnative-gslb/
Oilbeater's Study Room
k8gb: The Best Open Source GSLB Solution for Cloud Native | Oilbeater's Study Room
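At its core, a DNS-based GSLB decides which clusters' addresses to return for a name, based on health checks and a strategy. The sketch below models two such strategies, round-robin (return every healthy cluster) and failover (prefer a primary region, fall back to the others). It is a simplified illustration loosely modelled on the kind of strategies k8gb offers, not k8gb's implementation; the `Cluster` type, the `dns_answer` function, the geo tags, and the IPs are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cluster:
    geo_tag: str      # region label such as "eu" (hypothetical)
    ips: List[str]    # ingress addresses this cluster would publish in DNS
    healthy: bool     # result of the application's health checks in that cluster


def dns_answer(clusters: List[Cluster], strategy: str, primary_geo_tag: str = "") -> List[str]:
    """Return the A-record addresses to serve for one GSLB-managed name."""
    healthy = [c for c in clusters if c.healthy]
    if strategy == "roundRobin":
        # Every healthy cluster is eligible; clients spread across the returned IPs.
        chosen = healthy
    elif strategy == "failover":
        # Serve only the primary region while it is healthy, otherwise the remaining ones.
        primary = [c for c in healthy if c.geo_tag == primary_geo_tag]
        chosen = primary or healthy
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [ip for c in chosen for ip in c.ips]


if __name__ == "__main__":
    eu = Cluster("eu", ["10.0.0.1"], healthy=False)
    us = Cluster("us", ["10.1.0.1", "10.1.0.2"], healthy=True)
    print(dns_answer([eu, us], "failover", primary_geo_tag="eu"))  # ['10.1.0.1', '10.1.0.2']
```

Because the decision is expressed purely in DNS answers, failover needs no shared load balancer in front of the clusters, though clients only move once cached records expire, so record TTLs bound how fast traffic shifts.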