DevOps & SRE notes – Telegram
Helpful articles and tools for DevOps & SRE

WhatsApp: https://whatsapp.com/channel/0029Vb79nmmHVvTUnc4tfp2F

For paid consultation (RU/EN), contact: @tutunak


All ways to support https://telegra.ph/How-support-the-channel-02-19
The article "Linux LKM Persistence" explores advanced techniques for maintaining persistence on Linux systems using loadable kernel modules (LKMs). It covers methods of loading malicious kernel modules at boot time, focusing on the use of the systemd-modules-load service to install and hide a rootkit.

https://righteousit.com/2024/11/18/linux-lkm-persistence/
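
For defenders, a minimal auditing sketch (not taken from the article): systemd-modules-load reads `*.conf` files from `/etc/modules-load.d`, `/run/modules-load.d`, and `/usr/lib/modules-load.d`, so listing what those files request is a cheap first check. The function names below are made up for illustration, and a rootkit that hides its own files will not show up this way on a live system.

```python
from pathlib import Path

# Directories read by systemd-modules-load at boot.
MODULES_LOAD_DIRS = (
    "/etc/modules-load.d",
    "/run/modules-load.d",
    "/usr/lib/modules-load.d",
)


def parse_modules_load(text):
    """Return module names from the text of a modules-load.d file.

    Empty lines and lines starting with '#' or ';' are ignored.
    """
    modules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")):
            continue
        modules.append(line)
    return modules


def configured_modules(dirs=MODULES_LOAD_DIRS):
    """Map each *.conf file found in the given directories to its modules."""
    found = {}
    for directory in dirs:
        base = Path(directory)
        if not base.is_dir():
            continue
        for conf in sorted(base.glob("*.conf")):
            found[str(conf)] = parse_modules_load(conf.read_text(errors="replace"))
    return found


if __name__ == "__main__":
    for path, modules in configured_modules().items():
        print(path, modules)
```

Comparing the output against a known-good baseline (or against an offline disk image) is more reliable than trusting a possibly compromised host.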

🚀 Join our community 🌐💻
The article "Using Tracetest with OpenTelemetry for Trace-based Testing" explores the concept of trace-based testing in distributed systems and introduces tools to implement this approach. It delves into the basics of distributed tracing, compares it with traditional logging methods, and demonstrates how to use OpenTelemetry for instrumentation and Tracetest for creating and running trace-based tests.

https://tracetest.io/blog/trace-based-testing-with-opentelemetry-using-tracetest-with-opentelemetry
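
To make the idea concrete, here is a toy sketch of a trace-based assertion. It does not use the Tracetest or OpenTelemetry APIs: the span dictionaries, the `check_trace` helper, and the expectation format are all hypothetical, chosen only to show that the test asserts on the spans a request produced rather than on its response alone.

```python
def check_trace(spans, expectations):
    """Return a list of failure messages; an empty list means the trace passed."""
    failures = []
    for exp in expectations:
        matches = [s for s in spans if s["name"] == exp["name"]]
        if not matches:
            failures.append(f"missing span: {exp['name']}")
            continue
        for span in matches:
            max_ms = exp.get("max_duration_ms")
            if max_ms is not None and span["duration_ms"] > max_ms:
                failures.append(
                    f"{span['name']} took {span['duration_ms']} ms (limit {max_ms} ms)"
                )
            for key, want in exp.get("attributes", {}).items():
                got = span.get("attributes", {}).get(key)
                if got != want:
                    failures.append(f"{span['name']}: {key}={got!r}, expected {want!r}")
    return failures


# Hypothetical trace exported after calling an order endpoint.
trace = [
    {"name": "POST /orders", "duration_ms": 120, "attributes": {"http.status_code": 201}},
    {"name": "INSERT orders", "duration_ms": 35, "attributes": {"db.system": "postgresql"}},
]

expectations = [
    {"name": "POST /orders", "attributes": {"http.status_code": 201}},
    {"name": "INSERT orders", "max_duration_ms": 50},
    {"name": "publish order.created"},  # the event was never published
]

failures = check_trace(trace, expectations)
print(failures)
```

The third expectation fails because no matching span exists, which is the kind of downstream gap a response-only test would miss.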

The article details Timescale's journey to optimize upsert performance on compressed hypertables in TimescaleDB, a PostgreSQL extension. It describes how the team addressed a customer's challenge with suboptimal upsert performance, ultimately achieving a 300x speed improvement by leveraging existing indexes for efficient conflict resolution during upserts on high-cardinality datasets.

https://www.timescale.com/blog/how-we-made-postgresql-upserts-300x-faster-on-compressed-data
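
For readers unfamiliar with upserts, a small sketch of the semantics: insert a row, or update it when the unique key already exists. It uses SQLite from the Python standard library as a stand-in, because its `ON CONFLICT ... DO UPDATE` syntax closely mirrors PostgreSQL's. The `readings` table and column names are assumptions, and nothing here reflects TimescaleDB's compression internals.

```python
import sqlite3


def upsert_reading(conn, device_id, ts, value):
    """Insert a reading, or overwrite its value if (device_id, ts) already exists."""
    conn.execute(
        """
        INSERT INTO readings (device_id, ts, value)
        VALUES (?, ?, ?)
        ON CONFLICT (device_id, ts) DO UPDATE SET value = excluded.value
        """,
        (device_id, ts, value),
    )


conn = sqlite3.connect(":memory:")
# The unique constraint creates the index that conflict detection relies on.
conn.execute(
    "CREATE TABLE readings ("
    "device_id TEXT, ts INTEGER, value REAL, "
    "UNIQUE (device_id, ts))"
)

upsert_reading(conn, "sensor-1", 1000, 20.5)
upsert_reading(conn, "sensor-1", 1000, 21.0)  # same key: updated in place
upsert_reading(conn, "sensor-2", 1000, 19.0)

rows = conn.execute(
    "SELECT device_id, ts, value FROM readings ORDER BY device_id"
).fetchall()
print(rows)
```

Every upsert has to check whether the key already exists, which is why an index that can answer that check cheaply matters so much on large, high-cardinality tables.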

The article discusses setting up a self-hosted full-stack observability system for startups using open-source tools, emphasizing the importance of monitoring system performance as businesses scale. It outlines key components and tools such as OpenTelemetry, OpenSearch, Prometheus, and Grafana to achieve comprehensive system visibility without incurring high SaaS costs.

https://osuite.io/articles/full-stack-observability-self-hosted

☁️ AWS Morning Brief – AWS News with a Twist ☁️

Stay updated on the latest AWS news, sprinkled with snark! With 60+ AWS posts daily, we cut through the noise to bring you the hidden gems, top community contributions, and the must-know updates, all summarized with wit and clarity.

🔍 Curated AWS News & Insights
🎙 No-Nonsense, Snarky Summaries
🚀 Stay Informed Without the Overload

Subscribe now and get your AWS updates, minus the nonsense! 📲
The author explores the emerging trend of AI agents in the observability and monitoring space, discussing how these agents could change the way operational data is processed and used. The article highlights various startups developing AI-powered solutions for DevOps, incident response, and SRE tasks, while also addressing potential challenges such as data privacy concerns and the need for benchmarking to evaluate agent effectiveness.

https://monitoring2.substack.com/p/ai-agents-invade-observability

The article compares the performance characteristics of Classic and Quorum queues in RabbitMQ, highlighting their strengths and use cases. It presents benchmark results showing that Classic queues offer higher throughput and lower latency, making them suitable for high-performance applications, while Quorum queues provide better fault tolerance and durability at the cost of reduced performance, making them ideal for mission-critical systems requiring high availability.

https://dzone.com/articles/battle-of-the-rabbitmq-queues-performance-insights

Slack's engineering team details their journey in evolving their Chef infrastructure to manage tens of thousands of EC2 instances efficiently. The article explores the challenges faced and solutions implemented as they transitioned from a single Chef stack to a sharded infrastructure, improving reliability and deployment safety for their vast and growing infrastructure.

https://slack.engineering/advancing-our-chef-infrastructure/

The blog post discusses the creation of SREBench, a Kubernetes task dataset designed to evaluate LLM performance in root cause analysis of Kubernetes issues. It details the challenges faced by the Parity team in developing a reliable benchmark for their AI agent, ultimately leading to the creation of a synthetic dataset inspired by the MuSR murder mystery reasoning benchmark.

https://www.tryparity.com/blog/how-and-why-we-made-srebench-swebench-for-k8s

Discord embarked on a six-month project to reduce bandwidth usage for their clients, particularly on iOS and Android, aiming to create a more responsive experience. The initiative focused on improving the "gateway" service, which provides real-time updates to clients, by transitioning from zlib compression to zstandard, which offers higher compression ratios, faster compression times, and support for dictionaries.

https://discord.com/blog/how-discord-reduced-websocket-traffic-by-40-percent

☁️ Screaming in the Cloud – Cloud Computing with a Twist ☁️

Join Corey Quinn as he dives into unfiltered conversations with domain experts across the cloud industry. From AWS, GCP, and Azure to Oracle Cloud, this podcast unpacks the "why" behind cloud decisions, with a dose of humor and critical insight.

🎙 Expert Talks on Cloud & DevOps
☁️ AWS, GCP, Azure & Beyond
🔥 Sharp Insights with a Side of Snark

Subscribe now and stay ahead in the ever-evolving cloud landscape! 📲
The article discusses a critical incident involving Postgres indexing on a high-volume database with billions of rows. It highlights the potential pitfalls of concurrent indexing, which can fail silently and lead to performance issues. The author shares best practices for Postgres indexing, including using the CONCURRENTLY flag, monitoring index creation, validating indexes manually, and properly handling partitioned tables.

https://blog.bemi.io/indexing/
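
One way to validate indexes manually, sketched under assumptions: when `CREATE INDEX CONCURRENTLY` fails, PostgreSQL leaves the index behind marked invalid, and the `pg_index.indisvalid` flag exposes that. The helper name is hypothetical, and the cursor is assumed to be any DB-API cursor connected to PostgreSQL (for example from psycopg). This is not necessarily the exact query the author uses.

```python
# Lists indexes that PostgreSQL has marked invalid, e.g. after a failed
# CREATE INDEX CONCURRENTLY.
INVALID_INDEX_SQL = """
SELECT n.nspname AS schema_name, c.relname AS index_name
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE NOT i.indisvalid
ORDER BY n.nspname, c.relname
"""


def find_invalid_indexes(cursor):
    """Return schema-qualified names of invalid indexes.

    `cursor` is assumed to be a DB-API cursor on a PostgreSQL connection.
    """
    cursor.execute(INVALID_INDEX_SQL)
    return [f"{schema}.{name}" for schema, name in cursor.fetchall()]
```

An invalid index is not used by queries but still adds write overhead, so the usual cleanup is to drop it and rebuild it, again with `CONCURRENTLY`.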