Linux - Reddit – Telegram
Stay up-to-date with everything Linux!
Content directly fetched from the subreddit just for you.

🔍 From PostgreSQL Replica Lag to Kernel Bug: A Sherlock-Holmes-ing Journey Through Kubernetes, Page Cache, and Cgroups v2

[\(I&GPT\)](https://preview.redd.it/tmkiqilis6ve1.png?width=1280&format=png&auto=webp&s=256b665f3afe4158d541f4b2a240f425e061b347)

What started as a puzzling PostgreSQL replication lag in one of our Kubernetes clusters ended up uncovering... a Linux kernel bug. 🕵️

It began with our Postgres (PG) cluster, running in Kubernetes (K8s) pods/containers with memory limits and managed by the **Patroni** operator, behaving oddly:

* Replicas were lagging or getting dropped.
* Reinitialization of replicas (via pg\_basebackup) was taking **8–12 hours** (!).
* Grafana showed that **Network Bandwidth (BW) and Disk I/O** dropped dramatically — from 100MB/s to <1MB/s — **right after the pod’s memory limit was hit**.

Interestingly, memory usage was mostly in **inactive file page cache**, while **RSS** (Resident Set Size - container's processes allocated MEM) **and WSS** (Working Set Size: RSS + Active Files Page Cache) stayed low. Yet replication lag kept growing.
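To see that split for a given container, the counters can be read from the cgroup v2 `memory.stat` file. The following is a minimal sketch, not the tooling used in the post: the file path is an assumption that depends on the cgroup hierarchy, and the WSS formula simply follows the definition given above (RSS + active file page cache).

```python
from pathlib import Path


def parse_memory_stat(text: str) -> dict[str, int]:
    """Parse the 'key value' lines of a cgroup v2 memory.stat file."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def summarize(stats: dict[str, int]) -> dict[str, int]:
    """Group counters the way the post describes them (all values in bytes)."""
    rss = stats.get("anon", 0)
    active_file = stats.get("active_file", 0)
    return {
        "rss": rss,
        "wss": rss + active_file,  # the post's definition of WSS
        "inactive_file": stats.get("inactive_file", 0),
        "sock": stats.get("sock", 0),
    }


if __name__ == "__main__":
    # Assumed path: inside a container this is usually the container's own
    # cgroup, but the real location depends on the hierarchy in use.
    stat_file = Path("/sys/fs/cgroup/memory.stat")
    if stat_file.exists():
        summary = summarize(parse_memory_stat(stat_file.read_text()))
        for name, value in summary.items():
            print(f"{name:>14}: {value / 2**20:10.1f} MiB")
```

A large `inactive_file` value next to small `rss` and `wss` values would match the picture described here.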

**So where is the issue..? Postgres? Kubernetes? Infra (Disks, Network, etc)!?**



We ruled out PostgreSQL specifics:

pg\_basebackup was just streaming files from leader → replica (K8s pod → K8s pod), like a fancy rsync.

* This slowdown only happened **if PG data directory size was greater than container memory limit**.
* Removing the memory limit fixed the issue — but that’s not a real-world solution for production.

***So still? What’s going on? Disk issue? Network throttling?***



We got methodical:

* **pg\_dump** from a remote IP > /dev/null → 🟢 Fast (no disk writes, no cache). ***So, no Netw issues?***
* pg\_dump (remote IP) > file → 🔴 Slow when Pod hits MEM Limit. ***Is it Disk???***
* Create and copy GBs of files inside the pod? 🟢 Fast. ***Hm, so no Disk I/O issues?***
* Use **rsync** inside the same container image to copy tons of files from remote IP? 🔴 Slow. ***Hm... So not exactly a PG programs issue, but maybe the PG Docker image? Also, it happens when both Disk & Network are involved... strange!***
* Use a completely different image (**wbitt/network-multitool**)? 🔴 Still slow. ***O! No PG Issue!***
* Mount host network (hostNetwork: true) to bypass CNI/Calico? 🔴 Still slow. ***So, no K8s Netw Issue?***
* Launch containers manually with **ctr (*****containerd*****)** and memory limits, no K8s? 🔴 Slow! ***OMG! Is it Container Runtime Issue? What can I do? But, stop - I learned that containers are Linux Kernel cgroups, no? So let's try!***
* Run the same rsync inside a raw **cgroup v2 with memory.max set** via **systemd-run**? 🔴 Slow again! **WHAT!?? (*****Getting crazy here*****)**
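The tests above all come down to copying a stream to disk and watching when the throughput collapses. A minimal sketch of such a measurement is shown below. It is not the rsync test from the post: `timed_copy` and its parameters are hypothetical names, and the idea is only to log MiB/s per window of chunks so the moment of the drop becomes visible.

```python
import io
import time
from typing import BinaryIO


def timed_copy(src: BinaryIO, dst: BinaryIO, chunk_size: int = 1 << 20,
               window: int = 64) -> list[float]:
    """Copy src to dst and return throughput in MiB/s per window of chunks."""
    rates = []
    copied = 0
    chunks = 0
    start = time.monotonic()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        copied += len(chunk)
        chunks += 1
        if chunks == window:
            # Close the current window and start a new one.
            elapsed = max(time.monotonic() - start, 1e-9)
            rates.append(copied / 2**20 / elapsed)
            copied, chunks, start = 0, 0, time.monotonic()
    if copied:
        # Account for a final, partially filled window.
        elapsed = max(time.monotonic() - start, 1e-9)
        rates.append(copied / 2**20 / elapsed)
    return rates
```

To mimic the last test, a script built around this function could be started under a memory limit with something like `systemd-run --scope -p MemoryMax=1G python3 copy_test.py` (the script name is made up), with the source being a network stream and the destination a file larger than the limit.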



But then, while digging deeper and trying to reproduce it elsewhere…

👉 On my **dev machine** (Ubuntu 22.04, kernel 6.x): 🟢 All tests ran smooth, no slowdowns.

👉 On Server there was Oracle Linux 9.2 (kernel 5.14.0-284.11.1.el9\_2, RHCK): 🔴 Reproducible every time! So..? Is it Linux Kernel Issue? (***Do U remember that containers are Kernel namespaced and cgrouped processes? ;)***)

So I did what any desperate sysadmin-spy-detective would do: started swapping kernels.

But before that, I studied the Oracle Linux kernel version matrix ([https://docs.oracle.com/en/operating-systems/oracle-linux/9/boot/oracle\_linux9\_kernel\_version\_matrix.html](https://docs.oracle.com/en/operating-systems/oracle-linux/9/boot/oracle_linux9_kernel_version_matrix.html)), so, let's move on!

🔄 I switched from RHCK (Red Hat Compatible Kernel) → **UEK (Oracle’s own kernel)** via grubby → 💥 **Issue gone**.

Still needed RHCK for some applications (e.g. **\[Censored\] DB** doesn’t support UEK), so we tried:

* RHCK from **OL 9.4** (5.14.0-427) → FIXED
* RHCK from **OL 9.5** (5.14.0-503.11.1) → FIXED (though some HW compat testing still ongoing)



📝 I haven’t found an official bug report in Oracle’s release notes for this kernel version. But behavior is clear:

OL 9.2 RHCK (5.14.0-284.11.1) = broken :(

OL 9.4/9.5 + RHCK = working!
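For anyone wanting to check a host against these data points, here is a small sketch that compares a kernel release string with the three builds named in this post. It encodes only what is reported here; any other build, including everything between 284.11.1 and 427, is treated as unknown. The function name and the table are illustrative.

```python
import platform
import re

# Data points taken from the post; nothing is known about other builds.
REPORTED = {
    "5.14.0-284.11.1": "reported broken (OL 9.2 RHCK)",
    "5.14.0-427": "reported fixed (OL 9.4 RHCK)",
    "5.14.0-503.11.1": "reported fixed (OL 9.5 RHCK)",
}


def classify_kernel(release: str) -> str:
    """Match a kernel release string against the builds named in the post."""
    # Drop the distro suffix, e.g. ".el9_2.x86_64".
    base = re.split(r"\.el\d", release, maxsplit=1)[0]
    for prefix, verdict in REPORTED.items():
        if base == prefix or base.startswith(prefix + "."):
            return verdict
    return "unknown (not mentioned in the post)"


if __name__ == "__main__":
    running = platform.release()
    print(running, "->", classify_kernel(running))
```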

My best guess is that memory in this specific cgroup v2 wasn't being reclaimed properly from the inactive page cache, which saturated the whole cgroup's memory, including the memory needed for the network sockets of the cgroup's processes (there is a "sock" counter in the cgroup's memory.stat file) or for Disk I/O memory structures..?
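One way to probe a guess like this is the cgroup v2 `memory.events` file, whose `max` counter increases each time the cgroup's usage is about to exceed `memory.max`. The sketch below only compares two snapshots of that file; the helper names are hypothetical, and the expectation that `max` climbs quickly during the slowdown is an assumption, not something measured in the post.

```python
def parse_events(text: str) -> dict[str, int]:
    """Parse the 'key value' lines of a cgroup v2 memory.events file."""
    events = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            events[parts[0]] = int(parts[1])
    return events


def events_delta(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Return how much each counter grew between two snapshots."""
    return {key: after[key] - before.get(key, 0) for key in after}


if __name__ == "__main__":
    # Made-up snapshots for illustration, taken some seconds apart.
    first = parse_events("low 0\nhigh 0\nmax 120\noom 0\noom_kill 0\n")
    second = parse_events("low 0\nhigh 0\nmax 4520\noom 0\noom_kill 0\n")
    print(events_delta(first, second))
```

A fast-growing `max` delta with no `oom` or `oom_kill` events would suggest the cgroup is pinned at its limit and reclaiming constantly rather than being killed.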

But, finally: ***Yeah, we did it :)!***



# 🧠 Key Takeaways:

* **Know your stack deeply**: I didn’t even check or care about the OL version and kernel at first.
* **Reproduce outside your stack**: from PostgreSQL → rsync → cgroup tests.
* **Teamwork wins**: many clues came from teammates (and a certain ChatGPT 😉).
* **Container memory limits + cgroups v2 + page cache** on buggy kernels (*and not only - I have some horror stories on CPU Limits ;)*) can be a perfect storm.



I hope this post helps someone else chasing ghosts in containers and wondering why disk/network stalls under memory limits.

Let me know if you’ve seen anything similar — or if you enjoy a good kernel mystery! 🐧🔎

https://redd.it/1k0ipkg
@r_linux
Tired of find diving into node_modules hell? Meet trovatore – a fast, smart file searcher for Linux, no index needed.

I just released a small utility I’ve been working on: Trovatore – a fast CLI tool to search files by name, without relying on a database or indexing.

Why another file search tool?

Because I was tired of find crawling through cache/, node_modules/, .git/, and other junk folders when I just wanted to find something I saved on my Desktop two days ago.

Trovatore takes a smarter approach:

* Ignores "blackhole" directories (build/, .cache/, etc.)
* Prioritizes obvious places like Desktop, Documents, Downloads
* Searches in real time – no indexing, no waiting
* Supports wildcards and flexible search modes (starts, ends, exact, etc.)
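To illustrate the "ignore blackhole directories" idea, here is a minimal Python sketch of a pruned directory walk. It is not Trovatore's implementation (the tool is written in D): the skip list is a made-up example, and the prioritization of Desktop, Documents, and Downloads is left out.

```python
import fnmatch
import os
from typing import Iterator

# Hypothetical skip list for illustration; trovatore's real defaults may differ.
SKIP_DIRS = {"node_modules", ".git", ".cache", "build"}


def search(root: str, pattern: str,
           skip_dirs: set[str] = SKIP_DIRS) -> Iterator[str]:
    """Yield paths under root whose file name matches a shell-style pattern."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from descending into them.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            if fnmatch.fnmatch(name, pattern):
                yield os.path.join(dirpath, name)
```

Pruning at walk time is what avoids the cost: skipped directories are never opened, rather than being scanned and filtered afterwards.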

GitHub repo: https://github.com/trikko/trovatore

Quick install:

`curl https://trikko.github.io/trovatore/install.sh | bash`

Example usage:

`trovatore report*.pdf` matches report.pdf, report-blah.pdf, ...

`trovatore report_20??_*.pdf` matches report_2024_full.pdf, ...

`trovatore -m ends .txt` matches everything.txt

It’s written in D, works out of the box, and the config files are plain text and easy to tweak.

https://redd.it/1k0fmby
@r_linux
I just got the final authorization to convert the fleet workstations to all linux for my one client. Now we are talking migration strategy. This is really happening. I am so happy.

I know there will be the complainers but at the end of the day this is gonna make things so much better. Our test employee already had no issues.

I am very hopeful for a smooth transition.
I won't get it. LOL
But still hopeful.

https://redd.it/1k0mrdk
@r_linux
My wife crocheted me a washable coaster
https://redd.it/1k0nh9q
@r_linux
min maxing btop with tmux
https://redd.it/1k0syw5
@r_linux
Love reviving older hardware
https://redd.it/1k0v74d
@r_linux
Switched to Arch! (Story about my linux journey through this year, read the description)
https://redd.it/1k0umwx
@r_linux
debian on school chromebook
https://redd.it/1k11icw
@r_linux
It's great how much TTS in Linux has evolved

The 2015 article "An In-Depth Look at Text-to-Speech in Linux" discusses the challenges and shortcomings of text-to-speech (TTS) technology in the Linux environment. The author, who is preparing for a life without a voice due to throat cancer, explores various TTS solutions available in Linux and highlights their limitations.

Key points from the article include:

* The author's personal journey and the reasons for investigating TTS solutions, including scenarios where verbal communication is crucial for safety and convenience.
* The state of TTS in Linux is described as "next to worthless" due to the lack of quality tools and the difficulty in integrating better voices.
* The article concludes by emphasizing the need for better TTS solutions in the Linux ecosystem, particularly for those who rely on such technology due to disabilities.

>Source: https://fossforce.com/2015/04/an-in-depth-look-at-text-to-speech-in-linux/

Now, jump forward to 2025, and Piper TTS has significantly improved the quality of TTS on Linux systems. It offers natural-sounding voices that are comparable to commercial services like Google TTS, making it a preferred choice over older, less accurate engines like espeak as discussed in the 2015 article. I'm using Piper TTS via the flatpak Speech Note, and I use it to read Wikipedia articles for me.

For comparison, here's a sample of espeak TTS. And here's a sample of Piper TTS.

It's very impressive that it evolved from robotic sounding to natural sounding in the decade since that article was written. I remember back in 2012, when I first started with Linux and installed Xubuntu 12.04, I had to install WINE so I could use the SAPI5 voices from my Windows machine in order to get decent-sounding TTS. Now, with Piper TTS, I don't have to do that anymore. Thank you, developers of Piper TTS, for improving a part of the Linux ecosystem that had been stagnant since the early 2000s and 2010s.

I'm pretty sure Ken Starks, the author of that article from 2015, is quite happy now that Linux TTS has improved this much.

https://redd.it/1k11b5m
@r_linux
My daughter is definitively going to the right school.
https://redd.it/1k176uq
@r_linux
What size monitor + resolution to get for perfect Linux compatibility?

Hello,

out of curiosity, I'd like to find out what monitor size and resolution combination would be considered perfect for the best Linux compatibility.

I've been thinking about 4K monitors and how they would scale at 27" vs 32" on Linux, but those would probably have to use fractional scaling to be usable, which is better avoided for perfect compatibility.

Which specific size/resolutions would grant perfect scaling?



https://redd.it/1k1c8sc
@r_linux
Finally decided to try Linux, any tips i should know to use it well?
https://redd.it/1k1j0al
@r_linux
Linux is for running a business

In the process of buying a business. I have used different POS programs in the past but they have all been windows based. Looking for OS distros and programs that are beneficial for running a business. POS, budgeting, payroll, all the things like that. I have used Linux off and on for 15 years but just for fun and personal use.

Also, I envision setting up 3-10 computers as I grow and would like to have them mesh together well. There is a lot of stuff in this arena that I know nothing about and will need professional help/tutoring to figure it out for sure. Even when I have run more than one Linux machine at a time, they were always completely separate and never linked in any way.

Any input would be appreciated. Any laptop recommendations for longevity would be appreciated.

https://redd.it/1k1kv32
@r_linux