Python Daily – Telegram
Python Daily
Daily Python News
Questions, Tips and Tricks, and Best Practices for the Python Programming Language
Find more reddit channels over at @r_channels
Kreuzberg v4.0.0-rc.8 is available

Hi Peeps,

I'm excited to announce that Kreuzberg v4.0.0 is coming very soon. We will release v4.0.0 at the beginning of next year - in just a couple of weeks' time. For now, v4.0.0-rc.8 has been released to all channels.

## What is Kreuzberg?

Kreuzberg is a document intelligence toolkit for extracting text, metadata, tables, images, and structured data from 56+ file formats. It was originally written in Python (v1-v3), where it demonstrated strong performance characteristics compared to alternatives in the ecosystem.

## What's new in v4?

### A Complete Rust Rewrite with Polyglot Bindings

The new version of Kreuzberg represents a massive architectural evolution. Kreuzberg has been completely rewritten in Rust - leveraging Rust's memory safety, zero-cost abstractions, and native performance. The new architecture consists of a high-performance Rust core with native bindings to multiple languages. That's right - it's no longer just a Python library.

Kreuzberg v4 is now available for 7 languages across 8 runtime bindings:

- Rust (native library)
- Python (PyO3 native bindings)
- TypeScript - Node.js (NAPI-RS native bindings) + Deno/Browser/Edge (WASM)
- Ruby (Magnus FFI)
- Java 25+ (Panama Foreign Function & Memory API)
- C# (P/Invoke)
- Go (cgo bindings)

The post-v4.0.0 roadmap includes:

- PHP
- Elixir (via Rustler - with Erlang and Gleam interop)

Additionally, it's available as

/r/Python
https://redd.it/1pn2a3t
Choosing Django model translation libraries in 2025

I’m looking for advice on choosing a model translation library for a Django + DRF application in 2025. We’re using PostgreSQL.

The use case is fairly typical:

- Clients can add multiple translations for certain model fields (e.g. `name`, `description` on an `Item` model)
- When fetching data via the API, we return the correct translation dynamically based on a parameter (e.g. a request header)
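
For the second point, selecting a language from a request header usually means parsing `Accept-Language` and its `q` weights. In a real Django project, `LocaleMiddleware` already handles this; the stdlib-only sketch below (the `pick_language` helper is hypothetical, not part of any library mentioned here) only illustrates the matching logic, including the fallback from a regional tag such as `de-DE` to `de`:

```python
def pick_language(accept_language: str, supported: set[str], default: str = "en") -> str:
    """Return the best match from `supported` (lowercase codes) for an Accept-Language value."""
    weighted: list[tuple[float, str]] = []
    for part in accept_language.split(","):
        tag, _, params = part.strip().partition(";")
        if not tag:
            continue
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat malformed weights as "not acceptable"
        weighted.append((quality, tag.strip().lower()))

    # sorted() is stable, so equal weights keep the header's original order
    for quality, tag in sorted(weighted, key=lambda pair: -pair[0]):
        if quality <= 0:
            continue
        if tag in supported:
            return tag
        base = tag.split("-")[0]  # "de-de" -> "de"
        if base in supported:
            return base
    return default
```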

I’ve looked into the existing options, and it seems like there are three main contenders. I’d love to hear what others are using in production today, and what you would recommend.

Below is a short summary of the translation libraries I found and my thoughts on them. I'm not very familiar with the Django ecosystem, so any insights are appreciated.

**Modeltranslation**

Uses one column per language per translatable field: no separate table, no joins, no JSONB storage. If you have 5 languages for a model with 3 translatable fields, you end up with 15 extra columns.

This one seems to have some activity, with fixes getting added to main recently and releases happening regularly. It seems like the only non-abandoned pick, even though I'm not stoked about bloating tables with translations.
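
To make the column-per-language layout concrete: modeltranslation adds columns such as `name_en` and `name_de` next to the original field, so the column count grows as languages × fields (5 × 3 = 15). The hypothetical helper below reads such a column with a fallback language; it only sketches the idea, since modeltranslation itself resolves the active language on the original field name and has its own fallback settings:

```python
from typing import Any


def translated_value(obj: Any, field: str, lang: str, fallback: str = "en") -> Any:
    """Return obj.<field>_<lang>, else obj.<field>_<fallback>, else obj.<field>."""
    for code in (lang, fallback):
        # hyphens are not valid in attribute names, so "pt-br" becomes "pt_br"
        value = getattr(obj, f"{field}_{code.replace('-', '_')}", None)
        if value:
            return value
    return getattr(obj, field, None)


def extra_columns(languages: int, translatable_fields: int) -> int:
    # one column per language per translatable field
    return languages * translatable_fields
```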

**Django-Parler**

Uses a separate translation table.

What worries me here is that its last commit on the `main` branch was on Sep 3, 2022,

/r/django
https://redd.it/1pn4kio
Built a Japanese learning game using Django & Vanilla JS. Focus on "Zen" aesthetics.

/r/django
https://redd.it/1pn4g2r
Strategies for removing django-polymorphic from codebase

As per the title... The codebase grew with polymorphic in place, but it is causing more headaches and testing nightmares than the little abstraction help it provides is worth. Removing it from some rather central models, while keeping all data and switching to inheriting from abstract base classes instead, has been veeeery painful, to say the least.

Anybody done the move and have some pointers?
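
One piece of such a migration is moving each child's rows off the multi-table-inheritance layout (a parent table plus a child table whose primary key is a `<parent>_ptr_id` pointing at the parent row, which is what Django creates for concrete inheritance and what django-polymorphic builds on) into a single flat table. A minimal `sqlite3` sketch, assuming hypothetical tables `app_animal` and `app_dog`; in a real project this SQL would live in a data migration (`migrations.RunSQL` or `RunPython`):

```python
import sqlite3

# Hypothetical tables in Django's multi-table-inheritance layout:
# app_animal (parent, with django-polymorphic's content-type column) and
# app_dog (child, whose primary key animal_ptr_id points at the parent row).
SETUP = """
CREATE TABLE app_animal (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    polymorphic_ctype_id INTEGER
);
CREATE TABLE app_dog (
    animal_ptr_id INTEGER PRIMARY KEY REFERENCES app_animal(id),
    breed TEXT NOT NULL
);
INSERT INTO app_animal VALUES (1, 'Rex', 7), (2, 'Tom', 8);
INSERT INTO app_dog VALUES (1, 'Collie');
"""

# The flat table the child model maps to once it inherits from an abstract
# base instead: parent columns are copied in and ids are preserved.
FLATTEN = """
CREATE TABLE app_dog_flat (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    breed TEXT NOT NULL
);
INSERT INTO app_dog_flat (id, name, breed)
SELECT a.id, a.name, d.breed
FROM app_animal AS a
JOIN app_dog AS d ON d.animal_ptr_id = a.id;
"""


def migrate_demo() -> list[tuple]:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    conn.executescript(FLATTEN)
    return conn.execute("SELECT id, name, breed FROM app_dog_flat ORDER BY id").fetchall()
```

Rows belonging to other subclasses (here the animal with id 2, which has no `app_dog` row) are untouched; each concrete subclass would get its own copy step, and preserving ids makes it easier to re-point foreign keys before the old tables are dropped.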

/r/django
https://redd.it/1pmow0e
How to implement phone number + OTP login with django-allauth?

I’m currently working on a Django project and have a requirement to let users log in with their phone number and an OTP (one-time password) via SMS, alongside the standard email/username + password combo.

I'd really like to use django-allauth for auth features.

I know that recent versions of django-allauth added ACCOUNT_PHONE_VERIFICATION_ENABLED and support for phone numbers as a primary identifier, but I don't know how to implement phone number + OTP login.

If anyone has recently implemented a phone + OTP flow specifically with django-allauth, I’d love to hear how you approached it.

Thanks in advance!

/r/django
https://redd.it/1pluo6i
Miguel's Flask Course

Hi all,

I'm currently learning Flask and, after some due diligence, I dove into Miguel's course. I felt good for the first few chapters and was grasping concepts pretty well, but then things started to get more complicated, I think mostly because of things introduced outside the scope of Flask (the third-party libraries it uses), and it completely knocked me off my horse. I feel like I'm just watching the videos now. I've made it to pretty much the end of the course, but I don't feel like I've learnt as much as I should or could have. I'm not sure whether I'm too dumb or what's limiting me. Is it normal to find this course hard? Everyone says it's the go-to for Flask, and that's incredible, but I've honestly struggled immensely with it.

I moved to Flask after I learnt JS and React, built some of my own little projects, and felt comfortable enough to move on. I didn't really experience roadblocks like this with JS and React. But with Flask, although the simple routes and whatnot are easy, anything beyond that is where I feel stuck. I'm not sure what to do now, I've been learning

/r/flask
https://redd.it/1pnblkp
Building the Fastest Python CI

Hey all, there is a frustrating lack of resources and tooling for building Python CIs in a monorepo setting, so I wrote up how we do it at $job.

# What my project does

We use uv as a package manager and pex to bundle our Python code and dependencies into executables. Pex recently added a feature that allows it to consume its dependencies from uv, which drastically speeds up builds; this trick is included in the guide. Additionally, to keep our builds fast and vertically scalable, we use a lightweight build system called Grog that allows us to cache and skip builds as well as run them in parallel.
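
The post doesn't describe how Grog decides what to skip. A common approach in build systems is content-addressed caching: hash every input of a target and rebuild only when that hash changes. A minimal, generic sketch (not Grog's implementation; `build_if_changed` and the in-memory `cache` dict are hypothetical, and a real CI would keep the cache in shared storage):

```python
import hashlib
from collections.abc import Callable, Iterable
from pathlib import Path


def inputs_digest(paths: Iterable[Path]) -> str:
    """SHA-256 over the sorted input paths and their contents."""
    digest = hashlib.sha256()
    for path in sorted(Path(p) for p in paths):
        digest.update(str(path).encode())
        digest.update(b"\0")
        digest.update(path.read_bytes())
    return digest.hexdigest()


def build_if_changed(target: str, inputs: Iterable[Path], build: Callable[[], None],
                     cache: dict[str, str]) -> bool:
    """Run `build` only when the inputs' digest differs from the cached one.

    Returns True if the build ran, False on a cache hit.
    """
    key = inputs_digest(inputs)
    if cache.get(target) == key:
        return False  # cache hit: skip the build entirely
    build()
    cache[target] = key  # record the digest only after the build succeeded
    return True
```

Because each target's key depends only on its own inputs, independent targets can also be checked and built in parallel.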

# Target Audience

Anyone building Python CI pipelines at small to medium scale.

# Comparison

The closest comparison would be Pants, which comes with massive complexity and does not play well with existing dev tooling (more about this in the post). This approach, on the other hand, builds on top of uv and thus keeps the setup pretty lean while still delivering great performance.


Let me know what you think 🙏


Guide: https://chrismati.cz/posts/building-the-fastest-python-ci/

Demo repository: https://github.com/chrismatix/uv-pex-monorepo

/r/Python
https://redd.it/1pnbze0
Tuesday Daily Thread: Advanced questions

# Weekly Wednesday Thread: Advanced Questions 🐍

Dive deep into Python with our Advanced Questions thread! This space is reserved for questions about more advanced Python topics, frameworks, and best practices.

## How it Works:

1. **Ask Away**: Post your advanced Python questions here.
2. **Expert Insights**: Get answers from experienced developers.
3. **Resource Pool**: Share or discover tutorials, articles, and tips.

## Guidelines:

* This thread is for **advanced questions only**. Beginner questions are welcome in our [Daily Beginner Thread](#daily-beginner-thread-link) every Thursday.
* Questions that are not advanced may be removed and redirected to the appropriate thread.

## Recommended Resources:

* If you don't receive a response, consider exploring r/LearnPython or join the [Python Discord Server](https://discord.gg/python) for quicker assistance.

## Example Questions:

1. **How can you implement a custom memory allocator in Python?**
2. **What are the best practices for optimizing Cython code for heavy numerical computations?**
3. **How do you set up a multi-threaded architecture using Python's Global Interpreter Lock (GIL)?**
4. **Can you explain the intricacies of metaclasses and how they influence object-oriented design in Python?**
5. **How would you go about implementing a distributed task queue using Celery and RabbitMQ?**
6. **What are some advanced use-cases for Python's decorators?**
7. **How can you achieve real-time data streaming in Python with WebSockets?**
8. **What are the

/r/Python
https://redd.it/1pnn7xm
Looking for CSS frameworks, recommendations?

For my next project, I'm staying with full-stack Django templating with htmx. I'm terrible at CSS and I hate writing it. A few of you will moan about that, but I like frameworks that have lots of components.

Do you have any recommendations?

- Bootstrap
- Metroui
- Beercss
- Basecoatui


All great 👍, but are there any more hiding in the woodwork?

/r/django
https://redd.it/1pnlff8
Why don't dataclasses or attrs derive from a base class?

Both the standard `dataclasses` and the third-party `attrs` package follow the same approach: if you want to tell whether an object or type was created using them, you need to do it in a non-standard way (call `dataclasses.is_dataclass()`, or catch `attrs.NotAnAttrsClassError`). It seems that both of them rely on setting a magic attribute in generated classes, so why not have them derive from an ABC with that attribute declared (or make it a property), so that users could use the standard `isinstance`? Was it performance considerations or something else?
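
Whatever the historical reasons, you can get an `isinstance`-style check today without a shared base class: `@dataclass` sets a `__dataclass_fields__` attribute on the class, so a `runtime_checkable` `Protocol` that declares it works structurally (typeshed's `_typeshed.DataclassInstance` uses the same idea for static type checking). A minimal sketch; `DataclassLike` is a hypothetical name:

```python
from dataclasses import dataclass, is_dataclass
from typing import Any, ClassVar, Protocol, runtime_checkable


@runtime_checkable
class DataclassLike(Protocol):
    """Matches anything exposing the __dataclass_fields__ attribute that @dataclass sets."""
    __dataclass_fields__: ClassVar[dict[str, Any]]


@dataclass
class Point:
    x: int
    y: int


class Plain:
    pass
```

Caveats: the check is purely structural, so any object that happens to define `__dataclass_fields__` passes, and, like `is_dataclass()`, it is also true for the dataclass type itself, not just instances. `issubclass()` is not supported for protocols with non-method members.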

/r/Python
https://redd.it/1pnne6l
Built semantic PDF search with sentence-transformers + DuckDB - benchmarked chunking approaches

I built DocMine to make PDF research papers and documentation semantically searchable. 3-line API, runs locally, no API keys.

Architecture:

PyMuPDF (extraction) → Chonkie (semantic chunking) → sentence-transformers (embeddings) → DuckDB (vector storage)

Key decision: semantic chunking vs fixed-size chunks

- Semantic boundaries preserve context across sentences
- ~20% larger chunks, but significantly better retrieval quality
- Tradeoff: 3x slower than naive splitting
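
To make that tradeoff concrete, here is a toy sketch of both strategies. It is not Chonkie's algorithm: a bag-of-words vector stands in for sentence-transformers embeddings, and the 0.2 threshold is arbitrary. Semantic chunking starts a new chunk wherever adjacent sentences stop being similar, which requires an embedding per sentence (one source of the extra cost), while fixed-size chunking cuts every `size` characters regardless of sentence boundaries:

```python
import math
import re
from collections import Counter


def split_sentences(text: str) -> list[str]:
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def embed(sentence: str) -> Counter:
    # toy bag-of-words vector; stand-in for a real sentence embedding
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fixed_size_chunks(text: str, size: int) -> list[str]:
    return [text[i:i + size] for i in range(0, len(text), size)]


def semantic_chunks(text: str, threshold: float = 0.2) -> list[str]:
    """Start a new chunk when adjacent sentences fall below the similarity threshold."""
    sentences = split_sentences(text)
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    previous = embed(sentences[0])
    for sentence in sentences[1:]:
        current = embed(sentence)
        if cosine(previous, current) < threshold:
            chunks.append([sentence])  # topic shift: open a new chunk
        else:
            chunks[-1].append(sentence)
        previous = current
    return [" ".join(chunk) for chunk in chunks]
```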

Benchmarks (M1 Mac, Python 3.13):

- 48-page PDF: 104s total (13.5s embeddings, 3.4s chunking, 0.4s extraction)
- Search latency: 425ms average
- Memory: single-file DuckDB, <100MB for 1500 chunks

Example use case:

```python
from docmine.pipeline import PDFPipeline

# Index every PDF under ./papers, then run a semantic query over them
pipeline = PDFPipeline()
pipeline.ingest_directory("./papers")
results = pipeline.search("CRISPR gene editing methods", top_k=5)
```

GitHub: https://github.com/bcfeen/DocMine

Open questions I'm still exploring:

1. When is semantic chunking worth the overhead vs simple sentence splitting?
2. What's the best way to handle tables/figures embedded in PDFs?
3. What's the optimal chunk_size for different document types (papers vs manuals)?

Feedback on the architecture or chunking approach welcome!

/r/Python
https://redd.it/1pnvuhf
I built PyGHA: Write GitHub Actions in Python, not YAML (Type-safe CI/CD)

# What My Project Does

PyGHA (v0.2.1, early beta) is a Python-native CI/CD framework that lets you define, test, and transpile workflow pipelines into GitHub Actions YAML using real Python instead of raw YAML. You write your workflows as Python functions, decorators, and control flow, and PyGHA generates the GitHub Actions files for you. It supports building, testing, linting, deploying, conditionals, matrices, and more through familiar Python constructs.

```python
from pygha import job, default_pipeline
from pygha.steps import shell, checkout, uses, when
from pygha.expr import runner, always

# Configure the default pipeline to run on:
# - pushes to main
# - pull requests
default_pipeline(on_push=["main"], on_pull_request=True)

# ---------------------------------------------------
# 1. Test job that runs across 3 Python versions
# ---------------------------------------------------

@job(
    name="test",
    matrix={"python": ["3.11", "3.12", "3.13"]},
)
def test_matrix():
    ...  # job steps elided in the original post
```


/r/Python
https://redd.it/1pni2se
YourTimeStarts.now: A Small Flask App for Taskmaster-style Tasks
https://yourtimestarts.now

/r/flask
https://redd.it/1pnpqu1
Tool for splitting sports highlight videos into individual clips

Hi folks, I am looking for a way to automatically split rugby highlight videos into single clips, each containing one try. For example, https://www.youtube.com/watch?v=rnCF2VqYwdM should be split into one video for each of the 9 tries during the match.


Here are some of the complications involved:

- Scenes have multiple camera angles and replays, so cutting on visual scene detection alone isn't feasible.
- Not every scene is a try.
- Not every highlight video has consistent graphics: some show a graphic between scenes, some use a cross-fade, and the scoreboard looks different in different competitions.


I imagine that the solution is some combination of frame-by-frame analysis for scene detection, OCR of the scoreboard/time, audio analysis, and the commentary dialog. The solution may also have to differ for each broadcast, so there might not even be a one-size-fits-all solution.
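
As the post says, visual cuts alone won't find tries, but shot-boundary detection is still a likely first building block. A minimal sketch of histogram-difference cut detection, assuming frames have already been decoded to flat lists of 0-255 grayscale values (in practice that decoding would be done with a video library such as OpenCV); the function names, bin count, and 0.5 threshold are arbitrary assumptions:

```python
def gray_histogram(frame: list[int], bins: int = 8) -> list[float]:
    """Normalized histogram of a frame given as a flat list of 0-255 gray values."""
    counts = [0] * bins
    for value in frame:
        counts[value * bins // 256] += 1
    return [c / len(frame) for c in counts]


def histogram_distance(a: list[float], b: list[float]) -> float:
    """Half the L1 distance: 0.0 for identical histograms, 1.0 for disjoint ones."""
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))


def detect_cuts(frames: list[list[int]], threshold: float = 0.5) -> list[int]:
    """Indices of frames whose histogram differs sharply from the previous frame."""
    cuts = []
    previous = gray_histogram(frames[0])
    for index, frame in enumerate(frames[1:], start=1):
        current = gray_histogram(frame)
        if histogram_distance(previous, current) > threshold:
            cuts.append(index)
        previous = current
    return cuts
```

Cross-fades spread the change over many frames, so a per-frame threshold like this would likely miss them; that is one reason cues such as scoreboard OCR or crowd/commentary audio would need to be combined with it.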


Any suggestions?

/r/Python
https://redd.it/1pnznd9
Front end

So, I know backend (Django), at least to the point where I know what to search for, you know? I can somewhat build the backend of an app, but I am pretty bad at frontend; I don't understand anything at all. (I've always hated templates, static files, and DTL.) But I do want to learn it now (PS: someone told me they can't give me an opportunity since I'm not a full-stack guy). How do I approach frontend? Like, from the basics? I would appreciate it if you experienced folks could guide this hermit 😔✋🏻

/r/django
https://redd.it/1po18nb
Recommended approach for single-endpoint, passwordless email-code login with domain restrictions using django-allauth

Hi, I am looking for guidance on implementing the following authentication flow using django-allauth.

Requirements

1. Restrict URL access. Only `/accounts/login/` should be accessible. All other django-allauth endpoints (signup, logout, password reset, email management, etc.) should be inaccessible, regardless of whether the user is authenticated.
2. Passwordless login via email code. No passwords are used: a user submits their email address on the login form and a one-time login code is sent to that address. If the email does not already exist, automatically create the user, send the login code, then log the user in after code verification.
3. Domain-restricted access. Only email addresses from a whitelist of allowed domains may log in or be registered; attempts from other domains should be rejected before user creation.
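
Whichever allauth settings end up covering the login-by-code part, the domain check from requirement 3 is small enough to show on its own. A minimal stdlib sketch; `ALLOWED_DOMAINS` and `is_allowed_email` are hypothetical, and overriding the account adapter's `clean_email()` is one plausible place to call it before a user is created (worth confirming against the django-allauth docs for your version):

```python
ALLOWED_DOMAINS = frozenset({"example.edu"})  # hypothetical whitelist


def is_allowed_email(email: str, allowed: frozenset[str] = ALLOWED_DOMAINS) -> bool:
    """True only if the part after the last '@' exactly matches a whitelisted domain."""
    local, sep, domain = email.strip().rpartition("@")
    if not sep or not local or not domain:
        return False
    return domain.lower() in allowed
```

Matching the exact domain after the last `@` rejects look-alikes such as `evilexample.edu` (which a naive `endswith` check would accept) and `example.edu.evil.com` (which a substring check would accept); if subdomains should be allowed, that needs an explicit rule.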

I am building a service that depends on the student having access to the email address they are authenticating with, so email-based verification is a core requirement. I want to avoid exposing any user-facing account management or password-based flows.

How may I achieve this?

/r/django
https://redd.it/1po8pxg
WhatsApp Wrapped with Polars & Plotly: Analyze chat history locally

I've always wanted something like Spotify Wrapped but for WhatsApp. There are some tools out there that do this, but every one I found either runs your chat history on their servers or is closed source. I wasn't comfortable with all that, so this year I built my own.

## What My Project Does

WhatsApp Wrapped generates visual reports for your group chats. You export your chat from WhatsApp (without media), run it through the tool, and get an HTML report with analytics. Everything runs locally or in your own Colab session. Nothing gets sent anywhere.

Here is a Sample Report.

Features include message counts, activity patterns, emoji stats, word clouds, and calendar heatmaps. The easiest way to use it is through Google Colab - just upload your chat export and download the report. There's also a CLI for local use.
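
The project itself uses Polars; as a stdlib-only sketch of the parse-and-aggregate step behind stats like message counts, assuming the common `date, time - sender: text` export line format (the real format varies by phone locale and WhatsApp version, and `parse_messages` and the other names here are hypothetical):

```python
import re
from collections import Counter
from collections.abc import Iterable

# Assumed export line shape, e.g. "12/31/23, 9:41 PM - Alice: Happy new year!"
LINE_RE = re.compile(
    r"^(?P<date>\d{1,2}/\d{1,2}/\d{2,4}), "
    r"(?P<time>\d{1,2}:\d{2}(?:\s?[AP]M)?) - "
    r"(?P<sender>[^:]+): (?P<text>.*)$"
)


def parse_messages(lines: Iterable[str]) -> list[dict[str, str]]:
    """Keep lines that start a new message; continuation lines and system notices are skipped."""
    return [m.groupdict() for line in lines if (m := LINE_RE.match(line))]


def message_counts(messages: list[dict[str, str]]) -> Counter:
    return Counter(msg["sender"] for msg in messages)


def messages_per_day(messages: list[dict[str, str]]) -> Counter:
    return Counter(msg["date"] for msg in messages)
```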

## Target Audience

Anyone who wants to analyze their WhatsApp chats without uploading them to someone else's server. It's ready to use now.

## Comparison

Unlike other web tools that require uploading your data, this runs entirely on your machine (or your own Colab). It's also open source, so you can see exactly what it does with your chats.

Tech: Python, Polars, Plotly, Jinja2.

Links:
- GitHub
- Sample Report
- Google

/r/Python
https://redd.it/1po9n17
Looking for Django developer for long term collaboration

Hello, I am looking for a developer for my work.

It's easy, long-term, part-time work.

Only developers based in the US/Americas or Europe can apply.

DM for details.

/r/django
https://redd.it/1podw9b
I made FastAPI Clean CLI – Production-ready scaffolding with Clean Architecture

Hey r/Python,

What My Project Does
FastAPI Clean CLI is a pip-installable command-line tool that instantly scaffolds a complete, production-ready FastAPI project with strict Clean Architecture (4 layers: Domain, Application, Infrastructure, Presentation). It includes one-command full CRUD generation, optional production features like JWT auth, Redis caching, Celery tasks, Docker Compose orchestration, tests, and CI/CD.

Target Audience
Backend developers building scalable, maintainable FastAPI apps – especially for enterprise or long-term projects where boilerplate and clean structure matter (not just quick prototypes).

Comparison
Unlike simpler tools like cookiecutter-fastapi or manage-fastapi, this one enforces full Clean Architecture with dependency injection, repository pattern, and auto-generates vertical slices (CRUD + tests). It also bundles more production batteries (Celery, Prometheus, MinIO) in one command, while keeping everything optional.

Quick start:

```bash
pip install fastapi-clean-cli
fastapi-clean init --name=my_api --db=postgresql --auth=jwt --docker
```

It's on PyPI with over 600 downloads in the first few weeks!

GitHub: https://github.com/Amirrdoustdar/fastclean
PyPI: https://pypi.org/project/fastapi-clean-cli/
Stats: https://pepy.tech/project/fastapi-clean-cli

This is my first major open-source tool. Feedback welcome – what should I add next (MongoDB support coming soon)?

Thanks! 🚀

/r/Python
https://redd.it/1poh525