Looking for a GitHub alternative that is very different.
No "commits" or "pull requests". "Push request" or "Edit request" is fine.
Screens aren't busy.
The best practice for the README, or its equivalent, is to have a file that contains a summary of the code, how to implement it, and any other needed information.
That's all I've got off the top of my head.
https://redd.it/1pm38sd
@r_opensource
No "commits" or "pull requests". "Push request" or "Edit request" is fine.
Screens aren't busy.
The best practice for the README, or alternate, is to have a file that contains a summary of the code, a how-to implement the code, and any other needed information.
That's all I've got off the top of my head.
https://redd.it/1pm38sd
@r_opensource
Reddit
From the opensource community on Reddit
Explore this post and more from the opensource community
I built a tiny open-source, local-first flashcard app after bouncing off Anki’s UI. Looking for feedback/possible contributors
As I was studying for the HL7 v2.8 Control Exam, I looked for a flashcard app. There are a LOT of flashcard apps out there, but they aren't all to my taste.
Anki seems to be the most popular open-source project; however, the UI left something to be desired.
Quizlet seems to have a good user interface, but I was turned off by its ad-heavy, closed setup.
Everything else seemed to be too complex.
So... as a one-week project, I built a tiny flashcard app named BaraBara. I built it with the following in mind:
A single-user experience that runs entirely in the browser.
No accounts or backend! Only `localStorage.`
Decks, with front/back of cards.
Simple "I knew it/I forgot it" spaced repetition.
Static build, you can self-host anywhere.
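To give a concrete picture of how little machinery this needs, here is a minimal TypeScript sketch of the idea - a localStorage-backed deck where "I knew it" doubles a card's interval and "I forgot it" resets it. This is an illustration only, not BaraBara's actual code; every name in it (Card, loadDeck, saveDeck, review, the storage key) is made up.
// Toy sketch, not BaraBara's implementation. Assumes a browser context
// (localStorage); all names here are invented for illustration.
interface Card {
  front: string;
  back: string;
  intervalDays: number; // current spacing between reviews
  due: number;          // timestamp (ms) of the next review
}

const STORAGE_KEY = 'flashcards-demo';

function loadDeck(): Card[] {
  return JSON.parse(localStorage.getItem(STORAGE_KEY) ?? '[]');
}

function saveDeck(deck: Card[]): void {
  localStorage.setItem(STORAGE_KEY, JSON.stringify(deck));
}

// "I knew it" doubles the interval; "I forgot it" resets it to one day.
function review(card: Card, knewIt: boolean): Card {
  const intervalDays = knewIt ? Math.max(1, card.intervalDays * 2) : 1;
  return { ...card, intervalDays, due: Date.now() + intervalDays * 86_400_000 };
}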
I'm not trying to compete with Anki/Quizlet. I'm aiming for something smaller and simpler. Thus, the scope is intentionally tiny. I'm sharing it here because:
I'd love some feedback from people who use and develop learning tools.
I'd like to grow this slowly and thoughtfully, and see if this is useful to anyone else.
I'm looking for a few contributors who like working on small projects. The project has already attracted one generous contributor, who greatly improved the UI.
🔗Live Demo: [https://barabara.megafarad.com](https://barabara.megafarad.com)
🔗Repo (MIT): [https://github.com/megafarad/barabara](https://github.com/megafarad/barabara)
I'm especially interested in feedback on:
Does the "local only," no back-end approach resonate with you, or do you prefer to have a real back-end from day one?
What is the minimum feature set you expect from an open-source flashcard app (import/export, tags, richer media - like images)?
For anyone who has implemented SRS tools, I simply have two actions on cards: "I knew it," and "I forgot." Is that enough in your view? Additionally, are there any "gotchas" around scheduling, UX, or data modeling that I should know about?
I'm happy to answer any questions about the implementation or direction. If you can see a way for this project to be more useful (or even useful at all!) I'd love to hear it.
https://redd.it/1pm57ve
@r_opensource
I got tired of subscription-based finance apps, so I built a local-first alternative. I also built it to get more practice and for my portfolio.
https://github.com/Jean-EstevezT/Aritmo-Budget-Planner
https://redd.it/1pm5wxr
@r_opensource
Modern open-source smartphone?
I am looking for a modern open source smartphone (hardware)
https://redd.it/1pm78jm
@r_opensource
KV and Wide-column database with CDN-scale replication
Building https://github.com/ankur-anand/unisondb, a log-native KV/wide-column engine with built-in global fanout.
I'm looking forward to your feedback.
https://redd.it/1pm85r5
@r_opensource
Tired of Vue toast libraries, so I built my own (headless, Vue 3, TS-first)
Hey folks 👋 author here, looking for feedback.
I recently needed a toast system for a Vue 3 app that was:
modern,
lightweight,
and didn’t fight my custom styling.
I tried several Vue toast libraries and kept hitting the same issues: a lot of them were Vue 2–only or basically unmaintained, the styling was hard-wired instead of properly themeable, some were missing pretty basic options, and almost none gave me predictable behavior for things like duplicates, timers, or multiple stacks.
So I ended up building my own: Toastflow (core engine) + vue-toastflow (Vue 3 renderer).
# What it is
Headless toast engine + Vue 3 renderer
Toastflow keeps state in a tiny, framework-agnostic store (`toastflow-core`), and `vue-toastflow` is just a renderer on top with `<ToastContainer />` + a global `toast` helper.
CSS-first theming
The default look is driven by CSS variables (including per-type colors like `--success-bg`, `--error-text`, etc.). You can swap the design by editing one file or aligning it with your Tailwind/daisyUI setup.
Smooth stack animations
Enter/leave + move animations when items above/below are removed, for all positions (`top-left`, `top-center`, `top-right`, `bottom-left`, `bottom-center`, `bottom-right`). Implemented with `TransitionGroup` and overridable via `animation` config.
Typed API, works inside and outside components
You install the plugin once, then import `toast` from anywhere (components, composables, services, plain TS modules). Typed helpers: `toast.show`, `toast.success`, `toast.error`, `toast.warning`, `toast.info`, `toast.loading`, `toast.update`, `toast.dismiss`, `toast.dismissAll`, etc.
Deterministic behavior
The core handles duplicates, timers, pause-on-hover, close-on-click, `maxVisible`, stack order (`newest`/`oldest`), and `clear-all` in a predictable way.
Extras
Promise/async flows (`toast.loading`), optional HTML content with `supportHtml`, lifecycle hooks, events (`toast.subscribeEvents`), timestamps (`showCreatedAt`, `createdAtFormatter`), and a headless slot API if you want to render your own card.
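As a sketch of what the promise flow might look like with these helpers - my guess at the shapes, not the documented API: I'm assuming `toast.loading` returns a toast id and that `toast.update`/`toast.dismiss` accept it, and the option fields mirror the "Quick taste" example below, so check the playground/docs for the real signatures.
// Hypothetical sketch only: assumes toast.loading returns an id and that
// toast.update/toast.dismiss take it. Verify the real option shapes in the docs.
import { toast } from 'vue-toastflow'

async function saveProfile(save: () => Promise<void>) {
  const id = toast.loading({ description: 'Saving your changes...' })
  try {
    await save()
    toast.update(id, { description: 'Saved.' })
  } catch {
    toast.dismiss(id)
    toast.error({ description: 'Could not save your changes.' })
  }
}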
# Quick taste
// main.ts
import { createApp } from 'vue'
import App from './App.vue'
import { createToastflow, ToastContainer } from 'vue-toastflow'
const app = createApp(App)
app.use(
createToastflow({
// optional global defaults
position: 'top-right',
duration: 5000,
}),
)
// register globally or import locally where you render it
app.component('ToastContainer', ToastContainer)
app.mount('#app')
<!-- Somewhere in your app -->
<script setup lang="ts">
import { toast } from 'vue-toastflow'
function handleSave() {
toast.success({
script: 'Saved',
description: 'Your changes have been stored.',
})
}
</script>
<template>
<button @click="handleSave">Save</button>
<ToastContainer />
</template>
# Links
Playground / demo: https://toastflow.adrianjanocko.sk
GitHub: [https://github.com/adrianjanocko/toastflow](https://github.com/adrianjanocko/toastflow)
npm (Vue renderer): https://www.npmjs.com/package/vue-toastflow
https://redd.it/1pmamrw
@r_opensource
I just created an open source Reddit Nuke extension!
https://github.com/edoardoCame/RedditNuke
Try it out!
https://redd.it/1pmc1ki
@r_opensource
n8n vs Nyno (open-source alternative) for Python Code Execution: The Benchmarks and why Nyno is much faster
https://nyno.dev/n8n-vs-nyno-for-python-code-execution-the-benchmarks-and-why-nyno-is-much-faster
https://redd.it/1pmg0fw
@r_opensource
I built Preso – a free & open-source AI presentation generator (Gamma-like, but OSS)
Hey everyone, 👋
I just launched **Preso**, an **AI-powered presentation generator** that turns prompts, notes, or documents into fully designed slide decks.
**What it does:**
* **Prompt → Deck**: One sentence → researched, structured slides
* **Text → Deck**: Messy notes or articles → clean narrative
* **Doc → Deck**: PDFs / MD / TXT → extracted insights
**Design & Editing:**
* Curated themes (Modern, Luxury Noir, Cyberpunk, etc.)
* AI-generated color palettes
* Fixed **1920×1080 pixel-perfect canvas**
* Drag / resize / rotate elements
* **AI Remix**: edit slides using natural language
**Export:**
* Interactive HTML (standalone)
* PDF, PPTX
* High-res PNGs
**Why I built it:**
I make **a lot of presentations for college assignments**, and I kept running into Gamma's limits - restricted exports, locked features, and paywalls for basic things.
I wanted:
* Full control over layouts (not template-locked)
* Proper editing like real design tools
* No usage limits
* Something I could actually extend and improve
So I built **Preso** as a **Gamma alternative**, but **free and open-source**, where AI handles structure and design instead of forcing predefined templates.
This is something I actively use for my own assignments.
**It’s completely free and open-source.**
🔗 Live: [https://preso-ai.vercel.app/](https://preso-ai.vercel.app/)
🐙 GitHub: [https://github.com/atharva9167j/preso](https://github.com/atharva9167j/preso)
Would love feedback - especially on UX, missing features, or performance issues.
https://redd.it/1pmh184
@r_opensource
Challenge: Integrate BRSCPP (Non-Custodial Fiat-to-Crypto Payments) in your dApp & Compete for a 200 USDC Prize Pool
I am challenging young Web3 developers to integrate BRSCPP (a Non-Custodial infrastructure for Fiat-to-Crypto payments) into their dApps and web stores.
Anyone who successfully integrates, processes payments, or discovers a bug can compete for a 200 USDC prize pool and an option for future project collaboration.
BRSCPP is an MVP project on Sepolia and BSC Testnet, developed by me, supporting ETH/BNB, USDC, USDT, and accepting payments in 12 different fiat currencies.
If you are interested, please send a DM.
Regards ;)
https://redd.it/1pmiry4
@r_opensource
I built a simple automatic app updater that uses WinGet
https://github.com/ELowry/WinGet-Updater
https://redd.it/1pmggv8
@r_opensource
How do you share open source work without it feeling like self-promotion?
Hi everyone :),
I’ve been working on a small open source CLI tool in my spare time and recently reached a point where it feels “done enough” to share — but I’m unsure what the right next steps are.
So far I’ve tried:
- Writing a clear README with examples
- Adding documentation and usage guides on my docs website
- Sharing it in one or two relevant discussions (without spamming)
I’m explicitly not trying to market it aggressively — I’d rather get it in front of the right people and receive honest feedback.
For those of you who’ve shipped open source projects that actually got adopted: What made the biggest difference early on? What do you wish you had done sooner?
If it helps, the project link is below, in case you have any tips.
Thanks!
If you want to check my project out or contribute, feel very welcome to do so:
https://github.com/Chrilleweb/dotenv-diff
https://redd.it/1pmks4y
@r_opensource
koin.js: MIT Licensed WebAssembly Gaming Engine for Retro Games
Hey Open source community!
I released koin.js under MIT license - a comprehensive WebAssembly gaming solution:
What it provides:
• Cross-platform emulation using Emscripten-compiled Libretro cores
• React component API for easy web integration
• Performance optimizations including Run-Ahead input processing
• Modular architecture - use just the engine or full UI
• Achievement system integration with RetroAchievements
• Virtual controls with haptic feedback algorithms
Architecture:
• Built on Nostalgist.js with additional performance enhancements
• WebGL rendering with SharedArrayBuffer threading
• Zero runtime dependencies for core functionality
• Comprehensive TypeScript definitions
• Browser compatibility focused (Chrome, Firefox, Safari, Edge)
Perfect for: Game preservation, educational tools, indie development, web portfolios.
Contribute today:
npm install koin.js
Documentation: https://koin.js.org
Source code: https://github.com/muditjuneja/koin
Join the open-source gaming revolution - your contributions can make web gaming better for everyone!
https://redd.it/1pmmryc
@r_opensource
For Linux software maintainers: distropack now supports .tar archives alongside .deb, .rpm, and .pkg
https://distropack.dev/Blog/Post?slug=introducing-tar-package-support-simple-distribution-without-repository-complexity
https://redd.it/1pmq23a
@r_opensource
Help improve Img2Num’s README! (Good First Issue)🦔
https://github.com/Ryan-Millard/Img2Num/issues/106
https://redd.it/1pmrd9e
@r_opensource
OpenMeters: audio visualization & metering for Linux.
https://github.com/httpsworldview/openmeters
https://redd.it/1pmt5cc
@r_opensource
🌎 Trendgetter v2.0: An API for getting trending content from various platforms
https://github.com/Zivsteve/trendgetter
https://redd.it/1pmta0q
@r_opensource
Better issues -> more contributions
If you want more pull requests, start by writing better issues.
From my own experience on both sides, most people do not avoid contributing because they are lazy. They avoid it because the cost of entry is unclear. You do not know how much context you need or whether you will spend a weekend only to be told that is not what was meant. Clear issues remove that fear and show respect for the contributor’s time.
The same applies to the codebase itself. If I can clone the repo, run it and understand the basic flow without reverse engineering everything, I am far more likely to help. Poor documentation does not just slow people down. It quietly filters contributors out.
Granularity matters too. Smaller, well scoped issues are simply less intimidating. That first small merge often turns into a second pull request, then a third. Large and fuzzy issues rarely get that first step.
None of this is meant to be flashy or inspirational. I just realized that, after I changed my maintainer habits a bit and followed these guidelines, way more new contributors entered the repo, which is a great feeling :)
https://redd.it/1pmspbu
@r_opensource
Isitreallyfoss - Website that evaluates "foss" projects to see if they're as free and open source as advertised
https://isitreallyfoss.com/
https://redd.it/1pmzi4k
@r_opensource
Kreuzberg v4.0.0-rc.8 is available
Hi Peeps,
I'm excited to announce that Kreuzberg v4.0.0 is coming very soon. We will release v4.0.0 at the beginning of next year - in just a couple of weeks' time. For now, v4.0.0-rc.8 has been released to all channels.
## What is Kreuzberg?
Kreuzberg is a document intelligence toolkit for extracting text, metadata, tables, images, and structured data from 56+ file formats. It was originally written in Python (v1-v3), where it demonstrated strong performance characteristics compared to alternatives in the ecosystem.
## What's new in V4?
### A Complete Rust Rewrite with Polyglot Bindings
The new version of Kreuzberg represents a massive architectural evolution. Kreuzberg has been completely rewritten in Rust - leveraging Rust's memory safety, zero-cost abstractions, and native performance. The new architecture consists of a high-performance Rust core with native bindings to multiple languages. That's right - it's no longer just a Python library.
Kreuzberg v4 is now available for 7 languages across 8 runtime bindings:
- Rust (native library)
- Python (PyO3 native bindings)
- TypeScript - Node.js (NAPI-RS native bindings) + Deno/Browser/Edge (WASM)
- Ruby (Magnus FFI)
- Java 25+ (Panama Foreign Function & Memory API)
- C# (P/Invoke)
- Go (cgo bindings)
Post v4.0.0 roadmap includes:
- PHP
- Elixir (via Rustler - with Erlang and Gleam interop)
Additionally, it's available as a CLI (installable via `cargo` or `homebrew`), HTTP REST API server, Model Context Protocol (MCP) server for Claude Desktop/Continue.dev, and as public Docker images.
### Why the Rust Rewrite? Performance and Architecture
The Rust rewrite wasn't just about performance - though that's a major benefit. It was an opportunity to fundamentally rethink the architecture:
Architectural improvements:
- Zero-copy operations via Rust's ownership model
- True async concurrency with Tokio runtime (no GIL limitations)
- Streaming parsers for constant memory usage on multi-GB files
- SIMD-accelerated text processing for token reduction and string operations
- Memory-safe FFI boundaries for all language bindings
- Plugin system with trait-based extensibility
### v3 vs v4: What Changed?
| Aspect | v3 (Python) | v4 (Rust Core) |
|--------|-------------|----------------|
| Core Language | Pure Python | Rust 2024 edition |
| File Formats | 30-40+ (via Pandoc) | 56+ (native parsers) |
| Language Support | Python only | 7 languages (Rust/Python/TS/Ruby/Java/Go/C#) |
| Dependencies | Requires Pandoc (system binary) | Zero system dependencies (all native) |
| Embeddings | Not supported | ✓ FastEmbed with ONNX (3 presets + custom) |
| Semantic Chunking | Via semantic-text-splitter library | ✓ Built-in (text + markdown-aware) |
| Token Reduction | Built-in (TF-IDF based) | ✓ Enhanced with 3 modes |
| Language Detection | Optional (fast-langdetect) | ✓ Built-in (68 languages) |
| Keyword Extraction | Optional (KeyBERT) | ✓ Built-in (YAKE + RAKE algorithms) |
| OCR Backends | Tesseract/EasyOCR/PaddleOCR | Same + better integration |
| Plugin System | Limited extractor registry | Full trait-based (4 plugin types) |
| Page Tracking | Character-based indices | Byte-based with O(1) lookup |
| Servers | REST API (Litestar) | HTTP (Axum) + MCP + MCP-SSE |
| Installation Size | ~100MB base | 16-31 MB complete |
| Memory Model | Python heap management | RAII with streaming |
| Concurrency | asyncio (GIL-limited) | Tokio work-stealing |
### Replacement of Pandoc - Native Performance
Kreuzberg v3 relied on Pandoc - an amazing tool, but one that had to be invoked via subprocess because of its GPL license. This had significant impacts:
v3 Pandoc limitations:
- System dependency (installation required)
- Subprocess overhead on every document
- No streaming support
- Limited metadata extraction
- ~500MB+ installation footprint
v4 native parsers:
- Zero external dependencies - everything is native Rust
- Direct parsing with full control over extraction
- Substantially more metadata extracted (e.g., DOCX document properties, section structure, style information)
- Streaming support for massive files (tested on multi-GB XML documents with stable memory)
- Example: PPTX extractor is now a fully streaming parser capable of handling gigabyte-scale presentations with constant memory usage and high throughput
### New File Format Support
v4 expanded format support from ~20 to 56+ file formats, including:
Added legacy format support:
- `.doc` (Word 97-2003)
- `.ppt` (PowerPoint 97-2003)
- `.xls` (Excel 97-2003)
- `.eml` (Email messages)
- `.msg` (Outlook messages)
Added academic/technical formats:
- LaTeX (`.tex`)
- BibTeX (`.bib`)
- Typst (`.typ`)
- JATS XML (scientific articles)
- DocBook XML
- FictionBook (`.fb2`)
- OPML (`.opml`)
Better Office support:
- XLSB, XLSM (Excel binary/macro formats)
- Better structured metadata extraction from DOCX/PPTX/XLSX
- Full table extraction from presentations
- Image extraction with deduplication
### New Features: Full Document Intelligence Solution
The v4 rewrite was also an opportunity to close gaps with commercial alternatives and add features specifically designed for RAG applications and LLM workflows:
#### 1. Embeddings (NEW)
- FastEmbed integration with full ONNX Runtime acceleration
- Three presets: "fast" (384d), "balanced" (512d), "quality" (768d/1024d)
- Custom model support (bring your own ONNX model)
- Local generation (no API calls, no rate limits)
- Automatic model downloading and caching
- Per-chunk embedding generation
import kreuzberg
from kreuzberg import ExtractionConfig, EmbeddingConfig, EmbeddingModelType

config = ExtractionConfig(
    embeddings=EmbeddingConfig(
        model=EmbeddingModelType.preset("balanced"),
        normalize=True,
    )
)

# pdf_bytes: the raw bytes of the document you want to process
result = kreuzberg.extract_bytes(pdf_bytes, config=config)
# result.embeddings contains vectors for each chunk
#### 2. Semantic Text Chunking (NOW BUILT-IN)
Now integrated directly into the core (v3 used external semantic-text-splitter library):
- Structure-aware chunking that respects document semantics
- Two strategies:
- Generic text chunker (whitespace/punctuation-aware)
- Markdown chunker (preserves headings, lists, code blocks, tables)
- Configurable chunk size and overlap (a toy illustration follows after this list)
- Unicode-safe (handles CJK, emojis correctly)
- Automatic chunk-to-page mapping
- Per-chunk metadata with byte offsets
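To make "chunk size" and "overlap" concrete, here is a toy TypeScript chunker - a plain word-based sliding window, nothing like Kreuzberg's structure-aware, Unicode-safe implementation, just an illustration of what the two knobs control.
// Toy illustration of chunk size + overlap (word-based sliding window).
// Not Kreuzberg code: the real chunkers are structure-aware and Unicode-safe.
function chunkWords(text: string, size = 200, overlap = 40): string[] {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  const step = Math.max(1, size - overlap); // guard against overlap >= size
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + size).join(' '));
    if (start + size >= words.length) break; // last window reached the end
  }
  return chunks;
}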
#### 3. Byte-Accurate Page Tracking (BREAKING CHANGE)
This is a critical improvement for LLM applications:
- v3: Character-based indices (`char_start`/`char_end`) - incorrect for UTF-8 multi-byte characters
- v4: Byte-based indices (`byte_start`/`byte_end`) - correct for all string operations (a short example follows below)
Additional page features:
- O(1) lookup: "which page is byte offset X on?" → instant answer
- Per-page content extraction
- Page markers in combined text (e.g., `--- Page 5 ---`)
- Automatic chunk-to-page mapping for citations
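For a quick sense of why this matters (plain TypeScript, not Kreuzberg code): character counts and UTF-8 byte counts agree only for ASCII, so a character-based offset stops being usable as a byte offset the moment accented or CJK text shows up.
// Code-point count vs UTF-8 byte count diverge for non-ASCII text, so an
// offset measured in one unit cannot be used to slice in the other.
const sample = 'Grüße 東京';
const codePoints = [...sample].length;                      // 8 code points
const utf8Bytes = new TextEncoder().encode(sample).length;  // 14 UTF-8 bytes
console.log(codePoints, utf8Bytes); // 8 14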
#### 4. Enhanced Token Reduction for LLM Context
Enhanced from v3 with three configurable modes to save on LLM costs:
- Light mode: ~15% reduction (preserve most detail)
- Moderate mode: ~30% reduction (balanced)
- Aggressive mode: ~50% reduction (key information only)
Uses TF-IDF sentence scoring with position-aware weighting and language-specific stopword filtering. SIMD-accelerated for improved performance over v3.
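As a rough illustration of the idea (a toy stand-in, not Kreuzberg's TF-IDF/SIMD implementation): score each sentence by how informative its words are, give earlier sentences a small boost, and keep the top fraction for the chosen mode (roughly 0.85/0.7/0.5 of sentences for light/moderate/aggressive).
// Toy extractive reducer: keep the top-scoring fraction of sentences.
// Kreuzberg's real implementation uses TF-IDF and per-language stopwords; this does not.
const STOPWORDS = new Set(['the', 'a', 'an', 'and', 'or', 'of', 'to', 'in', 'is', 'it']);

function reduceText(text: string, keepRatio = 0.7): string {
  const sentences = text.match(/[^.!?]+[.!?]+/g) ?? [text];

  // Word frequencies over the whole text, ignoring stopwords.
  const freq = new Map<string, number>();
  for (const s of sentences) {
    for (const w of s.toLowerCase().match(/[\p{L}\p{N}]+/gu) ?? []) {
      if (!STOPWORDS.has(w)) freq.set(w, (freq.get(w) ?? 0) + 1);
    }
  }

  // Score each sentence; earlier sentences get a small position bonus.
  const scored = sentences.map((s, i) => {
    const words = s.toLowerCase().match(/[\p{L}\p{N}]+/gu) ?? [];
    const base = words.reduce((sum, w) => sum + (freq.get(w) ?? 0), 0) / Math.max(1, words.length);
    return { s, i, score: base * (1 + 0.1 / (i + 1)) };
  });

  const keep = Math.max(1, Math.round(sentences.length * keepRatio));
  return scored
    .sort((a, b) => b.score - a.score)
    .slice(0, keep)
    .sort((a, b) => a.i - b.i) // restore document order
    .map((x) => x.s.trim())
    .join(' ');
}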
#### 5. Language Detection (NOW BUILT-IN)
- Support for 68 languages with confidence scoring
- Multi-language detection (documents with mixed languages)
- ISO 639-1 and ISO 639-3 code support
- Configurable confidence thresholds
#### 6. Keyword Extraction (NOW BUILT-IN)
Now built into core (previously optional KeyBERT in v3):
- YAKE (Yet Another Keyword Extractor): Unsupervised, language-independent
- RAKE (Rapid Automatic Keyword Extraction): Fast statistical method
- Configurable n-grams