DevTool+ - A VSCode extension that provides common developer tools with well-designed UI
https://github.com/fuzionix/devtool-plus
https://redd.it/1mq5qfo
@r_opensource
Writing a book in the age of open source: The power of engineering applied to writing
https://blog.incrementalforgetting.tech/p/sculpting-a-book-the-chisel?r=1tixy7&utm_campaign=post&utm_medium=web&showWelcomeOnShare=false
https://redd.it/1mq9n2b
@r_opensource
I made a Telegram bot template
I made this template for python-telegram-bot. It covers almost every integral part of a Telegram bot and adds some nice decorators and utils. After about 6 years of Python Telegram bot development (not full time), I can finally say this template is indeed perfect, at least for me. Hope it'll be of use to you too.
https://github.com/zmn-hamid/TeleTemplate
https://redd.it/1mqckdj
@r_opensource
🎬 FrameExtractionTool - Extract Perfect Frames from Videos with SwiftUI
**Hey Everyone!**
I just released my latest side project, **FrameExtractionTool**, a simple iOS app for extracting high-quality frames from videos.
**📱 What it does:**
* Video Selection: Pick any video from your photo library
* Frame-Perfect Playback: Custom video player with precise timeline control
* Frame Marking: Mark specific moments during playback
* High-Quality Extraction: Save frames at original video resolution
* Custom Albums: Organize extracted frames in custom photo albums
**🛠️ Built with:**
* **SwiftUI** + **AVFoundation**
* **GitHub Actions** for automated builds
⚠️ **Important Disclaimer:**
This is a **very bare-bones app** that I built as a side project. The main goals were to:
* Learn how AI can help build apps
* Play around with SwiftUI and modern iOS development
* Experiment with SF Symbols and Icon Composer
* Explore automated CI/CD with GitHub Actions
**This app is very heavily developed using AI.** Bugs are expected! 🐛
**🎯 Why I built this:**
I often needed to extract specific frames from videos for presentations, memes, or reference images, and I couldn't find an app that offers similar functionality for free. So I tried using AI and built it myself.
**🔗 Links:**
* **GitHub**: [FrameExtractionTool](https://github.com/CasperOng/FrameExtractionTool/)
* **Releases**: Check the releases page for unsigned IPA files.
**🤝 Contributing:**
Feel free to:
* Open issues for bugs 🐛
* Submit pull requests with fixes 🔧
* Suggest new features 💡
* Roast my (AI's) code (gently please) 😅
**TL;DR**: Made a simple frame extraction app with SwiftUI as an AI-assisted learning project. It works, has bugs, and is open source. Come try it! 😄
https://redd.it/1mqa4ib
@r_opensource
Anyone interested in an interesting project for an anti-bot?
All of you here likely know the dead internet theory. It’s especially bad on places like Reddit, Twitter, and comment sections.
I was thinking, maybe it’s time to get a group of folks together and build an open source bot detector. There has to be some way to train a program to detect likely bot activity with fairly high confidence.
Here’s why it needs to be open source and crowdsourced: we need huge amounts of data to train on human accounts and bot accounts.
But imagine a world where you can call on a Reddit bot or Twitter bot (ironic, I know) and it will scan an account, then give a confidence score for how likely the account is run by a bot.
I’m fairly new to programming and ML, but I’m learning. I am, however, a technology consultant, meaning it’s literally my job to think of new ideas and ways to use tech, like this, then figure out how to make it happen.
So that’s what I’m doing now.
https://redd.it/1mq5xfi
@r_opensource
Bodhveda - open source notifications for developers
I wanted to add notifications to one of my products, but I couldn't find a solution that was open source and that I could self-host. Most are closed source (except Novu) and expensive, at $1 to $5 per 1,000 notifications.
So I built Bodhveda, an open-source notification platform that lets developers add in-app notifications to their products in minutes, not weeks. Whether you’re launching your first product or scaling to millions, Bodhveda handles delivery, preferences, and analytics so you can focus on what matters.
GitHub - https://github.com/MudgalLabs/bodhveda
Website - https://bodhveda.com
Docs - https://docs.bodhveda.com
https://redd.it/1mqgjyb
@r_opensource
What are you building right now?
Tell us what your open-source project is about. Let’s check out each other’s projects
https://redd.it/1mqph1s
@r_opensource
I was tired of dealing with image-based subtitles, so I built Subtitle Forge, a cross-platform tool to extract and convert them to SRT.
Hey everyone,
Like many of you who manage a media library, I often run into video files with embedded image-based subtitles (like PGS for Blu-rays or VobSub for DVDs). Getting those into the universally compatible .srt format was always a hassle, requiring multiple tools and steps.
To solve this for myself, I created Subtitle Forge, a desktop application for macOS and Linux that makes the process much simpler.
It's a tool with both a GUI and a CLI, but the main features of the GUI version are:
* Extract & Convert: Pulls subtitles directly from MKV files.
* OCR for Image Subtitles: Converts PGS (SUP) and VobSub (SUB/IDX) subtitles into text-based SRT files using OCR. It also handles ASS/SSA to SRT conversion.
* Batch Processing: You can load a video file and process multiple subtitle tracks at once.
* Insert Subtitles: You can also use it to add an external SRT file back into an MKV.
* Modern GUI: It has a clean, simple drag-and-drop interface, progress bars with time estimates, and dark theme support.
The app is built with Go and the Fyne (https://fyne.io/) toolkit for the cross-platform GUI. It's open-source, and I'm hoping to get some feedback from the community to make it even better.
You can check it out, see screenshots, and find the installation instructions over on GitHub:
**https://github.com/VenimK/Subnoscript-Forge**
I'd love to hear what you think! Let me know if you have any questions or suggestions.
https://redd.it/1mqpmkb
@r_opensource
Shark WebAuthn library for .NET
Hello everyone,
Over the past few months, I have been working on a server-side implementation of the WebAuthn standard for .NET as an alternative to existing solutions.
You can check out the project here: https://github.com/linuxchata/fido2
I’d love to hear what you think. Do you see any areas for improvement? Are there features you’d like to see added? Any kind of feedback, advice, or questions are appreciated.
Thanks in advance!
https://redd.it/1mqsw2p
@r_opensource
Build the buddy that gets you! We open-sourced a complete AI voice interaction system!
Hey everyone, we just open-sourced Buddie: a complete, AI-powered voice interaction system we built from the ground up, so you can create your own AI buddy.
It's a full-stack platform for developers, hackers, and students, including custom hardware, firmware, and a mobile app. Therefore, you can use our solution to create various forms of AI devices, such as earphones, speakers, bracelets, toys, or desktop ornaments.
What it can do:
* Live transcribe & summarize meetings, calls, or in-person chats.
* Get real-time hints during conversations.
* Talk to LLMs completely hands-free.
* Context-aware help without needing to repeat yourself.
We've put everything on GitHub, including docs, to get you started. We're just getting started and would love to hear your ideas, questions, or even wild feature requests. Let us know what you think!
https://redd.it/1mqxxjs
@r_opensource
Anyone else got charged a few cents by GitHub for an open-source repo?
I just noticed something odd and wanted to check if it’s only me.
On July 27, 2025, I opened a support ticket with GitHub after receiving an invoice that showed my public open-source repository being billed under “metered” usage. From what I understand, public repos shouldn’t trigger these charges.
I only got a reply on August 12, and the next day they explained it was a bug: some users were charged a couple of cents for metered billing products, even when they shouldn’t have been. They reversed the charge and said they’re working on a fix.
That’s fine — but now I’m wondering: how many other people saw a tiny $0.02 or $0.03 charge and didn’t bother contacting support?
Has anyone else here noticed small, unexpected charges for public repos recently?
https://redd.it/1mqz5sr
@r_opensource
Curl Keeps Cars Rolling: How a Tiny Open-Source Tool Powers Millions of Vehicles
https://ostechnix.com/curl-runs-in-top-car-brands/
https://redd.it/1mqzy8d
@r_opensource
OSTechNix
Curl Runs In The World's Top 47 Car Brands [August 2025 Report] - OSTechNix
Learn how curl, a small open-source tool, is built into hundreds of millions of cars, including models from the world’s top 47 car brands.
A secure and private Android call recording app without root?
For Android 10 or higher.
https://redd.it/1mr3jwv
@r_opensource
Build Your own AI Agents
We've released Denser Agent as an open-source project! You can build your AI agents with weather forecast, meeting scheduling and database analytics capabilities.
GitHub: https://github.com/denser-org/denser-agent/
YouTube tutorial & demo: https://www.youtube.com/watch?v=3_KledHS-WM
Happy building on your AI Agents! 🛠️
https://redd.it/1mr4e7g
@r_opensource
Zulip 11.0: Organized chat for distributed teams
https://blog.zulip.com/2025/08/13/zulip-11-0-released/
https://redd.it/1mr6795
@r_opensource
We’re excited to announce the release of Zulip Server 11.0, containing hundreds
of new features and bug fixes: message reminders, support for channels without
topics, channel folders, and so much more! Over 3,300 new commits have been
merged across the project…
Project: Unstructured -> structured
I’m building an open-source AI Agent that converts messy, unstructured documents into clean, structured data.
The idea is simple:
You upload multiple documents — invoices, purchase orders, contracts, medical reports, etc. — and get back structured data (CSV tables) so you can visualize and work with your information more easily.
Here’s the approach I’m testing:
1. inference_schema
A vLLM analyzes your documents and suggests the best JSON schema for them — regardless of the document type.
This schema acts as the “official” structure for all files in the batch.
2. invoice_data_capture
A specialized LLM maps the extracted fields strictly to the schema.
For each uploaded document, it returns something like this, always following the same structure:
>
3. generate_csv
Once all documents are structured in JSON, another specialized LLM (with tools like Pandas) designs CSV tables to clearly present the extracted data.
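As a rough sketch of the deterministic part of this pipeline (assuming the LLM steps have already produced one JSON object per document), the snippet below enforces a fixed schema and writes CSV using only Python's standard library rather than Pandas. The field names in `SCHEMA` and the helper names `conform` and `to_csv` are hypothetical, chosen only for illustration.

```python
import csv
import io

# Hypothetical schema, standing in for the output of the inference_schema step.
SCHEMA = ["invoice_number", "vendor", "total"]


def conform(record, schema):
    """Keep exactly the schema fields: missing ones become None, extras are rejected."""
    extra = set(record) - set(schema)
    if extra:
        raise ValueError(f"fields not in schema: {sorted(extra)}")
    return {field: record.get(field) for field in schema}


def to_csv(records, schema):
    """Render records as CSV text with one column per schema field."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=schema, lineterminator="\n")
    writer.writeheader()
    for record in records:
        # None values are written as empty cells by the csv module.
        writer.writerow(conform(record, schema))
    return buffer.getvalue()


if __name__ == "__main__":
    documents = [
        {"invoice_number": "A-1", "vendor": "Acme", "total": 10.5},
        {"invoice_number": "A-2", "vendor": "Beta"},  # "total" was not extracted
    ]
    print(to_csv(documents, SCHEMA))
```

Rejecting unknown fields instead of silently dropping them makes schema drift between documents visible early, which matters when a model is the one producing the JSON.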
💬 What do you think about this approach? All feedback is welcome
https://redd.it/1mr9rms
@r_opensource
Open-Source Civic Framework – Looking for Collaborators & Review
Open-source governance toolkit — modular, forkable, and maybe just a little bit sci-fi. Want to help shape it?
I’ve published the **first draft** of an open-source civic framework called **Constella**. It’s intended as a **modular governance toolkit** for communities, blending practical civic processes with some creative concepts (cosmic citizenship, AI companions).
GitHub repo:
📄 [Constella Framework – GitHub](https://github.com/Nightmarejam/constella-framework)
Looking for:
* Code review & contribution
* Ideas for modular features
* Advice on making the repo more contributor-friendly
https://redd.it/1mrdc52
@r_opensource
Released Lanemu P2P VPN 0.12.3 - Open-source alternative to Hamachi
https://gitlab.com/Monsterovich/lanemu/-/releases/0.12.3
https://redd.it/1mrexkz
@r_opensource
GitLab
Release 0.12.3 · Nikolay Borodin / Lanemu P2P VPN · GitLab
Updated OpenJDK downloader: added download speed indicator and the link to the new version of OpenJDK has been updated. Switched to Bouncy Castle LTS, which...
I needed an efficient way to convert 5 TB of unstructured HTML into dictionaries using just my laptop, so I wrote doc2dict.
I'm the developer of an open source package to work with SEC data. It turns out the SEC has 5 TB of HTML. This data looks visually standardized to humans, but under the hood it is a mess of different tags and CSS.
There are a couple of existing solutions for parsing HTML, but they usually involve a combination of LLMs and OCR, which is slow and expensive. So, I decided to write a flexible, algorithmic solution: doc2dict.
Installation
    pip install doc2dict
User interface
    dct = html2dict(content, mappingdict=None)  # converts content to a dictionary
    visualizedict(dct)  # visualizes the dictionary in your browser
Note: I don't use this UI much, as I mostly use it via my SEC package. Docs
# Architecture
1. Iterate through the DOM and, via inheritance, collect characteristics such as bold, visual height, and italics for text on the same line (e.g. within a block) to create instructions, e.g.
    [{'text': 'BOARD MEETINGS', 'all_caps': True, 'bold': True, 'font-size': 15.995999999999999}]
2. Use a rule set to determine how to convert instructions into a nested dictionary. This is customizable. For example, the mapping dict below tells the parser that 'items' should be nested under 'parts', in addition to the default rules.
    tenkmappingdict = {
        ('part', r'^part\s([ivx]+)$'): 0,
        ('signatures', r'^signatures?\.$'): 0,
        ('item', r'^item\s(\d+)'): 1,
    }
Note: This approach kinda works for modern PDFs, because the text stream is often in the order a human would view as correct. I've added the functionality to doc2dict, but it's in an early stage. (AKA, it sucks).
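The two architecture steps above can be sketched roughly as follows. This is not doc2dict's actual implementation, only a minimal illustration under simplifying assumptions: each instruction is a dict with a `text` key, header rules are `(label, regex) -> level` pairs shaped like the mapping dict above, and the `header_level` and `nest` helpers and the output layout are hypothetical.

```python
import re

# Rules shaped like the mapping dict above: (label, regex) -> nesting level.
MAPPING = {
    ("part", r"^part\s([ivx]+)$"): 0,
    ("signatures", r"^signatures?\.$"): 0,
    ("item", r"^item\s(\d+)"): 1,
}


def header_level(text, mapping):
    """Return (label, level) if the text matches a header rule, else None."""
    normalized = text.strip().lower()
    for (label, pattern), level in mapping.items():
        if re.match(pattern, normalized):
            return label, level
    return None


def nest(instructions, mapping):
    """Fold a flat list of instruction dicts into a nested dictionary."""
    root = {"title": "root", "text": [], "children": []}
    stack = [(-1, root)]  # (level, node) pairs; the root sits above every level
    for instruction in instructions:
        hit = header_level(instruction["text"], mapping)
        if hit is None:
            # Plain text belongs to the most recently opened header.
            stack[-1][1]["text"].append(instruction["text"])
            continue
        label, level = hit
        # Close every open header at the same or a deeper level.
        while stack[-1][0] >= level:
            stack.pop()
        node = {"title": instruction["text"], "class": label, "text": [], "children": []}
        stack[-1][1]["children"].append(node)
        stack.append((level, node))
    return root


if __name__ == "__main__":
    flat = [
        {"text": "PART I"},
        {"text": "Item 1. Business"},
        {"text": "We sell things."},
        {"text": "Item 2"},
        {"text": "PART II"},
        {"text": "Item 5"},
    ]
    tree = nest(flat, MAPPING)
    print([child["title"] for child in tree["children"]])
```

A stack of open headers keeps the pass linear in the number of instructions, which fits the goal of avoiding LLMs and OCR for speed.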
# Benchmarks
Benchmarks vary as I update the package w.r.t. features (tables are slow!). On my laptop:
* 500 pages per second, single-threaded
* 5,000 pages per second, multi-threaded
# Links
doc2dict GitHub
[raw html](https://html-preview.github.io/?url=https://raw.githubusercontent.com/john-friedman/doc2dict/refs/heads/main/example_output/html/msft_10k_2024.html#:~:text=embracing)
dictionary visualization (old)
[instructions visualization](https://html-preview.github.io/?url=https://github.com/john-friedman/doc2dict/blob/main/example_output/html/instructions_visualization.html) (old)
dictionary (old)
https://redd.it/1mrbkno
@r_opensource
Best practice for including third-party licenses in an OSS library?
I built a public library that’s MIT-licensed (the license is in a LICENSE file).
The package uses some third-party code, each with its own license.
I’m trying to figure out the standard way to include those third-party licenses in my repo:
* Add them directly to my LICENSE file?
* Create a separate file like THIRDPARTYLICENSES or NOTICE?
Also, when someone uses my package, do they need to include all these third-party licenses in their app?
One concern: I’ve noticed that some app license generators only pull the main LICENSE file of each dependency, so if third-party licenses are in a separate file, they might be missed. How do you handle this?
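One possible way to handle this, sketched under the assumption that each dependency's license text is vendored at `third_party/<name>/LICENSE` (a hypothetical layout), is to generate a single combined file so that tools reading only one file still see every notice. The `bundle_licenses` helper below is illustrative, not a standard tool, and whether a combined file satisfies a given license depends on that license's own terms.

```python
from pathlib import Path


def bundle_licenses(root):
    """Concatenate every <root>/<dependency>/LICENSE file into one text blob."""
    sections = []
    for license_file in sorted(Path(root).glob("*/LICENSE")):
        name = license_file.parent.name
        body = license_file.read_text(encoding="utf-8").strip()
        # Each section is the dependency name, an underline, then its license text.
        sections.append(f"{name}\n{'=' * len(name)}\n{body}\n")
    return "\n".join(sections)


if __name__ == "__main__":
    # Hypothetical directory name; prints the combined text for review.
    print(bundle_licenses("third_party"))
```

The output could be appended to the main LICENSE file or written to a separate file at release time, depending on which of the two options above is chosen.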
My library has 300k downloads a month, and I think it’s time to fix this in the best way.
Currently, my README only has a section with links to the third-party code I use, along with each license type.
Thanks
https://redd.it/1mrep4m
@r_opensource