Data Science Jupyter Notebooks – Telegram
Data Science Jupyter Notebooks
11.7K subscribers
287 photos
43 videos
9 files
843 links
Explore the world of Data Science through Jupyter Notebooks—insights, tutorials, and tools to boost your data journey. Code, analyze, and visualize smarter with every post.
🔥 Trending Repository: glow

📝 Description: Render markdown on the CLI, with pizzazz! 💅🏻

🔗 Repository URL: https://github.com/charmbracelet/glow

📖 Readme: https://github.com/charmbracelet/glow#readme

📊 Statistics:
🌟 Stars: 19.9K stars
👀 Watchers: 75
🍴 Forks: 480 forks

💻 Programming Languages: Go - Dockerfile

🏷️ Related Topics:
#markdown #cli #hacktoberfest #excitement


==================================
🧠 By: https://news.1rj.ru/str/DataScienceM
🔥 Trending Repository: hacker-scripts

📝 Description: Based on a true story

🔗 Repository URL: https://github.com/NARKOZ/hacker-scripts

📖 Readme: https://github.com/NARKOZ/hacker-scripts#readme

📊 Statistics:
🌟 Stars: 49K stars
👀 Watchers: 2.1k
🍴 Forks: 6.7K forks

💻 Programming Languages: JavaScript - Python - Java - Perl - Kotlin - Clojure

🏷️ Related Topics: Not available

==================================
🧠 By: https://news.1rj.ru/str/DataScienceM
🔥 Trending Repository: moon-dev-ai-agents

📝 Description: autonomous ai agents for trading in python

🔗 Repository URL: https://github.com/moondevonyt/moon-dev-ai-agents

🌐 Website: https://algotradecamp.com

📖 Readme: https://github.com/moondevonyt/moon-dev-ai-agents#readme

📊 Statistics:
🌟 Stars: 2.2K stars
👀 Watchers: 100
🍴 Forks: 1.1K forks

💻 Programming Languages: Python - HTML

🏷️ Related Topics: Not available

==================================
🧠 By: https://news.1rj.ru/str/DataScienceM
🔥 Trending Repository: agenticSeek

📝 Description: Fully Local Manus AI. No APIs, No $200 monthly bills. Enjoy an autonomous agent that thinks, browses the web, and codes for the sole cost of electricity. 🔔 Official updates only via twitter @Martin993886460 (Beware of fake account)

🔗 Repository URL: https://github.com/Fosowl/agenticSeek

🌐 Website: http://agenticseek.tech

📖 Readme: https://github.com/Fosowl/agenticSeek#readme

📊 Statistics:
🌟 Stars: 22.4K stars
👀 Watchers: 132
🍴 Forks: 2.4K forks

💻 Programming Languages: Python - JavaScript - CSS - Shell - Batchfile - HTML - Dockerfile

🏷️ Related Topics:
#ai #agents #autonomous_agents #voice_assistant #llm #llm_agents #agentic_ai #deepseek_r1


==================================
🧠 By: https://news.1rj.ru/str/DataScienceM
🔥 Trending Repository: LinkSwift

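📝 Description: A JavaScript-based tool for retrieving direct download links for files stored on cloud drives. Modified from 网盘直链下载助手 ("Netdisk Direct Link Download Assistant"); supports eight cloud drives: Baidu Netdisk / Aliyun Drive / China Mobile Cloud / Tianyi Cloud / Xunlei Cloud / Quark Netdisk / UC Netdisk / 123pan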

🔗 Repository URL: https://github.com/hmjz100/LinkSwift

🌐 Website: https://github.com/hmjz100/LinkSwift/raw/main/%EF%BC%88%E6%94%B9%EF%BC%89%E7%BD%91%E7%9B%98%E7%9B%B4%E9%93%BE%E4%B8%8B%E8%BD%BD%E5%8A%A9%E6%89%8B.user.js

📖 Readme: https://github.com/hmjz100/LinkSwift#readme

📊 Statistics:
🌟 Stars: 7.9K stars
👀 Watchers: 26
🍴 Forks: 371 forks

💻 Programming Languages: JavaScript

🏷️ Related Topics:
#userscript #tampermonkey #aria2 #baidu #baiduyun #tampermonkey_script #baidunetdisk #tampermonkey_userscript #baidu_netdisk #motrix #aliyun_drive #123pan #189_cloud #139_cloud #xunlei_netdisk #quark_netdisk #ali_netdisk #yidong_netdisk #tianyi_netdisk #uc_netdisk


==================================
🧠 By: https://news.1rj.ru/str/DataScienceM
Forwarded from Kaggle Data Hub
Unlock premium learning without spending a dime! ⭐️ @DataScienceC is the first Telegram channel dishing out free Udemy coupons daily—grab courses on data science, coding, AI, and beyond. Join the revolution and boost your skills for free today! 📕

What topic are you itching to learn next? 😊
https://news.1rj.ru/str/DataScienceC 🌟
🔥 Trending Repository: pytorch

📝 Description: Tensors and Dynamic neural networks in Python with strong GPU acceleration

🔗 Repository URL: https://github.com/pytorch/pytorch

🌐 Website: https://pytorch.org

📖 Readme: https://github.com/pytorch/pytorch#readme

📊 Statistics:
🌟 Stars: 94.5K stars
👀 Watchers: 1.8k
🍴 Forks: 25.8K forks

💻 Programming Languages: Python - C++ - Cuda - C - Objective-C++ - CMake

🏷️ Related Topics:
#python #machine_learning #deep_learning #neural_network #gpu #numpy #autograd #tensor


==================================
🧠 By: https://news.1rj.ru/str/DataScienceM
🔥 Trending Repository: LocalAI

📝 Description: 🤖 The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more. Features: Generate Text, Audio, Video, Images, Voice Cloning, Distributed, P2P and decentralized inference

🔗 Repository URL: https://github.com/mudler/LocalAI

🌐 Website: https://localai.io

📖 Readme: https://github.com/mudler/LocalAI#readme

📊 Statistics:
🌟 Stars: 36.4K stars
👀 Watchers: 241
🍴 Forks: 2.9K forks

💻 Programming Languages: Go - HTML - Python - JavaScript - Shell - C++

🏷️ Related Topics:
#api #ai #mcp #decentralized #text_generation #distributed #tts #image_generation #llama #object_detection #mamba #libp2p #gemma #mistral #audio_generation #llm #stable_diffusion #rwkv #musicgen #rerank


==================================
🧠 By: https://news.1rj.ru/str/DataScienceM
🔥 Trending Repository: PageIndex

📝 Description: 📄🧠 PageIndex: Document Index for Reasoning-based RAG

🔗 Repository URL: https://github.com/VectifyAI/PageIndex

🌐 Website: https://pageindex.ai

📖 Readme: https://github.com/VectifyAI/PageIndex#readme

📊 Statistics:
🌟 Stars: 3.1K stars
👀 Watchers: 24
🍴 Forks: 243 forks

💻 Programming Languages: Python - Jupyter Notebook

🏷️ Related Topics:
#ai #retrieval #reasoning #rag #llm


==================================
🧠 By: https://news.1rj.ru/str/DataScienceM
🔥 Trending Repository: opentui

📝 Description: OpenTUI is a library for building terminal user interfaces (TUIs)

🔗 Repository URL: https://github.com/sst/opentui

🌐 Website: https://opentui.com

📖 Readme: https://github.com/sst/opentui#readme

📊 Statistics:
🌟 Stars: 3.3K stars
👀 Watchers: 19
🍴 Forks: 122 forks

💻 Programming Languages: TypeScript - Zig - Go - Tree-sitter Query - Shell - Vue

🏷️ Related Topics: Not available

==================================
🧠 By: https://news.1rj.ru/str/DataScienceM
🔥 Trending Repository: awesome-rl-for-cybersecurity

📝 Description: A curated list of resources dedicated to reinforcement learning applied to cyber security.

🔗 Repository URL: https://github.com/Kim-Hammar/awesome-rl-for-cybersecurity

📖 Readme: https://github.com/Kim-Hammar/awesome-rl-for-cybersecurity#readme

📊 Statistics:
🌟 Stars: 948 stars
👀 Watchers: 32
🍴 Forks: 137 forks

💻 Programming Languages: Not available

🏷️ Related Topics: Not available

==================================
🧠 By: https://news.1rj.ru/str/DataScienceN
🔥 Trending Repository: How-To-Secure-A-Linux-Server

📝 Description: An evolving how-to guide for securing a Linux server.

🔗 Repository URL: https://github.com/imthenachoman/How-To-Secure-A-Linux-Server

📖 Readme: https://github.com/imthenachoman/How-To-Secure-A-Linux-Server#readme

📊 Statistics:
🌟 Stars: 20.5K stars
👀 Watchers: 339
🍴 Forks: 1.3K forks

💻 Programming Languages: Not available

🏷️ Related Topics:
#linux #security #server #hardening #security_hardening #linux_server #cc_by_sa #hardening_steps


==================================
🧠 By: https://news.1rj.ru/str/DataScienceM
🔥 Trending Repository: edgevpn

📝 Description: The immutable, decentralized, statically built p2p VPN without any central server and automatic discovery! Create decentralized introspectable tunnels over p2p with shared tokens

🔗 Repository URL: https://github.com/mudler/edgevpn

🌐 Website: https://mudler.github.io/edgevpn

📖 Readme: https://github.com/mudler/edgevpn#readme

📊 Statistics:
🌟 Stars: 1.3K stars
👀 Watchers: 22
🍴 Forks: 149 forks

💻 Programming Languages: Go - HTML

🏷️ Related Topics:
#kubernetes #tunnel #golang #networking #mesh_networks #ipfs #nat #blockchain #p2p #vpn #mesh #golang_library #libp2p #cloudvpn #ipfs_blockchain #holepunch #p2pvpn


==================================
🧠 By: https://news.1rj.ru/str/DataScienceM
🔥 Trending Repository: cs-self-learning

📝 Description: A self-study guide to computer science

🔗 Repository URL: https://github.com/PKUFlyingPig/cs-self-learning

🌐 Website: https://csdiy.wiki

📖 Readme: https://github.com/PKUFlyingPig/cs-self-learning#readme

📊 Statistics:
🌟 Stars: 68.5K stars
👀 Watchers: 341
🍴 Forks: 7.7K forks

💻 Programming Languages: HTML

🏷️ Related Topics: Not available

==================================
🧠 By: https://news.1rj.ru/str/DataScienceM
💡 Top 70 Web Scraping Operations in Python

I. Making HTTP Requests (requests)

• Import the library.
import requests

• Make a GET request to a URL.
response = requests.get('http://example.com')

• Check the response status code (200 is OK).
print(response.status_code)

• Access the raw HTML content (as bytes).
html_bytes = response.content

• Access the HTML content (as a string).
html_text = response.text

• Access response headers.
print(response.headers)

• Send a custom User-Agent header.
headers = {'User-Agent': 'My Cool Scraper 1.0'}
response = requests.get('http://example.com', headers=headers)

• Pass URL parameters in a request.
params = {'q': 'python scraping'}
response = requests.get('https://www.google.com/search', params=params)

• Make a POST request with form data.
payload = {'key1': 'value1', 'key2': 'value2'}
response = requests.post('http://httpbin.org/post', data=payload)

• Handle potential request errors.
try:
    response = requests.get('http://example.com', timeout=5)
    response.raise_for_status()  # Raise an exception for 4xx/5xx status codes
except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")
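Each call above opens a fresh connection and repeats its headers. A common alternative, sketched here as an assumption rather than the only right setup, is to reuse one `requests.Session` that carries default headers and retries transient failures through urllib3's `Retry` mounted on an `HTTPAdapter`. The helper name `make_session`, the retry count and the list of retried status codes are illustrative choices, not values from the original post.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(user_agent: str = "My Cool Scraper 1.0", retries: int = 3) -> requests.Session:
    """Build a Session with a default User-Agent and automatic retries (illustrative defaults)."""
    session = requests.Session()
    # Headers set on the session are sent with every request made through it.
    session.headers.update({"User-Agent": user_agent})

    retry = Retry(
        total=retries,
        backoff_factor=0.5,                          # exponential backoff between attempts
        status_forcelist=(429, 500, 502, 503, 504),  # retry on these HTTP status codes
        allowed_methods=("GET", "HEAD"),             # only retry idempotent methods
    )
    adapter = HTTPAdapter(max_retries=retry)
    # Apply the retrying adapter to both schemes.
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session
```

Usage then mirrors the plain calls, for example `make_session().get('http://example.com', timeout=5)`, still wrapped in the same `try`/`except` as above; the timeout is per request and is not set by the session.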


II. Parsing HTML with BeautifulSoup (Setup & Navigation)

• Import the library.
from bs4 import BeautifulSoup

• Create a BeautifulSoup object from HTML text.
soup = BeautifulSoup(html_text, 'html.parser')

• Prettify the parsed HTML for readability.
print(soup.prettify())

• Access a tag directly by name (gets the first one).
title_tag = soup.title

• Navigate to a tag's parent.
title_parent = soup.title.parent

• Get an iterable of a tag's children.
for child in soup.head.children:
    print(child.name)  # prints None for text nodes such as whitespace

• Get the next sibling tag.
first_p = soup.find('p')
next_p = first_p.find_next_sibling('p')

• Get the previous sibling tag.
second_p = soup.find_all('p')[1]
prev_p = second_p.find_previous_sibling('p')
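To see what the navigation calls return without fetching anything, you can run them against a small inline document. The markup below is made up for illustration; note that `.children` also yields the whitespace text between tags, which is why the sketch keeps only `Tag` objects.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# A tiny made-up document so the example runs without a network request.
html_text = """
<html>
  <head><title>Demo page</title></head>
  <body>
    <p id="first">One</p>
    <p id="second">Two</p>
    <p id="third">Three</p>
  </body>
</html>
"""
soup = BeautifulSoup(html_text, "html.parser")

title_tag = soup.title                # first <title> tag in the document
title_parent = title_tag.parent       # the enclosing <head> tag

# .children includes whitespace strings between tags; keep only real tags.
body_tags = [child.name for child in soup.body.children if isinstance(child, Tag)]

first_p = soup.find("p")
next_p = first_p.find_next_sibling("p")        # skips whitespace, returns the next <p>
second_p = soup.find_all("p")[1]
prev_p = second_p.find_previous_sibling("p")   # the <p> just before the second one

print(title_parent.name, body_tags, next_p["id"], prev_p["id"])
```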


III. Finding Elements with BeautifulSoup

• Find the first occurrence of a tag.
first_link = soup.find('a')

• Find all occurrences of a tag.
all_links = soup.find_all('a')

• Find tags by their CSS class.
articles = soup.find_all('div', class_='article-content')

• Find a tag by its ID.
main_content = soup.find(id='main-container')

• Find tags by other attributes.
images = soup.find_all('img', attrs={'data-src': True})

• Find using a list of multiple tags.
headings = soup.find_all(['h1', 'h2', 'h3'])

• Find using a regular expression.
import re
links_with_blog = soup.find_all('a', href=re.compile(r'blog'))

• Find using a custom function.
# Finds tags with a 'class' but no 'id'
tags = soup.find_all(lambda tag: tag.has_attr('class') and not tag.has_attr('id'))

• Limit the number of results.
first_five_links = soup.find_all('a', limit=5)

• Use CSS Selectors to find one element.
footer = soup.select_one('#footer > p')

• Use CSS Selectors to find all matching elements.
article_links = soup.select('div.article a')

• Select direct children using CSS selector.
nav_items = soup.select('ul.nav > li')
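Keyword filters and CSS selectors often express the same query. The sketch below, on hypothetical markup, checks that a regex filter and a CSS selector pick the same links, and shows that `class_='article'` matches a tag whose `class` attribute is `"article featured"`, because BeautifulSoup treats `class` as a list of tokens.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup for illustration only.
html_text = """
<div id="main-container">
  <div class="article featured"><a href="/blog/one">One</a></div>
  <div class="article"><a href="/blog/two">Two</a></div>
  <ul class="nav"><li><a href="/about">About</a></li><li>Plain</li></ul>
</div>
"""
soup = BeautifulSoup(html_text, "html.parser")

# class_ matches if ANY of the tag's classes equals 'article', so both divs are found.
articles = soup.find_all("div", class_="article")

# CSS selector: <a> tags nested anywhere inside div.article.
article_links = soup.select("div.article a")

# The same links found with find_all and a regex on the href attribute.
blog_links = soup.find_all("a", href=re.compile(r"blog"))

# '>' restricts matches to direct children of the <ul>.
nav_items = soup.select("ul.nav > li")

print(len(articles), [a["href"] for a in article_links], len(nav_items))
```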


IV. Extracting Data with BeautifulSoup

• Get the text content from a tag.
title_text = soup.title.get_text()

• Get stripped text content.
link_text = soup.find('a').get_text(strip=True)

• Get all text from the entire document.
all_text = soup.get_text()

• Get an attribute's value (like a URL).
link_url = soup.find('a')['href']

• Get the tag's name.
tag_name = soup.find('h1').name

• Get all attributes of a tag as a dictionary.
attrs_dict = soup.find('img').attrs
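Putting the extraction calls together, here is a minimal sketch that turns every link on a page into a record. It uses `tag.get('href')`, which returns `None` instead of raising `KeyError` when the attribute is missing (unlike `tag['href']`), and the standard library's `urljoin` to resolve relative URLs. The function name `extract_links`, the sample markup and the base URL are hypothetical.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_links(html_text: str, base_url: str) -> list:
    """Return one {'text', 'url'} dict per <a> tag that has an href attribute."""
    soup = BeautifulSoup(html_text, "html.parser")
    records = []
    for a in soup.find_all("a"):
        href = a.get("href")  # None if the attribute is absent
        if href is None:
            continue
        records.append({
            "text": a.get_text(strip=True),
            "url": urljoin(base_url, href),  # resolves '/about' against base_url
        })
    return records


sample = (
    '<p><a href="/about"> About us </a> '
    '<a name="anchor">No href</a> '
    '<a href="https://other.org/x">X</a></p>'
)
print(extract_links(sample, "https://example.com/blog/"))
```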


V. Parsing with lxml and XPath

• Import the library.
from lxml import html

• Parse HTML content with lxml.
tree = html.fromstring(response.content)

• Select elements using an XPath expression.
# Selects all <a> tags inside <div> tags with class 'nav'
links = tree.xpath('//div[@class="nav"]/a')

• Select text content directly with XPath.
# Gets the text of all <h1> tags
h1_texts = tree.xpath('//h1/text()')

• Select an attribute value with XPath.
# Gets all href attributes from <a> tags
hrefs = tree.xpath('//a/@href')
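The predicate `@class="nav"` is an exact string comparison, so it misses an element such as `<div class="nav main">`. A widely used XPath 1.0 idiom for matching one class token is `contains(concat(" ", normalize-space(@class), " "), " nav ")`; the sketch below compares the two on made-up markup.

```python
from lxml import html

# Made-up markup for illustration.
page = """
<html><body>
  <h1>Main title</h1>
  <div class="nav"><a href="/a">A</a></div>
  <div class="nav main"><a href="/b">B</a></div>
  <div class="navbar"><a href="/c">C</a></div>
</body></html>
"""
tree = html.fromstring(page)

# Exact comparison: matches only class="nav" verbatim.
exact = tree.xpath('//div[@class="nav"]/a/@href')

# Token match: pads the class list with spaces so " nav " matches "nav main"
# but not "navbar".
token = tree.xpath(
    '//div[contains(concat(" ", normalize-space(@class), " "), " nav ")]/a/@href'
)

h1_texts = tree.xpath("//h1/text()")
print(exact, token, h1_texts)
```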


VI. Handling Dynamic Content (Selenium)

• Import the webdriver.
from selenium import webdriver

• Initialize a browser driver.
driver = webdriver.Chrome()  # Needs Chrome; Selenium 4.6+ can fetch a matching chromedriver itself

• Navigate to a webpage.
driver.get('http://example.com')

• Find an element by its ID.
element = driver.find_element('id', 'my-element-id')

• Find elements by CSS Selector.
elements = driver.find_elements('css selector', 'div.item')

• Find an element by XPath.
button = driver.find_element('xpath', '//button[@type="submit"]')

• Click a button.
button.click()

• Enter text into an input field.
search_box = driver.find_element('name', 'q')
search_box.send_keys('Python Selenium')

• Wait (up to 10 seconds) for an element to be present in the DOM.
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Use EC.visibility_of_element_located instead to also require that it is displayed.
element = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.ID, "myDynamicElement"))
)

• Get the page source after JavaScript has executed.
dynamic_html = driver.page_source

• Close the browser window.
driver.quit()


VII. Common Tasks & Best Practices

• Handle pagination by finding the "Next" link.
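next_link = soup.find('a', string='Next')  # string= is the newer name for text=
next_page_url = next_link['href'] if next_link is not None else None  # None on the last page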

• Save data to a CSV file.
import csv
with open('data.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['Title', 'Link'])
    # writer.writerow([title, url]) in a loop

• Save data to CSV using pandas.
import pandas as pd
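data = [('Example', 'http://example.com')]  # placeholder: (title, link) rows collected while scraping
df = pd.DataFrame(data, columns=['Title', 'Link'])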
df.to_csv('data.csv', index=False)

• Use a proxy with requests.
proxies = {'http': 'http://10.10.1.10:3128', 'https': 'http://10.10.1.10:1080'}
requests.get('http://example.com', proxies=proxies)

• Pause between requests to be polite.
import time
time.sleep(2) # Pause for 2 seconds

• Handle JSON data from an API.
json_response = requests.get('https://api.example.com/data').json()

• Download a file (like an image).
img_url = 'http://example.com/image.jpg'
img_data = requests.get(img_url).content
with open('image.jpg', 'wb') as handler:
    handler.write(img_data)

• Parse a sitemap.xml to find all URLs.
# Get the sitemap.xml file and parse it like any other XML/HTML to extract <loc> tags.
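The sitemap step can be done with the standard library alone. A minimal sketch, assuming a standard `<urlset>` sitemap in the sitemaps.org namespace; in a real scraper the bytes would come from something like `requests.get(sitemap_url, timeout=5).content`, while here an inline sample keeps the example offline. The function name `sitemap_urls` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol; <loc> tags live inside it.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_bytes: bytes) -> list:
    """Return the text of every <url><loc> element in a <urlset> sitemap."""
    root = ET.fromstring(xml_bytes)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]


sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc><lastmod>2024-01-01</lastmod></url>
</urlset>"""

print(sitemap_urls(sample))
```

Sitemap index files use `<sitemapindex>` and `<sitemap>` instead of `<urlset>` and `<url>`, so for those the path would become `sm:sitemap/sm:loc`, and each returned URL is itself a sitemap to fetch.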


VIII. Advanced Frameworks (Scrapy)

• Generate a spider skeleton from the command line.
scrapy genspider example example.com

• Define a parse method to process the response.
# In your spider class:
def parse(self, response):
    # parsing logic here
    pass

• Extract data using Scrapy's CSS selectors.
titles = response.css('h1::text').getall()

• Extract data using Scrapy's XPath selectors.
links = response.xpath('//a/@href').getall()

• Yield a dictionary of scraped data.
yield {'title': response.css('title::text').get()}

• Follow a link to parse the next page.
next_page = response.css('li.next a::attr(href)').get()
if next_page is not None:
    yield response.follow(next_page, callback=self.parse)

• Run a spider from the command line.
scrapy crawl example -o output.json

• Pass arguments to a spider.
scrapy crawl example -a category=books

• Create a Scrapy Item for structured data.
import scrapy
class ProductItem(scrapy.Item):
    name = scrapy.Field()
    price = scrapy.Field()

• Use an Item Loader to populate Items.
from scrapy.loader import ItemLoader
loader = ItemLoader(item=ProductItem(), response=response)
loader.add_css('name', 'h1.product-name::text')


#Python #WebScraping #BeautifulSoup #Selenium #Requests

━━━━━━━━━━━━━━━
By: @DataScienceN
🔥 Trending Repository: nocobase

📝 Description: NocoBase is the most extensible AI-powered no-code/low-code platform for building business applications and enterprise solutions.

🔗 Repository URL: https://github.com/nocobase/nocobase

🌐 Website: https://www.nocobase.com

📖 Readme: https://github.com/nocobase/nocobase#readme

📊 Statistics:
🌟 Stars: 17.7K stars
👀 Watchers: 147
🍴 Forks: 2K forks

💻 Programming Languages: TypeScript - JavaScript - Smarty - Shell - Dockerfile - Less

🏷️ Related Topics:
#internal_tools #crud #crm #admin_dashboard #self_hosted #web_application #project_management #salesforce #developer_tools #airtable #workflows #low_code #no_code #app_builder #internal_tool #nocode #low_code_development_platform #no_code_platform #low_code_platform #low_code_framework


==================================
🧠 By: https://news.1rj.ru/str/DataScienceM
🔥 Trending Repository: alertmanager

📝 Description: Prometheus Alertmanager

🔗 Repository URL: https://github.com/prometheus/alertmanager

🌐 Website: https://prometheus.io

📖 Readme: https://github.com/prometheus/alertmanager#readme

📊 Statistics:
🌟 Stars: 7.3K stars
👀 Watchers: 166
🍴 Forks: 2.3K forks

💻 Programming Languages: Go - Elm - HTML - Makefile - TypeScript - JavaScript

🏷️ Related Topics:
#notifications #slack #monitoring #email #pagerduty #alertmanager #hacktoberfest #deduplication #opsgenie


==================================
🧠 By: https://news.1rj.ru/str/DataScienceM
🔥 Trending Repository: gopeed

📝 Description: A modern download manager that supports all platforms. Built with Golang and Flutter.

🔗 Repository URL: https://github.com/GopeedLab/gopeed

🌐 Website: https://gopeed.com

📖 Readme: https://github.com/GopeedLab/gopeed#readme

📊 Statistics:
🌟 Stars: 21K stars
👀 Watchers: 167
🍴 Forks: 1.5K forks

💻 Programming Languages: Dart - Go - C++ - CMake - Swift - Ruby

🏷️ Related Topics:
#android #windows #macos #golang #http #ios #torrent #downloader #debian #bittorrent #cross_platform #ubuntu #https #flutter #magnet


==================================
🧠 By: https://news.1rj.ru/str/DataScienceM
🔥 Trending Repository: vertex-ai-creative-studio

📝 Description: GenMedia Creative Studio is a Vertex AI generative media user experience highlighting the use of Imagen, Veo, Gemini 🍌, Gemini TTS, Chirp 3, Lyria and other generative media APIs on Google Cloud.

🔗 Repository URL: https://github.com/GoogleCloudPlatform/vertex-ai-creative-studio

📖 Readme: https://github.com/GoogleCloudPlatform/vertex-ai-creative-studio#readme

📊 Statistics:
🌟 Stars: 512 stars
👀 Watchers: 19
🍴 Forks: 200 forks

💻 Programming Languages: Jupyter Notebook - Python - TypeScript - Go - JavaScript - Shell

🏷️ Related Topics:
#google_cloud #gemini #chirp #imagen #veo #lyria #vertex_ai #nano_banana


==================================
🧠 By: https://news.1rj.ru/str/DataScienceM