Data Science Jupyter Notebooks
11.7K subscribers
287 photos
43 videos
9 files
843 links
Explore the world of Data Science through Jupyter Notebooks—insights, tutorials, and tools to boost your data journey. Code, analyze, and visualize smarter with every post.
Forwarded from Kaggle Data Hub
Unlock premium learning without spending a dime! ⭐️ @DataScienceC is the first Telegram channel dishing out free Udemy coupons daily—grab courses on data science, coding, AI, and beyond. Join the revolution and boost your skills for free today! 📕

What topic are you itching to learn next? 😊
https://news.1rj.ru/str/DataScienceC 🌟
🔥 Trending Repository: pytorch

📝 Description: Tensors and Dynamic neural networks in Python with strong GPU acceleration

🔗 Repository URL: https://github.com/pytorch/pytorch

🌐 Website: https://pytorch.org

📖 Readme: https://github.com/pytorch/pytorch#readme

📊 Statistics:
🌟 Stars: 94.5K stars
👀 Watchers: 1.8K
🍴 Forks: 25.8K forks

💻 Programming Languages: Python - C++ - Cuda - C - Objective-C++ - CMake

🏷️ Related Topics:
#python #machine_learning #deep_learning #neural_network #gpu #numpy #autograd #tensor


==================================
🧠 By: https://news.1rj.ru/str/DataScienceM
🔥 Trending Repository: LocalAI

📝 Description: 🤖 The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more. Features: Generate Text, Audio, Video, Images, Voice Cloning, Distributed, P2P and decentralized inference

🔗 Repository URL: https://github.com/mudler/LocalAI

🌐 Website: https://localai.io

📖 Readme: https://github.com/mudler/LocalAI#readme

📊 Statistics:
🌟 Stars: 36.4K stars
👀 Watchers: 241
🍴 Forks: 2.9K forks

💻 Programming Languages: Go - HTML - Python - JavaScript - Shell - C++

🏷️ Related Topics:
#api #ai #mcp #decentralized #text_generation #distributed #tts #image_generation #llama #object_detection #mamba #libp2p #gemma #mistral #audio_generation #llm #stable_diffusion #rwkv #musicgen #rerank


==================================
🧠 By: https://news.1rj.ru/str/DataScienceM
🔥 Trending Repository: PageIndex

📝 Description: 📄🧠 PageIndex: Document Index for Reasoning-based RAG

🔗 Repository URL: https://github.com/VectifyAI/PageIndex

🌐 Website: https://pageindex.ai

📖 Readme: https://github.com/VectifyAI/PageIndex#readme

📊 Statistics:
🌟 Stars: 3.1K stars
👀 Watchers: 24
🍴 Forks: 243 forks

💻 Programming Languages: Python - Jupyter Notebook

🏷️ Related Topics:
#ai #retrieval #reasoning #rag #llm


==================================
🧠 By: https://news.1rj.ru/str/DataScienceM
🔥 Trending Repository: opentui

📝 Description: OpenTUI is a library for building terminal user interfaces (TUIs)

🔗 Repository URL: https://github.com/sst/opentui

🌐 Website: https://opentui.com

📖 Readme: https://github.com/sst/opentui#readme

📊 Statistics:
🌟 Stars: 3.3K stars
👀 Watchers: 19
🍴 Forks: 122 forks

💻 Programming Languages: TypeScript - Zig - Go - Tree-sitter Query - Shell - Vue

🏷️ Related Topics: Not available

==================================
🧠 By: https://news.1rj.ru/str/DataScienceM
🔥 Trending Repository: awesome-rl-for-cybersecurity

📝 Description: A curated list of resources dedicated to reinforcement learning applied to cyber security.

🔗 Repository URL: https://github.com/Kim-Hammar/awesome-rl-for-cybersecurity

📖 Readme: https://github.com/Kim-Hammar/awesome-rl-for-cybersecurity#readme

📊 Statistics:
🌟 Stars: 948 stars
👀 Watchers: 32
🍴 Forks: 137 forks

💻 Programming Languages: Not available

🏷️ Related Topics: Not available

==================================
🧠 By: https://news.1rj.ru/str/DataScienceN
🔥 Trending Repository: How-To-Secure-A-Linux-Server

📝 Description: An evolving how-to guide for securing a Linux server.

🔗 Repository URL: https://github.com/imthenachoman/How-To-Secure-A-Linux-Server

📖 Readme: https://github.com/imthenachoman/How-To-Secure-A-Linux-Server#readme

📊 Statistics:
🌟 Stars: 20.5K stars
👀 Watchers: 339
🍴 Forks: 1.3K forks

💻 Programming Languages: Not available

🏷️ Related Topics:
#linux #security #server #hardening #security_hardening #linux_server #cc_by_sa #hardening_steps


==================================
🧠 By: https://news.1rj.ru/str/DataScienceM
🔥 Trending Repository: edgevpn

📝 Description: The immutable, decentralized, statically built p2p VPN without any central server and automatic discovery! Create decentralized introspectable tunnels over p2p with shared tokens

🔗 Repository URL: https://github.com/mudler/edgevpn

🌐 Website: https://mudler.github.io/edgevpn

📖 Readme: https://github.com/mudler/edgevpn#readme

📊 Statistics:
🌟 Stars: 1.3K stars
👀 Watchers: 22
🍴 Forks: 149 forks

💻 Programming Languages: Go - HTML

🏷️ Related Topics:
#kubernetes #tunnel #golang #networking #mesh_networks #ipfs #nat #blockchain #p2p #vpn #mesh #golang_library #libp2p #cloudvpn #ipfs_blockchain #holepunch #p2pvpn


==================================
🧠 By: https://news.1rj.ru/str/DataScienceM
🔥 Trending Repository: cs-self-learning

📝 Description: A self-study guide for computer science

🔗 Repository URL: https://github.com/PKUFlyingPig/cs-self-learning

🌐 Website: https://csdiy.wiki

📖 Readme: https://github.com/PKUFlyingPig/cs-self-learning#readme

📊 Statistics:
🌟 Stars: 68.5K stars
👀 Watchers: 341
🍴 Forks: 7.7K forks

💻 Programming Languages: HTML

🏷️ Related Topics: Not available

==================================
🧠 By: https://news.1rj.ru/str/DataScienceM
💡 Top 70 Web Scraping Operations in Python

I. Making HTTP Requests (requests)

• Import the library.
import requests

• Make a GET request to a URL.
response = requests.get('http://example.com')

• Check the response status code (200 is OK).
print(response.status_code)

• Access the raw HTML content (as bytes).
html_bytes = response.content

• Access the HTML content (as a string).
html_text = response.text

• Access response headers.
print(response.headers)

• Send a custom User-Agent header.
headers = {'User-Agent': 'My Cool Scraper 1.0'}
response = requests.get('http://example.com', headers=headers)

• Pass URL parameters in a request.
params = {'q': 'python scraping'}
response = requests.get('https://www.google.com/search', params=params)

• Make a POST request with form data.
payload = {'key1': 'value1', 'key2': 'value2'}
response = requests.post('http://httpbin.org/post', data=payload)

• Handle potential request errors.
try:
    response = requests.get('http://example.com', timeout=5)
    response.raise_for_status()  # Raise an exception for bad status codes
except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")


II. Parsing HTML with BeautifulSoup (Setup & Navigation)

• Import the library.
from bs4 import BeautifulSoup

• Create a BeautifulSoup object from HTML text.
soup = BeautifulSoup(html_text, 'html.parser')

• Prettify the parsed HTML for readability.
print(soup.prettify())

• Access a tag directly by name (gets the first one).
title_tag = soup.title

• Navigate to a tag's parent.
title_parent = soup.title.parent

• Get an iterable of a tag's children.
for child in soup.head.children:
    print(child.name)

• Get the next sibling tag.
first_p = soup.find('p')
next_p = first_p.find_next_sibling('p')

• Get the previous sibling tag.
second_p = soup.find_all('p')[1]
prev_p = second_p.find_previous_sibling('p')


III. Finding Elements with BeautifulSoup

• Find the first occurrence of a tag.
first_link = soup.find('a')

• Find all occurrences of a tag.
all_links = soup.find_all('a')

• Find tags by their CSS class.
articles = soup.find_all('div', class_='article-content')

• Find a tag by its ID.
main_content = soup.find(id='main-container')

• Find tags by other attributes.
images = soup.find_all('img', attrs={'data-src': True})

• Find using a list of multiple tags.
headings = soup.find_all(['h1', 'h2', 'h3'])

• Find using a regular expression.
import re
links_with_blog = soup.find_all('a', href=re.compile(r'blog'))

• Find using a custom function.
# Finds tags with a 'class' but no 'id'
tags = soup.find_all(lambda tag: tag.has_attr('class') and not tag.has_attr('id'))

• Limit the number of results.
first_five_links = soup.find_all('a', limit=5)

• Use CSS Selectors to find one element.
footer = soup.select_one('#footer > p')

• Use CSS Selectors to find all matching elements.
article_links = soup.select('div.article a')

• Select direct children using CSS selector.
nav_items = soup.select('ul.nav > li')


IV. Extracting Data with BeautifulSoup

• Get the text content from a tag.
title_text = soup.title.get_text()

• Get stripped text content.
link_text = soup.find('a').get_text(strip=True)

• Get all text from the entire document.
all_text = soup.get_text()

• Get an attribute's value (like a URL).
link_url = soup.find('a')['href']

• Get the tag's name.
tag_name = soup.find('h1').name

• Get all attributes of a tag as a dictionary.
attrs_dict = soup.find('img').attrs


V. Parsing with lxml and XPath

• Import the library.
from lxml import html

• Parse HTML content with lxml.
tree = html.fromstring(response.content)

• Select elements using an XPath expression.
# Selects all <a> tags inside <div> tags with class 'nav'
links = tree.xpath('//div[@class="nav"]/a')

• Select text content directly with XPath.
# Gets the text of all <h1> tags
h1_texts = tree.xpath('//h1/text()')

• Select an attribute value with XPath.
# Gets all href attributes from <a> tags
hrefs = tree.xpath('//a/@href')


VI. Handling Dynamic Content (Selenium)

• Import the webdriver.
from selenium import webdriver

• Initialize a browser driver.
driver = webdriver.Chrome() # Requires chromedriver

• Navigate to a webpage.
driver.get('http://example.com')

• Find an element by its ID.
element = driver.find_element('id', 'my-element-id')

• Find elements by CSS Selector.
elements = driver.find_elements('css selector', 'div.item')

• Find an element by XPath.
button = driver.find_element('xpath', '//button[@type="submit"]')

• Click a button.
button.click()

• Enter text into an input field.
search_box = driver.find_element('name', 'q')
search_box.send_keys('Python Selenium')

• Wait for an element to become visible.
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

element = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.ID, "myDynamicElement"))
)

• Get the page source after JavaScript has executed.
dynamic_html = driver.page_source

• Close the browser window.
driver.quit()


VII. Common Tasks & Best Practices

• Handle pagination by finding the "Next" link.
next_page_url = soup.find('a', string='Next')['href']

• Save data to a CSV file.
import csv
with open('data.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['Title', 'Link'])
    # writer.writerow([title, url]) in a loop

• Save data to CSV using pandas.
import pandas as pd
df = pd.DataFrame(data, columns=['Title', 'Link'])
df.to_csv('data.csv', index=False)

• Use a proxy with requests.
proxies = {'http': 'http://10.10.1.10:3128', 'https': 'http://10.10.1.10:1080'}
requests.get('http://example.com', proxies=proxies)

• Pause between requests to be polite.
import time
time.sleep(2) # Pause for 2 seconds

• Handle JSON data from an API.
json_response = requests.get('https://api.example.com/data').json()

• Download a file (like an image).
img_url = 'http://example.com/image.jpg'
img_data = requests.get(img_url).content
with open('image.jpg', 'wb') as handler:
    handler.write(img_data)

• Parse a sitemap.xml to find all URLs.
# Get the sitemap.xml file and parse it like any other XML/HTML to extract <loc> tags.
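
A minimal sketch of the sitemap step using only the standard library. The sitemap string below is a stand-in for a file you would first fetch (e.g. with requests.get('https://example.com/sitemap.xml').text), so the example runs offline; the URLs are illustrative.

```python
import xml.etree.ElementTree as ET

# Stand-in for a downloaded sitemap.xml (hypothetical URLs).
sitemap_xml = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/page1</loc></url>
  <url><loc>https://example.com/page2</loc></url>
</urlset>"""

# Sitemaps live in the sitemaps.org namespace, so register it for findall().
root = ET.fromstring(sitemap_xml)
ns = {'sm': 'http://www.sitemaps.org/schemas/sitemap/0.9'}
urls = [loc.text for loc in root.findall('.//sm:loc', ns)]
print(urls)
```

BeautifulSoup with an XML parser works just as well; the standard library keeps the sketch dependency-free.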


VIII. Advanced Frameworks (Scrapy)

• Create a Scrapy spider (conceptual command).
scrapy genspider example example.com

• Define a parse method to process the response.
# In your spider class:
def parse(self, response):
    # parsing logic here
    pass

• Extract data using Scrapy's CSS selectors.
titles = response.css('h1::text').getall()

• Extract data using Scrapy's XPath selectors.
links = response.xpath('//a/@href').getall()

• Yield a dictionary of scraped data.
yield {'title': response.css('title::text').get()}

• Follow a link to parse the next page.
next_page = response.css('li.next a::attr(href)').get()
if next_page is not None:
    yield response.follow(next_page, callback=self.parse)

• Run a spider from the command line.
scrapy crawl example -o output.json

• Pass arguments to a spider.
scrapy crawl example -a category=books

• Create a Scrapy Item for structured data.
import scrapy
class ProductItem(scrapy.Item):
    name = scrapy.Field()
    price = scrapy.Field()

• Use an Item Loader to populate Items.
from scrapy.loader import ItemLoader
loader = ItemLoader(item=ProductItem(), response=response)
loader.add_css('name', 'h1.product-name::text')
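
The operations above compose into a minimal end-to-end pipeline: parse, extract, write to CSV. This is a sketch, not a definitive recipe: the HTML snippet stands in for a fetched page (response.text from a hypothetical requests.get call) so it runs offline, and the div.article selector is illustrative.

```python
import csv
import io

from bs4 import BeautifulSoup

# Stand-in for response.text from a hypothetical requests.get(...) call.
html_text = """
<div class="article"><a href="/post-1">First post</a></div>
<div class="article"><a href="/post-2">Second post</a></div>
"""

soup = BeautifulSoup(html_text, 'html.parser')

# (link text, href) pairs for every link inside an article div.
rows = [(a.get_text(strip=True), a['href']) for a in soup.select('div.article a')]

# An in-memory buffer keeps the example self-contained; swap in
# open('data.csv', 'w', newline='', encoding='utf-8') to write a real file.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(['Title', 'Link'])
writer.writerows(rows)
print(buf.getvalue())
```

The same shape scales to a real scrape: fetch, sleep politely, parse, append rows, follow the next-page link.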


#Python #WebScraping #BeautifulSoup #Selenium #Requests

━━━━━━━━━━━━━━━
By: @DataScienceN
🔥 Trending Repository: nocobase

📝 Description: NocoBase is the most extensible AI-powered no-code/low-code platform for building business applications and enterprise solutions.

🔗 Repository URL: https://github.com/nocobase/nocobase

🌐 Website: https://www.nocobase.com

📖 Readme: https://github.com/nocobase/nocobase#readme

📊 Statistics:
🌟 Stars: 17.7K stars
👀 Watchers: 147
🍴 Forks: 2K forks

💻 Programming Languages: TypeScript - JavaScript - Smarty - Shell - Dockerfile - Less

🏷️ Related Topics:
#internal_tools #crud #crm #admin_dashboard #self_hosted #web_application #project_management #salesforce #developer_tools #airtable #workflows #low_code #no_code #app_builder #internal_tool #nocode #low_code_development_platform #no_code_platform #low_code_platform #low_code_framework


==================================
🧠 By: https://news.1rj.ru/str/DataScienceM
🔥 Trending Repository: alertmanager

📝 Description: Prometheus Alertmanager

🔗 Repository URL: https://github.com/prometheus/alertmanager

🌐 Website: https://prometheus.io

📖 Readme: https://github.com/prometheus/alertmanager#readme

📊 Statistics:
🌟 Stars: 7.3K stars
👀 Watchers: 166
🍴 Forks: 2.3K forks

💻 Programming Languages: Go - Elm - HTML - Makefile - TypeScript - JavaScript

🏷️ Related Topics:
#notifications #slack #monitoring #email #pagerduty #alertmanager #hacktoberfest #deduplication #opsgenie


==================================
🧠 By: https://news.1rj.ru/str/DataScienceM
🔥 Trending Repository: gopeed

📝 Description: A modern download manager that supports all platforms. Built with Golang and Flutter.

🔗 Repository URL: https://github.com/GopeedLab/gopeed

🌐 Website: https://gopeed.com

📖 Readme: https://github.com/GopeedLab/gopeed#readme

📊 Statistics:
🌟 Stars: 21K stars
👀 Watchers: 167
🍴 Forks: 1.5K forks

💻 Programming Languages: Dart - Go - C++ - CMake - Swift - Ruby

🏷️ Related Topics:
#android #windows #macos #golang #http #ios #torrent #downloader #debian #bittorrent #cross_platform #ubuntu #https #flutter #magnet


==================================
🧠 By: https://news.1rj.ru/str/DataScienceM
🔥 Trending Repository: vertex-ai-creative-studio

📝 Description: GenMedia Creative Studio is a Vertex AI generative media user experience highlighting the use of Imagen, Veo, Gemini 🍌, Gemini TTS, Chirp 3, Lyria and other generative media APIs on Google Cloud.

🔗 Repository URL: https://github.com/GoogleCloudPlatform/vertex-ai-creative-studio

📖 Readme: https://github.com/GoogleCloudPlatform/vertex-ai-creative-studio#readme

📊 Statistics:
🌟 Stars: 512 stars
👀 Watchers: 19
🍴 Forks: 200 forks

💻 Programming Languages: Jupyter Notebook - Python - TypeScript - Go - JavaScript - Shell

🏷️ Related Topics:
#google_cloud #gemini #chirp #imagen #veo #lyria #vertex_ai #nano_banana


==================================
🧠 By: https://news.1rj.ru/str/DataScienceM
🔥 Trending Repository: Parabolic

📝 Description: Download web video and audio

🔗 Repository URL: https://github.com/NickvisionApps/Parabolic

🌐 Website: https://flathub.org/apps/details/org.nickvision.tubeconverter

📖 Readme: https://github.com/NickvisionApps/Parabolic#readme

📊 Statistics:
🌟 Stars: 4.1K stars
👀 Watchers: 28
🍴 Forks: 188 forks

💻 Programming Languages: C++ - CMake - Python - Inno Setup - C - CSS

🏷️ Related Topics:
#music #windows #downloader #youtube #qt #cpp #youtube_dl #gnome #videos #flathub #gtk4 #yt_dlp #libadwaita


==================================
🧠 By: https://news.1rj.ru/str/DataScienceM
🔥 Trending Repository: localstack

📝 Description: 💻 A fully functional local AWS cloud stack. Develop and test your cloud & Serverless apps offline

🔗 Repository URL: https://github.com/localstack/localstack

🌐 Website: https://localstack.cloud

📖 Readme: https://github.com/localstack/localstack#readme

📊 Statistics:
🌟 Stars: 61.1K stars
👀 Watchers: 514
🍴 Forks: 4.3K forks

💻 Programming Languages: Python - Shell - Makefile - ANTLR - JavaScript - Java

🏷️ Related Topics:
#python #testing #aws #cloud #continuous_integration #developer_tools #localstack


==================================
🧠 By: https://news.1rj.ru/str/DataScienceM
🔥 Trending Repository: go-sdk

📝 Description: The official Go SDK for Model Context Protocol servers and clients. Maintained in collaboration with Google.

🔗 Repository URL: https://github.com/modelcontextprotocol/go-sdk

📖 Readme: https://github.com/modelcontextprotocol/go-sdk#readme

📊 Statistics:
🌟 Stars: 2.7K stars
👀 Watchers: 39
🍴 Forks: 249 forks

💻 Programming Languages: Go

🏷️ Related Topics: Not available

==================================
🧠 By: https://news.1rj.ru/str/DataScienceM
🔥 Trending Repository: rachoon

📝 Description: 🦝 Rachoon — A self-hostable way to handle invoices

🔗 Repository URL: https://github.com/ad-on-is/rachoon

📖 Readme: https://github.com/ad-on-is/rachoon#readme

📊 Statistics:
🌟 Stars: 292 stars
👀 Watchers: 4
🍴 Forks: 14 forks

💻 Programming Languages: TypeScript - Vue - HTML - SCSS - Dockerfile - JavaScript - Shell

🏷️ Related Topics: Not available

==================================
🧠 By: https://news.1rj.ru/str/DataScienceM
🔥 Trending Repository: Kotatsu

📝 Description: Manga reader for Android

🔗 Repository URL: https://github.com/KotatsuApp/Kotatsu

🌐 Website: https://kotatsu.app

📖 Readme: https://github.com/KotatsuApp/Kotatsu#readme

📊 Statistics:
🌟 Stars: 7.2K stars
👀 Watchers: 72
🍴 Forks: 366 forks

💻 Programming Languages: Kotlin

🏷️ Related Topics:
#android #manga #comics #mangareader #manga_reader #webtoon


==================================
🧠 By: https://news.1rj.ru/str/DataScienceM