NEW BOT Телеграм, страница

Data science/ML/AI

An API (Application Programming Interface) allows different software systems to communicate with each other. In data science and software development, APIs are commonly used to retrieve data from web services such as social media platforms, financial systems, weather services, and databases hosted online.

Python provides powerful libraries that make it easy to import and process data from APIs efficiently.

Making API Requests in Python
HTTP Methods
GET – retrieve data
POST – send data
PUT – update data
DELETE – remove data

Next up ➡️ Importing API Data into a Pandas DataFrame

👉Join @datascience_bds for more
Part of the @bigdataspecialist family ❤️

❤5👍3

1.27K viewsedited 07:45

Data science/ML/AI

Importing API Data into a Pandas DataFrame

import requests            # Library for making HTTP requests
import pandas as pd        # Library for data manipulation and analysis

# API endpoint
url = "https://api.example.com/users"

# Send request to API
response = requests.get(url)

# Convert JSON response to Python object
data = response.json()

# Convert the JSON data into a pandas DataFrame
df = pd.DataFrame(data)

# Display the first five rows of the DataFrame
print(df.head())

Next up ➡️ API Key Authentication

❤6

1.27K views07:45

Data science/ML/AI

Hey everyone 👋 Tomorrow we are kicking off a new short & free series called: 📊 Data Importing Series 📊 We’ll go through all the real ways to pull data into Python: → CSV, Excel, JSON and more → Databases & SQL databases → APIs, Google Sheets, even PDFs…

API Key Authentication

import requests

# API endpoint
url = "https://api.example.com/data"

# Parameters including the API key for authentication
params = {
    "api_key": "YOUR_API_KEY"  # Replace with your actual API key
}

# Send GET request with parameters
response = requests.get(url, params=params)

# Convert JSON response to Python object
data = response.json()

# Print the data
print(data)

Next up ➡️ Importing Pickle files in python

❤6

1.3K views08:10

Data science/ML/AI

Pickle Files
Pickle files (.pkl) are used to store serialized Python objects such as DataFrames, lists, dictionaries, or trained models. They allow quick saving and loading of Python objects without converting them to text formats.

Importing Pickle files in python

import pickle  # Library for object serialization

# Open the pickle file in read-binary mode
with open("data.pkl", "rb") as file:
    data = pickle.load(file)  # Load the stored Python object

Using Pickle with Pandas

import pandas as pd

# Load a pickled pandas DataFrame
df = pd.read_pickle("data.pkl")

Next up ➡️ Importing HTML Tables

❤4

1.28K views07:45

Data science/ML/AI

HTML Tables
HTML tables are commonly found on websites and can be imported into Python for analysis by extracting table data directly from web pages. This is useful for collecting publicly available data without manually copying it.

Importing HTML Tables Using Pandas

import pandas as pd

# URL of the webpage containing HTML tables
url = "https://example.com/page"

# Read all tables from the webpage
tables = pd.read_html(url)

# Select the first table
df = tables[0]

Next up ➡️ Big Data Formats

❤4

1.19K views07:45

Data science/ML/AI

Big Data Formats
Big Data formats such as Parquet, ORC, and Feather are designed for efficient storage and fast access when working with large datasets. They are optimized for performance, compression, and scalability, making them ideal for data science and big data applications.

Parquet
Parquet is a columnar storage format widely used in big data ecosystems such as Apache Spark and Hadoop. It allows efficient reading of selected columns and supports strong compression.

import pandas as pd

# Read Parquet file into a DataFrame
df = pd.read_parquet("data.parquet")

ORC (Optimized Row Columnar)

ORC is a columnar format optimized for high-performance analytics and commonly used in Hadoop-based systems.

import pandas as pd

# Read ORC file into a DataFrame
df = pd.read_orc("data.orc")

Feather
Feather is a lightweight binary format designed for fast data exchange between Python and other languages like R.

import pandas as pd

# Read Feather file into a DataFrame
df = pd.read_feather("data.feather")

✅ This concludes our Data Importing Series.

👉Join @datascience_bds for more
Part of the @bigdataspecialist family ❤️

❤2⚡1👏1

1.09K views07:45

Data science/ML/AI

Sometimes reality outpaces expectations in the most unexpected ways.
While global AI development seems increasingly fragmented, Sber just released Europe's largest open-source AI collection—full weights, code, and commercial rights included.
✅ No API paywalls.
✅ No usage restrictions.
✅ Just four complete model families ready to run in your private infrastructure, fine-tuned on your data, serving your specific needs.

What makes this release remarkable isn't merely the technical prowess, but the quiet confidence behind sharing it openly when others are building walls. Find out more in the article from the developers.

GigaChat Ultra Preview: 702B-parameter MoE model (36B active per token) with 128K context window. Trained from scratch, it outperforms DeepSeek V3.1 on specialized benchmarks while maintaining faster inference than previous flagships. Enterprise-ready with offline fine-tuning for secure environments.
GitHub | HuggingFace

GigaChat Lightning offers the opposite balance: compact yet powerful MoE architecture running on your laptop. It competes with Qwen3-4B in quality, matches the speed of Qwen3-1.7B, yet is significantly smarter and larger in parameter count.
Lightning holds its own against the best open-source models in its class, outperforms comparable models on different tasks, and delivers ultra-fast inference—making it ideal for scenarios where Ultra would be overkill and speed is critical. Plus, it features stable expert routing and a welcome bonus: 256K context support.
GitHub | Hugging Face

Kandinsky 5.0 brings a significant step forward in open generative models. The flagship Video Pro matches Veo 3 in visual quality and outperforms Wan 2.2-A14B, while Video Lite and Image Lite offer fast, lightweight alternatives for real-time use cases. The suite is powered by K-VAE 1.0, a high-efficiency open-source visual encoder that enables strong compression and serves as a solid base for training generative models. This stack balances performance, scalability, and practicality—whether you're building video pipelines or experimenting with multimodal generation.
GitHub | Hugging Face | Technical report

Audio gets its upgrade too: GigaAM-v3 delivers speech recognition model with 50% lower WER than Whisper-large-v3, trained on 700k hours of audio with punctuation/normalization for spontaneous speech.
GitHub | HuggingFace

Every model can be deployed on-premises, fine-tuned on your data, and used commercially. It's not just about catching up – it's about building sovereign AI infrastructure that belongs to everyone who needs it.

❤2🆒1

1.27K views10:40

Data science/ML/AI

📚 Data Science Riddle - Dimensionality Reduction

You want to visualize high-dimensional clusters while keeping neighborhood structure intact. What should you use?

Anonymous Quiz

❤2

96 voters1.34K views05:30

Data science/ML/AI

Hey Everyone 👋

Should we continue another series on "Data Manipulation with Pandas" just like the previous series?

Anonymous Poll

98%

Yes

❤4

80 voters1.2K views08:47

Data science/ML/AI

Data Cleaning in Python
Data cleaning is the process of detecting and correcting inaccurate, incomplete, or inconsistent data to improve data quality for analysis and modeling. It is a crucial step in any data science workflow.

Handling Missing Values

df.isnull().sum()        # Check missing values
df.dropna()              # Remove rows with missing values
df.fillna(0)             # Replace missing values

Removing Duplicate Data

df.duplicated()          # Identify duplicates
df.drop_duplicates()     # Remove duplicates

Correcting Data Types

df.dtypes                                                      #identify data types
df["age"] = df["age"].astype(int)               #convert age column to integer data type
df["date"] = pd.to_datetime(df["date"])    #convert date column to date data type

Renaming Columns

df.columns = df.columns.str.lower().str.replace(" ", "_")

Handling Inconsistent Data

df["gender"] = df["gender"].str.lower()   #convert to lower case
df["name"] = df["name"].str.strip()

Clean data leads to more accurate analysis and reliable models. Python’s pandas library simplifies cleaning tasks such as handling missing values, duplicates, incorrect types, and inconsistencies.

❤10👎1

1.33K views08:25

Data science/ML/AI

📚 Data Science Riddle - Model Selection

Two models have similar accuracy, but one is far simpler. Which should you choose ?

Anonymous Quiz

117 voters1.19K views06:01

Data science/ML/AI

The Real Reason PCA Works: Variance as Signal

Students memorize PCA as “dimensionality reduction.”
But the deeper insight is: PCA assumes variance = information.

If a direction in the data has high variance, PCA considers it meaningful.
If variance is small, PCA considers it noise.

This is not always true in real systems.

PCA fails when:
➖important signals have low variance
➖noise has high variance
➖relationships are nonlinear

That’s why modern methods (autoencoders, UMAP, t-SNE) outperform PCA on many datasets.

❤4

1.47K views09:00

Data science/ML/AI

📚 Data Science Riddle - Probability

A classifier outputs 0.9 probability for class A, but the real frequency is only 0.7. What is the model lacking?

Anonymous Quiz

113 voters1.37K views06:31

Data science/ML/AI

Why Feature Drift Is Harder Than Data Drift

Data drift = inputs change
Feature drift = the logic that generates the feature changes

Example:
Your “active user” feature used to be “clicked in last 7 days.”
Marketing redefines it to “clicked in last 3 days.”
Your model silently dies because the underlying concept changed.

Feature drift is more dangerous:
it happens inside your system, not in external data.

Production ML must version:
▪️feature definitions
▪️transformation logic
▪️data contracts

Otherwise the same model receives different features week to week.

❤3👏2

1.39K views10:25

Data science/ML/AI

📚 Data Science Riddle - Feature Engineering

A model's performance drops because some features have extreme outliers. What helps most?

Anonymous Quiz

Increasing k-fold splits

❤4

100 voters1.23K views05:45

Data science/ML/AI

🧵 Thread Series on:

Mastering Pandas for Data Manipulation!

Pandas is the go-to library for handling tabular data in Python. Whether you're analyzing sales, surveys, or logs, start every project the same way:

import pandas as pd

# Load CSV
df = pd.read_csv('sales_data.csv')

# Quick look
df.head()     # First 5 rows
df.info()     # Structure & data types
df.describe() # Basic stats

Next up 👉 Selecting Columns & Rows

❤4🔥3

1.25K views08:04

Data science/ML/AI

Selecting Columns & Rows

Need specific columns or rows? Pandas makes selection intuitive and fast:

# Single column (Series)
df['name']

# Multiple columns (DataFrame)
df[['name', 'age', 'sales']]

# Row selection with .loc (label-based)
df.loc[0:5]                    # Rows 0 to 5
df.loc[df['sales'] > 1000]     # Conditional

# .iloc (position-based)
df.iloc[0:5, 1:4]              # Rows 0-4, columns 1-3

Next up 👉 Filtering and Querying

❤7

1.24K views07:50

Data science/ML/AI

Filtering and Querying

Want to zoom in on specific data?

Filtering in Pandas is incredibly powerful. Check the code below:

# Multiple conditions
high_sales = df[(df['sales'] > 1000) & (df['region'] == 'West')]

# Using .query() – cleaner syntax!
high_performers = df.query("sales > 1000 and region == 'West'")

# Find missing values
df[df['email'].isna()]

# Contains substring
df[df['product'].str.contains('Pro', case=False)]

Next up 👉 Adding and Removing Columns

❤5

1.28K views07:32

Data science/ML/AI

Adding and Removing Columns

DataFrames are flexible! Easily create new columns or remove unnecessary ones:

# Add new column
df['revenue'] = df['sales'] * df['price']

# From existing columns
df['full_name'] = df['first_name'] + ' ' + df['last_name']

# Drop columns
df.drop(columns=['temp_col'], inplace=True)

# Or create a new DF without modifying original
clean_df = df.drop(columns=['old_col1', 'old_col2'])

Next up 👉 Dealing with Missing Values

❤7

1.27K views08:50

Data science/ML/AI

Dealing with Missing Values

Real-world data is messy, missing values are common.

Here's how to handle them cleanly:

# Check for nulls
df.isnull().sum()

# Drop rows with any missing values
df_clean = df.dropna()

# Fill missing values
df['age'].fillna(df['age'].median(), inplace=True)
df['category'].fillna('Unknown', inplace=True)

# Forward or backward fill (great for time series)
df['value'].ffill()

Next up 👉 Using GroupBy

❤8

1.23K views09:04

Data science/ML/AI

Using GroupBy

GroupBy is where Pandas shines brightest. It summarizes data by categories in one line.

# Total sales by region
df.groupby('region')['sales'].sum()

# Multiple aggregations
df.groupby('region').agg({
    'sales': 'sum',
    'customer_id': 'nunique',
    'order_date': 'max'
})

# Group by multiple columns
df.groupby(['region', 'product'])['sales'].mean()

Next up 👉 Sorting and Ranking

❤3

1.33K views07:55

About

Blog

Apps

Platform