🔹 Title: The Era of Agentic Organization: Learning to Organize with Language Models

🔹 Publication Date: Published on Oct 30

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.26658
• PDF: https://arxiv.org/pdf/2510.26658

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

For more data science resources:
✓ https://news.1rj.ru/str/DataScienceT

ML Research Hub

🔹 Title: OmniLayout: Enabling Coarse-to-Fine Learning with LLMs for Universal Document Layout Generation

🔹 Publication Date: Published on Oct 30

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.26213
• PDF: https://arxiv.org/pdf/2510.26213

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Exploring Conditions for Diffusion models in Robotic Control

🔹 Publication Date: Published on Oct 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.15510
• PDF: https://arxiv.org/pdf/2510.15510
• Project Page: https://orca-rc.github.io/
• Github: https://orca-rc.github.io/

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: ChartAB: A Benchmark for Chart Grounding & Dense Alignment

🔹 Publication Date: Published on Oct 30

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.26781
• PDF: https://arxiv.org/pdf/2510.26781
• Project Page: https://huggingface.co/datasets/umd-zhou-lab/ChartAlignBench
• Github: https://github.com/tianyi-lab/ChartAlignBench

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: MIRO: MultI-Reward cOnditioned pretraining improves T2I quality and efficiency

🔹 Publication Date: Published on Oct 29

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.25897
• PDF: https://arxiv.org/pdf/2510.25897
• Project Page: https://nicolas-dufour.github.io/miro/
• Github: https://nicolas-dufour.github.io/miro/

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Surfer 2: The Next Generation of Cross-Platform Computer Use Agents

🔹 Publication Date: Published on Oct 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.19949
• PDF: https://arxiv.org/pdf/2510.19949

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: CLASS-IT: Conversational and Lecture-Aligned Small-Scale Instruction Tuning for BabyLMs

🔹 Publication Date: Published on Oct 29

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.25364
• PDF: https://arxiv.org/pdf/2510.25364

🔹 Datasets citing this paper:
• https://huggingface.co/datasets/colinglab/CLASS_IT

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: The End of Manual Decoding: Towards Truly End-to-End Language Models

🔹 Publication Date: Published on Oct 30

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.26697
• PDF: https://arxiv.org/pdf/2510.26697
• Github: https://github.com/Zacks917/AutoDeco

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: MedVLSynther: Synthesizing High-Quality Visual Question Answering from Medical Documents with Generator-Verifier LMMs

🔹 Publication Date: Published on Oct 29

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.25867
• PDF: https://arxiv.org/pdf/2510.25867
• Project Page: https://ucsc-vlaa.github.io/MedVLSynther/
• Github: https://ucsc-vlaa.github.io/MedVLSynther/

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: CityRiSE: Reasoning Urban Socio-Economic Status in Vision-Language Models via Reinforcement Learning

🔹 Publication Date: Published on Oct 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.22282
• PDF: https://arxiv.org/pdf/2510.22282
• Github: https://github.com/tsinghua-fib-lab/CityRiSE

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: PORTool: Tool-Use LLM Training with Rewarded Tree

🔹 Publication Date: Published on Oct 29

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.26020
• PDF: https://arxiv.org/pdf/2510.26020

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: L^2M^3OF: A Large Language Multimodal Model for Metal-Organic Frameworks

🔹 Publication Date: Published on Oct 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.20976
• PDF: https://arxiv.org/pdf/2510.20976

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Performance Trade-offs of Optimizing Small Language Models for E-Commerce

🔹 Publication Date: Published on Oct 24

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.21970
• PDF: https://arxiv.org/pdf/2510.21970

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: POWSM: A Phonetic Open Whisper-Style Speech Foundation Model

🔹 Publication Date: Published on Oct 28

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.24992
• PDF: https://arxiv.org/pdf/2510.24992

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

Top 100 Data Analyst Interview Questions & Answers

#DataAnalysis #InterviewQuestions #SQL #Python #Statistics #CaseStudy #DataScience

Part 1: SQL Questions (Q1-30)

#1. What is the difference between DELETE, TRUNCATE, and DROP?
A:
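• DELETE is a DML command that removes rows from a table, optionally filtered by a WHERE clause. It is slower because it logs each row deletion, and it can be rolled back.
• TRUNCATE is a DDL command that quickly removes all rows from a table. It is faster because it does not log individual rows. Whether it can be rolled back depends on the DBMS: MySQL and Oracle commit it implicitly, while PostgreSQL and SQL Server allow it inside a transaction. In SQL Server and MySQL it also resets the identity/auto-increment counter.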
• DROP is a DDL command that removes the entire table, including its structure, data, and indexes.
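
The rollback and DROP behavior can be tried out with a small sketch. It assumes a hypothetical employees table and uses Python's built-in sqlite3 module purely as a convenient engine; SQLite has no TRUNCATE statement, so only DELETE and DROP are shown.

```python
import sqlite3

# Hypothetical in-memory table; the names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO employees (name) VALUES (?)", [("Ann",), ("Bob",), ("Cy",)]
)
conn.commit()

# DELETE is DML: it runs inside a transaction and can be rolled back.
conn.execute("DELETE FROM employees WHERE name = 'Bob'")
rows_during_txn = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
conn.rollback()
rows_after_rollback = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]

# DROP is DDL: the table definition itself disappears.
conn.execute("DROP TABLE employees")
remaining_tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()

print(rows_during_txn, rows_after_rollback, remaining_tables)
```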

#2. Select all unique departments from the employees table.
A: Use the DISTINCT keyword.

SELECT DISTINCT department
FROM employees;

#3. Find the top 5 highest-paid employees.
A: Use ORDER BY with LIMIT (MySQL, PostgreSQL, SQLite). SQL Server uses SELECT TOP 5 instead.

SELECT name, salary
FROM employees
ORDER BY salary DESC
LIMIT 5;

#4. What is the difference between WHERE and HAVING?
A:
• WHERE is used to filter records before any groupings are made (i.e., it operates on individual rows).
• HAVING is used to filter groups after aggregations (GROUP BY) have been performed.

-- Find departments with more than 10 employees
SELECT department, COUNT(employee_id)
FROM employees
GROUP BY department
HAVING COUNT(employee_id) > 10;

#5. What are the different types of SQL joins?
A:
• (INNER) JOIN: Returns records that have matching values in both tables.
• LEFT (OUTER) JOIN: Returns all records from the left table, and the matched records from the right table.
• RIGHT (OUTER) JOIN: Returns all records from the right table, and the matched records from the left table.
• FULL (OUTER) JOIN: Returns all records from both tables, matching rows where possible and filling the unmatched side with NULLs.
• SELF JOIN: A regular join, but the table is joined with itself.
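
A minimal sketch of the difference between INNER and LEFT joins, assuming two hypothetical tables (employees, departments) and using Python's built-in sqlite3 module only to run the SQL. RIGHT and FULL joins are left out because older SQLite versions do not support them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department_id INTEGER);
INSERT INTO departments VALUES (1, 'Sales'), (2, 'Finance');
INSERT INTO employees VALUES (1, 'Ann', 1), (2, 'Bob', NULL);
""")

# INNER JOIN keeps only employees that have a matching department.
inner_rows = conn.execute("""
    SELECT e.name, d.name
    FROM employees e
    JOIN departments d ON e.department_id = d.id
    ORDER BY e.id
""").fetchall()

# LEFT JOIN keeps every employee; the department is NULL (None) without a match.
left_rows = conn.execute("""
    SELECT e.name, d.name
    FROM employees e
    LEFT JOIN departments d ON e.department_id = d.id
    ORDER BY e.id
""").fetchall()

print(inner_rows)
print(left_rows)
```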

#6. Write a query to find the second-highest salary.
A: Use OFFSET or a subquery.

-- Method 1: Using OFFSET (DISTINCT so a tie for first place is not returned)
SELECT DISTINCT salary
FROM employees
ORDER BY salary DESC
LIMIT 1 OFFSET 1;

-- Method 2: Using a Subquery
SELECT MAX(salary)
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);

#7. Find duplicate emails in a customers table.
A: Group by the email column and use HAVING to find groups with a count greater than 1.

SELECT email, COUNT(email)
FROM customers
GROUP BY email
HAVING COUNT(email) > 1;

#8. What is a primary key vs. a foreign key?
A:
• A Primary Key is a constraint that uniquely identifies each record in a table. It must contain unique values and cannot contain NULL values.
• A Foreign Key is a key used to link two tables together. It is a field (or collection of fields) in one table that refers to the Primary Key in another table.
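
Both constraints can be seen rejecting bad rows in a short sketch. The tables are hypothetical, and Python's built-in sqlite3 module is used only as a convenient engine; note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: off by default
conn.execute("CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        department_id INTEGER REFERENCES departments(id)
    )
""")
conn.execute("INSERT INTO departments VALUES (1, 'Sales')")
conn.execute("INSERT INTO employees VALUES (1, 'Ann', 1)")  # valid reference

errors = []
# First row repeats primary key 1; second row points at a missing department.
for bad_row in [(1, "Dup", 1), (2, "Bob", 99)]:
    try:
        conn.execute("INSERT INTO employees VALUES (?, ?, ?)", bad_row)
    except sqlite3.IntegrityError as exc:
        errors.append(str(exc))

print(errors)
```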

#9. Explain Window Functions. Give an example.
A: Window functions perform a calculation across a set of rows related to the current row, as defined by the OVER clause (PARTITION BY, ORDER BY). Unlike aggregate functions used with GROUP BY, they do not collapse rows: every input row stays in the output.

-- Rank employees by salary within each department
SELECT
    name,
    department,
    salary,
    RANK() OVER (PARTITION BY department ORDER BY salary DESC) as dept_rank
FROM employees;

#10. What is a CTE (Common Table Expression)?
A: A CTE is a temporary, named result set that you can reference within a SELECT, INSERT, UPDATE, or DELETE statement. It helps improve readability and break down complex queries.

WITH DepartmentSales AS (
    SELECT department, SUM(sale_amount) as total_sales
    FROM sales
    GROUP BY department
)
SELECT department, total_sales
FROM DepartmentSales
WHERE total_sales > 100000;

---
#11. Difference between UNION and UNION ALL?
A:
• UNION combines the result sets of two or more SELECT statements and removes duplicate rows.
• UNION ALL also combines result sets but includes all rows, including duplicates. It is faster because it doesn't check for duplicates.
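
A quick way to see the difference, sketched with two hypothetical customer tables and Python's built-in sqlite3 module as the engine:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers_2023 (email TEXT);
CREATE TABLE customers_2024 (email TEXT);
INSERT INTO customers_2023 VALUES ('a@example.com'), ('b@example.com');
INSERT INTO customers_2024 VALUES ('b@example.com'), ('c@example.com');
""")

# UNION removes the duplicate 'b@example.com'.
union_rows = conn.execute("""
    SELECT email FROM customers_2023
    UNION
    SELECT email FROM customers_2024
    ORDER BY email
""").fetchall()

# UNION ALL keeps every row from both inputs.
union_all_rows = conn.execute("""
    SELECT email FROM customers_2023
    UNION ALL
    SELECT email FROM customers_2024
    ORDER BY email
""").fetchall()

print(len(union_rows), len(union_all_rows))
```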

#12. How would you find the total number of employees in each department?
A: Use COUNT() with GROUP BY.

SELECT department, COUNT(employee_id) as number_of_employees
FROM employees
GROUP BY department;

#13. What is the difference between RANK() and DENSE_RANK()?
A:
• RANK() assigns a rank to each row within a partition. If there are ties, it skips the next rank(s). (e.g., 1, 2, 2, 4)
• DENSE_RANK() also assigns ranks, but it does not skip any ranks in case of ties. (e.g., 1, 2, 2, 3)
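
The two sequences above can be reproduced with a small sketch. It assumes a hypothetical employees table with one tie and runs the SQL through Python's built-in sqlite3 module, which needs an SQLite build with window-function support (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (name TEXT, salary INTEGER);
INSERT INTO employees VALUES ('Ann', 300), ('Bob', 200), ('Cy', 200), ('Dee', 100);
""")

rows = conn.execute("""
    SELECT
        name,
        salary,
        RANK() OVER (ORDER BY salary DESC) AS rnk,
        DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rnk
    FROM employees
    ORDER BY salary DESC, name
""").fetchall()

ranks = [row[2] for row in rows]        # RANK skips 3 after the tie
dense_ranks = [row[3] for row in rows]  # DENSE_RANK continues with 3
print(ranks, dense_ranks)
```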

#14. Write a query to get the Nth highest salary.
A: Use DENSE_RANK() in a CTE.

WITH SalaryRanks AS (
    SELECT
        salary,
        DENSE_RANK() OVER (ORDER BY salary DESC) as rnk
    FROM employees
)
SELECT DISTINCT salary -- DISTINCT returns one row even when salaries tie
FROM SalaryRanks
WHERE rnk = 5; -- For the 5th highest salary

#15. What is COALESCE() used for?
A: The COALESCE() function returns the first non-NULL value in a list of expressions. It's useful for providing default values for nulls.

SELECT name, COALESCE(commission, 0) as commission
FROM employees; -- Replaces NULL commissions with 0

---
#16. How would you select all employees whose name starts with 'A'?
A: Use the LIKE operator with a wildcard (%).

SELECT name
FROM employees
WHERE name LIKE 'A%';

#17. Get the current date and time.
A: The function depends on the SQL dialect.
• Standard SQL: CURRENT_TIMESTAMP
• PostgreSQL/MySQL: NOW()
• SQL Server: GETDATE()

SELECT NOW();

#18. How can you extract the month from a date?
A: Use the EXTRACT function or MONTH().

-- Standard SQL (typed DATE literal)
SELECT EXTRACT(MONTH FROM DATE '2023-10-27');
-- MySQL
SELECT MONTH('2023-10-27');

#19. What is a subquery? What are the types?
A: A subquery is a query nested inside another query.
• Scalar Subquery: Returns a single value (one row, one column).
• Multi-row Subquery: Returns multiple rows.
• Correlated Subquery: An inner query that depends on the outer query for its values. It is evaluated once for each row processed by the outer query.
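
One example of each type, as a sketch over a hypothetical employees table, run through Python's built-in sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER);
INSERT INTO employees VALUES
    ('Ann', 'Sales', 300), ('Bob', 'Sales', 100),
    ('Cy', 'IT', 200), ('Dee', 'IT', 400);
""")

# Scalar subquery: the inner query returns exactly one value.
top_earner = conn.execute("""
    SELECT name FROM employees
    WHERE salary = (SELECT MAX(salary) FROM employees)
""").fetchall()

# Multi-row subquery: the inner query returns a list used with IN.
in_high_paying_depts = conn.execute("""
    SELECT name FROM employees
    WHERE department IN (SELECT department FROM employees WHERE salary > 350)
    ORDER BY name
""").fetchall()

# Correlated subquery: the inner query refers to the outer row (e.department).
above_dept_average = conn.execute("""
    SELECT name FROM employees e
    WHERE salary > (SELECT AVG(salary) FROM employees WHERE department = e.department)
    ORDER BY name
""").fetchall()

print(top_earner, in_high_paying_depts, above_dept_average)
```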

#20. Write a query to find all employees who work in the 'Sales' department.
A: Use a JOIN or a subquery.

-- Using JOIN (preferred)
SELECT e.name
FROM employees e
JOIN departments d ON e.department_id = d.id
WHERE d.name = 'Sales';

---
#21. How would you calculate the month-over-month growth rate of sales?
A: Use the LAG() window function to get the previous month's sales and then apply the growth formula.

-- PostgreSQL syntax (DATE_TRUNC and the ::DATE cast)
WITH MonthlySales AS (
    SELECT
        DATE_TRUNC('month', order_date)::DATE as sales_month,
        SUM(sale_amount) as total_sales
    FROM sales
    GROUP BY 1
)
SELECT
    sales_month,
    total_sales,
    -- Multiply by 100.0 first to avoid integer division; NULLIF avoids division by zero
    (total_sales - LAG(total_sales, 1) OVER (ORDER BY sales_month)) * 100.0
        / NULLIF(LAG(total_sales, 1) OVER (ORDER BY sales_month), 0) as growth_rate
FROM MonthlySales;

#22. What is an index in a database? Why is it useful?
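A: An index is a separate data structure (commonly a B-tree) that the database engine uses to find rows without scanning the whole table. It works like the index in the back of a book. It speeds up SELECT queries that filter or sort on the indexed columns, but it takes extra storage and slows down data modification (INSERT, UPDATE, DELETE), because the index must be updated too.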
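
The effect on the query plan can be observed with a small sketch. The table and index names are hypothetical, and the plan text comes from SQLite's EXPLAIN QUERY PLAN (via Python's built-in sqlite3 module); other databases report plans differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department TEXT)"
)
query = "SELECT name FROM employees WHERE department = 'Sales'"


def plan(connection, sql):
    """Return SQLite's query-plan description for a statement."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[3] for row in rows)  # the fourth column holds the text


plan_before = plan(conn, query)  # full table scan without an index
conn.execute("CREATE INDEX idx_employees_department ON employees (department)")
plan_after = plan(conn, query)   # lookup through the new index

print(plan_before)
print(plan_after)
```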

#23. Difference between VARCHAR and CHAR?
A:
• CHAR is a fixed-length string data type. CHAR(10) will always store 10 characters, padding with spaces if necessary.
• VARCHAR is a variable-length string data type. VARCHAR(10) can store up to 10 characters, but only uses the storage needed for the actual string.

#24. What is a CASE statement?
A: A CASE expression evaluates its conditions in order and returns the value for the first condition that is true (like an if-then-else statement). If no condition matches and there is no ELSE, it returns NULL.

SELECT
    name,
    salary,
    CASE
        WHEN salary > 100000 THEN 'High Earner'
        WHEN salary > 50000 THEN 'Mid Earner'
        ELSE 'Low Earner'
    END as salary_category
FROM employees;

#25. Find the cumulative sum of sales over time.
A: Use a SUM() window function.

SELECT
    order_date,
    sale_amount,
    SUM(sale_amount) OVER (ORDER BY order_date) as cumulative_sales
FROM sales;

---
#26. What does GROUP_CONCAT (MySQL) or STRING_AGG (PostgreSQL) do?
A: These functions concatenate strings from a group into a single string with a specified separator.

-- PostgreSQL example
SELECT department, STRING_AGG(name, ', ') as employee_names
FROM employees
GROUP BY department;

#27. What is data normalization? Why is it important?
A: Data normalization is the process of organizing columns and tables in a relational database to minimize data redundancy. It is important because it reduces storage space, prevents the update, insert, and delete anomalies that lead to inconsistent data, and simplifies data management.

#28. Write a query to find users who made a purchase in January but not in February.
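A: Use EXCEPT, as in the query here; a LEFT JOIN with an IS NULL check or NOT EXISTS also works. Filtering on the month alone mixes years, so add a year condition if the table spans more than one year.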

SELECT user_id
FROM sales
WHERE EXTRACT(MONTH FROM order_date) = 1
EXCEPT
SELECT user_id
FROM sales
WHERE EXTRACT(MONTH FROM order_date) = 2;

#29. What is a self-join?
A: A self-join is a join in which a table is joined to itself. This is useful for querying hierarchical data or comparing rows within the same table.

-- Find pairs of employees who have the same manager
SELECT e1.name as employee1, e2.name as employee2, e1.manager_id
FROM employees e1
JOIN employees e2 ON e1.manager_id = e2.manager_id AND e1.id < e2.id; -- '<' lists each pair once

#30. What is the execution order of a SQL query?
A: The logical processing order is generally:
• FROM / JOIN
• WHERE
• GROUP BY
• HAVING
• SELECT
• DISTINCT
• ORDER BY
• LIMIT / OFFSET
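
The order explains why WHERE cannot use aggregates while HAVING can. A minimal sketch, assuming a hypothetical employees table and using Python's built-in sqlite3 module as the engine:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER);
INSERT INTO employees VALUES
    ('Ann', 'Sales', 300), ('Bob', 'Sales', 100),
    ('Cy', 'IT', 200), ('Dee', 'IT', 400), ('Eve', 'HR', 500);
""")

# WHERE filters rows first, GROUP BY groups what is left, HAVING filters groups.
rows = conn.execute("""
    SELECT department, COUNT(*) AS headcount
    FROM employees
    WHERE salary >= 200
    GROUP BY department
    HAVING COUNT(*) >= 2
    ORDER BY department
""").fetchall()

# An aggregate in WHERE is rejected: groups do not exist yet at that stage.
try:
    conn.execute(
        "SELECT department FROM employees WHERE COUNT(*) >= 2 GROUP BY department"
    )
    where_error = None
except sqlite3.OperationalError as exc:
    where_error = str(exc)

print(rows)
print(where_error)
```

Bob is dropped by WHERE before grouping, so Sales has only one remaining row and fails the HAVING condition.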

---
Part 2: Python (Pandas/NumPy) Questions (Q31-50)

#31. How do you select a column named 'age' from a pandas DataFrame df?
A: There are two common ways.

# Method 1 (preferred, handles column names with spaces)
age_column = df['age']

# Method 2 (dot notation; only works when the name is a valid identifier and does not clash with a DataFrame attribute)
age_column = df.age

#32. How do you filter a DataFrame df to get rows where 'age' is greater than 30?
A: Use boolean indexing.

filtered_df = df[df['age'] > 30]

#33. What's the difference between .loc and .iloc?
A:
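• .loc is a label-based indexer. You use row and column labels (or boolean masks) to select data, and label slices include the end label.
• .iloc is an integer-position-based indexer. You use zero-based integer positions (like in Python lists) to select data, and slices exclude the end position.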
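
A short sketch of both indexers on a hypothetical DataFrame with string row labels, assuming pandas is installed:

```python
import pandas as pd

# Illustrative data; the index labels 'a', 'b', 'c' are arbitrary.
df = pd.DataFrame(
    {"name": ["Ann", "Bob", "Cy"], "age": [25, 35, 45]},
    index=["a", "b", "c"],
)

by_label = df.loc["b", "age"]   # row label 'b', column label 'age'
by_position = df.iloc[1, 1]     # second row, second column

label_slice = df.loc["a":"b", "age"]  # label slice: end label 'b' is included
position_slice = df.iloc[0:1, 1]      # position slice: end position 1 is excluded

print(by_label, by_position, len(label_slice), len(position_slice))
```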
