ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho

🔹 Title: The Era of Agentic Organization: Learning to Organize with Language Models

🔹 Publication Date: Published on Oct 30

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.26658
• PDF: https://arxiv.org/pdf/2510.26658

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

For more data science resources:
https://news.1rj.ru/str/DataScienceT
🔹 Title: OmniLayout: Enabling Coarse-to-Fine Learning with LLMs for Universal Document Layout Generation

🔹 Publication Date: Published on Oct 30

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.26213
• PDF: https://arxiv.org/pdf/2510.26213

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Exploring Conditions for Diffusion models in Robotic Control

🔹 Publication Date: Published on Oct 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.15510
• PDF: https://arxiv.org/pdf/2510.15510
• Project Page: https://orca-rc.github.io/
• Github: https://orca-rc.github.io/

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: ChartAB: A Benchmark for Chart Grounding & Dense Alignment

🔹 Publication Date: Published on Oct 30

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.26781
• PDF: https://arxiv.org/pdf/2510.26781
• Project Page: https://huggingface.co/datasets/umd-zhou-lab/ChartAlignBench
• Github: https://github.com/tianyi-lab/ChartAlignBench

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: MIRO: MultI-Reward cOnditioned pretraining improves T2I quality and efficiency

🔹 Publication Date: Published on Oct 29

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.25897
• PDF: https://arxiv.org/pdf/2510.25897
• Project Page: https://nicolas-dufour.github.io/miro/
• Github: https://nicolas-dufour.github.io/miro/

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Surfer 2: The Next Generation of Cross-Platform Computer Use Agents

🔹 Publication Date: Published on Oct 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.19949
• PDF: https://arxiv.org/pdf/2510.19949

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: CLASS-IT: Conversational and Lecture-Aligned Small-Scale Instruction Tuning for BabyLMs

🔹 Publication Date: Published on Oct 29

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.25364
• PDF: https://arxiv.org/pdf/2510.25364

🔹 Datasets citing this paper:
https://huggingface.co/datasets/colinglab/CLASS_IT

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: The End of Manual Decoding: Towards Truly End-to-End Language Models

🔹 Publication Date: Published on Oct 30

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.26697
• PDF: https://arxiv.org/pdf/2510.26697
• Github: https://github.com/Zacks917/AutoDeco

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: MedVLSynther: Synthesizing High-Quality Visual Question Answering from Medical Documents with Generator-Verifier LMMs

🔹 Publication Date: Published on Oct 29

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.25867
• PDF: https://arxiv.org/pdf/2510.25867
• Project Page: https://ucsc-vlaa.github.io/MedVLSynther/
• Github: https://ucsc-vlaa.github.io/MedVLSynther/

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: CityRiSE: Reasoning Urban Socio-Economic Status in Vision-Language Models via Reinforcement Learning

🔹 Publication Date: Published on Oct 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.22282
• PDF: https://arxiv.org/pdf/2510.22282
• Github: https://github.com/tsinghua-fib-lab/CityRiSE

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: PORTool: Tool-Use LLM Training with Rewarded Tree

🔹 Publication Date: Published on Oct 29

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.26020
• PDF: https://arxiv.org/pdf/2510.26020

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: L^2M^3OF: A Large Language Multimodal Model for Metal-Organic Frameworks

🔹 Publication Date: Published on Oct 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.20976
• PDF: https://arxiv.org/pdf/2510.20976

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Performance Trade-offs of Optimizing Small Language Models for E-Commerce

🔹 Publication Date: Published on Oct 24

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.21970
• PDF: https://arxiv.org/pdf/2510.21970

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: POWSM: A Phonetic Open Whisper-Style Speech Foundation Model

🔹 Publication Date: Published on Oct 28

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.24992
• PDF: https://arxiv.org/pdf/2510.24992

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

Top 100 Data Analyst Interview Questions & Answers

#DataAnalysis #InterviewQuestions #SQL #Python #Statistics #CaseStudy #DataScience

Part 1: SQL Questions (Q1-30)

#1. What is the difference between DELETE, TRUNCATE, and DROP?
A:
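DELETE is a DML command that removes rows from a table, optionally filtered by a WHERE clause. It is slower because it logs each row deletion, and it can be rolled back inside a transaction.
TRUNCATE is a DDL command that quickly removes all rows from a table without logging individual rows. Whether it can be rolled back depends on the database: PostgreSQL and SQL Server allow it inside a transaction, while MySQL and Oracle commit it implicitly. It resets identity/auto-increment counters in SQL Server and MySQL; PostgreSQL does so only with RESTART IDENTITY.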
DROP is a DDL command that removes the entire table, including its structure, data, and indexes.
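
The difference between DELETE and DROP can be checked with a small script. This is a minimal sketch using Python's built-in sqlite3 module; the table and rows are made up for illustration, and SQLite has no TRUNCATE statement, so only DELETE and DROP are shown.

```python
import sqlite3

# In-memory database; the table and its rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO employees (name) VALUES (?)", [("Ann",), ("Bob",), ("Cy",)]
)
conn.commit()


def row_count():
    return conn.execute("SELECT COUNT(*) FROM employees").fetchall()[0][0]


# DELETE is DML: it removes only the matching rows and can be undone before commit.
conn.execute("DELETE FROM employees WHERE name = 'Bob'")
rows_during = row_count()
conn.rollback()
rows_after_rollback = row_count()

# DROP is DDL: the table definition itself disappears.
conn.execute("DROP TABLE employees")
conn.commit()
tables_left = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table' AND name = 'employees'"
).fetchall()[0][0]

print(rows_during, rows_after_rollback, tables_left)  # 2 3 0
```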

#2. Select all unique departments from the employees table.
A: Use the DISTINCT keyword.

SELECT DISTINCT department
FROM employees;


#3. Find the top 5 highest-paid employees.
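A: Use ORDER BY and LIMIT (MySQL, PostgreSQL, SQLite). SQL Server uses SELECT TOP 5 instead, and standard SQL uses FETCH FIRST 5 ROWS ONLY.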

SELECT name, salary
FROM employees
ORDER BY salary DESC
LIMIT 5;


#4. What is the difference between WHERE and HAVING?
A:
WHERE is used to filter records before any groupings are made (i.e., it operates on individual rows).
HAVING is used to filter groups after aggregations (GROUP BY) have been performed.

-- Find departments with more than 10 employees
SELECT department, COUNT(employee_id)
FROM employees
GROUP BY department
HAVING COUNT(employee_id) > 10;


#5. What are the different types of SQL joins?
A:
(INNER) JOIN: Returns records that have matching values in both tables.
LEFT (OUTER) JOIN: Returns all records from the left table, and the matched records from the right table.
RIGHT (OUTER) JOIN: Returns all records from the right table, and the matched records from the left table.
FULL (OUTER) JOIN: Returns all records from both tables, matching rows where possible and filling the unmatched side with NULLs.
SELF JOIN: A regular join, but the table is joined with itself.
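
To see how the join types differ on actual rows, here is a small sketch with Python's sqlite3 module. The two tables and their contents are assumptions made up for the example. Only INNER and LEFT JOIN are shown, because RIGHT and FULL joins need a recent SQLite version (3.39 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department_id INTEGER);
    INSERT INTO departments VALUES (1, 'Sales'), (2, 'Legal');
    INSERT INTO employees VALUES (1, 'Ann', 1), (2, 'Bob', NULL);
""")

# INNER JOIN keeps only employees that have a matching department.
inner_rows = conn.execute("""
    SELECT e.name, d.name
    FROM employees e
    JOIN departments d ON e.department_id = d.id
    ORDER BY e.id
""").fetchall()

# LEFT JOIN keeps every employee; the department is NULL (None) when unmatched.
left_rows = conn.execute("""
    SELECT e.name, d.name
    FROM employees e
    LEFT JOIN departments d ON e.department_id = d.id
    ORDER BY e.id
""").fetchall()

print(inner_rows)  # [('Ann', 'Sales')]
print(left_rows)   # [('Ann', 'Sales'), ('Bob', None)]
```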

#6. Write a query to find the second-highest salary.
A: Use OFFSET or a subquery. DISTINCT keeps Method 1 correct when several employees share the top salary.

-- Method 1: Using OFFSET
SELECT DISTINCT salary
FROM employees
ORDER BY salary DESC
LIMIT 1 OFFSET 1;

-- Method 2: Using a Subquery
SELECT MAX(salary)
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);
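
Ties at the top are the usual trap in this question. The sketch below (Python's sqlite3, with invented salaries) compares an OFFSET query without DISTINCT, the same query with DISTINCT, and the subquery method when two employees share the highest salary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Ann", 900), ("Bob", 900), ("Cy", 700), ("Di", 500)],
)

# Without DISTINCT the second row is just the other 900.
plain_offset = conn.execute(
    "SELECT salary FROM employees ORDER BY salary DESC LIMIT 1 OFFSET 1"
).fetchall()[0][0]

# DISTINCT collapses the tie first, so the second row is the second-highest value.
distinct_offset = conn.execute(
    "SELECT DISTINCT salary FROM employees ORDER BY salary DESC LIMIT 1 OFFSET 1"
).fetchall()[0][0]

# The subquery method is tie-safe by construction.
subquery = conn.execute(
    "SELECT MAX(salary) FROM employees "
    "WHERE salary < (SELECT MAX(salary) FROM employees)"
).fetchall()[0][0]

print(plain_offset, distinct_offset, subquery)  # 900 700 700
```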


#7. Find duplicate emails in a customers table.
A: Group by the email column and use HAVING to find groups with a count greater than 1.

SELECT email, COUNT(email)
FROM customers
GROUP BY email
HAVING COUNT(email) > 1;


#8. What is a primary key vs. a foreign key?
A:
• A Primary Key is a constraint that uniquely identifies each record in a table. It must contain unique values and cannot contain NULL values.
• A Foreign Key is a field (or collection of fields) in one table that refers to the Primary Key (or another unique key) of another table. It links the two tables and ensures that every referenced row exists.
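
Both constraints can be watched in action with a short sketch in Python's sqlite3. The schema is an assumption for illustration, and the try_insert helper is hypothetical. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
    CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        department_id INTEGER REFERENCES departments(id)
    );
    INSERT INTO departments VALUES (1, 'Sales');
    INSERT INTO employees VALUES (1, 'Ann', 1);
""")


def try_insert(sql):
    """Hypothetical helper: report whether the database accepted the row."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


duplicate_pk = try_insert("INSERT INTO employees VALUES (1, 'Bob', 1)")   # id 1 already exists
missing_parent = try_insert("INSERT INTO employees VALUES (2, 'Cy', 99)")  # no department 99
valid_row = try_insert("INSERT INTO employees VALUES (3, 'Di', 1)")

print(duplicate_pk, missing_parent, valid_row)  # rejected rejected ok
```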

#9. Explain Window Functions. Give an example.
A: Window functions perform a calculation across a set of table rows that are somehow related to the current row. Unlike aggregate functions, they do not collapse rows.

-- Rank employees by salary within each department
SELECT
    name,
    department,
    salary,
    RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank
FROM employees;


#10. What is a CTE (Common Table Expression)?
A: A CTE is a temporary, named result set that you can reference within a SELECT, INSERT, UPDATE, or DELETE statement. It helps improve readability and break down complex queries.

WITH DepartmentSales AS (
    SELECT department, SUM(sale_amount) AS total_sales
    FROM sales
    GROUP BY department
)
SELECT department, total_sales
FROM DepartmentSales
WHERE total_sales > 100000;

---
#11. Difference between UNION and UNION ALL?
A:
UNION combines the result sets of two or more SELECT statements and removes duplicate rows.
UNION ALL also combines result sets but includes all rows, including duplicates. It is faster because it doesn't check for duplicates.
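
A quick way to confirm the difference is to count rows. This sketch uses Python's sqlite3 with two invented customer tables that share one email address.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE online_customers (email TEXT);
    CREATE TABLE store_customers (email TEXT);
    INSERT INTO online_customers VALUES ('a@example.com'), ('b@example.com');
    INSERT INTO store_customers VALUES ('b@example.com'), ('c@example.com');
""")

# UNION removes the duplicate 'b@example.com'.
union_rows = conn.execute("""
    SELECT email FROM online_customers
    UNION
    SELECT email FROM store_customers
    ORDER BY email
""").fetchall()

# UNION ALL keeps every row from both inputs.
union_all_rows = conn.execute("""
    SELECT email FROM online_customers
    UNION ALL
    SELECT email FROM store_customers
    ORDER BY email
""").fetchall()

print(len(union_rows), len(union_all_rows))  # 3 4
```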

#12. How would you find the total number of employees in each department?
A: Use COUNT() with GROUP BY.

SELECT department, COUNT(employee_id) as number_of_employees
FROM employees
GROUP BY department;


#13. What is the difference between RANK() and DENSE_RANK()?
A:
RANK() assigns a rank to each row within a partition. If there are ties, it skips the next rank(s). (e.g., 1, 2, 2, 4)
DENSE_RANK() also assigns ranks, but it does not skip any ranks in case of ties. (e.g., 1, 2, 2, 3)
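
The two rank sequences above can be reproduced with a small sketch in Python's sqlite3. The salaries are invented so that two employees tie for second place. Window functions need SQLite 3.25 or later, which is an assumption about the local Python build.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Ann", 900), ("Bob", 700), ("Cy", 700), ("Di", 500)],
)

rows = conn.execute("""
    SELECT name,
           salary,
           RANK()       OVER (ORDER BY salary DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rnk
    FROM employees
    ORDER BY salary DESC, name
""").fetchall()

ranks = [row[2] for row in rows]
dense_ranks = [row[3] for row in rows]

print(ranks)        # [1, 2, 2, 4]
print(dense_ranks)  # [1, 2, 2, 3]
```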

#14. Write a query to get the Nth highest salary.
A: Use DENSE_RANK() in a CTE.

WITH SalaryRanks AS (
    SELECT
        salary,
        DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
    FROM employees
)
SELECT salary
FROM SalaryRanks
WHERE rnk = 5; -- For the 5th highest salary


#15. What is COALESCE() used for?
A: The COALESCE() function returns the first non-NULL value in a list of expressions. It's useful for providing default values for nulls.

SELECT name, COALESCE(commission, 0) as commission
FROM employees; -- Replaces NULL commissions with 0

---
#16. How would you select all employees whose name starts with 'A'?
A: Use the LIKE operator with a wildcard (%).

SELECT name
FROM employees
WHERE name LIKE 'A%';


#17. Get the current date and time.
A: The function depends on the SQL dialect.
• PostgreSQL/MySQL: NOW()
• SQL Server: GETDATE()
• Standard SQL (supported by all three): CURRENT_TIMESTAMP

SELECT NOW();


#18. How can you extract the month from a date?
A: Use the EXTRACT function or MONTH().

-- Standard SQL (the DATE keyword types the literal; PostgreSQL rejects a bare string here)
SELECT EXTRACT(MONTH FROM DATE '2023-10-27');
-- MySQL
SELECT MONTH('2023-10-27');


#19. What is a subquery? What are the types?
A: A subquery is a query nested inside another query.
Scalar Subquery: Returns a single value (one row, one column).
Multi-row Subquery: Returns multiple rows.
Correlated Subquery: An inner query that depends on the outer query for its values. It is evaluated once for each row processed by the outer query.
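
The three subquery types can be compared side by side. This is a minimal sketch in Python's sqlite3; the employees table, its rows, and the names helper are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ann", "Sales", 900), ("Bob", "Sales", 500),
     ("Cy", "Legal", 400), ("Di", "Legal", 300)],
)


def names(sql):
    """Hypothetical helper: run a query and return its first column as a list."""
    return [row[0] for row in conn.execute(sql)]


# Scalar subquery: one value (the overall average, 525) compared against each row.
scalar = names("""
    SELECT name FROM employees
    WHERE salary > (SELECT AVG(salary) FROM employees)
    ORDER BY name
""")

# Multi-row subquery: a list of departments consumed by IN.
multi_row = names("""
    SELECT name FROM employees
    WHERE department IN (SELECT department FROM employees WHERE salary > 800)
    ORDER BY name
""")

# Correlated subquery: the inner average is recomputed per outer row's department.
correlated = names("""
    SELECT name FROM employees e
    WHERE salary > (SELECT AVG(salary) FROM employees d
                    WHERE d.department = e.department)
    ORDER BY name
""")

print(scalar)      # ['Ann']
print(multi_row)   # ['Ann', 'Bob']
print(correlated)  # ['Ann', 'Cy']
```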

#20. Write a query to find all employees who work in the 'Sales' department.
A: Use a JOIN or a subquery.

-- Using JOIN (preferred)
SELECT e.name
FROM employees e
JOIN departments d ON e.department_id = d.id
WHERE d.name = 'Sales';

---
#21. How would you calculate the month-over-month growth rate of sales?
A: Use the LAG() window function to get the previous month's sales and then apply the growth formula: (current - previous) / previous * 100.

-- PostgreSQL syntax (DATE_TRUNC and the ::DATE cast)
WITH MonthlySales AS (
    SELECT
        DATE_TRUNC('month', order_date)::DATE AS sales_month,
        SUM(sale_amount) AS total_sales
    FROM sales
    GROUP BY 1
)
SELECT
    sales_month,
    total_sales,
    -- 100.0 forces decimal arithmetic; NULLIF avoids division by zero
    100.0 * (total_sales - LAG(total_sales, 1) OVER (ORDER BY sales_month))
        / NULLIF(LAG(total_sales, 1) OVER (ORDER BY sales_month), 0) AS growth_rate
FROM MonthlySales;

#22. What is an index in a database? Why is it useful?
A: An index is a separate data structure (commonly a B-tree) that the database engine uses to locate rows without scanning the whole table, much like the index at the back of a book. It speeds up SELECT queries that filter or sort on the indexed columns, but it takes extra storage and slows down data modification (INSERT, UPDATE, DELETE) because the index must be updated too.
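
One way to observe an index at work is to ask the database for its query plan before and after creating one. The sketch below uses Python's sqlite3 and SQLite's EXPLAIN QUERY PLAN; the table, the index name idx_employees_department, and the plan helper are illustrative assumptions. The exact plan wording varies between SQLite versions, so the code prints the plans rather than relying on a fixed string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department TEXT)"
)
conn.executemany(
    "INSERT INTO employees (name, department) VALUES (?, ?)",
    [(f"emp{i}", "Sales" if i % 2 else "Legal") for i in range(1000)],
)

query = "SELECT name FROM employees WHERE department = 'Sales'"


def plan(sql):
    """Hypothetical helper: join the readable detail column of each plan row."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


plan_before = plan(query)  # a full table scan is expected here
conn.execute("CREATE INDEX idx_employees_department ON employees (department)")
plan_after = plan(query)   # the plan should now mention the new index

print("before:", plan_before)
print("after: ", plan_after)
```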

#23. Difference between VARCHAR and CHAR?
A:
CHAR is a fixed-length string data type. CHAR(10) will always store 10 characters, padding with spaces if necessary.
VARCHAR is a variable-length string data type. VARCHAR(10) can store up to 10 characters, but only uses the storage needed for the actual string.

#24. What is a CASE statement?
A: The CASE statement goes through conditions and returns a value when the first condition is met (like an if-then-else statement).

SELECT
    name,
    salary,
    CASE
        WHEN salary > 100000 THEN 'High Earner'
        WHEN salary > 50000 THEN 'Mid Earner'
        ELSE 'Low Earner'
    END AS salary_category
FROM employees;


#25. Find the cumulative sum of sales over time.
A: Use a SUM() window function.

SELECT
    order_date,
    sale_amount,
    SUM(sale_amount) OVER (ORDER BY order_date) AS cumulative_sales
FROM sales;

---
#26. What does GROUP_CONCAT (MySQL) or STRING_AGG (PostgreSQL) do?
A: These functions concatenate strings from a group into a single string with a specified separator.

-- PostgreSQL example
SELECT department, STRING_AGG(name, ', ') as employee_names
FROM employees
GROUP BY department;


#27. What is data normalization? Why is it important?
A: Data normalization is the process of organizing columns and tables in a relational database to minimize data redundancy. It is important because it reduces storage space, eliminates inconsistent data, and simplifies data management.

#28. Write a query to find users who made a purchase in January but not in February.
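A: Use EXCEPT (as in the query here), a LEFT JOIN with an IS NULL filter, or NOT EXISTS. NOT IN also works but returns no rows if the subquery yields a NULL. The month filter ignores the year, so add a year condition when the table spans several years.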

SELECT user_id
FROM sales
WHERE EXTRACT(MONTH FROM order_date) = 1
EXCEPT
SELECT user_id
FROM sales
WHERE EXTRACT(MONTH FROM order_date) = 2;
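
The LEFT JOIN form of the same question is an anti-join: join each January buyer to their February rows and keep those with no match. Below is a minimal sketch in Python's sqlite3 with invented sales rows. SQLite has no EXTRACT, so strftime stands in for it here, and the year is pinned to 2024 as an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (user_id INTEGER, order_date TEXT, sale_amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, "2024-01-05", 10.0), (1, "2024-02-10", 20.0),
     (2, "2024-01-20", 15.0), (3, "2024-02-02", 30.0)],
)

# Set difference: January buyers minus February buyers.
except_rows = conn.execute("""
    SELECT user_id FROM sales WHERE strftime('%Y-%m', order_date) = '2024-01'
    EXCEPT
    SELECT user_id FROM sales WHERE strftime('%Y-%m', order_date) = '2024-02'
""").fetchall()

# Anti-join: keep January buyers whose February match is missing (NULL).
left_join_rows = conn.execute("""
    SELECT DISTINCT jan.user_id
    FROM sales jan
    LEFT JOIN sales feb
           ON feb.user_id = jan.user_id
          AND strftime('%Y-%m', feb.order_date) = '2024-02'
    WHERE strftime('%Y-%m', jan.order_date) = '2024-01'
      AND feb.user_id IS NULL
""").fetchall()

print(except_rows, left_join_rows)  # [(2,)] [(2,)]
```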


#29. What is a self-join?
A: A self-join is a join in which a table is joined to itself. This is useful for querying hierarchical data or comparing rows within the same table.

-- Find pairs of employees who have the same manager
-- (e1.id < e2.id lists each pair once; <> would return every pair twice)
SELECT e1.name AS employee1, e2.name AS employee2, e1.manager_id
FROM employees e1
JOIN employees e2 ON e1.manager_id = e2.manager_id AND e1.id < e2.id;


#30. What is the execution order of a SQL query?
A: The logical processing order is generally:
FROM / JOIN
WHERE
GROUP BY
HAVING
SELECT
DISTINCT
ORDER BY
LIMIT / OFFSET

---
Part 2: Python (Pandas/NumPy) Questions (Q31-50)

#31. How do you select a column named 'age' from a pandas DataFrame df?
A: There are two common ways.

# Method 1 (preferred, handles column names with spaces)
age_column = df['age']

# Method 2 (dot notation; fails for names with spaces or names that clash with DataFrame attributes)
age_column = df.age


#32. How do you filter a DataFrame df to get rows where 'age' is greater than 30?
A: Use boolean indexing.

filtered_df = df[df['age'] > 30]


#33. What's the difference between .loc and .iloc?
A:
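.loc is a label-based indexer. You use row and column labels (or a boolean mask) to select data, and label slices include both endpoints.
.iloc is an integer-position-based indexer. You use integer positions (like in Python lists) to select data, and slices exclude the stop position.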
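
A short sketch makes the two indexers concrete. The DataFrame, its string index, and the values are assumptions chosen so that labels and positions are easy to tell apart; pandas must be installed.

```python
import pandas as pd

# Illustrative frame with string row labels so labels and positions differ visibly.
df = pd.DataFrame(
    {"name": ["Ann", "Bob", "Cy", "Di"], "age": [25, 32, 41, 29]},
    index=["a", "b", "c", "d"],
)

by_label = df.loc["b", "age"]     # row label 'b', column label 'age'
by_position = df.iloc[1, 1]       # second row, second column

label_slice = df.loc["a":"c", "age"]  # label slices include the end label 'c'
position_slice = df.iloc[0:2, 1]      # position slices exclude the stop index 2

# .loc also accepts a boolean mask, combining filtering and column selection.
older_names = df.loc[df["age"] > 30, "name"].tolist()

print(by_label, by_position)                   # 32 32
print(len(label_slice), len(position_slice))   # 3 2
print(older_names)                             # ['Bob', 'Cy']
```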