NEW BOT Телеграм, страница

Data Analytics Interview Questions with Answers Part-1:

📱

1. What is the difference between data analysis and data analytics?
⦁ Data analysis involves inspecting, cleaning, and modeling data to discover useful information and patterns for decision-making.
⦁ Data analytics is a broader process that includes data collection, transformation, analysis, and interpretation, often involving predictive and prenoscriptive techniques to drive business strategies.

2. Explain the data cleaning process you follow.
⦁ Identify missing, inconsistent, or corrupt data.
⦁ Handle missing data by imputation (mean, median, mode) or removal if appropriate.
⦁ Standardize formats (dates, strings).
⦁ Remove duplicates.
⦁ Detect and treat outliers.
⦁ Validate cleaned data against known business rules.

3. How do you handle missing or duplicate data?
⦁ Missing data: Identify patterns; if random, impute using statistical methods or predictive modeling; else consider domain knowledge before removal.
⦁ Duplicate data: Detect with key fields; remove exact duplicates or merge fuzzy duplicates based on context.

4. What is a primary key in a database?
A primary key uniquely identifies each record in a table, ensuring entity integrity and enabling relationships between tables via foreign keys.

5. Write a SQL query to find the second highest salary in a table.

SELECT MAX(salary) 
FROM employees 
WHERE salary < (SELECT MAX(salary) FROM employees);

6. Explain INNER JOIN vs LEFT JOIN with examples.
⦁ INNER JOIN: Returns only matching rows between two tables.
⦁ LEFT JOIN: Returns all rows from the left table, plus matching rows from the right; if no match, right columns are NULL.

Example:

SELECT * FROM A INNER JOIN B ON A.id = B.id;
SELECT * FROM A LEFT JOIN B ON A.id = B.id;

7. What are outliers? How do you detect and treat them?
⦁ Outliers are data points significantly different from others that can skew analysis.
⦁ Detect with boxplots, z-score (>3), or IQR method (values outside 1.5*IQR).
⦁ Treat by investigating causes, correcting errors, transforming data, or removing if they’re noise.

8. Describe what a pivot table is and how you use it.
A pivot table is a data summarization tool that groups, aggregates (sum, average), and displays data cross-categorically. Used in Excel and BI tools for quick insights and reporting.

9. How do you validate a data model’s performance?
⦁ Use relevant metrics (accuracy, precision, recall for classification; RMSE, MAE for regression).
⦁ Perform cross-validation to check generalizability.
⦁ Test on holdout or unseen data sets.

10. What is hypothesis testing? Explain t-test and z-test.
⦁ Hypothesis testing assesses if sample data supports a claim about a population.
⦁ t-test: Used when sample size is small and population variance is unknown, often comparing means.
⦁ z-test: Used for large samples with known variance to test population parameters.

React ♥️ for Part-2

Please open Telegram to view this post

VIEW IN TELEGRAM

❤40👍3👌1

5.93K viewsedited 12:51