
Data Analytics


Aggregation Functions in SQL

Aggregation functions help summarize data by performing calculations like sum, average, count, and more. These functions are commonly used in data analysis.

1️⃣ Common Aggregation Functions

COUNT() → Counts the number of rows
SUM() → Calculates the total sum of a numeric column
AVG() → Finds the average value
MIN() → Returns the smallest value
MAX() → Returns the largest value

2️⃣ Using COUNT() to Count Records

🔹 Find the total number of employees

SELECT COUNT(*) FROM employees;

🔹 Find the number of employees in the ‘Sales’ department

SELECT COUNT(*) FROM employees WHERE department = 'Sales';

3️⃣ Using SUM() to Calculate Totals

🔹 Find the total salary of all employees

SELECT SUM(salary) FROM employees;

🔹 Find the total salary paid to employees in the ‘IT’ department

SELECT SUM(salary) FROM employees WHERE department = 'IT';

4️⃣ Using AVG() to Calculate Averages

🔹 Find the average salary of all employees

SELECT AVG(salary) FROM employees;

🔹 Find the average salary of employees in the ‘HR’ department

SELECT AVG(salary) FROM employees WHERE department = 'HR';

5️⃣ Using MIN() and MAX() to Find Extremes

🔹 Find the lowest salary in the company

SELECT MIN(salary) FROM employees;

🔹 Find the highest salary in the company

SELECT MAX(salary) FROM employees;

🔹 Find the most recent hire date (this returns the date itself, not the employee's row)

SELECT MAX(hire_date) FROM employees;

6️⃣ Using Aggregation Functions with GROUP BY

Aggregation functions are often used with GROUP BY to analyze data by categories.

🔹 Find the total salary for each department

SELECT department, SUM(salary) FROM employees GROUP BY department;

🔹 Find the average salary for each job title

SELECT job_title, AVG(salary) FROM employees GROUP BY job_title;
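
To try the aggregation queries above without setting up a database, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows are made up for illustration; only the table and column names come from the examples above. It also shows a detail worth knowing: COUNT(*) counts every row, while COUNT(salary), SUM() and AVG() skip NULL salaries.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the examples above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ana", "Sales", 50000), ("Ben", "Sales", 70000),
     ("Cy", "IT", 90000), ("Di", "IT", None)],  # Di has no salary recorded
)

# Whole-table aggregates: COUNT(*) counts rows, COUNT(salary) skips NULLs.
totals = conn.execute(
    "SELECT COUNT(*), COUNT(salary), SUM(salary), AVG(salary), "
    "MIN(salary), MAX(salary) FROM employees"
).fetchone()

# Per-department aggregates with GROUP BY.
by_department = conn.execute(
    "SELECT department, SUM(salary), AVG(salary) FROM employees "
    "GROUP BY department ORDER BY department"
).fetchall()

print(totals)
print(by_department)
```

Note that the IT average is computed over one salary, not two, because the NULL row is ignored.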

Mini Task for You:
Write an SQL query to find the highest salary in each department.

You can find free SQL Resources here
👇👇
https://news.1rj.ru/str/mysqldata

Like this post if you want me to continue covering all the topics! ❤️

Share with credits: https://news.1rj.ru/str/sqlspecialist

Hope it helps :)

#sql


SQL Interview Questions with detailed answers

1️⃣9️⃣ How do you calculate the percentage of total sales for each category?

To calculate the percentage of total sales for each category, we use SUM() and window functions or subqueries.
Using Window Functions (Recommended for Modern SQL)

SELECT category_id, SUM(sales_amount) AS category_sales, (SUM(sales_amount) * 100.0) / SUM(SUM(sales_amount)) OVER () AS sales_percentage FROM sales GROUP BY category_id;

Explanation:

1️⃣ SUM(SUM(sales_amount)) OVER () adds up the per-category sums, which gives the total sales across all categories.
2️⃣ SUM(sales_amount) * 100.0 divided by that total computes the percentage for each category (the 100.0 forces decimal division).
3️⃣ GROUP BY category_id ensures aggregation at the category level.

Using a Subquery (Compatible with Older SQL Versions):

SELECT category_id, SUM(sales_amount) AS category_sales, (SUM(sales_amount) * 100.0) / (SELECT SUM(sales_amount) FROM sales) AS sales_percentage FROM sales GROUP BY category_id;

This works the same way but calculates total sales in a subquery.
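
A quick way to convince yourself that the two versions agree is to run both on the same data. The sketch below does that with Python's sqlite3 module; the sample rows are invented, and it assumes an SQLite build with window-function support (3.25 or newer).

```python
import sqlite3

# Hypothetical sample data for the sales table used above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (category_id INTEGER, sales_amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1, 100.0), (1, 200.0), (2, 300.0), (3, 400.0)],
)

# Window-function version: the outer SUM ... OVER () totals the grouped sums.
window_sql = """
SELECT category_id,
       SUM(sales_amount) * 100.0 / SUM(SUM(sales_amount)) OVER () AS pct
FROM sales GROUP BY category_id ORDER BY category_id
"""

# Subquery version: the grand total comes from a scalar subquery.
subquery_sql = """
SELECT category_id,
       SUM(sales_amount) * 100.0 / (SELECT SUM(sales_amount) FROM sales) AS pct
FROM sales GROUP BY category_id ORDER BY category_id
"""

window_rows = conn.execute(window_sql).fetchall()
subquery_rows = conn.execute(subquery_sql).fetchall()
print(window_rows)
```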


SQL Interview Questions with detailed answers

2️⃣0️⃣ What is the use of CASE statements in SQL?

The CASE statement in SQL is used for conditional logic within queries, similar to an IF-ELSE statement in programming. It allows you to return different values based on conditions.

Use Cases of CASE Statement:

1️⃣ Creating custom categories based on conditions.
2️⃣ Handling NULL values with default replacements.
3️⃣ Applying conditional aggregations in reports.

Example 1: Categorizing Sales Amount

SELECT order_id, customer_id, sales_amount, CASE WHEN sales_amount > 1000 THEN 'High' WHEN sales_amount BETWEEN 500 AND 1000 THEN 'Medium' ELSE 'Low' END AS sales_category FROM sales;

✅ This classifies each sale as High, Medium, or Low based on sales_amount.

Example 2: Handling NULL Values

SELECT employee_id, CASE WHEN department IS NULL THEN 'Not Assigned' ELSE department END AS department_status FROM employees;

✅ This replaces NULL values in the department column with "Not Assigned".

Example 3: Conditional Aggregation in Reports

SELECT SUM(CASE WHEN order_status = 'Completed' THEN total_amount ELSE 0 END) AS completed_sales, SUM(CASE WHEN order_status = 'Pending' THEN total_amount ELSE 0 END) AS pending_sales FROM orders;

✅ This calculates total sales separately for "Completed" and "Pending" orders.
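
The CASE patterns above can be checked on a tiny table. This is only a sketch with made-up orders, run through Python's sqlite3 module; it applies the High/Medium/Low rule from Example 1 to a total_amount column and then repeats the conditional aggregation from Example 3.

```python
import sqlite3

# Hypothetical orders; column names follow Example 3 above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, order_status TEXT, total_amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "Completed", 1200.0), (2, "Pending", 700.0),
     (3, "Completed", 300.0), (4, "Cancelled", 50.0)],
)

# CASE as a row-level label: the first matching WHEN wins.
labels = conn.execute("""
SELECT order_id,
       CASE WHEN total_amount > 1000 THEN 'High'
            WHEN total_amount BETWEEN 500 AND 1000 THEN 'Medium'
            ELSE 'Low' END AS sales_category
FROM orders ORDER BY order_id
""").fetchall()

# CASE inside SUM(): one pass over the table, one column per status.
completed, pending = conn.execute("""
SELECT SUM(CASE WHEN order_status = 'Completed' THEN total_amount ELSE 0 END),
       SUM(CASE WHEN order_status = 'Pending' THEN total_amount ELSE 0 END)
FROM orders
""").fetchone()

print(labels, completed, pending)
```

The cancelled order contributes 0 to both sums, which is why the ELSE 0 branch matters.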


GROUP BY & HAVING in SQL

The GROUP BY clause is used to group rows that have the same values in specified columns. It’s commonly used with aggregation functions (SUM(), AVG(), COUNT(), etc.) to perform calculations on each group.
The HAVING clause filters groups after aggregation, similar to how WHERE filters individual rows.

1️⃣ Basic GROUP BY Usage
🔹 Find the total number of employees in each department

SELECT department, COUNT(*) FROM employees GROUP BY department;

This groups employees by department and counts the number of employees in each department.

🔹 Find the total salary per department

SELECT department, SUM(salary) FROM employees GROUP BY department;

2️⃣ GROUP BY with Multiple Columns
You can group by multiple columns to analyze data more deeply.

🔹 Find the total salary for each job title within each department

SELECT department, job_title, SUM(salary) FROM employees GROUP BY department, job_title;

3️⃣ Using HAVING to Filter Groups
Unlike WHERE, which filters before aggregation, HAVING filters after aggregation.

🔹 Find departments with more than 5 employees

SELECT department, COUNT(*) AS employee_count FROM employees GROUP BY department HAVING COUNT(*) > 5;

🔹 Find departments where the total salary is greater than $500,000

SELECT department, SUM(salary) AS total_salary FROM employees GROUP BY department HAVING SUM(salary) > 500000;

🔹 Find job titles where the average salary is above $70,000

SELECT job_title, AVG(salary) AS avg_salary FROM employees GROUP BY job_title HAVING AVG(salary) > 70000;

4️⃣ GROUP BY with ORDER BY
To sort grouped results, use ORDER BY.

🔹 Find the total salary per department, sorted in descending order

SELECT department, SUM(salary) AS total_salary FROM employees GROUP BY department ORDER BY total_salary DESC;
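
The order in which WHERE, GROUP BY, HAVING and ORDER BY apply is easier to see in one query. The sketch below uses invented rows and Python's sqlite3 module: WHERE drops individual rows first, the survivors are grouped, HAVING drops whole groups, and ORDER BY sorts what is left.

```python
import sqlite3

# Hypothetical employees; names and figures are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ana", "Sales", 40000), ("Ben", "Sales", 60000), ("Cy", "Sales", 80000),
     ("Di", "IT", 90000), ("Ed", "IT", 100000), ("Flo", "HR", 50000)],
)

rows = conn.execute("""
SELECT department, COUNT(*) AS employee_count, AVG(salary) AS avg_salary
FROM employees
WHERE salary >= 50000          -- row filter: Ana is removed before grouping
GROUP BY department
HAVING COUNT(*) >= 2           -- group filter: HR has only one row left
ORDER BY avg_salary DESC
""").fetchall()

print(rows)
```

Sales ends up with a count of 2 and an average of 70,000 because the 40,000 row never reached the grouping step.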

Mini Task for You:

Write an SQL query to find departments where the average salary is more than $80,000.
Let me know when you’re ready to move to the next topic! 🚀


JOINS in SQL

Joins allow you to combine data from multiple tables based on related columns. They are essential for working with relational databases.

1️⃣ Types of JOINS

INNER JOIN → Returns only matching rows from both tables
LEFT JOIN → Returns all rows from the left table + matching rows from the right table
RIGHT JOIN → Returns all rows from the right table + matching rows from the left table
FULL JOIN → Returns all rows from both tables (matching + non-matching)
SELF JOIN → Joins a table with itself
CROSS JOIN → Returns all possible combinations of rows

2️⃣ INNER JOIN (Most Common Join)

🔹 Find employees and their department names

SELECT employees.name, employees.salary, departments.department_name FROM employees INNER JOIN departments ON employees.department_id = departments.department_id;

✔ Returns only employees who have a matching department.

3️⃣ LEFT JOIN (Includes Unmatched Rows from Left Table)

🔹 Find all employees, including those without a department

SELECT employees.name, employees.salary, departments.department_name FROM employees LEFT JOIN departments ON employees.department_id = departments.department_id;

✔ Includes employees even if they don’t have a department (NULL if no match).

4️⃣ RIGHT JOIN (Includes Unmatched Rows from Right Table)

🔹 Find all departments, including those without employees

SELECT employees.name, employees.salary, departments.department_name FROM employees RIGHT JOIN departments ON employees.department_id = departments.department_id;

✔ Includes all departments, even if no employees are assigned.

5️⃣ FULL JOIN (Includes Unmatched Rows from Both Tables)

🔹 Get a complete list of employees and departments (matched + unmatched rows)

SELECT employees.name, employees.salary, departments.department_name FROM employees FULL JOIN departments ON employees.department_id = departments.department_id;

✔ Includes all employees and departments even if there’s no match. MySQL does not support FULL JOIN; there, combine a LEFT JOIN and a RIGHT JOIN with UNION.
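
Here is a small sketch of how the join types differ on the same data, using Python's sqlite3 module with invented rows. Older SQLite builds lack RIGHT JOIN and FULL JOIN (they arrived in version 3.39), so the "departments without employees" case is written as a LEFT JOIN with the tables swapped, which returns the same rows a RIGHT JOIN would.

```python
import sqlite3

# Hypothetical data: one employee has no department, one department has no employees.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE departments (department_id INTEGER, department_name TEXT)")
conn.execute("CREATE TABLE employees (name TEXT, department_id INTEGER)")
conn.executemany("INSERT INTO departments VALUES (?, ?)",
                 [(1, "Sales"), (2, "IT"), (3, "Legal")])
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [("Ana", 1), ("Ben", 2), ("Cy", None)])

# INNER JOIN: only employees with a matching department.
inner_rows = conn.execute("""
SELECT e.name, d.department_name
FROM employees e INNER JOIN departments d ON e.department_id = d.department_id
ORDER BY e.name
""").fetchall()

# LEFT JOIN: every employee; department_name is NULL (None) when unmatched.
left_rows = conn.execute("""
SELECT e.name, d.department_name
FROM employees e LEFT JOIN departments d ON e.department_id = d.department_id
ORDER BY e.name
""").fetchall()

# Departments with no employees: LEFT JOIN from departments, keep unmatched rows.
empty_departments = conn.execute("""
SELECT d.department_name
FROM departments d LEFT JOIN employees e ON e.department_id = d.department_id
WHERE e.name IS NULL
""").fetchall()

print(inner_rows, left_rows, empty_departments)
```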


What's the full form of CTE in SQL?

Anonymous Quiz

11%

Common Tabular Enterprises

86%

Common Table Expression

Common Time Experience

Cool Tools External


Common Table Expressions (CTEs) in SQL 👇👇

CTEs (WITH statement) help write cleaner and more readable SQL queries. They are like temporary result sets that can be referenced within the main query.

1️⃣ Basic Syntax of CTE

WITH cte_name AS ( SELECT column1, column2 FROM table_name WHERE condition ) SELECT * FROM cte_name;

✔ The CTE cte_name is defined and then used in the main SELECT query.

2️⃣ Simple CTE Example

🔹 Find employees earning more than $70,000

WITH high_earners AS ( SELECT name, salary, department_id FROM employees WHERE salary > 70000 ) SELECT * FROM high_earners;

✔ The CTE high_earners filters employees with high salaries before selecting all columns from it.

3️⃣ CTE with Aggregation

🔹 Find departments where the average salary is above $80,000

WITH department_salary AS ( SELECT department_id, AVG(salary) AS avg_salary FROM employees GROUP BY department_id ) SELECT department_id, avg_salary FROM department_salary WHERE avg_salary > 80000;

✔ The CTE department_salary calculates the average salary per department; the main query then keeps only the departments averaging above $80,000.

4️⃣ CTE for Recursive Queries (Hierarchy Example)

🔹 Find an employee hierarchy (who reports to whom)

WITH RECURSIVE employee_hierarchy AS ( SELECT employee_id, name, manager_id FROM employees WHERE manager_id IS NULL -- Start with top-level manager UNION ALL SELECT e.employee_id, e.name, e.manager_id FROM employees e INNER JOIN employee_hierarchy eh ON e.manager_id = eh.employee_id ) SELECT * FROM employee_hierarchy;

✔ This recursive CTE finds an employee hierarchy starting from the top-level manager.
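
To see the recursion step by step, the sketch below runs the same hierarchy query on four made-up employees with Python's sqlite3 module. The extra level column is an addition for illustration (it is not in the query above): it starts at 0 for the top-level manager and grows by 1 on every recursive pass.

```python
import sqlite3

# Hypothetical org chart: Root manages A and B, A manages C.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (employee_id INTEGER, name TEXT, manager_id INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "Root", None), (2, "A", 1), (3, "B", 1), (4, "C", 2)],
)

hierarchy = conn.execute("""
WITH RECURSIVE employee_hierarchy AS (
    -- Anchor member: the top-level manager (no manager_id).
    SELECT employee_id, name, manager_id, 0 AS level
    FROM employees
    WHERE manager_id IS NULL
    UNION ALL
    -- Recursive member: everyone who reports to a row found so far.
    SELECT e.employee_id, e.name, e.manager_id, eh.level + 1
    FROM employees e
    INNER JOIN employee_hierarchy eh ON e.manager_id = eh.employee_id
)
SELECT name, level FROM employee_hierarchy ORDER BY level, employee_id
""").fetchall()

print(hierarchy)
```

The recursion stops on its own once a pass finds no new reports.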

5️⃣ Why Use CTEs Instead of Subqueries?

✅ Better Readability – Makes complex queries easier to understand
✅ Reusability – Can be referenced multiple times in the main query
✅ Performance – Depends on the database: some engines inline a CTE like a subquery, others materialize it once, so check the query plan rather than assuming a CTE is faster

Mini Task for You: Write an SQL query using a CTE to find departments with more than 5 employees.


Window Functions in SQL

Window functions perform calculations across a set of table rows related to the current row. Unlike aggregation functions, they do not collapse rows but retain all rows while providing additional insights.

1️⃣ Common Window Functions

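ROW_NUMBER() → Assigns a unique sequential number to each row within a partition
RANK() → Like ROW_NUMBER(), but tied rows share a rank and the following ranks are skipped
DENSE_RANK() → Like RANK(), but without gaps after ties
NTILE(n) → Divides the rows into n buckets of (nearly) equal size
SUM() OVER() → Total over the window; a running total (cumulative sum) when the window has an ORDER BY
AVG() OVER() → Average over the window; a moving average when a frame such as ROWS BETWEEN 2 PRECEDING AND CURRENT ROW is specified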
LAG() → Gets the previous row’s value
LEAD() → Gets the next row’s value

2️⃣ Basic Syntax

SELECT column1, column2, window_function() OVER (PARTITION BY column ORDER BY column) AS alias FROM table_name;

✔ PARTITION BY groups rows before applying the function
✔ ORDER BY determines the ranking or sequence

3️⃣ Using ROW_NUMBER()

🔹 Assign a unique row number to each employee based on salary (highest first)

SELECT name, department, salary, ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num FROM employees;



✔ Each employee gets a unique row number within their department.

4️⃣ Using RANK() and DENSE_RANK()

🔹 Rank employees by salary within each department

SELECT name, department, salary, RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS salary_rank, DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS salary_dense_rank FROM employees;



✔ RANK() skips numbers when there’s a tie
✔ DENSE_RANK() does not skip numbers

5️⃣ Using NTILE() for Distribution

🔹 Divide employees into 4 salary groups per department

SELECT name, department, salary, NTILE(4) OVER (PARTITION BY department ORDER BY salary DESC) AS salary_quartile FROM employees;



✔ Useful for splitting salaries into quartiles (e.g., top 25%, bottom 25%)

6️⃣ Running Total with SUM() OVER()

🔹 Calculate cumulative salary per department

SELECT name, department, salary, SUM(salary) OVER (PARTITION BY department ORDER BY salary DESC) AS running_total FROM employees;



✔ Useful for tracking cumulative totals

7️⃣ Using LAG() and LEAD()

🔹 Compare an employee’s salary with the previous and next employee’s salary

SELECT name, department, salary, LAG(salary) OVER (PARTITION BY department ORDER BY salary DESC) AS previous_salary, LEAD(salary) OVER (PARTITION BY department ORDER BY salary DESC) AS next_salary FROM employees;

✔ LAG() gets the previous row’s value
✔ LEAD() gets the next row’s value
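
The difference between ROW_NUMBER(), RANK() and DENSE_RANK() only shows up when there are ties, so here is a sketch with two employees on the same salary. It uses Python's sqlite3 module (window functions need SQLite 3.25 or newer) and invented data. One assumption to flag: name is added to the ROW_NUMBER() ordering as a tie-breaker so that the numbering is repeatable.

```python
import sqlite3

# Hypothetical data: Ana and Ben are tied at the top of IT.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ana", "IT", 90000), ("Ben", "IT", 90000),
     ("Cy", "IT", 80000), ("Di", "Sales", 70000)],
)

ranked = conn.execute("""
SELECT name, department, salary,
       ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC, name) AS row_num,
       RANK()       OVER (PARTITION BY department ORDER BY salary DESC) AS salary_rank,
       DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS salary_dense_rank
FROM employees
ORDER BY department, salary DESC, name
""").fetchall()

for row in ranked:
    print(row)
```

For Cy, RANK() gives 3 (rank 2 is skipped after the tie) while DENSE_RANK() gives 2, and the numbering restarts at 1 in the Sales partition.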

Mini Task for You: Write an SQL query to rank employees by salary within each department using RANK() (remember that tied salaries share the same rank).


Indexing in SQL

Indexes improve the speed of data retrieval by optimizing how queries access tables. They work like a book’s index—allowing you to find information faster instead of scanning every page.

1️⃣ Types of Indexes in SQL:

Primary Index → Automatically created on the primary key
Unique Index → Ensures all values in a column are unique
Composite Index → Created on multiple columns
Clustered Index → Determines the physical order of data storage
Non-Clustered Index → Creates a separate structure for faster lookups
Full-Text Index → Optimized for text searches

2️⃣ Creating an Index

🔹 Create an index on the "email" column in the "users" table

CREATE INDEX idx_email ON users(email);

✔ Speeds up searches for users by email

3️⃣ Creating a Unique Index
🔹 Ensure that no two users have the same email

CREATE UNIQUE INDEX idx_unique_email ON users(email);

✔ Prevents duplicate emails from being inserted

4️⃣ Composite Index for Multiple Columns

🔹 Optimize queries that filter by first name and last name

CREATE INDEX idx_name ON users(first_name, last_name);

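✔ Faster lookups when filtering by both first name and last name, or by first name alone (the leading column); a filter on last name alone generally cannot use this index efficiently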

5️⃣ Clustered vs. Non-Clustered Index

Clustered Index → Stores the table rows themselves in index order (only one per table)
Non-Clustered Index → Stores a separate structure that points back to the table rows

🔹 Create a clustered index on the "id" column (SQL Server syntax)

CREATE CLUSTERED INDEX idx_id ON users(id);

🔹 Create a non-clustered index on the "email" column

CREATE NONCLUSTERED INDEX idx_email ON users(email);

✔ Clustered indexes suit range scans and queries that return whole rows, since the row data sits in the index itself
✔ Non-clustered indexes suit selective lookups on specific columns

6️⃣ Checking Indexes on a Table

🔹 Find all indexes on the "users" table (SQL Server; other databases expose this through their own catalog views or commands)

SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID('users');

7️⃣ When to Use Indexes?

✅ Columns frequently used in WHERE, JOIN, ORDER BY
✅ Large tables that need faster searches
✅ Unique columns that should not allow duplicates
❌ Avoid indexing on columns with highly repetitive values (e.g., boolean columns)
❌ Avoid too many indexes, as they slow down INSERT, UPDATE, DELETE operations
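
You can watch an index change how a query runs by comparing query plans before and after creating it. The sketch below does this in SQLite through Python's sqlite3 module; the table contents are generated for illustration, and EXPLAIN QUERY PLAN is SQLite's own command (other databases have their own EXPLAIN variants). It also shows the unique index rejecting a duplicate email.

```python
import sqlite3

# Hypothetical users table with 1,000 generated rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, first_name TEXT)")
conn.executemany(
    "INSERT INTO users (email, first_name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"name{i}") for i in range(1000)],
)

def plan(sql):
    # Each EXPLAIN QUERY PLAN row ends with a human-readable detail string.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM users WHERE email = 'user500@example.com'"

plan_before = plan(query)   # full table scan: no index on email yet
conn.execute("CREATE UNIQUE INDEX idx_unique_email ON users(email)")
plan_after = plan(query)    # the plan now mentions idx_unique_email

# The unique index also blocks duplicate emails.
try:
    conn.execute("INSERT INTO users (email, first_name) VALUES ('user1@example.com', 'dup')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(plan_before)
print(plan_after)
print(duplicate_rejected)
```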

Mini Task for You: Write an SQL query to create a unique index on the "phone_number" column in the "customers" table.


What's the full form of DDL in SQL?

Anonymous Quiz

93%

Data definition language

Database definition link

Dataset download link

Data download language


Normalization in SQL

Normalization is the process of organizing a database to reduce redundancy and improve efficiency. It ensures data is stored logically by breaking it into smaller, related tables.

1️⃣ Why Normalize a Database?

Eliminates duplicate data
Reduces data anomalies (insertion, update, deletion issues)
Improves data integrity
Makes inserts and updates cheaper and more consistent (reads may need more joins, which is the usual trade-off)

2️⃣ Normal Forms (NF) in SQL

First Normal Form (1NF) → No duplicate rows, atomic values
Second Normal Form (2NF) → No partial dependency (every non-key column depends on the whole primary key)
Third Normal Form (3NF) → No transitive dependency (separate non-key attributes)
Boyce-Codd Normal Form (BCNF) → More strict version of 3NF

3️⃣ First Normal Form (1NF) – Atomic Values

Problem: Storing multiple values in a single column

Example (Before Normalization):

OrderID: 1, Customer: John, Products: Laptop, Mouse
OrderID: 2, Customer: Alice, Products: Phone, Headphones

Fix: Create a separate table with atomic values

Example (After Normalization):

OrderID: 1, Customer: John, Product: Laptop
OrderID: 1, Customer: John, Product: Mouse
OrderID: 2, Customer: Alice, Product: Phone
OrderID: 2, Customer: Alice, Product: Headphones

4️⃣ Second Normal Form (2NF) – No Partial Dependencies

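Problem: Columns dependent on only part of a composite primary key (in the example below, assuming the key is OrderID + Product, the supplier details depend on the product alone)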

Example (Before Normalization):

OrderID: 1, Product: Laptop, Supplier: Dell, SupplierPhone: 123-456
OrderID: 2, Product: Phone, Supplier: Apple, SupplierPhone: 987-654

Fix: Separate supplier details into another table

Example (After Normalization):

Orders Table:
OrderID: 1, Product: Laptop, SupplierID: 1
OrderID: 2, Product: Phone, SupplierID: 2

Suppliers Table:
SupplierID: 1, Supplier: Dell, SupplierPhone: 123-456
SupplierID: 2, Supplier: Apple, SupplierPhone: 987-654

5️⃣ Third Normal Form (3NF) – No Transitive Dependencies

Problem: Non-key column dependent on another non-key column

Example (Before Normalization):
CustomerID: 1, Name: John, City: NY, ZipCode: 10001
CustomerID: 2, Name: Alice, City: LA, ZipCode: 90001

Fix: Separate city and ZIP code into a new table

Example (After Normalization):
Customers Table:
CustomerID: 1, Name: John, ZipCode: 10001
CustomerID: 2, Name: Alice, ZipCode: 90001
Locations Table:
ZipCode: 10001, City: NY
ZipCode: 90001, City: LA
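
The 3NF split above can be sketched end to end in SQLite through Python's sqlite3 module. The snake_case names (customers_raw, zip_code and so on) and the third customer are assumptions added for illustration. The final check joins the two new tables back together and confirms that no information was lost.

```python
import sqlite3

# Hypothetical unnormalized table: city repeats for every customer in a ZIP code.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers_raw (customer_id INTEGER, name TEXT, city TEXT, zip_code TEXT)"
)
conn.executemany(
    "INSERT INTO customers_raw VALUES (?, ?, ?, ?)",
    [(1, "John", "NY", "10001"), (2, "Alice", "LA", "90001"), (3, "Bob", "NY", "10001")],
)

# Locations: one row per ZIP code, so each city is stored once.
conn.execute("CREATE TABLE locations (zip_code TEXT PRIMARY KEY, city TEXT)")
conn.execute("INSERT INTO locations SELECT DISTINCT zip_code, city FROM customers_raw")

# Customers: keep only the ZIP code as a reference to locations.
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, "
    "zip_code TEXT REFERENCES locations(zip_code))"
)
conn.execute("INSERT INTO customers SELECT customer_id, name, zip_code FROM customers_raw")

# Joining the two tables reproduces the original rows.
rejoined = conn.execute("""
SELECT c.customer_id, c.name, l.city, c.zip_code
FROM customers c JOIN locations l ON c.zip_code = l.zip_code
ORDER BY c.customer_id
""").fetchall()
original = conn.execute("SELECT * FROM customers_raw ORDER BY customer_id").fetchall()
location_count = conn.execute("SELECT COUNT(*) FROM locations").fetchone()[0]

print(rejoined == original, location_count)
```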

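6️⃣ Boyce-Codd Normal Form (BCNF) – Every Determinant Is a Candidate Key

Problem: A column (or set of columns) that is not a candidate key determines another column, which can happen when a table has overlapping candidate keys

Fix: Ensure every determinant is a candidate key by further splitting tables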

7️⃣ When to Normalize and When to Denormalize?

Use normalization for transactional databases (banking, e-commerce)

Use denormalization for analytics databases (faster reporting queries)

Mini Task for You: Write an SQL query to split a "Customers" table by moving city details into a separate "Locations" table following 3NF.


Let's move to our next topic now

Data Cleaning & Transformation

Data cleaning and transformation are critical for preparing raw data for analysis. It involves handling missing data, removing duplicates, standardizing formats, and optimizing data structures.

1️⃣ Handling Missing Data in SQL & Python

In SQL:

COALESCE(): Replaces NULL values with a default value

SELECT id, name, COALESCE(salary, 0) AS salary FROM employees;

IFNULL(): Works similarly to COALESCE (MySQL)

SELECT id, name, IFNULL(salary, 0) AS salary FROM employees;

In Python (Pandas):

dropna(): Removes rows with missing values

df.dropna(inplace=True)

fillna(): Fills missing values with a specified value

df['salary'] = df['salary'].fillna(0)

interpolate(): Fills missing values using interpolation

df.interpolate(method='linear', inplace=True)

2️⃣ Removing Duplicates
In SQL:

Remove duplicate rows using DISTINCT

SELECT DISTINCT name, department FROM employees;

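Delete duplicates while keeping only one row (the lowest id per name and department; MySQL rejects a subquery on the table being deleted from, so wrap the subquery in a derived table there)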

DELETE FROM employees WHERE id NOT IN (SELECT MIN(id) FROM employees GROUP BY name, department);

In Python (Pandas):

Remove duplicate rows

df.drop_duplicates(inplace=True)

Keep only the first occurrence

df.drop_duplicates(subset=['name', 'department'], keep='first', inplace=True)
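
Before running a DELETE like the SQL one above on real data, it helps to try it on a throwaway copy. The sketch below does exactly that with Python's sqlite3 module and a few invented rows, keeping the lowest id for each (name, department) pair.

```python
import sqlite3

# Hypothetical employees with two duplicated (name, department) pairs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "Ana", "Sales"), (2, "Ana", "Sales"), (3, "Ben", "IT"),
     (4, "Ana", "IT"), (5, "Ben", "IT")],
)

# Keep the smallest id in each group; delete every other row.
cursor = conn.execute("""
DELETE FROM employees
WHERE id NOT IN (SELECT MIN(id) FROM employees GROUP BY name, department)
""")
deleted_count = cursor.rowcount

remaining = conn.execute("SELECT id, name, department FROM employees ORDER BY id").fetchall()
print(deleted_count, remaining)
```

Ana appears twice in the result because she is listed under two different departments, which the GROUP BY treats as separate groups.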

3️⃣ Standardizing Formats (Data Normalization)

Standardizing Text Case:

SQL: Convert text to uppercase or lowercase

SELECT UPPER(name) AS name_upper FROM employees;

Python: Convert text to lowercase

df['name'] = df['name'].str.lower()

Date Formatting:

SQL: Convert string to date format (SQL Server syntax)

SELECT CONVERT(DATE, '2024-02-26', 120);

Python: Convert string to datetime

df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')

4️⃣ ETL Process (Extract, Transform, Load)

Extract:

SQL: Retrieve data from databases

SELECT * FROM sales_data;

Python: Load data from CSV

df = pd.read_csv('data.csv')

Transform:

SQL: Modify data (cleaning, aggregations)

SELECT category, SUM(sales) AS total_sales FROM sales_data GROUP BY category;

Python: Apply transformations

df['total_sales'] = df.groupby('category')['sales'].transform('sum')

Load:
SQL: Insert cleaned data into a new table

INSERT INTO clean_sales_data (category, total_sales) 
SELECT category, SUM(sales) FROM sales_data GROUP BY category;

Python: Save cleaned data to a new CSV file

df.to_csv('cleaned_data.csv', index=False)
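
Putting the three ETL steps together, here is a minimal sketch that uses only Python's standard library (csv, io and sqlite3) rather than pandas. The CSV text is invented and kept inline so the example is self-contained; a real pipeline would read from a file or a source database instead.

```python
import csv
import io
import sqlite3

# Hypothetical raw input: inconsistent category case and one missing sales value.
raw_csv = "category,sales\nBooks,10.5\nbooks,4.5\nToys,20\nToys,\n"

# Extract: parse the CSV text into dictionaries.
rows = list(csv.DictReader(io.StringIO(raw_csv)))

# Transform: standardize category case and skip rows with missing sales.
cleaned = [
    (row["category"].strip().title(), float(row["sales"]))
    for row in rows
    if row["sales"].strip()
]

# Load: store the cleaned rows, then build the aggregated table from them.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_data (category TEXT, sales REAL)")
conn.executemany("INSERT INTO sales_data VALUES (?, ?)", cleaned)
conn.execute("CREATE TABLE clean_sales_data (category TEXT, total_sales REAL)")
conn.execute(
    "INSERT INTO clean_sales_data (category, total_sales) "
    "SELECT category, SUM(sales) FROM sales_data GROUP BY category"
)

result = conn.execute(
    "SELECT category, total_sales FROM clean_sales_data ORDER BY category"
).fetchall()
print(result)
```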

Mini Task for You: Write an SQL query to remove duplicate customer records, keeping only the first occurrence.

Here you can find the roadmap for data analyst: https://news.1rj.ru/str/sqlspecialist/1159


Hi guys,

Many people charge too much to teach Excel, Power BI, SQL, Python & Tableau but my mission is to break down barriers. I have shared complete learning series to start your data analytics journey from scratch.

For those of you who are new to this channel, here are some quick links to navigate this channel easily.

Data Analyst Learning Plan 👇
https://news.1rj.ru/str/sqlspecialist/752

Python Learning Plan 👇
https://news.1rj.ru/str/sqlspecialist/749

Power BI Learning Plan 👇
https://news.1rj.ru/str/sqlspecialist/745

SQL Learning Plan 👇
https://news.1rj.ru/str/sqlspecialist/738

SQL Learning Series 👇
https://news.1rj.ru/str/sqlspecialist/567

Excel Learning Series 👇
https://news.1rj.ru/str/sqlspecialist/664

Power BI Learning Series 👇
https://news.1rj.ru/str/sqlspecialist/768

Python Learning Series 👇
https://news.1rj.ru/str/sqlspecialist/615

Tableau Essential Topics 👇
https://news.1rj.ru/str/sqlspecialist/667

Best Data Analytics Resources 👇
https://heylink.me/DataAnalytics

You can find more resources on Medium & Linkedin

Like for more ❤️

Thanks to all who support our channel and share it with friends & loved ones. You guys are really amazing.

Hope it helps :)


Exploratory Data Analysis (EDA)

EDA is the process of analyzing datasets to summarize key patterns, detect anomalies, and gain insights before applying machine learning or reporting.

1️⃣ Descriptive Statistics
Descriptive statistics help summarize and understand data distributions.

In SQL:

Calculate Mean (Average):

SELECT AVG(salary) AS average_salary FROM employees;

Find Median (Using Window Functions; averages the two middle rows when the row count is even)

SELECT AVG(salary) AS median_salary FROM ( SELECT salary, ROW_NUMBER() OVER (ORDER BY salary) AS row_num, COUNT(*) OVER () AS total_rows FROM employees ) subquery WHERE row_num IN (FLOOR((total_rows + 1) / 2.0), FLOOR((total_rows + 2) / 2.0));

Find Mode (Most Frequent Value)

SELECT department, COUNT(*) AS employee_count FROM employees GROUP BY department ORDER BY employee_count DESC LIMIT 1;

Calculate Variance & Standard Deviation (function names vary by database; SQL Server, for example, uses VAR and STDEV)

SELECT VARIANCE(salary) AS salary_variance, STDDEV(salary) AS salary_std_dev FROM employees;

In Python (Pandas):

Mean, Median, Mode

df['salary'].mean()
df['salary'].median()
df['salary'].mode()[0]

Variance & Standard Deviation

df['salary'].var()
df['salary'].std()
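
The median is the statistic most often computed incorrectly in SQL, so it is worth testing the window-function approach on both an odd and an even number of rows. This sketch runs it in SQLite via Python's sqlite3 module and compares the result with statistics.median. It relies on SQLite's integer division; in databases where / returns a decimal (MySQL, for instance) the middle-row positions need FLOOR() or an integer-division operator instead.

```python
import sqlite3
import statistics

def sql_median(salaries):
    """Median via ROW_NUMBER(): pick the middle row, or average the middle two."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (salary INTEGER)")
    conn.executemany("INSERT INTO employees VALUES (?)", [(s,) for s in salaries])
    return conn.execute("""
        SELECT AVG(salary) FROM (
            SELECT salary,
                   ROW_NUMBER() OVER (ORDER BY salary) AS row_num,
                   COUNT(*) OVER () AS total_rows
            FROM employees
        ) AS ranked
        -- Integer division: both positions coincide when total_rows is odd.
        WHERE row_num IN ((total_rows + 1) / 2, (total_rows + 2) / 2)
    """).fetchone()[0]

odd_salaries = [30, 10, 50, 20, 40]    # hypothetical values
even_salaries = [10, 20, 30, 40]

print(sql_median(odd_salaries), statistics.median(odd_salaries))
print(sql_median(even_salaries), statistics.median(even_salaries))
```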

2️⃣ Data Visualization

Visualizing data helps identify trends, outliers, and patterns.

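In SQL (SQL does not draw charts, but it can produce the counts behind one):

Create Histogram (Approximate in SQL, using 10,000-wide salary buckets)

SELECT FLOOR(salary / 10000) * 10000 AS salary_bucket, COUNT(*) AS employee_count FROM employees GROUP BY FLOOR(salary / 10000) * 10000 ORDER BY salary_bucket;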

In Python (Matplotlib & Seaborn):

Bar Chart (Category-Wise Sales)

import matplotlib.pyplot as plt 
import seaborn as sns 
df.groupby('category')['sales'].sum().plot(kind='bar') 
plt.title('Total Sales by Category') 
plt.xlabel('Category') 
plt.ylabel('Sales') 
plt.show()

Histogram (Salary Distribution)

sns.histplot(df['salary'], bins=10, kde=True) 
plt.title('Salary Distribution') 
plt.show()

Box Plot (Outliers in Sales Data)

sns.boxplot(y=df['sales']) 
plt.title('Sales Data Outliers') 
plt.show()

Heatmap (Correlation Between Variables)

sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='coolwarm')
plt.title('Feature Correlation Heatmap')
plt.show()

3️⃣ Detecting Anomalies & Outliers

Outliers can skew results and should be identified.

In SQL:

Find records with unusually high salaries

SELECT * FROM employees WHERE salary > (SELECT AVG(salary) + 2 * STDDEV(salary) FROM employees);

In Python (Pandas & SciPy):

Using Z-Score (Values Beyond 3 Standard Deviations)

from scipy import stats
df['z_score'] = stats.zscore(df['salary'])
df_outliers = df[df['z_score'].abs() > 3]

Using IQR (Interquartile Range)

Q1 = df['salary'].quantile(0.25) 
Q3 = df['salary'].quantile(0.75) 
IQR = Q3 - Q1 
df_outliers = df[(df['salary'] < (Q1 - 1.5 * IQR)) | (df['salary'] > (Q3 + 1.5 * IQR))]
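
The two outlier rules above do not always agree, especially on small samples. The sketch below applies both to ten made-up salaries (in thousands) using only Python's statistics module. The "inclusive" quantile method is chosen because it interpolates the same way as the pandas quantile() default, and the sample standard deviation is used for the z-scores.

```python
import statistics

# Hypothetical salaries in thousands; 200 is the obvious outlier.
salaries = [40, 42, 45, 47, 50, 52, 55, 58, 60, 200]

# IQR rule: flag values more than 1.5 * IQR outside the quartiles.
q1, _, q3 = statistics.quantiles(salaries, n=4, method="inclusive")
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
iqr_outliers = [s for s in salaries if s < lower or s > upper]

# Z-score rule: flag values more than 3 standard deviations from the mean.
mean = statistics.mean(salaries)
std = statistics.stdev(salaries)
z_outliers = [s for s in salaries if abs(s - mean) / std > 3]

print(q1, q3, iqr_outliers, z_outliers)
```

Here the IQR rule catches 200 but the z-score rule flags nothing: the outlier inflates the standard deviation so much that its own z-score stays below 3. That is a good reason to prefer the IQR rule (or a lower z threshold) on small datasets.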

4️⃣ Key EDA Steps

Understand the Data → Check missing values, duplicates, and column types

Summarize Statistics → Mean, Median, Standard Deviation, etc.

Visualize Trends → Histograms, Box Plots, Heatmaps

Detect Outliers & Anomalies → Z-Score, IQR

Feature Engineering → Transform variables if needed

Mini Task for You: Write an SQL query to find employees whose salaries are more than two standard deviations above the mean salary.
