Machine Learning – Telegram
Machine Learning
39.2K subscribers
3.84K photos
32 videos
42 files
1.31K links
Machine learning insights, practical tutorials, and clear explanations for beginners and aspiring data scientists. Follow the channel for models, algorithms, coding guides, and real-world ML applications.

Admin: @HusseinSheikho || @Hussein_Sheikho
Download Telegram
# Create a copy of the original image to draw on
output_img = img.copy()

# Draw a bounding box for each detected bottle
for box in bottle_boxes:
x1, y1, x2, y2 = map(int, box)
# Draw a green rectangle around each bottle
cv2.rectangle(output_img, (x1, y1), (x2, y2), (0, 255, 0), 2)

# Add the final count as text on the image
summary_text = f"Bottle Count: {bottle_count}"
cv2.putText(output_img, summary_text, (20, 50),
cv2.FONT_HERSHEY_SIMPLEX, 1.5, (0, 0, 255), 4)

# Save the resulting image
cv2.imwrite('factory_bottles_result.jpg', output_img)

print("Result image with detections has been saved as 'factory_bottles_result.jpg'")


---

Step 6: Discussion of Results and Limitations

#Discussion #Limitations #FineTuning

Result: The code successfully uses a pre-trained YOLOv8 model to identify and count standard plastic bottles in an image. The final output provides both a numerical count and a visual confirmation of the detections.

Limitations of Pre-trained Model:
1. Occlusion: If bottles are heavily clustered or hiding behind each other, the model might miss some, leading to an undercount.
2. Unusual Shapes: The model is trained on common bottles (from the COCO dataset). If your factory produces bottles of a very unique shape or color, the model's accuracy might decrease.
3. Environmental Factors: Poor lighting, motion blur (if from a fast conveyor belt), or reflections can all negatively impact detection performance.

How to Improve (Next Steps): For a real-world, high-accuracy industrial application, you should not rely on a generic pre-trained model. The best approach is Fine-Tuning. This involves:
1. Collecting Data: Take hundreds or thousands of pictures of
your specific bottles in your actual factory environment*.
2. Annotating Data: Draw bounding boxes around every bottle in those images.
3. Training: Use this custom dataset to train (or "fine-tune") the YOLOv8 model. This teaches the model exactly what to look for in your specific use case, leading to much higher accuracy and reliability.

━━━━━━━━━━━━━━━
By: @DataScienceM
1
📌 RF-DETR Under the Hood: The Insights of a Real-Time Transformer Detection

🗂 Category: DEEP LEARNING

🕒 Date: 2025-10-31 | ⏱️ Read time: 6 min read

From rigid grids to adaptive attention, this is the evolutionary path that made detection transformers…
📌 TDS Newsletter: October Must-Reads on Agents, Python, Context Engineering, and More

🗂 Category: THE VARIABLE

🕒 Date: 2025-10-30 | ⏱️ Read time: 3 min read

A good month on TDS is one in which we get to share a wide…
📌 Long Short Term Memory (LSTM)- Improving RNNs

🗂 Category: DEEP LEARNING

🕒 Date: 2024-05-31 | ⏱️ Read time: 9 min read

How state of the art RNNs work
📌 Orchestrating a Dynamic Time-series Pipeline in Azure

🗂 Category: DATA ENGINEERING

🕒 Date: 2024-05-31 | ⏱️ Read time: 9 min read

Explore how to build, trigger, and parameterize a time-series data pipeline with ADF and Databricks,…
📌 Data Science Portfolios, Speeding Up Python, KANs, and Other May Must-Reads

🗂 Category: DATA SCIENCE

🕒 Date: 2024-05-30 | ⏱️ Read time: 4 min read

The stories that resonated the most with our community in the past month
Top 30 Pandas Functions & Methods
👇👇👇👇
Please open Telegram to view this post
VIEW IN TELEGRAM
#Pandas #DataAnalysis #Python #DataScience #Tutorial

Top 30 Pandas Functions & Methods

This lesson covers 30 essential Pandas functions for data manipulation and analysis, each with a standalone example and its output.

---

1. pd.DataFrame()
Creates a new DataFrame (a 2D labeled data structure) from various inputs like dictionaries or lists.

import pandas as pd
data = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data)
print(df)

col1  col2
0 1 3
1 2 4


---

2. pd.Series()
Creates a new Series (a 1D labeled array).

import pandas as pd
s = pd.Series([10, 20, 30, 40], name='MyNumbers')
print(s)

0    10
1 20
2 30
3 40
Name: MyNumbers, dtype: int64


---

3. pd.read_csv()
Reads data from a CSV file into a DataFrame. (Assuming a file data.csv exists).

# Create a dummy csv file first
with open('data.csv', 'w') as f:
f.write('Name,Age\nAlice,25\nBob,30')

df = pd.read_csv('data.csv')
print(df)

Name  Age
0 Alice 25
1 Bob 30


---

4. df.to_csv()
Writes a DataFrame to a CSV file.

import pandas as pd
df = pd.DataFrame({'Name': ['Charlie'], 'Age': [35]})
# index=False prevents writing the DataFrame index to the file
df.to_csv('output.csv', index=False)
# You can check that 'output.csv' has been created.
print("File 'output.csv' created.")

File 'output.csv' created.

#PandasIO #DataFrame #Series

---

5. df.head()
Returns the first n rows of the DataFrame (default is 5).

import pandas as pd
data = {'Name': ['A', 'B', 'C', 'D', 'E', 'F'], 'Value': [1, 2, 3, 4, 5, 6]}
df = pd.DataFrame(data)
print(df.head(3))

Name  Value
0 A 1
1 B 2
2 C 3


---

6. df.tail()
Returns the last n rows of the DataFrame (default is 5).

import pandas as pd
data = {'Name': ['A', 'B', 'C', 'D', 'E', 'F'], 'Value': [1, 2, 3, 4, 5, 6]}
df = pd.DataFrame(data)
print(df.tail(2))

Name  Value
4 E 5
5 F 6


---

7. df.info()
Provides a concise summary of the DataFrame, including data types and non-null values.

import pandas as pd
import numpy as np
data = {'col1': [1, 2, 3], 'col2': [4.0, 5.0, np.nan], 'col3': ['A', 'B', 'C']}
df = pd.DataFrame(data)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 col1 3 non-null int64
1 col2 2 non-null float64
2 col3 3 non-null object
dtypes: float64(1), int64(1), object(1)
memory usage: 200.0+ bytes


---

8. df.shape
Returns a tuple representing the dimensionality (rows, columns) of the DataFrame.

import pandas as pd
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6]})
print(df.shape)

(2, 3)

#DataInspection #PandasBasics

---

9. df.describe()
Generates denoscriptive statistics for numerical columns (count, mean, std, min, max, etc.).

import pandas as pd
df = pd.DataFrame({'Age': [22, 38, 26, 35, 29]})
print(df.describe())
3
Age
count 5.000000
mean 30.000000
std 6.363961
min 22.000000
25% 26.000000
50% 29.000000
75% 35.000000
max 38.000000


---

10. df.columns
Returns the column labels of the DataFrame.

import pandas as pd
df = pd.DataFrame({'Name': [], 'Age': [], 'City': []})
print(df.columns)

Index(['Name', 'Age', 'City'], dtype='object')


---

11. df.dtypes
Returns the data type of each column.

import pandas as pd
df = pd.DataFrame({'Name': ['Alice'], 'Age': [25], 'Salary': [75000.50]})
print(df.dtypes)

Name       object
Age int64
Salary float64
dtype: object


---

12. Selecting a Column
Select a single column, which returns a Pandas Series.

import pandas as pd
data = {'Name': ['Alice', 'Bob'], 'Age': [25, 30]}
df = pd.DataFrame(data)
ages = df['Age']
print(ages)

0    25
1 30
Name: Age, dtype: int64

#DataSelection #Indexing #Statistics

---

13. df.loc[]
Access a group of rows and columns by label(s) or a boolean array.

import pandas as pd
data = {'Age': [25, 30, 35], 'City': ['NY', 'LA', 'CH']}
df = pd.DataFrame(data, index=['Alice', 'Bob', 'Charlie'])
print(df.loc['Bob'])

Age     30
City LA
Name: Bob, dtype: object


---

14. df.iloc[]
Access a group of rows and columns by integer position(s).

import pandas as pd
data = {'Age': [25, 30, 35], 'City': ['NY', 'LA', 'CH']}
df = pd.DataFrame(data, index=['Alice', 'Bob', 'Charlie'])
print(df.iloc[1]) # Get the second row (index 1)

Age     30
City LA
Name: Bob, dtype: object


---

15. df.isnull()
Returns a DataFrame of the same shape with boolean values indicating if a value is missing (NaN).

import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, np.nan], 'B': [3, 4]})
print(df.isnull())

A      B
0 False False
1 True False


---

16. df.dropna()
Removes missing values.

import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, np.nan, 3], 'B': [4, 5, 6]})
cleaned_df = df.dropna()
print(cleaned_df)

A  B
0 1.0 4
2 3.0 6

#DataCleaning #MissingData

---

17. df.fillna()
Fills missing (NaN) values with a specified value or method.

import pandas as pd
import numpy as np
df = pd.DataFrame({'Score': [90, 85, np.nan, 92]})
filled_df = df.fillna(0)
print(filled_df)

Score
0 90.0
1 85.0
2 0.0
3 92.0


---

18. df.drop_duplicates()
Removes duplicate rows from the DataFrame.

import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Alice'], 'Age': [25, 30, 25]}
df = pd.DataFrame(data)
unique_df = df.drop_duplicates()
print(unique_df)

Name  Age
0 Alice 25
1 Bob 30


---

19. df.rename()
Alters axes labels (e.g., column names).

import pandas as pd
df = pd.DataFrame({'A': [1], 'B': [2]})
renamed_df = df.rename(columns={'A': 'Column_A', 'B': 'Column_B'})
print(renamed_df)

Column_A  Column_B
0 1 2


---

20. series.value_counts()
Returns a Series containing counts of unique values.
import pandas as pd
s = pd.Series(['A', 'B', 'A', 'C', 'A', 'B'])
print(s.value_counts())

A    3
B 2
C 1
dtype: int64

#DataManipulation #Transformation

---

21. series.unique()
Returns an array of unique values in a Series.

import pandas as pd
s = pd.Series(['A', 'B', 'A', 'C', 'A', 'B'])
print(s.unique())

['A' 'B' 'C']


---

22. df.sort_values()
Sorts a DataFrame by the values of one or more columns.

import pandas as pd
data = {'Name': ['Charlie', 'Alice', 'Bob'], 'Age': [35, 25, 30]}
df = pd.DataFrame(data)
sorted_df = df.sort_values(by='Age')
print(sorted_df)

Name  Age
1 Alice 25
2 Bob 30
0 Charlie 35


---

23. df.groupby()
Groups a DataFrame using a mapper or by a Series of columns for aggregation.

import pandas as pd
data = {'Dept': ['HR', 'IT', 'HR', 'IT'], 'Salary': [70, 85, 75, 90]}
df = pd.DataFrame(data)
grouped = df.groupby('Dept').mean()
print(grouped)

Salary
Dept
HR 72.5
IT 87.5


---

24. df.agg()
Applies one or more aggregations over the specified axis.

import pandas as pd
data = {'Dept': ['HR', 'IT', 'HR', 'IT'], 'Salary': [70, 85, 75, 90]}
df = pd.DataFrame(data)
agg_results = df.groupby('Dept')['Salary'].agg(['mean', 'sum'])
print(agg_results)

mean  sum
Dept
HR 72.5 145
IT 87.5 175

#Aggregation #Grouping #Sorting

---

25. df.apply()
Applies a function along an axis of the DataFrame.

import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})
# Apply a function to double each value in column 'A'
df['A_doubled'] = df['A'].apply(lambda x: x * 2)
print(df)

A   B  A_doubled
0 1 10 2
1 2 20 4
2 3 30 6


---

26. pd.merge()
Merges two DataFrames based on a common column or index, similar to a SQL join.

import pandas as pd
df1 = pd.DataFrame({'ID': [1, 2], 'Name': ['Alice', 'Bob']})
df2 = pd.DataFrame({'ID': [1, 2], 'Role': ['Engineer', 'Analyst']})
merged_df = pd.merge(df1, df2, on='ID')
print(merged_df)

ID   Name      Role
0 1 Alice Engineer
1 2 Bob Analyst


---

27. pd.concat()
Concatenates (stacks) pandas objects along a particular axis.

import pandas as pd
df1 = pd.DataFrame({'A': ['A0'], 'B': ['B0']})
df2 = pd.DataFrame({'A': ['A1'], 'B': ['B1']})
concatenated_df = pd.concat([df1, df2])
print(concatenated_df)

A   B
0 A0 B0
0 A1 B1


---

28. df.pivot_table()
Creates a spreadsheet-style pivot table as a DataFrame.
2
import pandas as pd
data = {'Date': ['2023-01-01', '2023-01-01', '2023-01-02'],
'City': ['NY', 'LA', 'NY'],
'Sales': [100, 150, 120]}
df = pd.DataFrame(data)
pivot = df.pivot_table(values='Sales', index='Date', columns='City')
print(pivot)

City             LA     NY
Date
2023-01-01 150.0 100.0
2023-01-02 NaN 120.0

#CombiningData #PivotTable

---

29. df.set_index()
Sets one or more existing columns as the DataFrame index.

import pandas as pd
data = {'ID': ['a1', 'a2'], 'Name': ['Alice', 'Bob']}
df = pd.DataFrame(data)
df_indexed = df.set_index('ID')
print(df_indexed)

Name
ID
a1 Alice
a2 Bob


---

30. df.reset_index()
Resets the index of the DataFrame, making the old index a new column.

import pandas as pd
data = {'Name': ['Alice', 'Bob']}
df = pd.DataFrame(data, index=['a1', 'a2'])
df_reset = df.reset_index()
print(df_reset)

index   Name
0 a1 Alice
1 a2 Bob

#Indexing #PandasTips

━━━━━━━━━━━━━━━
By: @DataScienceM
8
📌 The History of Bodybuilding Through Network Visualization

🗂 Category: DATA SCIENCE

🕒 Date: 2024-05-30 | ⏱️ Read time: 5 min read

Constructing the Shared Podium Graph of Mr. Olympia Winners (1965-2023) using Python and Gephi.
Top 30 MATLAB Image Processing Functions

#MATLAB #ImageProcessing #Basics

👇👇👇👇👇
Please open Telegram to view this post
VIEW IN TELEGRAM
Top 30 MATLAB Image Processing Functions

#MATLAB #ImageProcessing #Basics

#1. imread()
Reads an image from a file into a matrix.

img = imread('peppers.png');
disp('Image "peppers.png" loaded into variable "img".');

Image "peppers.png" loaded into variable "img".


#2. imshow()
Displays an image in a figure window.

img = imread('peppers.png');
imshow(img);
noscript('Peppers Image');

Output: A new figure window opens, displaying the 'peppers.png' image with the noscript "Peppers Image".


#3. imwrite()
Writes an image matrix to a file.

img = imread('cameraman.tif');
imwrite(img, 'my_cameraman.jpg');
disp('Image saved as my_cameraman.jpg');

Image saved as my_cameraman.jpg


#4. size()
Returns the dimensions of the image matrix (rows, columns, color channels).

rgb_img = imread('peppers.png');
gray_img = imread('cameraman.tif');
size_rgb = size(rgb_img);
size_gray = size(gray_img);
disp(['Size of RGB image: ', num2str(size_rgb)]);
disp(['Size of grayscale image: ', num2str(size_gray)]);

Size of RGB image: 384   512     3
Size of grayscale image: 256 256


#5. rgb2gray()
Converts an RGB color image to a grayscale intensity image.

rgb_img = imread('peppers.png');
gray_img = rgb2gray(rgb_img);
imshow(gray_img);
noscript('Grayscale Peppers');

Output: A figure window displays the grayscale version of the peppers image.

---
#MATLAB #ImageProcessing #Conversion #Transformation

#6. im2double()
Converts an image to double-precision format, scaling data to the range [0, 1].

img_uint8 = imread('cameraman.tif');
img_double = im2double(img_uint8);
disp(['Max value of original image: ', num2str(max(img_uint8(:)))]);
disp(['Max value of double image: ', num2str(max(img_double(:)))]);

Max value of original image: 253
Max value of double image: 0.99216


#7. imresize()
Resizes an image to a specified size.

img = imread('cameraman.tif');
resized_img = imresize(img, 0.5); % Resize to 50% of original size
imshow(resized_img);
noscript('Resized Cameraman');

Output: A figure window displays the cameraman image at half its original size.


#8. imrotate()
Rotates an image by a specified angle.

img = imread('cameraman.tif');
rotated_img = imrotate(img, 30, 'bilinear', 'crop');
imshow(rotated_img);
noscript('Rotated 30 Degrees');

Output: A figure window displays the cameraman image rotated by 30 degrees, cropped to the original size.


#9. imcrop()
Crops an image to a specified rectangle.

img = imread('peppers.png');
% [xmin ymin width height]
cropped_img = imcrop(img, [100 80 250 200]);
imshow(cropped_img);
noscript('Cropped Image');

Output: A figure window displays only the rectangular section specified from the peppers image.


#10. rgb2hsv()
Converts an RGB image to the Hue-Saturation-Value (HSV) color space.

rgb_img = imread('peppers.png');
hsv_img = rgb2hsv(rgb_img);
hue_channel = hsv_img(:,:,1); % Extract the Hue channel
imshow(hue_channel);
noscript('Hue Channel of Peppers Image');

Output: A figure window displays the Hue channel of the peppers image as a grayscale image.

---
#MATLAB #ImageProcessing #Enhancement

#11. imhist()
Displays the histogram of an image, showing the distribution of pixel intensity values.
1
gray_img = imread('pout.tif');
imhist(gray_img);
noscript('Histogram of a Low-Contrast Image');

Output: A figure window with a bar chart showing the intensity distribution of the 'pout.tif' image.


#12. histeq()
Enhances contrast using histogram equalization.

low_contrast_img = imread('pout.tif');
high_contrast_img = histeq(low_contrast_img);
imshow(high_contrast_img);
noscript('Histogram Equalized Image');

Output: A figure window displays a higher contrast version of the 'pout.tif' image.


#13. imadjust()
Adjusts image intensity values or colormap by mapping intensity values to new values.

img = imread('cameraman.tif');
adjusted_img = imadjust(img, [0.3 0.7], []);
imshow(adjusted_img);
noscript('Intensity Adjusted Image');

Output: A figure window showing a high-contrast version of the cameraman image, where intensities between 0.3 and 0.7 are stretched to the full [0, 1] range.


#14. imtranslate()
Translates (shifts) an image horizontally and vertically.

img = imread('cameraman.tif');
translated_img = imtranslate(img, [25, 15]); % Shift 25 pixels right, 15 pixels down
imshow(translated_img);
noscript('Translated Image');

Output: A figure window shows the cameraman image shifted to the right and down.


#15. imsharpen()
Sharpens an image using the unsharp masking method.

img = imread('peppers.png');
sharpened_img = imsharpen(img);
imshow(sharpened_img);
noscript('Sharpened Image');

Output: A figure window displays a crisper, more detailed version of the peppers image.

---
#MATLAB #ImageProcessing #Filtering #Noise

#16. imnoise()
Adds a specified type of noise to an image.

img = imread('cameraman.tif');
noisy_img = imnoise(img, 'salt & pepper', 0.02);
imshow(noisy_img);
noscript('Image with Salt & Pepper Noise');

Output: A figure window displays the cameraman image with random white and black pixels (noise).


#17. fspecial()
Creates a predefined 2-D filter kernel (e.g., for averaging, Gaussian blur, Laplacian).

h = fspecial('motion', 20, 45); % Create a motion blur filter
disp('Generated a 2D motion filter kernel.');
disp(h);

Generated a 2D motion filter kernel.
(Output is a matrix representing the filter kernel)


#18. imfilter()
Filters a multidimensional image with a specified filter kernel.

img = imread('cameraman.tif');
h = fspecial('motion', 20, 45);
motion_blur_img = imfilter(img, h, 'replicate');
imshow(motion_blur_img);
noscript('Motion Blurred Image');

Output: A figure window shows the cameraman image with a motion blur effect applied at a 45-degree angle.


#19. medfilt2()
Performs 2-D median filtering, which is excellent for removing 'salt & pepper' noise.

noisy_img = imnoise(imread('cameraman.tif'), 'salt & pepper', 0.02);
denoised_img = medfilt2(noisy_img);
imshow(denoised_img);
noscript('Denoised with Median Filter');

Output: A figure window shows the noisy image significantly cleaned up, with most salt & pepper noise removed.


#20. edge()
Finds edges in an intensity image using various algorithms (e.g., Sobel, Canny).
img = imread('cameraman.tif');
edges = edge(img, 'Canny');
imshow(edges);
noscript('Edges found with Canny Detector');

Output: A figure window displays a binary image showing only the detected edges from the original image in white.

---
#MATLAB #ImageProcessing #Segmentation #Morphology

#21. graythresh()
Computes a global image threshold from a grayscale image using Otsu's method.

img = imread('coins.png');
level = graythresh(img);
disp(['Optimal threshold level (Otsu): ', num2str(level)]);

Optimal threshold level (Otsu): 0.49412


#22. imbinarize()
Converts a grayscale image to a binary image based on a threshold.

img = imread('coins.png');
level = graythresh(img); % Find optimal threshold
bw_img = imbinarize(img, level);
imshow(bw_img);
noscript('Binarized Image (Otsu Method)');

Output: A figure window displays a black and white image of the coins.


#23. strel()
Creates a morphological structuring element (SE), which is used to probe an image in morphological operations.

se = strel('disk', 5);
disp('Created a disk-shaped structuring element with radius 5.');
disp(se);

Created a disk-shaped structuring element with radius 5.
(Output describes the strel object and shows its matrix representation)


#24. imdilate()
Dilates a binary image, making objects larger and filling small holes.

img = imread('text.png');
se = strel('line', 3, 90); % A vertical line SE
dilated_img = imdilate(img, se);
imshow(dilated_img);
noscript('Dilated Text');

Output: A figure window shows the text characters appearing thicker, especially in the vertical direction.


#25. imerode()
Erodes a binary image, shrinking objects and removing small noise.

img = imread('text.png');
se = strel('line', 3, 0); % A horizontal line SE
eroded_img = imerode(img, se);
imshow(eroded_img);
noscript('Eroded Text');

Output: A figure window shows the text characters appearing thinner, with horizontal parts possibly disappearing.

---
#MATLAB #ImageProcessing #Analysis

#26. imopen()
Performs morphological opening (erosion followed by dilation). It smooths contours and removes small objects.

original = imread('circbw.tif');
se = strel('disk', 10);
opened_img = imopen(original, se);
imshow(opened_img);
noscript('Morphologically Opened Image');

Output: A figure window displays the image with small protrusions removed and gaps between objects widened.


#27. bwareaopen()
Removes all connected components (objects) from a binary image that have fewer than a specified number of pixels.

img = imread('text.png');
cleaned_img = bwareaopen(img, 50); % Remove objects with fewer than 50 pixels
imshow(cleaned_img);
noscript('Image after removing small objects');

Output: A figure window shows the text image with small noise specks or broken parts of characters removed.


#28. bwlabel()
Labels connected components in a binary image.

img = imread('text.png');
[L, num] = bwlabel(img);
disp(['Number of connected objects found: ', num2str(num)]);

Number of connected objects found: 114


#29. regionprops()
Measures a set of properties for each labeled region in an image.
img = imread('coins.png');
bw = imbinarize(img);
stats = regionprops('table', bw, 'Centroid', 'MajorAxisLength', 'MinorAxisLength');
disp('Properties of the first 3 coins:');
disp(stats(1:3,:));

Properties of the first 3 coins:
Centroid MajorAxisLength MinorAxisLength
________ _______________ _______________

52.715 26.839 48.24 47.369
134.5 27.067 48.745 47.534
215.01 29.805 47.854 47.502


#30. hough()
Performs the Hough transform, used to detect lines in an image.

img = imrotate(imread('circuit.tif'), 33, 'crop');
edges = edge(img, 'canny');
[H, T, R] = hough(edges);
imshow(imadjust(rescale(H)), 'XData', T, 'YData', R, ...
'InitialMagnification', 'fit');
xlabel('\theta'), ylabel('\rho');
noscript('Hough Transform of Circuit');

Output: A figure window displaying the Hough transform as an image, where bright spots correspond to potential lines in the original image.


━━━━━━━━━━━━━━━
By: @DataScienceM
2
📌 Constructive Heuristics in Discrete Optimization

🗂 Category:

🕒 Date: 2024-05-30 | ⏱️ Read time: 13 min read

Obtain initial solutions for combinatorial optimization problems with Python examples
Top 100 Data Analyst Interview Questions & Answers

👇👇👇👇👇
Please open Telegram to view this post
VIEW IN TELEGRAM
Top 100 Data Analyst Interview Questions & Answers

#DataAnalysis #InterviewQuestions #SQL #Python #Statistics #CaseStudy #DataScience

Part 1: SQL Questions (Q1-30)

#1. What is the difference between DELETE, TRUNCATE, and DROP?
A:
DELETE is a DML command that removes rows from a table based on a WHERE clause. It is slower as it logs each row deletion and can be rolled back.
TRUNCATE is a DDL command that quickly removes all rows from a table. It is faster, cannot be rolled back, and resets table identity.
DROP is a DDL command that removes the entire table, including its structure, data, and indexes.

#2. Select all unique departments from the employees table.
A: Use the DISTINCT keyword.

SELECT DISTINCT department
FROM employees;


#3. Find the top 5 highest-paid employees.
A: Use ORDER BY and LIMIT.

SELECT name, salary
FROM employees
ORDER BY salary DESC
LIMIT 5;


#4. What is the difference between WHERE and HAVING?
A:
WHERE is used to filter records before any groupings are made (i.e., it operates on individual rows).
HAVING is used to filter groups after aggregations (GROUP BY) have been performed.

-- Find departments with more than 10 employees
SELECT department, COUNT(employee_id)
FROM employees
GROUP BY department
HAVING COUNT(employee_id) > 10;


#5. What are the different types of SQL joins?
A:
(INNER) JOIN: Returns records that have matching values in both tables.
LEFT (OUTER) JOIN: Returns all records from the left table, and the matched records from the right table.
RIGHT (OUTER) JOIN: Returns all records from the right table, and the matched records from the left table.
FULL (OUTER) JOIN: Returns all records when there is a match in either the left or right table.
SELF JOIN: A regular join, but the table is joined with itself.

#6. Write a query to find the second-highest salary.
A: Use OFFSET or a subquery.

-- Method 1: Using OFFSET
SELECT salary
FROM employees
ORDER BY salary DESC
LIMIT 1 OFFSET 1;

-- Method 2: Using a Subquery
SELECT MAX(salary)
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);


#7. Find duplicate emails in a customers table.
A: Group by the email column and use HAVING to find groups with a count greater than 1.

SELECT email, COUNT(email)
FROM customers
GROUP BY email
HAVING COUNT(email) > 1;


#8. What is a primary key vs. a foreign key?
A:
• A Primary Key is a constraint that uniquely identifies each record in a table. It must contain unique values and cannot contain NULL values.
• A Foreign Key is a key used to link two tables together. It is a field (or collection of fields) in one table that refers to the Primary Key in another table.

#9. Explain Window Functions. Give an example.
A: Window functions perform a calculation across a set of table rows that are somehow related to the current row. Unlike aggregate functions, they do not collapse rows.

-- Rank employees by salary within each department
SELECT
name,
department,
salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) as dept_rank
FROM employees;


#10. What is a CTE (Common Table Expression)?
A: A CTE is a temporary, named result set that you can reference within a SELECT, INSERT, UPDATE, or DELETE statement. It helps improve readability and break down complex queries.
WITH DepartmentSales AS (
SELECT department, SUM(sale_amount) as total_sales
FROM sales
GROUP BY department
)
SELECT department, total_sales
FROM DepartmentSales
WHERE total_sales > 100000;

---
#11. Difference between UNION and UNION ALL?
A:
UNION combines the result sets of two or more SELECT statements and removes duplicate rows.
UNION ALL also combines result sets but includes all rows, including duplicates. It is faster because it doesn't check for duplicates.

#12. How would you find the total number of employees in each department?
A: Use COUNT() with GROUP BY.

SELECT department, COUNT(employee_id) as number_of_employees
FROM employees
GROUP BY department;


#13. What is the difference between RANK() and DENSE_RANK()?
A:
RANK() assigns a rank to each row within a partition. If there are ties, it skips the next rank(s). (e.g., 1, 2, 2, 4)
DENSE_RANK() also assigns ranks, but it does not skip any ranks in case of ties. (e.g., 1, 2, 2, 3)

#14. Write a query to get the Nth highest salary.
A: Use DENSE_RANK() in a CTE.

WITH SalaryRanks AS (
SELECT
salary,
DENSE_RANK() OVER (ORDER BY salary DESC) as rnk
FROM employees
)
SELECT salary
FROM SalaryRanks
WHERE rnk = 5; -- For the 5th highest salary


#15. What is COALESCE() used for?
A: The COALESCE() function returns the first non-NULL value in a list of expressions. It's useful for providing default values for nulls.

SELECT name, COALESCE(commission, 0) as commission
FROM employees; -- Replaces NULL commissions with 0

---
#16. How would you select all employees whose name starts with 'A'?
A: Use the LIKE operator with a wildcard (%).

SELECT name
FROM employees
WHERE name LIKE 'A%';


#17. Get the current date and time.
A: This is function-dependent on the SQL dialect.
• PostgreSQL/MySQL: NOW()
• SQL Server: GETDATE()

SELECT NOW();


#18. How can you extract the month from a date?
A: Use the EXTRACT function or MONTH().

-- Standard SQL
SELECT EXTRACT(MONTH FROM '2023-10-27');
-- MySQL
SELECT MONTH('2023-10-27');


#19. What is a subquery? What are the types?
A: A subquery is a query nested inside another query.
Scalar Subquery: Returns a single value (one row, one column).
Multi-row Subquery: Returns multiple rows.
Correlated Subquery: An inner query that depends on the outer query for its values. It is evaluated once for each row processed by the outer query.

#20. Write a query to find all employees who work in the 'Sales' department.
A: Use a JOIN or a subquery.

-- Using JOIN (preferred)
SELECT e.name
FROM employees e
JOIN departments d ON e.department_id = d.id
WHERE d.name = 'Sales';

---
#21. How would you calculate the month-over-month growth rate of sales?
A: Use the LAG() window function to get the previous month's sales and then apply the growth formula.

WITH MonthlySales AS (
SELECT
DATE_TRUNC('month', order_date)::DATE as sales_month,
SUM(sale_amount) as total_sales
FROM sales
GROUP BY 1
)
SELECT
sales_month,
total_sales,
(total_sales - LAG(total_sales, 1) OVER (ORDER BY sales_month)) / LAG(total_sales, 1) OVER (ORDER BY sales_month) * 100 as growth_rate
FROM MonthlySales;