Machine Learning
Machine learning insights, practical tutorials, and clear explanations for beginners and aspiring data scientists. Follow the channel for models, algorithms, coding guides, and real-world ML applications.

Admin: @HusseinSheikho || @Hussein_Sheikho
📌 What Do Machine Learning Engineers Do?

🗂 Category: MACHINE LEARNING

🕒 Date: 2025-03-25 | ⏱️ Read time: 8 min read

Breaking down my role as a machine learning engineer
📌 From Fuzzy to Precise: How a Morphological Feature Extractor Enhances AI’s Recognition Capabilities

🗂 Category: ARTIFICIAL INTELLIGENCE

🕒 Date: 2025-03-25 | ⏱️ Read time: 22 min read

Mimicking human visual perception to truly understand objects
📌 Build Your Own AI Coding Assistant in JupyterLab with Ollama and Hugging Face

🗂 Category: ARTIFICIAL INTELLIGENCE

🕒 Date: 2025-03-24 | ⏱️ Read time: 8 min read

A step-by-step guide to creating a local coding assistant without sending your data to the…
📌 Evolving Product Operating Models in the Age of AI

🗂 Category: ARTIFICIAL INTELLIGENCE

🕒 Date: 2025-03-21 | ⏱️ Read time: 14 min read

This article explores how the product operating model, and the core competencies of empowered product…
📌 No More Tableau Downtime: Metadata API for Proactive Data Health

🗂 Category: DATA SCIENCE

🕒 Date: 2025-03-21 | ⏱️ Read time: 14 min read

Leverage the power of the Metadata API to act on any potential data disruptions
📌 What Germany Currently Is Up To, Debt-Wise

🗂 Category: DATA SCIENCE

🕒 Date: 2025-03-21 | ⏱️ Read time: 6 min read

Billions, visualized to scale using Python and HTML
📌 Google’s Data Science Agent: Can It Really Do Your Job?

🗂 Category: ARTIFICIAL INTELLIGENCE

🕒 Date: 2025-03-21 | ⏱️ Read time: 11 min read

I tested Google’s Data Science Agent in Colab—here’s what it got right (and where it…
In Python, handling CSV files is straightforward: use the built-in csv module for reading and writing tabular data, or pandas for more advanced analysis. Both are essential for data-processing tasks such as importing and exporting datasets in interviews.

# Reading CSV with the csv module (basic)
import csv

with open('data.csv', 'r', newline='') as file:
    reader = csv.reader(file)
    data = list(reader)  # e.g. [['Name', 'Age'], ['Alice', '30'], ['Bob', '25']]

# Writing CSV with the csv module
with open('output.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(['Name', 'Age'])                # header
    writer.writerows([['Alice', 30], ['Bob', 25]])  # data rows

# Advanced: reading with pandas (handles headers, missing values)
import pandas as pd

df = pd.read_csv('data.csv')  # DataFrame with columns 'Name' and 'Age'
print(df.head())              # preview of the first 5 rows

# Writing with pandas
df.to_csv('output.csv', index=False)  # save without the row index
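
A common interview follow-up is row-wise access by column name, which csv.DictReader handles cleanly. A minimal sketch, assuming the same 'data.csv' with Name/Age columns as above:

# Reading rows as dictionaries keyed by the header row
import csv

with open('data.csv', 'r', newline='') as file:
    for row in csv.DictReader(file):
        print(row['Name'], row['Age'])  # e.g. Alice 30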


#python #csv #pandas #datahandling #fileio #interviewtips

👉 @DataScience4
📌 Data Visualization Explained (Part 4): A Review of Python Essentials

🗂 Category: DATA SCIENCE

🕒 Date: 2025-10-25 | ⏱️ Read time: 8 min read

Learn the foundations of Python to take your data visualization game to the next level.
📌 Building a Geospatial Lakehouse with Open Source and Databricks

🗂 Category: DATA ENGINEERING

🕒 Date: 2025-10-25 | ⏱️ Read time: 10 min read

An example workflow for vector geospatial data science
📌 Agentic AI from First Principles: Reflection

🗂 Category: AGENTIC AI

🕒 Date: 2025-10-24 | ⏱️ Read time: 21 min read

From theory to code: building feedback loops that improve LLM accuracy
📌 How to Consistently Extract Metadata from Complex Documents

🗂 Category: LLM APPLICATIONS

🕒 Date: 2025-10-24 | ⏱️ Read time: 8 min read

Learn how to extract important pieces of information from your documents
📌 Choosing the Best Model Size and Dataset Size under a Fixed Budget for LLMs

🗂 Category: LARGE LANGUAGE MODELS

🕒 Date: 2025-10-24 | ⏱️ Read time: 5 min read

A small-scale exploration using Tiny Transformers
📌 Deploy an OpenAI Agent Builder Chatbot to a Website

🗂 Category: AGENTIC AI

🕒 Date: 2025-10-24 | ⏱️ Read time: 12 min read

Using OpenAI’s Agent Builder ChatKit
📌 When Transformers Sing: Adapting SpectralKD for Text-Based Knowledge Distillation

🗂 Category: ARTIFICIAL INTELLIGENCE

🕒 Date: 2025-10-23 | ⏱️ Read time: 8 min read

Exploring the frequency fingerprints of Transformers to guide smarter knowledge distillation
📌 How to Keep AI Costs Under Control

🗂 Category: ARTIFICIAL INTELLIGENCE

🕒 Date: 2025-10-23 | ⏱️ Read time: 4 min read

Lessons from Scaling LLMs
Sepp Hochreiter, who invented the LSTM 30+ years ago, gave a keynote talk at NeurIPS 2024 and introduced xLSTM (Extended Long Short-Term Memory).

I designed this Excel exercise to help you understand how xLSTM works.

More: https://www.byhand.ai/p/xlstm
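
As a rough companion to the exercise, here is a minimal scalar sketch of the exponential gating at the heart of xLSTM's sLSTM cell. This is my own simplified reading of the idea; the variable names, toy weights, and single-feature form are assumptions, not the authors' reference code:

# Simplified sLSTM-style step with exponential gating and a normalizer state
import math

def slstm_step(x, h_prev, c_prev, n_prev, w):
    i_t = math.exp(w['wi'] * x + w['ri'] * h_prev)   # exponential input gate
    f_t = math.exp(w['wf'] * x + w['rf'] * h_prev)   # exponential forget gate
    z_t = math.tanh(w['wz'] * x + w['rz'] * h_prev)  # cell input
    o_t = 1 / (1 + math.exp(-(w['wo'] * x + w['ro'] * h_prev)))  # output gate

    c_t = f_t * c_prev + i_t * z_t   # cell state update
    n_t = f_t * n_prev + i_t         # normalizer keeps the exponential gates stable
    h_t = o_t * (c_t / n_t)          # normalized hidden state
    return h_t, c_t, n_t

w = {k: 0.1 for k in ['wi', 'ri', 'wf', 'rf', 'wz', 'rz', 'wo', 'ro']}
h, c, n = 0.0, 0.0, 1.0
for x in [0.5, -0.2, 0.8]:
    h, c, n = slstm_step(x, h, c, n, w)
print(h)

The real cell adds a stabilizer term so the exponentials cannot overflow, and the mLSTM variant extends the scalar memory to a matrix; see the linked exercise for the full walkthrough.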
In Python, image processing unlocks powerful capabilities for computer vision, data augmentation, and automation—master these techniques to excel in ML engineering interviews and real-world applications! 🖼

# PIL/Pillow Basics - The essential image library
from PIL import Image

# Open and display image
img = Image.open("input.jpg")
img.show()

# Convert formats
img.save("output.png")
img.convert("L").save("grayscale.jpg") # RGB to grayscale

# Basic transformations
img.rotate(90).save("rotated.jpg")
img.resize((300, 300)).save("resized.jpg")
img.transpose(Image.FLIP_LEFT_RIGHT).save("mirrored.jpg")


# Advanced Manipulation - Professional editing
from PIL import ImageEnhance, ImageFilter

# Adjust brightness/contrast
enhancer = ImageEnhance.Brightness(img)
bright_img = enhancer.enhance(1.5) # 50% brighter

# Apply filters
blurred = img.filter(ImageFilter.BLUR)
sharpened = img.filter(ImageFilter.SHARPEN)
edges = img.filter(ImageFilter.FIND_EDGES)

# Color manipulation
color_enhancer = ImageEnhance.Color(img)
color_enhancer.enhance(2.0).save("vibrant.jpg") # Double saturation


# OpenCV Integration - Computer vision powerhouse
import cv2
import numpy as np

# Read and convert color spaces
cv_img = cv2.imread("input.jpg")
rgb_img = cv2.cvtColor(cv_img, cv2.COLOR_BGR2RGB)
hsv_img = cv2.cvtColor(cv_img, cv2.COLOR_BGR2HSV)

# Edge detection (Canny algorithm)
edges = cv2.Canny(cv_img, 100, 200)
cv2.imwrite("edges.jpg", edges)

# Face detection (interview favorite)
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
gray = cv2.cvtColor(cv_img, cv2.COLOR_BGR2GRAY)  # Haar cascades work on grayscale
faces = face_cascade.detectMultiScale(gray, 1.3, 5)
for (x, y, w, h) in faces:
    cv2.rectangle(cv_img, (x, y), (x + w, y + h), (255, 0, 0), 2)
cv2.imwrite("faces.jpg", cv_img)


# Batch Processing - Production automation
import os
from PIL import Image

def process_images(input_dir, output_dir):
    os.makedirs(output_dir, exist_ok=True)
    for filename in os.listdir(input_dir):
        if filename.lower().endswith(('.png', '.jpg', '.jpeg')):
            with Image.open(os.path.join(input_dir, filename)) as img:
                # Resize in place while maintaining aspect ratio
                img.thumbnail((800, 800))
                # Apply watermark in the bottom-right corner
                watermark = Image.open("watermark.png")
                img.paste(watermark,
                          (img.width - watermark.width, img.height - watermark.height),
                          watermark)
                img.save(os.path.join(output_dir, filename))

process_images("raw_photos", "processed")


# Image Augmentation - Deep learning preparation
from torchvision import transforms

transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.RandomRotation(15),
    transforms.Resize((224, 224)),
    transforms.ToTensor()
])

# Apply to dataset
augmented_img = transform(img)


# EXIF Data Handling - Privacy/security critical
from PIL import Image

img = Image.open("photo_with_gps.jpg")

# Strip metadata (security interview question)
data = list(img.getdata())
clean_img = Image.new(img.mode, img.size)
clean_img.putdata(data)
clean_img.save("clean.jpg", "JPEG", exif=b"")

# Read specific capture metadata (stored in the Exif sub-IFD)
exif = img.getexif()
exif_ifd = exif.get_ifd(0x8769)  # 0x8769 = pointer to the Exif IFD
if 36867 in exif_ifd:            # 36867 = DateTimeOriginal
    print(exif_ifd[36867])


# Image Segmentation - Advanced computer vision
import numpy as np
import cv2

img = cv2.imread('input.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, thresh = cv2.threshold(gray, 180, 255, cv2.THRESH_BINARY_INV)

# Morphological operations
kernel = np.ones((2,2), np.uint8)
dilated = cv2.dilate(thresh, kernel, iterations=1)

# Find contours
contours, _ = cv2.findContours(dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for cnt in contours:
    area = cv2.contourArea(cnt)
    if area > 100:  # filter out small contours
        x, y, w, h = cv2.boundingRect(cnt)
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite("segmented.jpg", img)

# Memory Optimization - Handle large images
from PIL import Image, ImageFilter

# Process a large image region by region instead of all at once
with Image.open("huge_image.tiff") as img:
    tile_size = 1024
    for y in range(0, img.height, tile_size):
        for x in range(0, img.width, tile_size):
            box = (x, y, min(x + tile_size, img.width), min(y + tile_size, img.height))
            tile = img.crop(box)                            # work on one tile at a time
            processed_tile = tile.filter(ImageFilter.SHARPEN)
            img.paste(processed_tile, (x, y))               # paste the result back
    img.save("optimized.tiff")


# Async Processing - Modern Python requirement
import asyncio
from PIL import Image

async def process_image_async(filename):
    loop = asyncio.get_running_loop()
    # Run the blocking PIL work in a thread pool executor
    return await loop.run_in_executor(
        None,
        lambda: Image.open(filename).resize((500, 500)).save(f"thumb_{filename}")
    )

async def main():
    tasks = [process_image_async(f) for f in ["img1.jpg", "img2.jpg", "img3.jpg"]]
    await asyncio.gather(*tasks)

asyncio.run(main())


# Cloud Integration - Production pipeline
from google.cloud import storage
from PIL import Image
import io

def process_gcs_image(bucket_name, source_blob, destination_blob):
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)

    # Download from GCS
    blob = bucket.blob(source_blob)
    img_data = blob.download_as_bytes()
    img = Image.open(io.BytesIO(img_data))

    # Process image
    img = img.convert("RGB").resize((1024, 1024))

    # Upload back to GCS
    buffer = io.BytesIO()
    img.save(buffer, "JPEG")
    bucket.blob(destination_blob).upload_from_string(
        buffer.getvalue(),
        content_type="image/jpeg"
    )

process_gcs_image("user-photos", "raw/photo.jpg", "processed/photo.jpg")


# Dockerized Service - Deployment pattern
# Dockerfile snippet:
# FROM python:3.10-slim
# RUN pip install pillow opencv-python
# COPY image_service.py /app/
# CMD ["python", "/app/image_service.py"]

# image_service.py
from http.server import HTTPServer, BaseHTTPRequestHandler
from PIL import Image
import io

class ImageHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        content_len = int(self.headers['Content-Length'])
        img_data = self.rfile.read(content_len)

        # Process image
        img = Image.open(io.BytesIO(img_data))
        img = img.convert("RGB")   # JPEG cannot store an alpha channel
        img = img.resize((800, 800))

        # Return processed image
        buffer = io.BytesIO()
        img.save(buffer, "JPEG")
        self.send_response(200)
        self.send_header("Content-Type", "image/jpeg")
        self.end_headers()
        self.wfile.write(buffer.getvalue())

HTTPServer(('', 8000), ImageHandler).serve_forever()


# Performance Benchmarking - Optimization proof
import time
from PIL import Image

# Compare resize methods
start = time.time()
for _ in range(100):
    Image.open("input.jpg").resize((500, 500), Image.LANCZOS)
lanczos_time = time.time() - start

start = time.time()
for _ in range(100):
    Image.open("input.jpg").resize((500, 500), Image.NEAREST)
nearest_time = time.time() - start

print(f"LANCZOS: {lanczos_time:.2f}s | NEAREST: {nearest_time:.2f}s")
# Example output: LANCZOS: 4.20s | NEAREST: 1.80s (quality vs. speed tradeoff)


# Image Hashing - Deduplication solution
import os
import imagehash
from PIL import Image

def find_duplicates(image_dir, threshold=5):
    hashes = {}
    for filename in os.listdir(image_dir):
        img = Image.open(os.path.join(image_dir, filename))
        img_hash = imagehash.phash(img)   # perceptual hash

        # Group with the first existing hash within the Hamming-distance threshold
        for existing_hash, files in hashes.items():
            if img_hash - existing_hash < threshold:
                files.append(filename)
                break
        else:
            hashes[img_hash] = [filename]

    return [files for files in hashes.values() if len(files) > 1]

duplicates = find_duplicates("user_uploads")

# OCR Processing - Text extraction
import pytesseract
from PIL import Image

# Configure Tesseract path (if needed)
# pytesseract.pytesseract.tesseract_cmd = '/usr/bin/tesseract'

img = Image.open("document.jpg")
text = pytesseract.image_to_string(img, lang='eng')
print(text[:200]) # First 200 characters

# Get bounding boxes for text
data = pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)
for i, word in enumerate(data['text']):
    if word:
        x, y, w, h = data['left'][i], data['top'][i], data['width'][i], data['height'][i]
        print(f"Word: {word} | Position: ({x},{y}) Size: {w}x{h}")


# Generative Art - Creative application
from PIL import Image, ImageDraw
import random

img = Image.new('RGB', (800, 800), (255, 255, 255))
draw = ImageDraw.Draw(img)

# Generate random geometric pattern
for _ in range(1000):
    x1, y1 = random.randint(0, 800), random.randint(0, 800)
    x2, y2 = x1 + random.randint(-100, 100), y1 + random.randint(-100, 100)
    color = (random.randint(0, 255), random.randint(0, 255), random.randint(0, 255))
    draw.line([x1, y1, x2, y2], fill=color, width=random.randint(1, 5))

img.save("generative_art.jpg")


# Image Steganography - Security technique
from PIL import Image

def hide_message(image_path, message, output_path):
    img = Image.open(image_path).convert('RGB')   # work on plain RGB pixels
    # Encode the message as bits, followed by a 16-bit terminator sequence
    binary_message = ''.join(format(ord(c), '08b') for c in message) + '1111111111111110'

    pixels = list(img.getdata())
    new_pixels = []
    bit_index = 0

    for pixel in pixels:
        if bit_index < len(binary_message):
            r, g, b = pixel
            r = (r & ~1) | int(binary_message[bit_index])  # overwrite the LSB of red
            bit_index += 1
            new_pixels.append((r, g, b))
        else:
            new_pixels.append(pixel)

    img.putdata(new_pixels)
    img.save(output_path)   # PNG keeps the LSBs intact (JPEG compression would destroy them)

hide_message("cover.jpg", "Secret message", "stego.png")


# Interview Power Move: Custom Filter Implementation
import numpy as np
from PIL import Image

def apply_sepia(img):
    # Convert to a numpy array (H x W x 3)
    img_array = np.array(img.convert('RGB'), dtype=np.float64)

    # Classic sepia transform matrix
    sepia_matrix = np.array([
        [0.393, 0.769, 0.189],
        [0.349, 0.686, 0.168],
        [0.272, 0.534, 0.131]
    ])

    # Apply the matrix to every pixel and clamp to the valid range
    sepia_array = img_array @ sepia_matrix.T
    sepia_array = np.clip(sepia_array, 0, 255).astype(np.uint8)

    return Image.fromarray(sepia_array)

sepia_img = apply_sepia(Image.open("input.jpg"))
sepia_img.save("sepia.jpg")


# Pro Tip: Memory-Mapped Processing for Gigapixel Images
import numpy as np
from PIL import Image

def process_gigapixel(image_path):
    img = Image.open(image_path)
    # Back the pixel data with a file on disk instead of RAM
    mmap_file = np.memmap('temp.mmap', dtype='uint8', mode='w+',
                          shape=(img.height, img.width, 3))

    # Copy the image into the memory map chunk by chunk
    chunk_size = (1000, 1000)
    for y in range(0, img.height, chunk_size[1]):
        for x in range(0, img.width, chunk_size[0]):
            box = (x, y, min(x + chunk_size[0], img.width),
                   min(y + chunk_size[1], img.height))
            chunk = np.array(img.crop(box).convert('RGB'))
            mmap_file[y:box[3], x:box[2]] = chunk

    # Binarize; keep uint8 so Pillow can build an image from the array
    binary = np.where(mmap_file > 128, 255, 0).astype(np.uint8)
    Image.fromarray(binary).save("processed.jpg")

process_gigapixel("gigapixel.tiff")

# Real-World Case Study: E-commerce Product Pipeline
import boto3
from PIL import Image
import io

def process_product_image(s3_bucket, s3_key):
    # 1. Download from S3
    s3 = boto3.client('s3')
    response = s3.get_object(Bucket=s3_bucket, Key=s3_key)
    img = Image.open(io.BytesIO(response['Body'].read()))

    # 2. Standardize dimensions
    img = img.convert("RGB")
    img = img.resize((1200, 1200), Image.LANCZOS)

    # 3. Remove background (placeholder; in practice use rembg or AWS Rekognition)
    img = remove_background(img)

    # 4. Generate variants
    variants = {
        "web": img.resize((800, 800)),
        "mobile": img.resize((400, 400)),
        "thumbnail": img.resize((100, 100))
    }

    # 5. Upload to CDN
    base_name = s3_key.split('/')[-1].split('.')[0]
    for name, variant in variants.items():
        buffer = io.BytesIO()
        variant.save(buffer, "JPEG", quality=95)
        buffer.seek(0)   # rewind before streaming the upload
        s3.upload_fileobj(
            buffer,
            "cdn-bucket",
            f"products/{base_name}_{name}.jpg",
            ExtraArgs={'ContentType': 'image/jpeg', 'CacheControl': 'max-age=31536000'}
        )

    # 6. Generate WebP version
    webp_buffer = io.BytesIO()
    img.save(webp_buffer, "WEBP", quality=85)
    webp_buffer.seek(0)
    s3.upload_fileobj(webp_buffer, "cdn-bucket", f"products/{base_name}.webp")

process_product_image("user-uploads", "products/summer_dress.jpg")
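
For step 3, here is one possible remove_background helper built on the rembg library, which the pipeline above only names as an option. This is a hedged sketch, not the post's actual implementation, and in a real script it would be defined before the pipeline function uses it:

# Hypothetical remove_background helper using the rembg library
from rembg import remove
from PIL import Image

def remove_background(img):
    # rembg returns an RGBA image with the background made transparent
    cutout = remove(img)
    # Composite onto white so downstream JPEG variants stay valid
    background = Image.new("RGB", cutout.size, (255, 255, 255))
    background.paste(cutout, mask=cutout.split()[-1])
    return background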


By: @DataScienceM 👁

#Python #ImageProcessing #ComputerVision #Pillow #OpenCV #MachineLearning #CodingInterview #DataScience #Programming #TechJobs #DeveloperTips #AI #DeepLearning #CloudComputing #Docker #BackendDevelopment #SoftwareEngineering #CareerGrowth #TechTips #Python3