🤖🧠 HunyuanWorld-Mirror: Tencent’s Breakthrough in Universal 3D Reconstruction
🗓️ 03 Nov 2025
📚 AI News & Trends
The race toward achieving universal 3D understanding has reached a significant milestone with Tencent’s HunyuanWorld-Mirror, a cutting-edge open-source model designed to revolutionize 3D reconstruction. In an era dominated by visual intelligence and immersive digital experiences, this new model stands out by offering a feed-forward, geometry-aware framework that can predict multiple 3D outputs in a single ...
#HunyuanWorld #Tencent #3DReconstruction #UniversalAI #GeometryAware #OpenSourceAI
💡 Top 50 Operations for Signal Processing in Python
Note: Most examples use numpy, scipy.signal, and matplotlib.pyplot. Assume they are imported as:
import numpy as np
from scipy import signal
import matplotlib.pyplot as plt
I. Signal Generation
• Create a time vector.
fs = 1000 # Sampling frequency
t = np.linspace(0, 1, fs, endpoint=False)
• Generate a sine wave.
freq = 50 # Hz
sine_wave = np.sin(2 * np.pi * freq * t)
• Generate a square wave.
square_wave = signal.square(2 * np.pi * freq * t)
• Generate a sawtooth wave.
sawtooth_wave = signal.sawtooth(2 * np.pi * freq * t)
• Generate Gaussian white noise.
noise = np.random.normal(0, 1, len(t))
• Generate a frequency-swept cosine (chirp).
chirp_signal = signal.chirp(t, f0=1, f1=100, t1=1, method='linear')
• Generate an impulse signal (unit impulse).
impulse = signal.unit_impulse(100, 'mid') # at index 50 of 100
• Generate a Gaussian pulse.
gaus_pulse = signal.gausspulse(t, fc=5, bw=0.5)
II. Signal Visualization & Properties
• Plot a signal.
plt.plot(t, sine_wave)
plt.xlabel("Time [s]")
plt.ylabel("Amplitude")
plt.show()
• Calculate the mean value.
mean_val = np.mean(sine_wave)
• Calculate the Root Mean Square (RMS).
rms_val = np.sqrt(np.mean(sine_wave**2))
• Calculate the standard deviation.
std_dev = np.std(sine_wave)
• Find the maximum value and its index.
max_val = np.max(sine_wave)
max_idx = np.argmax(sine_wave)
III. Frequency Domain Analysis (FFT)
• Compute the Fast Fourier Transform (FFT).
from scipy.fft import fft, fftfreq
yf = fft(sine_wave)
• Get the frequency bins for the FFT.
N = len(sine_wave)
xf = fftfreq(N, 1 / fs)[:N//2]
• Plot the magnitude spectrum.
plt.plot(xf, 2.0/N * np.abs(yf[0:N//2]))
plt.grid()
plt.show()
• Compute the Inverse FFT (IFFT).
from scipy.fft import ifft
original_signal = ifft(yf)
• Compute the Power Spectral Density (PSD) using Welch's method.
f, Pxx_den = signal.welch(sine_wave, fs, nperseg=256)  # nperseg must not exceed the signal length (1000 samples here)
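A PSD is usually inspected on a logarithmic y-axis. A minimal plotting sketch, reusing the f and Pxx_den from above:
plt.semilogy(f, Pxx_den)  # log scale suits a PSD's wide dynamic range
plt.xlabel('Frequency [Hz]')
plt.ylabel('PSD [V**2/Hz]')
plt.show()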
IV. Digital Filtering
• Design a Butterworth low-pass filter.
b, a = signal.butter(4, 100, 'low', analog=False, fs=fs)
• Apply a filter to a signal (zero-phase filtering).
noisy_signal = sine_wave + noise
filtered_signal = signal.filtfilt(b, a, noisy_signal)
• Design a Chebyshev Type I high-pass filter.
b, a = signal.cheby1(4, 5, 100, 'high', fs=fs) # 5dB ripple
• Design a Bessel band-pass filter.
b, a = signal.bessel(4, [50, 150], 'band', fs=fs)
• Design an FIR filter using a window method (applied in the sketch at the end of this section).
numtaps = 101
fir_coeffs = signal.firwin(numtaps, cutoff=100, fs=fs)
• Plot the frequency response of a filter.
w, h = signal.freqz(b, a, fs=fs)
plt.plot(w, 20 * np.log10(abs(h)))
• Apply a median filter (good for salt-and-pepper noise).
median_filtered = signal.medfilt(noisy_signal, kernel_size=3)
• Apply a Wiener filter for noise reduction.
wiener_filtered = signal.wiener(noisy_signal)
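The FIR taps from firwin above are applied with the same tools; since an FIR filter has no feedback, the denominator is just 1.0. A minimal sketch, reusing noisy_signal:
fir_filtered = signal.lfilter(fir_coeffs, 1.0, noisy_signal)  # numerator = taps, denominator = 1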
V. Resampling & Windowing
• Resample a signal to a new length.
resampled = signal.resample(sine_wave, num=500) # Resample to 500 points
• Decimate a signal (downsample by a factor).
decimated = signal.decimate(sine_wave, q=4) # Downsample by 4
• Create a Hamming window.
window = signal.windows.hamming(51)
• Apply a window to a signal segment.
segment = sine_wave[0:51]
windowed_segment = segment * window
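Why window at all? A short sketch (assuming the segment and window above) comparing FFT magnitudes; the Hamming-windowed spectrum leaks less into neighboring bins:
from scipy.fft import fft
raw_mag = np.abs(fft(segment))
win_mag = np.abs(fft(windowed_segment))
plt.semilogy(raw_mag[:26], label='rectangular')  # positive-frequency half of the 51 bins
plt.semilogy(win_mag[:26], label='Hamming')
plt.legend()
plt.show()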
VI. Convolution & Correlation
• Perform linear convolution.
sig1 = np.repeat([0., 1., 0.], 100)
sig2 = np.repeat([0., 1., 1., 0.], 100)
convolved = signal.convolve(sig1, sig2, mode='same')
• Compute cross-correlation.
# Useful for finding delays between signals; see the delay-estimation sketch at the end of this section
correlation = signal.correlate(sig1, sig2, mode='full')
• Compute auto-correlation.
# Useful for finding periodicities in a signal
autocorr = signal.correlate(sine_wave, sine_wave, mode='full')
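To turn the cross-correlation above into a delay estimate, read off the lag at the correlation peak. A minimal sketch using signal.correlation_lags (SciPy >= 1.6):
lags = signal.correlation_lags(len(sig1), len(sig2), mode='full')
delay = lags[np.argmax(correlation)]  # lag in samples where sig1 best aligns with sig2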
VII. Time-Frequency Analysis
• Compute and plot a spectrogram.
f, t_spec, Sxx = signal.spectrogram(chirp_signal, fs)
plt.pcolormesh(t_spec, f, Sxx, shading='gouraud')
plt.show()
• Perform Continuous Wavelet Transform (CWT).
widths = np.arange(1, 31)
cwt_matrix = signal.cwt(chirp_signal, signal.ricker, widths)  # note: signal.cwt and signal.ricker are deprecated (removed in SciPy 1.15); PyWavelets is the suggested replacement
• Perform Hilbert transform to get the analytic signal.
analytic_signal = signal.hilbert(sine_wave)
• Calculate instantaneous frequency.
instant_phase = np.unwrap(np.angle(analytic_signal))
instant_freq = (np.diff(instant_phase) / (2.0*np.pi) * fs)
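As a sanity check, the same analytic-signal recipe applied to the chirp from section I recovers both its envelope and its 1-to-100 Hz sweep. A minimal sketch:
analytic_chirp = signal.hilbert(chirp_signal)
envelope = np.abs(analytic_chirp)  # instantaneous amplitude
phase = np.unwrap(np.angle(analytic_chirp))
inst_freq = np.diff(phase) / (2.0 * np.pi) * fs  # rises roughly linearly from 1 to 100 Hz
plt.plot(t[1:], inst_freq)
plt.show()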
VIII. Feature Extraction
• Find peaks in a signal.
peaks, _ = signal.find_peaks(sine_wave, height=0.5)
• Find peaks with prominence criteria.
peaks_prom, _ = signal.find_peaks(noisy_signal, prominence=1)
• Differentiate a signal (e.g., to find velocity from position).
derivative = np.diff(sine_wave)  # per-sample difference; multiply by fs for units per second
• Integrate a signal.
from scipy.integrate import cumulative_trapezoid
integral = cumulative_trapezoid(sine_wave, t, initial=0)
• Detrend a signal to remove a linear trend.
trend = np.linspace(0, 1, fs)
trended_signal = sine_wave + trend
detrended = signal.detrend(trended_signal)
IX. System Analysis
• Define a system via a transfer function (numerator, denominator).
# Example: 2nd order low-pass filter
system = signal.TransferFunction([1], [1, 1, 1])
• Compute the step response of a system.
t_step, y_step = signal.step(system)
• Compute the impulse response of a system.
t_impulse, y_impulse = signal.impulse(system)
• Compute the Bode plot of a system's frequency response.
w, mag, phase = signal.bode(system)
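A minimal sketch for plotting the Bode data returned above:
fig, (ax_mag, ax_phase) = plt.subplots(2, 1, sharex=True)
ax_mag.semilogx(w, mag)      # magnitude in dB
ax_phase.semilogx(w, phase)  # phase in degrees
plt.show()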
X. Signal Generation from Data
• Generate a signal from a function.
t = np.linspace(0, 1, 500)
custom_signal = np.sinc(2 * np.pi * 4 * t)  # note: np.sinc is the normalized sinc, sin(pi*x)/(pi*x)
• Convert a list of values to a signal array.
my_data = [0, 1, 2, 3, 2, 1, 0, -1, -2, -1, 0]
data_signal = np.array(my_data)
• Read signal data from a WAV file.
from scipy.io import wavfile
samplerate, data = wavfile.read('audio.wav')
• Create a pulse train signal.
pulse_train = np.zeros(fs)
pulse_train[::100] = 1 # Impulse every 100 samples
#Python #SignalProcessing #SciPy #NumPy #DSP
━━━━━━━━━━━━━━━
By: @DataScienceM ✨
💡 Top 50 Matplotlib Commands in Python
Note: Examples assume the following imports:
import matplotlib.pyplot as plt
import numpy as np
I. Figure & Basic Plots
• Create a figure.
fig = plt.figure(figsize=(8, 6))
• Create a basic line plot.
x = np.linspace(0, 10, 100)
plt.plot(x, np.sin(x))
• Show/display the plot.
plt.show()
• Save a figure to a file.
plt.savefig("my_plot.png", dpi=300)
• Create a scatter plot.
plt.scatter(x, np.cos(x))
• Create a bar chart.
categories = ['A', 'B', 'C']
values = [3, 7, 2]
plt.bar(categories, values)
• Create a horizontal bar chart.
plt.barh(categories, values)
• Create a histogram.
data = np.random.randn(1000)
plt.hist(data, bins=30)
• Create a pie chart.
plt.pie(values, labels=categories, autopct='%1.1f%%')
• Create a box plot.
plt.boxplot([data, data*2])
• Display a 2D array or image.
matrix = np.random.rand(10, 10)
plt.imshow(matrix, cmap='viridis')
• Clear the current figure.
plt.clf()
II. Labels, Titles & Legends
• Add a title to the plot.
plt.title("Sine Wave")
• Add a label to the x-axis.
plt.xlabel("Time (s)")
• Add a label to the y-axis.
plt.ylabel("Amplitude")
• Add a legend.
plt.plot(x, np.sin(x), label='Sine')
plt.plot(x, np.cos(x), label='Cosine')
plt.legend()
• Add a grid.
plt.grid(True)
• Add text to the plot at specific coordinates.
plt.text(2, 0.5, 'An important point')
• Add an annotation with an arrow.
plt.annotate('Peak', xy=(np.pi/2, 1), xytext=(3, 1.5),
             arrowprops=dict(facecolor='black', shrink=0.05))
III. Axes & Ticks
• Set the x-axis limits.
plt.xlim(0, 5)
• Set the y-axis limits.
plt.ylim(-1.5, 1.5)
• Set the x-axis ticks and labels.
plt.xticks([0, np.pi, 2*np.pi], ['0', r'$\pi$', r'$2\pi$'])  # raw strings avoid invalid-escape warnings
• Set the y-axis ticks and labels.
plt.yticks([-1, 0, 1])
• Set a logarithmic scale on an axis.
plt.yscale('log')
• Set the aspect ratio of the plot.
plt.axis('equal') # Other options: 'tight', 'off'
IV. Plot Customization
• Set the color of a plot.
plt.plot(x, np.sin(x), color='red')
• Set the line style.
plt.plot(x, np.sin(x), linestyle='--')
• Set the line width.
plt.plot(x, np.sin(x), linewidth=3)
• Set the marker style for points.
plt.plot(x, np.sin(x), marker='o')
• Set the transparency (alpha).
plt.hist(data, alpha=0.5)
• Use a predefined style.
plt.style.use('ggplot')
• Fill the area between two curves.
plt.fill_between(x, np.sin(x), np.cos(x), alpha=0.2)
• Create an error bar plot.
y_err = 0.2 * np.ones_like(x)
plt.errorbar(x, np.sin(x), yerr=y_err)
• Add a horizontal line.
plt.axhline(y=0, color='k', linestyle='-')
• Add a vertical line.
plt.axvline(x=np.pi, color='k', linestyle='-')
• Add a colorbar for plots like imshow or scatter.
plt.colorbar(label='Magnitude') # needs an active mappable, e.g. the imshow above
V. Subplots (Object-Oriented Approach)
• Create a figure and a grid of subplots (preferred method).
fig, ax = plt.subplots() # Single subplot
fig, axes = plt.subplots(2, 2) # 2x2 grid of subplots
• Plot on a specific subplot (Axes object).
axes[0, 0].plot(x, np.sin(x))
• Set the title for a specific subplot.
axes[0, 0].set_title('Subplot 1')
• Set labels for a specific subplot.
axes[0, 0].set_xlabel('X-axis')
axes[0, 0].set_ylabel('Y-axis')
• Add a legend to a specific subplot.
axes[0, 0].legend(['Sine'])
• Add a main title for the entire figure.
fig.suptitle('Main Figure Title')
• Automatically adjust subplot parameters for a tight layout.
plt.tight_layout()
• Share x or y axes between subplots.
fig, axes = plt.subplots(2, 1, sharex=True)
• Get the current Axes instance.
ax = plt.gca()
• Create a second y-axis that shares the x-axis.
ax2 = ax.twinx()
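Twin axes suit two series that share an x-axis but differ wildly in scale. A minimal sketch, reusing the x defined earlier:
fig, ax1 = plt.subplots()
ax1.plot(x, np.sin(x), color='tab:blue')
ax1.set_ylabel('sin(x)', color='tab:blue')
ax2 = ax1.twinx()  # second y-axis sharing the same x-axis
ax2.plot(x, np.exp(x / 5), color='tab:red')
ax2.set_ylabel('exp(x/5)', color='tab:red')
plt.show()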
VI. Specialized Plots
• Create a contour plot.
X, Y = np.meshgrid(x, x)
Z = np.sin(X) * np.cos(Y)
plt.contour(X, Y, Z, levels=10)
• Create a filled contour plot.
plt.contourf(X, Y, Z)
• Create a stream plot for vector fields.
U, V = np.cos(X), np.sin(Y)
plt.streamplot(X, Y, U, V)
• Create a 3D surface plot.
from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(X, Y, Z)
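On matplotlib 3.2+ the Axes3D import is no longer required; an equivalent modern sketch:
fig, ax = plt.subplots(subplot_kw={'projection': '3d'})
ax.plot_surface(X, Y, Z, cmap='viridis')
plt.show()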
#Python #Matplotlib #DataVisualization #DataScience #Plotting
━━━━━━━━━━━━━━━
By: @DataScienceM ✨
🤖🧠 LongCat-Video: Meituan’s Groundbreaking Step Toward Efficient Long Video Generation with AI
🗓️ 04 Nov 2025
📚 AI News & Trends
In the rapidly advancing field of generative AI, the ability to create realistic, coherent, and high-quality videos from text or images has become one of the most sought-after goals. Meituan, one of the leading technology innovators in China, has made a remarkable stride in this domain with its latest open-source model — LongCat-Video. Designed as ...
#LongCatVideo #Meituan #GenerativeAI #VideoGeneration #AIInnovation #OpenSource
💡 Top 50 Pandas Operations in Python
(Note: Examples assume the imports import pandas as pd and import numpy as np.)
I. Series & DataFrame Creation
• Create a pandas Series from a list.
s = pd.Series([1, 3, 5, np.nan, 6, 8])
• Create a DataFrame from a dictionary of lists.
data = {'col1': [1, 2, 3], 'col2': [4, 5, 6]}
df = pd.DataFrame(data)
• Create a DataFrame from a list of dictionaries.
data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data)
• Read data from a CSV file.
df = pd.read_csv('my_file.csv')
• Create a date range.
dates = pd.date_range('20230101', periods=6)
II. Data Inspection & Selection
• View the first 5 rows.
df.head()
• View the last 5 rows.
df.tail()
• Get a concise summary of the DataFrame.
df.info()
• Get descriptive statistics for numerical columns.
df.describe()
• Get the dimensions of the DataFrame (rows, columns).
df.shape
• Get the column labels.
df.columns
• Get the index (row labels).
df.index
• Select a single column.
df['col1'] # or df.col1
• Select multiple columns.
df[['col1', 'col2']]
• Select rows by label/index name using .loc.
df.loc[0:2, ['col1']] # Select rows 0,1,2 and column 'col1'
• Select rows by integer position using .iloc.
df.iloc[0:3, 0:1] # Select first 3 rows and first column
• Perform boolean/conditional selection.
df[df['col1'] > 2]
• Filter rows using .isin().
df[df['col1'].isin([1, 3])]
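Conditions combine with & and |, each wrapped in parentheses. A minimal sketch on the df defined above:
df[(df['col1'] > 1) & (df['col2'] < 6)]  # rows satisfying both conditions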
III. Data Cleaning
• Check for missing/null values.
df.isnull().sum() # Returns a Series with counts of nulls per column
• Drop rows with any missing values.
df.dropna()
• Fill missing values with a specific value.
df.fillna(value=0)
• Check for duplicated rows.
df.duplicated()
• Drop duplicated rows.
df.drop_duplicates(inplace=True)
IV. Data Manipulation & Operations
• Drop specified labels (columns or rows).
df.drop('col1', axis=1) # Drop a column
• Rename columns.
df.rename(columns={'col1': 'new_col1_name'})
• Set a column as the index.
df.set_index('col1')
• Reset the index.
df.reset_index(drop=True)
• Apply a function along an axis (e.g., per column).
df.apply(np.cumsum)
• Apply a function element-wise to a Series.
df['col1'].map(lambda x: x*100)
• Sort by values in a column.
df.sort_values(by='col1', ascending=False)
• Sort by index.
df.sort_index(axis=1, ascending=False) # axis=1 sorts the column labels; use axis=0 for the row index
• Change the data type of a column.
df['col1'].astype('float')
• Create a new column based on a calculation.
df['new_col'] = df['col1'] * 2
V. Grouping & Aggregation
• Group data by a column.
df.groupby('col1')
• Group by a column and get the sum.
df.groupby('col1').sum()
• Apply multiple aggregation functions at once.
df.groupby('col1').agg(['mean', 'count'])
• Get the size of each group.
df.groupby('col1').size()
• Get the frequency counts of unique values in a Series.
df['col1'].value_counts()
• Create a pivot table.
pd.pivot_table(df, values='D', index=['A', 'B'], columns=['C'])
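The A/B/C/D columns above are placeholders. A self-contained sketch of the same call on a small hypothetical DataFrame:
df_pv = pd.DataFrame({'A': ['foo', 'foo', 'bar', 'bar'],
                      'B': ['one', 'two', 'one', 'two'],
                      'C': ['x', 'y', 'x', 'y'],
                      'D': [1, 2, 3, 4]})
pd.pivot_table(df_pv, values='D', index=['A', 'B'], columns=['C'])  # aggregates D (mean by default)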
VI. Merging, Joining & Concatenating
• Merge two DataFrames (like a SQL join).
pd.merge(left_df, right_df, on='key_column')
• Concatenate (stack) DataFrames along an axis.
pd.concat([df1, df2]) # Stacks rows
• Join DataFrames on their indexes.
left_df.join(right_df, how='outer')
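concat can also place DataFrames side by side. A minimal sketch, assuming the df1 and df2 used above:
pd.concat([df1, df2], axis=1)             # aligns on the index, appends columns
pd.concat([df1, df2], ignore_index=True)  # stacks rows and renumbers the index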
VII. Input & Output
• Write a DataFrame to a CSV file.
df.to_csv('output.csv', index=False)
• Write a DataFrame to an Excel file.
df.to_excel('output.xlsx', sheet_name='Sheet1')
• Read data from an Excel file.
pd.read_excel('input.xlsx', sheet_name='Sheet1')
• Read from a SQL database.
pd.read_sql_query('SELECT * FROM my_table', connection_object)
VIII. Time Series & Special Operations
• Use the string accessor (.str) for Series operations (assumes a Series of strings).
s.str.lower()
s.str.contains('pattern')
• Use the datetime accessor (.dt) for Series operations (assumes a datetime64 Series).
s.dt.year
s.dt.day_name()
• Create a rolling window calculation.
df['col1'].rolling(window=3).mean()
• Create a basic plot from a Series or DataFrame.
df['col1'].plot(kind='hist')
#Python #Pandas #DataAnalysis #DataScience #Programming
━━━━━━━━━━━━━━━
By: @DataScienceM ✨
📌 NumPy for Absolute Beginners: A Project-Based Approach to Data Analysis
🗂 Category: DATA SCIENCE
🕒 Date: 2025-11-04 | ⏱️ Read time: 14 min read
Master NumPy for data analysis with this project-based guide for absolute beginners. Learn to build a high-performance sensor data pipeline from scratch and unlock the true speed of Python for data-intensive applications.
#NumPy #Python #DataAnalysis #DataScience
📌 What Building My First Dashboard Taught Me About Data Storytelling
🗂 Category: DATA SCIENCE
🕒 Date: 2025-11-04 | ⏱️ Read time: 7 min read
The experience of building a first data dashboard offers a powerful lesson in data storytelling. The key takeaway is that prioritizing clarity over complexity is crucial for turning raw data into a compelling and understandable narrative. Effective dashboards don't just display metrics; they communicate insights by focusing on a clear story, ensuring the audience can easily grasp and act upon the information presented.
#DataStorytelling #DataVisualization #DashboardDesign #DataAnalytics
Advanced Data Analyst Certification Exam
Instructions:
This exam consists of 50 multiple-choice and scenario-based questions.
The suggested time for each question is indicated. Total Time: 75 Minutes.
• Choose the single best answer for each question.
---
Section 1: Advanced Data Wrangling & Manipulation (Pandas)
• (Time: 75s) You have a DataFrame df with columns category and value. How do you calculate the mean and standard deviation of value for each category in a single operation?
a) df.groupby('category').agg(['mean', 'std'])
b) df.groupby('category').mean() and df.groupby('category').std()
c) df.pivot_table(index='category', values='value', aggfunc=('mean', 'std'))
d) Both A and C are correct.
• (Time: 75s) df1 has 100 rows. df2 has 80 rows. Both have a common column user_id. 70 users are present in both DataFrames. How many rows will pd.merge(df1, df2, on='user_id', how='outer') produce?
a) 100
b) 80
c) 70
d) 110 (100 + 80 - 70)
• (Time: 90s) You have a time-series DataFrame ts_df with daily sales data indexed by date. How do you downsample the data to get the total sales for each month? (Assume ts_df.index is a DatetimeIndex.)
a) ts_df.resample('M').sum()
b) ts_df.groupby(pd.Grouper(freq='M')).sum()
c) ts_df.rolling('30D').sum()
d) Both A and B are correct.
• (Time: 90s) Why is using vectorized operations (e.g., df['col1'] * 2) generally preferred over using df.apply(lambda row: row['col1'] * 2, axis=1) in pandas?
a) Vectorized operations are easier to write.
b) apply cannot be used on rows.
c) Vectorized operations are significantly faster as they are executed in optimized C code.
d) apply does not work with numerical data.
• (Time: 75s) How would you select all rows where the first-level index is 'A' and the second-level index is 'one' from a MultiIndex DataFrame df_multi?
a) df_multi.loc['A', 'one']
b) df_multi.iloc['A', 'one']
c) df_multi.xs(('A', 'one'))
d) Both A and C can achieve this.
• (Time: 60s) Which statement best describes the difference between pivot_table and groupby?
a) groupby is for numerical data, pivot_table is for categorical.
b) pivot_table is a specialized version of groupby that is used to reshape the data with a new index and columns.
c) groupby is faster but less flexible than pivot_table.
d) They are functionally identical.
• (Time: 75s) You have a time-series with missing values. Which method is most appropriate for filling NaNs by using the value of the previous valid observation?
a) df.fillna(method='bfill')
b) df.fillna(df.mean())
c) df.interpolate()
d) df.fillna(method='ffill')
• (Time: 60s) When is it most beneficial to convert a DataFrame column to the category dtype?
a) When the column contains unique numerical IDs.
b) When the column has a large number of rows but a small number of unique string values.
c) When the column is used for complex mathematical calculations.
d) When the column contains floating-point numbers.
• (Time: 90s) What is the purpose of the .pipe() method in pandas?
a) To perform data visualization directly from a DataFrame.
b) To chain together a sequence of custom functions into a clean, readable workflow.
c) To connect to a database pipeline.
d) To perform multi-threaded operations.
Section 2: Data Visualization & Interpretation
• (Time: 75s) You want to compare the distribution of house prices (a continuous variable) across several different neighborhoods (a categorical variable). Which plot is most suitable?
a) A line chart.
b) A scatter plot.
c) A box plot or a violin plot.
d) A pie chart.
• (Time: 90s) You observe a strong positive correlation between ice cream sales and crime rates. What is the most likely explanation?
a) Eating ice cream causes people to commit crimes.
b) The correlation is spurious; a confounding variable (e.g., temperature) is influencing both.
c) Committing crimes causes people to buy ice cream.
d) The data is incorrect.
• (Time: 60s) When is it appropriate to use a logarithmic scale on a chart's axis?
a) When you want to emphasize small differences between large numbers.
b) When the data spans several orders of magnitude and is highly skewed.
c) When dealing with negative values.
d) When plotting categorical data.
• (Time: 60s) A heatmap is most effective for visualizing:
a) A time-series dataset.
b) The relationship between two continuous variables.
c) A correlation matrix or the magnitude of a phenomenon over a 2D space.
d) The proportion of categories in a dataset.
• (Time: 90s) What is the primary advantage of using "faceting" (or "small multiples") in data visualization?
a) It combines all data into a single, summary plot.
b) It allows you to create 3D visualizations.
c) It enables the comparison of data distributions or relationships across many subsets of a dataset, with consistent axes.
d) It is the only way to plot geographical data.
• (Time: 75s) What does a Q-Q (Quantile-Quantile) plot primarily help you assess?
a) The correlation between two variables.
b) The central tendency of a dataset.
c) Whether a sample of data follows a specific theoretical distribution (e.g., a normal distribution).
d) The variance of a dataset.
Section 3: Statistical Concepts & Hypothesis Testing
• (Time: 75s) What is the correct definition of a p-value?
a) The probability that the null hypothesis is true.
b) The probability of observing a result as extreme as, or more extreme than, the one observed, assuming the null hypothesis is true.
c) The probability that the alternative hypothesis is true.
d) The significance level of the test.
• (Time: 60s) A pharmaceutical company fails to reject the null hypothesis for a new drug's effectiveness, when in reality, the drug is effective. This is an example of:
a) Type I Error (False Positive)
b) Type II Error (False Negative)
c) Correct Decision
d) Standard Error
• (Time: 75s) An analyst wants to determine if there is a statistically significant difference in the average purchase amount between male and female customers. Which statistical test is most appropriate?
a) Chi-squared test
b) ANOVA
c) Paired t-test
d) Independent two-sample t-test
• (Time: 75s) To test for an association between two categorical variables, such as 'region' and 'product preference', you should use a(n):
a) Correlation coefficient
b) Chi-squared test of independence
c) T-test
d) Linear regression