🤖🧠 HunyuanWorld-Mirror: Tencent’s Breakthrough in Universal 3D Reconstruction
🗓️ 03 Nov 2025
📚 AI News & Trends
The race toward achieving universal 3D understanding has reached a significant milestone with Tencent’s HunyuanWorld-Mirror, a cutting-edge open-source model designed to revolutionize 3D reconstruction. In an era dominated by visual intelligence and immersive digital experiences, this new model stands out by offering a feed-forward, geometry-aware framework that can predict multiple 3D outputs in a single ...
#HunyuanWorld #Tencent #3DReconstruction #UniversalAI #GeometryAware #OpenSourceAI
💡 Top 50 Operations for Signal Processing in Python
Note: Most examples use numpy, scipy.signal, and matplotlib.pyplot. Assume they are imported as:
import numpy as np
from scipy import signal
import matplotlib.pyplot as plt
I. Signal Generation
• Create a time vector.
fs = 1000 # Sampling frequency
t = np.linspace(0, 1, fs, endpoint=False)
• Generate a sine wave.
freq = 50 # Hz
sine_wave = np.sin(2 * np.pi * freq * t)
• Generate a square wave.
square_wave = signal.square(2 * np.pi * freq * t)
• Generate a sawtooth wave.
sawtooth_wave = signal.sawtooth(2 * np.pi * freq * t)
• Generate Gaussian white noise.
noise = np.random.normal(0, 1, len(t))
• Generate a frequency-swept cosine (chirp).
chirp_signal = signal.chirp(t, f0=1, f1=100, t1=1, method='linear')
• Generate an impulse signal (unit impulse).
impulse = signal.unit_impulse(100, 'mid') # at index 50 of 100
• Generate a Gaussian pulse.
gaus_pulse = signal.gausspulse(t, fc=5, bw=0.5)
II. Signal Visualization & Properties
• Plot a signal.
plt.plot(t, sine_wave)
plt.xlabel("Time [s]")
plt.ylabel("Amplitude")
plt.show()
• Calculate the mean value.
mean_val = np.mean(sine_wave)
• Calculate the Root Mean Square (RMS).
rms_val = np.sqrt(np.mean(sine_wave**2))
• Calculate the standard deviation.
std_dev = np.std(sine_wave)
• Find the maximum value and its index.
max_val = np.max(sine_wave)
max_idx = np.argmax(sine_wave)
III. Frequency Domain Analysis (FFT)
• Compute the Fast Fourier Transform (FFT).
from scipy.fft import fft, fftfreq
yf = fft(sine_wave)
• Get the frequency bins for the FFT.
N = len(sine_wave)
xf = fftfreq(N, 1 / fs)[:N//2]
• Plot the magnitude spectrum.
plt.plot(xf, 2.0/N * np.abs(yf[0:N//2]))
plt.grid()
plt.show()
• Compute the Inverse FFT (IFFT).
from scipy.fft import ifft
original_signal = ifft(yf)
• Compute the Power Spectral Density (PSD) using Welch's method.
f, Pxx_den = signal.welch(sine_wave, fs, nperseg=256)  # nperseg must not exceed the signal length (1000 samples here)
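A PSD is usually inspected on a logarithmic y-axis. A minimal plotting sketch, reusing the f and Pxx_den from above:
plt.semilogy(f, Pxx_den)  # log scale suits a PSD's wide dynamic range
plt.xlabel('Frequency [Hz]')
plt.ylabel('PSD [V**2/Hz]')
plt.show()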
IV. Digital Filtering
• Design a Butterworth low-pass filter.
b, a = signal.butter(4, 100, 'low', analog=False, fs=fs)
• Apply a filter to a signal (zero-phase filtering).
noisy_signal = sine_wave + noise
filtered_signal = signal.filtfilt(b, a, noisy_signal)
• Design a Chebyshev Type I high-pass filter.
b, a = signal.cheby1(4, 5, 100, 'high', fs=fs) # 5dB ripple
• Design a Bessel band-pass filter.
b, a = signal.bessel(4, [50, 150], 'band', fs=fs)
• Design an FIR filter using a window method (applied in the sketch at the end of this section).
numtaps = 101
fir_coeffs = signal.firwin(numtaps, cutoff=100, fs=fs)
• Plot the frequency response of a filter.
w, h = signal.freqz(b, a, fs=fs)
plt.plot(w, 20 * np.log10(abs(h)))
• Apply a median filter (good for salt-and-pepper noise).
median_filtered = signal.medfilt(noisy_signal, kernel_size=3)
• Apply a Wiener filter for noise reduction.
wiener_filtered = signal.wiener(noisy_signal)
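The FIR taps from firwin above are applied with the same tools; since an FIR filter has no feedback, the denominator is just 1.0. A minimal sketch, reusing noisy_signal:
fir_filtered = signal.lfilter(fir_coeffs, 1.0, noisy_signal)  # numerator = taps, denominator = 1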
V. Resampling & Windowing
• Resample a signal to a new length.
resampled = signal.resample(sine_wave, num=500) # Resample to 500 points
• Decimate a signal (downsample by a factor).
decimated = signal.decimate(sine_wave, q=4) # Downsample by 4
• Create a Hamming window.
window = signal.windows.hamming(51)
• Apply a window to a signal segment.
segment = sine_wave[0:51]
windowed_segment = segment * window
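Why window at all? A short sketch (assuming the segment and window above) comparing FFT magnitudes; the Hamming-windowed spectrum leaks less into neighboring bins:
from scipy.fft import fft
raw_mag = np.abs(fft(segment))
win_mag = np.abs(fft(windowed_segment))
plt.semilogy(raw_mag[:26], label='rectangular')  # positive-frequency half of the 51 bins
plt.semilogy(win_mag[:26], label='Hamming')
plt.legend()
plt.show()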
VI. Convolution & Correlation
• Perform linear convolution.
sig1 = np.repeat([0., 1., 0.], 100)
sig2 = np.repeat([0., 1., 1., 0.], 100)
convolved = signal.convolve(sig1, sig2, mode='same')
• Compute cross-correlation.
# Useful for finding delays between signals; see the delay-estimation sketch at the end of this section
correlation = signal.correlate(sig1, sig2, mode='full')
• Compute auto-correlation.
# Useful for finding periodicities in a signal
autocorr = signal.correlate(sine_wave, sine_wave, mode='full')
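To turn the cross-correlation above into a delay estimate, read off the lag at the correlation peak. A minimal sketch using signal.correlation_lags (SciPy >= 1.6):
lags = signal.correlation_lags(len(sig1), len(sig2), mode='full')
delay = lags[np.argmax(correlation)]  # lag in samples where sig1 best aligns with sig2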
VII. Time-Frequency Analysis
• Compute and plot a spectrogram.
f, t_spec, Sxx = signal.spectrogram(chirp_signal, fs)
plt.pcolormesh(t_spec, f, Sxx, shading='gouraud')
plt.show()
• Perform Continuous Wavelet Transform (CWT).
widths = np.arange(1, 31)
cwt_matrix = signal.cwt(chirp_signal, signal.ricker, widths)  # note: signal.cwt and signal.ricker are deprecated (removed in SciPy 1.15); PyWavelets is the suggested replacement
• Perform Hilbert transform to get the analytic signal.
analytic_signal = signal.hilbert(sine_wave)
• Calculate instantaneous frequency.
instant_phase = np.unwrap(np.angle(analytic_signal))
instant_freq = (np.diff(instant_phase) / (2.0*np.pi) * fs)
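As a sanity check, the same analytic-signal recipe applied to the chirp from section I recovers both its envelope and its 1-to-100 Hz sweep. A minimal sketch:
analytic_chirp = signal.hilbert(chirp_signal)
envelope = np.abs(analytic_chirp)  # instantaneous amplitude
phase = np.unwrap(np.angle(analytic_chirp))
inst_freq = np.diff(phase) / (2.0 * np.pi) * fs  # rises roughly linearly from 1 to 100 Hz
plt.plot(t[1:], inst_freq)
plt.show()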
VIII. Feature Extraction
• Find peaks in a signal.
peaks, _ = signal.find_peaks(sine_wave, height=0.5)
• Find peaks with prominence criteria.
peaks_prom, _ = signal.find_peaks(noisy_signal, prominence=1)
• Differentiate a signal (e.g., to find velocity from position).
derivative = np.diff(sine_wave)  # per-sample difference; multiply by fs for units per second
• Integrate a signal.
from scipy.integrate import cumulative_trapezoid
integral = cumulative_trapezoid(sine_wave, t, initial=0)
• Detrend a signal to remove a linear trend.
trend = np.linspace(0, 1, fs)
trended_signal = sine_wave + trend
detrended = signal.detrend(trended_signal)
IX. System Analysis
• Define a system via a transfer function (numerator, denominator).
# Example: 2nd order low-pass filter
system = signal.TransferFunction([1], [1, 1, 1])
• Compute the step response of a system.
t_step, y_step = signal.step(system)
• Compute the impulse response of a system.
t_impulse, y_impulse = signal.impulse(system)
• Compute the Bode plot of a system's frequency response.
w, mag, phase = signal.bode(system)
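A minimal sketch for plotting the Bode data returned above:
fig, (ax_mag, ax_phase) = plt.subplots(2, 1, sharex=True)
ax_mag.semilogx(w, mag)      # magnitude in dB
ax_phase.semilogx(w, phase)  # phase in degrees
plt.show()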
X. Signal Generation from Data
• Generate a signal from a function.
t = np.linspace(0, 1, 500)
custom_signal = np.sinc(2 * np.pi * 4 * t)  # note: np.sinc is the normalized sinc, sin(pi*x)/(pi*x)
• Convert a list of values to a signal array.
my_data = [0, 1, 2, 3, 2, 1, 0, -1, -2, -1, 0]
data_signal = np.array(my_data)
• Read signal data from a WAV file.
from scipy.io import wavfile
samplerate, data = wavfile.read('audio.wav')
• Create a pulse train signal.
pulse_train = np.zeros(fs)
pulse_train[::100] = 1 # Impulse every 100 samples
#Python #SignalProcessing #SciPy #NumPy #DSP
━━━━━━━━━━━━━━━
By: @DataScienceM ✨
💡 Top 50 Matplotlib Commands in Python
Note: Examples assume the following imports:
import matplotlib.pyplot as plt
import numpy as np
I. Figure & Basic Plots
• Create a figure.
fig = plt.figure(figsize=(8, 6))
• Create a basic line plot.
x = np.linspace(0, 10, 100)
plt.plot(x, np.sin(x))
• Show/display the plot.
plt.show()
• Save a figure to a file.
plt.savefig("my_plot.png", dpi=300)
• Create a scatter plot.
plt.scatter(x, np.cos(x))
• Create a bar chart.
categories = ['A', 'B', 'C']
values = [3, 7, 2]
plt.bar(categories, values)
• Create a horizontal bar chart.
plt.barh(categories, values)
• Create a histogram.
data = np.random.randn(1000)
plt.hist(data, bins=30)
• Create a pie chart.
plt.pie(values, labels=categories, autopct='%1.1f%%')
• Create a box plot.
plt.boxplot([data, data*2])
• Display a 2D array or image.
matrix = np.random.rand(10, 10)
plt.imshow(matrix, cmap='viridis')
• Clear the current figure.
plt.clf()
II. Labels, Titles & Legends
• Add a title to the plot.
plt.title("Sine Wave")
• Add a label to the x-axis.
plt.xlabel("Time (s)")
• Add a label to the y-axis.
plt.ylabel("Amplitude")
• Add a legend.
plt.plot(x, np.sin(x), label='Sine')
plt.plot(x, np.cos(x), label='Cosine')
plt.legend()
• Add a grid.
plt.grid(True)
• Add text to the plot at specific coordinates.
plt.text(2, 0.5, 'An important point')
• Add an annotation with an arrow.
plt.annotate('Peak', xy=(np.pi/2, 1), xytext=(3, 1.5),
             arrowprops=dict(facecolor='black', shrink=0.05))
III. Axes & Ticks
• Set the x-axis limits.
plt.xlim(0, 5)
• Set the y-axis limits.
plt.ylim(-1.5, 1.5)
• Set the x-axis ticks and labels.
plt.xticks([0, np.pi, 2*np.pi], ['0', r'$\pi$', r'$2\pi$'])  # raw strings avoid invalid-escape warnings
• Set the y-axis ticks and labels.
plt.yticks([-1, 0, 1])
• Set a logarithmic scale on an axis.
plt.yscale('log')
• Set the aspect ratio of the plot.
plt.axis('equal') # Other options: 'tight', 'off'
IV. Plot Customization
• Set the color of a plot.
plt.plot(x, np.sin(x), color='red')
• Set the line style.
plt.plot(x, np.sin(x), linestyle='--')
• Set the line width.
plt.plot(x, np.sin(x), linewidth=3)
• Set the marker style for points.
plt.plot(x, np.sin(x), marker='o')
• Set the transparency (alpha).
plt.hist(data, alpha=0.5)
• Use a predefined style.
plt.style.use('ggplot')
• Fill the area between two curves.
plt.fill_between(x, np.sin(x), np.cos(x), alpha=0.2)
• Create an error bar plot.
y_err = 0.2 * np.ones_like(x)
plt.errorbar(x, np.sin(x), yerr=y_err)
• Add a horizontal line.
plt.axhline(y=0, color='k', linestyle='-')
• Add a vertical line.
plt.axvline(x=np.pi, color='k', linestyle='-')
• Add a colorbar for plots like imshow or scatter.
plt.colorbar(label='Magnitude') # needs an active mappable, e.g. the imshow above
V. Subplots (Object-Oriented Approach)
• Create a figure and a grid of subplots (preferred method).
fig, ax = plt.subplots() # Single subplot
fig, axes = plt.subplots(2, 2) # 2x2 grid of subplots
• Plot on a specific subplot (Axes object).
axes[0, 0].plot(x, np.sin(x))
• Set the title for a specific subplot.
axes[0, 0].set_title('Subplot 1')
• Set labels for a specific subplot.
axes[0, 0].set_xlabel('X-axis')
axes[0, 0].set_ylabel('Y-axis')
• Add a legend to a specific subplot.
axes[0, 0].legend(['Sine'])
• Add a main title for the entire figure.
fig.suptitle('Main Figure Title')
• Automatically adjust subplot parameters for a tight layout.
plt.tight_layout()
• Share x or y axes between subplots.
fig, axes = plt.subplots(2, 1, sharex=True)
• Get the current Axes instance.
ax = plt.gca()
• Create a second y-axis that shares the x-axis.
ax2 = ax.twinx()
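Twin axes suit two series that share an x-axis but differ wildly in scale. A minimal sketch, reusing the x defined earlier:
fig, ax1 = plt.subplots()
ax1.plot(x, np.sin(x), color='tab:blue')
ax1.set_ylabel('sin(x)', color='tab:blue')
ax2 = ax1.twinx()  # second y-axis sharing the same x-axis
ax2.plot(x, np.exp(x / 5), color='tab:red')
ax2.set_ylabel('exp(x/5)', color='tab:red')
plt.show()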
VI. Specialized Plots
• Create a contour plot.
X, Y = np.meshgrid(x, x)
Z = np.sin(X) * np.cos(Y)
plt.contour(X, Y, Z, levels=10)
• Create a filled contour plot.
plt.contourf(X, Y, Z)
• Create a stream plot for vector fields.
U, V = np.cos(X), np.sin(Y)
plt.streamplot(X, Y, U, V)
• Create a 3D surface plot.
from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(X, Y, Z)
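On matplotlib 3.2+ the Axes3D import is no longer required; an equivalent modern sketch:
fig, ax = plt.subplots(subplot_kw={'projection': '3d'})
ax.plot_surface(X, Y, Z, cmap='viridis')
plt.show()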
#Python #Matplotlib #DataVisualization #DataScience #Plotting
━━━━━━━━━━━━━━━
By: @DataScienceM ✨
🤖🧠 LongCat-Video: Meituan’s Groundbreaking Step Toward Efficient Long Video Generation with AI
🗓️ 04 Nov 2025
📚 AI News & Trends
In the rapidly advancing field of generative AI, the ability to create realistic, coherent, and high-quality videos from text or images has become one of the most sought-after goals. Meituan, one of the leading technology innovators in China, has made a remarkable stride in this domain with its latest open-source model — LongCat-Video. Designed as ...
#LongCatVideo #Meituan #GenerativeAI #VideoGeneration #AIInnovation #OpenSource
💡 Top 50 Pandas Operations in Python
(Note: Examples assume the imports import pandas as pd and import numpy as np.)
I. Series & DataFrame Creation
• Create a pandas Series from a list.
s = pd.Series([1, 3, 5, np.nan, 6, 8])
• Create a DataFrame from a dictionary of lists.
data = {'col1': [1, 2, 3], 'col2': [4, 5, 6]}
df = pd.DataFrame(data)
• Create a DataFrame from a list of dictionaries.
data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data)
• Read data from a CSV file.
df = pd.read_csv('my_file.csv')
• Create a date range.
dates = pd.date_range('20230101', periods=6)
II. Data Inspection & Selection
• View the first 5 rows.
df.head()
• View the last 5 rows.
df.tail()
• Get a concise summary of the DataFrame.
df.info()
• Get descriptive statistics for numerical columns.
df.describe()
• Get the dimensions of the DataFrame (rows, columns).
df.shape
• Get the column labels.
df.columns
• Get the index (row labels).
df.index
• Select a single column.
df['col1'] # or df.col1
• Select multiple columns.
df[['col1', 'col2']]
• Select rows by label/index name using .loc.
df.loc[0:2, ['col1']] # Select rows 0,1,2 and column 'col1'
• Select rows by integer position using .iloc.
df.iloc[0:3, 0:1] # Select first 3 rows and first column
• Perform boolean/conditional selection.
df[df['col1'] > 2]
• Filter rows using .isin().
df[df['col1'].isin([1, 3])]
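Conditions combine with & and |, each wrapped in parentheses. A minimal sketch on the df defined above:
df[(df['col1'] > 1) & (df['col2'] < 6)]  # rows satisfying both conditions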
III. Data Cleaning
• Check for missing/null values.
df.isnull().sum() # Returns a Series with counts of nulls per column
• Drop rows with any missing values.
df.dropna()
• Fill missing values with a specific value.
df.fillna(value=0)
• Check for duplicated rows.
df.duplicated()
• Drop duplicated rows.
df.drop_duplicates(inplace=True)
IV. Data Manipulation & Operations
• Drop specified labels (columns or rows).
df.drop('col1', axis=1) # Drop a column
• Rename columns.
df.rename(columns={'col1': 'new_col1_name'})
• Set a column as the index.
df.set_index('col1')
• Reset the index.
df.reset_index(drop=True)
• Apply a function along an axis (e.g., per column).
df.apply(np.cumsum)
• Apply a function element-wise to a Series.
df['col1'].map(lambda x: x*100)
• Sort by values in a column.
df.sort_values(by='col1', ascending=False)
• Sort by index.
df.sort_index(axis=1, ascending=False) # axis=1 sorts the column labels; use axis=0 for the row index
• Change the data type of a column.
df['col1'].astype('float')
• Create a new column based on a calculation.
df['new_col'] = df['col1'] * 2
V. Grouping & Aggregation
• Group data by a column.
df.groupby('col1')
• Group by a column and get the sum.
df.groupby('col1').sum()
• Apply multiple aggregation functions at once.
df.groupby('col1').agg(['mean', 'count'])
• Get the size of each group.
df.groupby('col1').size()
• Get the frequency counts of unique values in a Series.
df['col1'].value_counts()
• Create a pivot table.
pd.pivot_table(df, values='D', index=['A', 'B'], columns=['C'])
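The A/B/C/D columns above are placeholders. A self-contained sketch of the same call on a small hypothetical DataFrame:
df_pv = pd.DataFrame({'A': ['foo', 'foo', 'bar', 'bar'],
                      'B': ['one', 'two', 'one', 'two'],
                      'C': ['x', 'y', 'x', 'y'],
                      'D': [1, 2, 3, 4]})
pd.pivot_table(df_pv, values='D', index=['A', 'B'], columns=['C'])  # aggregates D (mean by default)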
VI. Merging, Joining & Concatenating
• Merge two DataFrames (like a SQL join).
pd.merge(left_df, right_df, on='key_column')
• Concatenate (stack) DataFrames along an axis.
pd.concat([df1, df2]) # Stacks rows
• Join DataFrames on their indexes.
left_df.join(right_df, how='outer')
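concat can also place DataFrames side by side. A minimal sketch, assuming the df1 and df2 used above:
pd.concat([df1, df2], axis=1)             # aligns on the index, appends columns
pd.concat([df1, df2], ignore_index=True)  # stacks rows and renumbers the index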
VII. Input & Output
• Write a DataFrame to a CSV file.
df.to_csv('output.csv', index=False)
• Write a DataFrame to an Excel file.
df.to_excel('output.xlsx', sheet_name='Sheet1')
• Read data from an Excel file.
pd.read_excel('input.xlsx', sheet_name='Sheet1')
• Read from a SQL database.
pd.read_sql_query('SELECT * FROM my_table', connection_object)
VIII. Time Series & Special Operations
• Use the string accessor (.str) for Series operations (assumes a Series of strings).
s.str.lower()
s.str.contains('pattern')
• Use the datetime accessor (.dt) for Series operations (assumes a datetime64 Series).
s.dt.year
s.dt.day_name()
• Create a rolling window calculation.
df['col1'].rolling(window=3).mean()
• Create a basic plot from a Series or DataFrame.
df['col1'].plot(kind='hist')
#Python #Pandas #DataAnalysis #DataScience #Programming
━━━━━━━━━━━━━━━
By: @DataScienceM ✨
📌 NumPy for Absolute Beginners: A Project-Based Approach to Data Analysis
🗂 Category: DATA SCIENCE
🕒 Date: 2025-11-04 | ⏱️ Read time: 14 min read
Master NumPy for data analysis with this project-based guide for absolute beginners. Learn to build a high-performance sensor data pipeline from scratch and unlock the true speed of Python for data-intensive applications.
#NumPy #Python #DataAnalysis #DataScience
📌 What Building My First Dashboard Taught Me About Data Storytelling
🗂 Category: DATA SCIENCE
🕒 Date: 2025-11-04 | ⏱️ Read time: 7 min read
The experience of building a first data dashboard offers a powerful lesson in data storytelling. The key takeaway is that prioritizing clarity over complexity is crucial for turning raw data into a compelling and understandable narrative. Effective dashboards don't just display metrics; they communicate insights by focusing on a clear story, ensuring the audience can easily grasp and act upon the information presented.
#DataStorytelling #DataVisualization #DashboardDesign #DataAnalytics
Advanced Data Analyst Certification Exam
Instructions:
This exam consists of 50 multiple-choice and scenario-based questions.
The suggested time for each question is indicated. Total Time: 75 Minutes.
• Choose the single best answer for each question.
---
Section 1: Advanced Data Wrangling & Manipulation (Pandas)
• (Time: 75s) You have a DataFrame df with columns category and value. How do you calculate the mean and standard deviation of value for each category in a single operation?
a) df.groupby('category').agg(['mean', 'std'])
b) df.groupby('category').mean() and df.groupby('category').std()
c) df.pivot_table(index='category', values='value', aggfunc=('mean', 'std'))
d) Both A and C are correct.
• (Time: 75s) df1 has 100 rows. df2 has 80 rows. Both have a common column user_id. 70 users are present in both DataFrames. How many rows will pd.merge(df1, df2, on='user_id', how='outer') produce?
a) 100
b) 80
c) 70
d) 110 (100 + 80 - 70)
• (Time: 90s) You have a time-series DataFrame ts_df with daily sales data indexed by date. How do you downsample the data to get the total sales for each month? (Assume ts_df.index is a DatetimeIndex.)
a) ts_df.resample('M').sum()
b) ts_df.groupby(pd.Grouper(freq='M')).sum()
c) ts_df.rolling('30D').sum()
d) Both A and B are correct.
• (Time: 90s) Why is using vectorized operations (e.g., df['col1'] * 2) generally preferred over using df.apply(lambda row: row['col1'] * 2, axis=1) in pandas?
a) Vectorized operations are easier to write.
b) apply cannot be used on rows.
c) Vectorized operations are significantly faster as they are executed in optimized C code.
d) apply does not work with numerical data.
• (Time: 75s) How would you select all rows where the first-level index is 'A' and the second-level index is 'one' from a MultiIndex DataFrame df_multi?
a) df_multi.loc['A', 'one']
b) df_multi.iloc['A', 'one']
c) df_multi.xs(('A', 'one'))
d) Both A and C can achieve this.
• (Time: 60s) Which statement best describes the difference between pivot_table and groupby?
a) groupby is for numerical data, pivot_table is for categorical.
b) pivot_table is a specialized version of groupby that is used to reshape the data with a new index and columns.
c) groupby is faster but less flexible than pivot_table.
d) They are functionally identical.
• (Time: 75s) You have a time-series with missing values. Which method is most appropriate for filling NaNs by using the value of the previous valid observation?
a) df.fillna(method='bfill')
b) df.fillna(df.mean())
c) df.interpolate()
d) df.fillna(method='ffill')
• (Time: 60s) When is it most beneficial to convert a DataFrame column to the category dtype?
a) When the column contains unique numerical IDs.
b) When the column has a large number of rows but a small number of unique string values.
c) When the column is used for complex mathematical calculations.
d) When the column contains floating-point numbers.
• (Time: 90s) What is the purpose of the .pipe() method in pandas?
a) To perform data visualization directly from a DataFrame.
b) To chain together a sequence of custom functions into a clean, readable workflow.
c) To connect to a database pipeline.
d) To perform multi-threaded operations.
Section 2: Data Visualization & Interpretation
• (Time: 75s) You want to compare the distribution of house prices (a continuous variable) across several different neighborhoods (a categorical variable). Which plot is most suitable?
a) A line chart.
b) A scatter plot.
c) A box plot or a violin plot.
d) A pie chart.
• (Time: 90s) You observe a strong positive correlation between ice cream sales and crime rates. What is the most likely explanation?
a) Eating ice cream causes people to commit crimes.
b) The correlation is spurious; a confounding variable (e.g., temperature) is influencing both.
c) Committing crimes causes people to buy ice cream.
d) The data is incorrect.
• (Time: 60s) When is it appropriate to use a logarithmic scale on a chart's axis?
a) When you want to emphasize small differences between large numbers.
b) When the data spans several orders of magnitude and is highly skewed.
c) When dealing with negative values.
d) When plotting categorical data.
• (Time: 60s) A heatmap is most effective for visualizing:
a) A time-series dataset.
b) The relationship between two continuous variables.
c) A correlation matrix or the magnitude of a phenomenon over a 2D space.
d) The proportion of categories in a dataset.
• (Time: 90s) What is the primary advantage of using "faceting" (or "small multiples") in data visualization?
a) It combines all data into a single, summary plot.
b) It allows you to create 3D visualizations.
c) It enables the comparison of data distributions or relationships across many subsets of a dataset, with consistent axes.
d) It is the only way to plot geographical data.
• (Time: 75s) What does a Q-Q (Quantile-Quantile) plot primarily help you assess?
a) The correlation between two variables.
b) The central tendency of a dataset.
c) Whether a sample of data follows a specific theoretical distribution (e.g., a normal distribution).
d) The variance of a dataset.
Section 3: Statistical Concepts & Hypothesis Testing
• (Time: 75s) What is the correct definition of a p-value?
a) The probability that the null hypothesis is true.
b) The probability of observing a result as extreme as, or more extreme than, the one observed, assuming the null hypothesis is true.
c) The probability that the alternative hypothesis is true.
d) The significance level of the test.
• (Time: 60s) A pharmaceutical company fails to reject the null hypothesis for a new drug's effectiveness, when in reality, the drug is effective. This is an example of:
a) Type I Error (False Positive)
b) Type II Error (False Negative)
c) Correct Decision
d) Standard Error
• (Time: 75s) An analyst wants to determine if there is a statistically significant difference in the average purchase amount between male and female customers. Which statistical test is most appropriate?
a) Chi-squared test
b) ANOVA
c) Paired t-test
d) Independent two-sample t-test
• (Time: 75s) To test for an association between two categorical variables, such as 'region' and 'product preference', you should use a(n):
a) Correlation coefficient
b) Chi-squared test of independence
c) T-test
d) Linear regression