Data Science & Machine Learning – Telegram
Some important questions to crack a data science interview

Q. Describe how Gradient Boosting works.

A. Gradient boosting is a boosting technique in machine learning. It relies on the intuition that the best possible next model, when combined with the previous models, minimizes the overall prediction error: each new model is fit to the residual errors (the gradient of the loss) of the ensemble built so far, so if a small change in the prediction for a case causes no change in the error, the next model's target for that case is zero. Gradient boosting produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees.
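A minimal runnable sketch with scikit-learn's gradient boosting (the synthetic dataset and hyperparameters are illustrative assumptions, not from the post):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Each new tree is fit to the gradient of the loss (the residual errors)
# of the ensemble built so far.
model = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```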


Q. Describe the decision tree model.

A. Decision Trees are a type of supervised machine learning model in which the data is repeatedly split according to the value of a certain feature. A decision tree partitions the data into ever-smaller subsets; the internal nodes hold the split conditions, and the leaves hold the decisions or final outcomes.
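A tiny tree on the iris dataset, printing the learned splits (depth 2 is an arbitrary choice to keep the printout short):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
# The printout shows the feature-based splits; leaves carry the decisions.
print(export_text(tree))
```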


Q. What is a neural network?

A. Neural networks, also known as artificial neural networks, are a set of algorithms, modeled loosely after the human brain, that are designed to recognize patterns. They interpret sensory data through a kind of machine perception, labeling or clustering raw input. Deep learning refers to neural networks with many stacked layers.
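A minimal neural network via scikit-learn's MLPClassifier (one small hidden layer; the dataset and sizes are illustrative):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One hidden layer of 32 units; "deep" learning stacks many such layers.
net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
net.fit(X_train, y_train)
print("test accuracy:", net.score(X_test, y_test))
```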


Q. Explain the Bias-Variance Tradeoff

A. The bias–variance tradeoff is the property of a model whereby the variance of the parameter estimates across samples can be reduced by increasing the bias of those estimates. Simple models have high bias and low variance; complex models have low bias and high variance; the goal is a total error that balances the two.


Q. What’s the difference between L1 and L2 regularization?

A. L1 regularization adds the sum of the absolute values of the weights to the loss, while L2 regularization adds the sum of their squares. L1 tends to drive many weights exactly to zero, producing sparse models, whereas L2 shrinks all weights smoothly toward zero without eliminating them. An intuition for the difference: minimizing absolute error estimates the median of the data, while minimizing squared error estimates the mean.
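The sparsity difference is visible in a few lines (synthetic data where only one feature matters; the alpha value is an arbitrary strength):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = 3 * X[:, 0] + rng.normal(size=100)  # only feature 0 is informative

lasso = Lasso(alpha=0.5).fit(X, y)      # L1 penalty
ridge = Ridge(alpha=0.5).fit(X, y)      # L2 penalty
# L1 drives irrelevant coefficients exactly to zero; L2 only shrinks them.
print("Lasso non-zero coefficients:", int(np.sum(lasso.coef_ != 0)))
print("Ridge non-zero coefficients:", int(np.sum(ridge.coef_ != 0)))
```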

ENJOY LEARNING 👍👍
Which of the following is not a Python library?
Anonymous Quiz
Pandas — 1%
NumPy — 2%
Matplotlib — 4%
Dictionary — 80%
Seaborn — 13%
Which of the following is used specifically for applying machine learning algorithms?
Anonymous Quiz
Matplotlib — 14%
Scikit-learn — 71%
Seaborn — 7%
SciPy — 8%
Some important questions to crack a data science interview – Part 2

Q1. p-value?

Ans. The p-value is a measure of the probability that an observed difference could have occurred just by random chance. The lower the p-value, the greater the statistical significance of the observed difference. The p-value can be used as an alternative to, or in addition to, pre-selected confidence levels for hypothesis testing.
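A quick runnable sketch of reading a p-value off a two-sample t-test (the data is synthetic, and 0.05 is the conventional threshold, not something from the post):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
group_a = rng.normal(loc=0.0, scale=1.0, size=50)
group_b = rng.normal(loc=0.5, scale=1.0, size=50)

# The p-value is the probability of observing a difference at least this
# large if the two groups truly had equal means.
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"p-value = {p_value:.4f}")  # reject H0 at the 0.05 level if p < 0.05
```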


Q2. Interpolation and Extrapolation?

Ans. Interpolation is the process of estimating an unknown value that lies within the range of the known data points, whereas extrapolation is the process of estimating unknown values beyond the range of the given data points.
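A tiny illustration with NumPy (the sample points are made up; note that np.interp only interpolates, so the extrapolation step fits a line first):

```python
import numpy as np

x_known = np.array([0.0, 1.0, 2.0, 3.0])
y_known = np.array([0.0, 2.0, 4.0, 6.0])

# Interpolation: estimate y at x = 1.5, inside the known range.
print(np.interp(1.5, x_known, y_known))   # 3.0

# Extrapolation: fit a line and evaluate it beyond the data, at x = 5.
slope, intercept = np.polyfit(x_known, y_known, 1)
print(slope * 5.0 + intercept)            # 10.0
```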



Q3. Uniform distribution & normal distribution?

Ans. The normal distribution is bell-shaped, which means values near the center of the distribution are more likely to occur than values in the tails. The uniform distribution is rectangular-shaped, which means every value in the distribution is equally likely to occur.

Q4. Recommender Systems?

Ans. A recommender system deals with the likes and dislikes of users. Its major objective is to recommend items that a particular user has a high chance of liking or needing, based on their previous purchases and behavior. It is like having a personalized team that understands our likes and dislikes and, by making use of the large amounts of data generated every day, helps us make decisions about particular items without bias.

Q5. JOIN clause in SQL

Ans. The SQL JOIN clause is used to combine records from two or more tables in a database.

Q6. Squared error and absolute error?

Ans. Mean squared error (MSE) and mean absolute error (MAE) are both used to evaluate a regression model's accuracy. The squared error is everywhere differentiable, while the absolute error is not (its derivative is undefined at 0). This makes the squared error more amenable to the techniques of mathematical optimization.
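Both metrics in two lines of scikit-learn (the numbers are toy values chosen so the difference is easy to check by hand):

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5,  0.0, 2.0, 8.0]

# MSE squares each error, so it punishes large errors more heavily;
# MAE averages the absolute errors.
print("MSE:", mean_squared_error(y_true, y_pred))   # 0.375
print("MAE:", mean_absolute_error(y_true, y_pred))  # 0.5
```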

ENJOY LEARNING 👍👍
Interview Questions with Answers
Part-3
👇👇

Q. What is the loss function SVM tries to minimize?


A. A hard-margin SVM has no loss function in the usual sense (it is a pure constrained-optimization problem), but a loss does appear when solving soft-margin SVMs: the hinge loss. The hinge loss is a loss function used in machine learning to train classifiers; it is utilised for "maximum-margin" classification, most notably for support vector machines (SVMs).
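The hinge loss is easy to compute by hand; a small sketch with made-up scores (labels are in {-1, +1}, as is standard for SVMs):

```python
import numpy as np

# Hinge loss: max(0, 1 - y * f(x)); zero once a point is on the correct
# side of the margin, growing linearly as it crosses to the wrong side.
y = np.array([1, -1, 1, 1])
scores = np.array([2.0, -0.5, 0.3, -1.0])  # f(x): signed distance to boundary
print(np.maximum(0, 1 - y * scores))       # [0.  0.5 0.7 2. ]
```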



Q. How do you detect heteroscedasticity in simple linear regression?

A. Heteroscedasticity refers to the situation where the spread of the residuals changes in a systematic way over the range of fitted values. A fitted-values vs. residuals plot is the simplest technique for detecting it: a "cone" shape, where the residuals become significantly more spread out as the fitted values grow, is a clear marker of heteroscedasticity. The Breusch-Pagan test is a more formal, mathematical method of detecting it.
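A sketch of the Breusch-Pagan test with statsmodels, on synthetic data deliberately built so the noise grows with x (the data and parameters are illustrative):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 2 * x + rng.normal(scale=x)   # noise grows with x: heteroscedastic

X = sm.add_constant(x)
resid = sm.OLS(y, X).fit().resid
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(resid, X)
print(f"Breusch-Pagan p-value = {lm_pvalue:.4g}")  # small p => heteroscedasticity
```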


Q. Explain ANOVA?

A. The analysis of variance (ANOVA) is a statistical technique for determining whether the means of two or more groups differ significantly. One-way ANOVA, two-way ANOVA, and multivariate ANOVA (MANOVA) are the three types. An ANOVA's null hypothesis is that there is no significant difference between the groups; the alternative hypothesis is that at least one group differs significantly. If the p-value associated with the F-statistic is less than 0.05, the null hypothesis is rejected in favor of the alternative, and one concludes that the means of the groups are not all equal.
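A one-way ANOVA takes one call in SciPy (the three groups below are made-up scores):

```python
from scipy.stats import f_oneway

group1 = [85, 86, 88, 75, 78, 94]
group2 = [91, 92, 93, 85, 87, 84]
group3 = [79, 78, 88, 94, 92, 85]

# H0: all three group means are equal; reject it if p < 0.05.
f_stat, p_value = f_oneway(group1, group2, group3)
print(f"F = {f_stat:.3f}, p = {p_value:.3f}")
```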


Q. How do you determine the number of neighbors (K) in KNN?

A. The number of neighbors (K) in KNN is a hyperparameter that must be chosen during model construction. There is no single ideal number of neighbors for all data sets. A small number of neighbors gives the most flexible fit, resulting in low bias but high variance, whereas a large number of neighbors gives a smoother decision boundary, with lower variance but higher bias. Data scientists usually choose an odd K when the number of classes is even, to avoid ties. You can also build the model with different values of K, compare their performance, and pick one with the elbow technique.
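Comparing a few values of K with cross-validation, as described above (the dataset and candidate K values are arbitrary choices for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
# Small k: low bias / high variance; large k: smoother boundary / higher bias.
for k in (1, 3, 5, 11, 21):
    score = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    print(f"k={k:2d}  CV accuracy = {score:.3f}")
```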



Q. What do you mean by central tendency?

A. Central tendency is a description of a dataset by a single value that represents the center of the data's distribution. The following measures can be used to describe it: the mean is the sum of all values divided by the total number of values; the median is the middle value of the dataset sorted in ascending order; the mode is the most frequently occurring value in the dataset. Although these are the most commonly used measures of central tendency, there are others, such as the geometric mean, harmonic mean, midrange, and geometric median.
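All three common measures ship with Python's standard library (the data is a toy list):

```python
import statistics

data = [2, 3, 3, 5, 7, 10]
print(statistics.mean(data))    # 5.0 – arithmetic mean
print(statistics.median(data))  # 4.0 – middle of the sorted values
print(statistics.mode(data))    # 3   – most frequent value
```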


ENJOY LEARNING 👍👍
▪️11 MACHINE LEARNING METHODS YOU SHOULD LEARN

1. Regression
2. Classification
3. Clustering
4. Dimensionality Reduction
5. Ensemble Methods
6. Neural Networks and Deep Learning
7. Transfer Learning
8. Reinforcement Learning
9. Natural Language Processing
10. Computer Vision
11. Word Embeddings
DATA SCIENCE INTERVIEW QUESTIONS
[PART-4]

Q. Why does overfitting occur?

A. Overfitting happens when a model learns the detail and noise in the training data to the extent that it negatively impacts the performance of the model on new data. This means that the noise or random fluctuations in the training data are picked up and learned as concepts by the model.


Q. What is ensemble learning?

A. Ensemble learning is the process by which multiple models, such as classifiers or experts, are strategically generated and combined to solve a particular computational intelligence problem. Ensemble learning is primarily used to improve the (classification, prediction, function approximation, etc.) performance of a model, or reduce the likelihood of an unfortunate selection of a poor one.
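A minimal sketch of a voting ensemble in scikit-learn (the three base models and the synthetic dataset are arbitrary illustrative choices):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=1)
# Strategically combine diverse models; the majority vote is often more
# robust than any single member.
ensemble = VotingClassifier([
    ("lr", LogisticRegression(max_iter=1000)),
    ("dt", DecisionTreeClassifier(random_state=1)),
    ("nb", GaussianNB()),
])
print(cross_val_score(ensemble, X, y, cv=5).mean())
```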


Q. What is F1 score?

A. The F1 score is defined as the harmonic mean of precision and recall. As a short reminder, the harmonic mean is an alternative to the more common arithmetic mean, and it is often useful when averaging rates. So the F1 score is the harmonic average of precision and recall, and it is high only when both are high.
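Checking the definition by hand against scikit-learn (toy labels):

```python
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

p = precision_score(y_true, y_pred)
r = recall_score(y_true, y_pred)
# F1 = harmonic mean of precision and recall.
print(2 * p * r / (p + r), f1_score(y_true, y_pred))  # same value twice
```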


Q. What is pickling and unpickling?

A. “Pickling” is the process whereby a Python object hierarchy is converted into a byte stream, and “unpickling” is the inverse operation, whereby a byte stream (from a binary file or bytes-like object) is converted back into an object hierarchy.
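A round trip through the standard pickle module (the object here is an arbitrary example):

```python
import pickle

model = {"weights": [0.1, 0.2], "bias": 0.5}  # any Python object hierarchy

data = pickle.dumps(model)     # pickling: object -> byte stream
restored = pickle.loads(data)  # unpickling: byte stream -> object
print(restored == model)       # True
```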


Q. What is lambda function?

A. Python lambda functions are anonymous functions, meaning functions without a name. Just as the def keyword is used to define a normal function in Python, the lambda keyword is used to define an anonymous one.
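The same function written both ways (the names are illustrative):

```python
def square(x):
    return x ** 2

square_anon = lambda x: x ** 2
print(square(4), square_anon(4))  # 16 16

# Lambdas shine as short, throwaway arguments:
pairs = [(1, "b"), (2, "a")]
print(sorted(pairs, key=lambda pair: pair[1]))  # [(2, 'a'), (1, 'b')]
```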


Q. What is the trade-off between bias and variance?

A. Bias comes from the simplifying assumptions made by the model to make the target function easier to approximate. Variance is the amount by which the estimate of the target function would change given different training data. The trade-off is the tension between the error introduced by the bias and the error introduced by the variance.

ENJOY LEARNING 👍👍
DATA SCIENCE INTERVIEW QUESTIONS
[PART-5]

Q. Ways to avoid overfitting

A. Some steps that we can take to avoid it (a runnable sketch follows the list):

1. Data augmentation

2. L1/L2 Regularization

3. Removing layers / reducing the number of units per layer

4. Cross-validation
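How overfitting shows up in practice, and how a simple capacity limit (step 3) reduces it; the dataset and depth values are made up for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

deep = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

# The unconstrained tree memorizes noise: near-perfect train score, worse test score.
print("deep:    train", deep.score(X_train, y_train), "test", deep.score(X_test, y_test))
print("shallow: train", shallow.score(X_train, y_train), "test", shallow.score(X_test, y_test))
```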



Q. Image classification algorithms

A. Image classification algorithms are algorithms used to assign labels to images based on their characteristics. Example: convolutional neural networks (CNNs).



Q. What does *args return?

A. The special syntax *args in a function definition is used to pass a variable number of arguments to a function. It passes a non-keyworded, variable-length argument list: the symbol * collects any number of positional arguments into a tuple, and by convention it is used with the name args.
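A two-line demonstration (the function name is arbitrary):

```python
def total(*args):
    print(type(args), args)  # args arrives as a tuple of the positional arguments
    return sum(args)

print(total(1, 2, 3))         # <class 'tuple'> (1, 2, 3) -> 6
print(total(10, 20, 30, 40))  # works with any number of arguments
```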



Q. Difference between having and where clause in SQL.

A. The WHERE clause is used to filter records from the table based on a specified condition, before any grouping is done. The HAVING clause is used to filter records from the groups based on a specified condition, after grouping.
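A runnable contrast using Python's built-in sqlite3 (the table and data are invented for the example):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
con.executemany("INSERT INTO orders VALUES (?, ?)",
                [("ann", 10), ("ann", 30), ("bob", 5), ("bob", 2)])

# WHERE filters rows before grouping; HAVING filters the groups afterwards.
rows = con.execute("""
    SELECT customer, SUM(amount) AS total
    FROM orders
    WHERE amount > 1          -- row-level filter
    GROUP BY customer
    HAVING SUM(amount) > 20   -- group-level filter
""").fetchall()
print(rows)  # [('ann', 40.0)]
```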



Q. How do you handle categorical data?

A. One-hot encoding is the most common, correct way to deal with non-ordinal categorical data. It consists of creating an additional binary feature for each group of the categorical feature and marking each observation as belonging (value = 1) or not belonging (value = 0) to that group.
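One-hot encoding in one pandas call (the column and categories are toy examples):

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})
# One new column per category, flagged where the row belongs to that group.
print(pd.get_dummies(df, columns=["color"]))
```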



Q. What Is Interpolation And Extrapolation?

A. Interpolation is the process of estimating an unknown value that lies within the range of the known data points, whereas extrapolation is the process of estimating unknown values beyond the range of the given data points.



Q. SQL joins and Groups

A. The SQL JOIN clause is used to combine records from two or more tables in a database. The GROUP BY statement groups rows that have the same values into summary rows, like "find the number of customers in each country".
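Both clauses together, runnable via sqlite3 (the schema and rows are invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customers (id INTEGER, country TEXT)")
con.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
con.executemany("INSERT INTO customers VALUES (?, ?)",
                [(1, "IN"), (2, "IN"), (3, "US")])
con.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10), (1, 20), (3, 5)])

# JOIN combines the two tables; GROUP BY summarizes rows per country.
rows = con.execute("""
    SELECT c.country, COUNT(DISTINCT c.id) AS customers, SUM(o.amount) AS revenue
    FROM customers c JOIN orders o ON o.customer_id = c.id
    GROUP BY c.country
""").fetchall()
print(rows)  # e.g. [('IN', 1, 30.0), ('US', 1, 5.0)]
```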



Q. How do you handle null values and which Imputation method is more favorable?

A. Ways to handle missing values in the dataset:

Deleting Rows with missing values.

Impute missing values for continuous variable.

Impute missing values for categorical variable.

Other Imputation Methods.

Using Algorithms that support missing values.

Prediction of missing values.

Multiple imputation is more advantageous than the single imputation because it uses several complete data sets and provides both the within-imputation and between-imputation variability.
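A sketch of simple (single) imputation with scikit-learn; multiple imputation would repeat this across several completed datasets (the matrix is a toy example):

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [7.0, np.nan]])

# Mean imputation for continuous columns; use strategy="most_frequent"
# for categorical ones.
imputer = SimpleImputer(strategy="mean")
print(imputer.fit_transform(X))  # NaNs replaced by the column means 4.0 and 2.5
```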

ENJOY LEARNING 👍👍
Linear Regression Algorithm

Linear regression is a supervised learning algorithm used to model the relationships between observed variables. The idea behind simple linear regression is to "fit" the observations of two variables into a linear relationship between them by drawing the best-fit line "closest" to the points.

Regression is a common process used in many applications of statistics in the real world. There are two main types of applications:

Predictions: After a series of observations of variables, regression analysis gives a statistical model for the relationship between the variables. This model can be used to generate predictions: given two variables x and y, the model can predict values of y given future observations of x. This idea can be used to predict the outcome of political elections, the behavior of the stock market, or the performance of a professional athlete.

Correlation: The model given by a regression analysis will often fit some kinds of data better than others. This can be used to analyze correlations between variables and to refine a statistical model to incorporate further inputs: if the model describes certain subsets of the data points very well, but is a poor predictor for other data points, it can be instructive to examine the differences between the different types of data points for a possible explanation. This type of application is common in scientific tests, e.g. of the effects of a proposed drug on the patients in a controlled study.

Although many measures of best fit are possible, for most applications the best-fitting line is found using the method of least squares. That is, viewing y as a linear function of x, the method finds the linear function which minimizes the sum of the squares of the errors.
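Least squares in two lines of NumPy (the points are made up; polyfit with degree 1 minimizes the sum of squared errors):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])

# Finds the slope and intercept minimizing sum((y - (slope*x + intercept))**2).
slope, intercept = np.polyfit(x, y, 1)
print(f"best-fit line: y ≈ {slope:.2f}x + {intercept:.2f}")
```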
Which line is used in linear regression algorithms?
Anonymous Quiz
Over-fit line — 9%
Maximum slope line — 23%
Under-fit line — 4%
Best-fit line — 64%
Today's Question - 
How would you predict who will renew their subscription next month? What data would you need to solve this? What analysis would you do? Would you build predictive models? If so, which algorithms?


Let’s assume that we’re trying to predict the renewal rate for Netflix subscriptions. So our problem statement is to predict which users will renew their subscription plan for the next month.

Next, we must understand the data that is needed to solve this problem. In this case, we need to check the number of hours the service is active for each household, the number of adults in the household, the number of kids, which channels are streamed the most, how much time is spent on each channel, how much the watch rate has varied from last month, etc. Such data is needed to predict whether or not a person will continue the subscription for the upcoming month.

After collecting this data, it is important that you find patterns and correlations. For example, we know that if a household has kids, then they are more likely to subscribe. Similarly, by studying the watch rate of the previous month, you can predict whether a person is still interested in the subscription. Such trends must be studied.

The next step is analysis. For this kind of problem statement, you must use a classification algorithm that classifies customers into 2 groups:

Customers who are likely to subscribe next month

Customers who are not likely to subscribe next month

Would you build predictive models? Yes; in order to achieve this, you must build a predictive model that classifies the customers into the 2 classes mentioned above.

Which algorithms to choose? You can choose classification algorithms such as Logistic Regression, Random Forest, Support Vector Machine, etc.

Once you’ve chosen the right algorithm, you must perform model evaluation to calculate the efficiency of the algorithm. This is followed by deployment.

ENJOY LEARNING 👍👍
Today's Guesstimate question -  How will you estimate the number of weddings that take place in a year in India?


Facts
India's population - 1.3 bill
Population breakup - Rural - 70% and Urban - 30%

Assumptions
Every year India's population would grow steadily, but the growth won't be very fast-paced.
Every man and woman would eventually be married (homogeneously or heterogeneously). They won't die prematurely or prefer not to marry. People would marry only once.
In rural areas the age of marriage is, on average, in the 15 - 35 year range. Similarly, in urban areas it is 20 - 35 years. India is a young country, and the 15 - 35 year range holds around 50% of the total population.

Rural estimation
Rural population = 70% * 1.3 bill ≈ 900 mill
Population within marriage age in a year = 50% * 900 mill = 450 mill
Number of marriages to happen = 450 / 2 = 225 mill marriages
These people will marry within a 20 year time period according to our assumptions.
Number of rural marriages in a year = 225 mill / 20 = 11.25 mill marriages
 
Urban estimation
Urban population = 30% * 1.3 bill ≈ 400 mill
Population within marriage age in a year = 50% * 400 mill = 200 mill
Number of marriages to happen = 200 / 2 = 100 mill marriages
These people will marry within a 15 year time period according to our assumptions.
Number of urban marriages in a year = 100 mill / 15 = 6.6 mill marriages
 
Many people die prematurely in accidents and never marry, and some people simply prefer not to marry, so our number is over-estimated. Normalizing the 11.25 + 6.6 ≈ 17.9 mill total downward by roughly 15%-20% to account for this gives:
Answer = Approximately 14-15 million marriages take place in a year in India

ENJOY LEARNING 👍👍
Today's Question on Probability

Two candidates Aman and Mohan appear for a Data Science Job interview. The probability of Aman cracking the interview is 1/8 and that of Mohan is 5/12. What is the probability that at least one of them will crack the interview?


The probability of Aman getting selected for the interview is P(A) = 1/8. The probability of Mohan getting selected for the interview is P(B) = 5/12.
Now, the probability of at least one of them getting selected is the union of A and B:
P(A ∪ B) = P(A) + P(B) − P(A ∩ B) ………………………(1)
where P(A ∩ B) stands for the probability of both Aman and Mohan getting selected for the job. Since the two selections are independent:
P(A ∩ B) = P(A) * P(B) = 1/8 * 5/12 = 5/96
Now, put the value of P(A ∩ B) into equation (1):
P(A ∪ B) = 1/8 + 5/12 − 5/96 = 12/96 + 40/96 − 5/96 = 47/96
So, the answer will be 47/96.
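The arithmetic can be double-checked exactly with Python's fractions module:

```python
from fractions import Fraction

p_a, p_b = Fraction(1, 8), Fraction(5, 12)
# P(A or B) = P(A) + P(B) - P(A)P(B) for independent events.
print(p_a + p_b - p_a * p_b)  # 47/96
```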

ENJOY LEARNING 👍👍
Data Science Interview Questions
[Part -8]

Q. How would you build a model to predict credit card fraud?
A. Use Kaggle's credit card fraud dataset and start with EDA (exploratory data analysis). Apply a train/test split over the data, then choose a model such as logistic regression, XGBoost, or random forest. After hyperparameter tuning and fitting the model, the final step is evaluating its performance (the data is highly imbalanced, so metrics like precision, recall, or AUC are more informative than accuracy).
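A hedged sketch of that pipeline on a synthetic imbalanced dataset standing in for the Kaggle data (the model choice and parameters are illustrative, not prescriptive):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# ~2% positive ("fraud") class, mimicking heavy class imbalance.
X, y = make_classification(n_samples=2000, weights=[0.98], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

model = RandomForestClassifier(class_weight="balanced", random_state=0)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```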

Q. How would you derive new features from features that already exist?
A. Feature engineering is applied first to generate additional features, and then feature selection is done to eliminate irrelevant, redundant, or highly correlated features. This includes techniques like binning, data manipulation, etc.

Q. If you’re attempting to predict a customer’s gender, and you only have 100 data points, what problems could arise?
A. Overfitting: with only 100 data points, the model may read too much into patterns specific to this small sample and lose the ability to generalize to other datasets.

Q. Suppose you were given two years of transaction history. What features would you use to predict credit risk?
A. The following features can be used in such a case:
Transaction amount,
Transaction count,
Transaction frequency,
Transaction category: bar, grocery, jewelry, etc.
Transaction channel: credit card, debit card, international wire transfer, etc.
Distance between transaction address and mailing address,
Fraud/risk score

Q. Explain overfitting and what steps you can take to prevent it.
A. Overfitting happens when a model learns the detail and noise in the training data to the extent that it negatively impacts the performance of the model on new data. Some steps that we can take to avoid it:
1. Data augmentation
2. L1/L2 Regularization
3. Removing layers / reducing the number of units per layer
4. Cross-validation

Q. Why does SVM need to maximize the margin between support vectors?
A. Our goal is to maximize the margin because the hyperplane for which the margin is maximum is the optimal hyperplane. Thus SVM tries to make a decision boundary in such a way that the separation between the two classes is as wide as possible in the plane.

ENJOY LEARNING 👍👍
Data Science Interview Questions
[Part - 9]


Q. Difference between array and list

A. The main difference between these two data types is the operations you can perform on them and what they may contain: lists are containers for elements of differing data types, whereas arrays are containers for elements of the same data type.
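A quick demonstration with the standard array module (the variable names are arbitrary):

```python
import array

mixed = [1, "two", 3.0]             # a list may hold differing types
nums = array.array("i", [1, 2, 3])  # an array is fixed to one type ("i" = int)

nums.append(4)                      # fine: same type
try:
    nums.append("five")             # the same operation fails on an array
except TypeError as e:
    print("TypeError:", e)
```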



Q. Which is faster for lookups: a dictionary or a list?

A. A dictionary is faster because it uses a better algorithm: a dictionary lookup is a hash-table probe, while a list lookup is an iteration. The dictionary hashes the key and jumps almost directly to the entry, while the list must be walked from the beginning until the result is found, every time.
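The difference is easy to measure (the sizes and repeat counts are arbitrary):

```python
import timeit

n = 100_000
data_list = list(range(n))
data_dict = dict.fromkeys(data_list)

# Membership test for a worst-case element: the list scans every item,
# the dict hashes the key once.
print("list:", timeit.timeit(lambda: n - 1 in data_list, number=100))
print("dict:", timeit.timeit(lambda: n - 1 in data_dict, number=100))
```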



Q. How much time does an SVM take to train if one iteration takes 10 seconds per class and there are 4 classes?

A. With the one-vs-all (one-vs-rest) method, it would take 4 * 10 = 40 seconds.



Q. Kernels in SVM, with differences

A. A kernel function in SVM is a method used to take data as input and transform it into the required form for processing. The common kernels (compared in the sketch below):

Gaussian kernel / Radial Basis Function (RBF): used to perform the transformation when there is no prior knowledge about the data; it uses the radial basis method to improve the transformation.

Sigmoid kernel: this function is equivalent to a two-layer perceptron model of a neural network; the same function is used as an activation function for artificial neurons.

Polynomial kernel: represents the similarity of vectors in the training set in a feature space over polynomials of the original variables used in the kernel.

Linear kernel: used when the data is linearly separable.
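Comparing the four kernels on a dataset that is not linearly separable (the dataset and default kernel parameters are illustrative):

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
# The two moons are not linearly separable, so nonlinear kernels should win.
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    score = cross_val_score(SVC(kernel=kernel), X, y, cv=5).mean()
    print(f"{kernel:7s} CV accuracy = {score:.3f}")
```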


ENJOY LEARNING 👍👍
Today's Probability Question

Three zebras are sitting on the corners of an equilateral triangle. Each zebra randomly picks a direction and runs along the outline of the triangle toward one of the other two corners. What is the probability that none of the zebras collide?


• Let's imagine the zebras on an equilateral triangle. Each has two choices of direction to run in along the outline. Since the choices are random, let's count the cases in which they fail to collide.



• There are only really two such possibilities: the zebras avoid colliding exactly when they all run in the same rotational direction, either all clockwise or all counter-clockwise.



• Let's calculate the probability of each. The probability that every zebra chooses to go clockwise is the product of each zebra choosing the clockwise direction. Given there are two choices (counter-clockwise or clockwise), that is 1/2 * 1/2 * 1/2 = 1/8.



• The probability of every zebra going counter-clockwise is the same, 1/8. Summing the two probabilities gives 1/8 + 1/8 = 1/4, or 25%.
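The answer can also be confirmed by enumerating all eight outcomes (0 = clockwise, 1 = counter-clockwise, an arbitrary encoding):

```python
from itertools import product

# A collision is avoided only when all three zebras pick the same direction.
outcomes = list(product([0, 1], repeat=3))
safe = [o for o in outcomes if len(set(o)) == 1]
print(len(safe), "/", len(outcomes))  # 2 / 8 = 1/4
```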
Data Science Interview Questions
[PART- 10]


Q. Difference between WHERE and HAVING in SQL

A. The main difference between them is that the WHERE clause is used to specify a condition for filtering records before any groupings are made, while the HAVING clause is used to specify a condition for filtering values from a group.


Q. Explain confusion matrix?

A. A confusion matrix is a summary of prediction results on a classification problem. The number of correct and incorrect predictions is summarized with count values, broken down by each class.
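A small example with scikit-learn (the labels are toy values):

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows are actual classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))  # [[3 1], [1 3]]
```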


Q. Explain PCA

A. The principal components are eigenvectors of the data's covariance matrix. Thus, the principal components are often computed by eigen decomposition of the data covariance matrix or singular value decomposition of the data matrix. PCA is the simplest of the true eigenvector-based multivariate analyses and is closely related to factor analysis.
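PCA in a few lines of scikit-learn (the dataset and the choice of 2 components are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)
pca = PCA(n_components=2)             # computed via SVD of the centered data
X_2d = pca.fit_transform(X)
print(X_2d.shape)                     # (150, 2)
print(pca.explained_variance_ratio_)  # variance captured by each component
```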


Q. How do you cut a cake into 8 equal parts using only 3 straight cuts?

A. Cut the cake down the middle first, then stack one piece on the other and again cut straight down the middle, which leaves you with 4 pieces. Finally, stack all 4 pieces on one another and cut a third time. This is how, with 3 straight cuts, you can cut a cake into 8 equal pieces.


Q. Explain k-means clustering

A. K-means clustering aims to partition data into k clusters in such a way that data points in the same cluster are similar and data points in different clusters are farther apart. The similarity of two points is determined by the distance between them.
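K-means on two obvious blobs (the synthetic data and k = 2 are illustrative choices):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (50, 2)),   # blob around (0, 0)
               rng.normal(5, 0.5, (50, 2))])  # blob around (5, 5)

# Each point is assigned to the nearest of the k learned centroids.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)  # close to (0, 0) and (5, 5)
```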


Q. How is KNN different from k-means clustering?

A. K-means clustering represents an unsupervised algorithm, mainly used for clustering, while KNN is a supervised learning algorithm used for classification.


Q. Stock market prediction: You would like to predict whether or not a certain company will declare bankruptcy within the next 7 days (by training on data of similar companies that had previously been at risk of bankruptcy). Would you treat this as a classification or a regression problem?

A. It is a classification problem.


ENJOY LEARNING 👍👍