Data Analyst Interview Resources – Telegram
Join our telegram channel to learn how data analysis can reveal fascinating patterns, trends, and stories hidden within the numbers! 📊

For ads & suggestions: @love_data
Here's a snippet of code written in C:

for (int i = 0; i < n; i++)
{
    for (int j = 0; j < n; j++)
    {
        // some operation
    }
}


How can I write something equivalent in SQL?

Solution: Use a table alias

Say you are asked to count how often each color occurs in a table. The outer query plays the role of the outer loop, and a correlated subquery, linked to it through the alias c, plays the role of the inner loop:


SELECT DISTINCT color,
       (SELECT COUNT(*) FROM colors WHERE color = c.color) AS frequency
FROM colors c;
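The query above can be tried out with Python's built-in sqlite3 module. This is a minimal sketch: the sample rows are made up for illustration, and the GROUP BY query is added only as a comparison, since it returns the same counts without the explicit inner scan.

```python
import sqlite3

# Hypothetical sample data: a one-column table named colors.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE colors (color TEXT)")
conn.executemany(
    "INSERT INTO colors (color) VALUES (?)",
    [("red",), ("blue",), ("red",), ("green",), ("blue",), ("red",)],
)

# Correlated subquery: for each outer row (alias c), the inner query
# scans the table again, much like the inner loop of the C snippet.
correlated = conn.execute(
    """
    SELECT DISTINCT color,
           (SELECT COUNT(*) FROM colors WHERE color = c.color) AS frequency
    FROM colors c
    ORDER BY color
    """
).fetchall()

# GROUP BY expresses the same count in a single pass over the table.
grouped = conn.execute(
    "SELECT color, COUNT(*) AS frequency FROM colors GROUP BY color ORDER BY color"
).fetchall()

print(correlated)
print(grouped)
```

Both queries return one row per color with its count. In practice GROUP BY is usually the more readable choice for this task.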
Q. Autoencoder methods

A. An autoencoder is a type of neural network whose output layer has the same dimensionality as its input layer. In simpler words, the number of output units equals the number of input units. Various techniques exist to prevent autoencoders from simply learning the identity function and to improve their ability to capture important information and learn richer representations: 1. Sparse autoencoder (SAE) 2. Denoising autoencoder (DAE) 3. Contractive autoencoder (CAE). Principal component analysis (PCA), sometimes listed with these, is a linear dimensionality-reduction technique rather than an autoencoder variant.


Q. L1 and L2 regularization?


A. L1 regularization adds a penalty proportional to the absolute value of the weights. It tends to drive some weights to exactly zero, so it is used to reduce the number of features in a high-dimensional dataset. L2 regularization adds a penalty proportional to the square of the weights. It spreads the shrinkage across all the weights without zeroing them out, which often leads to more stable final models.


Q. How to measure the Euclidean distance between two arrays in numpy?

A. Euclidean distance is defined in mathematics as the magnitude or length of the line segment between two points. There are multiple methods for measuring it.

Method 1. We first initialize two numpy arrays. Then we use numpy's linalg.norm() on their difference to compute the Euclidean distance directly.

Method 2. We first initialize two numpy arrays. Then we take the difference of the two arrays and compute the dot product of that difference with its own transpose. Finally, we take the square root of the result.

Method 3. We first initialize two numpy arrays. Then we compute the difference of the arrays and square it element-wise. We sum the squared elements and take the square root at the end.
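The three methods can be sketched as follows. The two sample points are made up for illustration; for 1-D arrays the transpose is a no-op, so Method 2 reduces to the dot product of the difference with itself.

```python
import numpy as np

# Hypothetical sample points.
a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 6.0, 3.0])

# Method 1: norm of the difference vector.
d1 = np.linalg.norm(a - b)

# Method 2: dot product of the difference with itself, then square root.
diff = a - b
d2 = np.sqrt(np.dot(diff, diff))

# Method 3: square element-wise, sum, then square root.
d3 = np.sqrt(np.sum((a - b) ** 2))

print(d1, d2, d3)
```

All three give the same distance (5.0 for these points), so the choice is mostly a matter of readability; linalg.norm() is the shortest to write.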


Q. What are the support vectors in SVM?

A. Support vectors are the data points closest to the hyperplane; they influence its position and orientation. Using these support vectors, we maximize the margin of the classifier. Deleting the support vectors will change the position of the hyperplane. These are the points that help us build our SVM.


Q. How do you handle categorical data?

A. One-hot encoding is the most common way to deal with non-ordinal categorical data. It creates an additional feature for each group of the categorical feature and marks each observation as belonging (value = 1) or not belonging (value = 0) to that group.


Q. What is correlation?

A. Correlation is a statistical measure that expresses the extent to which two variables are linearly related (meaning they change together at a constant rate). It's a common tool for describing simple relationships without making a statement about cause and effect.


Q. What is covariance?

A. Covariance is a measure of how much two random variables vary together. It is similar to variance, but where variance tells you how a single variable varies, covariance tells you how two variables vary together. Correlation is covariance rescaled by the two standard deviations, so it always lies between -1 and 1.
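To make the link between covariance and correlation concrete, here is a small sketch in plain Python. The paired observations are invented for illustration, and the sample (n - 1) denominator is an assumption; the population version divides by n instead.

```python
from math import sqrt

# Hypothetical paired observations (y is exactly 2 * x).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sample covariance: summed product of deviations over (n - 1).
cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)

# Sample standard deviations of each variable.
std_x = sqrt(sum((xi - mean_x) ** 2 for xi in x) / (n - 1))
std_y = sqrt(sum((yi - mean_y) ** 2 for yi in y) / (n - 1))

# Pearson correlation: covariance rescaled by both standard deviations.
corr_xy = cov_xy / (std_x * std_y)

print(cov_xy, corr_xy)
```

The covariance depends on the units of x and y, while the correlation does not, which is why correlation is easier to compare across datasets.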
Data Analyst Roadmap
1. What is concurrency control in DBMS?

Concurrency control is the process of managing simultaneous operations on a database so that its integrity is not compromised. The two main approaches are:
Optimistic approach – Involves versioning
Pessimistic approach – Involves locking

2. What is a checkpoint in DBMS and when does it occur?

A checkpoint is a mechanism by which all previous log records are removed from the system and stored permanently on disk. Checkpoints therefore mark the points from which the transaction log can be used to recover all committed data up to the point of a crash.

3. What are groups in Tableau?

A group is a combination of dimension members that make higher level categories. For example, if you are working with a view that shows average test scores by major, you may want to group certain majors together to create major categories.

4. How are nested IF statements used in Excel?

The function IF() can be nested when we have multiple conditions to meet. The FALSE value in the first IF function is replaced by another IF function to make a further test.

5. Do you want to build a career in Data Science & Analytics but don't know where to start?

https://news.1rj.ru/str/sqlspecialist/398

Here is a complete roadmap from scratch to help you build the technical skills needed for Data Scientist / Analyst interviews, along with career growth tips for landing your dream job.
SELECT Syntax

1. SELECT column1, column2, ...
FROM table_name;

Here, column1, column2, ... are the field/column names of the table you want to select data from.



2. SELECT * FROM table_name;

Here, * (star) selects all columns/fields of the table.

3. SELECT DISTINCT Syntax

SELECT DISTINCT column1, column2, ...
FROM table_name;

The SELECT DISTINCT statement is used to return only distinct (different) values.

Inside a table, a column often contains many duplicate values; and sometimes you only want to list the different (distinct) values.

4. WHERE Syntax

SELECT column1, column2, ...
FROM table_name
WHERE condition;


The SQL WHERE Clause
The WHERE clause is used to filter records.

It is used to extract only those records that fulfill a specified condition.

WHERE Clause Example
The following SQL statement selects all the customers from the country "Mexico", in the "Customers" table:

Example

SELECT * FROM Customers
WHERE Country = 'Mexico';
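The four statement forms above can be run against a throwaway in-memory database with Python's sqlite3 module. This is only a sketch: the Customers columns and rows are hypothetical sample data.

```python
import sqlite3

# Hypothetical Customers table with two columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerName TEXT, Country TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [("Ana", "Mexico"), ("Bruno", "Germany"), ("Carla", "Mexico")],
)

# 1. Named columns.
named_columns = conn.execute(
    "SELECT CustomerName, Country FROM Customers"
).fetchall()

# 2. All columns.
all_columns = conn.execute("SELECT * FROM Customers").fetchall()

# 3. Distinct values only.
distinct_countries = conn.execute(
    "SELECT DISTINCT Country FROM Customers ORDER BY Country"
).fetchall()

# 4. Rows filtered by a WHERE condition.
mexico_only = conn.execute(
    "SELECT * FROM Customers WHERE Country = 'Mexico'"
).fetchall()

print(distinct_countries)
print(mexico_only)
```

Note how DISTINCT collapses the two "Mexico" rows into one value, while WHERE keeps both rows that match the condition.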
Sample Resume

Red -> action verbs
Pink -> hard skill
Yellow -> soft skill
Cyan -> impact statement
Green -> impact matrix

Resume Tips: https://news.1rj.ru/str/sqlspecialist/464
Resume Template.pdf
64.5 KB
Resume Template for Data Analyst Fresher
Second_Language_Acquisition_Research_Series_Guilherme_D_Garcia_Data.pdf
5.7 MB
Book: Data Visualization and Analysis in Second Language Research (2021)
A few common problems with a lot of resumes:

1. 𝐈𝐫𝐫𝐞𝐥𝐞𝐯𝐚𝐧𝐭 𝐢𝐧𝐟𝐨𝐫𝐦𝐚𝐭𝐢𝐨𝐧.
I understand that there are a lot of achievements that we are personally proud of (such as representing the school/college in XYZ competition or being school head/class head), but not all of them are relevant to technical roles. As a fresher, try to focus more on technical achievements than on managerial ones.

2. 𝐋𝐚𝐜𝐤 𝐨𝐟 𝐪𝐮𝐚𝐥𝐢𝐭𝐲 𝐩𝐫𝐨𝐣𝐞𝐜𝐭𝐬.
Many resumes list the same common projects, such as a front-end built with just HTML and CSS that hands all the real work to an open-source API (e.g., weather prediction and recipe suggestion apps).

The most common projects are:
Tic-tac-toe game.
Sorting algorithm visualizers.
To-do application.
Movie listing.

The code for these projects is often copied and pasted from GitHub repositories.

Projects are like a bounty. If you are prepared well and have quality projects in your resume, you can set the tempo of the interview. It is one of the few questions that you will almost certainly be asked in the interview.

I don't understand why we can spend 2 years preparing for data structures and algorithms (DSA) and competitive programming (CP), but not even 2 weeks to create quality projects.
Even if your resume passes the applicant tracking system (ATS) and recruiter's screening, weak projects can still lead to your rejection in interviews. And this is completely in your hands.

I feel that this topic needs a lot more discussion about the type and quality of projects that one needs. Let me know if you want a dedicated post on this.

3. 𝐋𝐚𝐜𝐤 𝐨𝐟 𝐪𝐮𝐚𝐧𝐭𝐢𝐭𝐚𝐭𝐢𝐯𝐞 𝐝𝐚𝐭𝐚.
For technical roles, adding quantitative data has a big impact.
For example, instead of saying "I wrote unit tests for service X and reduced the latency of service Y by caching," you can say "I wrote unit tests that increased the code coverage of service X from 80% to 95% and added caching that reduced the latency of service Y from 100 milliseconds to 50 milliseconds."
Data Analyst Interview Questions

1. What do Tableau's sets and groups mean?

Sets and groups both organize data according to predefined criteria. The main difference is that a set has only two states (a member is either in or out), whereas a group can divide the dataset into several categories. Choose between them based on the conditions you need to apply.

2. What is a macro in Excel?

An Excel macro is an algorithm or a group of steps that helps automate an operation by capturing and replaying the steps needed to finish it. Once the steps have been recorded, they are saved as a macro that the user can edit and replay as often as they like.

Macros are well suited to routine work because they also reduce mistakes. Consider an account manager who needs to share reports about staff members who owe the company money. This can be automated with a macro, making small adjustments each month as necessary.


3. Gantt chart in Tableau

A Tableau Gantt chart shows the duration of events and how a value progresses over a period, using bars laid out along a time axis. It is primarily used as a project management tool, with each bar representing a project task.

4. In Microsoft Excel, how do you create a drop-down list?

Start by selecting the Data tab from the ribbon.
Select Data Validation from the Data Tools group.
Next, go to Settings > Allow > List.
In Source, choose the values or range you want to offer in the list.
1. What is a quick filter in Tableau?

When you use a filter in Tableau, it comes with options that change how the filter behaves, such as showing it as a single-value dropdown, single-value list, multiple-values list, or multiple-values dropdown, among others. After you add a filter to a sheet, just right-click on the sheet to see all the quick filter options. Changing these options also changes how the filter looks on the sheet.

2. How to calculate percentages in Tableau?

To calculate percentages of the data on your worksheet, go to the Analysis menu and select Percentage Of. There you will see several options: Table, Column, Row, Pane, Row in Pane, Column in Pane, and Cell. Select one of them to define the total on which the percentage is calculated. The option you choose applies uniformly to all rows and columns; there is no way to specify different options for rows and columns.

3. What is Power Pivot?

Power Pivot is an in-memory data modeling component. It provides highly compressed data storage with fast calculation. It helps you build a data model, define relationships, and create formulas, calculated columns, PivotTables, and PivotCharts from multiple sources.

4. What is xVelocity in Power Pivot?

xVelocity is the in-memory analytics engine behind Power Pivot that loads and handles large volumes of data in Power BI. It stores data in a columnar format, which results in faster processing.
Data Science Fundamentals for Python and MongoDB
👇👇
https://news.1rj.ru/str/datasciencefun/1370
1. How to create filters in Power BI?

Filters are an integral part of Power BI reports. They are used to slice and dice the data along the dimensions we want. Filters can be created in a couple of ways.

Using Slicers: A slicer is a visual in the Visualizations pane. It can be added to the design view to filter our reports, and it requires a field to be added to it. For example, a slicer can be added for the Country field, and the data can then be filtered by country.
Using the Filter Pane: The filter pane is a single space in the report where we can add different fields as filters. A field can be added as a visual-level filter (only one visual), a page-level filter (all the visuals on the report page), or a report-level filter (all the pages of the report).


2. How to sort data in Power BI?

Sorting is available in several forms. In the data view, the common alphabetical sort is available. There is also the Sort by Column option, which sorts one column based on another column. Sorting is available in visuals as well, in ascending or descending order by any field or measure present in the visual.


3. How to convert PDF to Excel?


Open the PDF document you want to convert to XLSX format in Acrobat DC.
Go to the right pane and click on the “Export PDF” option.
Choose spreadsheet as the Export format.
Select “Microsoft Excel Workbook.”
Now click “Export.”
Download the converted file or share it.


4. How to enable macros in Excel?

Click the File tab and then click “Options.”
In the “Excel Options” dialog box that appears, click “Trust Center” and then “Trust Center Settings.”
Go to “Macro Settings” and select “Enable all macros.”
Click OK to apply the macro settings.
1. What is Density-based Clustering?

Density-Based Clustering is an unsupervised machine learning method that identifies different groups or clusters in the data space. These clustering techniques are based on the concept that a cluster in the data space is a contiguous region of high point density, separated from other such clusters by contiguous regions of low point density.

Partition-based (K-means) and hierarchical clustering techniques work well with regularly shaped clusters, while density-based techniques work well with arbitrarily shaped clusters and for detecting outliers.

2. How to create empty tables with the same structure as another table?

To create an empty table:
Use the INTO operator to fetch the records of one table into a new table, with a WHERE clause that is false for every row (for example, WHERE 1 = 0). SQL creates a new table with the same structure to receive the fetched rows, but nothing is stored in it because no row satisfies the WHERE clause.
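The same idea can be sketched with Python's sqlite3 module. SQLite does not support SELECT ... INTO, so this example swaps in the CREATE TABLE ... AS SELECT form with an always-false WHERE clause; the employees table and its columns are hypothetical.

```python
import sqlite3

# Hypothetical source table with one row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT, salary REAL)")
conn.execute("INSERT INTO employees VALUES (1, 'Asha', 50000.0)")

# WHERE 1 = 0 is false for every row, so only the column layout is copied.
conn.execute(
    "CREATE TABLE employees_empty AS SELECT * FROM employees WHERE 1 = 0"
)

# Inspect the new table: column names (second field of table_info) and row count.
columns = [row[1] for row in conn.execute("PRAGMA table_info(employees_empty)")]
row_count = conn.execute("SELECT COUNT(*) FROM employees_empty").fetchone()[0]

print(columns, row_count)
```

Be aware that this approach copies column names but, in SQLite at least, not constraints such as primary keys or indexes.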

3. What is a Parameter in Tableau? Give an Example.

A parameter is a dynamic value that a user can select, and you can use it to replace constant values in calculations, filters, and reference lines.

For example, when creating a filter to show the top 10 products by total profit, you can replace the fixed value with a parameter so that the filter shows the top 10, 20, or 30 products.

4. How will you write the formula for the following in Excel? - Multiply the value in cell A1 by 10, add the result by 5, and divide it by 2.

To write this formula, we have to follow PEMDAS precedence. The correct answer is =((A1*10)+5)/2.

Answers such as =A1*10+5/2 and =(A1*10)+5/2 are not correct, because division is evaluated before addition, so only the 5 is divided by 2. The parentheses must enclose the whole sum before the division is applied.
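The difference is easy to check with ordinary arithmetic. The sketch below uses Python, whose precedence for *, / and + matches Excel's for these formulas; the value of A1 is an arbitrary assumption.

```python
# Hypothetical value in cell A1.
a1 = 4

# Intended calculation: multiply by 10, add 5, then divide the whole sum by 2.
intended = ((a1 * 10) + 5) / 2

# Without the outer parentheses, only the 5 is divided by 2.
wrong = a1 * 10 + 5 / 2

print(intended, wrong)
```

With A1 = 4 the intended formula gives 22.5, while the unparenthesized version gives 42.5.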

5. How can you remove duplicate values in a range of cells?

1. If the duplicate values in a column have been highlighted with Conditional Formatting, select the highlighted cells and press the Delete key. After deleting the values, go to the ‘Conditional Formatting’ option on the Home tab and choose ‘Clear Rules’ to remove the rules from the sheet.
2. You can also delete duplicate values by selecting the ‘Remove Duplicates’ option under Data Tools on the Data tab.
1. What is the difference between the RANK() and DENSE_RANK() functions?

The RANK() function assigns a rank to each row within an ordered partition. Rows with equal values share the same rank, and the next rank is the shared rank plus the number of tied rows, which leaves gaps. If we have three records at rank 4, for example, the next rank is 7. The DENSE_RANK() function also gives tied rows the same rank, but it assigns consecutive ranks with no gaps. If we have three records at rank 4, for example, the next rank is 5.
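A quick way to see the gap behaviour is to run both functions side by side. This sketch uses Python's sqlite3 module and assumes the bundled SQLite is version 3.25 or later (the first release with window functions); the scores table is hypothetical.

```python
import sqlite3

# Hypothetical scores with a three-way tie at 90.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (player TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("a", 100), ("b", 90), ("c", 90), ("d", 90), ("e", 80)],
)

# Rank the same ordering with both functions to compare them row by row.
rows = conn.execute(
    """
    SELECT player,
           score,
           RANK() OVER (ORDER BY score DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rnk
    FROM scores
    ORDER BY score DESC, player
    """
).fetchall()

for row in rows:
    print(row)
```

The three tied players share rank 2 under both functions. The next player gets rank 5 from RANK() (2 plus the three tied rows) but rank 3 from DENSE_RANK().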

2. Explain One-hot encoding and Label Encoding. How do they affect the dimensionality of the given dataset?

One-hot encoding represents categorical variables as binary vectors. Label encoding converts labels/words into numeric form. One-hot encoding increases the dimensionality of the dataset because it creates a new variable for each level of the variable, with each value encoded as 1 or 0. Label encoding does not affect the dimensionality, because the levels of a variable are encoded as integers (0, 1, 2, ...) in a single column.
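The effect on dimensionality can be shown with a few lines of plain Python. This is a hand-rolled sketch rather than any particular library's encoder; the color values are made up, and assigning integer labels in alphabetical order is an assumption.

```python
# Hypothetical categorical column.
colors = ["red", "green", "blue", "green"]

# Distinct levels in a fixed (alphabetical) order: blue, green, red.
levels = sorted(set(colors))

# Label encoding: one integer per level, still a single column.
label_of = {level: index for index, level in enumerate(levels)}
label_encoded = [label_of[c] for c in colors]

# One-hot encoding: one 0/1 column per level.
one_hot_encoded = [[1 if c == level else 0 for level in levels] for c in colors]

print(label_encoded)
print(one_hot_encoded)
```

The label-encoded result is still one column, while the one-hot result has as many columns as there are distinct levels (three here).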

3. What is the shortcut to add a filter to a table in Excel?

The filter mechanism is used when you want to display only specific data from the entire dataset. By doing so, there is no change being made to the data. The shortcut to add a filter to a table is Ctrl+Shift+L.

4. What is DAX in Power BI?

DAX stands for Data Analysis Expressions. It's a collection of functions, operators, and constants used in formulas to calculate and return values. In other words, it helps you create new info from data you already have.

5. Define shelves and sets in Tableau.

Shelves: Every worksheet in Tableau has shelves such as Columns, Rows, Marks, Filters, Pages, and more. By placing fields on shelves we can build our own visualization structure. We can control the marks by including or excluding data.
Sets: Sets are used to compute a condition on which the dataset will be prepared. Data is grouped together based on a condition. The fields responsible for the grouping are known as sets. For example, students having grades of more than 70%.
1. What is a UNIQUE constraint?

The UNIQUE constraint prevents identical values in a column from appearing in two records, guaranteeing that every value in the column is unique.
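As a minimal sketch of this behaviour, the following uses Python's sqlite3 module with a hypothetical users table; the second insert of the same email is rejected by the database.

```python
import sqlite3

# Hypothetical table whose email column must be unique.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT UNIQUE)")
conn.execute("INSERT INTO users VALUES ('a@example.com')")

# Inserting the same value again violates the UNIQUE constraint.
try:
    conn.execute("INSERT INTO users VALUES ('a@example.com')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(duplicate_rejected, row_count)
```

The table still holds a single row afterwards, because the duplicate insert raised an error instead of being stored.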

2. What is a Self-Join?

A self-join is a join in which a table is joined to itself, as if it were two tables. As a result, it is a unary relationship. Table aliases distinguish the two copies, and each row can be matched against the other rows of the same table according to the join condition. A self-join is mostly used to combine and compare rows from the same database table.
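A common illustration is an employee table in which each row points to its manager in the same table. The sketch below uses Python's sqlite3 module; the table, column names, and rows are hypothetical.

```python
import sqlite3

# Hypothetical table: manager_id refers to another row's id in the same table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT, manager_id INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "Asha", None), (2, "Ben", 1), (3, "Chen", 1)],
)

# The aliases e and m let the same table play two roles in one query.
pairs = conn.execute(
    """
    SELECT e.name AS employee, m.name AS manager
    FROM employees e
    JOIN employees m ON e.manager_id = m.id
    ORDER BY e.name
    """
).fetchall()

print(pairs)
```

Asha has no manager, so the inner join leaves her out of the result; a LEFT JOIN would keep her with an empty manager value.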

3. What is CASE WHEN in SQL Server?

The CASE expression is used to construct logic in which one column’s value is determined by the values of other columns. Each WHEN clause specifies a condition to be tested. If the WHEN condition returns TRUE, the THEN clause specifies the result.
When none of the WHEN conditions return TRUE, the ELSE clause is used. The END keyword brings the CASE expression to a close.
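The WHEN / THEN / ELSE / END structure can be sketched as follows. The example runs on SQLite through Python's sqlite3 module rather than SQL Server, but the CASE syntax shown is standard SQL; the students table and the grade thresholds are hypothetical.

```python
import sqlite3

# Hypothetical marks table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, marks INTEGER)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?)",
    [("Asha", 85), ("Ben", 62), ("Chen", 40)],
)

# WHEN clauses are tested in order; ELSE covers rows that match none of them.
results = conn.execute(
    """
    SELECT name,
           CASE
               WHEN marks >= 75 THEN 'distinction'
               WHEN marks >= 50 THEN 'pass'
               ELSE 'fail'
           END AS result
    FROM students
    ORDER BY name
    """
).fetchall()

print(results)
```

Because the conditions are evaluated top to bottom, a student with 85 marks matches the first WHEN and never reaches the second.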

4. What is the main difference between ‘BETWEEN’ and ‘IN’ condition operators?

The BETWEEN operator is used to display rows based on a range of values, whereas the IN operator is used to check whether a value is contained in a specific set of values.
1. Explain data cleansing.

Data cleansing, also known as data cleaning, data scrubbing, or data wrangling, is the process of identifying and then modifying, replacing, or deleting the incorrect, incomplete, inaccurate, irrelevant, or missing portions of the data as the need arises. This fundamental element of data science ensures data is correct, consistent, and usable.

2. What is an Affinity Diagram?

An Affinity Diagram is an analytical tool used to cluster or organize data into subgroups based on their relationships. These data or ideas are mostly generated from discussions or brainstorming sessions and are used in analyzing complex issues.

3. Which questions should you ask the user/client before you create a dashboard?

Although this depends on the user’s requirements, some common questions that I would ask the client before creating a dashboard are:

What is the purpose of the dashboard?
Should the dashboard be retrospective or real-time?
How detailed should the dashboard be?
How tech- and data-savvy is the end user?
Does the data need to be segmented?
Should I explain the dashboard design to you?

4. What is an Alias in SQL?

An alias is a feature of SQL that is supported by most, if not all, RDBMSs. It is a temporary name assigned to a table or table column for the purpose of a particular SQL query. In addition, aliasing can be employed as an obfuscation technique to hide the real names of database fields. A table alias is also called a correlation name.

An alias is declared explicitly with the AS keyword, but in some cases it can be declared without it as well.
1. How to change a table name in SQL?

This is the command to change a table name in many databases (for example PostgreSQL, MySQL, Oracle, and SQLite); SQL Server uses the sp_rename stored procedure instead:
ALTER TABLE table_name
RENAME TO new_table_name;
We start with the keywords ALTER TABLE, follow them with the original name of the table, then give the keywords RENAME TO, and finally the new table name.
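The rename can be tried on SQLite, one of the databases that accepts this syntax, through Python's sqlite3 module. The table names old_orders and orders are hypothetical.

```python
import sqlite3

# Hypothetical table to be renamed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE old_orders (id INTEGER)")

# ALTER TABLE <old name> RENAME TO <new name>
conn.execute("ALTER TABLE old_orders RENAME TO orders")

# sqlite_master lists the tables that currently exist in the database.
tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

print(tables)
```

After the statement runs, only the new name appears in the catalog; queries against the old name would fail.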

2. How to find the constraint information for a table?

Users often need to find the constraint information for a table. In Oracle, the following data dictionary queries are useful:
SELECT * FROM User_Constraints;
SELECT * FROM User_Cons_Columns;

3. What is the difference between clustered and non-clustered indexes?

Reads through a clustered index are generally faster than reads through a non-clustered index.
A clustered index determines the physical order in which the rows of the table or view are stored, whereas a non-clustered index does not store the data itself: it is a separate structure from the data rows and points back to them.

4. What are the subsets of SQL?

DDL (Data Definition Language): Used to define the data structure. It consists of commands like CREATE, ALTER, DROP, etc.
DML (Data Manipulation Language): Used to manipulate already existing data in the database, with commands like SELECT, UPDATE, INSERT.
DCL (Data Control Language): Used to control access to data in the database, commands like GRANT, REVOKE.