Data Analyst Interview Resources – Telegram
Join our telegram channel to learn how data analysis can reveal fascinating patterns, trends, and stories hidden within the numbers! 📊

For ads & suggestions: @love_data
Data Analyst Roadmap
1. What is concurrency control in DBMS?

This is a process of managing simultaneous operations in a database so that database integrity is not compromised. The following are the two approaches involved in concurrency control:
Optimistic approach – Involves versioning
Pessimistic approach – Involves locking
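
The two approaches can be sketched in a few lines of Python. This is only an illustration: the VersionedRecord class and its method names are hypothetical, and a real DBMS would perform the version check and the write atomically.

```python
import threading


class VersionedRecord:
    """Hypothetical record with a version counter (optimistic) and a lock (pessimistic)."""

    def __init__(self, value):
        self.value = value
        self.version = 0
        self._lock = threading.Lock()

    def optimistic_update(self, new_value, expected_version):
        # Optimistic: succeed only if nobody changed the record since we read it.
        if self.version != expected_version:
            return False
        self.value = new_value
        self.version += 1
        return True

    def pessimistic_update(self, new_value):
        # Pessimistic: take the lock first, so other writers have to wait.
        with self._lock:
            self.value = new_value
            self.version += 1


record = VersionedRecord(100)
seen = record.version                         # two writers both read version 0
first = record.optimistic_update(120, seen)   # the first writer succeeds
second = record.optimistic_update(90, seen)   # the second holds a stale version
```

Here the second writer is rejected and would have to re-read the record and retry, while the pessimistic path never fails but makes other writers wait.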

2. What is a checkpoint in DBMS and when does it occur?

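A checkpoint is a mechanism in which the modified data and log records held in memory are written permanently to the storage disk, after which the log records before that point are no longer needed for recovery. Checkpoints are therefore the points from which the transaction log is used to recover all the committed data up to the point of a crash. A DBMS typically takes checkpoints at regular intervals or once the log has grown by a set amount.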

3. What are groups in Tableau?

A group is a combination of dimension members that make higher level categories. For example, if you are working with a view that shows average test scores by major, you may want to group certain majors together to create major categories.

4. How are nested IF statements used in Excel?

The function IF() can be nested when we have multiple conditions to meet. The FALSE value in the first IF function is replaced by another IF function to make a further test.

5. Do you want to build a career in Data Science & Analytics but don't know how to start?

https://news.1rj.ru/str/sqlspecialist/398

Here is a complete roadmap from scratch that will make you technically strong enough to crack any Data Scientist / Analyst interview, and also teach you career growth hacks to land your dream job.
SELECT Syntax

1. SELECT column1, column2, ...
FROM table_name;

Here, column1, column2, ... are the field/column names of the table you want to select data from.



2. SELECT * FROM table_name;

Here, * (star) selects all the columns/fields of the table.

3. SELECT DISTINCT Syntax

SELECT DISTINCT column1, column2, ...
FROM table_name;

The SELECT DISTINCT statement is used to return only distinct (different) values.

Inside a table, a column often contains many duplicate values; and sometimes you only want to list the different (distinct) values.

4. WHERE Syntax

SELECT column1, column2, ...
FROM table_name
WHERE condition;


The SQL WHERE Clause
The WHERE clause is used to filter records.

It is used to extract only those records that fulfill a specified condition.

WHERE Clause Example
The following SQL statement selects all the customers from the country "Mexico", in the "Customers" table:

Example

SELECT * FROM Customers
WHERE Country='Mexico';
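
To try these statements without installing a database, here is a small sketch using Python's built-in sqlite3 module. The Customers rows are made-up sample data, not from any real dataset.

```python
import sqlite3

# In-memory database with a tiny, invented Customers table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerName TEXT, Country TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [("Ana", "Mexico"), ("Antonio", "Mexico"), ("Thomas", "UK")],
)

# SELECT * returns every column of every row.
all_rows = conn.execute("SELECT * FROM Customers").fetchall()

# SELECT DISTINCT collapses the repeated country values.
countries = conn.execute(
    "SELECT DISTINCT Country FROM Customers ORDER BY Country"
).fetchall()

# WHERE keeps only the rows that satisfy the condition.
mexico = conn.execute(
    "SELECT CustomerName FROM Customers "
    "WHERE Country = 'Mexico' ORDER BY CustomerName"
).fetchall()
```

Each result is a list of tuples, one tuple per returned row.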
Sample Resume

Red -> action verbs
Pink -> hard skill
Yellow -> soft skill
Cyan -> impact statement
Green -> impact matrix

Resume Tips: https://news.1rj.ru/str/sqlspecialist/464
Resume Template.pdf
64.5 KB
Resume Template for Data Analyst Fresher
Second_Language_Acquisition_Research_Series_Guilherme_D_Garcia_Data.pdf
5.7 MB
Book: Data Visualization and Analysis in Second Language Research(2021)
A few common problems with a lot of resumes:

1. Irrelevant information.
I understand that there are a lot of achievements that we are personally proud of (things like representing the school/college in an XYZ competition, or being school head/class head, etc.), but not all of them are relevant to technical roles. As a fresher, try to focus more on technical achievements rather than managerial ones.

2. Lack of quality projects.
Many resumes have the same common projects, such as:
Creating just the front-end using HTML and CSS and redirecting all the work to an open-source API (e.g., weather prediction and recipe suggestion apps).

The most common projects are:
Tic-tac-toe game.
Sorting algorithms visualizers.
To-do application.
Movie listing.

The code for these projects is often copied and pasted from GitHub repositories.

Projects are like a bounty. If you are well prepared and have quality projects on your resume, you can set the tempo of the interview. Your projects are one of the few topics you will almost certainly be asked about in the interview.

I don't understand why we can spend 2 years preparing for data structures and algorithms (DSA) and competitive programming (CP), but not even 2 weeks to create quality projects.
Even if your resume passes the applicant tracking system (ATS) and recruiter's screening, weak projects can still lead to your rejection in interviews. And this is completely in your hands.

I feel that this topic needs a lot more discussion about the type and quality of projects that one needs. Let me know if you want a dedicated post on this.

3. Lack of quantitative data.
For technical roles, adding quantitative data has a big impact.
For example, instead of saying "I wrote unit tests for service X and reduced the latency of service Y by caching," you can say "I wrote unit tests that increased the code coverage of service X from 80% to 95%, and reduced the latency of service Y from 100 milliseconds to 50 milliseconds by caching."
Data Analyst Interview Questions

1. What do Tableau's sets and groups mean?

Sets and groups both group data according to predefined criteria. The primary distinction between the two is that a set has only two options, in or out, whereas a group can divide the dataset into several categories. Choose between a group and a set based on the conditions you need to apply.

2. What is a macro in Excel?

An Excel macro is a recorded group of steps that helps automate an operation by capturing and replaying the steps needed to finish it. Once the steps have been saved, the macro can be edited and replayed as often as needed.

Macros are excellent for routine work because they also reduce mistakes. Consider an account manager who needs to share reports about staff members who owe the company money. This can be automated with a macro, making small adjustments each month as necessary.


3. What is a Gantt chart in Tableau?

A Tableau Gantt chart illustrates the duration of events as well as the progression of a value across the period. It shows bars along a time axis. The Gantt chart is primarily used as a project management tool, with each bar representing a project task.

4. In Microsoft Excel, how do you create a drop-down list?

Start by selecting the Data tab from the ribbon.
Select Data Validation from the Data Tools group.
Next, go to Settings > Allow > List.
In the Source box, choose the range (or type the comma-separated values) you want to offer as the list.
1. What is a quick filter in Tableau?

A filter shown on a sheet in Tableau comes with options that easily change how it behaves, such as a single value drop-down, single value list, multiple value list, multiple value drop-down, and various others. After adding a filter to a sheet, open the menu on the filter shown on the sheet to see all the quick filter options. Changes made to these options also change how the filter looks on the sheet.

2. How to calculate percentages in Tableau?

To calculate percentages of the data on your worksheet, go to the Analysis menu and select Percentage Of. There you will see several options: Table, Column, Row, Pane, Row in Pane, Column in Pane, and Cell. The option you select defines the total on which each percentage is calculated. The chosen option applies uniformly to all rows and columns; there is no way to specify different options for rows and columns.

3. What is Power Pivot?

Power Pivot is an in-memory data modeling component. It provides highly compressed data storage with fast calculation. It helps you build a data model with relationships, formulas, calculated columns, PivotTables, and PivotCharts from multiple sources.

4. What is xVelocity in Power Pivot?

xVelocity is the in-memory analytics engine behind Power Pivot that loads and handles large volumes of data in Power BI. It stores data in a columnar format, which results in faster processing.
Data Science Fundamentals for Python and MongoDB
👇👇
https://news.1rj.ru/str/datasciencefun/1370
1. How to create filters in Power BI?

Filters are an integral part of Power BI reports. They are used to slice and dice the data by the dimensions we want. Filters can be created in a couple of ways.

Using Slicers: A slicer is a visual in the Visualizations pane. It can be added to the design view to filter a report. A slicer needs a field added to it. For example, a slicer can be added for the Country field, and the data can then be filtered by country.
Using the Filter Pane: The filter pane is a single place in the report where different fields can be added as filters. A field can be added as a filter on only one visual (visual level filter), on all the visuals in the report page (page level filter), or on all the pages of the report (report level filter).


2. How to sort data in Power BI?

Sorting is available in several forms. In the Data view, the common option of sorting in alphabetical order is available. Apart from that, there is the Sort by column option, which sorts one column based on another column. Sorting is available in visuals as well, in ascending or descending order by the fields and measures present in the visual.


3. How to convert PDF to Excel?

Open the PDF document you want to convert to XLSX format in Acrobat DC.
Go to the right pane and click on the “Export PDF” option.
Choose spreadsheet as the Export format.
Select “Microsoft Excel Workbook.”
Now click “Export.”
Download the converted file or share it.


4. How to enable macros in Excel?

Click the File tab and then click “Options.”
A dialog box will appear. In the “Excel Options” dialog box, click on the “Trust Center” and then “Trust Center Settings.”
Go to the “Macro Settings” and select “Enable all macros.”
Click OK to apply the macro settings.
1. What is Density-based Clustering?

Density-Based Clustering is an unsupervised machine learning method that identifies different groups or clusters in the data space. These clustering techniques are based on the concept that a cluster in the data space is a contiguous region of high point density, separated from other such clusters by contiguous regions of low point density.

Partition-based (K-means) and hierarchical clustering techniques generally work well with regularly shaped, roughly spherical clusters, while density-based techniques are better at finding arbitrarily shaped clusters and detecting outliers.

2. How to create empty tables with the same structure as another table?

To create empty tables:
Use SELECT ... INTO to fetch the records of one table into a new table while setting a WHERE condition that is false for every row (for example, WHERE 1 = 0). SQL then creates a new table with the same column structure to accept the fetched rows, but nothing is stored in the new table because no row satisfies the WHERE clause.
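
The same trick can be sketched with Python's built-in sqlite3 module. SQLite has no SELECT ... INTO, so this sketch swaps in the equivalent CREATE TABLE ... AS SELECT form; the employees table and its row are invented for illustration. Note that in SQLite this copies the column names but not constraints such as primary keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT, salary REAL)")
conn.execute("INSERT INTO employees VALUES (1, 'Asha', 50000)")

# WHERE 1 = 0 is false for every row, so only the column structure is copied.
conn.execute("CREATE TABLE employees_copy AS SELECT * FROM employees WHERE 1 = 0")

# PRAGMA table_info returns one row per column; index 1 holds the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(employees_copy)")]
row_count = conn.execute("SELECT COUNT(*) FROM employees_copy").fetchone()[0]
```

The new table ends up with the same three columns and zero rows.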

3. What is a Parameter in Tableau? Give an Example.

A parameter is a dynamic value that a user can select, and you can use it to replace constant values in calculations, filters, and reference lines.

For example, when creating a filter to show the top 10 products based on total profit, you can use a parameter instead of the fixed value so that the filter can show the top 10, 20, or 30 products.

4. How will you write the formula for the following in Excel? - Multiply the value in cell A1 by 10, add 5 to the result, and divide it by 2.

To write this formula, we have to follow the PEMDAS order of precedence. The correct answer is =((A1*10)+5)/2.

Answers such as =A1*10+5/2 and =(A1*10)+5/2 are not correct, because the division 5/2 is evaluated before the addition. The parentheses must enclose the whole sum so that it is computed before dividing by 2.
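
Python follows the same precedence for these operators, so the difference is easy to check. The value 4 for cell A1 is just an assumed example.

```python
a1 = 4  # hypothetical value in cell A1

# Parentheses force the addition to happen before the division.
correct = ((a1 * 10) + 5) / 2   # (40 + 5) / 2

# Without them, 5 / 2 is evaluated first and then added to 40.
wrong = a1 * 10 + 5 / 2         # 40 + 2.5
```

With A1 = 4, the correct formula gives 22.5 while the unparenthesized one gives 42.5.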

5. How can you remove duplicate values in a range of cells?

1. To delete duplicate values in a column, first highlight them using Conditional Formatting > Highlight Cells Rules > Duplicate Values in the Home tab. Then select the highlighted cells and press the Delete key. After deleting the values, go back to the ‘Conditional Formatting’ option and choose ‘Clear Rules’ to remove the rules from the sheet.
2. You can also delete duplicate values by selecting the ‘Remove Duplicates’ option under Data Tools in the Data tab.
1. What is the difference between the RANK() and DENSE_RANK() functions?

The RANK() function assigns a rank to each row within your ordered partition. Rows with equal values share the same rank, and the next rank is the previous rank plus the number of tied rows, which leaves gaps. If we have three records at rank 4, for example, the next rank is 7. The DENSE_RANK() function also gives tied rows the same rank, but it leaves no gaps. If we have three records at rank 4, for example, the next rank is 5.
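
A quick way to see the gap is to run both functions side by side. This sketch uses Python's built-in sqlite3 module and assumes the bundled SQLite is version 3.25 or newer (needed for window functions); the scores table is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (player TEXT, points INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("a", 90), ("b", 80), ("c", 80), ("d", 80), ("e", 70)],
)

# Three players tie on 80 points, so they share a rank in both columns.
rows = conn.execute(
    """
    SELECT player,
           RANK()       OVER (ORDER BY points DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY points DESC) AS dense_rnk
    FROM scores
    ORDER BY points DESC, player
    """
).fetchall()
```

Player e comes after the three-way tie at rank 2, so RANK() jumps to 5 while DENSE_RANK() continues with 3.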

2. Explain One-hot encoding and Label Encoding. How do they affect the dimensionality of the given dataset?

One-hot encoding is the representation of categorical variables as binary vectors. Label encoding is converting labels/words into numeric form. Using one-hot encoding increases the dimensionality of the data set. Label encoding doesn’t affect the dimensionality of the data set. One-hot encoding creates a new 0/1 variable for each level in the variable, whereas in label encoding the levels of a variable get encoded as integers such as 0, 1, 2 in a single column.
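
Here is a minimal plain-Python sketch of both encodings, without any ML library. The colors list is invented sample data, and assigning labels in alphabetical order is just one possible convention.

```python
colors = ["red", "green", "blue", "green"]
levels = sorted(set(colors))  # distinct levels in alphabetical order

# Label encoding: one integer per level, still a single column.
label_of = {level: i for i, level in enumerate(levels)}
label_encoded = [label_of[c] for c in colors]

# One-hot encoding: one 0/1 column per level, so one column becomes three.
one_hot = [[1 if c == level else 0 for level in levels] for c in colors]
```

The label-encoded data keeps one value per row, while each one-hot row has as many values as there are levels.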

3. What is the shortcut to add a filter to a table in EXCEL?

The filter mechanism is used when you want to display only specific data from the entire dataset. By doing so, there is no change being made to the data. The shortcut to add a filter to a table is Ctrl+Shift+L.

4. What is DAX in Power BI?

DAX stands for Data Analysis Expressions. It's a collection of functions, operators, and constants used in formulas to calculate and return values. In other words, it helps you create new info from data you already have.

5. Define shelves and sets in Tableau?

Shelves: Every worksheet in Tableau has shelves such as Columns, Rows, Marks, Filters, Pages, and more. By placing fields on shelves we can build our own visualization structure. We can control the marks by including or excluding data.
Sets: Sets are used to compute a condition on which the dataset will be prepared. Data is grouped together based on a condition. The fields responsible for the grouping are known as sets. For example, students having grades of more than 70%.
1. What is a UNIQUE constraint?

The UNIQUE Constraint prevents identical values in a column from appearing in two records. The UNIQUE constraint guarantees that every value in a column is unique.
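
A small sketch with Python's built-in sqlite3 module shows the constraint rejecting a duplicate. The users table and the email address are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT UNIQUE)")
conn.execute("INSERT INTO users VALUES ('a@example.com')")

# Inserting the same value again violates the UNIQUE constraint.
try:
    conn.execute("INSERT INTO users VALUES ('a@example.com')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

The second insert raises an error, so the table still holds a single row.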

2. What is a Self-Join?

A self-join is a join in which a table is joined with itself, so it represents a unary relationship. Each row of the table is matched against the rows of the same table according to the join condition, usually by giving the table two different aliases. As a result, a self-join is mostly used to combine and compare rows from the same database table.
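
A classic use is pairing each employee with their manager when both live in one table. This sketch uses Python's built-in sqlite3 module; the employees table, its columns, and the names are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT, manager_id INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "Priya", None), (2, "Ravi", 1), (3, "Meera", 1)],
)

# The same table appears twice under two aliases: e (employee) and m (manager).
pairs = conn.execute(
    """
    SELECT e.name, m.name
    FROM employees AS e
    JOIN employees AS m ON e.manager_id = m.id
    ORDER BY e.name
    """
).fetchall()
```

Priya has no manager, so the inner join leaves her out of the employee column; a LEFT JOIN would keep her with an empty manager.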

3. What is CASE WHEN in SQL Server?

The CASE expression is used to construct logic in which one column’s value is determined by the values of other columns. The condition to be tested is specified by the WHEN clause. If the WHEN condition returns TRUE, the THEN clause specifies the value to return.
When none of the WHEN conditions return true, the ELSE clause is used. The END keyword brings the CASE expression to a close.
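
The WHEN / THEN / ELSE / END flow can be tried out with Python's built-in sqlite3 module, which supports the same CASE syntax. The students table and the 70/40 cut-offs are invented for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, marks INTEGER)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?)",
    [("Ali", 82), ("Bea", 55), ("Cy", 30)],
)

# WHEN branches are tested top to bottom; ELSE covers everything left over.
results = conn.execute(
    """
    SELECT name,
           CASE
               WHEN marks >= 70 THEN 'distinction'
               WHEN marks >= 40 THEN 'pass'
               ELSE 'fail'
           END AS result
    FROM students
    ORDER BY name
    """
).fetchall()
```

Only the first matching WHEN applies, which is why 82 is labelled 'distinction' and not 'pass'.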

4. What is the main difference between ‘BETWEEN’ and ‘IN’ condition operators?

The BETWEEN operator is used to select rows whose column value falls within a range (both ends included), whereas the IN operator is used to check whether a value is contained in a specific set of values.
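
Both operators can be compared on one small table. This is a sketch using Python's built-in sqlite3 module; the orders table and its values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount INTEGER, country TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 50, "UK"), (2, 100, "India"), (3, 150, "Mexico"), (4, 200, "UK")],
)

# BETWEEN checks a range and includes both boundary values.
in_range = [r[0] for r in conn.execute(
    "SELECT id FROM orders WHERE amount BETWEEN 100 AND 200 ORDER BY id"
)]

# IN checks membership in an explicit list of values.
in_set = [r[0] for r in conn.execute(
    "SELECT id FROM orders WHERE country IN ('UK', 'Mexico') ORDER BY id"
)]
```

Orders 2 and 4 sit exactly on the range boundaries and are still returned by BETWEEN.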
1. Explain data cleansing.

Data cleaning, also known as data cleansing or data scrubbing or wrangling, is basically a process of identifying and then modifying, replacing, or deleting the incorrect, incomplete, inaccurate, irrelevant, or missing portions of the data as the need arises. This fundamental element of data science ensures data is correct, consistent, and usable. 

2. What is an Affinity Diagram?

An Affinity Diagram is an analytical tool used to cluster or organize data into subgroups based on their relationships. These data or ideas are mostly generated from discussions or brainstorming sessions and are used in analyzing complex issues.

3. Which questions should you ask the user/client before you create a dashboard?

Though this depends on the user’s requirements, some of the common questions that I would ask the client before creating a dashboard are:

What is the purpose of the dashboard?
Should the dashboard be retrospective or real-time?
How detailed should the dashboard be?
How tech and data-savvy is the end-user?
Does the data need to be segmented?
Should I explain the dashboard design to you?

4. What is an Alias in SQL?

An alias is a feature of SQL that is supported by most, if not all, RDBMSs. It is a temporary name assigned to a table or table column for the purpose of a particular SQL query. In addition, aliasing can be employed as an obfuscation technique to hide the real names of database fields. A table alias is also called a correlation name.

An alias is represented explicitly by the AS keyword but in some cases, the same can be performed without it as well. 
1. How to change a table name in SQL?

This is the command to change a table name in SQL:
ALTER TABLE table_name
RENAME TO new_table_name;
We will start off by giving the keywords ALTER TABLE, then we will follow it up by giving the original name of the table, after that, we will give in the keywords RENAME TO and finally, we will give the new table name.
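
The rename can be tried with Python's built-in sqlite3 module, since SQLite accepts this same syntax. The table names staff and employees are placeholders. Not every database uses this form; SQL Server, for example, renames tables with the sp_rename procedure instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (id INTEGER)")

# ALTER TABLE <old name> RENAME TO <new name>
conn.execute("ALTER TABLE staff RENAME TO employees")

# sqlite_master lists the objects in the database; only the new name remains.
tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
```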

2. How to find the constraint information of a table?

Users often need to find the specific constraint information of a table. In Oracle, the following data dictionary queries are useful: SELECT * FROM User_Constraints; SELECT * FROM User_Cons_Columns;

3. What is the difference between clustered and non-clustered indexes?

Clustered indexes can generally be read faster than non-clustered indexes, because the index leads directly to the data rows.
A clustered index determines the physical order in which rows are stored in the table or view, whereas a non-clustered index does not store the data itself: it is a separate structure from the data rows that points back to them.

4. What are the subsets of SQL?

DDL (Data Definition Language): Used to define the data structure. It consists of commands like CREATE, ALTER, DROP, etc.
DML (Data Manipulation Language): Used to manipulate already existing data in the database, with commands like SELECT, UPDATE, INSERT.
DCL (Data Control Language): Used to control access to data in the database, commands like GRANT, REVOKE.
1. What are Query and Query language?

A query is nothing but a request sent to a database to retrieve data or information. The required data can be retrieved from a table or many tables in the database.

Query languages use various types of queries to retrieve data from databases. SQL, Datalog, and AQL are a few examples of query languages; however, SQL is the most widely used query language.


2. What are Superkey and candidate key?

A super key is a single attribute or a combination of attributes that uniquely identifies a record in a table. A super key can have one or more attributes, and it may include extra attributes that are not necessary to identify the records.

A candidate key is a minimal super key. It can have one or more attributes but, unlike a super key, every attribute of a candidate key is needed to identify the records uniquely.


3. What do you mean by buffer pool and mention its benefits?

A buffer pool in SQL is also known as a buffer cache. All the resources can store their cached data pages in a buffer pool. The size of the buffer pool can be defined during the configuration of an instance of SQL Server.
The following are the benefits of a buffer pool:

Increase in I/O performance
Reduction in I/O latency
Increase in transaction throughput
Increase in reading performance


4. What is the difference between Zero and NULL values in SQL?

When a field in a column doesn’t have any value, it is said to be having a NULL value. Simply put, NULL is the blank field in a table. It can be considered as an unassigned, unknown, or unavailable value. On the contrary, zero is a number, and it is an available, assigned, and known value.
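
The difference shows up clearly in comparisons and aggregates. This sketch uses Python's built-in sqlite3 module with a made-up payments table holding a zero, a NULL, and a regular amount.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (amount REAL)")
conn.executemany("INSERT INTO payments VALUES (?)", [(0,), (None,), (25,)])


def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


zero_rows = scalar("SELECT COUNT(*) FROM payments WHERE amount = 0")
null_rows = scalar("SELECT COUNT(*) FROM payments WHERE amount IS NULL")
# Comparing with = NULL never matches, because NULL is unknown rather than a value.
equals_null = scalar("SELECT COUNT(*) FROM payments WHERE amount = NULL")
# Aggregates skip NULL but do include zero.
counted = scalar("SELECT COUNT(amount) FROM payments")
average = scalar("SELECT AVG(amount) FROM payments")
```

The average is taken over the two known amounts (0 and 25), not over three rows.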
1. What is Data Integrity?

Data Integrity is the assurance of accuracy and consistency of data over its entire life-cycle and is a critical aspect of the design, implementation, and usage of any system which stores, processes, or retrieves data. It also defines integrity constraints to enforce business rules on the data when it is entered into an application or a database.

2. What is the Difference Between Joining and Blending in Tableau?

Data blending is combining data from two or more different sources, such as Oracle, Excel, and SQL Server. In data blending, each data source contains its own set of dimensions and measures. Data joining is combining data between two or more tables or sheets within the same data source. All the combined tables or sheets contain a common set of dimensions and measures.

3. What is slicing in Python?
As the name suggests, ‘slicing’ is taking parts of a sequence.
Syntax for slicing is [start : stop : step]
start is the index from which to start slicing a list or tuple.
stop is the index at which to stop; the item at this index is not included.
step is the number of positions to jump each time.
Default value for start is 0, stop is the number of items, step is 1.
Slicing can be done on strings, arrays, lists, and tuples.
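
A few slices on a small list and a string show how start, stop, and step behave. The sample values are arbitrary.

```python
letters = ["a", "b", "c", "d", "e", "f"]

first_three = letters[0:3]    # indexes 0, 1, 2 (stop index 3 is excluded)
every_second = letters[::2]   # default start and stop, step of 2
last_two = letters[-2:]       # negative start counts from the end
reversed_copy = letters[::-1] # a negative step walks backwards

text_part = "analytics"[1:4]  # slicing works the same way on strings
```

Slicing always returns a new object, so the original list stays unchanged.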

4. What is the difference between NOW() and CURRENT_DATE() in SQL?

In MySQL, NOW() returns a constant time that indicates the time at which the statement began to execute. (Within a stored function or trigger, NOW() returns the time at which the function or triggering statement began to execute.)

The simple difference between NOW() and CURRENT_DATE() is that NOW() fetches both the current date and time in the format ‘YYYY-MM-DD HH:MM:SS’, while CURRENT_DATE() fetches only the date of the current day in the format ‘YYYY-MM-DD’.
2. How to use LIKE in SQL?
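The LIKE operator checks if an attribute value matches a given string pattern. In the pattern, % matches any sequence of characters and _ matches exactly one character. Here is an example of the LIKE operator:
SELECT * FROM employees WHERE first_name LIKE 'Steven';
With this command, we will be able to extract all the records where the first name matches 'Steven'. Since this pattern has no wildcards, it behaves like an equality check; a pattern such as 'Ste%' would match every first name starting with 'Ste'.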
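
The effect of the wildcards is easiest to see on a few sample names. This sketch uses Python's built-in sqlite3 module; the names in the employees table are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (first_name TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?)",
    [("Steven",), ("Stephanie",), ("Steve",), ("Esteban",)],
)


def names_like(pattern):
    """Return the first names matching a LIKE pattern, sorted alphabetically."""
    rows = conn.execute(
        "SELECT first_name FROM employees WHERE first_name LIKE ? ORDER BY first_name",
        (pattern,),
    )
    return [row[0] for row in rows]


exact = names_like("Steven")   # no wildcard: only the exact name
prefix = names_like("Ste%")    # % matches any run of characters
one_char = names_like("Stev_") # _ matches exactly one character
```

'Esteban' contains "ste" but does not start with it, so the 'Ste%' pattern skips it.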

3. If we drop a table, does it also drop related objects like constraints, indexes, columns, defaults, views and stored procedures?
Yes, SQL Server drops all related objects that exist inside a table, like constraints, indexes, columns, defaults, etc. But dropping a table will not drop views and stored procedures, as they exist outside the table.

4. Explain SQL Constraints.
SQL constraints are used to specify rules for the data in a table. They can be specified while creating or altering the table. The following are the constraints in SQL: NOT NULL, CHECK, DEFAULT, UNIQUE, PRIMARY KEY, FOREIGN KEY.