1. What is the Impact of Outliers on Logistic Regression?
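Logistic Regression estimates are sensitive to unusual observations such as outliers, high-leverage points, and influential observations. The sigmoid function does not solve this problem. It bounds every prediction to the range (0, 1), so a single extreme point cannot produce an arbitrarily large residual as it can in linear regression, but a mislabeled point far from the decision boundary or a point with extreme feature values can still shift the fitted coefficients.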
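A minimal sketch of how far the sigmoid helps with extreme observations. It assumes a one-feature model with a hypothetical weight `w`; the function names are made up for illustration and do not come from any library. The logistic residual never exceeds 1 in magnitude, while the gradient that updates the weight still scales with the feature value.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def logistic_residual(x, y, w):
    """Residual p - y of a one-feature logistic model; magnitude is at most 1."""
    return sigmoid(w * x) - y

def linear_residual(x, y, w):
    """Residual of a one-feature linear model; grows without bound with x."""
    return w * x - y

def logistic_gradient(x, y, w):
    """Log-loss gradient with respect to w: the bounded residual times x."""
    return logistic_residual(x, y, w) * x

w = 1.0  # hypothetical weight
for x in (2.0, 20.0, 200.0):  # increasingly extreme feature values, true label 0
    print(x, logistic_residual(x, 0, w), linear_residual(x, 0, w),
          logistic_gradient(x, 0, w))
```

The bounded residual is the part the sigmoid takes care of; the growing gradient is why high-leverage points still matter.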
2. What is the difference between vanilla RNNs and LSTMs?
The main difference between vanilla RNNs and LSTMs is that LSTMs are better at remembering long-term dependencies, while vanilla RNNs tend to forget them. A vanilla RNN overwrites a single hidden state at every time step, so gradients flowing back through many steps tend to vanish. An LSTM adds a separate cell state whose contents are controlled by gates, which lets it retain information for longer periods of time.
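A rough sketch of the forgetting problem, under strong simplifying assumptions: a scalar vanilla RNN `h_t = tanh(w * h_{t-1})` with no input, and an LSTM cell path where the forget gate is held at a constant value and the gates' own dependence on the hidden state is ignored. The function names and numbers are illustrative only.

```python
import math

def rnn_gradient_through_time(w, h0, steps):
    """d h_T / d h_0 for h_t = tanh(w * h_{t-1}): a product of per-step factors."""
    h, grad = h0, 1.0
    for _ in range(steps):
        h = math.tanh(w * h)
        grad *= w * (1.0 - h * h)  # derivative of tanh(w * h_prev) w.r.t. h_prev
    return grad

def lstm_cell_gradient_through_time(forget_gate, steps):
    """d c_T / d c_0 along the cell state when the forget gate is constant."""
    return forget_gate ** steps

rnn_grad = rnn_gradient_through_time(w=0.9, h0=0.5, steps=50)
lstm_grad = lstm_cell_gradient_through_time(forget_gate=0.99, steps=50)
print(rnn_grad, lstm_grad)
```

With these made-up values the RNN gradient shrinks to almost nothing after 50 steps, while the cell-state path keeps most of it because the forget gate can stay close to 1.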
3. What is a Masked Language Model in NLP?
A masked language model is trained on deliberately corrupted input: some tokens in a sentence are hidden (masked), and the model has to predict the original tokens from the surrounding context. Training this way produces deep contextual representations that can be reused in downstream tasks.
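A minimal sketch of the input-corruption step only, assuming whitespace tokens and a single `[MASK]` symbol. Real models use more elaborate masking schemes; `mask_tokens`, `mask_prob` and `seed` are hypothetical names chosen for this example.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Return (corrupted, labels): labels keep the original token at masked
    positions and None everywhere else, so only masked positions are scored."""
    rng = random.Random(seed)
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            corrupted.append(MASK)
            labels.append(tok)
        else:
            corrupted.append(tok)
            labels.append(None)
    return corrupted, labels

tokens = "the cat sat on the mat".split()
corrupted, labels = mask_tokens(tokens, mask_prob=0.5, seed=1)
print(corrupted)
print(labels)
```

The model would be fed `corrupted` and trained to output the non-None entries of `labels`.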
4. Why is the KNN Algorithm known as a Lazy Learner?
When the KNN algorithm gets the training data, it does not learn a model; it just stores the data. Instead of fitting a discriminative function, it follows instance-based learning and consults the stored training data only when it actually has to make a prediction for an unseen example. Because KNN defers the work to prediction time rather than learning up front, it is referred to as a Lazy Learner.
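A small sketch of the lazy behaviour, assuming Euclidean distance and majority voting. `LazyKNN` is a hypothetical class written for this example, not a library API. Note that `fit` only stores the data; all the computation happens in `predict_one`.

```python
import math
from collections import Counter

class LazyKNN:
    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # "Training" is just memorising the examples.
        self.X = list(X)
        self.y = list(y)
        return self

    def predict_one(self, x):
        # Distances to every stored point are computed only now.
        neighbours = sorted(
            ((math.dist(x, xi), yi) for xi, yi in zip(self.X, self.y)),
            key=lambda pair: pair[0],
        )
        votes = Counter(label for _, label in neighbours[: self.k])
        return votes.most_common(1)[0][0]

X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y = ["a", "a", "a", "b", "b", "b"]
model = LazyKNN(k=3).fit(X, y)
print(model.predict_one((0.5, 0.5)), model.predict_one((5.5, 5.5)))
```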
👍16
Industry Data Science vs Academia Data Science
Comparing Data Science in academia and Data Science in industry is like comparing tennis with table tennis: they sound similar but in the end, they are completely different!
5 big differences between Data Science in academia and in industry 👇:
1️⃣ Model vs Data: Academia focuses on models, industry focuses on data. In academia, it’s all about trying to find the best model architecture to optimise a defined metric. In industry, loading and processing the data accounts for around 80% of the job.
2️⃣ Novelty vs Efficiency: The end goal of academia is often to publish a paper and to do so, you will need to find and implement a novel approach. Industry is all about efficiency: reusing existing models as much as possible and applying them to your use case.
3️⃣ Complex vs Simple: More often than not, academia requires complex solutions. I know that this isn’t always the case but unfortunately, complex papers get a higher chance of being accepted at top conferences. In industry, it’s all about simplicity: trying to find the simplest solution that solves a specific problem.
4️⃣ Theory vs Engineering: To succeed in academia, you need to have strong theoretical and maths skills. To succeed in industry, you need to develop strong engineering skills. It is great to be able to train a model in a notebook but if you cannot deploy your model in production, it will be completely useless.
5️⃣ Knowledge impact vs $ impact: In academia, it’s all about creating new work and expanding human knowledge. In industry, it is all about using data to drive value and increase revenue.
👍18👏4❤2
Here are some incredible platforms where you can download datasets for your project:
Our World in Data (https://ourworldindata.org/)
World Health Organization (https://www.who.int/data/gho)
Statcounter (https://gs.statcounter.com/)
Food and Agriculture Organization of the UN (FAO) (https://www.fao.org/home/en)
World Bank (https://data.worldbank.org/)
❤7👍1
8 AI Tools Just for Fun:
1. Tattoo Artist
https://tattoosai.com
2. Talk to Books
https://books.google.com/talktobooks/
3. Vintage Headshots
https://myheritage.com/ai-time-machine
4. Hello to Past
https://hellohistory.ai
5. Fake yourself
https://fakeyou.com
6. Unreal Meal
https://unrealmeal.ai
7. Reface AI
https://hey.reface.ai
8. Voice Changer
https://voicemod.net
👍9❤1😁1
1. Can you explain how the memory cell in an LSTM is implemented computationally?
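The memory cell in an LSTM is a cell state vector that is updated at every time step by three gates: a forget gate, an input gate, and an output gate. Each gate is a sigmoid layer over the current input and the previous hidden state, so its values lie between 0 and 1. The forget gate controls how much information from the previous cell state is kept. The input gate controls how much new candidate information computed from the current input is written into the cell state. The output gate controls how much of the cell state is exposed as the hidden state, which is what gets passed on to the next time step.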
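A minimal sketch of one LSTM time step, reduced to a single unit with scalar input and scalar state so the gate arithmetic is easy to follow. It uses the standard gate equations; the parameter dictionary `p` and its key names are assumptions made for this example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One time step of a single-unit LSTM. p holds hypothetical scalar weights."""
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate value
    c = f * c_prev + i * g   # keep part of the old memory, write part of the new
    h = o * math.tanh(c)     # expose part of the memory as the hidden state
    return h, c

keys = ["wf", "uf", "bf", "wi", "ui", "bi", "wo", "uo", "bo", "wg", "ug", "bg"]
params = {k: 0.1 for k in keys}  # made-up values, not trained weights
h, c = 0.0, 0.0
for x in (1.0, -0.5, 2.0):
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

In a real network the scalars become weight matrices and the products become matrix-vector products, but the update rule for `c` and `h` keeps the same form.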
2. What is CTE in SQL?
A CTE (Common Table Expression) is a named, temporary result set defined with a WITH clause that exists only for the duration of the statement it belongs to. It can be referenced within the execution scope of a single SELECT, INSERT, UPDATE, DELETE, CREATE VIEW, or MERGE statement. It is temporary because its result is not stored anywhere and is lost as soon as the statement finishes executing.
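A quick sketch of a CTE, run through Python's built-in `sqlite3` module so it works without a database server. The `orders` table and its rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (customer TEXT, amount REAL);
INSERT INTO orders VALUES ('ann', 10), ('ann', 30), ('bob', 5);
""")

query = """
WITH totals AS (                      -- the CTE: a named, one-time result set
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
)
SELECT customer, total
FROM totals
WHERE total > 20
ORDER BY customer;
"""
rows = conn.execute(query).fetchall()
print(rows)
```

`totals` can only be used inside that one statement; a later query that refers to it fails because nothing was stored.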
3. List the advantages NumPy Arrays have over Python lists?
Python’s lists are flexible general-purpose containers, but they have several limitations when compared to NumPy arrays. Lists do not support vectorised operations such as element-wise addition and multiplication. Because a list can hold objects of different types, Python must also store type information for every element, which means type-dispatching code is executed each time an operation is applied to an element. NumPy arrays hold elements of a single type, which avoids this per-element overhead.
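A short sketch of the difference, assuming NumPy is installed. With lists, `+` concatenates and element-wise maths needs an explicit loop; with arrays the same operators work element by element.

```python
import numpy as np

a = [1, 2, 3]
b = [10, 20, 30]

list_concat = a + b                           # lists: + means concatenation
list_sum = [x + y for x, y in zip(a, b)]      # element-wise needs a loop

arr_a, arr_b = np.array(a), np.array(b)
arr_sum = arr_a + arr_b                       # arrays: element-wise addition
arr_prod = arr_a * arr_b                      # arrays: element-wise multiplication

mixed = [1, "two", 3.0]                       # a list may mix types freely
print(list_concat, list_sum, arr_sum, arr_prod, arr_a.dtype)
```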
4. What’s the F1 score? How would you use it?
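The F1 score is a measure of a classification model’s performance. It is the harmonic mean of the model’s precision and recall, F1 = 2 × precision × recall / (precision + recall), with results tending to 1 being the best, and those tending to 0 being the worst. You would use it when the classes are imbalanced and accuracy alone would be misleading, because F1 is high only when both precision and recall are high.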
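A small worked example computing precision, recall and F1 from confusion-matrix counts. The counts are invented, and `precision_recall_f1` is a helper written for this sketch rather than a library function.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, f1) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# Made-up counts: 8 true positives, 2 false positives, 4 false negatives.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

The harmonic mean sits below the simple average of precision and recall whenever the two differ, which is what penalises a model that is strong on only one of them.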
5. Name an example where ensemble techniques might be useful?
Ensemble techniques combine several learning algorithms to obtain better predictive performance than any single one of them. They typically reduce overfitting and make the model more robust (less likely to be influenced by small changes in the training data). You could list some examples of ensemble methods (bagging, boosting, the “bucket of models” method) and demonstrate how they could increase predictive power.
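A toy sketch of the idea using hard majority voting. The three "classifiers" are hand-written rules that each make one mistake on different inputs; the setup is contrived purely to show how voting can cancel out uncorrelated errors.

```python
from collections import Counter

def clf_a(x):
    return int(x >= 3)            # too eager: wrong on x = 3

def clf_b(x):
    return int(x >= 5)            # too cautious: wrong on x = 4

def clf_c(x):
    return int(x >= 4 or x == 1)  # wrong on x = 1

def majority_vote(x, classifiers):
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]

def accuracy(predict, xs, ys):
    return sum(predict(x) == y for x, y in zip(xs, ys)) / len(xs)

xs = [1, 2, 3, 4, 5, 6]
ys = [int(x >= 4) for x in xs]    # the true rule
members = [clf_a, clf_b, clf_c]

def ensemble(x):
    return majority_vote(x, members)

for name, model in [("a", clf_a), ("b", clf_b), ("c", clf_c), ("ensemble", ensemble)]:
    print(name, accuracy(model, xs, ys))
```

Bagging and boosting build their member models from data instead of by hand, but the final combination step follows the same spirit.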
👍10
Python Notes 👇
https://news.1rj.ru/str/pythondevelopersindia/576
👍4
🖥 Free Courses on Large Language Models
▪ChatGPT Prompt Engineering for Developers
▪LangChain for LLM Application Development
▪Building Systems with the ChatGPT API
▪Google Cloud Generative AI Learning Path
▪Introduction to Large Language Models with Google Cloud
▪LLM University
▪Full Stack LLM Bootcamp
👍10❤1
Stack Overflow jumps into the generative AI world with OverflowAI
👍12
1. What are the different subsets of SQL?
Data Definition Language (DDL) – It allows you to define and change database objects, for example with CREATE, ALTER, and DROP.
Data Manipulation Language (DML) – It allows you to access and manipulate data. It helps you to insert, update, delete and retrieve data from the database.
Data Control Language (DCL) – It allows you to control access to the database. Example – GRANT and REVOKE access permissions.
2. List the different types of relationships in SQL.
There are different types of relations in the database:
One-to-One – This is a connection between two tables in which each record in one table corresponds to at most one record in the other.
One-to-Many and Many-to-One – This is the most frequent connection, in which a record in one table is linked to several records in another.
Many-to-Many – This is used when a record on either side can be linked to several records on the other side.
Self-Referencing Relationships – This is used when a table has to declare a connection with itself.
3. How to create empty tables with the same structure as another table?
To create empty tables:
Use SELECT ... INTO to fetch the records of one table into a new table while giving the query a WHERE clause that is false for every row (for example WHERE 1 = 0). SQL then creates a new table with the same column structure to accept the fetched rows, but nothing is stored in the new table because no row satisfies the WHERE clause. SELECT ... INTO is SQL Server syntax; several other databases express the same idea with CREATE TABLE ... AS SELECT.
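A runnable sketch of the always-false WHERE trick, using Python's built-in `sqlite3` module. SQLite has no SELECT ... INTO, so this example swaps in CREATE TABLE ... AS SELECT, which follows the same idea. The `employees` table is made up, and `column_names` is a helper written for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (id INTEGER, name TEXT, salary REAL);
INSERT INTO employees VALUES (1, 'ann', 100.0), (2, 'bob', 80.0);

-- WHERE 1 = 0 is false for every row, so only the column layout is copied.
CREATE TABLE employees_empty AS
SELECT * FROM employees WHERE 1 = 0;
""")

def column_names(table):
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

row_count = conn.execute("SELECT COUNT(*) FROM employees_empty").fetchone()[0]
print(column_names("employees_empty"), row_count)
```

In SQLite this copies column names only; constraints and indexes from the original table are not carried over.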
4. What is Normalization and what are the advantages of it?
Normalization in SQL is the process of organizing data to avoid duplication and redundancy. Some of the advantages are:
Better Database organization
More Tables with smaller rows
Efficient data access
Greater Flexibility for Queries
Quickly find the information
Easier to implement Security
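A small sketch of what normalization looks like in practice, using Python's built-in `sqlite3` module. The flat `orders_flat` table repeats each customer's city on every order; splitting it into `customers` and `orders` stores the city once. All table names and rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Denormalized: the city is repeated on every order of the same customer.
CREATE TABLE orders_flat (order_id INTEGER, customer TEXT, city TEXT, amount REAL);
INSERT INTO orders_flat VALUES
    (1, 'ann', 'Oslo', 10), (2, 'ann', 'Oslo', 30), (3, 'bob', 'Rome', 5);

-- Normalized: customer details live in one place, orders reference them.
CREATE TABLE customers (customer TEXT PRIMARY KEY, city TEXT);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer TEXT REFERENCES customers(customer),
    amount REAL
);
INSERT INTO customers SELECT DISTINCT customer, city FROM orders_flat;
INSERT INTO orders SELECT order_id, customer, amount FROM orders_flat;
""")

n_customers = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
original = conn.execute("SELECT * FROM orders_flat ORDER BY order_id").fetchall()
rejoined = conn.execute("""
    SELECT o.order_id, o.customer, c.city, o.amount
    FROM orders AS o JOIN customers AS c ON c.customer = o.customer
    ORDER BY o.order_id
""").fetchall()
print(n_customers, rejoined == original)
```

A join brings back the original rows, so no information is lost, and a change of city now touches a single row.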
❤8👍7
"📊 Data Analysis Tip: Have you ever wondered how outliers can impact your analysis? Outliers are data points that significantly differ from the rest of your dataset. They can skew results and affect the accuracy of your insights.
Tip: Before removing outliers, it's essential to understand their origin. Are they errors, natural variations, or something else? Removing or adjusting them without proper justification can lead to biased results.
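A minimal sketch of how you might flag candidate outliers for inspection instead of dropping them straight away. It uses the common 1.5 × IQR rule, which is one convention among several; the readings and the `flag_outliers` helper are made up for this example.

```python
import statistics

def flag_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

readings = [10, 12, 11, 13, 12, 11, 95]
suspects = flag_outliers(readings)
print(suspects)  # inspect these before deciding whether to keep, fix or remove them
```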
👍11
📚 9 must-have Python developer tools.
1. PyCharm IDE
2. Jupyter notebook
3. Keras
4. Pip Package
5. Python Anywhere
6. Scikit-Learn
7. Sphinx
8. Selenium
9. Sublime Text
❤11👍7
Build Resumes and Prepare for Interviews
1. Interviewai.me • Mock interview with AI
2. Jobwizard.earlybird.rocks • Auto-fill job applications
3. Interviewgpt.a • Interview questions
4. Majorgen.com • Resume and cover letter builder
5. Metaview.ai • Interview notes
6. Kadoa.com/joblens • Personalized job recommendations
7. Huru.ai • Mock interview and get feedback
8. Accio.springworks.in • Resume scan
9. Interviewsby.a • ChatGPT-based interview coach
10. MatchThatRoleAl.com • Job search
11. Applyish.com • Apply automatically
12. HnResumeToJobs.com • Resume to jobs
13. FixMyResume.xyz • Fix your resume
14. Resumatic.ai • Create your resume with ChatGPT
15. Rankode.ai • Rank your programming skills
Bonus: Apply for AI jobs → http://t.me/aijobz
👍19❤2
Microsoft is integrating Python with MS Excel in the cloud. So in newer updates you won't have to install anything extra and you'll be able to use Python libraries right from within Excel.
❤7
Some useful PYTHON libraries for data science
NumPy stands for Numerical Python. The most powerful feature of NumPy is its n-dimensional array. This library also contains basic linear algebra functions, Fourier transforms, advanced random number capabilities and tools for integration with other low-level languages like Fortran, C and C++.
SciPy stands for Scientific Python. SciPy is built on NumPy. It is one of the most useful libraries for a variety of high-level science and engineering modules like discrete Fourier transform, linear algebra, optimization and sparse matrices.
Matplotlib for plotting a vast variety of graphs, from histograms to line plots to heat maps. You can use the Pylab feature in the IPython notebook (ipython notebook --pylab=inline) to use these plotting features inline; current Jupyter notebooks use the %matplotlib inline magic instead. If you omit the inline option, pylab turns the IPython environment into one very similar to Matlab. You can also use LaTeX commands to add math to your plot.
Pandas for structured data operations and manipulations. It is extensively used for data munging and preparation. Pandas is a relatively recent addition to the Python ecosystem and has been instrumental in boosting Python’s usage in the data science community.
Scikit Learn for machine learning. Built on NumPy, SciPy and matplotlib, this library contains a lot of efficient tools for machine learning and statistical modeling including classification, regression, clustering and dimensionality reduction.
Statsmodels for statistical modeling. Statsmodels is a Python module that allows users to explore data, estimate statistical models, and perform statistical tests. An extensive list of descriptive statistics, statistical tests, plotting functions, and result statistics is available for different types of data and each estimator.
Seaborn for statistical data visualization. Seaborn is a library for making attractive and informative statistical graphics in Python. It is based on matplotlib. Seaborn aims to make visualization a central part of exploring and understanding data.
Bokeh for creating interactive plots, dashboards and data applications on modern web-browsers. It empowers the user to generate elegant and concise graphics in the style of D3.js. Moreover, it has the capability of high-performance interactivity over very large or streaming datasets.
Blaze for extending the capability of Numpy and Pandas to distributed and streaming datasets. It can be used to access data from a multitude of sources including Bcolz, MongoDB, SQLAlchemy, Apache Spark, PyTables, etc. Together with Bokeh, Blaze can act as a very powerful tool for creating effective visualizations and dashboards on huge chunks of data.
Scrapy for web crawling. It is a very useful framework for getting specific patterns of data. It has the capability to start at a website home url and then dig through web-pages within the website to gather information.
SymPy for symbolic computation. It has wide-ranging capabilities from basic symbolic arithmetic to calculus, algebra, discrete mathematics and quantum physics. Another useful feature is the capability of formatting the result of the computations as LaTeX code.
Requests for accessing the web. It works similarly to the standard Python library urllib2 (urllib.request in Python 3) but is much easier to code with. You will find subtle differences from urllib2, but for beginners Requests might be more convenient.
Additional libraries, you might need:
os for Operating system and file operations
networkx and igraph for graph based data manipulations
re (regular expressions) for finding patterns in text data
BeautifulSoup for scraping the web. Unlike Scrapy, it is a parser and not a crawler: it extracts information from just a single webpage in a run.
👍8❤5
Top 10 Computer Vision Project Ideas
1. Edge Detection
2. Photo Sketching
3. Detecting Contours
4. Collage Mosaic Generator
5. Barcode and QR Code Scanner
6. Face Detection
7. Blur the Face
8. Image Segmentation
9. Human Counting with OpenCV
10. Colour Detection
❤9👍1