Data Science Portfolio - Kaggle Datasets & AI Projects | Artificial Intelligence – Telegram
Free Datasets For Data Science Projects & Portfolio

🚨30+ FREE Dataset Sources for Data Science Projects🔥

Data Simplifier: https://datasimplifier.com/best-data-analyst-projects-for-freshers/

US Government Dataset: https://www.data.gov/

Open Government Data (OGD) Platform India: https://data.gov.in/

The World Bank Open Data: https://data.worldbank.org/

Data World: https://data.world/

BFI - Industry Data and Insights: https://www.bfi.org.uk/data-statistics

The Humanitarian Data Exchange (HDX): https://data.humdata.org/

Data at World Health Organization (WHO): https://www.who.int/data

FBI’s Crime Data Explorer: https://crime-data-explorer.fr.cloud.gov/

AWS Open Data Registry: https://registry.opendata.aws/

FiveThirtyEight: https://data.fivethirtyeight.com/

IMDb Datasets: https://www.imdb.com/interfaces/

Kaggle: https://www.kaggle.com/datasets

UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/index.php

Google Dataset Search: https://datasetsearch.research.google.com/

Nasdaq Data Link: https://data.nasdaq.com/

Recommender Systems and Personalization Datasets: https://cseweb.ucsd.edu/~jmcauley/datasets.html

Reddit - Datasets: https://www.reddit.com/r/datasets/

Open Data Network by Socrata: https://www.opendatanetwork.com/

Climate Data Online by NOAA: https://www.ncdc.noaa.gov/cdo-web/

Azure Open Datasets: https://azure.microsoft.com/en-us/services/open-datasets/

IEEE Data Port: https://ieee-dataport.org/

Wikipedia Database Dumps: https://dumps.wikimedia.org/

BuzzFeed News: https://github.com/BuzzFeedNews/everything

Academic Torrents: https://academictorrents.com/

Yelp Open Dataset: https://www.yelp.com/dataset

The NLP Index by Quantum Stat: https://index.quantumstat.com/

Computer Vision Online: http://www.computervisiononline.com/dataset

Visual Data Discovery: https://www.visualdata.io/

Roboflow Public Datasets: https://public.roboflow.com/

Computer Vision Group, TUM: https://vision.in.tum.de/data/datasets
👍85
To transition from Data Analyst ➡️ Data Scientist, you will have to focus on building relevant projects! 🎯

Predictive Analytics Project
→ Built a model to predict customer behaviour by analyzing past purchase patterns and used time series forecasting to predict future trends.

Sentiment Analysis using NLP
→ Developed a sentiment analysis model that categorized customer feedback into positive, neutral, and negative sentiments to improve products.

Personalized Recommendation Engine
→ Created a recommendation engine using collaborative and content-based filtering to give personalized suggestions based on each user’s browsing history and preferences.
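
As a rough illustration of the collaborative-filtering half of such a recommendation engine, here is a minimal sketch in plain Python. The user names, items, and ratings are made-up toy data, and the functions `cosine` and `recommend` are hypothetical helpers written for this example, not part of any library. It covers only user-based collaborative filtering; a content-based component would additionally need item features.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating on a 1-5 scale}
ratings = {
    "ana": {"laptop": 5, "mouse": 4, "desk": 1},
    "ben": {"laptop": 4, "mouse": 5, "lamp": 2, "monitor": 5},
    "cara": {"desk": 5, "lamp": 4, "mouse": 1},
}


def cosine(u, v):
    """Cosine similarity between two sparse rating dicts."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)


def recommend(user, ratings, top_n=2):
    """Rank items the user has not rated by the similarity-weighted
    average rating given by other users."""
    scores, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], other_ratings)
        if sim <= 0:
            continue  # ignore users with no overlap
        for item, rating in other_ratings.items():
            if item in ratings[user]:
                continue  # skip items the user already rated
            scores[item] = scores.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    ranked = sorted(((scores[i] / weights[i], i) for i in scores), reverse=True)
    return [item for _, item in ranked[:top_n]]


print(recommend("ana", ratings))
```

In a portfolio project you would typically swap the toy dictionary for a real ratings dataset and evaluate the recommendations on held-out data.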

Tailor every project to focus on business impact and user experience, which can help you stand out to recruiters. 💪🏻
👍82
Knowing the tools won't be enough to become a master of data analytics!

See if your soft skills are worthy of the rank of master:

1. 𝗖𝗼𝗺𝗺𝘂𝗻𝗶𝗰𝗮𝘁𝗶𝗼𝗻: Can you translate your findings into easily digestible insights for non-technical stakeholders?

2. 𝗣𝗿𝗼𝗯𝗹𝗲𝗺-𝗦𝗼𝗹𝘃𝗶𝗻𝗴: Is your work focused on solving actual business problems, and are you able to pick the most efficient approach to solve them?

3. 𝗦𝘁𝗮𝗸𝗲𝗵𝗼𝗹𝗱𝗲𝗿 𝗠𝗮𝗻𝗮𝗴𝗲𝗺𝗲𝗻𝘁: Are you building strong relationships with your stakeholders, understanding their needs, and providing them with regular updates?

4. 𝗖𝗼𝗻𝘁𝗶𝗻𝘂𝗼𝘂𝘀 𝗟𝗲𝗮𝗿𝗻𝗶𝗻𝗴: The data landscape is constantly changing. Are you keeping up with new tools and trends?

5. 𝗣𝗿𝗼𝗱𝘂𝗰𝘁/𝗣𝗿𝗼𝗷𝗲𝗰𝘁 𝗠𝗮𝗻𝗮𝗴𝗲𝗺𝗲𝗻𝘁: Are you aware of the life cycle of your data products? Do you have a structured approach to plan, prioritize, and track your work?

6. 𝗕𝘂𝘀𝗶𝗻𝗲𝘀𝘀 𝗔𝗰𝘂𝗺𝗲𝗻: Can you understand the language and needs of the business and put your data work into context?

7. 𝗗𝗼𝗺𝗮𝗶𝗻 𝗞𝗻𝗼𝘄𝗹𝗲𝗱𝗴𝗲: Do you know the processes, products, and challenges of your domain?


If you want to earn the rank of master in the data field, start working on your soft skills now.
👍91
A few ways to optimise SQL queries 👇👇

Use Indexing: Properly indexing your database tables can significantly speed up query performance by allowing the database to quickly locate the rows needed for a query.

Optimize Joins: Minimize the number of joins and use appropriate join types (e.g., INNER JOIN, LEFT JOIN) to ensure efficient data retrieval.

Avoid SELECT * : Instead of selecting all columns using SELECT *, explicitly specify only the columns needed for the query to reduce unnecessary data transfer and processing overhead.

Use WHERE Clause Wisely: Filter rows early in the query using a WHERE clause to reduce the dataset size before joining or aggregating data.

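Avoid Unnecessary Subqueries: Correlated subqueries may be re-evaluated for every outer row. Where possible, rewrite them as JOINs or Common Table Expressions (CTEs), then compare execution plans, since the gain depends on the database's optimizer.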

Limit the Use of DISTINCT: Minimize the use of DISTINCT, as it requires sorting or hashing to remove duplicates, which can be resource-intensive for large datasets.

Optimize GROUP BY and ORDER BY: Use GROUP BY and ORDER BY clauses judiciously, and ensure that they are using indexed columns whenever possible to avoid unnecessary sorting.

Consider Partitioning: Partition large tables (for example, by date range) so that queries filtering on the partition key read only the relevant partitions, which reduces I/O. Some systems can also place partitions on separate disks or nodes.

Monitor Query Performance: Regularly review query execution plans and use database profilers and performance monitoring tools to identify and address bottlenecks.
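
To make the indexing and monitoring tips concrete, here is a small sketch using Python's built-in sqlite3 module. The `orders` table, its columns, and the index name `idx_orders_customer` are assumptions invented for this example, and other databases expose plans through different commands. It selects only the needed columns, filters with WHERE, and compares SQLite's query plan before and after adding an index.

```python
import sqlite3

# In-memory database with a hypothetical "orders" table
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

# Name only the needed columns instead of SELECT *, and filter with WHERE
query = "SELECT id, amount FROM orders WHERE customer_id = ?"


def plan(sql, params):
    """Return the detail text of each step in SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


before = plan(query, (42,))  # no index yet: expect a full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan(query, (42,))  # the plan should now search via the index

print("before:", before)
print("after: ", after)
```

The exact plan wording varies between SQLite versions, but the first plan should report a scan of the table and the second a search using the index.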

Hope it helps :)
👍73
Here is a list of a few projects (found on Kaggle). They cover basic Python, advanced statistics, supervised learning (regression and classification problems), and data science.
Please also check the discussions and notebook submissions for different approaches and solutions after you have tried the problems yourself.

1. Basic Python and statistics

Pima Indians :- https://www.kaggle.com/uciml/pima-indians-diabetes-database
CardioGoodFitness :- https://www.kaggle.com/saurav9786/cardiogoodfitness
Automobile :- https://www.kaggle.com/toramky/automobile-dataset

2. Advanced Statistics

Game of Thrones :- https://www.kaggle.com/mylesoneill/game-of-thrones
World University Rankings :- https://www.kaggle.com/mylesoneill/world-university-rankings
IMDB Movie Dataset :- https://www.kaggle.com/carolzhangdc/imdb-5000-movie-dataset

3. Supervised Learning

a) Regression Problems

How much did it rain :- https://www.kaggle.com/c/how-much-did-it-rain-ii/overview
Inventory Demand :- https://www.kaggle.com/c/grupo-bimbo-inventory-demand
Property Inspection prediction :- https://www.kaggle.com/c/liberty-mutual-group-property-inspection-prediction
Restaurant Revenue prediction :- https://www.kaggle.com/c/restaurant-revenue-prediction/data
TMDB Box Office Prediction :- https://www.kaggle.com/c/tmdb-box-office-prediction/overview

b) Classification problems

Employee Access challenge :- https://www.kaggle.com/c/amazon-employee-access-challenge/overview
Titanic :- https://www.kaggle.com/c/titanic
San Francisco crime :- https://www.kaggle.com/c/sf-crime
Customer satisfaction :- https://www.kaggle.com/c/santander-customer-satisfaction
Trip type classification :- https://www.kaggle.com/c/walmart-recruiting-trip-type-classification
Categorize cuisine :- https://www.kaggle.com/c/whats-cooking

4. Some helpful data science projects for beginners

House Prices :- https://www.kaggle.com/c/house-prices-advanced-regression-techniques

Digit Recognizer :- https://www.kaggle.com/c/digit-recognizer

Titanic :- https://www.kaggle.com/c/titanic

5. Intermediate-level data science projects

Black Friday Data : https://www.kaggle.com/sdolezel/black-friday

Human Activity Recognition Data : https://www.kaggle.com/uciml/human-activity-recognition-with-smartphones

Trip History Data : https://www.kaggle.com/pronto/cycle-share-dataset

Million Song Data : https://www.kaggle.com/c/msdchallenge

Census Income Data : https://www.kaggle.com/c/census-income/data

Movie Lens Data : https://www.kaggle.com/grouplens/movielens-20m-dataset

Twitter Classification Data : https://www.kaggle.com/c/twitter-sentiment-analysis2

ENJOY LEARNING ✅️✅️


#datascienceprojects
👍103
Creating a data science portfolio is a great way to showcase your skills and experience to potential employers. Here are some steps to help you create a strong data science portfolio:

1. Choose relevant projects: Select a few data science projects that demonstrate your skills and interests. These projects can be from your previous work experience, personal projects, or online competitions.

2. Clean and organize your code: Make sure your code is well-documented, organized, and easy to understand. Use comments to explain your thought process and the steps you took in your analysis.

3. Include a variety of projects: Try to include a mix of projects that showcase different aspects of data science, such as data cleaning, exploratory data analysis, machine learning, and data visualization.

4. Create visualizations: Data visualizations can help make your portfolio more engaging and easier to understand. Use tools like Matplotlib, Seaborn, or Tableau to create visually appealing charts and graphs.

5. Write project summaries: For each project, provide a brief summary of the problem you were trying to solve, the dataset you used, the methods you applied, and the results you obtained. Include any insights or recommendations that came out of your analysis.

6. Showcase your technical skills: Highlight the programming languages, libraries, and tools you used in each project. Mention any specific techniques or algorithms you implemented.

7. Link to your code and data: Provide links to your code repositories (e.g., GitHub) and any datasets you used in your projects. This allows potential employers to review your work in more detail.

8. Keep it updated: Regularly update your portfolio with new projects and skills as you gain more experience in data science. This will show that you are actively engaged in the field and continuously improving your skills.

By following these steps, you can create a comprehensive and visually appealing data science portfolio that will impress potential employers and help you stand out in the competitive job market.

#DataPortfolio
👍82