Data Science Resources (free for access)
Foundational Skills
Foundational skills form the basis of true understanding, which will in turn allow you to discover novel solutions, build more accurate models, and make better decisions.
Programming and Data Wrangling
First, you’ll need to know at least one scripting language well enough to wrangle datasets, prototype models, and perform analyses.
We strongly recommend choosing between Python and R, as both are open source (free), widely adopted, and supported by active communities.
• Python is more common in software startups, large tech firms, and ad tech. Python tends to be more flexible because it's a general-purpose programming language, and its ecosystem is stronger for deep learning and data processing.
• R / RStudio is popular in research, finance, and analytics. R is a statistical programming language that has mature libraries for econometrics, statistics, and machine learning.
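To make "wrangling" a bit more concrete, here is a minimal Python sketch that uses only the standard library (in day-to-day work you would more likely reach for a library such as pandas). The CSV content and the helper names `load_clean_rows` and `revenue_by_region` are hypothetical, made up for illustration. The sketch normalizes messy labels, drops incomplete rows, and aggregates revenue per region.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw export: inconsistent casing, stray whitespace, a missing value.
RAW_CSV = """region,product,revenue
 North ,widget,120.5
north,gadget,80
South,widget,
south,widget,45.25
"""


def load_clean_rows(text):
    """Parse CSV text, normalize region labels, and drop rows with no revenue."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        revenue = record["revenue"].strip()
        if not revenue:  # skip incomplete rows instead of guessing a value
            continue
        rows.append({
            "region": record["region"].strip().title(),  # " North " / "north" -> "North"
            "product": record["product"].strip(),
            "revenue": float(revenue),
        })
    return rows


def revenue_by_region(rows):
    """Sum revenue per region (a simple group-by)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["revenue"]
    return dict(totals)


if __name__ == "__main__":
    print(revenue_by_region(load_clean_rows(RAW_CSV)))
```

Running it prints `{'North': 200.5, 'South': 45.25}`: the two differently written "North" rows are merged, and the South row with a blank revenue is dropped.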
Python Resources:
• Learn Python the Hard Way (Online Book) – Recommended for beginners who want a complete course in programming with Python.
• LearnPython.org (Interactive Tutorial) – Short, interactive tutorial for those who just need a quick way to pick up Python syntax.
• How to Think Like a Computer Scientist (Interactive Book) – Interactive “CS 101” course taught in Python that really focuses on the art of problem solving. This goes beyond the bare minimum needed to get started, but it’s such a wonderful gem that we had to include it here.
• PythonChallenge.com (Online Puzzle) – Fun puzzle with 33 levels that you can solve with Python programming.
• How to Learn Python for Data Science, The Self-Starter Way – Guide that covers these resources in more detail.
R / RStudio Resources:
• R for Data Science (Online Book) – Recommended for beginners who want a complete course in data science with R.
• Swirl (Interactive R Package) – Very cool R package that you install to learn the language directly inside RStudio (the most common interface used to run R).
Statistics and Probability
A strong statistics foundation helps you fully understand machine learning, conditional probability, A/B testing, and many other core skills.
• Statistics and Probability (Khan Academy) – Practical introduction to statistics and probability from Khan Academy. Recommended for getting up to speed quickly.
• Harvard Stats 110: Probability (Video Series) – Rigorous treatment of probability theory from Harvard. Recommended for building deeper mastery.
• Think Stats: Probability and Statistics for Programmers – Free PDF available. Excellent resource for those with programming backgrounds. Quote: “If you have basic skills in Python, you can use them to learn concepts in probability and statistics and practical skills for working with data.”
• Crash Course on Basic Statistics (PDF) – Short PDF that gives a whirlwind review of key topics. We like this review sheet because it offers a simple, intuitive explanation of each concept.
• How to Learn Statistics for Data Science, The Self-Starter Way – Guide that covers these resources in more detail.
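As a small taste of where these statistics skills show up, here is a sketch of a classic A/B test: a two-sided two-proportion z-test comparing the conversion rates of two variants. It assumes independent users and sample sizes large enough for the normal approximation to hold. The function name `two_proportion_z_test` and the conversion counts are illustrative, not taken from any of the resources above.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (B minus A).

    Uses the pooled proportion under the null hypothesis that both variants
    share the same true rate. Returns (z, p_value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)), which equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Made-up experiment: 10% conversion for A vs. 12.5% for B, 2,000 users each.
    z, p = two_proportion_z_test(200, 2000, 250, 2000)
    print(f"z = {z:.2f}, p = {p:.3f}")
```

With these made-up numbers, z comes out at about 2.50 and the p-value at about 0.012, so the difference would be significant at the conventional 5% level. Pooling the proportion is the usual choice when the null hypothesis is "no difference".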
Technical Skills
Data science is all about converting raw data into insights, predictions, software, and so on. Therefore, you’ll need to be comfortable working with data.
Data Collection
Everything hinges on the quality and quantity of your data. Just as a chemist needs the right chemicals, you’ll need relevant data. There are 4 common ways to collect data:
Internal Data. This is proprietary data that your company collects through its operations or through partnerships with other providers. This is usually the most relevant data.
Searching Online. Need a labeled set of 8 million videos? There’s a webpage for that… Seriously, you’d be surprised at what you can find out there. Online datasets allow you to prototype before investing in proprietary data.
APIs. APIs let you programmatically (and legally) access datasets that other companies collect. You can find anything from Twitter feeds to weather data to financial data.
Web Scraping. Web crawling and scraping are powerful tools that you must use responsibly. They open a whole new world, but make sure to respect each site's terms of service.
API Resources:
• Python: requests Quickstart Guide (Tutorial) – How to use the requests library to request data from APIs.
• R: httr Quickstart Guide (Tutorial) – How to use the httr library to request data from APIs.
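The requests library from the quickstart guide is the usual choice in Python. To keep this sketch free of third-party dependencies, it uses the standard library's `urllib` instead. The endpoint `https://api.example.com/v1/observations`, its `city`/`limit` parameters, and the `results` key in the response are hypothetical placeholders. Real APIs document their own URLs, parameters, and response shapes, and most also require authentication (often an API key).

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint: substitute the real API's documented base URL.
BASE_URL = "https://api.example.com/v1/observations"


def build_url(base_url, params):
    """Append URL-encoded query parameters (spaces become '+') to a base URL."""
    return f"{base_url}?{urllib.parse.urlencode(params)}"


def fetch_json(url, timeout=10):
    """Send a GET request and decode the JSON response body (needs network access)."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_records(payload, key="results"):
    """Return the list stored under `key`; 'results' is an assumed response shape."""
    return payload.get(key, [])


if __name__ == "__main__":
    url = build_url(BASE_URL, {"city": "New York", "limit": 5})
    print(url)
    # The endpoint is fictional, so the network call stays commented out:
    # records = extract_records(fetch_json(url))
```

Keeping URL building, fetching, and parsing in separate functions lets you test the parsing logic against saved JSON without hitting the API each time.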
Web Scraping Resources:
• R: rvest (Tutorial) – Basic web scraping with the rvest library.
• Python Web Scraping Libraries – Overview of the Python web scraping landscape.
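Here is a minimal sketch of the two halves of responsible scraping: checking a site's robots.txt rules before fetching, and parsing the HTML you get back. It uses only the standard library (`urllib.robotparser` and `html.parser`). Many people use third-party libraries such as BeautifulSoup for parsing instead. Downloading the page is omitted. The robots.txt content, the HTML, and the user-agent string `my-research-bot` are hypothetical. robots.txt is only one signal: the site's terms of service still apply, and pausing between requests is good manners.

```python
import urllib.robotparser
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


def allowed_by_robots(robots_txt, url, user_agent="my-research-bot"):
    """Check a URL against robots.txt rules supplied as text."""
    rp = urllib.robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(user_agent, url)


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /private/\n"
    page = '<a href="/about">About</a> <a href="/private/data">Data</a>'
    for link in extract_links(page):
        url = "https://example.com" + link
        print(url, "allowed" if allowed_by_robots(robots, url) else "disallowed")
```

In a real crawler you would call `RobotFileParser.set_url(...)` and `read()` once per site rather than parsing a hard-coded string.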
SQL
SQL is the lingua franca for database management and querying, and you should be able to write complex queries.
• Intro to SQL by Khan Academy (Course) – Comprehensive video series that covers every important SQL topic.
• sqlcourse.com (Interactive Tutorial) – Great for review or a quick crash course.
• SQL Fundamentals (Course) – Course that covers the basics of SQL. Includes quizzes along the way to test your understanding.
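To show what a "complex" query looks like beyond a single SELECT, here is a sketch that runs SQL through Python's built-in `sqlite3` module against an in-memory database. The `customers`/`orders` schema and its rows are hypothetical. The query combines a JOIN, aggregation with GROUP BY, a HAVING filter on the aggregate, and ORDER BY.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with hypothetical sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(id),
            amount REAL NOT NULL
        );
        INSERT INTO customers (id, name) VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
        INSERT INTO orders (customer_id, amount) VALUES
            (1, 30.0), (1, 20.0), (2, 100.0), (3, 15.0), (3, 5.0), (3, 10.0);
    """)
    return conn


def top_customers(conn, min_orders=2):
    """Return (name, order_count, total_spent) for customers with >= min_orders orders."""
    query = """
        SELECT c.name,
               COUNT(o.id)   AS order_count,
               SUM(o.amount) AS total_spent
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id
        GROUP BY c.id, c.name
        HAVING COUNT(o.id) >= ?          -- filter on the aggregate, not on raw rows
        ORDER BY total_spent DESC
    """
    # The ? placeholder passes min_orders safely instead of formatting it into the SQL.
    return conn.execute(query, (min_orders,)).fetchall()


if __name__ == "__main__":
    print(top_customers(build_demo_db()))
```

This returns `[('Ada', 2, 50.0), ('Linus', 3, 30.0)]`. Grace has the largest single order but is excluded by HAVING because she placed only one.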
Data Visualization
Data visualization is important for exploratory analysis and for communicating your insights, and no list of data science resources would be complete without this topic.
• Data Visualization in Python (Video Series) – Tutorial on using the matplotlib library in Python.
• Data Visualization in R (Video Series) – Tutorial on using the ggplot2 library in R.
• Python Seaborn Tutorial – Tutorial for the seaborn library in Python, which we strongly recommend for beginners.
Applied Machine Learning
Machine learning is a broad umbrella term that contains many sub-tasks. In a nutshell, it’s about teaching computers how to learn patterns and models from data.
• Machine Learning by Andrew Ng (Video Series) – This is the gold standard among courses for learning the theory behind machine learning.
• Elements of Statistical Learning (PDF) – Reference text. This is one of the classic textbooks of the field, but it requires a solid math background.
• An Introduction to Statistical Learning with Applications in R (PDF) – Reference text. Another classic textbook that has gentler math requirements.
• How to Learn Machine Learning, the Self-Starter Way – Beginner-friendly overview of the machine learning landscape.
• Modern Machine Learning Algorithms: Strengths and Weaknesses – Concise tour of machine learning algorithms.
• Python Machine Learning Tutorial – End-to-end tutorial for training your first model using Python's scikit-learn library.
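To illustrate "learning patterns from data" in the simplest possible case, here is a sketch that fits a one-feature linear regression with the closed-form ordinary least squares solution in plain Python, then checks it on held-out data. scikit-learn's `LinearRegression` does this (and much more) for you. The synthetic data (y = 3x + 2 plus Gaussian noise) and the function names are assumptions made for illustration.

```python
import random


def fit_linear_regression(xs, ys):
    """Ordinary least squares for y = slope * x + intercept with one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_squared_error(xs, ys, slope, intercept):
    """Average squared gap between observed y and the model's prediction."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)) / len(xs)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are reproducible
    xs = [rng.uniform(0, 10) for _ in range(200)]
    ys = [3 * x + 2 + rng.gauss(0, 1) for x in xs]
    # Hold out the last 50 points to evaluate on data the model never saw.
    train_x, test_x = xs[:150], xs[150:]
    train_y, test_y = ys[:150], ys[150:]
    slope, intercept = fit_linear_regression(train_x, train_y)
    print(f"slope={slope:.2f} intercept={intercept:.2f}")
    print(f"test MSE={mean_squared_error(test_x, test_y, slope, intercept):.2f}")
```

The fitted slope and intercept should land close to the true 3 and 2, and the test error should be near the noise variance of 1. Evaluating on a held-out split is the same habit the scikit-learn tutorial builds, just with far simpler machinery.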
Thanks For Reading! Please share and subscribe.