Introduction to Data Science: Courses and Course Materials
- The Open Source Data Science Masters Program. Note: The outline for this master's program provides an excellent overview of the skill sets needed in data science.
- UC Berkeley Data Science 100: syllabus, lectures, and resources, plus an alternate UC Berkeley Data Science 100 offered by a different instructor
- Harvard Introductory Course on Data Science
- Coursera Courses on Data Science. Note: Coursera courses often provide free videos, but sometimes charge for full access to all files or for certification.
- Lynda.com offers a number of introductory video tutorials on data science by subscription. If you are a college or university student, or a member of a library with a digital subscription, check whether you have free access to Lynda.com.
Introduction to Data Science: Podcasts
Videos about becoming a data scientist
- The life of a data scientist / Josh Wills
- Dirty secrets of data science / Hilary Mason
- How to become a data scientist in 6 months / Tetiana Ivanova
Interviews with data scientists
- Metamarket Blog: Up-to-date interviews with data scientists
- epub: Data Scientists at Work
- epub: The Data Science Handbook
Web sites for networking with other data scientists
- Data Science Central
- Search Meetup.com for local meetup groups focusing on data science and related topics like Python programming
- Search Google for your regional “big data” hub – e.g., The Midwest Big Data Hub, as well as working groups based on themes within your hub, e.g., Midwest Working Groups
- Search for conferences and symposiums at the intersection of data science and the topics you are most passionate about, such as Open Science. Also, keep in mind that local symposiums, even small ones, can be a great place to network for jobs and are usually much less expensive to attend.
- 7 Conferences Data Scientists Shouldn’t Miss
- Neural Information Processing Systems (NIPS)
- My Data Science Infographics Pinterest Site
- Intermediate Data Science Plan for 2017
- Machine learning tools by industry
- 24 articles about what data science is
- Differences between machine learning, deep learning, data science, artificial intelligence, etc.
- Why I Left My Data Science Master’s Program
- MS Program, Bootcamp, or MOOCs?
- 5 Things You Should Know Before Getting a Degree in Data Science
Machine Learning: Tutorials and resources for hackers using Python and GitHub
- Introduction to Machine Learning Repository
- Deep Learning for Natural Language Processing
- Machine learning for hackers: Python Jupyter notebook
- Machine learning for hackers: GitHub tutorial with Python, emphasizing Bayesian methods
- Building Machine Learning Systems with Python source code
Machine Learning: Video Tutorials and Courses
- Machine Learning courses by Udemy: a collection of both free and paid tutorials that have been taken by more than 5 million students worldwide.
- 2016 free statistical machine learning course with video lectures by Larry Wasserman from Carnegie Mellon University
- Other top free video courses and tutorials on machine learning
- A (free) Course in Machine Learning by Hal Daumé III
- Coursera offers a number of courses on or covering Machine Learning, including Neural Networks using Python / University of Toronto (https://www.coursera.org/learn/neural-networks-deep-learning) and Machine Learning / Stanford University. Note: Videos are usually free to view.
- Lynda Machine Learning Video Tutorials. Note: A Lynda subscription costs $20 to $35 a month. Many colleges and universities provide Lynda for free to students, faculty, and staff.
Machine Learning: Podcasts
Machine Learning: Infographics on Pinterest
Machine Learning: Packages
- mlpy Machine Learning Python
- Machine Learning Toolkit MILK
- MDP a collection of supervised and unsupervised learning algorithms
- pyBrain modular Machine Learning Library for Python
- Caffe framework for convolutional neural network algorithms
- Nolearn framework wrapping scikit neural networks
- OverFeat Convolutional Network-based image features extractor and classifier
- Hebel GPU-Accelerated Deep Learning Library in Python
- neurolab simple and powerful Neural Network Library for Python. Contains basic neural networks, training algorithms, and a flexible framework for creating and exploring other networks
- Pylearn2 and Theano deep learning libraries
Machine Learning: Deep Learning
Machine Learning: Alternatives to and Limitations of Machine Learning
- Some data scientists believe that Probabilistic Computing will someday overtake and replace machine learning because it lets data analysts with limited data science expertise quickly arrive at solutions that are easier to interpret. It is based on a Bayesian approach.
- SIDE NOTE: I heard a fascinating story recently related to this approach: there was a head-to-head competition at MIT between experts in three domains: one in Probabilistic Computing (Dr. Mansinghka, an advisor to Google DeepMind), one in Machine Learning, and one in Statistics. They were given two problem sets – one with a known solution, and one with no known solution. The Probabilistic Computing expert produced both solutions within a few hours, and they were deemed significantly better than the ones the ML expert and the statistician came up with after about a day and a half.
- Article on the importance of taking a human-centered approach to machine learning
- Facets: a tool that helps address the problem of not knowing your data well enough to implement machine learning “responsibly”
- When NOT to use Deep Learning
Machine Learning: Skill Sets Needed
Experimental design / Working with machine learning algorithms / Feature engineering / Prediction vs. explanation / Network analysis / Collaborative filtering / Code up machine learning algorithms on single machines and on clusters of machines / Amazon AWS / Working on problems with terabytes of data / Machine learning pipelines for petabyte-scale data / Algorithmic design / Parallel computing (with MapReduce)
Machine Learning: Potential Tools Needed
Python / Python libraries for linear algebra, plotting, and machine learning: numpy, matplotlib, scikit-learn / GitHub for submitting project code / MapReduce / Hadoop / mrjob / Spark / Spark Core / data frames / Spark Shell / Spark Streaming / Spark SQL / MLlib