My name is Ted Petrou and I am the developer and instructor of the Intro to Data Science Bootcamp, a course dedicated to teaching the fundamentals of data science with Python. The live courses are available either on the weekends or during the week. In this article, I will discuss the overall approach I took in designing the course as well as details of the syllabus.
This course targets those who are just beginning their data science journey. It only assumes that students have a deep desire to learn the fundamentals of data science. No previous programming experience is required. Most previous students have backgrounds in science, but many others do not.
I am an expert at data exploration and machine learning using Python and the author of Pandas Cookbook, a thorough step-by-step guide to accomplishing a variety of data analysis tasks with Pandas. I am ranked in the top 0.1% of Stack Overflow users of all time. I have taught over 100 days of live data science classes, and from experience I know exactly where many of the pain points are and which workflows work best when teaching beginners. I have authored the Python data exploration libraries Dexplo and Dexplot.
I am a very enthusiastic and energetic teacher and am eager to pass on to you the skill set needed to be a data scientist. When students are completing exercises, I will sit right next to them, watch them code, help them with their thought process, and catch mistakes.
The course begins by covering the basics of the Python programming language. The material is broken down into 10 main sections:
Each section contains many practice exercises to reinforce the material just covered. Exercises progress in level of difficulty, beginning with mostly one-line answers isolated to a very particular topic before moving to multi-line solutions that necessitate bringing together many of the previous topics. By the time we cover classes and object-oriented programming, we will be ready to build a complete program. Over 200 pages of material and 100 exercises with detailed solutions are provided.
Students will incorporate all of the ideas from the first week and put together a complete poker game that allows a human to play against a rules-based artificial intelligence.
We then switch gears and begin our data exploration journey by getting introduced to the Pandas library, the primary tool to analyze data in Python. In particular, we cover how to select subsets of data, a very common, yet particularly troubling topic for those seeing Pandas for the first time. Two live online classes are delivered to cover this material in detail.
Day 1: Minimally Sufficient Pandas
The Pandas library is powerful yet confusing, as there are often multiple ways to complete the same task. Students will learn a small yet important subset of Pandas that will allow them to complete many tasks without getting distracted by syntax. Students will also learn a simple yet effective process for building a workflow in a Jupyter Notebook.
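Selecting subsets of data is one of the first Pandas topics in the course, and a small set of selection idioms covers most needs. The sketch below is illustrative only: the DataFrame, its column names, and its values are invented for this example and are not taken from the course material.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only.
df = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Cho", "Dev"],
        "state": ["TX", "CA", "TX", "NY"],
        "salary": [72000, 95000, 58000, 81000],
    }
).set_index("name")

# Select a single column as a Series with plain brackets.
salaries = df["salary"]

# Select by row and column label with .loc.
ben_state = df.loc["Ben", "state"]

# Select by integer position with .iloc (first two rows, all columns).
first_two = df.iloc[:2, :]

# Select the rows that satisfy a condition with a boolean mask.
texans = df.loc[df["state"] == "TX"]

print(texans)
```

Sticking to one idiom per kind of selection (brackets for columns, `.loc` for labels, `.iloc` for positions) is one way to avoid being distracted by the many equivalent spellings Pandas allows.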
Day 2: Split-Apply-Combine
Insights within datasets are often hidden amongst different groupings. The split-apply-combine paradigm is the fundamental procedure to explore differences amongst distinct groups within datasets.
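As a minimal sketch of split-apply-combine in Pandas, the example below splits rows by a grouping column, applies an aggregation to each group, and combines the results into one table. The data and column names are hypothetical and exist only to illustrate the pattern.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only.
df = pd.DataFrame(
    {
        "department": ["Sales", "Sales", "IT", "IT", "IT"],
        "salary": [50000, 60000, 80000, 90000, 70000],
    }
)

# Split by department, apply the mean to salary, combine into one Series.
avg_salary = df.groupby("department")["salary"].mean()

# Several aggregations at once, using named aggregation.
summary = df.groupby("department").agg(
    avg_salary=("salary", "mean"),
    headcount=("salary", "size"),
)

print(summary)
```

The grouping column becomes the index of the result, so each row of `summary` describes one distinct group.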
Day 3: Tidy Data
Real-world data is messy and not immediately ready for aggregation, visualization, or machine learning. Identifying messy data and transforming it into tidy data gives the data a consistent structure that makes further analysis easier.
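One common kind of messy data stores values of a variable (for example, years) as column names. A minimal sketch of tidying such a table with `melt` follows; the table and its column names are made up for illustration.

```python
import pandas as pd

# Hypothetical messy table: the year values are spread across column names.
messy = pd.DataFrame(
    {
        "city": ["Austin", "Boston"],
        "2017": [10, 20],
        "2018": [12, 18],
    }
)

# Tidy form: one row per observation, one column per variable.
tidy = messy.melt(id_vars="city", var_name="year", value_name="sales")

# Tidy data plugs directly into aggregation.
sales_by_year = tidy.groupby("year")["sales"].sum()

print(tidy)
```

Once the data is in this shape, grouping, plotting, and modeling can all refer to `year` and `sales` as ordinary columns.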
Day 4: Exploratory Data Analysis
Exploratory data analysis is a process for gaining understanding and intuition about datasets. Visualizations are the foundation of EDA and communicate the discoveries made along the way. Matplotlib, the workhorse for building visualizations, will be covered, followed by Pandas' simple interface to it. Finally, the Seaborn library, which works directly with tidy data, will be used to create elegant visualizations with little code.
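To illustrate the relationship between Matplotlib and the Pandas plotting interface, here is a small sketch that draws the same line chart both ways. The data is invented, and the `Agg` backend is selected only so the snippet runs without a display; in a Jupyter Notebook that line would normally be unnecessary.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data, invented for illustration only.
df = pd.DataFrame({"year": [2015, 2016, 2017, 2018], "sales": [10, 14, 13, 19]})

# Matplotlib directly: create a figure and axes, then draw and label by hand.
fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(df["year"], df["sales"], marker="o")
ax.set_title("Sales by year")
ax.set_xlabel("year")
ax.set_ylabel("sales")

# The same chart through the Pandas interface, which returns a Matplotlib Axes.
ax2 = df.plot(x="year", y="sales", kind="line", title="Sales by year (pandas)")

fig.savefig("sales_by_year.png")
```

Because the Pandas call returns an ordinary Matplotlib `Axes`, anything learned about Matplotlib still applies when fine-tuning a plot created from a DataFrame.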
Day 5: Applied Machine Learning
After tidying, exploring, and visualizing data, machine learning models can be applied to gain deeper insights into the data. Workflows for preparing, modeling, validating and predicting data with Python’s powerful machine learning library Scikit-Learn will be built. The very latest workflows for Scikit-Learn have been incorporated into this material. See this blog post by me for more info.
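A minimal sketch of a prepare, model, validate, and predict workflow in Scikit-Learn is shown below. It uses `ColumnTransformer` and `Pipeline`, which is an assumption about what a current workflow might look like and not a statement of the course's actual material. The dataset, column names, and choice of model are all hypothetical.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data, invented for illustration only.
df = pd.DataFrame(
    {
        "age": [22, 35, 47, 51, 29, 63, 38, 44, 26, 57, 33, 49],
        "plan": ["basic", "pro", "pro", "basic", "basic", "pro",
                 "basic", "pro", "basic", "pro", "basic", "pro"],
        "churned": [0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1],
    }
)
X = df[["age", "plan"]]
y = df["churned"]

# Prepare: scale the numeric column and one-hot encode the categorical one.
preprocess = ColumnTransformer(
    [
        ("num", StandardScaler(), ["age"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["plan"]),
    ]
)

# Model: chain preprocessing and the estimator into a single object.
model = Pipeline([("preprocess", preprocess), ("classifier", LogisticRegression())])

# Validate: 3-fold cross-validation returns one accuracy score per fold.
scores = cross_val_score(model, X, y, cv=3)

# Predict: fit on all rows, then predict for previously unseen rows.
model.fit(X, y)
new_customers = pd.DataFrame({"age": [30, 60], "plan": ["basic", "pro"]})
predictions = model.predict(new_customers)

print(scores.mean(), predictions)
```

Wrapping the preprocessing inside the pipeline means each cross-validation fold fits the scaler and encoder on its own training rows only, which avoids leaking information from the validation rows.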
Students will complete three end-to-end data analyses that will incorporate all of the material learned during the previous weeks. In addition, students will learn how to create and present results with interactive dashboards.
This course is taught with a student-centered approach. During class time, focus alternates between short live coding sessions by the instructor and student-centered exercises. These short intervals of instructor-guided lessons allow students to immediately practice what they learned. They also give the instructor feedback on how the students are progressing. Since this class is targeted at novices, many mistakes and bad habits can be corrected as soon as they appear.
The class is limited to only 15 students, so I am able to quickly put out the fires as they arise during the exercises.
Before the bootcamp starts, I tell my students to separate the concepts from the tools. The concepts are things such as selecting subsets of data, testing for tidy data, split-apply-combine, and binarization of categorical variables. The tools are the Python libraries Pandas, Matplotlib, and Scikit-Learn. The tools help us implement the concepts; they are merely a means to an end. The tools are always changing in data science. It could easily be that Pandas will be replaced in 10 years' time. The concepts, however, are permanent fixtures that help you understand the data.
That said, you must be proficient with at least one tool to master the concepts. Occasionally, knowing a tool very well will also introduce you to a concept that you may not have been aware of.
One of the best ways to learn is by struggling with a problem, attempting a solution, and then looking at an expert’s detailed solution. All exercises and assignments have detailed solutions.