Workshops

WORKSHOPS HUB

Looking to get hands-on with Data Science? Our workshops will teach you the ins and outs of starting a Data Science project. Check them out below:

MLOps Learning Series (3): Hyperparameter Tuning & Model Training, 7/12/2023

Description: In the third workshop of the MLOps Learning Series, we use the datasets cleaned, wrangled, and preprocessed in the previous workshop to train and tune both tree-based and linear models. We also give a quick refresher on p-values and stepwise regression.

Topics Covered:

Hyperparameters
Data Preprocessing
P-values
Linear Regression

Presentation: Hyperparameter Tuning
Instructions: Workshop Instructions
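Hyperparameters are settings fixed before training (such as a regularization strength or a tree's maximum depth), so tuning them means scoring each candidate value on data the model was not trained on. The stdlib-only sketch below is an illustration, not the workshop's code: it grid-searches the penalty `lam` of a hypothetical one-feature ridge regression using k-fold cross-validation, and the data, grid, and helper names are all assumptions. In scikit-learn, `GridSearchCV` automates the same loop for tree-based and linear models alike.

```python
import random

def fit_ridge(xs, ys, lam):
    """Fit y = a + b*x with an L2 penalty `lam` on the slope (intercept left unpenalized)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / (sxx + lam)  # a larger lam shrinks the slope toward 0
    a = my - b * mx
    return a, b

def mse(xs, ys, a, b):
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def cv_score(xs, ys, lam, k=5):
    """Mean validation MSE over k folds (assumes the data is already in random order)."""
    n = len(xs)
    size = n // k
    scores = []
    for i in range(k):
        lo = i * size
        hi = n if i == k - 1 else lo + size
        # Train on everything outside the fold, validate on the fold
        a, b = fit_ridge(xs[:lo] + xs[hi:], ys[:lo] + ys[hi:], lam)
        scores.append(mse(xs[lo:hi], ys[lo:hi], a, b))
    return sum(scores) / k

def grid_search(xs, ys, grid, k=5):
    """Score every candidate value; return the best one together with all scores."""
    scores = {lam: cv_score(xs, ys, lam, k) for lam in grid}
    return min(scores, key=scores.get), scores

# Hypothetical data: y = 3x + 2 plus Gaussian noise
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(100)]
ys = [3 * x + 2 + rng.gauss(0, 1) for x in xs]

best_lam, scores = grid_search(xs, ys, grid=[0.0, 1.0, 10.0, 100.0, 1000.0])
print("best lambda:", best_lam)
print({lam: round(s, 3) for lam, s in scores.items()})
```

Because this toy data is generated from a true linear relationship, heavy penalties only add bias, so the search should favor small values of `lam`; on noisier, higher-dimensional data the balance can tip the other way.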

MLOps Learning Series (2): EDA and Data Preprocessing, 7/5/2023

Description: In the second workshop of the MLOps Learning Series, we build on the previous workshop: we take the data we gathered, prepared, and wrangled with Polars, perform Exploratory Data Analysis (EDA) on it, and preprocess it for use in Machine Learning. We conclude the workshop with some real-world examples of EDA and Data Preprocessing.

Topics Covered:

Exploratory Data Analysis (EDA)
Data Preprocessing
Data Preparation
Data Quality

Presentation: EDA and Data Preprocessing
Instructions: Workshop Instructions
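A typical first pass of EDA summarizes each column and counts missing values; preprocessing then fills the gaps and rescales features so they are comparable. The workshop used Polars for this; the stdlib-only sketch below shows the same ideas on a hypothetical `ages` column, with median imputation and z-score standardization as example choices (mean imputation or min-max scaling are equally common alternatives).

```python
import statistics

def summarize(values):
    """Basic EDA summary for a numeric column where None marks a missing value."""
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "stdev": statistics.stdev(present),
        "min": min(present),
        "max": max(present),
    }

def impute_median(values):
    """Replace missing entries with the median of the observed entries."""
    med = statistics.median(v for v in values if v is not None)
    return [med if v is None else v for v in values]

def standardize(values):
    """Z-score scaling: subtract the mean, divide by the population standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]

# Hypothetical column with two missing values
ages = [23, 35, None, 41, 29, None, 52, 38]

print(summarize(ages))
clean = impute_median(ages)
scaled = standardize(clean)
print([round(z, 2) for z in scaled])
```

The median is used for imputation here because it is less sensitive to outliers than the mean, which EDA often reveals in real datasets.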

MLOps Learning Series (1): Introduction to Polars, 6/28/2023

Description: In the first workshop of the MLOps Learning Series, we give a high-level overview of Polars, a fast, Rust-based DataFrame library that is a popular alternative to Pandas. First, we go over what MLOps is, why it's important, and how the Machine Learning Lifecycle works. We then cover how Polars is used in Machine Learning applications and the basic objects and methods in the library, and close out with an in-depth tutorial and a set of challenges for both beginners and advanced users.

Topics Covered:

Data Wrangling
Data Preparation
Querying
Data Quality

Presentation: Introduction to MLOps with Polars
Instructions: Workshop Instructions

Random Forests Workshop, 6/7/2023

Description: A Random Forest is an ensemble of Decision Trees, each trained on a random bootstrap sample of the data and a random subset of the features; combining the trees' votes reduces the overfitting (variance) of a single tree. In this workshop, we discuss how Random Forests work, why they are used, and when to use them instead of Neural Networks. The examples in this workshop were done in Python using the Pandas and scikit-learn libraries, though the problems can also be solved with other tools such as R or RapidMiner.

Topics Covered:

Information Gain
Data Preparation
Entropy
Data Preprocessing

Presentation: Optimizing Decision Trees with Random Forests
Instructions: Workshop Instructions
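The core mechanics of a Random Forest are bootstrap sampling, a random subset of features per tree, and a majority vote. The stdlib-only sketch below is a deliberately simplified illustration, not the workshop's scikit-learn code: each "tree" is a one-split decision stump chosen by Information Gain, and the toy dataset and helper names are hypothetical. scikit-learn's `RandomForestClassifier` grows full trees and re-samples features at every split rather than once per tree.

```python
import math
import random
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def fit_stump(X, y, features):
    """Pick the (feature, threshold) split among `features` with the highest information gain."""
    base = entropy(y)
    best = None  # (gain, feature, threshold, left_label, right_label)
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            gain = base - (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
            if best is None or gain > best[0]:
                best = (gain, f, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:  # no valid split: always predict the majority class
        majority = Counter(y).most_common(1)[0][0]
        return (features[0], float("inf"), majority, majority)
    return best[1:]

def predict_stump(stump, row):
    f, t, left_label, right_label = stump
    return left_label if row[f] <= t else right_label

def fit_forest(X, y, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample (with replacement)
        features = rng.sample(range(d), n_features)  # random feature subset for this tree
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], features))
    return forest

def predict_forest(forest, row):
    votes = Counter(predict_stump(s, row) for s in forest)
    return votes.most_common(1)[0][0]  # majority vote across trees

# Hypothetical toy data: two numeric features, label "yes" when both are large
X = [[1, 2], [2, 1], [3, 3], [2, 4], [6, 7], [7, 6], [8, 8], [7, 9]]
y = ["no", "no", "no", "no", "yes", "yes", "yes", "yes"]
forest = fit_forest(X, y)
print(predict_forest(forest, [8, 9]), predict_forest(forest, [1, 1]))
```

Each stump sees a slightly different sample and feature, so individual stumps can be wrong while the vote stays stable; that averaging effect is what makes forests less prone to overfitting than a single deep tree.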

Decision Tree Classification Workshop, 5/31/2023

Description: Decision Trees are models that make predictions through a series of nested if-else rules arranged in a tree-like structure. They can be used for both classification and regression. In this workshop, we go over the algorithms behind Decision Trees, their use cases, and their advantages and disadvantages. The challenges at the end of the workshop were completed in Python using Pandas and scikit-learn, though the problems can also be solved with other tools such as R or RapidMiner.

Topics Covered:

Information Gain
Data Preparation
Entropy
Information Gain vs. Least Squares

Presentation: Decision Tree Classification Models
Instructions: Workshop Instructions
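Entropy and Information Gain, the splitting criterion listed above, become concrete with a small worked example. The counts below are hypothetical (a node with 9 positive and 5 negative samples, split three ways by some attribute A) and are chosen only to show the arithmetic. For regression trees, the least-squares criterion instead picks the split that minimizes the children's summed squared error.

```latex
H(S) = -\sum_{c} p_c \log_2 p_c,
\qquad
IG(S, A) = H(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|}\, H(S_v)

% Parent node: 9 positive, 5 negative
H(S) = -\tfrac{9}{14}\log_2\tfrac{9}{14} - \tfrac{5}{14}\log_2\tfrac{5}{14} \approx 0.940

% Children: S_1 = (2+, 3-), S_2 = (4+, 0-), S_3 = (3+, 2-)
H(S_1) = H(S_3) \approx 0.971, \qquad H(S_2) = 0

IG(S, A) \approx 0.940 - \left(\tfrac{5}{14}(0.971) + \tfrac{4}{14}(0) + \tfrac{5}{14}(0.971)\right) \approx 0.247

% Least-squares criterion for regression trees (minimized over candidate splits)
\mathrm{SSE} = \sum_{v} \sum_{i \in S_v} \left(y_i - \bar{y}_{S_v}\right)^2
```

A tree-growing algorithm evaluates this gain for every candidate split and keeps the largest; the pure child S_2 is what drives the gain up here.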

Optimizing and Interpreting Regression Models Workshop, 5/24/2023

Description: This workshop is a follow-up to the previous one. We discuss the importance of p-values and R-squared in optimizing and interpreting Linear and Logistic Regression models, and complete two challenges: predicting flight ticket prices and predicting whether a patient was diagnosed with diabetes. These challenges were completed in Python using the scikit-learn and StatsModels libraries.

Topics Covered:

P-values
R-squared
Test Statistics
Stepwise Regression Analysis

Presentation: Optimizing and Interpreting Regression Models
Instructions: Workshop Instructions
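R-squared measures the share of the target's variance that the model explains, while a coefficient's p-value tests the null hypothesis that the coefficient is zero. The sketch below computes both for a one-feature linear regression using only the standard library. The data are hypothetical, and, as flagged in a comment, the p-value uses a normal approximation rather than the exact t distribution that the StatsModels `OLS` summary reports.

```python
import math
from statistics import NormalDist, mean

def simple_ols(xs, ys):
    """Fit y = a + b*x by ordinary least squares; report R-squared and a test on the slope."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
    sst = sum((y - my) ** 2 for y in ys)                        # total variation
    r_squared = 1 - sse / sst
    # Test statistic for H0: slope = 0
    se_b = math.sqrt(sse / (n - 2) / sxx)
    t_stat = b / se_b
    # Two-sided p-value. ASSUMPTION: normal approximation to the t distribution
    # (reasonable for large n); StatsModels uses the t distribution with n - 2 df.
    p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    return {"intercept": a, "slope": b, "r_squared": r_squared, "t": t_stat, "p": p_value}

# Hypothetical data: a strong linear trend, and a target unrelated to x
x = list(range(1, 11))
trend = simple_ols(x, [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9])
no_trend = simple_ols(x, [1, 2] * 5)
print(trend)
print(no_trend)
```

In stepwise regression, this kind of test drives the decisions: predictors whose p-values exceed a chosen cutoff (often 0.05) are dropped or never added, one variable at a time.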

Linear vs. Logistic Regression Workshop, 5/10/2023

Description: Regression models the relationship between one or more independent variables and a dependent variable, so that the independent variables can be used to predict it. In this workshop, we give an overview of the different types of regression, what p-values are and why they're important, and how to create regression models using scikit-learn and StatsModels.

Topics Covered:

Linear Regression
Logistic Regression
P-values
Null/Alternative Hypothesis Testing

Presentation: Linear vs. Logistic Regression
Instructions: Workshop Instructions
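The central contrast in this workshop is that linear regression outputs an unbounded number, while logistic regression passes the same kind of linear score through the sigmoid function to get a probability between 0 and 1, then applies a threshold to classify. The stdlib-only sketch below fits a one-feature logistic regression by gradient descent on hypothetical pass/fail data; the data, learning rate, and epoch count are assumptions. In practice you would use scikit-learn's `LogisticRegression`, or StatsModels' `Logit` when you also want coefficient p-values.

```python
import math

def sigmoid(z):
    """Map a linear score to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit P(y = 1 | x) = sigmoid(a + b*x) by batch gradient descent on the log-loss."""
    a = b = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(a + b * x) - y  # gradient of the log-loss w.r.t. the linear score
            grad_a += err
            grad_b += err * x
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

def predict(a, b, x, threshold=0.5):
    """Classify as 1 when the predicted probability reaches the threshold."""
    return int(sigmoid(a + b * x) >= threshold)

# Hypothetical data: hours studied vs. pass (1) / fail (0), with some overlap
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
passed = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

a, b = fit_logistic(hours, passed)
print(f"P(pass | 4.5 h) = {sigmoid(a + b * 4.5):.2f}")
print([predict(a, b, h) for h in hours])
```

Fitting ordinary linear regression to the same 0/1 labels would produce predictions below 0 or above 1 at the extremes, which is why the sigmoid link is used for binary outcomes.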

Data Wrangling Workshop, 3/1/2023

Description: Data Wrangling is the practice of gathering data from a multitude of sources to create a comprehensive dataset used for analysis or deployment. In this workshop, we go over the most common types of data sources, how to work with them, and how to create a robust dataset using tools such as Pandas.

Topics Covered:

Data Warehouses
APIs
Data Manipulation
ETL

Presentation: Data Wrangling
Instructions: Workshop Instructions
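The ETL flow behind Data Wrangling (extract from several sources, transform them into one shape, load the result) can be sketched with the standard library alone. The two inline sources stand in for a data-warehouse CSV export and a JSON API response; their contents and the function names are hypothetical. With Pandas, the join step would typically be done with `DataFrame.merge`.

```python
import csv
import io
import json

# Hypothetical source 1: a CSV export (e.g., from a data warehouse query)
customers_csv = """customer_id,name,state
1,Ada,CA
2,Grace,NY
3,Linus,TX
"""

# Hypothetical source 2: a JSON payload shaped like an API response
orders_json = """{"orders": [
  {"order_id": 101, "customer_id": 1, "total": 25.50},
  {"order_id": 102, "customer_id": 2, "total": 40.00},
  {"order_id": 103, "customer_id": 1, "total": 12.25}
]}"""

def extract():
    customers = list(csv.DictReader(io.StringIO(customers_csv)))
    orders = json.loads(orders_json)["orders"]
    return customers, orders

def transform(customers, orders):
    """Left-join total order spend onto each customer."""
    spend = {}
    for order in orders:
        spend[order["customer_id"]] = spend.get(order["customer_id"], 0.0) + order["total"]
    rows = []
    for c in customers:
        cid = int(c["customer_id"])  # CSV values arrive as strings; cast before joining
        rows.append({"customer_id": cid, "name": c["name"], "state": c["state"],
                     "total_spend": round(spend.get(cid, 0.0), 2)})
    return rows

def load(rows):
    """'Load' step: serialize the combined dataset back to CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

dataset = transform(*extract())
print(load(dataset))
```

The type cast in `transform` reflects a common wrangling pitfall: the same key can arrive as a string from one source and an integer from another, and a join on mismatched types silently finds no matches.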

Structured vs. Semi-Structured vs. Unstructured Data Workshop, 2/22/2023

Description: Data is commonly grouped into three types: Structured, Semi-Structured, and Unstructured. In this workshop, we go over the differences between these types of data, their use cases, and why it's important to become familiar with all of them.

Topics Covered:

File Formats
Relational Databases
Data Manipulation
ETL
Feature Engineering

Presentation: Structured vs. Unstructured Data
Instructions: Workshop Instructions
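The three types differ in how much schema the data carries, which determines how much work it takes to turn them into model features. The stdlib-only sketch below uses hypothetical sample data: a CSV table parses directly into rows, a nested JSON record is flattened into columns (a common semi-structured-to-structured ETL step), and free text is turned into word-count features, one simple form of feature engineering. The `flatten` helper is illustrative, not a library function.

```python
import csv
import io
import json
import re
from collections import Counter

# Structured: fixed columns, every row has the same schema
structured = "id,product,price\n1,widget,9.99\n2,gadget,19.99\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: self-describing keys, nesting, and optional fields
semi = '{"id": 3, "product": "gizmo", "price": 4.5, "specs": {"color": "red", "weight_kg": 0.2}}'

def flatten(record, prefix=""):
    """Flatten nested JSON objects into dotted column names, e.g. specs.color."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat

semi_row = flatten(json.loads(semi))

# Unstructured: free text with no schema; features must be engineered from it
review = "Great gizmo! The red color is great and it is light."
word_counts = Counter(re.findall(r"[a-z]+", review.lower()))

print(rows)
print(semi_row)
print(word_counts.most_common(3))
```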