Looking to get hands-on with Data Science? Our workshops teach you the ins and outs of starting a Data Science project. Check them out below:
Description: In the third workshop of the MLOps Learning Series, we use the datasets cleaned, wrangled, and preprocessed in the previous workshop to train and tune both tree-based models and linear models. We also give a quick refresher on p-values and stepwise regression.
Description: In the second workshop of the MLOps Learning Series, we build on the previous workshop by taking the data we gathered, prepared, and wrangled with Polars, performing Exploratory Data Analysis (EDA) on it, and preprocessing it for use in Machine Learning. We conclude the workshop with some real-world examples of EDA and Data Preprocessing.
Description: In the first workshop of the MLOps Learning Series, we take a high-level overview of Polars, a Rust-based Data Manipulation library often used as a faster alternative to Pandas. First, we go over what MLOps is, why it's important, and how the Machine Learning Lifecycle works. To finish the workshop, we cover how Polars is used in Machine Learning applications and the basic objects and methods used in the library, then close out with an in-depth tutorial and various challenges (for both beginners and advanced users).
Description: Random Forests are random ensembles of Decision Trees used to optimize Information Gain and reduce bias in Feature Selection. In this workshop, we discuss how Random Forests work, why they are used, and when to use them over Neural Networks. The examples in this workshop were completed in Python using the Pandas and scikit-learn libraries, though the problems can also be done using other tools such as R or RapidMiner.
Description: Decision Trees are statistical models that use deeply nested if-else statements in a tree-like structure. They can be used for both classification and regression. In this workshop, we go over the algorithms behind Decision Trees, their use cases, and their advantages and disadvantages. The challenges at the end of the workshop were completed in Python using Pandas and scikit-learn, though the problems can also be done using other tools such as R or RapidMiner.
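To make the "nested if-else" description of Decision Trees concrete, here is a minimal hand-written sketch in plain Python. The function name, features, and thresholds are hypothetical and chosen only for illustration. In practice a library such as scikit-learn learns the split points from data rather than having them written by hand.

```python
def should_play_outside(temperature_c: float, humidity_pct: float) -> str:
    """Toy decision tree: each if/else is one split on one feature.

    The thresholds (15.0, 30.0, 80.0) are made up for illustration,
    not learned from any dataset.
    """
    if temperature_c < 15.0:
        # Left branch of the root: too cold, regardless of humidity.
        return "no"
    else:
        if humidity_pct > 80.0:
            # Warm but very humid.
            return "no"
        else:
            if temperature_c > 30.0:
                # Dry but too hot.
                return "no"
            else:
                # Mild and dry: the only leaf that predicts "yes".
                return "yes"


print(should_play_outside(22.0, 50.0))
print(should_play_outside(10.0, 50.0))
```

Each path from the first `if` to a `return` corresponds to one root-to-leaf path in the tree, which is why deeper trees read as more deeply nested conditionals.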
Description: This workshop is a follow-up to the previous one. We discuss the importance of p-values and R-squared in optimizing and interpreting Linear and Logistic Regression Machine Learning models, and complete two challenges: predicting Flight Ticket Prices and predicting whether or not a patient was diagnosed with diabetes. These challenges were completed in Python using the scikit-learn and StatsModels libraries.
Description: Regression uses linear algebra to predict a dependent variable from one or more independent variables. In this workshop, we give an overview of the different types of regression, what p-values are and why they're important, and how to create a regression model using scikit-learn and StatsModels.
Description: Data Wrangling is the practice of gathering data from a multitude of sources to create a comprehensive dataset used for analysis or deployment. In this workshop, we go over the most common types of data sources, how to work with them, and how to create a robust dataset using tools such as Pandas.
Description: In today's world, there are three primary types of data: Structured, Semi-Structured, and Unstructured. In this workshop, we go over the differences between these types of data, their use cases, and why it's important to become familiar with all of them.
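As a rough illustration of the three data types named in the last description, the sketch below loads one tiny example of each using only the Python standard library. All sample values and variable names are made up for illustration and do not come from the workshop materials.

```python
import csv
import io
import json

# Structured: a fixed schema, every row has the same columns (CSV-like table).
structured_text = "name,age\nalice,30\nbob,25\n"
rows = list(csv.DictReader(io.StringIO(structured_text)))

# Semi-structured: self-describing keys, but records may differ in shape (JSON).
semi_structured_text = '[{"name": "alice", "tags": ["admin"]}, {"name": "bob"}]'
records = json.loads(semi_structured_text)

# Unstructured: free text with no schema; any structure has to be extracted.
unstructured_text = "alice sent bob a long email about the quarterly report"
word_count = len(unstructured_text.split())

print(rows[0]["age"])         # column access works for every row
print("tags" in records[1])   # the second record has no "tags" key
print(word_count)
```

Note that the CSV reader returns every value as a string, so the structured example still needs type conversion before analysis.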