DISCLAIMER: Since Pandas is a Python Library, it is assumed you have a basic understanding of the Fundamentals of Programming in Python. This includes but is not limited to: Variables & Data Types, Operators, Conditionals, Loops, and Functions.
When working with data, there are hundreds, possibly thousands, of different ways to analyze, manipulate, and process specific data points to produce value. Pandas gives data professionals the ability to automate these processes. This tutorial will cover the following learning objectives:
What is Pandas
Why Pandas is Important
Summary
What is Pandas
Pandas is a Python library that provides easy-to-use Data Structures and Data Analysis tools.
To use Pandas in your Python Script or Notebook Environment, run the following line of code: import pandas as pd
NOTE: Pandas is built on NumPy (Numerical Python). This is important to know since Pandas uses NumPy's data types and builds its internal data structures from NumPy Arrays (basically Python Lists, but all the elements MUST have the same data type).
NOTE: If you don't currently have Pandas installed in your local environment, you can install it by running the following command in your terminal: pip install pandas==2.0.0
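To make the NumPy connection concrete, here is a minimal sketch. The variable names (prices, as_array, mixed) and the sample values are made up for illustration; the point is only that a Pandas column reports a NumPy data type and can be handed back as a NumPy Array.

```python
import numpy as np
import pandas as pd

# A Series is a single labeled column; Pandas infers one data type for all of its values.
prices = pd.Series([1.5, 2.0, 3.25])
print(prices.dtype)

# The same values can be retrieved as a plain NumPy Array.
as_array = prices.to_numpy()
print(type(as_array))

# Values of different types cannot share one numeric data type,
# so Pandas falls back to the generic "object" data type.
mixed = pd.Series([1, "two", 3.0])
print(mixed.dtype)
```

Checking the dtype of a column is a quick first step when a calculation behaves unexpectedly, since an "object" column usually means the values were not all read as numbers.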
Why Pandas is Important
NOTE: To follow along with this tutorial, you can Download a Jupyter Notebook or activate a Notebook in Kaggle.
Summary
Pandas, like Excel, works with data in a tabular format. If your source data isn't natively in a tabular format, you can convert it into a similar format, known as a DataFrame.
Although Excel provides a vast array of built-in functions and add-ins to help with data cleaning, manipulation, and visualization, it tends to slow down with large datasets (think tens of thousands of rows or more), and a single worksheet cannot hold more than 1,048,576 rows.
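As a rough illustration of that conversion, the sketch below turns a list of Python dictionaries (a common non-tabular shape) into a DataFrame. The records and column names are invented for this example.

```python
import pandas as pd

# Hypothetical source data: one dictionary per record, not yet a table.
records = [
    {"name": "Ada", "city": "London", "age": 36},
    {"name": "Grace", "city": "New York", "age": 45},
]

# Each dictionary becomes a row and each key becomes a column.
df = pd.DataFrame(records)

print(df)
print(df.shape)  # (number of rows, number of columns)
```

Once the data is in a DataFrame, it can be filtered, sorted, and summarized much like a worksheet in Excel.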
With Pandas you can load data from multiple data sources (including Databases, Text Files, JSON Files, and Excel Workbooks), merge data from different sources, and prepare, manipulate, and visualize data for Machine Learning and other types of Data Analysis.
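The sketch below shows one way this might look in practice: reading a small CSV source and a small JSON source, then merging them on a shared column. To keep the example self-contained, the "files" are in-memory strings wrapped in StringIO; the column names and values are assumptions made up for illustration. With real files you would pass a file path instead.

```python
from io import StringIO

import pandas as pd

# Hypothetical sources standing in for a text file and a JSON file.
csv_text = "customer_id,name\n1,Ada\n2,Grace\n3,Linus\n"
json_text = '[{"customer_id": 1, "total": 20.5}, {"customer_id": 2, "total": 13.0}]'

# Load each source into its own DataFrame.
customers = pd.read_csv(StringIO(csv_text))
orders = pd.read_json(StringIO(json_text))

# Merge the two sources on their shared column.
# how="left" keeps every customer, even one without a matching order.
merged = customers.merge(orders, on="customer_id", how="left")

print(merged)
```

The customer with no matching order ends up with a missing value in the total column, which is the kind of gap you would typically handle during data preparation.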
NOTE: Although Pandas is built to work with large datasets, it has its limits. For this reason, Polars has become a front-runner in the Data Manipulation library space. However, since Polars is a newer library, it has less community support than Pandas, so we will go over the basics of Pandas (luckily, most of the concepts transfer over to Polars).