PANDAS TUTORIALS

Pandas Tutorials: DataFrames vs. Series

NOTE: By convention, pandas is imported under the alias "pd" (import pandas as pd), so you should get used to seeing and using this abbreviation.

Pandas combines the utilities of Excel, the logic of SQL, and the efficiency of Python into one complete package. DataFrames are the primary data structure pandas uses for data cleaning and manipulation. A Series is a one-dimensional labeled array, and each column of a DataFrame is a Series, so Series are what you work with when you operate on a single column. This tutorial covers the following learning objectives:
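
To see how the two structures relate, here is a minimal sketch. The DataFrame name "scores" and its columns are made up for illustration. Selecting a single column from a DataFrame with square brackets gives you back a Series.

```python
import pandas as pd

# A small DataFrame with two illustrative columns (hypothetical data)
scores = pd.DataFrame({"team": ["A", "B", "C"], "points": [98, 87, 105]})

# Selecting one column with square brackets returns a Series
points = scores["points"]

print(type(scores).__name__)  # DataFrame
print(type(points).__name__)  # Series
print(points.name)            # points
```

Notice that the column label ("points") carries over as the Series' name.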

  • What a Series is and When to Use It
  • What a DataFrame is and When to Use It

What a Series is and When to Use It




Summary

  • A Series is similar to a column in Excel: a one-dimensional sequence of values, each paired with a row label. The values typically share a single data type (dtype). To create a Pandas Series, use the following syntax:
    series1 = pd.Series(data)
  • If you don't have a file or database connection set up, you can use NumPy's random module (imported as np) to generate random values. Example:
    series1 = pd.Series(np.random.randn(5))
  • The Index of a Series holds a label for each row. By default, it counts up from 0, so a Series of five values has index labels 0 through 4. Labels are usually unique, but pandas doesn't require it.
  • The Name of a Series (not to be confused with the name of the variable that holds it) becomes the column label when you combine multiple Series into a DataFrame (discussed in the next section). Set it with the name parameter:
    series1 = pd.Series(data, name='[name]')
  • NOTE: You can supply your own index labels with the index parameter. The number of labels must match the number of values in the Series, so typing them out by hand can get tedious.
  • NOTE: Throughout these tutorials, you'll see a combination of NumPy arrays and Pandas Series. Both can represent one-dimensional data, but a Series also carries an index and a name, and NumPy arrays don't have access to the same transformation methods as Pandas Series.
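
The points above can be pulled together in one short sketch. The variable names, values, and labels (temps, "temp_f", the weekday labels) are made up for illustration; pd.Series with its name and index parameters, and NumPy's random.randn, are the pieces the summary describes.

```python
import numpy as np
import pandas as pd

# Create a Series from a plain Python list; the default index is 0..n-1
temps = pd.Series([68.0, 71.5, 64.2])
print(temps.index)

# Random values from NumPy when you have no file or database to load
random_values = pd.Series(np.random.randn(5))

# Give the Series a name; it becomes the column label in a DataFrame
temps_named = pd.Series([68.0, 71.5, 64.2], name="temp_f")

# Supply custom index labels; the list length must match the number of values
temps_by_day = pd.Series([68.0, 71.5, 64.2], index=["Mon", "Tue", "Wed"], name="temp_f")
print(temps_by_day["Tue"])  # 71.5

# A NumPy array holds the same values but has no index labels or name
as_array = temps_by_day.to_numpy()
print(type(as_array).__name__)  # ndarray
```

If the index list were shorter or longer than the list of values, pandas would raise a ValueError instead of creating the Series.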

What a DataFrame is and When to Use It




Summary

  • A DataFrame is a two-dimensional data structure made up of one or more Series that share an index. The first dimension (axis 0) is the rows, labeled by the index, and the second dimension (axis 1) is the columns, labeled by the column headers. To create a Pandas DataFrame, use the following syntax:
    dataframe1 = pd.DataFrame(data)
  • Just like a Series, a DataFrame has an Index that labels each row.
  • If you want to create a DataFrame from your own data, pass a dictionary that maps each column name to a list of values (all lists the same length):
    data = {"column_name1": [1, 2, 3], "column_name2": [4, 5, 6]}
    dataframe2 = pd.DataFrame(data)
  • NOTE: Since most of the objects you'll work with in pandas are DataFrames, avoid the commonly used "df" abbreviation when naming DataFrame variables; a descriptive name (e.g., player_stats) tells you what the data contains.
  • NOTE: In both NumPy and Pandas, missing values (NULLs) are commonly represented as "NaN", short for "Not a Number".
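
Here is a minimal sketch of the DataFrame points above, using made-up city and temperature data (the names weather, city, temp, and the state labels are hypothetical). It builds a DataFrame from a dictionary, then builds the same table from two named Series, replaces the index after creation, and checks for a NaN value.

```python
import numpy as np
import pandas as pd

# Build a DataFrame from a dictionary: keys become column labels,
# and each list supplies that column's values (hypothetical data)
data = {"city": ["Austin", "Denver", "Boston"], "temp_f": [88.0, 75.5, np.nan]}
weather = pd.DataFrame(data)

# The default index labels rows 0..n-1, just like a Series
print(weather.index.tolist())  # [0, 1, 2]

# Combining named Series: each Series' name becomes its column label
city = pd.Series(["Austin", "Denver", "Boston"], name="city")
temp = pd.Series([88.0, 75.5, np.nan], name="temp_f")
weather2 = pd.DataFrame({city.name: city, temp.name: temp})

# Replace the index after creation; the new list needs one label per row
weather2.index = ["TX", "CO", "MA"]

# Missing values show up as NaN; isna() flags them
print(weather2["temp_f"].isna().tolist())  # [False, False, True]

# shape is (rows, columns): axis 0 is the rows, axis 1 the columns
print(weather2.shape)  # (3, 2)
```

Using each Series' name as the dictionary key keeps the column labels in sync with the names you gave the Series, rather than retyping them.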

Exercise

Congratulations! You've completed the DataFrames vs. Series tutorial! To test your knowledge, let's practice creating DataFrame and Series objects.
**It's highly recommended that you complete the exercise outlined in the previous tutorial before beginning this exercise.**

Instructions:

  1. Open your IDE (e.g., VS Code, PyCharm, Spyder).
  2. Create a new Jupyter Notebook (or a similar notebook environment) and name it "dataframes-vs-series.ipynb".
  3. In the Newly Created Notebook, complete the following tasks:
    1. Import the Pandas library with its common abbreviation.
    2. Declare a variable named "points_scored" and assign it a value of a Pandas Series with the following values: [21.9, 22.4, 19.8, 14.2, 16.7, 22.0]. Name the Series "ppg".
    3. Declare a variable named "rebounds_grabbed" and assign it a value of a Pandas Series with the following values: [4.4, 6.0, 11.2, 2.1, 0.2, 5.5]. Name the Series "rpg".
    4. Declare a variable named "assists_made" and assign it a value of a Pandas Series with the following values: [6.6, 2.1, 1.6, 9.7, 8.0, 4.1]. Name the Series "apg".
    5. Declare a variable named "player_stats" and assign it a Pandas DataFrame built from the three Series objects you just created.

      HINT: Use each Series' name attribute (e.g., points_scored.name) as its column name, so the columns match the Series names defined above.
    6. Give the "player_stats" DataFrame the following Index values: ["Point Guard", "Shooting Guard", "Center", "Small Forward", "Power Forward", "6th Man"]

      HINT: After assigning the DataFrame to the variable, assign the list of values to the DataFrame's index attribute (player_stats.index).
  4. Exercise Completed! Click here to view the answers.
  5. Have any issues with the above exercise? Post your question on Discord!