PANDAS TUTORIALS

Pandas Tutorials: Slicing DataFrames


Real-world DataFrames often have many columns, which makes picking out specific ones tedious. Slicing lets you select only the rows and columns you need. This tutorial covers the following learning objectives:

  • How to Slice DataFrame Columns

How to Slice DataFrame Columns




Summary

  • If you want to select a single column from a DataFrame (returned as a Series), use the following syntax:
    dataframe['column_name']
  • If you want to select specific rows in a single column, use the following syntax (the stop position is exclusive):
    dataframe['column_name'][start:stop]
  • If you want to select multiple columns from a DataFrame, use the following syntax:
    dataframe[['column1', 'column2', 'column3']]
  • The loc indexer (used with square brackets, not parentheses) selects rows, columns, or both rows and columns by label. It uses the following syntax:
    dataframe.loc[row_filter, column_filter]
  • If you want to select all the rows in a DataFrame but only a range of adjacent columns, use the following syntax (with loc, the end label, here 'column5', is included):
    dataframe.loc[:, 'column1':'column5']
  • If you want to select all the rows in a DataFrame but only a few non-adjacent columns, use the following syntax:
    dataframe.loc[:, ['column1', 'column4', 'column18']]
  • If you want to select rows that match a specific condition AND select all columns, use the following syntax, where row_filter is a boolean condition on a column:
    dataframe.loc[row_filter, :]

    Example:
    customers.loc[customers['first_name'] == 'Janice', :]
  • The iloc indexer works like loc, except that it selects rows and columns by integer position (starting at 0) rather than by label. It uses the following syntax:
    dataframe.iloc[row_positions, column_positions]
  • In the example below, we use iloc to select rows 50 through 100 (positions 50:101) and include only the name, price, and description columns, which sit at column positions 2, 3, and 5:
    inventory.iloc[50:101, [2, 3, 5]]
  • If you want a single column returned as a DataFrame rather than a Series, use either of the following:
    dataframe[['column_name']]
    OR
    pd.DataFrame(dataframe['column_name'])
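  • NOTE: As mentioned in the video, when slicing by integer position (with iloc, or with an integer slice such as dataframe['column_name'][start:stop]), the end index is EXCLUSIVE, meaning the last position is left out. Always add one to the last position you want (e.g., if you want columns 1-3, specify the indices 1:4). Label slices with loc are the exception: there, the end label is INCLUDED.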
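The plain square-bracket patterns from the summary can be tried on a tiny DataFrame. This is a minimal sketch that assumes a small hypothetical customers DataFrame; its column names and values are made up for illustration and are not the exercise dataset.

```python
import pandas as pd

# Hypothetical toy data, made up for illustration only
customers = pd.DataFrame({
    "first_name": ["Janice", "Omar", "Li", "Ana"],
    "last_name": ["Ng", "Haddad", "Wei", "Silva"],
    "city": ["Austin", "Denver", "Boston", "Austin"],
    "age": [34, 28, 45, 51],
})

# A single column comes back as a Series
names = customers["first_name"]
print(type(names).__name__)  # Series

# Rows 1 and 2 of that column: the integer slice is positional and the stop is exclusive
print(customers["first_name"][1:3].tolist())  # ['Omar', 'Li']

# A list of column names inside the brackets returns a DataFrame
subset = customers[["first_name", "city", "age"]]
print(subset.columns.tolist())  # ['first_name', 'city', 'age']

# Two ways to keep a single column as a one-column DataFrame
as_frame_1 = customers[["first_name"]]
as_frame_2 = pd.DataFrame(customers["first_name"])
print(type(as_frame_1).__name__, type(as_frame_2).__name__)  # DataFrame DataFrame
```

Note the difference between single and double brackets: one column name gives a Series, while a list of names (even a list with one name) gives a DataFrame.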
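The loc patterns can be sketched the same way. This example again assumes a small hypothetical customers DataFrame with made-up values. It shows the corrected boolean filter (comparing the column itself, not the string 'first_name'), an inclusive label slice of adjacent columns, and a row filter combined with a column list.

```python
import pandas as pd

# Hypothetical toy data, made up for illustration only
customers = pd.DataFrame({
    "first_name": ["Janice", "Omar", "Li", "Ana"],
    "last_name": ["Ng", "Haddad", "Wei", "Silva"],
    "city": ["Austin", "Denver", "Boston", "Austin"],
    "age": [34, 28, 45, 51],
})

# Boolean row filter: compare the column itself, not the string 'first_name'
janice = customers.loc[customers["first_name"] == "Janice", :]
print(janice)

# Label slice of adjacent columns: with .loc the end label ('city') is included
adjacent = customers.loc[:, "first_name":"city"]
print(adjacent.columns.tolist())  # ['first_name', 'last_name', 'city']

# A list of non-adjacent columns
picked = customers.loc[:, ["first_name", "age"]]
print(picked.columns.tolist())  # ['first_name', 'age']

# Row filter and column filter combined
austin_names = customers.loc[customers["city"] == "Austin", ["first_name", "last_name"]]
print(austin_names)
```

Writing 'first_name' == 'Janice' inside loc compares two plain strings and evaluates to False before pandas ever sees it, which is why the filter must reference customers['first_name'].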
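To see the exclusive end index in practice, here is a minimal sketch that assumes a hypothetical 200-row inventory DataFrame whose name, price, and description columns sit at positions 2, 3, and 5, matching the iloc example in the summary. The data is made up for illustration.

```python
import pandas as pd

# Hypothetical toy inventory (200 rows), made up for illustration only.
# Column positions: 0 sku, 1 category, 2 name, 3 price, 4 stock, 5 description
inventory = pd.DataFrame({
    "sku": range(200),
    "category": ["misc"] * 200,
    "name": [f"item_{i}" for i in range(200)],
    "price": [float(i) for i in range(200)],
    "stock": [10] * 200,
    "description": ["placeholder"] * 200,
})

# Positions 50 through 100: the iloc stop (101) is exclusive
subset = inventory.iloc[50:101, [2, 3, 5]]
print(subset.shape)  # (51, 3)
print(subset.columns.tolist())  # ['name', 'price', 'description']

# Columns 1-3 by position need the slice 1:4
print(inventory.iloc[:, 1:4].columns.tolist())  # ['category', 'name', 'price']

# Contrast: .loc slices by label and includes the end label (100)
by_label = inventory.loc[50:100, ["name", "price", "description"]]
print(by_label.shape)  # (51, 3)
```

Because the toy DataFrame uses the default 0-based index, labels and positions coincide here, so the only difference between the iloc and loc calls is how the end of the slice is treated.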

Exercise

Congratulations! You just completed the Slicing DataFrames Tutorial! To help test your knowledge, let's practice Selecting Rows and Columns from DataFrames.
**It's highly recommended that you complete the exercise outlined in the previous tutorial before beginning this exercise.**

Instructions:

  1. Open your IDE (e.g., VS Code, PyCharm, Spyder).
  2. Create a new Jupyter Notebook (or a similar notebook environment) and name it "slicing.ipynb".
  3. In the Notebook, complete the following tasks:
    1. Download the Following Files:
    2. Read the CSV File into a DataFrame object named "listings"
    3. Select the "header" column WITHOUT using the loc method.
    4. Select rows 15 through 90 in the "exterior_color" column WITHOUT using the loc method.
    5. Select the "header", "mileage", and "price" columns WITHOUT using the loc method.
    6. Using the loc method, select all rows that have a drivetrain of "AWD". Include only the "header" and "drivetrain" columns.

      NOTE: The "drivetrain" column has a LEADING space, so you should include a space in front of the characters in the string.
    7. Using the iloc method, select rows 35 through 120 in the "engine" column. Only include the "engine" column.
  4. Exercise Completed! Click here to view the answers.
  5. Have any issues with the above exercise? Post your question on Discord!