Pandas Tutorials: Organizing Columns

Data Cleaning is the process of filtering, manipulating, and organizing data for use in Analytics and Machine Learning. Pandas provides a vast array of techniques to clean data. Over the next several tutorials, we'll discuss the most prominent techniques used by Data Scientists. This tutorial will cover the following learning objectives:

  • How to Rename DataFrame Columns
  • How to Reorder DataFrame Columns
  • How to Drop DataFrame Columns

How to Rename DataFrame Columns




Summary

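  • The rename method is used to rename one or more columns. It returns a new DataFrame and leaves the original unchanged, so assign the result back to a variable to keep the change. This is used with the following syntax:
    dataframe.rename(columns={'old_name_1': 'new_name_1', 'old_name_2': 'new_name_2'})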
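  • If the column labels themselves are wrong or attached to the wrong columns, you can replace every label at once by assigning a list to the columns attribute. This relabels the columns by position without moving any data, and the list must contain exactly one name per column: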
    dataframe.columns = ['column1', 'column3', 'column2']
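To make the difference between the two approaches concrete, here is a minimal sketch. The DataFrame and its column names ("make", "cost", and so on) are made up for illustration and are not taken from the tutorial's dataset.

```python
import pandas as pd

# Hypothetical sample data; column names are illustrative only.
df = pd.DataFrame({"make": ["Honda", "Toyota"], "cost": [18500, 22000]})

# rename returns a new DataFrame; df itself is left unchanged.
renamed = df.rename(columns={"make": "manufacturer"})

# Assigning to .columns replaces every label by position.
# The underlying data stays where it is; only the labels change.
relabeled = df.copy()
relabeled.columns = ["brand", "price"]

print(list(df.columns))         # ['make', 'cost']
print(list(renamed.columns))    # ['manufacturer', 'cost']
print(list(relabeled.columns))  # ['brand', 'price']
```

To keep a rename, assign the result back (for example, df = df.rename(columns={...})). The rename method also accepts inplace=True, which modifies the DataFrame directly and returns None.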

How to Reorder DataFrame Columns




Summary

  • We were introduced to the power of the sort_index method in the previous tutorial. However, the sort_index method sorts the indices of the rows by default. If you want to sort columns alphabetically, use the following syntax:
    dataframe.sort_index(axis=1)
  • If you want to sort the columns in a DataFrame manually, use the following syntax:
    dataframe[['column5', 'column9', 'column1', 'column3']]
  • NOTE: You'll rarely need to sort columns alphabetically, so it would be wise to get in the habit of reordering columns using the manual approach.
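The two reordering approaches above can be sketched as follows. The sample DataFrame and its column names are hypothetical and chosen only to make the resulting order easy to see.

```python
import pandas as pd

# Hypothetical sample data; column names are illustrative only.
df = pd.DataFrame({
    "price": [18500, 22000],
    "vin_number": ["A1", "B2"],
    "mileage": [42000, 31000],
})

# Sort the column labels alphabetically (axis=1 targets columns, not rows).
alphabetical = df.sort_index(axis=1)

# Select the columns in the exact order you want.
manual = df[["vin_number", "price", "mileage"]]

print(list(alphabetical.columns))  # ['mileage', 'price', 'vin_number']
print(list(manual.columns))        # ['vin_number', 'price', 'mileage']
```

Both expressions return a new DataFrame, so assign the result back to keep the new order. With the manual approach, any column left out of the list is also left out of the result.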

How to Drop DataFrame Columns




Summary

  • The drop method is used to drop columns or rows by their labels. When used to drop columns, this is used with the following syntax:
    dataframe.drop(['column1', 'column2'], axis=1)
  • If you want to drop columns by their index positions, use the following syntax:
    dataframe.drop(dataframe.columns[[0,1]], axis=1)
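  • NOTE: Unless you know your dataset inside and out, it would be wise to get in the habit of using the first method mentioned in the video above (dropping by column name), since index positions change whenever columns are added, removed, or reordered.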
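As a rough illustration of both ways to drop columns, the sketch below uses a small made-up DataFrame. The column names are assumptions for the example and do not come from the tutorial's dataset.

```python
import pandas as pd

# Hypothetical sample data; column names are illustrative only.
df = pd.DataFrame({
    "vin_number": ["A1", "B2"],
    "price": [18500, 22000],
    "color": ["red", "blue"],
    "mileage": [42000, 31000],
})

# Drop by name: explicit and unaffected by column order.
by_name = df.drop(["color", "mileage"], axis=1)

# Drop by position: df.columns[[0, 1]] looks up the labels at positions 0 and 1.
by_position = df.drop(df.columns[[0, 1]], axis=1)

# Equivalent keyword form of dropping by name.
by_keyword = df.drop(columns=["color", "mileage"])

print(list(by_name.columns))      # ['vin_number', 'price']
print(list(by_position.columns))  # ['color', 'mileage']
```

Like rename, drop returns a new DataFrame, so the original df still has all four columns unless the result is assigned back.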

Exercise

Congratulations! You just completed the Organizing Columns Tutorial! To help test your knowledge, let's practice Renaming, Reordering, and Dropping Columns.
**It's highly recommended that you complete the exercise outlined in the previous tutorial before beginning this exercise.**

Instructions:

  1. Open your IDE (e.g., VS Code, PyCharm, Spyder).
  2. Create a New Jupyter Notebook, or similar Notebook Environment. Name it "organizing-columns.ipynb"
  3. In the Notebook, complete the following tasks:
    1. Download the Following Files:
    2. Read the Pickle File into a DataFrame object named "listings"
    3. Rename the "header" column to "listing_title" and the "transmission" column to "transmission_type". Make the changes permanent.
    4. Reorder the columns to reflect the following: vin, price, listing_title, trim, engine, mileage, transmission_type, drivetrain, fuel_type, mpg, exterior_color, interior_color, location
    5. Drop the Column with the index of 13.
    6. Drop the "mpg" column. Make the changes permanent.
  4. Exercise Completed! Click here to view the answers.
  5. Have any issues with the above exercise? Post your question on Discord!