Once you have the data you want, it's helpful to view it from different perspectives. Sorting allows you to look at the values in your DataFrames in different orders. This tutorial will cover the following learning objectives:
How to Sort DataFrame Columns Using the "sort_values" Method
How to Sort a DataFrame Using the "sort_index" Method
How to Sort DataFrame Columns Using the "sort_values" Method
Summary
The sort_values method sorts a DataFrame by the values in one or more columns. Like the SQL ORDER BY clause, sort_values orders the rows in ascending order by default. To sort by a single column, use the following syntax: dataframe.sort_values(by='column_name')
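As a minimal sketch of the single-column sort described above, the example below builds a small DataFrame and sorts it by one column. The data, the variable names, and the column names are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data, invented for this example.
listings = pd.DataFrame({
    "model": ["Civic", "Corolla", "Focus", "Golf"],
    "mileage": [42000, 15000, 88000, 30000],
})

# sort_values returns a new DataFrame with the rows ordered by
# "mileage", smallest value first (ascending is the default).
by_mileage = listings.sort_values(by="mileage")
print(by_mileage)
```

Each row keeps its original index label, so the index of the sorted result is no longer in 0, 1, 2, 3 order.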
If you want to sort your DataFrame by a single column in DESCENDING order, use the following syntax: dataframe.sort_values(by='column_name', ascending=False)
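A short sketch of a descending sort, assuming a made-up DataFrame of players and points (none of these names come from the tutorial's dataset):

```python
import pandas as pd

# Hypothetical sample data, invented for this example.
scores = pd.DataFrame({
    "player": ["Ann", "Bo", "Cy"],
    "points": [12, 30, 21],
})

# ascending=False puts the largest values at the top.
highest_first = scores.sort_values(by="points", ascending=False)
print(highest_first)
```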
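By default, rows with missing values in the sort column are placed at the end of the result. If you want to easily see the missing values in a particular column, move them to the top with the following syntax: dataframe.sort_values(by='column_name', na_position='first')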
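The following sketch shows how na_position='first' surfaces missing values. The DataFrame and its column names are hypothetical, and float("nan") stands in for missing data.

```python
import pandas as pd

# Hypothetical sample data with two missing prices.
products = pd.DataFrame({
    "item": ["a", "b", "c", "d"],
    "price": [3.5, float("nan"), 1.0, float("nan")],
})

# Rows with a missing price are moved to the top; the remaining
# rows follow in ascending order of price.
missing_first = products.sort_values(by="price", na_position="first")
print(missing_first)
```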
By default, sort_values returns a new, sorted DataFrame and leaves the original unchanged. Setting the inplace argument to True permanently changes the DataFrame itself instead, in which case the method returns None. Use the following syntax: dataframe.sort_values(by='column_name', inplace=True)
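To make the effect of inplace concrete, here is a small sketch that compares the default behavior with inplace=True. The data and names are invented for illustration.

```python
import pandas as pd

# Hypothetical sample data, invented for this example.
teams = pd.DataFrame({
    "team": ["red", "green", "blue"],
    "wins": [3, 9, 5],
})

# Without inplace, the sorted rows come back as a new DataFrame
# and "teams" itself keeps its original order.
sorted_copy = teams.sort_values(by="wins")
order_before = teams["team"].tolist()

# With inplace=True, "teams" itself is reordered and the call returns None.
result = teams.sort_values(by="wins", inplace=True)
order_after = teams["team"].tolist()

print(order_before)
print(result)
print(order_after)
```

Because the in-place call returns None, assigning its result back to the same variable would replace the DataFrame with None.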
If you want to sort your DataFrame on multiple columns, use the following syntax: dataframe.sort_values(by=['column1', 'column2', 'column3'])
If you are sorting your DataFrame on multiple columns but you want one column to be sorted in ascending order and another in descending order, pass a list of booleans with one entry per column: dataframe.sort_values(by=['column1', 'column2', 'column3'], ascending=[True, False, False])
NOTE: When sorting on multiple columns, Pandas sorts the DataFrame by the first column listed, and then uses each following column, in order, to break ties among rows that share the same values in the earlier columns.
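The sketch below illustrates a multi-column sort with a mixed sort direction. The DataFrame, its column names, and its values are hypothetical.

```python
import pandas as pd

# Hypothetical sample data, invented for this example.
cars = pd.DataFrame({
    "location": ["Austin", "Boston", "Austin", "Boston", "Austin"],
    "transmission": ["manual", "automatic", "automatic", "manual", "automatic"],
    "price": [9000, 15000, 12000, 11000, 20000],
})

# Sort by location (A to Z), then transmission (A to Z) within each
# location, then price (highest first) within each transmission.
ordered = cars.sort_values(
    by=["location", "transmission", "price"],
    ascending=[True, True, False],
)
print(ordered)
```

The two Austin rows with an automatic transmission are tied on the first two columns, so the third column decides their order.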
How to Sort a DataFrame Using the "sort_index" Method
Summary
When you first create a DataFrame, the index will almost always be in ascending order, starting at 0. However, when you apply some filters or sort the values, the index can become scrambled. The sort_index method is used to reorder the index. This is used with the following syntax: dataframe.sort_index()
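As a rough illustration, the sketch below scrambles the index by sorting on a column and then puts it back in order with sort_index. The data and names are made up.

```python
import pandas as pd

# Hypothetical sample data, invented for this example.
readings = pd.DataFrame({
    "label": ["w", "x", "y", "z"],
    "value": [40, 10, 30, 20],
})

# Sorting by value reorders the rows, so the index is now 1, 3, 2, 0.
by_value = readings.sort_values(by="value")

# sort_index orders the rows by their index labels again: 0, 1, 2, 3.
restored = by_value.sort_index()
print(restored)
```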
The main issue with the sort_index method is that the values get rearranged as well. For example, if you had a DataFrame sorted by column "A" and you then applied the sort_index method, that sort order would be overwritten. The reset_index method solves this problem. Once you have applied your filters and sorted the values to your liking, you can apply the reset_index method to keep the sort order while renumbering the index from 0. Use the following syntax: dataframe.reset_index(drop=True)
NOTE: When using the reset_index method, it's critical that you set the "drop" parameter to True. If you don't do this, a new column will be created representing the former index.
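Here is a minimal sketch of reset_index with and without drop=True, using an invented DataFrame. It assumes the original index is the default unnamed one, in which case the extra column is called "index".

```python
import pandas as pd

# Hypothetical sample data, invented for this example.
readings = pd.DataFrame({
    "label": ["w", "x", "y", "z"],
    "value": [40, 10, 30, 20],
})
by_value = readings.sort_values(by="value")

# drop=True keeps the sorted row order and renumbers the index from 0.
clean = by_value.reset_index(drop=True)

# Without drop=True, the old index is kept as a new column named "index".
with_old_index = by_value.reset_index()

print(clean)
print(with_old_index)
```

Like sort_values, reset_index returns a new DataFrame by default, so the result needs to be assigned to a variable to be kept.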
Exercise
Congratulations! You just completed the Sorting DataFrames Tutorial! To help test your knowledge, let's practice sorting DataFrames in various ways.
**It's highly recommended that you complete the exercise outlined in the previous tutorial before beginning this exercise.**
Instructions:
Open your IDE (e.g., VS Code, PyCharm, Spyder).
Create a New Jupyter Notebook, or similar Notebook Environment. Name it "sorting-dataframes.ipynb"
Read the Parquet File into a DataFrame object named "listings"
Sort the DataFrame by the "mileage" column in descending order.
Sort the DataFrame by the "location", "transmission", and "drivetrain" columns. Sort the "location" column in descending order and all other columns in ascending order. Make the changes to the DataFrame permanent.
Sort the index.
Reset the index. Make sure to use the correct parameters.
Exercise Completed! Click here to view the answers.
Have any issues with the above exercise? Post your question on Discord!