The next Data Cleaning technique is combining DataFrames. If you want to merge two DataFrames but they don't have a shared column, you can concatenate them into one DataFrame instead. This tutorial covers the following learning objectives:
How to Combine Multiple DataFrames Using the "concat" Method
Summary
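The concat method is used to combine two or more DataFrames by stacking their rows. This is similar to UNION ALL in relational databases: duplicate rows are kept, and each DataFrame keeps its original index labels. This is used with the following syntax: pd.concat([dataframe1, dataframe2])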
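As a minimal sketch of the basic call, the example below stacks two small DataFrames. The frame names, column names, and values are made up for illustration and do not come from the tutorial's dataset.

```python
import pandas as pd

# Hypothetical sample data; names and values are illustrative only.
df1 = pd.DataFrame({"name": ["Ann", "Bob"], "score": [90, 85]})
df2 = pd.DataFrame({"name": ["Cy", "Dee"], "score": [70, 95]})

# Stack df2 below df1. Each frame keeps its own index labels,
# so the combined index repeats: 0, 1, 0, 1.
combined = pd.concat([df1, df2])
print(combined)
```

Note that the repeated index labels are expected here; the ignore_index parameter addresses them.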
The ignore_index parameter discards the original index labels and gives the combined DataFrame a fresh index running from 0 to n-1. This is useful for reducing the number of steps in your pipeline (e.g., not having to call reset_index after every concatenation). This is used with the following syntax: pd.concat([dataframe1, dataframe2], ignore_index=True)
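A short sketch of ignore_index in action, again using made-up sample frames rather than the tutorial's data:

```python
import pandas as pd

# Hypothetical sample data; names and values are illustrative only.
df1 = pd.DataFrame({"name": ["Ann", "Bob"], "score": [90, 85]})
df2 = pd.DataFrame({"name": ["Cy", "Dee"], "score": [70, 95]})

# ignore_index=True drops the original labels and renumbers the rows 0..3.
combined = pd.concat([df1, df2], ignore_index=True)
print(combined.index.tolist())
```

Without ignore_index, the same result would need an extra step such as combined.reset_index(drop=True).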
The keys parameter is used to create a MultiIndex that shows which source DataFrame each group of records comes from. This is similar in purpose to the "indicator" parameter of the "merge" method. Do not combine this parameter with ignore_index=True: resetting the index discards the keys, so the two would cancel each other out. This is used with the following syntax: pd.concat([dataframe1, dataframe2], keys=['key1', 'key2'])
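The sketch below shows how keys labels each block of rows. The key names "first" and "second" and the sample frames are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical sample data; names and values are illustrative only.
df1 = pd.DataFrame({"name": ["Ann", "Bob"], "score": [90, 85]})
df2 = pd.DataFrame({"name": ["Cy", "Dee"], "score": [70, 95]})

# Each key becomes the outer level of a MultiIndex on the rows.
combined = pd.concat([df1, df2], keys=["first", "second"])
print(combined)

# The outer level can then be used to pull back one source's rows.
from_first = combined.loc["first"]
print(from_first)
```

Selecting with combined.loc["second"] would return the rows that came from df2 in the same way.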
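If you want to combine two or more DataFrames that have different schemas (that is, different columns) by placing them side by side, pass axis=1. Rows are then aligned by their index labels, and the columns of each DataFrame are appended to the right: pd.concat([dataframe1, dataframe2], axis=1)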
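A minimal sketch of column-wise concatenation, assuming two made-up frames that share the same default index but have different columns:

```python
import pandas as pd

# Hypothetical sample data; both frames share the default index 0, 1.
people = pd.DataFrame({"name": ["Ann", "Bob"]})
scores = pd.DataFrame({"score": [90, 85], "grade": ["A", "B"]})

# axis=1 places the frames side by side, matching rows by index label.
wide = pd.concat([people, scores], axis=1)
print(wide)
```

Because alignment is by index label rather than by position, frames whose indexes differ should usually be given matching indexes first.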
If you want to combine a DataFrame and a Series, use the following syntax: pd.concat([dataframe1, series1], axis=1)
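As a sketch, the example below attaches a Series to a DataFrame as a new column. The Series name "age" and the values are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; names and values are illustrative only.
df = pd.DataFrame({"name": ["Ann", "Bob"]})
ages = pd.Series([31, 27], name="age")

# With axis=1 the Series becomes a new column, and its name
# is used as the column label.
result = pd.concat([df, ages], axis=1)
print(result)
```

Giving the Series a name up front avoids ending up with an unlabeled column.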
NOTE: If the DataFrames don't have the same columns when they are stacked row-wise (the default), any column that is missing from one DataFrame is filled with NaN for that DataFrame's records.
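The NaN behavior can be seen with a small sketch, assuming two made-up frames where the second one lacks the "score" column:

```python
import pandas as pd

# Hypothetical sample data; df2 has no "score" column.
df1 = pd.DataFrame({"name": ["Ann"], "score": [90]})
df2 = pd.DataFrame({"name": ["Bob"]})

# The result has both columns; Bob's missing score is filled with NaN.
combined = pd.concat([df1, df2], ignore_index=True)
print(combined)

# Count the missing values introduced per column.
print(combined.isna().sum())
```

Checking isna().sum() after a concatenation is a quick way to spot columns that did not line up.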
Exercise
Congratulations! You just completed the Combining DataFrames Tutorial! To help test your knowledge, let's practice Combining Multiple DataFrames.
**It's highly recommended that you complete the exercise outlined in the previous tutorial before beginning this exercise.**
Instructions:
Open your IDE (e.g., VS Code, PyCharm, Spyder).
Create a New Jupyter Notebook, or similar Notebook Environment. Name it "joining-dataframes.ipynb"