PANDAS TUTORIALS

Pandas Tutorials: Working with Text Files

[Enter image here]

What makes Pandas so popular is it's ability to convert various types of files into DataFrames. Whether you're working with an Excel Workbook, CSV/TSV File, or JSON, Pandas can convert it into a DataFrame. This tutorial will cover the following learning objectives:

  • How to Convert an Excel Workbook into a DataFrame
  • How to Convert a CSV File into a DataFrame
  • How to Convert a JSON File into a DataFrame
  • How to Export a DataFrame into Another File Format

How to Convert an Excel Workbook into a DataFrame




Summary

  • The read_excel method is used to convert an Excel Workbook into a DataFrame. This is used with the following syntax, showing the first sheet in the workbook as default:
    pd.read_excel('file_name.xlsx')
  • If you want to read a specific sheet within an Excel Workbook, use the following syntax:
    pd.read_excel('file_name.xlsx', sheet_name='sheet_name')
  • If you want to list out all the Sheets in your Workbook into a single DataFrame, use the syntax below. This will return a dictionary and you simply need to specify the key to create a DataFrame off a specific DataFrame in the dictionary.
    all_sheets = pd.read_excel('file_name.xlsx', sheet_name=None)
    all_sheets[key]
  • If you want to combine all the Sheets in your Workbook into a single DataFrame, use the following syntax:
    pd.concat(pd.read_excel('file_name.xlsx', sheet_name=None), ignore_index=True)
  • NOTE: Although you can use the Sheet's Index when specifying the sheet to read into the DataFrame, it's a best practice to always use the Sheet's Name since it's easier for others to understand your code.
  • NOTE: If the file you want to use is not in your working directory, you'll need to specify either the Absolute or Relative file path.

How to Convert a CSV File into a DataFrame




Summary

  • The read_csv method is used to convert a CSV file into a DataFrame. This method is used with the following syntax:
    pd.read_csv('file_name.csv')
  • NOTE: If you need to specify the Absolute or Relative file path, you can either put an "r" in front of the file path, or separate the levels of hierarchy by double back slashes "\\". Example:
    pd.read_csv(r"C:\Users\name\Documents\test.csv")
    OR
    pd.read_csv("C:\\Users\\name\\Documents\\test.csv")

How to Convert a JSON File into a DataFrame




Summary

  • JSON, JavaScript Object Notation, is a lightweight format for storing and transporting data across the web. It's formatted as a list of key-value pairs and arrays (nested dictionaries).
  • The read_json method is used to convert a JSON file into a DataFrame. This process is also known as Flattening. The style of JSON used below is known as "column" style. This method is used with the following syntax:

    Sample Format:
    data = {
       "Column1": {"0":"row1_value1", "1":"row2_value1"},
       "Column2": {"0":"row1_value2", "1":"row2_value2"},
       "Column3": {"0":"row1_value3", "1":"row2_value3"}
    }


    Syntax:
    pd.read_json('file_name.json')
    OR
    pd.read_json('file_name.json', orient='columns')
  • If the JSON file you're reading is in a "split" format, see example below, use the following syntax:

    Sample Format:
    data = {
       "index":["0", "1"],
       "columns":["Column1", "Column2", "Column3"],
       "data":["row1_value1", "row1_value2", "row1_value3"],
              ["row2_value1" "row2_value2", "row2_value3"]
      ]
    }

    Syntax:
    pd.read_json('file_name.json', orient='split')
  • If the JSON file you're reading is in a "Index" format, see example below, use the following syntax:

    Sample Format:
    data = {
       "0":{"Column1":"row1_value1", "Column2":"row1_value2", "Column3":"row1_value3"},
       "1":{"Column1":"row2_value1", "Column2":"row2_value2", "Column3":"row2_value3"},
       "2":{"Column1":"row3_value1", "Column2:"row3_value2", "Column3":"row3_value3" },
    }

    Syntax:
    pd.read_json('file_name.json', orient='index')
  • If the JSON file you're reading is in a "Records" format, see example below, use the following syntax:

    Sample Format:
    data = {
       {"Column1":"row1_value1", "Column2":"row1_value2", "Column3":"row1_value3"},
       {"Column1":"row2_value1", "Column2":"row2_value2", "Column3":"row2_value3"},
       {"Column1":"row3_value1", "Column2:"row3_value2", "Column3":"row3_value3"},
    }

    Syntax:
    pd.read_json('file_name.json', orient='records')
  • If the JSON file you're reading is in a "Values" format, see example below, use the following syntax:

    Sample Format:
    data = [
       ["row1_value1", "row1_value2", "row1_value3"],
       ["row2_value1", "row2_value2", "row2_value3"],
       ["row3_value1", "row3_value2", "row3_value3"],
    ]

    Syntax:
    pd.read_json('file_name.json', orient='values')
  • NOTE: When slicing values in Python, use the following notation: starting_value:ending_value. If you want to slice values in reverse, use the following notation: starting_value:ending_value:-1

How to Export a DataFrame into Another File Format


Export DataFrame to CSV File


Export DataFrame to JSON File


Summary:

  • The to_csv method is used to export a DataFrame to a CSV File. This is used with the following syntax:
    dataframe.to_csv('file_name.csv')
  • While working with DataFrames, Indices can be useful in ensuring you have no duplicates in your data and conducting various data quality checks. However, once you export your DataFrame, the Index doesn't provide much of a benefit. Thus, it's usually wise to not include the index. To exclude the Index when exporting a DataFrame to a CSV File, use the following syntax:
    dataframe.to_csv('file_name.csv', index=False)
  • When appending a DataFrame to an existing CSV file, you shouldn't include the column headers as those would already be included in the file. To exclude column headers when uploading a DataFrame to a CSV file, use the following syntax:
    dataframe.to_csv('file_name.csv', headers=False)
  • The to_json method is used to export a DataFrame to a JSON File. Just like in the read_json method, you can specify how you would like to orient the JSON data using the orient parameter. This is used with the following syntax:
    dataframe.to_json('file_name.json', orient='orientation')

Exercise

Congratulations! You just completed the Working with Text Files Tutorial! To help test your knowledge, let's practice Reading and Writing some Text Files into and from DataFrames.
**It's highly recommended that you complete the exercise outlined in the previous tutorial before beginning this exercise.**

Instructions:

  1. Open your IDE (e.g., VS Code, PyCharm, Spyder).
  2. Create a New Jupyter Notebook, or similar Notebook Environment. Name it "text-files.ipynb"
  3. In the Notebook, complete the following tasks:
    1. Download the Following Files:
    2. Read the Excel Workbook into a DataFrame object named "unclean_listings_excel"
    3. Read the CSV File into a DataFrame Object named "unclean_listings_csv"
    4. Read the JSON File into a DataFrame Object named "unclean_listings_json"
    5. Export the "unclean_listings_excel" variable to a CSV file named "converted_listings.csv". Exclude the index.
    6. Export the "unclean_listings_json" variable to a JSON file named "converted_listings.json". Orient the data to "Split" format.
  4. Exercise Completed! Click here to view the answers.
  5. Have any issues with the above exercise? Post your question on Discord!