Stats Tutorials: Types of Variables

Variables are at the heart of Linear Algebra and Statistics. A variable typically represents a characteristic of an observation, where an observation is a single record or row in a table (e.g., a customer in a database or a sale that took place). Variables come in several types, and this tutorial covers the following learning objectives:

  • Discrete vs. Continuous Variables
  • Ordinal vs. Nominal Variables
  • Interval vs. Ratio Variables

Discrete vs. Continuous Variables




Summary

  • Discrete Variables/Discrete Data are countable and can only take specific, separate values. This data is typically represented as integers. When the values are text, or strings, there is a finite number of categories (e.g., country, color, product category).
  • The primary characteristic of Discrete Variables is that no values can exist between the fixed steps (e.g., if you have a field showing the number of items in a customer's shopping cart, there can't be 5.75 items).
  • Continuous Variables/Continuous Data can take any value within a range, so infinitely many values can exist between any two points. For example, if you are measuring how much a person weighs, their weight can be anywhere on a scale from 0 to "x" number of pounds.
  • The Scale/Grain at which your data is recorded determines whether a Variable is treated as Discrete or Continuous. For example, when measuring a person's weight, you could round to the nearest whole pound, or record an exact measurement. It all depends on how you want the data to be measured.
  • NOTE: When identifying variables in a statistical experiment, it's critical to understand the scale associated with each variable to correctly categorize the variable as either Discrete or Continuous. This has a strong impact on the rest of your experiment, as will be shown in later tutorials.
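
The effect of scale on a weight variable can be sketched in a few lines of Python. The weights below are made-up values for illustration, and `looks_discrete` is a hypothetical helper (not part of any library) that only checks whether every value is a whole number. Treat it as a rough heuristic, not a formal test.

```python
# Hypothetical weights in pounds, recorded as precisely as the scale allows.
exact_weights = [150.37, 162.92, 148.64, 171.08]

# Rounding to the nearest whole pound changes the grain of the variable:
# only whole-number values remain possible.
rounded_weights = [round(w) for w in exact_weights]


def looks_discrete(values):
    """Return True if every value is a whole number (a rough heuristic)."""
    return all(float(v).is_integer() for v in values)


print(rounded_weights)                  # [150, 163, 149, 171]
print(looks_discrete(exact_weights))    # False
print(looks_discrete(rounded_weights))  # True
```

The same underlying quantity ends up in either category depending only on how it was recorded, which is why the scale needs to be settled before the variable is classified.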

Ordinal vs. Nominal Variables




Summary

  • Categorical Data places each observation into one of a finite set of categories. Like Discrete Data, it takes a limited set of distinct values, which can be stored either as integer codes or as text labels (e.g., colors, product categories).
  • Numerical Data is any data that can be measured or counted as a number. This data can be either Discrete OR Continuous, depending on the scale (e.g., Age, Height, Weight).
  • Nominal Data is a subset of Categorical Data that represents categories with no inherent rank or order. For example, the color of a product doesn't have a natural order. This type of data is commonly used to compare groups, for instance by counting the number of occurrences in each category or by comparing a numeric variable across categories.
  • Ordinal Data is a subset of Categorical Data that represents categories with an inherent rank or order. For example, a product review with 5 stars ranks higher than one with only 1 star. Because of this order, the categories can be mapped onto a numeric scale and then measured, although the distance between ranks is not necessarily equal.
  • NOTE: As mentioned both in the video and the previous section, it's important to understand what level of measurement each of your variables represents as it impacts what statistical tests can be conducted on your dataset.
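
As a minimal sketch of the difference, the Python below counts occurrences for a nominal variable and maps an ordinal variable onto a numeric scale. The data and the `rating_scale` mapping are assumptions chosen for illustration.

```python
from collections import Counter

# Nominal: colors have no natural order, so counting occurrences per
# category is meaningful, but ranking the categories is not.
colors = ["red", "blue", "red", "green", "blue", "red"]
color_counts = Counter(colors)

# Ordinal: review labels have an inherent order, so they can be mapped
# onto a numeric scale (this particular mapping is an assumption).
rating_scale = {"poor": 1, "fair": 2, "good": 3, "great": 4, "excellent": 5}
reviews = ["good", "excellent", "poor", "great", "good"]
review_scores = [rating_scale[r] for r in reviews]

# Sorting by rank is meaningful for ordinal data.
sorted_reviews = sorted(reviews, key=rating_scale.get)

print(color_counts.most_common(1))  # [('red', 3)]
print(review_scores)                # [3, 5, 1, 4, 3]
print(sorted_reviews)               # ['poor', 'good', 'good', 'great', 'excellent']
```

Sorting the colors alphabetically would also run without error, but the resulting order would carry no meaning, which is the practical distinction between the two types.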

Interval vs. Ratio Variables




Summary

  • Interval Data is a subset of Numerical Data in which values sit on a scale with equal, fixed distances between them. For example, on a temperature scale in degrees Fahrenheit, the difference between 70 and 80 degrees is the same size as the difference between 80 and 90 degrees.
  • The primary characteristic of Interval Data is that there is no true "zero point". For example, if you're measuring temperature in Fahrenheit or Celsius and get a value of 0, that doesn't mean there is no temperature; it simply means the temperature was 0 degrees.
  • Ratio Data is a subset of Numerical Data that represents quantitative data with a true zero point, where 0 means none of the quantity is present. For example, the age of a person has to be 0 or above, and a 40-year-old is twice as old as a 20-year-old. This type of data supports the widest range of comparisons, which makes it easy to compare against similar variables in your dataset.
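
One way to see why the zero point matters is to check whether a ratio survives a change of unit. The sketch below uses made-up temperature and age values. `celsius_to_fahrenheit` is a small helper defined here for illustration, using the standard conversion formula.

```python
def celsius_to_fahrenheit(c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32


# Interval: comparing 20 degrees C with 10 degrees C. The ratio changes
# with the unit because neither scale's zero means "no temperature".
ratio_celsius = 20 / 10
ratio_fahrenheit = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)

# Ratio: age has a true zero, so the ratio is the same in any unit.
ratio_years = 40 / 20
ratio_months = (40 * 12) / (20 * 12)

print(ratio_celsius, ratio_fahrenheit)  # 2.0 1.36
print(ratio_years, ratio_months)        # 2.0 2.0
```

Because the temperature ratio depends on the unit, saying "twice as hot" is not meaningful for Interval Data, while "twice as old" holds for Ratio Data regardless of whether age is recorded in years or months.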