SPOTIFY PLAYLIST SONG RECOMMENDATION APP

OVERVIEW

Data Science as a discipline draws on several fields, including Machine Learning, Data Engineering, Software Engineering, and Data Analytics. Employers therefore expect Data Scientists to understand the core concepts of each.

For our first collaborative project, we will build on Spotify's Million Playlist Dataset Challenge by creating a web app that lets users choose, create, or import a custom playlist and receive a list of suggested songs to add to it.

TEAMS

Because this is a collaborative project, we will split into the following teams:

Data Science & Software Engineering
Team Leader
  • Jacob Banuelos
Team Roles
  • Front-End Developer
  • Back-End Developer
  • Machine Learning Engineer
  • Unit Testing Specialist
Team Responsibilities
  • Build front-end UI
  • Build CI/CD pipeline
  • Deploy web app
  • Conduct unit testing
  • Build KNN or K-Means model to find similar songs (nearest neighbors with KNN, clusters with K-Means)
  • Train and test KNN or K-Means model
  • Validate KNN or K-Means model
Data Engineering & Data Analytics
Team Leader
  • Chase Pattee
Team Roles
  • Database Administrator (DBA)
  • ELT Developer
  • Workflow Optimization Specialist
  • Query Optimization Specialist
  • Tableau Developer
Team Responsibilities
  • Build batch ELT pipeline
  • Deploy workflow orchestration tool
  • Build and deploy data storage tools
  • Create local DB for temporary data storage
  • Create back-end data warehouse
  • Create training and testing datasets
  • Create reporting dashboard for project metrics
  • Deploy Change Data Capture (CDC) techniques for data warehouse
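
CDC can be implemented in several ways, and this plan does not fix one. As a minimal sketch of the idea, assuming a snapshot-comparison approach (an assumption, not a decision made in this plan), the function below compares two snapshots of a table keyed by primary key and reports inserts, updates, and deletes. The name `detect_changes` and the sample track rows are hypothetical.

```python
def detect_changes(previous, current):
    """Compare two snapshots (dicts keyed by primary key) and list the changes.

    Returns a list of (operation, key, row) tuples, where operation is
    "insert", "update", or "delete".
    """
    changes = []
    # Rows that are new or whose contents differ from the previous snapshot.
    for key, row in current.items():
        if key not in previous:
            changes.append(("insert", key, row))
        elif previous[key] != row:
            changes.append(("update", key, row))
    # Rows that existed before but are missing from the current snapshot.
    for key, row in previous.items():
        if key not in current:
            changes.append(("delete", key, row))
    return changes


# Hypothetical sample data: track rows keyed by a made-up track id.
PREVIOUS = {
    "t1": {"name": "Song A", "popularity": 50},
    "t2": {"name": "Song B", "popularity": 40},
    "t3": {"name": "Song C", "popularity": 30},
}
CURRENT = {
    "t1": {"name": "Song A", "popularity": 50},  # unchanged
    "t2": {"name": "Song B", "popularity": 45},  # updated
    "t4": {"name": "Song D", "popularity": 10},  # inserted
}

CHANGES = detect_changes(PREVIOUS, CURRENT)
```

Comparing full snapshots is simple but scales poorly with table size. A warehouse's built-in change-tracking features, if available, may be a better fit for the production pipeline.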

TASKS

Now that we have specified what roles are necessary to complete this project, we will outline the individual tasks that need to be accomplished. To keep things organized, we will refer to the Data Science & Software Engineering Team as Team A and the Data Engineering & Data Analytics Team as Team B.

Phase 1: Data Warehouse Configuration
  1. Create connection to Spotify API (Team A)
  2. Create ingestion pipeline (Team B)
  3. Create data model using dbt (Team B)
  4. Design and implement data warehouse for data storage (Team B)
  5. Deploy CDC features in data warehouse (Team B)
Phase 2: ELT Procedures
  1. Design and implement batch ELT pipeline (Team B)
  2. Create and deploy Prefect workflows (Team B)
  3. Create indexes and views (Team B)
Phase 3: Machine Learning Model Development
  1. Build KNN or K-Means model (Team A)
  2. Create training dataset (Team B)
  3. Train KNN or K-Means model (Team A)
  4. Identify KPIs used in reporting dashboard (Team B)
  5. Create testing dataset (Team B)
  6. Test and validate KNN or K-Means model (Team A)
  7. Build front-end UI (Team A)
  8. Build CI/CD pipeline (Team A)
Phase 4: Deploy and Test Web App
  1. Conduct unit testing (Team A)
  2. Deploy web app (Team A)
  3. Create reporting dashboard (Team B)
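
To make the Phase 3 modeling task more concrete, here is a minimal sketch of a nearest-neighbor recommendation in the spirit of KNN. It assumes each song is described by a small numeric feature vector, averages the vectors of the songs already in the playlist, and suggests the k songs outside the playlist that are closest to that average by cosine similarity. The function names, the three-value feature vectors, and the track ids are all hypothetical. The actual model, features, and library (the tools list names PyTorch as subject to change) are still to be decided.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def playlist_centroid(playlist_ids, features):
    """Average the feature vectors of the songs in the playlist."""
    vectors = [features[track_id] for track_id in playlist_ids]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def recommend(playlist_ids, features, k=3):
    """Return the ids of the k songs most similar to the playlist centroid.

    Songs already in the playlist are excluded from the suggestions.
    """
    centroid = playlist_centroid(playlist_ids, features)
    in_playlist = set(playlist_ids)
    scored = [
        (cosine_similarity(centroid, vector), track_id)
        for track_id, vector in features.items()
        if track_id not in in_playlist
    ]
    # Highest similarity first; ties broken by track id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [track_id for _, track_id in scored[:k]]


# Hypothetical feature vectors (made-up values, not real Spotify data).
FEATURES = {
    "track_a": [0.90, 0.80, 0.10],
    "track_b": [0.80, 0.90, 0.20],
    "track_c": [0.85, 0.85, 0.15],
    "track_d": [0.10, 0.20, 0.90],
    "track_e": [0.20, 0.10, 0.80],
}

SUGGESTIONS = recommend(["track_a", "track_b"], FEATURES, k=1)
```

With this sample data, a playlist containing `track_a` and `track_b` receives `track_c` as its top suggestion, since its features sit at the playlist's average. A brute-force scan like this is fine for a prototype. A dataset the size of the Million Playlist Dataset would likely need an indexed or approximate nearest-neighbor search instead.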

TOOLS

To accomplish the tasks outlined above, we will be using the following tools (in order of appearance):

  • Spotipy (Python library)
  • Prefect
  • Snowflake*
  • dbt
  • Pandas (Python library)
  • PyTorch (Python library)*
  • Tableau
  • Streamlit (Python UI framework)
  • GitHub Actions
  • pytest (Python testing framework)*

* subject to change