<!-- TITLE: Spotify data -->
# How to get it?
+ There were some public challenges that released huge gobs of data, e.g. the [sequential skip prediction challenge](https://www.aicrowd.com/challenges/spotify-sequential-skip-prediction-challenge-old).
+ The challenge data includes some interesting [song features](https://www.semanticscholar.org/paper/Deep-content-based-music-recommendation-Oord-Dieleman/eeff60867041d2ea92d1b38a20c2031d240d8872) that encode some "deep" content of what the song is, useful for neural net training.
+ [Here](https://towardsdatascience.com/predicting-spotify-track-skips-49cf4a48b2a5) is an example analysis
+ All the analyses submitted for this challenge are (by requirement) open source
+ [Top 50 songs of 2019](https://www.kaggle.com/leonardopena/top50spotify2019)
+ [Hit predictor dataset](https://www.kaggle.com/theoverman/the-spotify-hit-predictor-dataset) has 40k songs labeled "hit" or "flop"
+ [All the songs](https://www.kaggle.com/yamaerenay/spotify-dataset-19212020-160k-tracks) has 170k+ songs
# What to do with it?
+ First, we need to isolate a *behavior* (so we're not just analysing the acoustic shapes of albums, etc.)
+ Examples: skipping a track, or starting a session on a specific song
+ Second, there should be some interesting data point tied to that behavior
+ Maybe this just has to do with *who* listens to *what*? Cultural grouping etc.
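
A minimal sketch of what "isolating a behavior" could look like in practice, assuming session logs where each row is one track play. The field names (`session_id`, `session_position`, `track_id`, `skipped`) and the sample rows are made up for illustration and are not taken from any of the datasets above; real column names would need to be checked against the actual files.

```python
from collections import Counter

# Hypothetical session-log rows (assumed schema, invented values).
rows = [
    {"session_id": "a", "session_position": 1, "track_id": "t1", "skipped": False},
    {"session_id": "a", "session_position": 2, "track_id": "t2", "skipped": True},
    {"session_id": "a", "session_position": 3, "track_id": "t3", "skipped": False},
    {"session_id": "b", "session_position": 1, "track_id": "t2", "skipped": True},
    {"session_id": "b", "session_position": 2, "track_id": "t1", "skipped": False},
]


def skip_rate_by_track(rows):
    """Behavior 1: fraction of plays of each track that ended in a skip."""
    plays = Counter()
    skips = Counter()
    for r in rows:
        plays[r["track_id"]] += 1
        if r["skipped"]:
            skips[r["track_id"]] += 1
    return {track: skips[track] / plays[track] for track in plays}


def session_openers(rows):
    """Behavior 2: how often each track is the first one played in a session."""
    return Counter(r["track_id"] for r in rows if r["session_position"] == 1)


skip_rates = skip_rate_by_track(rows)
openers = session_openers(rows)
print(skip_rates)
print(openers)
```

Either output (skip rate per track, or opener count per track) is a per-song behavioral measure that could then be compared against the song features or against listener groupings.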