Spotify Playlist Lookup: a Retrospective
Spotify Playlist Lookup was a music discovery project of mine. It has now been defunct for over 2 years. It's time to reflect now, before I forget everything.
Some of the tech I used: Flask, Celery, RabbitMQ, PostgreSQL, SQLAlchemy, Docker, Figma, Vue.js (v3) with TypeScript, the Spotify developer API, Nginx, a Hetzner VPS, Matomo analytics.
The What and the Why #
The idea was to enable people to find Spotify playlists by tracks that they contain. The rationale went something like: "I love this specific genre-defying song, it's pretty hard to find anything similar. Maybe somebody else has created a playlist containing songs that have similar qualities to the one I like."
Surprisingly, Spotify itself did not (and to my knowledge still does not) provide a native way to run this kind of query. There is even an official Spotify forum post from 2012 that is getting replies to this day!
Over a month and a half or so, during evenings and weekends, I put together a (IMO) pretty good attempt at solving the problem. It was a website (with a RESTful API if you wanted to integrate it somewhere else) where you could search for a track by name, choose the one you wanted and see which public playlists had that track.
The How #
The project ended up quite involved. I won't try to recall everything, but here are some of the parts I find interesting.
Data gathering #
Turns out Spotify doesn't just offer a big list of playlist IDs in any sort of centrally accessible way, so the playlists were gathered in a semi-manual process. Once you have that starting point though, all the details and tracks can be queried for through the developer API. The playlist ID process went something like this:
- Go to open.spotify.com
- Search for playlists with a specific prompt*
- Scroll for a bit so that a good amount of them get loaded: 500 seemed to be the sweet spot to get the most out of the search phrase
- Open the browser's developer tools and copy the page HTML to the clipboard
- Run a Python script:
# get_playlists.py
import pickle

import pyperclip

# Get the copied HTML from the clipboard
data = pyperclip.paste()
# Parse out playlist IDs.
#
# Each part of this split (except the first) is a string of characters
# that starts with the playlist ID.
splits = data.split('href="/playlist/')
# The ID is always 22 symbols, so we just take those and
# discard everything else afterward.
# This approach is a bit weird
# but works perfectly well for how simple it is.
playlist_ids = [s[:22] for s in splits][1:]
d = dict.fromkeys(playlist_ids)
# Get and update stored master list of deduplicated IDs
# (the out/db pickle file has to exist already)
with open("out/db", "rb") as db:
    all_playlists = pickle.load(db)
merged_playlists = all_playlists | d
with open("out/db", "wb") as db:
    pickle.dump(merged_playlists, db)
# Keep only the IDs that were not in the master list yet
playlist_ids = list(set(d) - set(all_playlists))
# Format for web import
res_formatted = ",".join(playlist_ids)
# Send new concatenated IDs to clipboard
pyperclip.copy(res_formatted)
- Go to this project's site (used to be at playlists.dags.dev), open the "Import" page and just paste the output from the script. (The import could also be done through the API.)
It was built this way, rather than just doing everything straight through the backend, in the hopes that other people would contribute playlist IDs.
Once I sat down and started, this process was only really limited by how fast I could come up with new prompts. The search, scroll and run-script part didn't take very long, so I did not end up automating it, though nothing really stopped me from doing so.
Within a few evenings of doing this, the system had something like 50k playlists and 3M tracks.
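If the ID extraction were ever automated, it could be pulled into small functions that are easy to test. This is only a sketch, not what the script above does: it swaps the split-and-slice trick for a regular expression, and it assumes that links look like href="/playlist/<ID>" and that an ID is exactly 22 characters from 0-9, A-Z and a-z. The function names are made up for illustration.

```python
import re

# Assumption: playlist links in the copied HTML look like href="/playlist/<ID>"
# and an ID is 22 characters from [0-9A-Za-z].
PLAYLIST_LINK = re.compile(r'href="/playlist/([0-9A-Za-z]{22})')


def extract_playlist_ids(html: str) -> list[str]:
    """Return the playlist IDs found in `html`, deduplicated, in first-seen order."""
    return list(dict.fromkeys(PLAYLIST_LINK.findall(html)))


def new_ids(found: list[str], known: set[str]) -> list[str]:
    """Return the IDs from `found` that are not already in `known`."""
    return [playlist_id for playlist_id in found if playlist_id not in known]
```

Unlike the slice, the pattern skips anything after "/playlist/" that is too short or contains other characters, so a malformed link does not end up in the import list.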
* Coming up with queries to search for was pretty fun in and of itself. Besides the obvious ones like sub-genre names, open phrases like "I want" or "Tonight we" turned out to be really effective and produced fairly vibrant lists of playlists. Also, random words and phrases from the zeitgeist of different periods like "crunk", "chillax", "salty", etc. got some pretty fun and creative results too.
The import process #
After hitting "Go" on the Import page, the details of the playlist and its tracks were gathered from the Spotify developer API for each ID imported. This happened in Celery, with RabbitMQ as the broker.
Looking back, this was overkill for what the site ended up being. I chose to use Celery & RabbitMQ to try out some new tech and it did end up working, but since I was the only one doing the importing, there really was not much point other than learning (which I do not regret in the slightest). Anyway, doing it with a Python script manually triggered from the server would have been just as functional and way easier to maintain.
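For a rough idea of what the per-playlist fetch involves without any queue, here is a minimal sketch. It is not the original Celery task. It assumes the Spotify Web API's playlist tracks endpoint returns pages with an "items" list and a "next" URL, and fetch_json is a hypothetical helper that does an authenticated GET and returns the decoded JSON.

```python
from typing import Callable, Iterator

# Assumed endpoint of the Spotify Web API for listing a playlist's tracks.
TRACKS_URL = "https://api.spotify.com/v1/playlists/{playlist_id}/tracks"


def iter_playlist_tracks(
    playlist_id: str,
    fetch_json: Callable[[str], dict],
) -> Iterator[tuple[str, str]]:
    """Yield (track_id, track_name) for every track in a playlist.

    `fetch_json` is a hypothetical helper that performs an authenticated
    GET request and returns the decoded JSON body.
    """
    url = TRACKS_URL.format(playlist_id=playlist_id)
    while url:
        page = fetch_json(url)
        for item in page.get("items", []):
            track = item.get("track")
            # Entries can lack a track or an ID (e.g. local files); skip them.
            if track and track.get("id"):
                yield track["id"], track["name"]
        # Each page links to the next one; the last page has no link.
        url = page.get("next")
```

Passing the HTTP call in as a function keeps the paging logic testable with canned pages, and the same generator could be called from a plain script or from a task queue.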
The data store #
All of this imported data got stored in a PostgreSQL database in a separate Docker container with the schema being managed with migrations from SQLAlchemy.
The important bits got stored in three tables: playlists, tracks and playlist_tracks. I feel like those are pretty self-explanatory.
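To make the shape concrete, here is a small sketch of the three tables and the core "which playlists contain this track" lookup. It uses an in-memory SQLite database from the standard library as a stand-in for PostgreSQL, and the column names are hypothetical, since the real schema had more details than this.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE playlists (id TEXT PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE tracks (id TEXT PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE playlist_tracks (
        playlist_id TEXT NOT NULL REFERENCES playlists (id),
        track_id TEXT NOT NULL REFERENCES tracks (id),
        PRIMARY KEY (playlist_id, track_id)
    );
    -- The lookup filters by track, so index that side of the join table.
    CREATE INDEX playlist_tracks_by_track ON playlist_tracks (track_id);
    """
)
conn.executemany(
    "INSERT INTO playlists VALUES (?, ?)",
    [("p1", "Road trip"), ("p2", "Late night")],
)
conn.executemany(
    "INSERT INTO tracks VALUES (?, ?)",
    [("t1", "Song A"), ("t2", "Song B")],
)
conn.executemany(
    "INSERT INTO playlist_tracks VALUES (?, ?)",
    [("p1", "t1"), ("p2", "t1"), ("p2", "t2")],
)


def playlists_containing(conn: sqlite3.Connection, track_id: str) -> list[str]:
    """Return the names of all playlists that contain the given track."""
    rows = conn.execute(
        """
        SELECT p.name
        FROM playlists AS p
        JOIN playlist_tracks AS pt ON pt.playlist_id = p.id
        WHERE pt.track_id = ?
        ORDER BY p.name
        """,
        (track_id,),
    )
    return [name for (name,) in rows]
```

The join table is where nearly all of the rows live, one per playlist-track pair.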
The site #
This was my attempt at learning a front-end framework. I chose v3 of Vue.js with TypeScript.
I liked how Vue was organized, but the docs and information online were (and, last I checked, not much had changed) maddeningly split between v2 and v3, making it pretty confusing. This was also the first time I had anything to do with TypeScript and I'm pretty sure I did near-everything wrong with it despite getting to a working state. Suffice it to say, I should have done more deep dives into the docs and best practices, followed by refactoring.
I would not want to try to pick up so many new frameworks and languages with a single solo project ever again. Everything surprisingly ended up working well, but I did not gain as much understanding from the exercise as I would have at a slower pace.
(I don't have any screenshots of the site on hand, but I will update this post if I come by any.)
Some bragging and what happened to the project #
After I was fairly confident about the functionality and design I put up a "Show HN" post on Hacker News (link up top if you're interested). Within a few hours it shot up to the number one spot and my site got hugged to death (no rate limiter yet and there was a stats counter that queried the db every time for track and playlist count - dumb mistake, but was also very easy to fix - stats just needed to be cached and would refresh every hour or so).
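The stats fix boils down to a cache with a time-to-live. A minimal sketch, assuming a hypothetical load function that runs the expensive count queries and returns a dict (the class and parameter names are made up for illustration, and this is not the code the site ran):

```python
import time
from typing import Callable, Optional


class CachedStats:
    """Cache the result of an expensive stats query for `ttl_seconds`."""

    def __init__(
        self,
        load: Callable[[], dict],
        ttl_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load = load
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[dict] = None
        self._expires_at = 0.0

    def get(self) -> dict:
        now = self._clock()
        # Hit the database only when nothing is cached or the entry is stale.
        if self._value is None or now >= self._expires_at:
            self._value = self._load()
            self._expires_at = now + self._ttl
        return self._value
```

With a one-hour TTL, a front-page spike costs one pair of count queries per hour instead of one per visitor. The sketch is not thread-safe, so under concurrent workers a few extra loads can slip through around expiry, which is harmless for a counter.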
It was my first real encounter with having built something and people around the world being interested, saying nice things and using it. I can't really put into words how great that felt. Even better, two, let me repeat, TWO real live people even "bought me a coffee" for my troubles. Maybe seems like nothing special, and maybe it is, but for me it was amazing, made even better by the fact that the site didn't have any real advertisement for the option to donate - just a small logo in a corner. I think the memory of those feelings will motivate me for a long time to try to generate something useful for the public good.
Despite the great launch (~10k visits on the first day if I recall correctly), daily views quickly withered away and hovered around 100/day for some more weeks with a few spikes from exposure from mailing lists and blog posts, but ultimately after a few months ended up at about 8-15 views per day.
After a few months of stagnation and still no motivation** in sight, I decided to call it quits - the VPS costs had accrued to the point where I could not just ignore them anymore and keep waiting. I think the positive reception kind of "burned me out" in a positive way. I had already got all the validation of public engagement and GitHub stars I could ever want, and the way forward was blocked by fundamental issues with the approach.
I just couldn't scale it out with my resources. It had some tens of thousands of playlists and some millions of tracks, but that is a minuscule fraction of everything available through Spotify. Even this fraction though, while storing the minimum possible details, had quickly eaten up the storage of my server. So even if I could, in the long term, ingest a lot more data, it would not be possible to financially sustain everything myself.
I absolutely did not want to put up any ads. And as for some subscription model - I felt that the service was not useful and couldn't really be without adding a lot more features.
So I just stopped everything and took it down.
Honestly, I don't really regret much. It was a good learning opportunity. It got me to actually finish a public-facing project (as much as any software is ever finished). It still feels important enough for me that I'm writing about it now.
I probably could have organized everything to have way fewer moving parts. No containers, no queue and message broker at the very least. Probably no fancy front-end either, vanilla JS would have worked just as well. Lastly, while I was impressed by Matomo analytics for what it was, looking back, I just did not need that feature-set. Simple nginx log parsing would have been enough.
** This project was actually my way of trying to impress a potential employer. During their two-month-long recruitment process, despite having gone through countless calls, tests and interviews, I did not even get a chance to show it to anybody... talk about a hit to motivation. But that's a story for another day.
Some random bits #
Here are a few more thoughts and findings that I wanted to share, but did not find an organic place for in the main text.
Spotify developer API #
I planned on adding a "login with Spotify" feature, through which the user's playlists would get added to the system. It would have been an elegant way to grow without much manual labor. Unfortunately this was not possible - Spotify simply did not allow access to user data through their API, even with user consent, without going through a seemingly pretty drawn-out and bureaucratic application process meant for large, commercial apps.
Flask factory pattern #
This approach of instantiating the app and passing around the main app object to register all the different things that needed it (Celery, SQLAlchemy, the CORS handler, rate limiter, database, route blueprint, login manager etc.) was the single biggest time sink. I think I reorganized the import and registration order, and how it was spread over source files, dozens of times until hitting on a working solution. It really should not have taken so long.
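For reference, the shape that tends to untangle the ordering problem is to create extensions unbound at module level and bind them inside one factory function. The sketch below is framework-free on purpose: App and Extension are hypothetical stand-ins, not Flask classes, and only the init_app convention mirrors what Flask extensions commonly expose.

```python
class App:
    """Hypothetical stand-in for the Flask application object."""

    def __init__(self, config: dict) -> None:
        self.config = config
        self.extensions: dict[str, object] = {}


class Extension:
    """Hypothetical stand-in for an extension (database, rate limiter, ...)."""

    def __init__(self, name: str) -> None:
        self.name = name

    def init_app(self, app: App) -> None:
        # Binding happens here, not at import time.
        app.extensions[self.name] = self


# Created unbound at module level, so any module can import these
# without needing the app object (which avoids circular imports).
db = Extension("db")
limiter = Extension("limiter")


def create_app(config: dict) -> App:
    """Build a fresh app and register every extension in one place."""
    app = App(config)
    for extension in (db, limiter):
        extension.init_app(app)
    return app
```

Because nothing touches the app at import time, modules can import db or limiter in any order, and the registration order lives in exactly one function.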