Update: The full analysis and everything you need are on my GitHub: https://github.com/mexindian/Musical-chord-progressions

The Hooktheory.com database contains analyses of over 5000 songs*. Users upload these analyses, so the songs can be studied in bulk as well as individually. One of these ‘all song’ queries returns chord progressions across ALL songs (see the analysis file to see how I did it, using the Hooktheory API and R). This allowed me to create a Sankey visualization of all chord progressions in the Hooktheory database.

Check it out!


(If you prefer the dynamic version where you can play with the data, have a look at the following link: Click here!).

Explaining the figure a little bit: what interests us here is the type of chords used, regardless of the song’s key. So 1->5->6 in the figure above includes songs in the key of C major that have the chord progression C->G->Am as well as songs in the key of A major that have A->E->F#m. Songs are grouped together when they have the same Roman numerals, counted from the relative major. (In reality, the API blends songs into rough categories regardless of the song’s mode, so it’s impossible to know for sure what we’re dealing with.)
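To make the Roman-numeral idea concrete, here is a minimal Python sketch (not taken from my R analysis) that spells out a numeric progression in a given major key. The names `degree_to_chord` and `realize` are made up for illustration, and the sketch assumes plain diatonic triads and sharps-only note names, so flat keys will come out with enharmonic spellings.

```python
# Semitone offsets of the seven major-scale degrees from the tonic.
MAJOR_SCALE_STEPS = [0, 2, 4, 5, 7, 9, 11]
# Triad quality on each degree of a major key:
# "" = major, "m" = minor, "dim" = diminished.
TRIAD_QUALITY = ["", "m", "m", "", "", "m", "dim"]
# Simplifying assumption: sharps only, no flat spellings.
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def degree_to_chord(key, degree):
    """Return the diatonic triad on a scale degree (1-7) of a major key."""
    tonic = NOTE_NAMES.index(key)
    root = NOTE_NAMES[(tonic + MAJOR_SCALE_STEPS[degree - 1]) % 12]
    return root + TRIAD_QUALITY[degree - 1]


def realize(key, degrees):
    """Spell out a progression such as [1, 5, 6] in the given major key."""
    return [degree_to_chord(key, d) for d in degrees]


print(realize("C", [1, 5, 6]))  # ['C', 'G', 'Am']
print(realize("A", [1, 5, 6]))  # ['A', 'E', 'F#m']
```

Both calls start from the same numerals, which is exactly why the two songs land on the same path in the figure.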

The chord progressions start from the left and continue to the right. So for example, the progression 4->1->5->6 is one of the most popular ones… and is in fact present in 327 songs! Check ’em out!


In the API, chord probabilities are stated as a percent, so the relative importance of each chord is known at each step (the normalization technique they use is not known). The API offered 29 chords at the start of all progressions. For every subsequent transition the number of chord options increases (which is expected), but for graphical purposes I keep only the original 29 chords at every transition (I expect these 29 to be the most common anyway, so it’s not that big of a deal).

Also, the thickness of each line I’m plotting is itself a probability, and the probability that you are on the chord it starts from differs from chord to chord, so the “total thickness of each transition” isn’t the same. Very lazily, I just normalized all probabilities across each transition so that each transition “mega bar” is kind-of the same height. I’m sure there’s a better way to do it; the community is invited to improve!
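For anyone who wants to play with the normalization, here is a rough Python sketch of the “lazy” per-transition rescaling described above. It is not my R code: the function name `normalize_per_transition`, the tuple layout, and the example percentages are all invented for illustration.

```python
def normalize_per_transition(links):
    """Rescale link weights so every transition step sums to 1.

    links: list of (step, source, target, weight) tuples, where step is
    the index of the transition (0 = first chord to second chord, ...).
    """
    # Total weight of all links that belong to the same transition step.
    totals = {}
    for step, _source, _target, weight in links:
        totals[step] = totals.get(step, 0.0) + weight

    # Divide each link by its step total; steps with no weight are dropped.
    return [
        (step, source, target, weight / totals[step])
        for step, source, target, weight in links
        if totals[step] > 0
    ]


# Made-up percentages, only to show the shape of the data.
example = [
    (0, "1", "5", 20.0),
    (0, "1", "4", 10.0),
    (0, "4", "1", 10.0),
    (1, "5", "6", 5.0),
    (1, "4", "1", 15.0),
]

for link in normalize_per_transition(example):
    print(link)
```

After rescaling, the links of each step add up to 1, so every “mega bar” gets the same height. A different approach worth trying is to multiply each link by the probability of reaching its source chord, but I have not tested that.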

My analysis is here, collaboration and/or remixing with attribution is welcome! (and if you improve the normalization method, please let me know and I’ll update this post).


Possible Legend (thanks to HertzDevil):

The numbers are as they are represented in the Trends search string, here in EBNF metasyntax:

(* Roman numerals *)
numeral = "1" | "2" | "3" | "4" | "5" | "6" | "7";

(* Borrowed modes, from Dorian to Locrian *)
mode = "D" | "Y" | "L" | "M" | "b" | "C";

(* Figured bass for triadic and seventh chords *)
inversion = "6" | "64" | "7" | "65" | "43" | "42";

(* Functions available for applied chords *)
function = "4" | "5" | "7";

(* Basic chords or borrowed chords in the relative Major key *)
simple-chord = [mode], numeral, [inversion];

(* Applied chords *)
applied-chord = function, [inversion], "/", numeral;

(* Chord progressions for both the Trends page and the API *)
chord = simple-chord | applied-chord;

trends-progression = chord, {".", chord};

api-progression = chord, {",", chord};
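The grammar above is small enough to check with a regular expression. The following Python sketch is one possible translation, not something Hooktheory provides; the constant names and the helper `is_progression` are hypothetical.

```python
import re

# Direct translations of the EBNF rules in the legend.
NUMERAL = r"[1-7]"
MODE = r"[DYLMbC]"
INVERSION = r"(?:64|65|43|42|6|7)"
FUNCTION = r"[457]"

SIMPLE_CHORD = rf"{MODE}?{NUMERAL}{INVERSION}?"
APPLIED_CHORD = rf"{FUNCTION}{INVERSION}?/{NUMERAL}"
CHORD = rf"(?:{APPLIED_CHORD}|{SIMPLE_CHORD})"


def is_progression(text, separator):
    """Check text against the grammar: "." for Trends, "," for the API."""
    pattern = rf"{CHORD}(?:{re.escape(separator)}{CHORD})*"
    return re.fullmatch(pattern, text) is not None


print(is_progression("4,1,5,6", ","))  # True
print(is_progression("4.1.5.6", "."))  # True
print(is_progression("57/5,1", ","))   # True
print(is_progression("8,1", ","))      # False
```

Here `57/5` reads as function 5 with inversion 7 applied to numeral 5, and `8,1` fails because 8 is not a valid numeral.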

Parting thoughts:


(thanks to Laure Belotti for editorial prowess)


EDIT: I’ve been getting great feedback on this post. Please check out the great conversations here and here. Giving credit where it’s due, turns out Axis of Evil wasn’t the first to talk about chord-progression overuse, check out this dude. More credit where it’s due, turns out I wasn’t the first one to come up with this idea (great minds indeed…). And finally, I’m sure you nerds all checked out Hooktheory, but take a look at these resources also!


*EDIT2: Originally I was under the impression that the Hooktheory database contained over 25000 songs… but a Hooktheory admin clarified that in fact there are just over 5000.