Harvard: A Pseudo-Cinderella Story About Algorithms

In case you don’t follow sports or my tweets (which I know you don’t), this past weekend and the next two comprise what is known across American post-secondary institutions as March Madness. It’s the ultimate college basketball tournament, held every year; if a team is good enough over the course of its season, it receives a seed from 1 (the best) to 16 (the least best) in its region of the bracket.  People fill out brackets, stating who they believe will beat whom and who will ultimately win the tournament.  They base their decisions on a wide variety of criteria: some have statistical evidence to back up their choices and some just pick their favourite mascots.

There is almost always a game playing on some broadcasting network, namely TSN, CBS, TBS, and TNT.  The bracket is split into 4 regions of 16 teams each, and teams square off in single-elimination games, cutting the number of participating teams in half after every round.  1 faces 16, 2 faces 15, 3 faces 14, and so on.  Strictly speaking, just making the tournament at all means a team is “going to the big dance”; reaching the second weekend, the Sweet 16 round, requires winning 2 games the first weekend.  If a low seed (typically 13-16) manages to win its 2 games from the first weekend and make it to the Sweet 16, it is called a Cinderella team (because she finally gets to go to the ball when nobody thought she ever would).

This may sound terribly boring to you, and you may be wondering, “what does this have to do with English 503?”  Well, I will tell you: we have been talking about algorithms and their efficacy and accuracy when applied to subjective fields.

TSN had a clip between games in which they interviewed a professor of statistics from the University of Toronto who had come up with his own algorithm for figuring out how well teams would do.  Please don’t ask me to explain the algorithm, because I don’t have a clue what it means; I just know that after he plugged in all his information, each team received a numerical value.  His theory was that in any given matchup, the team with the higher number would win.  It is significant to note that not all the teams he picked for his Final Four were top-ranked teams.  It is a very similar idea to the “Moneyball” concept seen in the movie with Brad Pitt and Jonah Hill.
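The professor’s actual model was never explained in the clip, but the core idea (“the team with the higher number wins”) can be sketched in a few lines.  The ratings below are made-up illustrative values, not real figures from his algorithm or from the 2013 season:

```python
# A minimal sketch of the "higher number wins" prediction rule.
# The ratings are hypothetical placeholders -- the professor's real
# model and its numbers are unknown, so these are for illustration only.

def predict_winner(team_a, team_b, ratings):
    """Pick whichever team has the higher rating; ties go to team_a."""
    return team_a if ratings[team_a] >= ratings[team_b] else team_b

# Hypothetical strength ratings (higher = stronger on paper).
ratings = {
    "New Mexico": 91.4,  # a 3 seed the model liked
    "Harvard": 74.2,     # a 14 seed nobody picked
}

print(predict_winner("New Mexico", "Harvard", ratings))  # prints "New Mexico"
```

On paper the rule confidently picks New Mexico; as it turned out, reality had other plans.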

This sounds like a great idea, a surefire way to get a perfect bracket, except for one thing: Harvard.  Yes, Harvard.

Harvard was seeded 14 in its region and matched up against New Mexico, the #3 seed.  Harvard has never been known to do well in March Madness; they rarely even make the tournament.  What do you really expect?  Most of these kids did not grow up playing the game in the streets with their friends.  Leading up to this tournament, the team had never won a single tournament game.  Seems like an easy pick for New Mexico, right?

Wrong.

Harvard won, busting everyone’s brackets on the very first day of the tournament.  They beat New Mexico 68-62 in an unbelievable turn of events, which just goes to show that this tournament truly is “madness” in many respects.  Any team can win on any given day — witness Arizona completely decimating Harvard 74-51 two days after their win, ending Harvard’s run short of the Sweet 16.  But that is beside the point.  Nobody cares how far Harvard goes; we were all just shocked to see them there at all.

The most significant thing to note about this unfortunate turn of events for New Mexico, besides the drought it ended for Harvard, is that the aforementioned professor at the U of T had New Mexico in his Final Four.  On paper, this should have been a blowout.  Absolutely nobody had Harvard in their brackets; decades of history would support that.  The algorithm failed.

This algorithm did not take into account human subjectivity.  Nobody, not even a machine, could foresee Harvard winning a game; Harvard didn’t even think it would happen.  It was not that the data input was incorrect, but that human subjectivity cannot be predicted or replicated with 100% accuracy.

Now, for how this relates to English 503.  We came to the conclusion that there could in fact be a machine created that is the sum of all our shared experiences and influences and that factors every single thing into its text analysis.  Where I see the problem arising is in using machines to produce the literature themselves.  I feel that this would be less effective based solely on the fact that it would not be a subjective human creating the work.  There will always be a gap between algorithmic capabilities and quantitative data on one side, and human production and qualitative experiences on the other.

It may be that this is just my personal opinion or preference, because I don’t know enough about the field of digital humanities to form a more informed view, but I still stand by my point.  Machines don’t take into account the power of the human will; it is not exactly quantifiable, nor can we ourselves actually pinpoint how effective any one person’s subjectivity will be in any given situation.  We can make educated guesses or predictions, but we can never really know for sure.  Our predictions will be correct, until they are not.  They will be correct until Harvard wants to try to fit into glass slippers.

This entry was posted in Response.
