Why Facebook’s news algorithm is doomed to failure


Another day, another update to Facebook’s news algorithm. Facebook claims to use around 100,000 factors to identify the most “relevant” content for each user. Yet the constant tweaking and tuning suggests the algorithm will never work as desired.

The latest change to Facebook’s news algorithm records how long a user spends reading a particular post in the feed, apparently to account for content that gets a lot of screen time but no explicit engagement. As any model builder knows, adding new predictors and tweaking parameters can improve in-sample performance at the expense of out-of-sample performance. We also know that with enough parameters, training data can be fit perfectly, again at the expense of out-of-sample performance. If Facebook’s content ranking model still underperforms with a hundred thousand variables, either Facebookians don’t understand modeling (and are overfitting) or the problem is generally intractable.
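The overfitting point is easy to demonstrate with a small sketch. In the toy example below (synthetic data and polynomial degrees chosen purely for illustration), the high-degree model always fits the training points better than the simple one, yet its error on held-out points is worse than its own training error suggests:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data: the true relationship is a simple line plus noise.
x_train = np.linspace(0, 1, 12)
x_test = np.linspace(0.04, 0.96, 12)  # held-out points between the training points
y_train = 2 * x_train + rng.normal(0, 0.3, x_train.size)
y_test = 2 * x_test + rng.normal(0, 0.3, x_test.size)

def fit_and_score(degree):
    """Fit a polynomial of the given degree; return (train MSE, test MSE)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_mse, test_mse

# A simple model versus an overparameterized one.
results = {d: fit_and_score(d) for d in (1, 9)}
for d, (tr, te) in results.items():
    print(f"degree {d}: train MSE {tr:.4f}, test MSE {te:.4f}")
```

The degree-9 fit chases the noise between training points, which is exactly the failure mode of piling on predictors.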

Does that mean the enterprise of algorithmic content curation is inherently doomed? The recent revelation that Apple is hiring human editors for its news app suggests that perhaps the problem is intractable. An alternative explanation is a failure of model design. Fundamentally, there are two structural problems in algorithmic content curation: a primitive model of content and serialized timelines. These issues are not exclusive to Facebook; they are systemic to social media in general.

Content Models

Part of the charm of social media feeds is that they contain all sorts of content. In addition to news, there’s gossip from your friends, content marketing, special interest stories, and so on. There’s a certain serendipity inherent in timelines that people seem to enjoy. A recent survey showed that most people on Facebook “bump” into news rather than seek it out, which reinforces this notion of serendipitous reading. But that’s not the only way to read the news: a majority of Twitter users actively seek it.

A serendipitous timeline might be beneficial when browsing and killing time, but when looking for specific content it becomes an obstacle. Finding multiple pieces of content on the same subject becomes a futile exercise in scrolling. Sometimes we might only want to read gossip, while other times we may be looking for industry news. Here’s another scenario: imagine reading long character pieces or stories about a travel destination in The New Yorker or National Geographic on a lazy Sunday. Would you read the same content during a weekday lunch or coffee break? Likewise, what you read while commuting, such as the morning news or a casual magazine, differs from what you read during work, which might be more industry specific.

Successful content curation balances personalized content and serendipitous content. Personalization is a measure of how targeted content is to your specific interests. Breaking news like the Fukushima reactor meltdown or the Syrian crisis is of general interest and not particularly tailored to anyone (although geography certainly influences importance). On the other hand, stories about gardening or running are directed towards enthusiasts and benefit from personalization.

Equally important is the time sensitivity of content: content whose value decays quickly is highly time sensitive. Breaking news is most valuable when it’s near instantaneous, whereas an article about trekking in Machu Picchu is not so time sensitive. Collectively, Zato Novo calls the dimensions of personalization and time sensitivity the content continuum. Using the same examples as above, a four-year-old breaking news story about the Fukushima reactor is not so valuable now: we already know what happened, and it is literally old news. In contrast, a four-year-old article on planting a certain type of perennial is still relevant despite its age.
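One simple way to model time sensitivity is as a decay rate on content value. Here’s a minimal sketch using exponential decay, where the half-lives are purely illustrative assumptions, not measured values:

```python
def content_value(age_days, half_life_days, base_value=1.0):
    """Content value decays exponentially with age; the half-life
    encodes how time sensitive the content type is."""
    return base_value * 0.5 ** (age_days / half_life_days)

# Assumed half-lives: breaking news goes stale in about a day,
# while an evergreen gardening article stays relevant for years.
four_years = 4 * 365
breaking = content_value(four_years, half_life_days=1)
evergreen = content_value(four_years, half_life_days=5 * 365)

print(f"4-year-old breaking news:   {breaking:.2e}")
print(f"4-year-old evergreen piece: {evergreen:.2f}")
```

After four years the breaking news story is worth effectively nothing, while the evergreen article retains most of its value, mirroring the Fukushima versus perennial example.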

It’s informative to plot different content types along these two scales. The size of each content type represents how much variation it has along either scale. A special interest magazine covers narrow subjects like model railroads or other hobbies and caters to specific enthusiasts. Trade publications like the IEEE’s Computational Intelligence magazine are a tad more general, while local news and daily news serve a much broader audience.

[Figure: the content continuum, with content types plotted by personalization and time sensitivity]

Notice that we make a distinction between two types of breaking news: industry specific news, which is personalized, versus global breaking news that is not personalized. Both are time sensitive, but industry news is generally more niche than global news. At times, industry news, like a company IPO, can become global news.

Two other attributes are correlated closely with time sensitivity: content length and article accuracy. Highly time sensitive content like breaking news is typically short in length and susceptible to errors and omissions. The most extreme case is a news ticker with only a headline. This makes sense, since there isn’t enough time to fact check an evolving story until more time has elapsed. The recent CNN flub is a good example of this. Compare this to long form reading in magazines that we expect to be well researched and free of factual errors. Stories like the Rolling Stone article on alleged UVA rape culture are expected to pass rigorous editorial standards and suffer a huge backlash when those standards are not met.

Now that we understand where different content types live on the content continuum, we can consider when these content types will be consumed. Individual behavior will always be idiosyncratic, so we want to focus on general behaviors and not outliers. During work, the most appropriate content might be industry and global breaking news but not gossip about friends. However, depending on the job, someone might check friend gossip during a smoke break. Many people zone out over the weekend and do some pleasure reading or read up on a hobby. Workaholics might take that time to read a trade publication. Using the content continuum as a conceptual guide, it becomes easier to model reading behaviors since each content type exhibits its own patterns.

Serialized Timelines

Presumably Facebook’s ranking algorithm takes some form of the content continuum into account, but the serial nature of timelines exacerbates the problem. To understand the trouble with serialized timelines requires a slight detour into the world of electrical engineering. When multiple signals are sent from one place to another, the data are often serialized into a single stream. This process is called multiplexing and takes place in telecommunications and computer networks. At the other end of the system, there is a corresponding demultiplexer that reconstructs each individual signal.

There are all sorts of techniques for doing this, and multiplexing is a hallmark of both wired and cellular communication networks. In fact, the concept was first implemented for telegraphs, so that messages could be sent both ways on a single wire. To the untrained eye, a multiplexed signal looks random. Here’s an example of three sine waves with different frequencies and their sum (dark line). Without knowing the source signals, it would be difficult to extract information from the aggregate.
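The sine-wave example is easy to reproduce. In the sketch below (the frequencies are arbitrary choices), the summed signal looks unstructured sample by sample, but a Fourier transform, playing the role of a demultiplexer, recovers the three source frequencies exactly:

```python
import numpy as np

# Three source signals at arbitrary example frequencies (in Hz),
# sampled at 1000 Hz for exactly one second.
t = np.linspace(0, 1, 1000, endpoint=False)
freqs = [3, 7, 11]
signals = [np.sin(2 * np.pi * f * t) for f in freqs]

# Multiplex: sum the sources into a single aggregate signal.
aggregate = sum(signals)

# Demultiplex: a Fourier transform separates the sources again.
# With a one-second window, each rfft bin corresponds to a whole
# frequency in Hz, so the three largest bins are the source frequencies.
spectrum = np.abs(np.fft.rfft(aggregate))
peak_freqs = sorted(int(i) for i in np.argsort(spectrum)[-3:])
print(peak_freqs)
```

The key point: recovery works only because we know how the signal was composed, which is exactly what a timeline reader lacks.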

For the innumerate, here’s a textual example to illustrate the same point: “MHATainhrsdey    felhlvaaeemderb cy aeww  halwesiar tsest  ulwtrehhe ia ltttae om M baag,sro  y. s   nw  oe  wn  ,t   ,”. The text looks like gibberish but is actually four consecutive lines of a nursery rhyme: we’ve taken one letter from each line in turn and sequenced them serially. Without knowing how the serial representation is encoded, it’s difficult to extract meaning from the stream of characters.
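The interleaving scheme can be sketched in a few lines. The rhyme text below is reconstructed from context and may differ slightly in punctuation and spacing from the string above, but the round-robin encoding is the same, and knowing it makes demultiplexing trivial:

```python
from itertools import zip_longest

rhyme = [
    "Mary had a little lamb,",
    "His fleece was white as snow,",
    "And everywhere that Mary went,",
    "The lamb was sure to go.",
]

# Multiplex: emit one character from each line in turn (round robin),
# padding the shorter lines with spaces.
muxed = "".join(ch for group in zip_longest(*rhyme, fillvalue=" ") for ch in group)
print(muxed)

# Demultiplex: knowing there are four interleaved streams, every 4th
# character starting at offset i reconstructs line i.
demuxed = [muxed[i::4].rstrip() for i in range(len(rhyme))]
print(demuxed == rhyme)
```

Without the stride of 4, the reader is back to staring at gibberish.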

Now imagine your timeline. Social media suffers from a similar problem because there is no demultiplexer. Each friend or source broadcasts an independent signal, which is serialized into a single, personalized feed. Many of the publicized weights for Facebook content are based on how people interact with content, as opposed to what the content is.

Suppose you’re in the mood for gossip. You might scroll past stories about industry news and skip straight to posts by friends and family. At other times you might do the opposite and focus on industry news but not gossip. Since the algorithm monitors how you engage with content and how much time you spend on it, when your mood changes it will be confused and continue to serve content associated with your old mood. Clearly our mood affects the content we want to see (and of course Facebook knows that the content we see can affect our mood).

When we explicitly choose different publications, we are mentally demultiplexing content based on purpose. But when the majority of content comes from a single serial source, that becomes harder to do. In this format the content is spread across the content continuum as a jumbled mix of photos, friend gossip, special interests, and so on. Ignoring the content continuum and cramming all these content types into a single feed seems downright crazy. Imagine if the New York Times did away with topical sections and instead organized articles based on what it thought you would find interesting. Furthermore, imagine that the only way to control the organization were to hide or show different authors!
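A feed demultiplexer doesn’t have to be complicated. As a sketch (the post records and source labels below are invented for illustration), grouping a serialized timeline by source type recovers the independent channels, so a reader in a gossip mood can pick just that channel:

```python
from collections import defaultdict

# A hypothetical serialized feed: posts from independent sources,
# interleaved into one stream, as a timeline delivers them.
feed = [
    {"source": "friend", "text": "Look at my brunch!"},
    {"source": "industry", "text": "Startup X files for IPO"},
    {"source": "friend", "text": "We're engaged!"},
    {"source": "news", "text": "Election results are in"},
    {"source": "industry", "text": "New framework released"},
]

# Demultiplex: split the single stream back into one channel per
# source type, preserving each channel's internal order.
channels = defaultdict(list)
for post in feed:
    channels[post["source"]].append(post["text"])

for source, posts in channels.items():
    print(source, posts)
```

The hard part, of course, is not the grouping but classifying where each post sits on the content continuum in the first place.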

In short, Facebook’s content ranking algorithm is fundamentally flawed. The solution is not necessarily human edited, but rather a better model of content. At Zato Novo we are addressing these issues to improve the performance of algorithmic content curation via the Me Meme service. By leveraging the content continuum, users gain control over the content they see in a meaningful way.
