On Obsessive Recommendations

One of the challenges facing video platforms today is how to recommend content to users. As mentioned previously, my household recently switched from cable TV to streaming. On one of the services I watched a single film in Spanish, and the next time I logged in it was recommending me a slew of Spanish-language films.

YouTube has had well-publicized problems with recommendation rabbit holes, where the engine over-recommends a narrow band of content.

Consider for a moment what it would be like to be transplanted to a world where everybody is obsessed with powdered wigs. You would constantly be hearing about wig powder, wigs, methods for upkeep... It’s tiresome just thinking about it. But instead we live in a world where only the wigfolk are obsessed with wigs. And they are obsessed. They go on video sites and watch nothing but wig videos all the time. The recommendation systems learn they want wigs, wigs, wigs. And the wig-lovers community is big enough that when someone new comes along who loves wigs, the system quickly recommends them a bunch of wig videos and they watch them all.

So your friend is reading about ancient times and happens to have a question about wigs. They go on the video site and find a video that answers the question. But now the friend is getting tons of video recommendations about wigs! Oh no!

Oh, yes. If we train the computers to be obsessed with obsession, they will try to find our obsessions. They will bombard us with every topic someone obsessed over and see if they can obsess us.

One wonders whether David Foster Wallace was right about Infinite Jest: whether one day an errant piece of video content will happen to find the nexus of everyone’s obsessions, and the recommendation systems will recommend it over and over until it’s the only thing anyone can watch. But that’s not the point.

The point is that recommending content is fine, but it should not be obsessive. There has to be a better way.


The other side of the coin is when the obsession is what the user wants. If you have an account dedicated to video game news, you want it to be obsessed with that topic. You don’t want to see anything else when you use that account. That may be where some of the errant training in these recommendation systems arose.

The logical tweak would be for the recommendation system to try to detect the type of account: assume it’s a non-obsessive account, and only once the user has done enough to signal otherwise should the system switch over to obsessive mode. One would guess this would fix things for a lot of recommendation systems.
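As a rough illustration of that tweak, here is a minimal sketch of mode detection. Everything here is a made-up assumption, not any platform’s actual logic: the topic labels, the 80% focus threshold, and the 50-view minimum are all illustrative.

```python
# Hypothetical sketch: switch from "diverse" (default) to "obsessive"
# recommendations only after a strong, sustained signal from the user.
# Thresholds and names are illustrative assumptions, not a real system.
from collections import Counter

def recommendation_mode(watch_history, focus_share=0.8, min_views=50):
    """watch_history is a list of topic labels, one per video watched."""
    if len(watch_history) < min_views:
        return "diverse"  # too little data: assume a general-interest account
    counts = Counter(watch_history)
    top_topic, top_count = counts.most_common(1)[0]
    if top_count / len(watch_history) >= focus_share:
        return "obsessive"  # user has clearly signaled a single focus
    return "diverse"

history = ["wigs"] * 40 + ["cooking"] * 5
print(recommendation_mode(history))      # 45 views: too few, stays "diverse"
print(recommendation_mode(history * 2))  # 90 views, ~89% wigs: "obsessive"
```

The design choice is that the default errs toward variety; the system has to earn its way into obsession, rather than obsessing after one video.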

To go back to my Spanish film example, it’s likely that people who watch a single Spanish-language film will want to watch a bunch of them. But maybe not, if they’re like me and only have a functional grasp of the language. So the second fix would be to bucket recommendations: have a new grouping that says, “Spanish-language Films,” and then let the user make the choice. After they show a preference for that bucket, the system can assume it was correct.
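The bucketing idea can be sketched in a few lines. Again, this is a hypothetical illustration: the row names, click threshold, and `build_feed` function are inventions for the sketch, not any service’s API.

```python
# Hypothetical sketch of "bucketed" recommendations: keep a niche guess in
# its own labeled row, and only widen the preference after the user opts in
# by clicking into that bucket enough times. Names/thresholds are made up.
def build_feed(general_recs, bucket_name, bucket_recs, bucket_clicks,
               promote_after=3):
    feed = {"Recommended for you": list(general_recs)}
    if bucket_clicks >= promote_after:
        # Preference confirmed: let the bucket's items mix into the main feed.
        feed["Recommended for you"] += bucket_recs
    else:
        # Contained: the guess is visible, but the user makes the choice.
        feed[bucket_name] = list(bucket_recs)
    return feed

feed = build_feed(["Film A", "Film B"], "Spanish-language Films",
                  ["Roma", "Volver"], bucket_clicks=1)
print(list(feed))  # ['Recommended for you', 'Spanish-language Films']
```

The point of the shape is that the system’s guess is offered alongside alternatives rather than replacing them, which is exactly the choice the next paragraph argues for.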

It reminds me of the old joke about a blind food test that got mixed up with a blind toothpaste test, and... you guessed it: four out of five dentists recommend Spot’s Dog Food (also, dogs hate toothpaste). Or, to put it the other way, if you put dog food up against bacon and steak, you’ll find out what dogs really want to eat. If the recommendation system only offers the user one type of content, there’s no guarantee that the recommendation system is worth a damn. It’s only when the user can choose the obsession over the myriad alternatives that you know (which, of course, is why faux capitalists always want to limit the competition—if your choice is dog food or more dog food, you’ll choose dog food).

Big Data on Small Computers

[Image: the US motto, e pluribus unum, on the reverse of a US dime]

One of the great emerging fields of computing is the use of big data and machine learning. This is a process whereby large datasets teach computers to do things like translate text, interpret human speech, and categorize images. The problem is that, so far, it requires large amounts of data and a lot of computing power.

The paradigm is largely at odds with the kind of computing people would prefer to do and use. We would rather not send our voice data out to the Internet, or have the Internet always listening to or watching us, to get the benefits of machine learning. And while advances in technology will allow us to crunch the data on smaller devices, it will be difficult to hold the corpus of data needed for training and use.

It remains to be seen whether smaller datasets or synthesized datasets (where a large dataset is somehow compressed or distilled into its important parts) will emerge. So how do we get big data into our relatively small computers?

It is likely that the problem will provoke the emergence of more distributed systems, something many have wanted and waited for. Distributed or collaborative computing lets your computer(s) participate in processing larger datasets. Projects like the Search for Extra-Terrestrial Intelligence (SETI) have used such distributed computing for over a decade.

The main challenge will be finding ways to break up the data sent to the distributed system so that privacy is protected. That is, if you send the whole voice capture to the distributed system (as you do, AFAIK, with cloud services like Apple’s Siri), you risk the same privacy issues as with the cloud model.

Instead, it should be possible to break up inputs (video or audio) and send portions (possibly with some redundancy, depending on e.g., if word breaks can be determined locally) to several systems and let them each return only a partial recognition of the whole.
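A minimal sketch of that splitting idea, under loud assumptions: the chunk and overlap sizes are arbitrary, the “peers” are stand-ins (real workers would receive audio frames, not words, and return partial transcripts), and nothing here reflects how any actual speech service splits its input.

```python
# Hypothetical sketch: chop an utterance into overlapping chunks so that no
# single peer sees the whole input, then dispatch each chunk to a different
# peer. The overlap supplies the redundancy mentioned above, which helps
# when word breaks can't be determined locally.
def split_with_overlap(samples, chunk=4, overlap=1):
    """Return overlapping slices of the input sequence."""
    step = chunk - overlap
    return [samples[i:i + chunk] for i in range(0, len(samples), step)]

# Words stand in for audio frames purely for readability.
utterance = ["what", "is", "the", "time", "in", "Rome", "today", "ok"]
chunks = split_with_overlap(utterance)
for i, c in enumerate(chunks):
    print(f"peer {i}: {c}")  # each peer gets only a fragment of the question
```

Each peer would return only a partial recognition; the reassembly into a full transcript happens back on the user’s device, which is the only party that ever holds the whole.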

It also remains to be seen whether this piecemeal approach will be as functional as the whole-system approach in all cases. While this splitting undoubtedly takes place in whole systems like Siri, the reassembly and final processing surely take place over the whole input. That final step may not be easily managed over a distributed system, at least not while protecting privacy.

Consider asking, “What is the time in Rome?” which, due to pronunciation, might be recognized slightly off as “What is the dime in Rome?” In a whole-system approach it’s likely easier to infer dime → time at some late step than it is when each node hands back a partial result and the final recipient knows less about how each was produced. In a question like this, the final text is likely bound for a search engine, which will correct it (though it could take the question literally and answer, “It is the €0.10 coin.”).
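To make the dime → time step concrete, here is a toy sketch of late-stage correction. The bigram table and its scores are entirely fabricated for illustration; a real system would consult a trained language model, and the point is that this step needs the surrounding words, which a single partial result may lack.

```python
# Hypothetical sketch: pick between confusable words ("dime"/"time") by
# scoring each against the preceding word with a tiny, made-up bigram table.
BIGRAM_SCORES = {
    ("the", "time"): 0.9,  # "what is the time" is a common query
    ("the", "dime"): 0.1,  # "what is the dime" is rare
}

def correct(words, confusable=("dime", "time")):
    out = list(words)
    for i in range(1, len(out)):
        if out[i] in confusable:
            # Choose the variant that best fits the preceding context word.
            out[i] = max(confusable,
                         key=lambda w: BIGRAM_SCORES.get((out[i - 1], w), 0.0))
    return out

print(correct("what is the dime in Rome".split()))
# → ['what', 'is', 'the', 'time', 'in', 'Rome']
```

Note that the correction only works because the corrector sees “the” next to “dime”; a peer holding only an isolated fragment has no such context, which is exactly the whole-system advantage described above.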

For situations where the voice command lends insufficient context for local correction, it could be a greater challenge.

The good news is that it does look like it’s possible for us to have these distributed systems replace proprietary cloud solutions. The questions are when and how they will emerge, and where they might be weaker.