Vicki Boykis Data, tech, and sometimes Nutella

Recsys 2021 Recap

wheat field April, Allan D’Arcangelo

Recsys 2021 was in October. Since I’m now focusing on this area of ML at work, I attended (virtually only, unfortunately, but the online experience was as good as it could be at a hybrid conference, so major hats off to the organizers!) and took around 30 pages of notes.

There was SO much content! In the interest of condensing my learnings before I forget them, here is some of the high-level content that was most interesting to me. If you’re looking for more recaps, Eugene also has a great one.

  • Theme 1: Algorithmic responsibility: A big overall theme this year, both for Recsys2021 and other ML conferences (such as Instagram’s Workshop on Recommendations at Scale), is the role of recommender systems in surfacing relevant content to users, responsibly. This is something I’ve thought extensively about before, but mostly on my own, so it was great to hear some industry and academia perspectives on what that means holistically. An interesting paper that surfaced at the conference found that people prefer algorithmic recommendations to human ones in specific contexts.

    Cynthia Liem’s keynote on what it means for recommender systems to surface “worthy” content in the context of how people consume classical music was definitely noteworthy. (Video forthcoming.) A good technical deep dive on ways to avoid bias was this repo with lots of resources, and an interesting talk featured work on trying to reverse-engineer YouTube’s recommender to audit whether they were filtering out negative content.

  • Theme 2: Performance evaluation of recsys: How do you know whether recommendations offered and consumed by the user are “good”? From a business perspective, we often use clickthrough rate and related engagement metrics, and even then we often don’t agree on what good metrics should be. On the system side, evaluating the performance of recommender systems also remains non-standardized across industry, with precision being the only key metric universally recognized in both academic and industry papers. Precision is generally thought of as the percent of returned results (aka recommendations) in any given set of items that are actually “good”, for some definition of “good”.

    A recent paper surveying academic papers found that, “in 47% of cases, we cannot easily know how the metric is defined because the definition is not clear or absent.” So we as an industry still have a long way to go before we understand how best to evaluate these systems. (For more on metrics, see my post about how all numbers are made up.)

  • Theme 3: Online (aka realtime) recommendations: Another common theme was improving online recommendations, aka offering refreshed recommendations based on what the user has seen in the past n time, where n < 1 minute. This is an increasingly hot area of machine learning that is also very hard to do, so it’s fun to see how different companies approach it. GrubHub, for example, deals with this problem by re-training on only a part of its data (1 day versus 4 days) to avoid model drift (aka changes in the underlying model’s features or business environment that make models out of date, for example, user behavior changes due to Covid). Nvidia presented an interesting way to serve session-based recommendations based on HuggingFace’s Transformers, which was one of the most talked-about things during the conference.

  • Theme 4: Comparable Company Architectures: One of the interesting points of the conference was hearing about how other companies are tackling the issues of recommendations. Nike, GrubHub, Netflix, and Peloton all presented some versions of their architectures and reinforced my belief that building these systems is still very new and very hard and we’re all kind of building the staircase as we climb it.
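
To make the precision definition from Theme 2 a bit more concrete, here is a minimal sketch in Python. The function name `precision_at_k`, the cutoff `k`, and the toy item IDs are all my own assumptions for illustration, not something taken from any of the papers or talks. It also bakes in one particular choice of denominator (the number of items actually returned), which is exactly the kind of detail that papers often leave undefined.

```python
def precision_at_k(recommended, relevant, k):
    """Share of the top-k recommended items that are "good" (in `relevant`).

    Assumption: we divide by the number of items actually returned in the
    top k. Some definitions divide by k itself even when fewer than k items
    come back, which gives a different number for short lists.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    top_k = recommended[:k]
    if not top_k:
        # Nothing was recommended, so treat precision as 0.0 by convention.
        return 0.0
    hits = sum(1 for item in top_k if item in relevant)
    return hits / len(top_k)


# Hypothetical toy data: a ranked list from the recommender and the set of
# items we have decided count as "good" for this user.
recommended_items = ["a", "b", "c", "d"]
good_items = {"a", "c", "e"}

score = precision_at_k(recommended_items, good_items, k=3)
print(f"precision@3 = {score:.3f}")
```

In this toy case two of the top three recommendations ("a" and "c") are in the "good" set, so precision@3 comes out to two thirds. Change what counts as "good", or what you divide by, and the number moves, which is a big part of why comparing results across papers is so hard.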