Search | arXiv e-print repository

Beyond Item Dissimilarities: Diversifying by Intent in Recommender Systems

Authors: Yuyan Wang, Cheenar Banerjee, Samer Chucri, Fabio Soldo, Sriraj Badam, Ed H. Chi, Minmin Chen

Abstract: Recommender systems that overly focus on short-term engagement prevent users from exploring diverse interests. To tackle this challenge, numerous diversification algorithms have been proposed. These algorithms typically rely on measures of item similarity, aiming to maximize the dissimilarity across items in the final set of recommendations. In this work, we demonstrate the benefits of going beyond item-level similarities by utilizing higher-level user understanding--specifically, user intents that persist across multiple interactions or recommendation sessions--in diversification. Our approach is motivated by the observation that user behaviors on online platforms are largely driven by their underlying intents. Therefore, final recommendations should ensure that a diverse set of intents is accurately represented. While user intent has primarily been studied in the context of search, it is less clear how to incorporate real-time dynamic intent predictions in recommender systems. To address this gap, we develop a probabilistic intent-based whole-page diversification framework for the final stage of a recommender system. Starting with a prior belief of user intents, the proposed framework sequentially selects items for each position based on these beliefs and subsequently updates posterior beliefs about the intents. This approach ensures that different user intents are represented on a page, towards optimizing long-term user experience. We experiment with the intent diversification framework on YouTube. Live experiments on a diverse set of intents show that our framework increases Daily Active Users and overall user enjoyment, validating its effectiveness in facilitating long-term planning. Specifically, it enables users to consistently discover and engage with diverse content that aligns with their underlying intents over time, leading to an improved long-term user experience.

Submitted 9 August, 2024; v1 submitted 20 May, 2024; originally announced May 2024.
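
The select-then-update loop described in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the scoring rule (expected relevance under the current belief), the update rule (reweighting each intent by the chance it is still unserved), and the names `diversify_by_intent`, `prior`, and `relevance` are all assumptions made for illustration.

```python
from typing import Dict, List


def diversify_by_intent(
    prior: Dict[str, float],
    relevance: Dict[str, Dict[str, float]],
    page_size: int,
) -> List[str]:
    """Greedily fill a page, updating intent beliefs after each pick.

    prior:     intent -> prior probability of that intent (hypothetical input).
    relevance: item -> {intent -> assumed P(item serves the user | intent)}.
    """
    belief = dict(prior)
    remaining = set(relevance)
    page: List[str] = []
    for _ in range(min(page_size, len(relevance))):
        # Score each candidate by expected relevance under the current belief.
        # Sorting first makes tie-breaking deterministic.
        best = max(
            sorted(remaining),
            key=lambda item: sum(
                belief[i] * relevance[item].get(i, 0.0) for i in belief
            ),
        )
        page.append(best)
        remaining.remove(best)
        # Assumed posterior update: weight each intent by the probability
        # that the item just placed did NOT serve it, then renormalize.
        unnorm = {
            i: belief[i] * (1.0 - relevance[best].get(i, 0.0)) for i in belief
        }
        total = sum(unnorm.values())
        if total > 0:
            belief = {i: v / total for i, v in unnorm.items()}
    return page


if __name__ == "__main__":
    prior = {"learn": 0.6, "relax": 0.4}
    relevance = {
        "a": {"learn": 0.9},
        "b": {"learn": 0.8},
        "c": {"relax": 0.7},
    }
    print(diversify_by_intent(prior, relevance, 2))
```

With these toy numbers, ranking by prior-weighted relevance alone would place the two "learn" items first; after the update, belief shifts toward the unserved "relax" intent, so the second slot goes to item "c".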

arXiv:2305.15498 [pdf, other]

Large Language Models for User Interest Journeys

Authors: Konstantina Christakopoulou, Alberto Lalama, Cj Adams, Iris Qu, Yifat Amir, Samer Chucri, Pierce Vollucci, Fabio Soldo, Dina Bseiso, Sarah Scodel, Lucas Dixon, Ed H. Chi, Minmin Chen

Abstract: Large language models (LLMs) have shown impressive capabilities in natural language understanding and generation. Their potential for deeper user understanding and improved personalized user experience on recommendation platforms is, however, largely untapped. This paper aims to address this gap. Recommender systems today capture users' interests through encoding their historical activities on the platforms. The generated user representations are hard to examine or interpret. On the other hand, if we were to ask people about interests they pursue in their life, they might talk about their hobbies, like "I just started learning the ukulele", or their relaxation routines, e.g., "I like to watch Saturday Night Live", or "I want to plant a vertical garden". We argue, and demonstrate through extensive experiments, that LLMs as foundation models can reason through user activities, and describe their interests in nuanced and interesting ways, similar to how a human would. We define interest journeys as the persistent and overarching user interests, in other words, the non-transient ones. These are the interests that we believe will benefit most from the nuanced and personalized descriptions. We introduce a framework in which we first perform personalized extraction of interest journeys, and then summarize the extracted journeys via LLMs, using techniques like few-shot prompting, prompt-tuning and fine-tuning. Together, our results in prompting LLMs to name extracted user journeys in a large-scale industrial platform demonstrate great potential of these models in providing deeper, more interpretable, and controllable user understanding. We believe LLM-powered user understanding can be a stepping stone to entirely new user experiences on recommendation platforms that are journey-aware, assistive, and enabling frictionless conversation down the line.

Submitted 24 May, 2023; originally announced May 2023.

arXiv:1006.1165 [pdf, ps, other]

Optimal Source-Based Filtering of Malicious Traffic

Authors: Fabio Soldo, Katerina Argyraki, Athina Markopoulou

Abstract: In this paper, we consider the problem of blocking malicious traffic on the Internet, via source-based filtering. In particular, we consider filtering via access control lists (ACLs): these are already available at the routers today but are a scarce resource because they are stored in the expensive ternary content addressable memory (TCAM). Aggregation (by filtering source prefixes instead of individual IP addresses) helps reduce the number of filters, but also comes at the cost of blocking legitimate traffic originating from the filtered prefixes. We show how to optimally choose which source prefixes to filter, for a variety of realistic attack scenarios and operators' policies. In each scenario, we design optimal, yet computationally efficient, algorithms. Using logs from Dshield.org, we evaluate the algorithms and demonstrate that they bring significant benefit in practice.

Submitted 6 June, 2010; originally announced June 2010.

Comments: Conference version appeared in Infocom 2009. Journal version submitted to ToN

arXiv:0908.2007 [pdf, other]

Predictive Blacklisting as an Implicit Recommendation System

Authors: Fabio Soldo, Anh Le, Athina Markopoulou

Abstract: A widely used defense practice against malicious traffic on the Internet is the use of blacklists: lists of prolific attack sources are compiled and shared. The goal of blacklists is to predict and block future attack sources. Existing blacklisting techniques have focused on the most prolific attack sources and, more recently, on collaborative blacklisting. In this paper, we formulate the problem of forecasting attack sources (also referred to as predictive blacklisting) based on shared attack logs as an implicit recommendation system. We compare the performance of existing approaches against the upper bound for prediction, and we demonstrate that there is much room for improvement. Inspired by the recent Netflix competition, we propose a multi-level prediction model that is adjusted and tuned specifically for the attack forecasting problem. Our model captures and combines various factors, namely: attacker-victim history (using time-series) and attacker and/or victim interactions (using neighborhood models). We evaluate our combined method on one month of logs from Dshield.org and demonstrate that it significantly improves on the state of the art.

Submitted 13 August, 2009; originally announced August 2009.

Comments: 11 pages; Submitted to INFOCOM 2010

arXiv:0811.3828 [pdf, other]

Optimal Filtering of Malicious IP Sources

Authors: Fabio Soldo, Athina Markopoulou, Katerina Argyraki

Abstract: How can we protect the network infrastructure from malicious traffic, such as scanning, malicious code propagation, and distributed denial-of-service (DDoS) attacks? One mechanism for blocking malicious traffic is filtering: access control lists (ACLs) can selectively block traffic based on fields of the IP header. Filters (ACLs) are already available in the routers today but are a scarce resource because they are stored in the expensive ternary content addressable memory (TCAM). In this paper, we develop, for the first time, a framework for studying filter selection as a resource allocation problem. Within this framework, we study five practical cases of source address/prefix filtering, which correspond to different attack scenarios and operators' policies. We show that filter selection optimization leads to novel variations of the multidimensional knapsack problem and we design optimal, yet computationally efficient, algorithms to solve them. We also evaluate our approach using data from Dshield.org and demonstrate that it brings significant benefits in practice. Our set of algorithms is a building block that can be immediately used by operators and manufacturers to block malicious traffic in a cost-efficient way.

Submitted 24 November, 2008; originally announced November 2008.

Comments: submitted to Infocom 09
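
To give a feel for filter selection as a resource allocation problem, the sketch below solves a textbook single-constraint 0/1 knapsack. It is not one of the paper's knapsack variations or algorithms. The candidate tuples (prefix, cost in filter entries, benefit), the integer costs, and the function name `select_filters` are all hypothetical, and candidates are assumed not to overlap.

```python
from typing import List, Tuple


def select_filters(
    candidates: List[Tuple[str, int, float]],
    budget: int,
) -> Tuple[float, List[str]]:
    """Choose candidate filters maximizing total benefit within a budget.

    candidates: (prefix, cost, benefit) tuples; cost is a positive integer
                (e.g. filter entries consumed), benefit is a hypothetical
                score such as malicious traffic blocked net of collateral.
    budget:     total filter entries available.
    Returns (best total benefit, chosen prefixes).
    """
    # best[b] holds the best (benefit, chosen prefixes) using at most b units.
    best: List[Tuple[float, List[str]]] = [(0.0, [])] * (budget + 1)
    for prefix, cost, benefit in candidates:
        # Iterate budgets downward so each candidate is used at most once.
        for b in range(budget, cost - 1, -1):
            value = best[b - cost][0] + benefit
            if value > best[b][0]:
                best[b] = (value, best[b - cost][1] + [prefix])
    return best[budget]


if __name__ == "__main__":
    candidates = [
        ("10.0.0.0/8", 3, 50.0),
        ("192.168.1.0/24", 1, 20.0),
        ("172.16.0.0/12", 2, 35.0),
    ]
    print(select_filters(candidates, budget=3))
```

In this toy instance the single most valuable candidate uses the whole budget, yet the two cheaper candidates together yield a higher total, which is the kind of trade-off a scarce filter budget forces.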

Showing 1–5 of 5 results for author: Soldo, F