A Look Into News Avoidance Through AWRS: An Avoidance-Aware Recommender System

Igor L.R. Azevedo (ORCID 0000-0001-5144-825X), The University of Tokyo, Tokyo, Japan; Toyotaro Suzumura (ORCID 0000-0001-6412-8386), The University of Tokyo, Tokyo, Japan; and Yuichiro Yasui (ORCID 0000-0002-4175-9318), Nikkei Inc., Tokyo, Japan
Abstract.

In recent years, journalists have expressed concerns about the increasing trend of news article avoidance, especially within specific domains. This issue has been exacerbated by the rise of recommender systems. Our research indicates that recommender systems should consider avoidance as a fundamental factor. We argue that news articles can be characterized by three principal elements: exposure, relevance, and avoidance, all of which are closely interconnected. To address these challenges, we introduce AWRS, an Avoidance-Aware Recommender System. This framework incorporates avoidance awareness when recommending news, based on the premise that news article avoidance conveys significant information about user preferences. Evaluation results on three news datasets in different languages (English, Norwegian, and Japanese) demonstrate that our method outperforms existing approaches.

Recommender Systems, News Modeling, News Avoidance

1. Introduction

Recommender systems are extensively used to present users with options that align closely with their interests. In the news domain, these systems face unique challenges not typically encountered in other areas, such as the importance of timeliness, novelty, and relevance. These factors create a rapidly changing environment where news consumption patterns shift quickly.

Traditional methods relying solely on user click history and basic profiling have given way to sophisticated techniques that delve deeper into understanding user preferences and behavioral patterns. For instance, the NRMS model (Wu et al., 2019d) uses multi-head self-attention to learn news representations from news titles by modeling the interactions between words. In the user encoder, they learn representations of users from their browsed news and use multi-head self-attention to capture the relatedness between the news. Moreover, the NAML (Wu et al., 2019b) model proposes a neural news recommendation approach which can learn informative representations of users and news by exploiting different kinds of news information such as titles, bodies, and topic categories. Additionally, the LSTUR model (An et al., 2019) learns representations of news from their titles and topic categories and uses an attention network to select important words. In the user encoder, they propose to learn long-term user representations from the embeddings of their IDs.

More recently, other models have tried to increase their performance by focusing on different paradigms. For instance, the GLORY model (Yang et al., 2023) integrates global information with local user interactions to optimize content personalization. Moreover, LANCER (Bae et al., 2023) incorporates the concept of news lifetime to strategically enhance the negative sample space by perceiving that news articles have a finite influence period. Addressing challenges such as the cold-start problem and popularity bias, the PP-Rec model (Qi et al., 2021) utilizes news popularity metrics to enhance recommendation accuracy. In their method, the ranking score for recommending a candidate news to a target user is the combination of a personalized matching score and a news popularity score. Meanwhile, the CAUM model (Qi et al., 2022) leverages candidate-aware self-attention networks to capture global user interests based on candidate news items. They propose a candidate-aware CNN network to incorporate candidate news into local behavior context modeling and learn candidate-aware short-term user interest. Moreover, MANNeR (Iana et al., 2024), a modular framework for flexible multi-aspect (neural) news recommendation that supports ad-hoc customization over individual aspects at inference time, focuses on balancing recommendation performance with diversity across various metrics.

However, one factor that has become increasingly common in today’s digital landscape is the phenomenon of news avoidance. This behavior reflects a deliberate rejection or unintentional neglect of traditional news consumption, often influenced by a preference for alternative media sources (Schrøder, 2019; Villi et al., 2022; Fitzpatrick, 2022). This trend, which can be temporary or selective, underscores a nuanced understanding of how consumers interact with news, shaped by their interests and a growing skepticism towards certain topics. As the demand rises for news that aligns with personal values, especially in a competitive and diverse media landscape, the concept of relevance becomes paramount. Challenges exacerbated by the pandemic, such as news fatigue and distrust in mainstream media, emphasize the importance of recommender systems capable of recognizing and integrating avoidance behaviors.

Despite the aforementioned advancements, the integration of news avoidance behaviors into recommender systems has been overlooked, at least to the best of our knowledge. By engaging domain experts such as journalists and addressing their concerns (Schrøder, 2019; Villi et al., 2022; Fitzpatrick, 2022; Heitz et al., 2022), we explored how to perceive and integrate avoidance strategies within these news recommendation systems. This approach not only enhances the relevance and effectiveness of news recommendations but also acknowledges the evolving dynamics of user engagement with news content in today’s digital landscape.

In summary, our contributions are as follows: (1) Our model presents a novel concept, and to the best of our knowledge, we are the first to explore the perspective of avoidance in recommender systems. (2) We introduce the AWRS framework (Avoidance-Aware Recommender System), which incorporates avoidance awareness, including time and relevance modules, to improve performance in user matching recommendations. (3) Extensive experiments on three diverse real-world datasets show that AWRS consistently delivers superior performance across a wide range of metrics.

2. Related Work

2.1. Personalized News Recommender Systems

Neural content-based models have become the leading method for personalized news recommendation, surpassing traditional systems that relied on manual feature engineering (Wu et al., 2023). These Neural News Recommender (NNR) models typically include a news encoder that transforms input features into embeddings and a user encoder that creates user-level representations from clicked news (Wu et al., 2023, 2019d, 2019c; Okura et al., 2017; An et al., 2019; Wu et al., 2022b). The recommendation score is computed by comparing the candidate news embedding against the user embedding (Wang et al., 2018; Wu et al., 2019b). These models are trained using point-wise classification objectives with negative sampling (Huang et al., 2013; Wu et al., 2019d).

Personalized NNR systems focus on matching news content to user preferences, often limiting exposure to diverse viewpoints and creating “filter bubbles” (Iana et al., 2024; Heitz et al., 2022; Pariser, 2011; Li and Wang, 2019). This leads to homogeneous news consumption, reinforcing users’ initial stances. To address these issues, researchers have explored various methods to diversify recommendations. One approach involves re-ranking personalized recommendation sets (Gharahighehi and Vens, 2021) to increase diversity. Another approach involves multi-task training of NNRs (Moreira et al., 2019; Qi et al., 2022; Wu et al., 2022a; Choi et al., 2022), where the primary personalization objective is coupled with auxiliary objectives that promote aspect-based diversification.

Recent advancements in news recommenders aim to enhance content-based personalization by considering additional aspects: (1) Sentiment - SentiRec (Wu et al., 2020b) recommends news with diverse sentiments. (2) Popularity - PP-Rec (Qi et al., 2021) incorporates news popularity to address cold-start and diversity issues. (3) Global Representations - GLORY (Yang et al., 2023) uses a global news graph and gated graph neural networks to enrich news representations. (4) Recency - LANCER (Bae et al., 2023) introduces the concept of lifetime to improve news recommendation models.

2.2. Open Possibilities and Current Limitations

As previously mentioned, news avoidance has become a growing concern for news platforms, particularly those utilizing recommender systems due to the creation of “filter bubbles” (Pariser, 2011). We decided to investigate how avoidance affects the behavior of news article clicks. Subsequent sections will illustrate the intrinsic relationship between avoidance, exposure, and the number of clicks a news article receives. This aspect, overlooked in prior studies to the best of our knowledge, can enhance news recommendations by providing insights into the interaction between exposure and avoidance, especially regarding popular items.

In the news domain, the item with the most clicks is often highly exposed and less avoided. We argue that news article avoidance conveys significant information about user preferences. For instance, during an election, election-related articles may be popular but often avoided by those fatigued by political news. If a user reads a political article that is largely avoided by others, it suggests a strong interest in that topic. Conversely, clicking on a popular football article, one that is little avoided, might indicate less about the user’s preference and more about its current popularity. In other words, largely avoided news articles provide more information about user tastes than less avoided ones. This understanding allows us to incorporate such contextual information into recommender systems, improving the understanding of user preferences. The following sections will formally define each of the concepts discussed above.

3. Principal Elements

We identified three principal elements for characterizing a news article: exposure, avoidance, and relevance. Figure 1 illustrates these concepts. In the figure, the black dashed line represents the behaviors dataset, listing news articles (impressions) at each time point $t$. For simplicity, assume each positive news candidate (clicked article) has a fixed number of negative candidates. The off-white square labeled “exposed news articles” shows all news articles viewed by users at time $t_1$. Imagine a news platform where, at any time $t$, a certain number of news articles $\mathcal{N}_t = \{n_1, n_2, \ldots, n_k\}$ and users $\mathcal{U}_t = \{u_1, u_2, \ldots, u_j\}$ interact. When user $u_1$ views article $n_1$, they are “exposed” to it and can choose to click or not. The box contains all $k$ articles viewed by all $j$ users at time $t$.

Figure 1. A schematic for the explanation of avoidance and exposure.

The light pink rectangle, called the “impression list,” represents all news articles viewed by user $u$, both positive and negative. An impression log records the articles displayed to a user at a specific time and their click behaviors (Wu et al., 2020a). The set of impressions at time $t$ is termed exposure. Exposure is contingent upon the publication date (time) of news articles before they can be displayed to users. Therefore, at a specific time $t$, certain news articles may be exposed while others are not, simply because they have not yet been published. This concept is illustrated by the arrow at the bottom of the figure, depicting how the impression lists change over time. Finally, we use dark blue and dark pink to indicate whether articles have been clicked or not, respectively.

3.1. Exposure

Definition 3.1 (Number of Exposures - $n_E(n,t)$).

The total number of times a specific news article $n$ is presented to users on the platform up to a defined time $t$. This metric quantifies all instances where the article $n$ was displayed to active users, thus providing opportunities for being clicked.

Definition 3.2 (Number of Impressions - $n_I(t)$).

An impression is an individual data record in a news dataset. The quantity $n_I(t)$ denotes the number of impressions recorded at a given time $t$.

Definition 3.3 (Exposure Per Impression - $EPI(n,t)$).

The Exposure Per Impression ($EPI$) for a news article $n$ at time $t$ is calculated as follows:

(1) $EPI(n,t) = \frac{n_E(n,t)}{n_I(t)}$

This ratio provides a relative measure of how extensively a news article $n$ has been exposed in relation to the total number of impressions recorded.

Example: At time $t_1$, we know that we have 100 impression lists, and the news article $n_{174}$ has been displayed in 50 of them. Thus, we have $n_E(n_{174}, t_1) = 50$ and $n_I(t_1) = 100$. Hence, the exposure per impression of this news article $n_{174}$ is $EPI(n_{174}, t_1) = \frac{50}{100} = 0.5$.
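The worked example can be reproduced with a minimal Python sketch. The function name `exposure_per_impression`, the toy article ids, and the choice to count an article at most once per impression list are illustrative assumptions, not details taken from the datasets or from the AWRS implementation.

```python
from collections import Counter

def exposure_per_impression(impressions):
    """Return {article_id: EPI} for a list of impression lists.

    Assumption: an article counts at most once per impression list,
    so n_E is the number of impression lists that contain it.
    """
    n_impressions = len(impressions)  # n_I(t)
    if n_impressions == 0:
        return {}
    exposures = Counter()  # n_E(n, t) per article
    for shown in impressions:
        exposures.update(set(shown))
    return {article: count / n_impressions for article, count in exposures.items()}

# Toy data mirroring the example: 100 impression lists, n174 shown in 50 of them.
impressions = [["n174", "n1"]] * 50 + [["n2", "n1"]] * 50
epi = exposure_per_impression(impressions)
print(epi["n174"])  # 0.5
```

Computing the counts over all impressions available up to time $t$ gives the time-dependent value used in the rest of this section.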

Figure 2. $Av(n,t)$ vs. $EPI(n,t)$ for MIND-small.

3.2. Avoidance

Avoidance reflects the deliberate non-engagement with specific news content by readers. This phenomenon may arise from various factors, such as a lack of interest in the topic, discomfort with the content, or distrust towards the source. Understanding the patterns of avoidance is crucial for analyzing user behavior and optimizing content delivery to enhance reader engagement.

Definition 3.4 (Avoidance - $Av(n,t)$).

Avoidance for a news article $n$ over a timeframe $t$ quantifies the proportion of potential exposures that did not result in interaction. This is mathematically defined as follows:

(2) $Av(n,t) = 1 - \frac{n_{clk}(n,t)}{n_E(n,t)}$

Here, $n_{clk}(n,t)$ represents the number of clicks received by the article $n$ up to the specified time $t$, and $n_E(n,t)$ denotes the total number of exposures of the article $n$ within the same timeframe. The value of $Av(n,t)$ ranges from 0 to 1, where 1 indicates total avoidance (no clicks relative to exposures) and 0 signifies complete engagement (every exposure resulted in a click).

Example: Given that the news article $n_{174}$ at time $t_1$ received 20 clicks out of 50 exposures, its avoidance is calculated as $Av(n_{174}, t_1) = 1 - \frac{n_{clk}(n_{174}, t_1)}{n_E(n_{174}, t_1)} = 1 - \frac{20}{50} = 0.6$.
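Equation 2 translates directly into code. The sketch below is illustrative only: the function name `avoidance` is hypothetical, and returning `None` for an article that has never been exposed is an assumption, since the definition leaves that case open.

```python
def avoidance(n_clicks, n_exposures):
    """Av = 1 - n_clk / n_E for a single article.

    Assumption: avoidance is undefined (None) when the article
    has not been exposed yet, because n_E would be zero.
    """
    if n_exposures == 0:
        return None
    if not 0 <= n_clicks <= n_exposures:
        raise ValueError("clicks must lie between 0 and the number of exposures")
    return 1 - n_clicks / n_exposures

# Worked example from the text: 20 clicks out of 50 exposures.
print(avoidance(20, 50))  # 0.6
```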

3.2.1. Visualization of Avoidance

We analyzed the correlation among article avoidance, exposure, and the number of clicks for news articles. We plotted $Av(n,t)$ versus $EPI(n,t)$ for the MIND-small dataset on November 9, 2019, at 16:00 and 22:00, as shown in Figure 2. Additionally, we incorporated a color scale to indicate the normalized number of clicks and represented the radius of each news article based on its number of clicks. The graph demonstrates that articles maintaining a balanced level of exposure and low avoidance tend to attract more clicks over time. Articles that show an increase in $EPI$ and a decrease in $Av$ (as indicated by the red arrow) receive the highest number of clicks. For example, article N41881 maintains a stable balance between $EPI$ and $Av$ from 16:00 to 22:00, making it the most clicked news article. In contrast, article N1034 shows a shift in its $EPI$ and $Av$ balance during the same time window. It demonstrates an upward trend in exposure ($\uparrow EPI$) and an increase in avoidance ($\rightarrow Av$) over time, impacting its click rate, which indicates a gradual increase in clicks (larger circle radius). This analysis underscores how avoidance and exposure dynamics influence click rates, offering insights to enhance recommender systems.

Figure 3. $Av(n,t)$ vs. $EPI(n,t)$ computed for the (a) MIND-small (English), (b) Adressa one-week (Norwegian), and (c) Nikkei (Japanese) datasets using $D = 5$, resulting in 25 distinct regions.

3.3. Relevance

Relevance indicates how well a news article aligns with readers’ interests, but measuring it is complex and requires specific data. Key factors in measuring relevance include: (1) time since publication, (2) news text embedding, (3) exposure per impression $EPI(n,t)$, (4) avoidance $Av(n,t)$, and (5) number of clicks $n_{clk}(n,t)$. Each factor is weighted within the model to evaluate relevance, offering a robust framework for assessing the alignment of a news article with user interests. More details on the calculation of relevance can be found in Section 5.3.

4. User Engagement Embedding

To measure the impact of avoidance and exposure on user interaction with a news article, we calculate a metric called user engagement ($\bm{ue}$), which describes how users interact with a specific news article based on its avoidance and exposure per impression. We divide our graph of $Av(n,t)$ vs. $EPI(n,t)$ into $D \times D$ distinct regions. Depending on the values of $Av(n,t)$ and $EPI(n,t)$, a news article falls into different regions. Formally, the user engagement indices are computed as:

(3) $i_{\text{ue}} = D \cdot epi_{\text{idx}} + av_{\text{idx}}$

where $epi_{\text{idx}}$ is the vector of exposure per impression indices for historical clicks, $av_{\text{idx}}$ is the vector of avoidance indices for historical clicks, and $D$ is a constant representing the number of distinct values each index can take. The user engagement embedding is then obtained using an embedding layer:

(4) $\bm{ue} = \mathbf{W}_{\text{ue}}(i_{\text{ue}})$

where $\mathbf{W}_{\text{ue}}$ is an embedding layer that maps each index to an embedding vector, and $i_{\text{ue}}$ are the indices computed in Equation 3. The user engagement embedding ($\bm{ue}$) captures the spatial representation based on the quantized values of avoidance and exposure per impression. This embedding is used as input for our models to capture the information conveyed when a specific news article is largely avoided by many users but still clicked by a particular user. The concept behind the user engagement embedding is to integrate avoidance-awareness into recommender models. Figure 3 illustrates this division for $D = 5$, resulting in $D \times D = 25$ distinct regions.
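Equations 3 and 4 can be sketched as a quantization step followed by a table lookup. The sketch below makes several assumptions that the text does not specify: equal-width bins over $[0, 1]$, an embedding size of 4, and a randomly initialized table standing in for the learned layer $\mathbf{W}_{\text{ue}}$. The names `bin_index` and `engagement_index` are hypothetical.

```python
import random

def bin_index(value, d):
    """Map a value in [0, 1] to one of d equal-width bins (0 .. d-1).

    Assumption: equal-width bins; the binning scheme is not given in the text.
    """
    if not 0.0 <= value <= 1.0:
        raise ValueError("value must lie in [0, 1]")
    return min(int(value * d), d - 1)  # value == 1.0 falls into the last bin

def engagement_index(epi, av, d):
    """Equation 3: i_ue = D * epi_idx + av_idx."""
    return d * bin_index(epi, d) + bin_index(av, d)

D, EMB_DIM = 5, 4  # EMB_DIM is an arbitrary illustrative size
rng = random.Random(0)
# Stand-in for the embedding layer W_ue: one vector per region (D * D rows).
W_ue = [[rng.gauss(0.0, 1.0) for _ in range(EMB_DIM)] for _ in range(D * D)]

# Running example: EPI = 0.5 and Av = 0.6 give bins 2 and 3.
i_ue = engagement_index(epi=0.5, av=0.6, d=D)
ue = W_ue[i_ue]  # Equation 4: embedding lookup
print(i_ue)  # 13
```

Because the index enumerates the $D \times D$ grid row by row, each of the 25 regions for $D = 5$ maps to exactly one row of the table.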

5. Methodology

As highlighted in Section 2.1, different models focus on various aspects of news recommendations. We introduce a new aspect: avoidance. Our method, AWRS, improves recommendations by incorporating avoidance-awareness. Unlike traditional methods based only on clicks and popularity, AWRS integrates avoidance and relevance. The conceptual framework is shown in Figure 4.

5.1. Problem Formulation

Given a user $u$ and a candidate news article $n_c$, our goal is to compute an interest score $Int_s$ that quantifies user $u$’s potential engagement with $n_c$. We evaluate a collection of candidate news articles $\mathbf{N}_c = [n_{c1}, n_{c2}, \ldots, n_{cL}]$ and recommend the highest-ranking articles to user $u$. User $u$ has a history of clicked news articles $\mathbf{H}_u = [h_1, h_2, \ldots, h_M]$. Each news article is characterized by its title ($\mathcal{T}$), abstract ($\mathcal{A}$), category ($\mathcal{C}_{cat}$), and associated entities ($\mathbf{E}_i = [e_1, e_2, \ldots, e_k]$).

Figure 4. Overview of our proposed approach for AWRS.

5.2. Document Encoder

Our news encoder, inspired by Qi et al. (2021, 2022) and Wu et al. (2019b), extracts detailed information from text and entities within a news article using a multi-head attention architecture. It incorporates embeddings from the title, category, and entities, generating embeddings with Pretrained Language Models (PLMs) following Iana et al. (2024). We used RoBERTa Base (Liu et al., 2019) for the MIND-small dataset, NB-BERT Base (Kummervold et al., 2021) for the Adressa one-week dataset, and a finetuned version of the DeBERTa model (He et al., 2021) called Japanese DeBERTa V2 (https://huggingface.co/ku-nlp/deberta-v2-base-japanese) for the Nikkei dataset. For all models, only the last four PLM layers were fine-tuned. The final news embedding $\mathbf{n}$ combines $\mathbf{n}_t$ (title), $\mathbf{n}_c$ (category), and $\mathbf{n}_e$ (entity). For the Nikkei and Adressa one-week datasets, only title and category information were used.

5.3. Avoidance-aware Relevance Predictor

Figure 5. (a) Avoidance-aware Relevance Predictor and (b) Avoidance-aware User Encoder Schematics.

As discussed in Section 3.3, measuring relevance at a given time $t$ involves considering several factors: the content of the article, the time elapsed since publication, avoidance and exposure per impression values, and the number of clicks. Therefore, the Avoidance-aware Relevance Predictor, illustrated in Figure 5 (a), evaluates the relevance of each previously clicked news article $n$. This assessment incorporates the time elapsed, encoded using the Time2Vec (Kazemi et al., 2019) model, resulting in the time elapsed embedding $\bm{t}_{el}$. It also takes as input the news embedding $\mathbf{n}$, obtained from the news encoder described in Section 5.2, the number of clicks at time $t$, denoted $n_{clk}(n,t)$, and the user engagement embedding $\bm{ue}$, which integrates the impact of avoidance and exposure per impression as detailed in Section 4. Consequently, the avoidance-aware relevance scores ($\mathbf{r}_{aw}$) for each news article are computed as follows:

(5) $\mathbf{r}_{aw} = \sigma\left(n_{clk}(n,t) \cdot w_{\text{ctr}} + \hat{r} \cdot w_{\hat{r}}\right)$

Here, $\hat{r}$ is the weighted sum of the relevance scores influenced by the information conveyed by the news article, defined as

(6) $\hat{r} = W \cdot \hat{r}_{\text{ic}} + (1 - W) \cdot \hat{r}_{\text{tue}}$

where the weight $W$ is computed as $W = \sigma(\Psi_1([\mathbf{n}, \bm{ue}, \bm{t}_{el}]))$, $[\cdot, \cdot, \cdot]$ denotes vector concatenation, $\sigma$ is the sigmoid function, and $\Psi_1$ is a dense layer. The two component scores are given by

(7) $\hat{r}_{\text{ic}} = \Psi_2(\mathbf{n}), \quad \hat{r}_{\text{tue}} = \Psi_3([\bm{ue}, \bm{t}_{el}])$

Here, $\hat{r}_{\text{ic}}$ represents the relevance score influenced by the news article embedding ($\mathbf{n}$), and $\hat{r}_{\text{tue}}$ represents the relevance score influenced by the combination of the user engagement embedding ($\bm{ue}$) and the time elapsed embedding ($\bm{t}_{el}$). Finally, the overall relevance score ($\mathbf{r}_{aw}$) is obtained by considering both the number of clicks up to time $t$ ($n_{clk}(n,t)$) and the computed relevance score ($\hat{r}$), adjusted by their respective weights ($w_{\text{ctr}}$ and $w_{\hat{r}}$) and passed through a sigmoid activation function ($\sigma$) for normalization, as indicated by Equation 5.
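The data flow of Equations 5 to 7 can be illustrated for a single article with a small pure-Python sketch. Everything beyond the equations themselves is an assumption made for illustration: $\Psi_2$ and $\Psi_3$ are treated as single-output dense layers like $\Psi_1$, the weights are randomly initialized rather than learned, the embedding sizes (8, 4, 4) are arbitrary, and the names `relevance_score`, `dense`, and `init_layer` are hypothetical.

```python
import math
import random

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def dense(layer, x):
    """Single-output dense layer: dot(weights, x) + bias."""
    weights, bias = layer
    if len(weights) != len(x):
        raise ValueError("input size does not match layer size")
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def init_layer(size, rng):
    """Hypothetical random initialization of a single-output dense layer."""
    return [rng.uniform(-0.1, 0.1) for _ in range(size)], 0.0

def relevance_score(n, ue, t_el, n_clk, params):
    """Avoidance-aware relevance for one article (lists are concatenated with +)."""
    gate = sigmoid(dense(params["psi1"], n + ue + t_el))  # W in Equation 6
    r_ic = dense(params["psi2"], n)                       # Equation 7, content term
    r_tue = dense(params["psi3"], ue + t_el)              # Equation 7, engagement/time term
    r_hat = gate * r_ic + (1.0 - gate) * r_tue            # Equation 6
    return sigmoid(n_clk * params["w_ctr"] + r_hat * params["w_r_hat"])  # Equation 5

rng = random.Random(0)
n = [rng.gauss(0, 1) for _ in range(8)]     # news embedding (illustrative size)
ue = [rng.gauss(0, 1) for _ in range(4)]    # user engagement embedding
t_el = [rng.gauss(0, 1) for _ in range(4)]  # time elapsed embedding
params = {
    "psi1": init_layer(16, rng),
    "psi2": init_layer(8, rng),
    "psi3": init_layer(8, rng),
    "w_ctr": 0.01,
    "w_r_hat": 1.0,
}
score = relevance_score(n, ue, t_el, n_clk=20, params=params)
print(0.0 < score < 1.0)  # True
```

The gate $W$ lets the model lean on the content score for some articles and on the engagement and time score for others, while the final sigmoid keeps the result in $(0, 1)$ regardless of how large the raw click count is.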

5.4. Avoidance-aware User Encoder

The core idea of AWRS is to integrate avoidance-awareness. Leveraging the CAUM model’s architecture (Qi et al., 2022), which facilitates the extraction of how closely candidate news articles relate to previously clicked ones, AWRS enhances the candidate-aware attention network with user engagement embeddings. This enhancement refines the comprehension of user preferences by examining the categories of news articles users interact with, as well as their levels of exposure and avoidance. This approach yields a nuanced depiction of user interests.

5.4.1. Historical and Candidate News Vector Concatenation

For each historical and candidate news item, we enhance its representation by appending the corresponding user engagement embeddings. This generates enriched feature vectors for a more tailored recommendation approach. Given a set $\mathbf{N}_{c}$ of candidate news articles and a set $\mathbf{H}_{u}$ of historical clicks for a user $u$, we concatenate the user engagement embedding $\bm{ue}$ for each item. Let $\mathscr{H}$ and $\mathscr{N}$ be the sets with concatenated user engagement embeddings. Formally, for each $i$-th item, $\bm{h}_{i}=[\textbf{h}_{u_{i}},\bm{ue}_{h_{i}}]$ and $\bm{n}_{i}=[\mathbf{n}_{c_{i}},\bm{ue}_{c_{i}}]$, where $\bm{ue}_{h_{i}}$ and $\bm{ue}_{c_{i}}$ are the user engagement embeddings for the $i$-th historical and candidate news items, respectively, and $[\cdot,\cdot]$ denotes vector concatenation. Thus, we have the sets:

(8) $\mathscr{H}=[\bm{h}_{1},\dots,\bm{h}_{M}],\quad\mathscr{N}=[\bm{n}_{c1},\dots,\bm{n}_{cL}]$
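The concatenation step amounts to appending one vector to another, item by item. The sketch below shows this with plain Python lists; the helper name `enrich` and the toy dimensions are assumptions made for illustration only.

```python
def enrich(news_vectors, engagement_embeddings):
    """Append each item's user engagement embedding to its news vector."""
    if len(news_vectors) != len(engagement_embeddings):
        raise ValueError("expected one engagement embedding per news item")
    return [list(v) + list(ue) for v, ue in zip(news_vectors, engagement_embeddings)]


# Two clicked articles with 2-d news vectors and 1-d engagement embeddings.
history = enrich([[1.0, 2.0], [3.0, 4.0]], [[0.9], [0.8]])
```

Each enriched vector has dimension equal to the news vector size plus the engagement embedding size, so every downstream layer that consumes these vectors must be sized accordingly.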

5.4.2. Avoidance-aware Self-Attention Layer

Given the history of clicked news articles $\mathscr{H}_{u}^{M}$, where $M$ represents the total number of articles previously clicked by the user $u$, we use multiple self-attention heads to assess similarities and extract relatedness information between the $i$-th and $j$-th historical clicks, $\textbf{h}_{i}$ and $\textbf{h}_{j}$, respectively:

(9) $\hat{r}^{k}_{i,j}=\textbf{q}^{T}_{i}\textbf{W}^{r}_{k}\bm{h}_{j},\quad\textbf{q}^{T}_{i}=\textbf{Q}_{u}\bm{h}_{i}$

here, $\hat{r}^{k}_{i,j}$ is the attention score from the $k$-th head, with $\textbf{Q}_{u}$ as the projection matrix and $\textbf{W}^{r}_{k}$ as the parameters of the $k$-th attention head. We adaptively select significant long-range relatedness metrics to model user interest in the candidate news $\mathscr{N}$ based on their contextual relevance:

(10) $r^{k}_{i,j}=\hat{r}^{k}_{i,j}+\textbf{q}^{T}_{c}\textbf{W}^{r}_{k}\bm{h}_{j}$

where $\textbf{q}^{T}_{c}=\textbf{Q}_{c}\mathscr{N}$. This enhances the attention score with candidate-specific adjustments. The augmented representation $\textbf{l}^{k}_{i}$ for each click is formulated through attention weights:

(11) $\textbf{l}^{k}_{i}=\textbf{W}^{k}_{o}\sum_{j=1}^{N}\gamma^{k}_{j}\bm{h}_{j},\quad\gamma^{k}_{j}=\frac{\exp(r^{k}_{i,j})}{\sum_{p=1}^{N}\exp(r^{k}_{i,p})}$

$\textbf{W}^{k}_{o}$ is the projection matrix for the $k$-th head. The comprehensive contextual representation $\textbf{l}_{i}$ for each click is derived by merging outputs from all $K$ attention heads. Note that by incorporating user engagement embeddings, our relatedness information now considers the influence of both avoidance and exposure, providing more context to user click patterns.
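One head of the candidate-aware attention in Equations 9 to 11 can be sketched as follows. This is an illustrative sketch rather than the authors' code: it handles a single enriched candidate vector `n_c`, works on plain Python lists, and the function and argument names (`candidate_aware_head`, `Q_u`, `Q_c`, `W_r`, `W_o`) are hypothetical stand-ins for the learned matrices.

```python
import math


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    top = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def candidate_aware_head(H, n_c, Q_u, Q_c, W_r, W_o):
    """Return (outputs, weights) for one attention head over the click history H."""
    q_c = matvec(Q_c, n_c)                 # candidate query, shared by all clicks
    keys = [matvec(W_r, h) for h in H]     # W_r h_j for every historical click
    outputs, weights = [], []
    for h_i in H:
        q_i = matvec(Q_u, h_i)
        # Eq. 9 (click-to-click score) plus the candidate term of Eq. 10.
        scores = [dot(q_i, k) + dot(q_c, k) for k in keys]
        gamma = softmax(scores)            # attention weights of Eq. 11
        pooled = [sum(g * h[d] for g, h in zip(gamma, H)) for d in range(len(h_i))]
        outputs.append(matvec(W_o, pooled))
        weights.append(gamma)
    return outputs, weights
```

The candidate term adds the same offset to a given key for every query, so clicks whose keys align with the candidate receive more attention across the whole history.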

5.4.3. Avoidance-aware CNN Layer

The historical news vectors undergo processing through a convolutional neural network (CNN) equipped with self-attention mechanisms. According to (Qi et al., 2022), this involves applying multiple filters to capture patterns among local contexts of adjacent clicks and candidate news, formulated as:

(12) $\textbf{s}_{i}=\textbf{W}_{awc}[\bm{h}_{i-h};\dots;\bm{h}_{i};\dots;\bm{h}_{i+h};\mathscr{N}]$

here, $\textbf{s}_{i}$ denotes the local contextual representation of the $i$-th click, $2h+1$ represents the CNN window size, and $\textbf{W}_{awc}$ refers to the parameters of the avoidance-aware filters. These local contextual representations $[s_{1},s_{2},\dots,s_{N}]$ of all clicked news encode candidate-aware short-term user interests. A unified contextual representation $\textbf{m}_{i}$ for each $i$-th click is then formed by aggregating $\textbf{l}_{i}$ and $\textbf{s}_{i}$, expressed as $\textbf{m}_{i}=\Phi_{1}[\textbf{s}_{i},\textbf{l}_{i}]$, where $\Phi_{1}$ denotes a dense layer.
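The windowed filter of Equation 12 can be illustrated with a small sketch. It assumes zero padding at the boundaries of the click history (the text does not state how edges are handled), a single candidate vector `n_c`, and a filter bank `W_awc` stored as a list of rows; the function name `local_context` is hypothetical.

```python
def local_context(H, n_c, W_awc, half_window):
    """Apply the filters to each window [h_{i-h}; ...; h_{i+h}; n_c]."""
    dim = len(H[0])
    zero = [0.0] * dim                     # ASSUMPTION: zero padding at the edges
    out = []
    for i in range(len(H)):
        window = []
        for j in range(i - half_window, i + half_window + 1):
            window.extend(H[j] if 0 <= j < len(H) else zero)
        window.extend(n_c)                 # the candidate is appended to every window
        out.append([sum(w * x for w, x in zip(row, window)) for row in W_awc])
    return out


# Three 1-d clicks, one 1-d candidate, and a single filter that sums its inputs.
s = local_context([[1.0], [2.0], [3.0]], [10.0], [[1.0, 1.0, 1.0, 1.0]], half_window=1)
```

Each filter row must have length `(2 * half_window + 1) * dim + len(n_c)`, which is why the candidate vector enlarges the filter compared with a plain 1-d convolution over clicks.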

5.4.4. Avoidance-aware Final Attention Layer

We use a candidate-aware attention network to model the importance of clicked news based on their relevance to avoidance-aware candidate news $\mathscr{N}$. This builds the avoidance-aware user embedding representation $\textbf{u}_{aw}$, given by $\textbf{u}_{aw}=\sum^{N}_{i=1}\alpha_{i}\textbf{m}_{i}$, where $\alpha_{i}$ is the weight of the $i$-th click:

(13) $\alpha_{i}=\frac{\exp(\Phi_{2}(\textbf{m}_{i},\mathscr{N}))}{\sum^{N}_{j=1}\exp(\Phi_{2}(\textbf{m}_{j},\mathscr{N}))}$

here, $\Phi_{2}$ is a dense layer. This approach encodes avoidance-aware user interests relevant to the candidate news into $\textbf{u}_{aw}$, enhancing interest matching accuracy. The general overview of the Avoidance-aware User Encoder is shown in Figure 5 (b).
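The pooling step in Equation 13 is a softmax-weighted sum of the click representations. In the sketch below, $\Phi_{2}$ is assumed to be a single linear unit applied to the concatenation of a click representation and one candidate vector; the names `attention_pool`, `phi2_w`, and `phi2_b` are hypothetical.

```python
import math


def attention_pool(M, n_c, phi2_w, phi2_b=0.0):
    """Return (u_aw, alpha): the pooled user vector and the per-click weights."""
    # ASSUMPTION: Phi_2 scores the concatenation [m_i, n_c] with one linear unit.
    scores = [sum(w * x for w, x in zip(phi2_w, m + n_c)) + phi2_b for m in M]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alpha = [e / total for e in exps]
    u_aw = [sum(a * m[d] for a, m in zip(alpha, M)) for d in range(len(M[0]))]
    return u_aw, alpha
```

With all-zero scoring weights every click receives the same weight, so the user vector reduces to the mean of the click representations; learned weights shift the mass toward clicks that match the candidate.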

5.4.5. Final Relevance Scores

The relevance scores combine the news candidate's avoidance-awareness $\mathscr{N}$ with the score from the avoidance-aware user vector $\textbf{u}_{aw}$. Specifically, the preliminary interest scores are calculated as $Int_{s}^{\prime}=\mathscr{N}^{T}\cdot\textbf{u}_{aw}$. However, to obtain the final scores, we apply our Avoidance-aware Relevance Predictor, which considers the time-varying nature of avoidance, as shown in Figure 4. Thus,

(14) $Int_{s}=(1-\eta)\cdot\mathbf{r}_{aw}+\eta\cdot Int_{s}^{\prime}$

where $\eta=\sigma(\Phi_{3}(\mathbf{u}_{aw}))$. Here, $\Phi_{3}$ is a dense layer, $\mathbf{r}_{aw}$ is the avoidance-aware relevance score defined earlier in Equation 5, and $\eta$ is computed from the user representation $\mathbf{u}_{aw}$ using a dense network with sigmoid activation. This approach allows our model to better capture patterns related to the information conveyed by the avoidance and exposure values of a news article.
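Equation 14 is a gated interpolation between two scores, which the following sketch makes concrete. It assumes the candidate vector and the user vector share the same dimension so that their dot product is defined, and it reduces $\Phi_{3}$ to a single linear unit; `final_score`, `phi3_w`, and `phi3_b` are hypothetical names.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def final_score(n_c, u_aw, r_aw, phi3_w, phi3_b=0.0):
    """Blend the predictor score r_aw with the user-candidate interest score."""
    interest = dot(n_c, u_aw)                      # preliminary score Int_s'
    eta = sigmoid(dot(phi3_w, u_aw) + phi3_b)      # user-dependent gate in (0, 1)
    return (1.0 - eta) * r_aw + eta * interest, eta
```

Since the gate is derived from the user vector alone, two users looking at the same article can weight the time-varying avoidance signal differently.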

5.5. Loss Function

Inspired by (Wu et al., 2019d), we use negative sampling during model training. For each clicked news item (positive sample), we randomly sample $K$ non-clicked items (negative samples) from the same impression and shuffle their order. The click probability score of the positive item is $\hat{y}^{+}$, and the scores for the $K$ negative items are $[\hat{y}^{-}_{1},\hat{y}^{-}_{2},\ldots,\hat{y}^{-}_{K}]$. Then, these scores are normalized using the softmax function given by

(15) $p_{i}=\frac{\exp(\hat{y}^{+}_{i})}{\exp(\hat{y}^{+}_{i})+\sum_{j=1}^{K}\exp(\hat{y}^{-}_{i,j})}$

We frame the click prediction task as a pseudo $(K+1)$-way classification problem. The loss function is the negative log-likelihood of all positive samples $S$: $\mathcal{L}=-\sum_{i\in S}\log(p_{i})$. This method trains the model to distinguish between clicked and non-clicked news items.
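Equation 15 and the loss can be checked numerically with a short sketch. It is a plain-Python illustration with hypothetical function names (`click_probability`, `nll_loss`); a real training loop would compute the same quantity on batched tensors.

```python
import math


def click_probability(pos_score, neg_scores):
    """Softmax probability of the positive item among its K sampled negatives."""
    scores = [pos_score] + list(neg_scores)
    top = max(scores)                          # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    return exps[0] / sum(exps)


def nll_loss(samples):
    """Negative log-likelihood over (positive score, negative scores) pairs."""
    return -sum(math.log(click_probability(p, negs)) for p, negs in samples)


# With K = 4 negatives and identical scores the model is at chance level (1/5).
chance = click_probability(0.0, [0.0, 0.0, 0.0, 0.0])
```

A model that cannot separate the clicked item from its four negatives therefore starts at a per-sample loss of $\log 5$, and the loss falls toward zero as the positive score pulls ahead.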

6. Experiment

6.1. Experimental Settings

6.1.1. Datasets

We evaluated AWRS on three datasets: MIND-small (English) (Wu et al., 2020a), Adressa one-week (Norwegian) (Gulla et al., 2017), and Nikkei (Japanese). Nikkei Inc. (https://www.nikkei.com/), based in Japan, is a leading media corporation renowned for its comprehensive coverage of business, economic, and financial news. Table 1 summarizes the train, validation, and test splits used in our study. For Nikkei, we analyzed 15,803 news articles, dividing user click behavior into training (Jan 16-20, 2023), validation (Jan 21, 2023), and testing (Jan 22, 2023) sets. The Adressa one-week dataset includes 22,136 news articles, with user click behaviors split into training (Jan 1-5, 2017), validation (Jan 6, 2017), and testing (Jan 7, 2017) sets. The MIND-small dataset comprises 93,698 news articles, with user click behaviors segmented into training (Nov 9-13, 2019), validation (Nov 14, 2019), and testing (Nov 15, 2019) sets.

Table 1. Dataset Statistics
| Dataset | Train (Impr. / Users) | Validation (Impr. / Users) | Test (Impr. / Users) |
| --- | --- | --- | --- |
| Adressa | 181,279 / 83,599 | 36,412 / 27,943 | 145,626 / 68,565 |
| MIND-small | 124,229 / 45,214 | 29,498 / 19,703 | 70,938 / 48,593 |
| Nikkei | 137,142 / 23,139 | 10,560 / 6,201 | 9,695 / 5,805 |

6.1.2. Baselines

We conducted a comparative analysis of the AWRS model against several state-of-the-art (SOTA) baseline models to evaluate its performance. The models compared include: (1) NRMS (Wu et al., 2019d), (2) NAML (Wu et al., 2019b), (3) LSTUR (An et al., 2019), (4) TANR (Wu et al., 2019a), (5) SentiRec (Wu et al., 2020b), (6) MINER (Li et al., 2022), (7) MINS (Wang and Lu, 2022), (8) CenNewsRec (Qi et al., 2020), (9) MANNeR-CR (Iana et al., 2024), (10) PP-REC (Qi et al., 2021), and (11) CAUM (Qi et al., 2022). For the Nikkei dataset and the MANNeR-CR baseline model, we had to reduce the maximum number of tokens due to memory constraints.

6.1.3. Evaluation Metrics

In accordance with prior research (Qi et al., 2022; Yang et al., 2023), we evaluate model performance using Area under the ROC Curve (AUC), Mean Reciprocal Rank (MRR), and normalized discounted cumulative gain (nDCG@5 and nDCG@10).
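For readers unfamiliar with these metrics, the sketch below computes them for a single impression with binary click labels. It is an illustration under common conventions, not the evaluation code used in the paper: reciprocal ranks are averaged over all clicked items in the impression, ties in AUC count as half a win, and the function names are hypothetical. Reported numbers are typically the mean of these per-impression values.

```python
import math


def _ranked_labels(labels, scores):
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]


def auc(labels, scores):
    """Fraction of (clicked, non-clicked) pairs that are ranked correctly."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mrr(labels, scores):
    """Mean reciprocal rank of the clicked items in one impression."""
    ranked = _ranked_labels(labels, scores)
    return sum(l / (rank + 1) for rank, l in enumerate(ranked)) / sum(labels)


def dcg_at_k(ranked, k):
    return sum((2 ** l - 1) / math.log2(rank + 2) for rank, l in enumerate(ranked[:k]))


def ndcg_at_k(labels, scores, k):
    """DCG of the predicted ranking divided by the DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(labels, reverse=True), k)
    return dcg_at_k(_ranked_labels(labels, scores), k) / ideal
```

Each function expects an impression with at least one clicked and one non-clicked item; impressions that violate this would need to be filtered out before averaging.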

Table 2. Performance comparison for the Nikkei, MIND-small, and Adressa one-week datasets. The best results are bolded, and the second best are underlined. All the results reported here used $D=5$.
| Model | Nikkei AUC | Nikkei MRR | Nikkei nDCG@5 | Nikkei nDCG@10 | MIND-small AUC | MIND-small MRR | MIND-small nDCG@5 | MIND-small nDCG@10 | Adressa one-week AUC | Adressa one-week MRR | Adressa one-week nDCG@5 | Adressa one-week nDCG@10 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| NRMS | 0.5140 | 0.3110 | 0.2117 | 0.3069 | 0.5000 | 0.2731 | 0.2532 | 0.3198 | 0.5847 | 0.2102 | 0.1824 | 0.2894 |
| NAML | 0.5011 | 0.3323 | 0.2324 | 0.3322 | 0.5000 | 0.2806 | 0.2606 | 0.3246 | 0.5000 | 0.2267 | 0.1984 | 0.2945 |
| LSTUR | 0.4953 | 0.2878 | 0.1929 | 0.2883 | 0.5000 | 0.3000 | 0.2830 | 0.3468 | 0.5358 | 0.3091 | 0.2931 | 0.3576 |
| TANR | 0.5007 | 0.3005 | 0.2059 | 0.3011 | 0.5224 | 0.2964 | 0.2773 | 0.3425 | 0.5359 | 0.2622 | 0.2491 | 0.3136 |
| SentiRec | 0.5000 | 0.3051 | 0.2042 | 0.2991 | 0.5289 | 0.2773 | 0.2642 | 0.3281 | 0.5313 | 0.2087 | 0.1754 | 0.2831 |
| MINER | 0.5439 | 0.3296 | 0.2342 | 0.3613 | 0.6141 | 0.2722 | 0.2591 | 0.2385 | 0.6012 | 0.2165 | 0.1908 | 0.3054 |
| MINS | 0.5007 | 0.2723 | 0.1674 | 0.2677 | 0.4266 | 0.1995 | 0.1756 | 0.2385 | 0.7018 | 0.3511 | 0.3837 | 0.4703 |
| CenNewsRec | 0.5249 | 0.2873 | 0.1679 | 0.2521 | 0.4837 | 0.1992 | 0.1770 | 0.2384 | 0.5980 | 0.3211 | 0.3194 | 0.3781 |
| MANNeR-CR | 0.5000 | 0.3422 | 0.2422 | 0.3451 | 0.5917 | 0.3018 | 0.2826 | 0.3469 | 0.5000 | 0.2248 | 0.1741 | 0.2481 |
| PP-REC | 0.4970 | 0.2731 | 0.1869 | 0.2808 | 0.5686 | 0.2912 | 0.2695 | 0.3301 | 0.6729 | 0.4335 | 0.4129 | 0.4714 |
| CAUM | 0.5606 | 0.3278 | 0.2391 | 0.3411 | 0.5528 | 0.3133 | 0.2972 | 0.3615 | 0.6300 | 0.2476 | 0.2434 | 0.3014 |
| AWRS | 0.6723 | 0.4973 | 0.4028 | 0.4781 | 0.5978 | 0.3267 | 0.3105 | 0.3722 | 0.7670 | 0.4826 | 0.5134 | 0.5627 |

6.1.4. Environment Configuration

During training, we used mixed precision and the Adam optimizer (Kingma and Ba, 2017). The learning rates were 1e-5 for MIND-small and Nikkei, and 1e-6 for Adressa one-week. We used $k=4$ negative samples for all datasets. Models were trained for 10 epochs on MIND-small and for 3 epochs on Adressa one-week and Nikkei, using a single NVIDIA A100 GPU, with implementations in the newsreclib framework (Iana et al., 2023). Upon publication, we plan to release parts of the code to the research community for further development and replication of our results. When calculating the avoidance and user engagement embeddings, we measured them every 1 hour for the MIND-small dataset, every 5 hours for the Adressa one-week dataset, and every 2 hours for the Nikkei dataset.
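The per-dataset measurement intervals imply that each timestamp maps to a discrete time bucket. A minimal sketch of that mapping is given below; the bucket widths come from the text, while the dictionary name `BUCKET_HOURS`, the function `bucket_index`, and the choice of measuring from the start of the data window are assumptions made for illustration.

```python
from datetime import datetime

# Measurement interval in hours for each dataset, as stated in the text.
BUCKET_HOURS = {"MIND-small": 1, "Adressa one-week": 5, "Nikkei": 2}


def bucket_index(ts, start, dataset):
    """Index of the measurement window containing ts, counted from start."""
    elapsed_hours = (ts - start).total_seconds() / 3600.0
    return int(elapsed_hours // BUCKET_HOURS[dataset])


# 5.5 hours after the start of the window.
start = datetime(2019, 11, 9)
ts = datetime(2019, 11, 9, 5, 30)
```

Under this scheme the same instant falls into bucket 5 with 1-hour windows, bucket 2 with 2-hour windows, and bucket 1 with 5-hour windows, so coarser intervals need fewer stored snapshots per article.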

7. Results

With the results at hand, we comprehensively evaluated AWRS by answering the following research questions:

  • RQ1 (Accuracy): Does the proposed AWRS method outperform existing baselines in terms of accurately predicting user-clicked news articles?

  • RQ2 (Effect of Each Component): How does each component of our proposed architecture contribute to the accuracy scores? Specifically, what is the impact of adding the Avoidance-aware User Encoder and Avoidance-aware Relevance Predictor components?

  • RQ3 (Effect of Varying $D$): To compute the user engagement embedding ($\bm{ue}$), we divided our avoidance region space ($av_{idx}$) and exposure per impression ($epi_{idx}$) into distinct regions determined by the constant $D$. How does accuracy vary when changing this constant?

  • RQ4 (Performance without PLMs): How does our model perform in the absence of pretrained language models (PLMs)?

  • RQ5 (Limitations): What are some of the limitations of our model compared to existing baselines?

7.1. RQ1 (Accuracy)

Table 2 presents a performance comparison between the baselines and our proposed AWRS model. As shown in the table, our model demonstrates improvements in nearly all key metrics, attributed to its enhanced awareness of avoidance and exposure. For instance, if a user opts to read an article about chess, a typically avoided topic, it suggests a strong interest in this subject. Conversely, an article about the Oscars, which is generally less avoided, might be clicked due to its current popularity rather than genuine interest. It is crucial to note that such patterns may change over time. For example, during a specific period involving controversial news about chess, articles on chess might become less avoided, whereas Oscars articles might be more avoided if they are far from the premiere date. Therefore, we argue that avoidance and time are closely related. This understanding allows us to delve deeper into our discussion for each dataset.

7.1.1. Nikkei

Our proposed AWRS model is the best-performing model for this dataset, with the second-best performance varying across different metrics. The second highest AUC was achieved by CAUM (Qi et al., 2022), the second best MRR and nDCG@5 by MANNeR-CR (Iana et al., 2024), and the second highest nDCG@10 by MINER (Li et al., 2022). All these baselines have complex architectures for encoding user behaviors, demonstrating their efficacy in predicting user preferences. However, by incorporating user engagement embeddings with our Avoidance-aware User Encoder alongside our Avoidance-aware Relevance Predictor, as indicated in Figures 4 and 5, our model achieves a better understanding of user patterns. This leads to an avoidance-aware contextualized assessment of each article, enabling our model to extract more information by understanding the avoidance of news articles.

7.1.2. MIND-small

For this dataset, our model outperforms others in terms of MRR, nDCG@5, and nDCG@10. However, it performs slightly worse than MINER (Li et al., 2022) for AUC. Generally, the second-best model was CAUM (Qi et al., 2022), for reasons similar to those elaborated for Nikkei. We argue that by fully capturing the avoidance behavior of news articles, our model better understands user patterns.

7.1.3. Adressa one-week

We employed various baselines, each with different advantages and disadvantages. The Adressa dataset contains click patterns for news articles published long ago. Despite encompassing the initial weeks of 2017, the dataset includes user interactions with very old news articles (e.g., a user clicking on a news article from the 2000s during the early weeks of 2017). Therefore, baselines such as PP-REC (Qi et al., 2021), which consider time as a key metric, perform very well on this dataset. However, our AWRS model not only incorporates time as a factor through the Time2Vec (Kazemi et al., 2019) module, but also considers avoidance, enhancing the contextualization for the model’s decision-making process.

7.2. RQ2, RQ3 and RQ4 (Ablation Studies)

We address research questions RQ2, RQ3, and RQ4 through the ablation studies presented in the following sections.

Table 3. Impact of the Avoidance-aware User Encoder and the Avoidance-aware Relevance Predictor modules on AWRS.
| Dataset | AWRS Model | AUC | MRR | nDCG@5 | nDCG@10 |
| --- | --- | --- | --- | --- | --- |
| MIND-small | only rel | 0.5818 | 0.3244 | 0.3083 | 0.3690 |
| MIND-small | only avoid | 0.5838 | 0.3077 | 0.2884 | 0.3511 |
| MIND-small | avoid + rel | 0.5978 | 0.3267 | 0.3105 | 0.3722 |
| Adressa one-week | only rel | 0.6354 | 0.3621 | 0.3948 | 0.4526 |
| Adressa one-week | only avoid | 0.5423 | 0.2564 | 0.2204 | 0.2499 |
| Adressa one-week | avoid + rel | 0.7670 | 0.4826 | 0.5134 | 0.5627 |
| Nikkei | only rel | 0.6362 | 0.4638 | 0.3701 | 0.4534 |
| Nikkei | only avoid | 0.6522 | 0.4889 | 0.3784 | 0.4568 |
| Nikkei | avoid + rel | 0.6723 | 0.4973 | 0.3784 | 0.4781 |

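Table 4. AWRS scores for different datasets with varying $D$ values.
| Dataset | Metric | D = 5 | D = 7 | D = 10 | D = 15 | D = 20 |
| --- | --- | --- | --- | --- | --- | --- |
| MIND-small | AUC | 0.5978 | 0.6213 | 0.6430 | 0.6479 | 0.6494 |
| MIND-small | MRR | 0.3267 | 0.3214 | 0.3499 | 0.3481 | 0.3409 |
| MIND-small | nDCG@5 | 0.3105 | 0.3049 | 0.3285 | 0.3289 | 0.3226 |
| MIND-small | nDCG@10 | 0.3722 | 0.3684 | 0.3914 | 0.3923 | 0.3864 |
| Adressa one-week | AUC | 0.7670 | 0.6561 | 0.6997 | 0.6443 | 0.7652 |
| Adressa one-week | MRR | 0.4826 | 0.4310 | 0.4598 | 0.4109 | 0.5055 |
| Adressa one-week | nDCG@5 | 0.5134 | 0.4172 | 0.4799 | 0.3869 | 0.5300 |
| Adressa one-week | nDCG@10 | 0.5627 | 0.4670 | 0.5062 | 0.4459 | 0.6033 |
| Nikkei | AUC | 0.6723 | 0.6466 | 0.6252 | 0.6283 | 0.6424 |
| Nikkei | MRR | 0.4973 | 0.4804 | 0.4168 | 0.4497 | 0.4494 |
| Nikkei | nDCG@5 | 0.4028 | 0.3732 | 0.3178 | 0.3405 | 0.3435 |
| Nikkei | nDCG@10 | 0.4781 | 0.4531 | 0.4096 | 0.4279 | 0.4330 |
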
Table 5. Performance comparison for the MIND-small dataset. All the results reported here used $D=5$ and GloVe.
| Model | AUC | MRR | nDCG@5 | nDCG@10 |
| --- | --- | --- | --- | --- |
| SentiRec | 0.5425 | 0.3060 | 0.2826 | 0.3486 |
| TANR | 0.5389 | 0.3240 | 0.3039 | 0.3683 |
| LSTUR-ini | 0.5482 | 0.3194 | 0.2995 | 0.3644 |
| NAML | 0.5010 | 0.3350 | 0.3176 | 0.3802 |
| NRMS | 0.5571 | 0.2924 | 0.2730 | 0.3370 |
| GLORY | 0.6603 | 0.2978 | 0.3218 | 0.3825 |
| CAUM | 0.6200 | 0.3414 | 0.3207 | 0.3854 |
| AWRS | 0.6613 | 0.3425 | 0.3249 | 0.3896 |

7.2.1. RQ2 (Effect of Each Component)

To assess the influence of each component in our architecture, specifically the Avoidance-aware User Encoder and the Avoidance-aware Relevance Predictor, we conducted experiments by removing each component separately and calculating the AUC, MRR, nDCG@5, and nDCG@10 scores for the MIND-small, Adressa one-week, and Nikkei datasets. As shown in Table 3, each component on its own generally underperforms the configuration that uses both. We argue that this is because the Avoidance-aware User Encoder captures how a user's click history relates to avoidance: it analyzes how strongly the news articles that the user clicked on were avoided. For example, is the user more interested in less avoided or more avoided news articles? The Avoidance-aware Relevance Predictor, in turn, captures the temporal and global influence of avoidance, so the impact of avoidance on which news article is more relevant can vary depending on the timing of the recommendation. This dual approach enhances the model's ability to accurately predict user preferences.

7.2.2. RQ3 (Effect of Varying $D$)

To investigate the impact of adjusting the parameter $D$ on the AWRS model, we conducted experiments with values $D=5,7,10,15,$ and $20$. As detailed in Table 4, varying $D$ has a notable effect on model performance, albeit with nuanced outcomes. For the Nikkei dataset, the best scores on every metric are observed at $D=5$, followed by $D=7$. For the Adressa one-week dataset, MRR, nDCG@5, and nDCG@10 peak at $D=20$, while AUC is marginally higher at $D=5$ (0.7670 versus 0.7652). In the case of the MIND-small dataset, the highest AUC score emerges at $D=20$, with MRR peaking at $D=10$, and both nDCG@5 and nDCG@10 showing optimal results at $D=15$. Our findings suggest that adjusting $D$ influences the embedding space's complexity, potentially leading to overfitting or underfitting of avoidance behavior in news article recommendations. Thus, our grid search aimed to pinpoint the most effective $D$ values tailored to each dataset and metric.
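To make the role of $D$ concrete, the sketch below discretizes a value into one of $D$ regions. It rests on assumptions that are not confirmed by this section: avoidance and exposure per impression are taken to lie in $[0,1]$, the regions are equal-width, and the two region indices are combined into a single cell of a $D\times D$ grid. The names `region_index` and `engagement_cell` are hypothetical.

```python
def region_index(value, D):
    """Map a value in [0, 1] to one of D equal-width regions (0 .. D-1)."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("value must lie in [0, 1]")
    return min(int(value * D), D - 1)      # value == 1.0 falls in the last region


def engagement_cell(av, epi, D):
    """ASSUMPTION: combine both indices into one cell id of a D x D grid."""
    return region_index(av, D) * D + region_index(epi, D)


cell = engagement_cell(0.5, 1.0, D=5)      # region 2 for avoidance, region 4 for EPI
```

Under this reading, a larger $D$ yields finer regions and more distinct cells to embed (25 cells at $D=5$ versus 400 at $D=20$), which is one way the choice of $D$ could trade resolution against the amount of data available per cell.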

7.2.3. RQ4 (Performance without PLMs)

We investigated how replacing pretrained language models (PLMs) with GloVe embeddings (Pennington et al., 2014) would affect our model's performance. For evaluation, we used the same data split of the MIND-small dataset described in Section 6.1.1. For a more comprehensive evaluation, this comparison includes several baseline models: GLORY (Yang et al., 2023), SentiRec (Wu et al., 2020b), CAUM (Qi et al., 2022), TANR (Wu et al., 2019a), NRMS (Wu et al., 2019d), NAML (Wu et al., 2019b), and LSTUR (An et al., 2019). Table 5 shows that AWRS outperforms these baselines on all four metrics. The reduced performance gains compared to PLMs are due to PLMs' advanced capability in capturing more detailed contextual information from news articles.

7.3. RQ5 (Limitations)

From our exploration of current methodologies in news article recommendation systems, we identify two notable approaches. Firstly, there are methods like PP-REC (Qi et al., 2021) that leverage contextual information such as time to enhance recommendations. Secondly, methods such as those discussed in GLORY (Yang et al., 2023) utilize global information derived from static global graphs. Our proposed model aligns with the former category, leveraging time-varying avoidance data of news articles to enhance user recommendations. However, like all models, our approach has its drawbacks. For instance, one limitation of our model is its increased memory requirements due to the need for extensive per-unit-time information such as avoidance ($Av$) and Exposure Per Impression ($EPI$). Despite this increased memory demand, we did not observe a corresponding increase in computational training time for our models.

8. Conclusion

In this study, we emphasize the importance of avoidance awareness in news recommendation. Our results indicate that integrating avoidance behavior can improve the predictive accuracy of recommender systems. Analyzing the time-varying avoidance behavior around news articles provides deeper insight into user preferences and allows this contextual information to be incorporated into the recommender system. By integrating our Avoidance-aware Relevance Predictor and Avoidance-aware User Encoder modules, we have developed a novel model named AWRS. This model demonstrates robust performance across three real-world datasets, showing improvements in various benchmark metrics. Our approach not only improves accuracy but also establishes a foundation for future research into understanding user behavior through avoidance.

Acknowledgements.
This work is partially supported by the “Joint Usage/Research Center for Interdisciplinary Large-scale Information Infrastructures (JHPCN)” in Japan (Project ID: jh241004), JSPS KAKENHI Grant Number 23K28098, and the Monbukagakusho: MEXT (Ministry of Education, Culture, Sports, Science and Technology - Japan) scholarship.

References

  • An et al. (2019) Mingxiao An, Fangzhao Wu, Chuhan Wu, Kun Zhang, Zheng Liu, and Xing Xie. 2019. Neural News Recommendation with Long- and Short-term User Representations. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Anna Korhonen, David Traum, and Lluís Màrquez (Eds.). Association for Computational Linguistics, Florence, Italy, 336–345. https://doi.org/10.18653/v1/P19-1033
  • Bae et al. (2023) Hong-Kyun Bae, Jeewon Ahn, Dongwon Lee, and Sang-Wook Kim. 2023. LANCER: A Lifetime-Aware News Recommender System. Proceedings of the AAAI Conference on Artificial Intelligence 37, 4 (Jun. 2023), 4141–4148. https://doi.org/10.1609/aaai.v37i4.25530
  • Choi et al. (2022) Seonghwan Choi, Hyeondey Kim, and Manjun Gim. 2022. Do Not Read the Same News! Enhancing Diversity and Personalization of News Recommendation. In Companion Proceedings of the Web Conference 2022 (Virtual Event, Lyon, France) (WWW ’22). Association for Computing Machinery, New York, NY, USA, 1211–1215. https://doi.org/10.1145/3487553.3524936
  • Fitzpatrick (2022) Neil Fitzpatrick. 2022. No News is Not Good News: The Implications of News Fatigue and News Avoidance in a Pandemic World. Athens Journal of Mass Media and Communications (2022). https://api.semanticscholar.org/CorpusID:246590237
  • Gharahighehi and Vens (2021) Alireza Gharahighehi and Celine Vens. 2021. Diversification in session-based news recommender systems. Personal and Ubiquitous Computing 27, 1 (July 2021), 5–15. https://doi.org/10.1007/s00779-021-01606-4
  • Gulla et al. (2017) Jon Atle Gulla, Lemei Zhang, Peng Liu, Özlem Özgöbek, and Xiaomeng Su. 2017. The Adressa dataset for news recommendation. In Proceedings of the International Conference on Web Intelligence (Leipzig, Germany) (WI ’17). Association for Computing Machinery, New York, NY, USA, 1042–1048. https://doi.org/10.1145/3106426.3109436
  • He et al. (2021) Pengcheng He, Xiaodong Liu, Jianfeng Gao, and Wei Chen. 2021. DeBERTa: Decoding-Enhanced BERT with Disentangled Attention. In 2021 International Conference on Learning Representations. https://www.microsoft.com/en-us/research/publication/deberta-decoding-enhanced-bert-with-disentangled-attention-2/
  • Heitz et al. (2022) Lucien Heitz, Juliane A. Lischka, Alena Birrer, Bibek Paudel, Suzanne Tolmeijer, Laura Laugwitz, and Abraham Bernstein. 2022. Benefits of Diverse News Recommendations for Democracy: A User Study. Digital Journalism 10, 10 (2022), 1710–1730. https://doi.org/10.1080/21670811.2021.2021804
  • Huang et al. (2013) Po-Sen Huang, Xiaodong He, Jianfeng Gao, Li Deng, Alex Acero, and Larry Heck. 2013. Learning deep structured semantic models for web search using clickthrough data. In Proceedings of the 22nd ACM International Conference on Information & Knowledge Management (San Francisco, California, USA) (CIKM ’13). Association for Computing Machinery, New York, NY, USA, 2333–2338. https://doi.org/10.1145/2505515.2505665
  • Iana et al. (2023) Andreea Iana, Goran Glavaš, and Heiko Paulheim. 2023. NewsRecLib: A PyTorch-Lightning Library for Neural News Recommendation. arXiv:2310.01146 [cs.IR]
  • Iana et al. (2024) Andreea Iana, Goran Glavaš, and Heiko Paulheim. 2024. Train Once, Use Flexibly: A Modular Framework for Multi-Aspect Neural News Recommendation. arXiv:2307.16089 [cs.IR]
  • Kazemi et al. (2019) Seyed Mehran Kazemi, Rishab Goel, Sepehr Eghbali, Janahan Ramanan, Jaspreet Sahota, Sanjay Thakur, Stella Wu, Cathal Smyth, Pascal Poupart, and Marcus Brubaker. 2019. Time2Vec: Learning a Vector Representation of Time. arXiv:1907.05321 [cs.LG]
  • Kingma and Ba (2017) Diederik P. Kingma and Jimmy Ba. 2017. Adam: A Method for Stochastic Optimization. arXiv:1412.6980 [cs.LG]
  • Kummervold et al. (2021) Per E Kummervold, Javier De la Rosa, Freddy Wetjen, and Svein Arne Brygfjeld. 2021. Operationalizing a National Digital Library: The Case for a Norwegian Transformer Model. In Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), Simon Dobnik and Lilja Øvrelid (Eds.). Linköping University Electronic Press, Sweden, Reykjavik, Iceland (Online), 20–29. https://aclanthology.org/2021.nodalida-main.3
  • Li et al. (2022) Jian Li, Jieming Zhu, Qiwei Bi, Guohao Cai, Lifeng Shang, Zhenhua Dong, Xin Jiang, and Qun Liu. 2022. MINER: Multi-Interest Matching Network for News Recommendation. In Findings of the Association for Computational Linguistics: ACL 2022, Smaranda Muresan, Preslav Nakov, and Aline Villavicencio (Eds.). Association for Computational Linguistics, Dublin, Ireland, 343–352. https://doi.org/10.18653/v1/2022.findings-acl.29
  • Li and Wang (2019) Miaomiao Li and Licheng Wang. 2019. A Survey on Personalized News Recommendation Technology. IEEE Access PP (10 2019), 1–1. https://doi.org/10.1109/ACCESS.2019.2944927
  • Liu et al. (2019) Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv:1907.11692 [cs.CL]
  • Moreira et al. (2019) Gabriel De Souza P. Moreira, Dietmar Jannach, and Adilson Marques Da Cunha. 2019. Contextual Hybrid Session-Based News Recommendation With Recurrent Neural Networks. IEEE Access 7 (2019), 169185–169203. https://doi.org/10.1109/access.2019.2954957
  • Okura et al. (2017) Shumpei Okura, Yukihiro Tagami, Shingo Ono, and Akira Tajima. 2017. Embedding-based News Recommendation for Millions of Users. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (Halifax, NS, Canada) (KDD ’17). Association for Computing Machinery, New York, NY, USA, 1933–1942. https://doi.org/10.1145/3097983.3098108
  • Pariser (2011) Eli Pariser. 2011. The Filter Bubble: What the Internet Is Hiding from You. The Penguin Group.
  • Pennington et al. (2014) Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Alessandro Moschitti, Bo Pang, and Walter Daelemans (Eds.). Association for Computational Linguistics, Doha, Qatar, 1532–1543. https://doi.org/10.3115/v1/D14-1162
  • Qi et al. (2021) Tao Qi, Fangzhao Wu, Chuhan Wu, and Yongfeng Huang. 2021. PP-Rec: News Recommendation with Personalized User Interest and Time-aware News Popularity. arXiv:2106.01300 [cs.IR]
  • Qi et al. (2022) Tao Qi, Fangzhao Wu, Chuhan Wu, and Yongfeng Huang. 2022. News Recommendation with Candidate-aware User Modeling. arXiv:2204.04726 [cs.IR]
  • Qi et al. (2020) Tao Qi, Fangzhao Wu, Chuhan Wu, Yongfeng Huang, and Xing Xie. 2020. Privacy-Preserving News Recommendation Model Learning. In Findings of the Association for Computational Linguistics: EMNLP 2020, Trevor Cohn, Yulan He, and Yang Liu (Eds.). Association for Computational Linguistics, Online, 1423–1432. https://doi.org/10.18653/v1/2020.findings-emnlp.128
  • Schrøder (2019) K Schrøder. 2019. What do news readers really want to read about? How relevance works for news audiences. Technical Report Feb 2019. 1–36 pages.
  • Villi et al. (2022) Mikko Villi, Tali Aharoni, Keren Tenenboim-Weinblatt, Pablo J. Boczkowski, Kaori Hayashi, Eugenia Mitchelstein, Akira Tanaka, and Neta Kligler-Vilenchik. 2022. Taking a Break from News: A Five-nation Study of News Avoidance in the Digital Era. Digital Journalism 10, 1 (2022), 148–164. https://doi.org/10.1080/21670811.2021.1904266
  • Wang et al. (2018) Hongwei Wang, Fuzheng Zhang, Xing Xie, and Minyi Guo. 2018. DKN: Deep Knowledge-Aware Network for News Recommendation. arXiv:1801.08284 [stat.ML]
  • Wang and Lu (2022) Rongyao Wang and Wenpeng Lu. 2022. Modeling Multi-interest News Sequence for News Recommendation. arXiv:2207.07331 [cs.IR]
  • Wu et al. (2019b) Chuhan Wu, Fangzhao Wu, Mingxiao An, Jianqiang Huang, Yongfeng Huang, and Xing Xie. 2019b. Neural News Recommendation with Attentive Multi-View Learning. arXiv:1907.05576 [cs.CL]
  • Wu et al. (2019c) Chuhan Wu, Fangzhao Wu, Mingxiao An, Jianqiang Huang, Yongfeng Huang, and Xing Xie. 2019c. NPA: Neural News Recommendation with Personalized Attention. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD ’19). ACM. https://doi.org/10.1145/3292500.3330665
  • Wu et al. (2019a) Chuhan Wu, Fangzhao Wu, Mingxiao An, Yongfeng Huang, and Xing Xie. 2019a. Neural News Recommendation with Topic-Aware News Representation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Anna Korhonen, David Traum, and Lluís Màrquez (Eds.). Association for Computational Linguistics, Florence, Italy, 1154–1159. https://doi.org/10.18653/v1/P19-1110
  • Wu et al. (2019d) Chuhan Wu, Fangzhao Wu, Suyu Ge, Tao Qi, Yongfeng Huang, and Xing Xie. 2019d. Neural News Recommendation with Multi-Head Self-Attention. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Kentaro Inui, Jing Jiang, Vincent Ng, and Xiaojun Wan (Eds.). Association for Computational Linguistics, Hong Kong, China, 6389–6394. https://doi.org/10.18653/v1/D19-1671
  • Wu et al. (2023) Chuhan Wu, Fangzhao Wu, Yongfeng Huang, and Xing Xie. 2023. Personalized News Recommendation: Methods and Challenges. ACM Trans. Inf. Syst. 41, 1, Article 24 (Jan. 2023), 50 pages. https://doi.org/10.1145/3530257
  • Wu et al. (2020b) Chuhan Wu, Fangzhao Wu, Tao Qi, and Yongfeng Huang. 2020b. SentiRec: Sentiment Diversity-aware Neural News Recommendation. In Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing, Kam-Fai Wong, Kevin Knight, and Hua Wu (Eds.). Association for Computational Linguistics, Suzhou, China, 44–53. https://aclanthology.org/2020.aacl-main.6
  • Wu et al. (2022a) Chuhan Wu, Fangzhao Wu, Tao Qi, and Yongfeng Huang. 2022a. End-to-end Learnable Diversity-aware News Recommendation. arXiv:2204.00539 [cs.IR]
  • Wu et al. (2022b) Chuhan Wu, Fangzhao Wu, Tao Qi, Chenliang Li, and Yongfeng Huang. 2022b. Is News Recommendation a Sequential Recommendation Task?. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’22). Association for Computing Machinery, New York, NY, USA, 2382–2386. https://doi.org/10.1145/3477495.3531862
  • Wu et al. (2020a) Fangzhao Wu, Ying Qiao, Jiun-Hung Chen, Chuhan Wu, Tao Qi, Jianxun Lian, Danyang Liu, Xing Xie, Jianfeng Gao, Winnie Wu, and Ming Zhou. 2020a. MIND: A Large-scale Dataset for News Recommendation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Dan Jurafsky, Joyce Chai, Natalie Schluter, and Joel Tetreault (Eds.). Association for Computational Linguistics, Online, 3597–3606. https://doi.org/10.18653/v1/2020.acl-main.331
  • Yang et al. (2023) Boming Yang, Dairui Liu, Toyotaro Suzumura, Ruihai Dong, and Irene Li. 2023. Going Beyond Local: Global Graph-Enhanced Personalized News Recommendations. In Proceedings of the 17th ACM Conference on Recommender Systems (RecSys ’23). ACM. https://doi.org/10.1145/3604915.3608801