Recall-Augmented Ranking: Enhancing Click-Through Rate Prediction Accuracy with Cross-Stage Data

Junjie Huang (ORCID 0000-0002-5637-0735), Shanghai Jiao Tong University, Shanghai, China; Guohao Cai (ORCID 0000-0002-9000-857X), Huawei Noah’s Ark Lab, Shenzhen, China; Jieming Zhu (ORCID 0000-0002-5666-8320), Huawei Noah’s Ark Lab, Shenzhen, China; Zhenhua Dong (ORCID 0000-0002-2231-4663), Huawei Noah’s Ark Lab, Shenzhen, China; Ruiming Tang (ORCID 0000-0002-9224-2431), Huawei Noah’s Ark Lab, Shenzhen, China; Weinan Zhang (ORCID 0000-0002-0127-2425), Shanghai Jiao Tong University, Shanghai, China; and Yong Yu (ORCID 0000-0003-0281-8271), Shanghai Jiao Tong University, Shanghai, China
(2024)
Abstract.

Click-through rate (CTR) prediction plays an indispensable role in online platforms. Numerous models have been proposed to capture users’ shifting preferences by leveraging user behavior sequences. However, these historical sequences often suffer from severe homogeneity and scarcity compared to the extensive item pool. Relying solely on such sequences for user representations is inherently restrictive, as user interests extend beyond the scope of items they have previously engaged with. To address this challenge, we propose a data-driven approach to enrich user representations. We recognize user profiling and recall items as two ideal data sources within the cross-stage framework, encompassing the u2u (user-to-user) and i2i (item-to-item) aspects respectively. In this paper, we propose a novel architecture named Recall-Augmented Ranking (RAR). RAR consists of two key sub-modules, which synergistically gather information from a vast pool of look-alike users and recall items, resulting in enriched user representations. Notably, RAR is orthogonal to many existing CTR models, allowing for consistent performance improvements in a plug-and-play manner. Extensive experiments are conducted, which verify the efficacy and compatibility of RAR against the SOTA methods.

Recommender systems, Cross-stage, CTR prediction
Journal year: 2024. Copyright: ACM licensed. Conference: Companion Proceedings of the ACM Web Conference 2024; May 13–17, 2024; Singapore, Singapore. Book title: Companion Proceedings of the ACM Web Conference 2024 (WWW ’24 Companion), May 13–17, 2024, Singapore, Singapore. DOI: 10.1145/3589335.3651551. ISBN: 979-8-4007-0172-6/24/05. CCS: Information systems, Recommender systems.

1. Introduction

Recommender systems have been widely deployed to save users from information overload. Among them, CTR prediction is an essential task, which is to predict the probability that a user will click on an item under a particular context, enhancing both user experience and platform revenue.

Recently, many models have been proposed to extract user interest based on historical behavior sequences. However, items in user behavior sequences often exhibit homogeneity and scarcity versus the large-scale item pool, which is detailed in Section 2. Moreover, existing models often rely on target attention mechanisms, assigning higher scores to repetitive, similar items, reinforcing a cycle of homogeneity. In Figure 1, we provide an example. When a user buys lipstick, existing models often suggest similar products. However, the user might prefer exploring related items such as perfume or earrings, seeking variety beyond her initial purchase, even without previous interactions with these items. Therefore, we aim to enrich user representations from a data-driven perspective, incorporating diverse sources of information to enhance accuracy.

Figure 1. An illustrated example for motivations of RAR.

Moreover, CTR predictions traditionally focus on single user-item interactions and often overlook the interrelationships across various users and items, resulting in inadequate long-tail modeling. On the contrary, the recall stage inherently generates similar user-item lists, providing cross-instance modeling capability. We recognize user profiling and recall items as two ideal data sources within the cross-stage framework, encompassing the u2u and i2i aspects respectively. In this paper, we are interested in how to leverage these cross-stage data to enhance CTR prediction accuracy rather than how to construct the two sets.

In this paper, we propose a novel architecture named Recall-Augmented Ranking (RAR) to enhance model accuracy based on cross-stage data. RAR consists of two key components: the Cross-Stage User & Item Selection Module and the Co-Interaction Module. These sub-modules efficiently gather information from a broad spectrum of look-alike users and recall items, thereby enriching user representations. Note that the Co-Interaction Module performs set-to-set modeling, which has not been previously explored in the CTR prediction task.

In summary, the contributions of the paper are as follows:

  • We shed light on the limitations of relying solely on user behavior sequences to model user preferences. To address this inadequacy, we propose a novel architecture RAR, which leverages cross-stage data to enrich user representation.

  • RAR contains two extra data sources, namely the look-alike user set and recall item set. It is the first work that incorporates set-to-set modeling into CTR prediction to the best of our knowledge.

  • RAR serves as a framework capable of enhancing the performance of numerous existing CTR prediction models. Comprehensive experiments show RAR’s superior performance, effectiveness and compatibility with a wide variety of models.

2. Background

  • Motivation of RAR: We provide an analysis of user behavior in the Taobao dataset (https://tianchi.aliyun.com/dataset/649). Figure 2(a) illustrates the scarcity of user historical sequences, with the majority of users having interacted with only a minuscule fraction of the total number of available items. Figure 2(b) highlights the homogeneity of user behavior, with most of a given user's activity concentrated in four to five categories out of thousands. Furthermore, traditional CTR models focus on single user-item interactions, yet overlook broader interrelationships. Conversely, the recall stage facilitates cross-instance modeling by linking similar user-item lists.

    Figure 2. Observations of scarcity and homogeneity of user behavior sequence in Taobao dataset.
  • Cascade Ranking System and User Profiling: In modern information retrieval applications, a cascade ranking system is often used to balance the efficiency and effectiveness. The system includes a variety of rankers. Each stage selects the top-k items it receives and feeds them to the next stage. Among them, recall and ranking are two common stages. Besides, look-alike methods have become a core component of online advertising and marketing, which are intended to identify similar users from a small user set.

Table 1. Notations and descriptions
| Notation | Description |
|---|---|
| $k_l, k_r$ | The number of selected look-alike users and recall items respectively. |
| $S_{\mathcal{L}}, S_{\mathcal{R}}$ | Similarity score matrix of look-alike users and recall items. |
| $E_{\mathcal{L}}, E_{\mathcal{R}}$ | Embedding matrix of look-alike users and recall items respectively. |
| $E_{\mathcal{L}^{\prime}}, E_{\mathcal{R}^{\prime}}$ | Embedding matrix of selected look-alike users and recall items. |
| $E_{\mathcal{L}H}^{\prime}, E_{\mathcal{R}H}^{\prime}$ | High-order representation of selected look-alike users and recall items. |
| $P$ | Hash projection matrix in selection modules. |

3. Approach

3.1. Cross-Stage User/Item Selection Module

The Cross-Stage User/Item Selection Module selects the most similar users and the most relevant items. The selection process can be abstracted into two steps; we take the selection of recall items as an example. First, the similarity between the target item and each recall item is measured by a similarity function $f(\cdot)$. Then the top-k most relevant recall items are selected based on the similarity scores, as formalized in Equations 1 and 2. The key notations and their descriptions are summarized in Table 1.

(1) $S_{\mathcal{L}} = f(E_{\mathcal{L}}, e^{u}), \quad S_{\mathcal{R}} = f(E_{\mathcal{R}}, e^{i})$
(2) $E_{\mathcal{L}^{\prime}} = [e^{u}_{k_{1}} \; e^{u}_{k_{2}} \; \ldots \; e^{u}_{k_{l}}]^{T}, \quad E_{\mathcal{R}^{\prime}} = [e^{i}_{k_{1}} \; e^{i}_{k_{2}} \; \ldots \; e^{i}_{k_{r}}]^{T}$

An intuitive approach is to search for the k nearest neighbors in the embedding space by inner product. However, the huge number of multiplications makes real-world deployment impractical. Considering the selection complexity, we use the SimHash function in our experiments.
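The brute-force baseline described above can be sketched as follows. This is a minimal illustration that assumes $f(\cdot)$ is a plain inner product; the function name `top_k_by_inner_product` and the toy embeddings are hypothetical and do not come from the paper.

```python
import numpy as np

def top_k_by_inner_product(candidates, target, k):
    """Score every candidate against the target and keep the k best.

    candidates: (n, d) embedding matrix (E_R or E_L in the paper's notation).
    target:     (d,) embedding of the target item or user.
    """
    scores = candidates @ target                   # Eq. (1): one score per candidate
    top = np.argsort(-scores, kind="stable")[:k]   # indices by descending score
    return top, candidates[top]                    # Eq. (2): stacked selected embeddings

# Toy data (hypothetical): four recall items in a 2-d embedding space.
E_R = np.array([[1.0, 0.0], [0.5, 0.5], [-1.0, 0.0], [0.9, 0.1]])
e_i = np.array([1.0, 0.0])
idx, E_R_selected = top_k_by_inner_product(E_R, e_i, k=2)
```

Every call costs one multiplication per candidate and per dimension, which is the overhead that motivates the hash-based selection.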

SimHash, leveraging locality-sensitive properties, ensures similar outputs for similar inputs through random projection and signed axes, reducing embeddings to binary fingerprints. This process is detailed in Equations 3 and 4, where $\boldsymbol{e}^{i}_{k}$ stands for the embedding of the $k^{th}$ recall item and $m$ indexes the $m^{th}$ hash function in the hash function set. It reduces storage and speeds up selection by using Hamming distance for efficient comparison.

(3) $\operatorname{sig}^{i}_{k}[m] = \sum_{n=1}^{d_{2}} \operatorname{sgn}\left(\boldsymbol{e}^{i}_{k}[n] \cdot P[n][m]\right)$
(4) $\operatorname{sig}^{i}_{k}[m] \leftarrow \mathbb{1}_{\operatorname{sig}^{i}_{k}[m] > 0}\left(\operatorname{sig}^{i}_{k}[m]\right)$
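As an illustration of Equations 3 and 4, the sketch below computes binary signatures and selects the candidates closest to the target in Hamming distance. It follows the equations as printed, with signs taken before the summation; a common SimHash variant instead takes the sign of the projected sum, which is not shown here. The helper names, the Gaussian projection matrix and the toy sizes are assumptions made for this example.

```python
import numpy as np

def simhash_signature(emb, P):
    """emb: (d,) embedding, P: (d, m) projection matrix. Returns an m-bit 0/1 vector."""
    # Eq. (3): for each hash m, sum the signs of the products emb[n] * P[n][m].
    votes = np.sign(emb[:, None] * P).sum(axis=0)
    # Eq. (4): the bit is 1 where the summed votes are positive, else 0.
    return (votes > 0).astype(np.uint8)

def select_top_k(cand_embs, target_emb, P, k):
    """Return indices of the k candidates with the smallest Hamming distance."""
    target_sig = simhash_signature(target_emb, P)
    cand_sigs = np.stack([simhash_signature(e, P) for e in cand_embs])
    hamming = (cand_sigs != target_sig).sum(axis=1)
    return np.argsort(hamming, kind="stable")[:k]

# Hypothetical sizes: 8-d embeddings, 16 hash functions.
rng = np.random.default_rng(0)
d, m = 8, 16
P = rng.standard_normal((d, m))
target = rng.standard_normal(d)
# Candidate 0 is identical to the target, so its distance is 0.
cands = np.stack([target, -target, rng.standard_normal(d)])
selected = select_top_k(cands, target, P, k=1)
```

In practice the candidate signatures would be precomputed and stored, so that only the comparison runs at selection time.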
Figure 3. RAR applied in existing CTR prediction models.

3.2. Co-Interaction Module

The Co-Interaction Module provides fine-grained set-to-set modeling. It improves upon the simplistic equal weighting of all selected recall items, which overlooks hierarchical information. We introduce a matching matrix to assess user-item interest compatibility. Each matching score is the inner product of high-level latent vectors, which are obtained in Equation 5. We then compute the matching matrix in Equation 6, where $Sigmoid(\cdot)$ maps the matching scores to (0,1).

(5) $E_{\mathcal{L}H}^{\prime} = MLP(E_{\mathcal{L}^{\prime}}), \quad E_{\mathcal{R}H}^{\prime} = MLP(E_{\mathcal{R}^{\prime}})$
(6) $\mathcal{M}_{M} = Sigmoid(E_{\mathcal{L}H}^{\prime} \cdot E_{\mathcal{R}H}^{\prime T})$

To provide the model with a clearer indication of which recall items are more important, the exposure signal $y_{ui}^{ep}$ is utilized to supervise the training of the matching matrix. As the exposure signal is very sparse, we define $y_{ui}^{ep}$ over the whole set of selected look-alike users, as in Equation 8.

(7) $\hat{y}_{ui}^{ep} = \mathcal{M}_{M}$
(8) $y_{ui}^{ep} = \begin{cases} 0, & i \text{ has never been exposed to } u^{\prime} \text{ for } \forall u^{\prime} \text{ in } \mathcal{L}_{u}^{\prime} \\ 1, & \text{otherwise} \end{cases}$
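Equation 8 can be read as a set-level check, sketched below for a single recall item. The exposure log is modeled as a hypothetical dictionary from user id to the set of items shown to that user; this data layout and the function name are assumptions for illustration only.

```python
def exposure_label(item_id, lookalike_users, exposed_items_by_user):
    """Eq. (8): 1 if the item was exposed to at least one selected look-alike user, else 0."""
    return int(any(item_id in exposed_items_by_user.get(u, set())
                   for u in lookalike_users))

# Hypothetical exposure log for three look-alike users.
exposure_log = {"u1": {"a", "b"}, "u2": {"c"}, "u3": set()}
label_b = exposure_label("b", ["u1", "u2", "u3"], exposure_log)  # exposed to u1
label_z = exposure_label("z", ["u1", "u2", "u3"], exposure_log)  # never exposed
```

Aggregating over the whole look-alike set gives more positive labels than a per-user exposure check would.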

Finally, the matching matrix is averaged by row and by column to obtain the item and user weighting vectors. We obtain the user common interest $v^{c}_{ui}$ and the user diverse interest $v^{d}_{ui}$ by multiplying the weighting vectors with the corresponding embeddings. The enriched user representation $v^{enr}_{u}$ is then obtained by concatenating $v^{c}_{ui}$ and $v^{d}_{ui}$.

(9) $w_{i} = Mean(\mathcal{M}_{M}, axis=0), \quad w_{u} = Mean(\mathcal{M}_{M}, axis=1)$
(10) $v^{c}_{ui} = w_{u} \cdot E_{\mathcal{L}^{\prime}}, \quad v^{d}_{ui} = w_{i} \cdot E_{\mathcal{R}^{\prime}}$
(11) $v^{enr}_{u} = Concat(v^{c}_{ui}, v^{d}_{ui})$
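The forward pass of Equations 5, 6 and 9 to 11 can be sketched in NumPy as follows. Several details are assumptions: the MLP is reduced to a single ReLU layer, the user and item sides use separate weights so that $d_1$ and $d_2$ may differ, and all sizes and random weights are placeholders rather than the paper's settings.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mlp(x, W, b):
    """One ReLU layer, standing in for the MLP of Eq. (5) (assumed depth)."""
    return np.maximum(x @ W + b, 0.0)

def co_interaction(E_L_sel, E_R_sel, params):
    """E_L_sel: (k_l, d1) selected look-alike users; E_R_sel: (k_r, d2) selected recall items."""
    W_u, b_u, W_i, b_i = params
    H_L = mlp(E_L_sel, W_u, b_u)            # Eq. (5): (k_l, h)
    H_R = mlp(E_R_sel, W_i, b_i)            # Eq. (5): (k_r, h)
    M = sigmoid(H_L @ H_R.T)                # Eq. (6): (k_l, k_r) matching matrix
    w_i = M.mean(axis=0)                    # Eq. (9): one weight per recall item
    w_u = M.mean(axis=1)                    # Eq. (9): one weight per look-alike user
    v_c = w_u @ E_L_sel                     # Eq. (10): common interest, (d1,)
    v_d = w_i @ E_R_sel                     # Eq. (10): diverse interest, (d2,)
    return np.concatenate([v_c, v_d]), M    # Eq. (11): enriched representation

# Hypothetical sizes and randomly initialized weights.
rng = np.random.default_rng(0)
k_l, k_r, d1, d2, h = 3, 5, 4, 6, 8
E_L_sel = rng.standard_normal((k_l, d1))
E_R_sel = rng.standard_normal((k_r, d2))
params = (0.1 * rng.standard_normal((d1, h)), np.zeros(h),
          0.1 * rng.standard_normal((d2, h)), np.zeros(h))
v_enr, M = co_interaction(E_L_sel, E_R_sel, params)
```

The matching matrix `M` is returned alongside the representation because Equation 7 uses it as the prediction for the auxiliary exposure loss.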

3.3. Objective function

The loss function of RAR is given in Equation 12, where $\mathcal{L}_{clk}$ aims to predict CTR accurately and $\mathcal{L}_{ep}$ aims to give the model a clearer indication of which recall items are most important. $\alpha \in [0,1]$ is a tunable parameter that balances the two losses. Both losses are cross-entropy losses and supervise the training process in a point-wise manner. All modules of RAR are trained jointly by minimizing the joint loss function on the training dataset.

(12) $\mathcal{L} = \alpha \cdot \mathcal{L}_{clk} + (1-\alpha) \cdot \mathcal{L}_{ep}$
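A minimal sketch of Equation 12 in plain Python is given below, assuming both terms are mean binary cross-entropy over their respective labels. The function names and the clipping constant `eps` are illustrative choices, not details taken from the paper.

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean point-wise binary cross-entropy; predictions are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)

def rar_loss(y_clk, p_clk, y_ep, p_ep, alpha):
    """Eq. (12): alpha weights the click loss, (1 - alpha) the exposure loss."""
    l_clk = binary_cross_entropy(y_clk, p_clk)
    l_ep = binary_cross_entropy(y_ep, p_ep)
    return alpha * l_clk + (1.0 - alpha) * l_ep

# With every prediction at 0.5, both terms equal ln 2, so the result is ln 2 for any alpha.
loss = rar_loss([1, 0], [0.5, 0.5], [1], [0.5], alpha=0.3)
```

Setting `alpha=1.0` recovers a plain CTR objective, which is a convenient sanity check when tuning the balance.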

3.4. Complexity Analysis

We analyze the efficiency of RAR in this section. The Co-Interaction Module first obtains the matching matrix and weighting scores by multiplying the embedding vectors of the selected look-alike users and recall items, followed by a weighted sum using the weighting vectors. Therefore, three matrix multiplications are needed, and the time complexity is $O(B \cdot k_{l} \cdot k_{r} \cdot d)$. As for the selection module, the time complexity depends on the selection method used. Since the XOR and the count of set bits in SimHash can be computed in $O(1)$, the overall time complexity of RAR is $O(B \cdot (l+r) \cdot d + B \cdot k_{l} \cdot k_{r} \cdot d)$, where $d = Max(d_{1}, d_{2})$. Since $k_{l} \ll l$ and $k_{r} \ll r$, it can be approximated as $O(B \cdot (l+r) \cdot d)$.
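The XOR-and-count comparison mentioned above can be illustrated with signatures packed into integers. The constant-time claim applies when a signature fits in a machine word; Python integers have arbitrary precision, so this sketch only shows the operation, and the helper names are hypothetical.

```python
def pack_bits(bits):
    """Pack a sequence of 0/1 values into one integer, most significant bit first."""
    value = 0
    for b in bits:
        value = (value << 1) | (1 if b else 0)
    return value

def hamming_distance(sig_a, sig_b):
    """XOR marks the differing bit positions; counting the set bits gives the distance."""
    return bin(sig_a ^ sig_b).count("1")

sig_a = pack_bits([1, 0, 1, 1, 0, 0, 1, 0])
sig_b = pack_bits([1, 1, 1, 0, 0, 0, 1, 1])
distance = hamming_distance(sig_a, sig_b)  # the signatures differ at three positions
```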

4. Experiments

4.1. Experimental Setup

Datasets. We conduct experiments on three public datasets. KKBox is a challenge dataset for music recommendation. MovieLens contains users’ tagging records on movies. CandiCTR-Pub is a publicly available industrial dataset that is both practical and large-scale. Apart from CandiCTR-Pub, which already includes recall item sets, we manually construct recall item sets and look-alike user sets for KKBox and MovieLens by employing a pretrained matching model (e.g., DSSM) for inner product calculations between user-item and user-user pairs. Our implementation builds upon FuxiCTR and follows the public benchmark (Zhu et al., 2022) and previous works (Wang et al., 2022; Zheng et al., 2022; Lin et al., 2023).

Base models. We consider both high-order feature interaction and ensemble models. We choose IPNN, WDL, DeepFM, DCN, xDeepFM, AutoInt+, DeepIM, and DCN-V2 as our base models, all of which have been evaluated in the BARS benchmark; see references in (Zhu et al., 2022, 2021).

Baselines. FRNet (Wang et al., 2022) learns context-aware feature representations by capturing cross-feature relationships, becoming the new SOTA. CIM (Zheng et al., 2022) encodes all candidate items into a context vector by transformer to characterize users’ implicit awareness.

Metrics. We apply the most popular metrics AUC and gAUC (weighted sum AUC, grouped by users) to evaluate the performance.
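For readers unfamiliar with gAUC, the sketch below shows one common way to compute it: AUC is evaluated per user and the per-user values are averaged with weights. Weighting by each user's number of samples and skipping users whose labels are all positive or all negative are assumptions of this example; the paper does not spell out these details.

```python
from collections import defaultdict

def auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # AUC is undefined without both classes
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def gauc(user_ids, labels, scores):
    """Sample-count-weighted average of per-user AUC (weighting scheme assumed)."""
    by_user = defaultdict(lambda: ([], []))
    for u, y, s in zip(user_ids, labels, scores):
        by_user[u][0].append(y)
        by_user[u][1].append(s)
    weighted_sum, weight_total = 0.0, 0
    for ys, ss in by_user.values():
        user_auc = auc(ys, ss)
        if user_auc is None:
            continue  # skip users with a single class
        weighted_sum += len(ys) * user_auc
        weight_total += len(ys)
    return weighted_sum / weight_total if weight_total else None

# Toy log: user "a" has AUC 1.0 (2 samples), "b" has AUC 0.5 (3 samples), "c" is skipped.
users = ["a", "a", "b", "b", "b", "c", "c"]
labels = [1, 0, 1, 0, 0, 1, 1]
scores = [0.9, 0.1, 0.2, 0.8, 0.1, 0.7, 0.6]
g = gauc(users, labels, scores)  # (2 * 1.0 + 3 * 0.5) / 5
```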

4.2. Performance Evaluation with SOTA Models

Table 2. Overall performance comparison against the state-of-the-art models on three datasets.
Dataset: KKBox
| Model | Raw gAUC(%) | Raw AUC(%) | +FRNet gAUC(%) | +FRNet AUC(%) | +CIM gAUC(%) | +CIM AUC(%) | +RAR gAUC(%) | +RAR AUC(%) |
|---|---|---|---|---|---|---|---|---|
| IPNN | 78.75 | 85.25 | 78.27 | 84.94 | 78.31 | 85.25 | 80.15 | 86.45 |
| WDL | 78.44 | 85.02 | 78.26 | 84.85 | 78.67 | 85.36 | 79.74 | 86.23 |
| DeepFM | 78.76 | 85.32 | 78.70 | 85.26 | 78.90 | 85.68 | 80.14 | 86.51 |
| DCN | 78.66 | 85.25 | 78.69 | 85.26 | 79.16 | 85.74 | 80.22 | 86.58 |
| xDeepFM | 78.60 | 85.25 | 78.65 | 85.22 | 78.72 | 85.56 | 80.12 | 86.50 |
| AutoInt+ | 78.78 | 85.34 | 78.71 | 85.28 | 78.94 | 85.64 | 80.20 | 86.55 |
| DeepIM | 78.79 | 85.29 | 78.56 | 85.16 | 78.92 | 85.63 | 80.26 | 86.59 |
| DCN-V2 | 78.64 | 85.17 | 78.62 | 85.22 | 79.45 | 85.77 | 80.12 | 86.49 |
| Best RelImp | 0.0% | 0.0% | 0.1% | 0.1% | 1.0% | 0.7% | 2.0% | 1.6% |

Dataset: MovieLens
| Model | Raw gAUC(%) | Raw AUC(%) | +FRNet gAUC(%) | +FRNet AUC(%) | +CIM gAUC(%) | +CIM AUC(%) | +RAR gAUC(%) | +RAR AUC(%) |
|---|---|---|---|---|---|---|---|---|
| IPNN | 95.53 | 96.53 | 95.14 | 96.15 | 95.28 | 96.38 | 95.92 | 97.02 |
| WDL | 95.29 | 96.23 | 95.17 | 96.19 | 95.36 | 96.44 | 95.73 | 96.67 |
| DeepFM | 94.84 | 95.90 | 94.65 | 96.11 | 95.06 | 96.23 | 95.40 | 96.40 |
| DCN | 95.32 | 96.35 | 95.21 | 96.33 | 95.30 | 96.37 | 95.51 | 96.54 |
| xDeepFM | 95.27 | 96.20 | 95.21 | 96.26 | 95.16 | 96.29 | 95.82 | 96.81 |
| AutoInt+ | 95.22 | 96.24 | 95.26 | 96.28 | 95.31 | 96.38 | 95.61 | 96.58 |
| DeepIM | 95.29 | 96.29 | 95.21 | 96.28 | 95.30 | 96.39 | 95.51 | 96.61 |
| DCN-V2 | 94.95 | 96.00 | 95.23 | 96.25 | 95.35 | 96.39 | 95.66 | 96.63 |
| Best RelImp | 0.0% | 0.0% | 0.3% | 0.3% | 0.4% | 0.4% | 0.7% | 0.7% |

Dataset: CandiCTR-Pub
| Model | Raw gAUC(%) | Raw AUC(%) | +FRNet gAUC(%) | +FRNet AUC(%) | +CIM gAUC(%) | +CIM AUC(%) | +RAR gAUC(%) | +RAR AUC(%) |
|---|---|---|---|---|---|---|---|---|
| IPNN | 52.87 | 60.92 | 52.35 | 61.08 | 53.76 | 61.86 | 54.35 | 62.53 |
| WDL | 52.82 | 60.90 | 52.48 | 60.82 | 53.92 | 62.73 | 54.43 | 63.92 |
| DeepFM | 52.82 | 60.99 | 52.48 | 60.89 | 53.94 | 62.80 | 54.50 | 63.92 |
| DCN | 52.78 | 60.95 | 52.55 | 60.61 | 53.75 | 61.75 | 54.50 | 62.53 |
| xDeepFM | 52.87 | 61.19 | 52.48 | 60.78 | 53.93 | 62.79 | 54.40 | 64.06 |
| AutoInt+ | 52.61 | 61.10 | 52.73 | 60.95 | 53.82 | 62.85 | 54.47 | 63.86 |
| DeepIM | 52.72 | 61.23 | 52.42 | 60.94 | 53.48 | 61.65 | 54.44 | 62.72 |
| DCN-V2 | 52.65 | 61.06 | 52.64 | 60.79 | 53.32 | 61.69 | 54.30 | 62.70 |
| Best RelImp | 0.0% | 0.0% | 0.2% | 0.3% | 2.6% | 3.0% | 3.5% | 5.0% |

We evaluate RAR on top of existing models, including many SOTA methods; the results are shown in Table 2. RAR notably surpasses the other methods, with xDeepFM+RAR improving AUC over raw xDeepFM by up to 4.7% (61.19 to 64.06 on CandiCTR-Pub), demonstrating the efficacy of using cross-stage data for richer user representations.
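The text does not define RelImp explicitly. As a hedged reading, the plain relative change below reproduces both the 4.7% figure for xDeepFM on CandiCTR-Pub and the 2.6% average gAUC gain reported for RAR-user in Table 3; the function name is hypothetical.

```python
def relative_improvement(base, new):
    """Relative change of `new` over `base`, in percent (assumed definition of RelImp)."""
    return (new - base) / base * 100.0

# xDeepFM AUC on CandiCTR-Pub: raw 61.19, with RAR 64.06 (Table 2).
xdeepfm_gain = relative_improvement(61.19, 64.06)

# RAR-user gAUC versus raw for DCN-V2, xDeepFM and DeepIM (Table 3), averaged.
pairs = [(52.65, 54.07), (52.87, 54.09), (52.72, 54.19)]
rar_user_avg = sum(relative_improvement(b, n) for b, n in pairs) / len(pairs)
```

Rounded to one decimal, `xdeepfm_gain` is 4.7 and `rar_user_avg` is 2.6.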

4.3. Ablation Study

We investigate the effectiveness of different components of RAR in Table 3. Three typical base models are selected to ensure generalizability and fairness.

  • Removing the channel of look-alike users: RAR-user replaces the channel of look-alike users with the target user only. Table 3 shows a 2.6% gAUC and 2.5% AUC increase over raw CTR models, highlighting the benefit of incorporating recall items into ranking models for diversified user representations. However, RAR-user’s comparison to RAR indicates further potential by leveraging user common interest introduced by look-alike users.

  • Removing the User and Item Selection Module: RAR-select, removing the Cross-Stage Selection Module, truncates the look-alike user set and recall item set to match RAR’s scale. Table 3 reveals a 0.4% gAUC and 2.3% AUC boost over base CTR models, which indicates the importance of a careful selection for further accuracy improvement.

  • Removing the Co-Interaction Module: RAR-aux-wght, omitting the Co-Interaction Module and related losses, uses simple sum pooling for user interest representation and shows weaker performance. RAR-wght, dropping weighting vectors but keeping the auxiliary loss, significantly outperforms RAR-aux-wght, highlighting the auxiliary loss’s role in guiding the hierarchical information of recall items. RAR’s superiority over RAR-wght underscores the value of both matching loss and weighting vectors.

4.4. Training Efficiency

We present a wall-time comparison of RAR and CIM, the latter of which has proven effective in a real-world search advertising system. As detailed in Table 4, CIM_short, which uses a truncated context input of 50 recall items, consumes the least time, whereas CIM_long, which processes the full set of 305 recall items, consumes the most. RAR falls between the two, with about 24% more training time and 53% more time per inference step than CIM_short. RAR’s selection modules effectively filter noise from extensive recall pools and look-alike user sets without reducing information. Additionally, as Section 4.2 demonstrates, the performance gain of RAR is substantial.

Table 3. Ablation study of RAR on CandiCTR-Pub.
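| Model | DCN-V2 gAUC(%) | DCN-V2 AUC(%) | xDeepFM gAUC(%) | xDeepFM AUC(%) | DeepIM gAUC(%) | DeepIM AUC(%) | Average RelImp gAUC(%) | Average RelImp AUC(%) |
|---|---|---|---|---|---|---|---|---|
| Raw | 52.65 | 61.06 | 52.87 | 61.19 | 52.72 | 61.23 | 0% | 0% |
| RAR-user | 54.07 | 62.35 | 54.09 | 63.45 | 54.19 | 62.23 | 2.6% | 2.5% |
| RAR-select | 52.97 | 62.11 | 52.99 | 63.53 | 52.96 | 62.03 | 0.4% | 2.3% |
| RAR-aux-wght | 52.48 | 61.14 | 52.57 | 62.52 | 52.49 | 61.10 | -0.4% | 0.7% |
| RAR-wght | 53.89 | 62.47 | 53.87 | 63.78 | 54.06 | 62.35 | 2.3% | 2.8% |
| RAR | 54.30 | 62.70 | 54.40 | 64.06 | 54.44 | 62.72 | 3.1% | 3.3% |
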
Table 4. Wall time comparison of the training and inference time of RAR and CIM on CandiCTR-Pub.
| Model | Training Time | Rel.Inc | Time per inference step | Rel.Inc |
|---|---|---|---|---|
| CIM_short | ~14.5 min | 0% | ~18 ms | 0% |
| CIM_long | ~29 min | 100% | ~48 ms | 167% |
| RAR | ~18 min | 24% | ~27.5 ms | 53% |

5. Conclusion

In this paper, we first shed light on the limitations of relying solely on homogeneous user behavior sequences to model user preferences and then we propose a novel architecture called RAR which utilizes cross-stage data to improve the cross-instance modeling capability of the models. RAR consists of two key sub-modules, which synergistically gather information from a vast pool of look-alike users and recall items, resulting in enriched user representations. RAR is a general framework that demonstrates great performance and compatibility through our in-depth experiments.

Acknowledgements.
The Shanghai Jiao Tong University team is partially supported by the National Natural Science Foundation of China (62177033). We also gratefully acknowledge the support of MindSpore (https://www.mindspore.cn/), which is a new deep learning computing framework used for this research.

References

  • Lin et al. (2023) Jianghao Lin, Yanru Qu, Wei Guo, Xinyi Dai, Ruiming Tang, Yong Yu, and Weinan Zhang. 2023. MAP: A Model-agnostic Pretraining Framework for Click-through Rate Prediction. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 1384–1395.
  • Wang et al. (2022) Fangye Wang, Yingxu Wang, Dongsheng Li, Hansu Gu, Tun Lu, Peng Zhang, and Ning Gu. 2022. Enhancing CTR prediction with context-aware feature representation learning. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval. 343–352.
  • Zheng et al. (2022) Kaifu Zheng, Lu Wang, Yu Li, Xusong Chen, Hu Liu, Jing Lu, Xiwei Zhao, Changping Peng, Zhangang Lin, and Jingping Shao. 2022. Implicit User Awareness Modeling via Candidate Items for CTR Prediction in Search Ads. In Proceedings of the ACM Web Conference 2022. 246–255.
  • Zhu et al. (2021) Jieming Zhu, Jinyang Liu, Shuai Yang, Qi Zhang, and Xiuqiang He. 2021. Open benchmarking for click-through rate prediction. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management. 2759–2769.
  • Zhu et al. (2022) Jieming Zhu, Kelong Mao, Quanyu Dai, Liangcai Su, Rong Ma, Jinyang Liu, Guohao Cai, Zhicheng Dou, Xi Xiao, and Rui Zhang. 2022. BARS: Towards Open Benchmarking for Recommender Systems. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR).