-
AGRaME: Any-Granularity Ranking with Multi-Vector Embeddings
Authors:
Revanth Gangi Reddy,
Omar Attia,
Yunyao Li,
Heng Ji,
Saloni Potdar
Abstract:
Ranking is a fundamental and popular problem in search. However, existing ranking algorithms usually restrict the granularity of ranking to full passages or require a specific dense index for each desired level of granularity. Such lack of flexibility in granularity negatively affects many applications that can benefit from more granular ranking, such as sentence-level ranking for open-domain question-answering, or proposition-level ranking for attribution. In this work, we introduce the idea of any-granularity ranking, which leverages multi-vector embeddings to rank at varying levels of granularity while maintaining encoding at a single (coarser) level of granularity. We propose a multi-granular contrastive loss for training multi-vector approaches, and validate its utility with both sentences and propositions as ranking units. Finally, we demonstrate the application of proposition-level ranking to post-hoc citation addition in retrieval-augmented generation, surpassing the performance of prompt-driven citation generation.
Submitted 23 May, 2024;
originally announced May 2024.
-
Entity Disambiguation via Fusion Entity Decoding
Authors:
Junxiong Wang,
Ali Mousavi,
Omar Attia,
Ronak Pradeep,
Saloni Potdar,
Alexander M. Rush,
Umar Farooq Minhas,
Yunyao Li
Abstract:
Entity disambiguation (ED), which links the mentions of ambiguous entities to their referent entities in a knowledge base, serves as a core component in entity linking (EL). Existing generative approaches demonstrate improved accuracy compared to classification approaches under the standardized ZELDA benchmark. Nevertheless, generative approaches suffer from the need for large-scale pre-training and inefficient generation. Most importantly, entity descriptions, which could contain crucial information to distinguish similar entities from each other, are often overlooked. We propose an encoder-decoder model to disambiguate entities with more detailed entity descriptions. Given text and candidate entities, the encoder learns interactions between the text and each candidate entity, producing representations for each entity candidate. The decoder then fuses the representations of entity candidates together and selects the correct entity. Our experiments, conducted on various entity disambiguation benchmarks, demonstrate the strong and robust performance of this model, in particular a +1.5% improvement on the ZELDA benchmark compared with GENRE. Furthermore, we integrate this approach into the retrieval/reader framework and observe a +1.5% improvement in end-to-end entity linking on the GERBIL benchmark compared with EntQA.
Submitted 7 May, 2024; v1 submitted 2 April, 2024;
originally announced April 2024.
-
On the Role of Vehicular Mobility in Cooperative Content Caching
Authors:
Osama Attia,
Tamer ElBatt
Abstract:
In this paper, we analyze the performance of cooperative content caching in vehicular ad hoc networks (VANETs). In particular, we characterize, using analysis and simulations, the behavior of the probability of outage (i.e., not finding a requested data chunk at a neighbor) under freeway vehicular mobility. First, we introduce a formal definition for the probability of outage in the context of cooperative content caching. Second, we characterize, analytically, the outage probability under vehicular and random mobility scenarios. Next, we verify the analytical results using simulations and compare the performance under a number of plausible mobility scenarios. This provides key insights into the problem and the trade-offs involved, and enables us to assess the opportunity that the somewhat structured vehicular mobility offers to cooperative content caching schemes. The presented numerical results exhibit complete agreement between the analytical and simulation studies. Finally, we observe that vehicular mobility creates opportunities for enhanced outage performance under practically relevant scenarios.
Submitted 3 March, 2012;
originally announced March 2012.