Exploring the Practicality of Generative Retrieval on Dynamic Corpora

Abstract

Benchmarking of information retrieval (IR) methods is mostly conducted with a fixed set of documents (static corpora); in realistic scenarios, this is rarely the case, and the documents to be retrieved are constantly updated and added. In this paper, we conduct a comprehensive comparison between two categories of contemporary retrieval systems, Dual Encoders (DE) and Generative Retrieval (GR), in a dynamic scenario where the corpus to be retrieved from is updated. We also conduct an extensive evaluation of computational and memory efficiency, which are crucial factors for real-world deployment of IR systems. Our results demonstrate that GR is more adaptable to evolving knowledge (+13-18% on the StreamingQA benchmark), more robust in handling data with temporal information (10×), and more efficient in terms of memory (4×), indexing time (6×), and inference FLOPs (10×). Our paper highlights GR’s potential for future use in practical IR systems.

Publication
Empirical Methods in Natural Language Processing
