Publications | Sohee Yang

Do Large Language Models Latently Perform Multi-Hop Reasoning?

We study whether Large Language Models (LLMs) latently perform multi-hop reasoning with complex prompts such as “The mother of …

Sohee Yang, Elena Gribovskaya, Nora Kassner, Mor Geva, Sebastian Riedel

PDF

Exploring the Practicality of Generative Retrieval on Dynamic Corpora

Benchmarking the performance of information retrieval (IR) methods is mostly conducted with a fixed set of documents (static corpora); …

Soyoung Yoon, Chaeeun Kim, Hyunji Lee, Joel Jang, Sohee Yang, Minjoon Seo

PDF

Improving Probability-based Prompt Selection Through Unified Evaluation and Analysis

Previous works in prompt engineering for large language models have introduced different gradient-free probability-based prompt …

Sohee Yang, Jonghyeon Kim, Joel Jang, Seonghyeon Ye, Hyunji Lee, Minjoon Seo

PDF Code Slides

Investigating the Effectiveness of Task-Agnostic Prefix Prompt for Instruction Following

In this paper, we present our finding that prepending a Task-Agnostic Prefix Prompt (TAPP) to the input improves the …

Seonghyeon Ye, Hyeonbin Hwang, Sohee Yang, Hyeongu Yun, Yireun Kim, Minjoon Seo

PDF Code

Knowledge Unlearning for Mitigating Privacy Risks in Language Models

Pretrained Language Models (LMs) memorize a vast amount of knowledge during initial pretraining, including information that may violate …

Joel Jang, Dongkeun Yoon, Sohee Yang, Sungmin Cha, Moontae Lee, Lajanugen Logeswaran, Minjoon Seo

PDF Code

Contextualized Generative Retrieval

The text retrieval task is mainly performed in two ways: the bi-encoder approach and the generative approach. The bi-encoder approach …

Hyunji Lee, Jaeyoung Kim, Hoyeon Chang, Hanseok Oh, Sohee Yang, Vlad Karpukhin, Yi Lu, Minjoon Seo

PDF Code

Generative Multi-hop Retrieval

Multi-hop retrieval is the task of retrieving a series of multiple documents that together provide sufficient evidence to answer a …

Hyunji Lee, Sohee Yang, Hanseok Oh, Minjoon Seo

PDF Code

TemporalWiki: A Lifelong Benchmark for Training and Evaluating Ever-Evolving Language Models

Language Models (LMs) become outdated as the world changes; they often fail to perform tasks requiring recent factual information which …

Joel Jang, Seonghyeon Ye, Changho Lee, Sohee Yang, Joongbo Shin, Janghoon Han, Gyeonghun Kim, Minjoon Seo

PDF Code

Towards Continual Knowledge Learning of Language Models

Large Language Models (LMs) are known to encode world knowledge in their parameters as they pretrain on a vast web corpus, …

Joel Jang, Seonghyeon Ye, Sohee Yang, Joongbo Shin, Janghoon Han, Gyeonghun Kim, Stanley Jungkyu Choi, Minjoon Seo

PDF Code

Spatial Dependency Parsing for Semi-Structured Document Information Extraction

Information Extraction (IE) for document images is often approached as a BIO tagging problem, where the model sequentially goes through …

Wonseok Hwang, Jinyeong Yim, Seunghyun Park, Sohee Yang, Minjoon Seo

PDF

Designing a Minimal Retrieve-and-Read System for Open-Domain Question Answering

In open-domain question answering, the retrieve-and-read mechanism has the inherent benefit of interpretability and the ease of adding, …

Sohee Yang, Minjoon Seo

PDF Code Project Poster Slides Video

NeurIPS 2020 EfficientQA Competition: Systems, Analyses and Lessons Learned

We review the EfficientQA competition from NeurIPS 2020. The competition focused on open-domain question answering (QA), where systems …

Sewon Min et al.

PDF Project

Is Retriever Merely an Approximator of Reader?

The state of the art in open-domain question answering (QA) relies on an efficient retriever that drastically reduces the search space …

Sohee Yang, Minjoon Seo

PDF Code Poster Slides Video

ClovaCall: Korean Goal-Oriented Dialog Speech Corpus for Automatic Speech Recognition of Contact Centers

Automatic speech recognition (ASR) via call is essential for various applications, including AI for contact center (AICC) services. …

Jung-Woo Ha, Kihyun Nam, Jin Gu Kang, Sang-Woo Lee, Sohee Yang, Hyunhoon Jung, Eunmi Kim, Hyeji Kim, Soojin Kim, Hyun Ah Kim, Kyoungtae Doh, Chan Kyu Lee, Nako Sung, Sunghun Kim

PDF Code

Efficient Dialogue State Tracking by Selectively Overwriting Memory

Recent works in dialogue state tracking (DST) focus on an open vocabulary-based setting to resolve scalability and generalization …

Sungdong Kim, Sohee Yang, Gyuwan Kim, Sang-Woo Lee

PDF Code Video

Large-Scale Answerer in Questioner's Mind for Visual Dialog Question Generation

Answerer in Questioner’s Mind (AQM) is an information-theoretic framework that has been recently proposed for task-oriented …

Sang-Woo Lee, Tong Gao, Sohee Yang, Jaejun Yoo, Jung-Woo Ha

PDF Code Poster Slides