Sohee Yang (양소희)

PhD Student/Research Scientist Intern

I am a third-year PhD student at UCL and a part-time research scientist intern at Google DeepMind, splitting my time between the two organizations during my PhD studies. I am also currently a remote visiting PhD student in Prof. Mor Geva’s lab at Tel Aviv University. At the UCL NLP Group, I am co-advised by Prof. Pontus Stenetorp and Prof. Sebastian Riedel; on my Google DeepMind projects, I am co-advised by Prof. Sebastian Riedel and Prof. Mor Geva. My research focuses on natural language processing and machine learning, with particular emphasis on understanding and enhancing the reasoning abilities of Large Language Models in a safe and controllable way. I completed my Master’s in Artificial Intelligence at the Kim Jaechul Graduate School of AI, KAIST, advised by Prof. Minjoon Seo at the Language & Knowledge Lab. Prior to my graduate studies, I was a research engineer at Naver Clova for 2.5 years.

From Jan 2021 to July 2022, I suffered from severe RSI (Repetitive Strain Injury) in all my fingers, developed from keyboard overuse; during this period, I could not type anything without seriously aggravating the nerve pain. Thankfully, in July 2022, the RSI finally began to heal through numerous sessions of TPIs (Trigger Point Injections) on my arms by a very skillful doctor, and I was able to resume leading a research project in August 2022. I plan to post on my homepage someday about what has been helpful for my treatment, in case it helps others suffering from RSI as well.

Last page update: June 18, 2025

Interests

Interpretability
Large Language Models
Natural Language Processing
Machine Learning

Education

PhD in Computer Science, March 2023 - Present
University College London (UCL)

MS in Artificial Intelligence, March 2021 - Feb 2023
Kim Jaechul Graduate School of AI, KAIST

BS in Computer Science and Engineering (Summa Cum Laude), March 2014 - Feb 2018
Handong Global University (GPA: 4.45/4.5, 1st rank in CSEE)

Industry Experience

Language Team, Google DeepMind
Part-Time Research Scientist Intern, Mar 2023 - Present

LINE & Clova, NAVER Corp.
Research Engineer, Sep 2018 - Feb 2021
* Parser of OCR
* NLU of AiCall

AI Team, Kakao Corp.
Research Intern, Jan 2018 - Jun 2018

Publications

Sohee Yang, Sang-Woo Lee, Nora Kassner, Daniela Gottesman, Sebastian Riedel, Mor Geva (2025). How Well Can Reasoning Models Identify and Recover from Unhelpful Thoughts? arXiv, June 2025.

PDF Tweet

Hyeonbin Hwang*, Byeongguk Jeon*, Seungone Kim, Jiyeon Kim, Hoyeon Chang, Sohee Yang, Seungpil Won, Dohaeng Lee, Youbin Ahn, Minjoon Seo (2025). Let's Predict Sentence by Sentence. arXiv, May 2025.

PDF Tweet

Hoyeon Chang*, Jinho Park*, Hanseul Cho, Sohee Yang, Miyoung Ko, Hyeonbin Hwang, Seungpil Won, Dohaeng Lee, Youbin Ahn, Minjoon Seo (2025). The Coverage Principle: A Framework for Understanding Compositional Generalization. arXiv, May 2025.

PDF Code Tweet

Dongkeun Yoon, Seungone Kim, Sohee Yang, Sunkyoung Kim, Soyeon Kim, Yongil Kim, Eunbi Choi, Yireun Kim, Minjoon Seo (2025). Reasoning Models Better Express Their Confidence. arXiv, May 2025.

PDF Tweet

Sohee Yang, Nora Kassner, Elena Gribovskaya, Sebastian Riedel*, Mor Geva* (2024). Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts? In Findings of ACL 2025.

PDF Code Dataset Tweet

Hoyeon Chang, Jinho Park, Seonghyeon Ye, Sohee Yang, Youngkyung Seo, Du-Seong Chang, Minjoon Seo (2024). How Do Large Language Models Acquire Factual Knowledge During Pretraining? In NeurIPS 2024.

PDF Code

Eden Biran, Daniela Gottesman, Sohee Yang, Mor Geva, Amir Globerson (2024). Hopping Too Late: Exploring the Limitations of Large Language Models on Multi-Hop Queries. In EMNLP 2024.

PDF Code

Soyoung Yoon*, Chaeeun Kim*, Hyunji Lee, Joel Jang, Sohee Yang, Minjoon Seo (2024). Exploring the Practicality of Generative Retrieval on Dynamic Corpora. In EMNLP 2024.

PDF

Sohee Yang, Elena Gribovskaya, Nora Kassner, Mor Geva*, Sebastian Riedel* (2024). Do Large Language Models Latently Perform Multi-Hop Reasoning? In ACL 2024.

PDF Code Dataset Poster Slides Tweet

Sohee Yang, Jonghyeon Kim, Joel Jang, Seonghyeon Ye, Hyunji Lee, Minjoon Seo (2023). Improving Probability-based Prompt Selection Through Unified Evaluation and Analysis. TACL 2024 (presented in ACL 2024).

PDF Code Dataset Poster Slides Tweet

Seonghyeon Ye, Hyeonbin Hwang, Sohee Yang, Hyeongu Yun, Yireun Kim, Minjoon Seo (2023). Investigating the Effectiveness of Task-Agnostic Prefix Prompt for Instruction Following. In AAAI 2024.

PDF Code

Joel Jang, Dongkeun Yoon, Sohee Yang, Sungmin Cha, Moontae Lee, Lajanugen Logeswaran, Minjoon Seo (2022). Knowledge Unlearning for Mitigating Privacy Risks in Language Models. In ACL 2023.

PDF Code

Hyunji Lee, Jaeyoung Kim, Hoyeon Chang, Hanseok Oh, Sohee Yang, Vlad Karpukhin, Yi Lu, Minjoon Seo (2022). Contextualized Generative Retrieval. In Findings of ACL 2023.

PDF Code

Hyunji Lee, Sohee Yang, Hanseok Oh, Minjoon Seo (2022). Generative Multi-hop Retrieval. In EMNLP 2022.

PDF Code

Joel Jang*, Seonghyeon Ye*, Changho Lee, Sohee Yang, Joongbo Shin, Janghoon Han, Gyeonghun Kim, Minjoon Seo (2022). TemporalWiki: A Lifelong Benchmark for Training and Evaluating Ever-Evolving Language Models. In EMNLP 2022.

PDF Code

Joel Jang, Seonghyeon Ye, Sohee Yang, Joongbo Shin, Janghoon Han, Gyeonghun Kim, Stanley Jungkyu Choi, Minjoon Seo (2021). Towards Continual Knowledge Learning of Language Models. In ICLR 2022.

PDF Code

Wonseok Hwang, Jinyeong Yim, Seunghyun Park, Sohee Yang, Minjoon Seo (2021). Spatial Dependency Parsing for Semi-Structured Document Information Extraction. In Findings of ACL 2021.

PDF

Sohee Yang, Minjoon Seo (2021). Designing a Minimal Retrieve-and-Read System for Open-Domain Question Answering. In NAACL 2021.

PDF Code Project Poster Slides Video

Sewon Min et al. (2021). NeurIPS 2020 EfficientQA Competition: Systems, Analyses and Lessons Learned. PMLR 2021.

PDF Project

Sohee Yang, Minjoon Seo (2020). Is Retriever Merely an Approximator of Reader? In Spa-NLP Workshop at ACL 2022.

PDF Code Poster Slides Video

Jung-Woo Ha*, Kihyun Nam*, Jin Gu Kang, Sang-Woo Lee, Sohee Yang, Hyunhoon Jung, Eunmi Kim, Hyeji Kim, Soojin Kim, Hyun Ah Kim, Kyoungtae Doh, Chan Kyu Lee, Nako Sung, Sunghun Kim (2020). ClovaCall: Korean Goal-Oriented Dialog Speech Corpus for Automatic Speech Recognition of Contact Centers. In INTERSPEECH 2020.

PDF Code

Sungdong Kim, Sohee Yang, Gyuwan Kim, Sang-Woo Lee (2019). Efficient Dialogue State Tracking by Selectively Overwriting Memory. In ACL 2020.

PDF Code Video

Sang-Woo Lee, Tong Gao, Sohee Yang, Jaejun Yoo, Jung-Woo Ha (2019). Large-Scale Answerer in Questioner's Mind for Visual Dialog Question Generation. In ICLR 2019.

PDF Code Poster Slides