Learning Language through Grounding
  (NAACL 2025 Tutorial)

Grounding has been a long-standing concept in natural language processing (NLP) and computational linguistics (CL). This tutorial provides a historical overview and introduces recent advances in learning language through grounding, with a particular emphasis on the latter. We will begin by tracing the history of grounding and presenting a unified perspective on the term. In Parts II to IV, we will delve into recent progress in learning lexical semantics, syntax, and complex meanings through various forms of grounding. We will conclude by discussing future directions and open challenges, particularly those related to the growing trend of large language models and scaling.

Tutorial Instructors

Freda Shi  
University of Waterloo & Vector Institute, Canada CIFAR AI Chair
Ziqiao Ma  
University of Michigan
Jiayuan Mao  
Massachusetts Institute of Technology
Parisa Kordjamshidi  
Michigan State University
Joyce Chai  
University of Michigan

Materials

Part I (20 minutes): Introduction to grounding.

We will review the history of grounding and introduce a unified definition of the term. In this tutorial, grounding refers to processing the primary data with supervision from another source, where the two sources of data have positive mutual information. We will illustrate the definition by connecting it to existing work on visual grounding, acoustic grounding, factual grounding, and cross-lingual grounding. We refer the audience to ACL 2020 Tutorial 5 on building common ground through communication and to the AAAI 2013 Keynote for early work on grounded language learning.
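The "positive mutual information" condition in this definition can be made concrete with a small numerical sketch. The example below is only an illustration under our own assumptions: the toy word/object pairs and the helper name `mutual_information` are hypothetical and do not come from the tutorial materials. It estimates I(X; Y) in bits from paired observations, once for a source that co-varies with the words and once for a source that is independent of them.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Estimate I(X; Y) in bits from a list of (x, y) observations."""
    n = len(pairs)
    joint = Counter(pairs)                 # counts of each (x, y) pair
    px = Counter(x for x, _ in pairs)      # marginal counts of x
    py = Counter(y for _, y in pairs)      # marginal counts of y
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p(x, y) / (p(x) * p(y)), written with counts to limit rounding
        ratio = p_xy * n * n / (px[x] * py[y])
        mi += p_xy * math.log2(ratio)
    return mi


# Hypothetical toy data: (word in a caption, object label in the image).
# In the first set the object label always matches the word.
grounded = [("dog", "DOG")] * 4 + [("cat", "CAT")] * 4
# In the second set the object label is independent of the word.
unrelated = [("dog", "DOG"), ("dog", "CAT"),
             ("cat", "DOG"), ("cat", "CAT")] * 2

mi_grounded = mutual_information(grounded)
mi_unrelated = mutual_information(unrelated)
print(mi_grounded, mi_unrelated)
```

Under these assumptions, only the first pairing would count as a grounding source in the sense of the definition, since the second carries no information about the primary data.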

Part II (30 minutes): Learning lexicons through grounding.

Word acquisition is a fundamental problem in language acquisition, of interest to both cognitive science and robotics. With advances in neural networks and multimodal machine learning, recent work has learned the meanings of written or spoken words by grounding language to visual signals. In particular, some work focuses on grounding verb semantics to changes in the physical world. Another line of work learns lexicons through cross-lingual grounding.

We will introduce the background in the first 10 minutes and focus on recent advances in the remaining time. Work on learning lexical semantics through interaction, or on learning lexicons to compose sentence-level meanings, will be deferred to Part IV.

Part III (30 minutes): Learning syntax through grounding.

Constituency parses of sentences can be learned by grounding to visual signals. Follow-up work has demonstrated the effectiveness of such visually grounded systems in learning variants of constituency and dependency grammars. In another line of work, word alignment-based cross-lingual transfer can also be considered an instantiation of learning syntax through cross-lingual grounding, where the text in the target language(s) is grounded to existing knowledge in the source language(s).
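As a rough illustration of how word alignment can carry source-language knowledge over to a target language, the sketch below projects annotations along alignment links. It is a simplified stand-in rather than any specific method covered in the tutorial: it projects part-of-speech tags instead of full parses, and the sentence pair, the alignment, and the function name `project_tags` are all hypothetical.

```python
def project_tags(source_tags, alignment, target_length, unknown="UNK"):
    """Copy each source token's tag to the target token(s) aligned to it.

    alignment is a list of (source_index, target_index) pairs.
    Target tokens without any alignment link keep the `unknown` tag.
    """
    target_tags = [unknown] * target_length
    for src_idx, tgt_idx in alignment:
        target_tags[tgt_idx] = source_tags[src_idx]
    return target_tags


# Hypothetical English -> German example with one unaligned target token.
source_tokens = ["the", "dog", "sleeps"]
source_tags = ["DET", "NOUN", "VERB"]
target_tokens = ["der", "Hund", "schläft", "gerade"]
alignment = [(0, 0), (1, 1), (2, 2)]  # assumed output of a word aligner

projected = project_tags(source_tags, alignment, len(target_tokens))
print(list(zip(target_tokens, projected)))
```

In this sketch the target sentence has no annotation of its own; everything it receives comes from the source side through the alignment, which is the sense in which the target text is grounded to the source language.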

To help the audience follow the content, the first 10 minutes of this part will briefly introduce the relevant syntactic background, such as constituency, dependency, and combinatory categorial grammars. The rest of the time will focus on recent approaches to learning syntax through visual grounding and cross-lingual grounding. Efforts on joint learning of syntax and semantics will be covered in Part IV.

Part IV (60 minutes): Learning complex meanings (semantics and pragmatics) through grounding.

Learning and evaluating meaning acquisition in visually grounded settings has attracted significant interest. In addition to visual grounding, interaction is also a common source of supervision, where considerations regarding pragmatics and theory of mind are often taken into account. As mentioned in Part II, cross-lingual transfer of sentence- or document-level meanings, particularly transferring knowledge from high-resource to low-resource languages, should also be considered an instantiation of cross-lingual grounding.

This part will cover three topics for 20 minutes each: learning semantics through grounding, learning pragmatics through grounded interaction, and learning cross-lingual text representations through cross-lingual grounding.

Part V (15 minutes): Discussion on future directions and open problems.

A key question for future directions is whether grounding should emerge naturally from scaling models or whether grounded supervision should be enforced to achieve more efficient learning. Additionally, the scope of grounding can be broadened beyond traditional modalities to incorporate touch, olfaction, non-human sensors, video and temporal data, 3D environments, proprioception, episodic experiences, and even forms of meta-cognition.

References


Learning Language Structures through Grounding
Haoyue Freda Shi

PhD Thesis, Toyota Technological Institute at Chicago    Paper
Thesis of Distinction

The Vector Grounding Problem
Dimitri Coelho Mollo, Raphaël Millière

arXiv preprint arXiv:2304.01481    Paper

Pragmatics in Language Grounding: Phenomena, Tasks, and Modeling Approaches
Daniel Fried, Nicholas Tomlin, Jennifer Hu, Roma Patel, Aida Nematzadeh

Findings of EMNLP    Paper

Grounding 'Grounding' in NLP
Khyathi Raghavi Chandu, Yonatan Bisk, Alan W. Black

Findings of ACL    Paper

Experience Grounds Language
Yonatan Bisk, Ari Holtzman, Jesse Thomason, Jacob Andreas, Yoshua Bengio, Joyce Chai, Mirella Lapata, Angeliki Lazaridou, Jonathan May, Aleksandr Nisnevich, Nicolas Pinto, Joseph Turian

EMNLP    Paper

Language to Action: Towards Interactive Task Learning with Physical Agents
Joyce Y. Chai, Qiaozi Gao, Lanbo She, Shaohua Yang, Sari Saba-Sadiya, Guangyue Xu

IJCAI    Paper
Invited Paper

The Symbol Grounding Problem
Stevan Harnad

Physica D: Nonlinear Phenomena    Paper

Grounding in Communication
Herbert H. Clark, Susan E. Brennan

Perspectives on socially shared cognition    Paper

World-to-Words: Grounded Open Vocabulary Acquisition through Fast Mapping in Vision-Language Models
Ziqiao Ma, Jiayi Pan, Joyce Chai

ACL    Paper
Outstanding Paper Award

Word Discovery in Visually Grounded, Self-Supervised Speech Models
Puyuan Peng, David Harwath

Interspeech    Paper
Oral Presentation

Cross-lingual Entity Alignment with Incidental Supervision
Muhao Chen, Weijia Shi, Ben Zhou, Dan Roth

EACL    Paper

Bilingual Lexicon Induction via Unsupervised Bitext Construction and Word Alignment
Haoyue Shi, Luke Zettlemoyer, Sida I. Wang

ACL    Paper

Learning Morphosyntactic Analyzers from the Bible via Iterative Annotation Projection across 26 Languages
Garrett Nicolai, David Yarowsky

ACL    Paper

The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision
Jiayuan Mao, Chuang Gan, Pushmeet Kohli, Joshua B. Tenenbaum, Jiajun Wu

ICLR    Paper
Oral Presentation

Bilingual Lexicon Induction through Unsupervised Machine Translation
Mikel Artetxe, Gorka Labaka, Eneko Agirre

ACL    Paper

Acoustically Grounded Word Embeddings for Improved Acoustics-to-Word Speech Recognition
Shane Settle, Kartik Audhkhasi, Karen Livescu, Michael Picheny

ICASSP    Paper

Verb Physics: Relative Physical Knowledge of Actions and Objects
Maxwell Forbes, Yejin Choi

ACL    Paper

Interactive Learning of Grounded Verb Semantics Towards Human-Robot Communication
Lanbo She, Joyce Chai

ACL    Paper

Physical Causality of Action Verbs in Grounded Language Understanding
Qiaozi Gao, Malcolm Doering, Shaohua Yang, Joyce Chai

ACL    Paper

Incremental Acquisition of Verb Hypothesis Space Towards Physical World Interaction
Lanbo She, Joyce Chai

ACL    Paper

Reframing Linguistic Bootstrapping as Joint Inference Using Visually-Grounded Grammar Induction Models
Eva Portelance, Siva Reddy, Timothy J O'Donnell

arXiv:2406.11977    Paper

Audio-Visual Neural Syntax Acquisition
Cheng-I Jeff Lai, Freda Shi, Puyuan Peng, Yoon Kim, Kevin Gimpel, Shiyu Chang, Yung-Sung Chuang, Saurabhchand Bhati, David Cox, David Harwath, Yang Zhang, Karen Livescu, James Glass

ASRU    Paper

Substructure Distribution Projection for Zero-Shot Cross-Lingual Dependency Parsing
Freda Shi, Kevin Gimpel, Karen Livescu

ACL    Paper

PPT: Parsimonious Parser Transfer for Unsupervised Cross-Lingual Adaptation
Kemal Kurniawan, Lea Frermann, Philip Schulz, Trevor Cohn

EACL    Paper

"Wikily" Supervised Neural Translation Tailored to Cross-Lingual Tasks
Mohammad Sadegh Rasooli, Chris Callison-Burch, Derry Tanti Wijaya

EMNLP    Paper

Dependency Induction Through the Lens of Visual Perception
Ruisi Su, Shruti Rijhwani, Hao Zhu, Junxian He, Xinyu Wang, Yonatan Bisk, Graham Neubig

CoNLL    Paper

Video-Aided Unsupervised Grammar Induction
Songyang Zhang, Linfeng Song, Lifeng Jin, Kun Xu, Dong Yu, Jiebo Luo

NAACL-HLT    Paper
Best Paper Award

Visually Grounded Compound PCFGs
Yanpeng Zhao, Ivan Titov

EMNLP    Paper
Outstanding Paper Award

What is Learned in Visually Grounded Neural Syntax Acquisition
Noriyuki Kojima, Hadar Averbuch-Elor, Alexander Rush, Yoav Artzi

ACL    Paper

Visually Grounded Neural Syntax Acquisition
Haoyue Shi, Jiayuan Mao, Kevin Gimpel, Karen Livescu

ACL    Paper
Best Paper Nominee

Unsupervised Dependency Parsing with Transferring Distribution via Parallel Guidance and Entropy Regularization
Xuezhe Ma, Fei Xia

ACL    Paper

Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference under Ambiguities
Zheyuan Zhang, Fengyuan Hu, Jayjun Lee, Freda Shi, Parisa Kordjamshidi, Joyce Chai, Ziqiao Ma

ICLR    Paper
Oral Presentation

Grammar-Based Grounded Lexicon Learning
Jiayuan Mao, Freda Shi, Jiajun Wu, Roger P. Levy, Joshua B. Tenenbaum

NeurIPS    Paper

SPARTQA: A Textual Question Answering Benchmark for Spatial Reasoning
Roshanak Mirzaee, Hossein Rajaby Faghihi, Qiang Ning, Parisa Kordjamshidi

NAACL-HLT    Paper

Vokenization: Improving Language Understanding with Contextualized, Visual-Grounded Supervision
Hao Tan, Mohit Bansal

EMNLP    Paper

Generalization without Systematicity: On the Compositional Skills of Sequence-to-Sequence Recurrent Networks
Brenden Lake, Marco Baroni

ICML    Paper

Spatial Role Labeling Annotation Scheme
Parisa Kordjamshidi, Martijn van Otterlo, Marie-Francine Moens

Handbook of linguistic annotation    Paper

A Corpus of Natural Language for Visual Reasoning
Alane Suhr, Mike Lewis, James Yeh, Yoav Artzi

ACL    Paper
Best Resource Paper Award

Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models
Ryan Kiros, Ruslan Salakhutdinov, Richard S. Zemel

arXiv preprint arXiv:1411.2539    Paper

Pragmatic Inference with a CLIP Listener for Contrastive Captioning
Jiefu Ou, Benno Krojer, Daniel Fried

Findings of ACL    Paper

Computational Language Acquisition with Theory of Mind
Andy Liu, Hao Zhu, Emmy Liu, Yonatan Bisk, Graham Neubig

ICLR    Paper

Language Learning from Communicative Goals and Linguistic Input
Hao Zhu, Yonatan Bisk, Graham Neubig

CogSci    Paper

Interactive Classification by Asking Informative Questions
Lili Yu, Howard Chen, Sida I. Wang, Tao Lei, Yoav Artzi

ACL    Paper

A Knowledge-Grounded Neural Conversation Model
Marjan Ghazvininejad, Chris Brockett, Ming-Wei Chang, Bill Dolan, Jianfeng Gao, Wen-tau Yih, Michel Galley

AAAI    Paper

Learning Language Games through Interaction
Sida I. Wang, Percy Liang, Christopher D. Manning

ACL    Paper

BibTeX

@inproceedings{naacl2025grounding,
    author    = {Shi, Freda and Ma, Ziqiao and Mao, Jiayuan and Kordjamshidi, Parisa and Chai, Joyce},
    title     = {Learning Language through Grounding},
    booktitle = {Proceedings of the 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (Volume 5: Tutorial Abstracts)},
    year      = {2025},
}