Clippers 3/26: Amad Hussain, A Review of RAPTOR: Can Tree-Organized Retrieval Improve a Virtual Museum Tour Guide?

This week in Clippers (3/26) I will be presenting a review of the paper RAPTOR: Recursive Abstractive Processing For Tree-Organized Retrieval (https://arxiv.org/abs/2401.18059). This work semantically clusters passages within a corpus and recursively builds summaries from these clusters. A retrieval system can then present the original passages or the summaries to a downstream LLM for Retrieval-Augmented Generation (RAG). The authors report state-of-the-art results on question-answering tasks, especially those requiring multi-step reasoning. In our talk, we will review RAPTOR and explore how it, and other related retrieval solutions, can be applied to the existing Virtual Museum Tour Guide project in collaboration with COSI. The session will be a paper review followed by brainstorming, so I am hoping for a good discussion.
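To make the core procedure concrete, here is a minimal, self-contained sketch of RAPTOR-style tree construction. The helpers are toy stand-ins (the paper uses SBERT embeddings, Gaussian-mixture clustering over reduced vectors, and an LLM summarizer); only the recursive cluster-then-summarize loop is the point.

```python
def cluster(nodes):
    # Toy stand-in: pair adjacent nodes. RAPTOR instead soft-clusters
    # node embeddings with Gaussian mixture models.
    return [nodes[i:i + 2] for i in range(0, len(nodes), 2)]

def summarize(texts):
    # Toy stand-in: truncate and join. RAPTOR prompts an LLM to write
    # an abstractive summary of each cluster.
    return " | ".join(t[:30] for t in texts)

def build_raptor_tree(passages, max_levels=3):
    """Recursively cluster and summarize until a single root node (or
    the level cap) is reached. All layers are kept, so retrieval can
    search leaves and summaries together (the paper's "collapsed
    tree" strategy)."""
    layers = [list(passages)]
    while len(layers[-1]) > 1 and len(layers) <= max_levels:
        layers.append([summarize(g) for g in cluster(layers[-1])])
    return layers

tree = build_raptor_tree(["passage one", "passage two",
                          "passage three", "passage four"])
print([len(layer) for layer in tree])  # e.g. [4, 2, 1]
```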

Clippers 3/19: Christian Clark on semantically aided categorial grammar induction

Studies of grammar induction are a source of evidence about the mechanisms underlying children’s language acquisition. Manipulating the prior knowledge and inductive biases of grammar inducers can yield insights about the learnability of syntactic structure under various assumptions about the learner. While early induction models often relied on annotated data, more recent models have made progress toward learning from raw data, working with both probabilistic context-free grammars and categorial grammars. Still, the accuracy of current systems falls well below that of human learners.

Incorporating world knowledge into grammar inducers is a potential path toward further improvement, one which is well motivated by psycholinguistic theory (e.g. semantic bootstrapping). Along these lines, I will present a categorial grammar inducer that incorporates semantic knowledge — implemented as association weights between predicate roles — into an existing syntax-only inducer. Associations can be distilled from large language models (LLMs), opening up possibilities not only for better grammar induction but also for exploration of the conceptual knowledge acquired by LLMs. This project is still a work in progress, but I will present some preliminary results on synthetic data and broad-coverage corpora.
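As a rough illustration of where such associations could enter the picture, here is a minimal sketch (with hypothetical names and weights, not the actual model) of folding predicate-role association weights into the score a syntax-only inducer assigns to a candidate parse:

```python
import math

# Hypothetical association weights between predicate roles, e.g.
# distilled from an LLM: (predicate, role, argument) -> strength.
assoc = {
    ("eat", "arg1", "cookie"): 0.9,
    ("eat", "arg1", "idea"): 0.05,
}

def parse_score(syntax_logprob, dependencies, weight=1.0):
    """Combine a syntax-only log probability with a semantic bonus
    summed over the parse's predicate-role-argument triples."""
    semantic = sum(math.log(assoc.get(dep, 1e-3))  # floor for unseen triples
                   for dep in dependencies)
    return syntax_logprob + weight * semantic

# Under the semantic prior, "eat a cookie" outranks "eat an idea"
# even when the two parses are syntactically equally probable.
assert parse_score(-4.2, [("eat", "arg1", "cookie")]) > \
       parse_score(-4.2, [("eat", "arg1", "idea")])
```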

Clippers 3/5: Alyssa Allen on SQL Query Explainability using Natural Language Generation

This work is rooted in a larger project aimed at developing a dialogue system that helps increase the transparency of database query outputs for non-expert SQL users. Previously, I’ve discussed processes for building a training set using few-shot prompting and a hand-annotated set of commented queries. Additionally, I’ve discussed test set results from LLMs (such as ChatGPT and Llama). This presentation will shift focus to the content of the generated natural language itself.
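As a rough picture of the few-shot setup, here is a minimal sketch; the instructions, exemplar, and llm() stub are hypothetical stand-ins, not the project's actual prompt or model interface.

```python
# A hypothetical hand-annotated exemplar pairing a query with a comment.
EXEMPLAR = """\
Query:
SELECT name FROM employees WHERE salary > 50000;
Comment:
-- Lists the names of employees earning more than 50,000.
"""

def build_prompt(query, exemplars=(EXEMPLAR,)):
    """Assemble a few-shot prompt: instructions, hand-annotated
    exemplars, then the query to be commented."""
    instructions = ("Write a natural-language comment for the SQL "
                    "query below. Stay faithful to the query's "
                    "semantics and reuse the user's wording where "
                    "appropriate.")
    return "\n\n".join([instructions, *exemplars,
                        f"Query:\n{query}\nComment:"])

def llm(prompt):
    # Placeholder for a call to ChatGPT, Llama, or another model.
    raise NotImplementedError

print(build_prompt("SELECT COUNT(*) FROM orders WHERE year = 2023;"))
```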

I’ll discuss the development of comment guidelines and the need for such guidelines in standardizing evaluation. Ideally, comment guidelines make transparent what constitutes a “good” comment. Comments should also 1) reflect certain properties of the relational database structure, 2) prioritize semantic fidelity to the query, and 3) align with the user’s language wherever appropriate. The comment guidelines use these core elements to outline how generated natural language can increase the explainability of database queries; an illustrative example follows below. Our methods will be compared to templated and rule-based approaches to explainability.
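For concreteness, here is a hypothetical query-comment pair (not drawn from the project's data) annotated with the guideline property each part of the comment addresses:

```python
query = """
SELECT c.name, COUNT(o.id) AS order_count
FROM customers c JOIN orders o ON o.customer_id = c.id
GROUP BY c.name;
"""

comment = (
    # (1) reflects the relational structure: names the joined tables
    "For each customer, counts their orders by joining the customers "
    "and orders tables, "
    # (2) semantic fidelity: the grouping and counting are stated outright
    "grouping rows by customer name and counting order IDs."
    # (3) user-aligned language: says "customers" and "orders",
    # not the aliases c and o
)
print(comment)
```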