Hallucination in AI Dialogues: Detection and Mitigation
Large Language Models (LLMs) excel at generating fluent language but remain vulnerable to producing false or misleading outputs, commonly referred to as hallucinations. This presentation explores the nature of hallucinations in dialogue systems, why they emerge, and why they matter in high-stakes applications. I review current strategies for detecting hallucinations, including human evaluation, LLM-as-judge methods, uncertainty estimation, and fact-checking techniques such as FActScore. I also introduce VISTA Score, a new framework for sequential, turn-based verification that improves consistency and factuality in conversational settings. Building on these detection methods, I outline complementary approaches for mitigating hallucinations, from retrieval-augmented generation to evaluation pipelines that encourage abstention when confidence is low. Through examples from my virtual museum tour guide project, I demonstrate how combining detection and mitigation strategies can lead to more trustworthy and reliable dialogue systems.
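
To make the detection-plus-abstention idea concrete, below is a minimal sketch of a turn-level verification loop of the kind described above: it combines a simple uncertainty proxy (mean token probability) with an LLM-as-judge check over atomic claims, abstaining when confidence is low. This is an illustrative outline, not the VISTA Score implementation; the judge callable, the confidence floor, and the claim list are all assumed placeholders.

    import math
    from dataclasses import dataclass
    from typing import Callable, List

    @dataclass
    class TurnCheck:
        claim: str        # atomic claim extracted from the assistant turn
        supported: bool   # judge verdict against the dialogue history / evidence

    def mean_token_confidence(token_logprobs: List[float]) -> float:
        """Uncertainty proxy: geometric-mean token probability of the generated turn."""
        return math.exp(sum(token_logprobs) / max(len(token_logprobs), 1))

    def verify_turn(
        claims: List[str],
        judge: Callable[[str], bool],      # hypothetical LLM-as-judge callable
        token_logprobs: List[float],
        confidence_floor: float = 0.6,     # illustrative abstention threshold
    ) -> str:
        """Return 'answer', 'abstain', or 'revise' for a single dialogue turn."""
        if mean_token_confidence(token_logprobs) < confidence_floor:
            return "abstain"               # low model confidence: decline to answer
        checks = [TurnCheck(c, judge(c)) for c in claims]
        factual_precision = sum(t.supported for t in checks) / max(len(checks), 1)
        return "answer" if factual_precision == 1.0 else "revise"

    def fake_judge(claim: str) -> bool:
        # Stand-in for a real judge model call; accepts only one toy fact.
        return "1870" in claim

    if __name__ == "__main__":
        claims = ["The museum opened in 1870.", "The painting is by Vermeer."]
        print(verify_turn(claims, fake_judge, token_logprobs=[-0.2, -0.1, -0.3]))

In a real pipeline the judge would query a separate model (or a retrieval-backed fact checker) per claim, and the confidence floor would be tuned on held-out dialogues; the three-way outcome mirrors the talk's framing, where low-confidence turns are withheld rather than risked.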