Chuffed that Ash Lewis is presenting our paper "Winning Big with Small Models: Knowledge Distillation vs. Self-Training for Reducing Hallucination in Product QA Agents" at the GEM 💎 Workshop at ACL 2025 today! It turns out that, with automatic cleaning, self-training can match knowledge distillation at getting Llama 8B to outperform GPT-4o on practical tasks. Reduced exposure bias looks like the explanation.