Next week, I will be giving a talk on sparse autoencoders (SAEs), their use in interpretability for LLMs, and my work applying them to vision models. I will cover Anthropic’s core line of work (Toy Models of Superposition, Towards Monosemanticity, Scaling Monosemanticity), the open challenges that remain, why I’m excited about SAEs for vision, and interesting applications of the technique.