Recent Research by CAIAC Members
The Partially Observable Off-Switch Game
Andrew Garber, Rohan Subramani, Linus Luu, Mark Bedaywi, Stuart Russell, Scott Emmons
April 11, 2025
SCIURus: Shared Circuits for Interpretable Uncertainty Representations in Language Models
Carter Teplica, Yixin Liu, Arman Cohan, Tim GJ Rudner
December 15, 2024
Adaptive Contextual Perception: How to Generalize to New Backgrounds and Ambiguous Objects
Zhuofan Ying, Peter Hase, Mohit Bansal
December 2, 2024
Generalization Analogies (Genies): A Testbed for Generalizing AI Oversight to Hard-to-Measure Domains
Joshua Clymer, Garrett Baker, Rohan Subramani, Sam Wang
November 13, 2023