Recent Research by CAIAC Members

Getting out of the Big-Muddy: Escalation of Commitment in LLMs
Emilio Barkett, Olivia Long, Paul Kröger
August 3, 2025

Efficiently Detecting Hidden Reasoning with a Small Predictor Model
Rohan Subramani, Vishnu Vardhan Sai Lanka, Yau-Meng Wong, Daria Ivanova
July 13, 2025

Reasoning Isn't Enough: Examining Truth-Bias and Sycophancy in LLMs
Emilio Barkett, Olivia Long, Madhavendra Thakur
June 12, 2025

The Partially Observable Off-Switch Game
Andrew Garber, Rohan Subramani, Linus Luu, Mark Bedaywi, Stuart Russell, Scott Emmons
April 11, 2025

SCIURus: Shared Circuits for Interpretable Uncertainty Representations in Language Models
Carter Teplica, Yixin Liu, Arman Cohan, Tim GJ Rudner
December 15, 2024

Generalization Analogies (Genies): A Testbed for Generalizing AI Oversight to Hard-to-Measure Domains
Joshua Clymer, Garrett Baker, Rohan Subramani, Sam Wang
November 13, 2023