LASR Labs - NeurIPS 2025

🏛 LASR Labs

3 papers across 2 sessions

Poster Session 3

Thursday, December 4, 2025 · 11:00 AM → 2:00 PM

Exhibit Hall C,D,E

Detecting High-Stakes Interactions with Activation Probes

#1112 · Alex McKenzie, Urja Pawar, Phil Blandfort, William Bankes, David Krueger, Ekdeep S Lubana, Dmitrii Krasheninnikov

We train probes on activations to classify high- vs low-stakes scenarios, find they outperform medium-sized fine-tuned LLMs, and consider applications to monitoring.

Poster Session 4

Thursday, December 4, 2025 · 4:30 PM → 7:30 PM

Exhibit Hall C,D,E

CoT Red-Handed: Stress Testing Chain-of-Thought Monitoring

#1512 · Benjamin Arnav, Pablo Bernabeu-Perez, Nathan Helm-Burger, Timothy H. Kostolansky, Hannes Whittingham, Mary Phuong

A is for Absorption: Studying Feature Splitting and Absorption in Sparse Autoencoders

#1004 · David Chanin, James Wilken-Smith, Tomáš Dulka, Hardik Bhatnagar, Satvik Golechha, Joseph Bloom