2 papers across 2 sessions
We show that Sparse Autoencoders (SAEs) are inherently biased: the assumptions built into an SAE's architecture shape which concepts in model activations it can detect, so each SAE recovers only a subset of them. This highlights the need for novel SAE architectures designed with concept geometry in mind.