3 papers across 3 sessions
We propose spectral conditioning of attention layers to improve the conditioning of the network Jacobian, yielding more stable and efficient optimization with negligible computational overhead and consistent gains across diverse transformer architectures.
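A minimal sketch of one plausible instantiation, assuming "spectral conditioning" amounts to rescaling each attention projection matrix so its spectral norm stays below a target, which bounds that layer's contribution to the Jacobian's condition number. The helper names and the use of power iteration are illustrative, not taken from the paper:

```python
# Hypothetical sketch: spectrally conditioning attention projections by
# capping each weight matrix's top singular value at a target. Assumes
# the conditioning is applied as a post-step rescaling; names are illustrative.
import torch
import torch.nn as nn

@torch.no_grad()
def top_singular_value(weight: torch.Tensor, iters: int = 5) -> torch.Tensor:
    """Estimate the largest singular value of a weight matrix via power iteration."""
    w = weight.reshape(weight.shape[0], -1)
    v = torch.randn(w.shape[1], device=w.device)
    for _ in range(iters):
        u = nn.functional.normalize(w @ v, dim=0)
        v = nn.functional.normalize(w.t() @ u, dim=0)
    return torch.dot(u, w @ v)

@torch.no_grad()
def spectrally_condition_attention(attn: nn.MultiheadAttention, target: float = 1.0):
    """Rescale the fused Q/K/V projection and the output projection
    so each has spectral norm at most `target`."""
    for weight in (attn.in_proj_weight, attn.out_proj.weight):
        sigma = top_singular_value(weight)
        if sigma > target:
            weight.mul_(target / sigma)

layer = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
spectrally_condition_attention(layer)  # e.g., invoke once per optimizer step
```

Run once per optimizer step (for example, in a post-step hook), a rescaling of this form costs only a few matrix-vector products per layer, consistent with the negligible-overhead claim.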
We give a theoretical formalization of real-world misalignment in multimodal learning via a latent-variable model, showing that learned representations inherently encode semantics that are invariant to selection and perturbation biases.
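As a hedged sketch of what such a latent-variable formalization might look like: a shared semantic latent generates both modalities, perturbation bias corrupts the shared latent on one branch, and selection bias determines which pairs are observed. All symbols below are illustrative placeholders, not the paper's notation:

```latex
% Illustrative generative model for misaligned multimodal pairs (x, y).
\begin{align*}
  &z_c \sim p(z_c), \qquad z_x \sim p(z_x), \qquad z_y \sim p(z_y) \\
  &x = g_x(z_c, z_x), \qquad y = g_y(\tilde z_c, z_y), \qquad
    \tilde z_c = m(z_c, \epsilon) && \text{(perturbation bias)} \\
  &(x, y)\ \text{observed only if}\ s = 1, \qquad
    s \sim p(s \mid z_c, z_x, z_y) && \text{(selection bias)}
\end{align*}
```

Under a model of this shape, the invariance claim would say that representations learned from the observed pairs recover the shared semantics $z_c$ up to an invertible transformation, independently of the perturbation map $m$ and the selection mechanism $p(s \mid \cdot)$.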