6 papers across 3 sessions
We propose Vittle, a new visual instruction tuning framework that improves the robustness of MLLMs to data distribution shifts by pursuing the minimal sufficient representation.
We present MedMax, a large-scale multimodal biomedical instruction-tuning dataset for mixed-modal foundation models, and show that training on our data yields performance superior to GPT-4o on diverse biomedical tasks.
We introduce T-SHIRT, a new data selection method for instruction tuning of LLMs that scores data at the token level with an emphasis on robustness.
We propose a data selection method that leverages sparse, monosemantic neuronal activations learned via a sparse autoencoder to improve task-specific instruction tuning for large language models.