data annotation

2 papers across 2 sessions

Poster Session 2

Wednesday, December 3, 2025 · 4:30 PM → 7:30 PM

Fixing It in Post: A Comparative Study of LLM Post-Training Data Quality and Model Performance

#112 Spotlight · Aladin Djuhera, Swanand Kadhe, Syed Zawad, Farhan Ahmed, Heiko Ludwig, Holger Boche

We compare leading open SFT datasets, add quality annotations using MagPie, and design curation recipes that yield a leaner, high-performing SFT mixture.

Poster Session 6

1 paper

Friday, December 5, 2025 · 4:30 PM → 7:30 PM

Exhibit Hall C,D,E

ACT as Human: Multimodal Large Language Model Data Annotation with Critical Thinking

#3300 · Lequan Lin, Dai Shi, Andi Han, Feng Chen, Qiuzheng Chen, Jiawen Li, Zhaoyang Li, Jiyuan Zhang, Zhenbang Sun, Junbin Gao

This paper proposes the ACT data pipeline, which reduces human annotation costs by using MLLMs as annotators and error detectors, and provides a theoretical analysis to ensure effective downstream training.