6 papers across 3 sessions
We propose Hölder-DPO, the first alignment method with a provable redescending property, which enables robust learning from noisy human feedback by identifying and correcting mislabeled data, improving alignment and model performance.
We estimate 3D human poses from multi-view radar data using 2D image-plane keypoints and 3D bounding-box (BBox) labels instead of more expensive 3D keypoint labels.
We investigate how language bias originates and why margin mechanisms are effective, and we propose a novel Multi-Margin Collaborative Debiasing (MMCD) framework.
We introduce Forecasting in Non-stationary Offline RL (FORL), a novel framework designed to be robust to passive non-stationarities, leveraging diffusion probabilistic models and time-series forecasting foundation models.
This study provides an information-theoretic analysis of discrete latent variables in VQ-VAEs, deriving a novel generalization error bound based on the complexity of the latent variables and encoder.