3 papers across 3 sessions
We show that highly accurate LLMs can be trained entirely on synthetic and weakly curated data.
We propose the Anchored Diffusion Language Model (ADLM), a novel two-stage framework that first predicts a distribution over important (anchor) tokens and then uses it to guide the likelihood prediction of the missing tokens, resulting in better likelihood modeling and generated-text quality.
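To make the two-stage idea concrete, here is a minimal decoding sketch: commit a few confident "anchor" tokens first, then iteratively unmask the rest conditioned on them. The callables `anchor_model` and `denoiser` and the confidence-based unmasking heuristic are hypothetical placeholders, not the paper's actual interface.

```python
import torch

def anchored_generate(anchor_model, denoiser, seq_len, mask_id, num_steps=8):
    """Two-stage decoding sketch: fix anchor tokens first, then fill the rest."""
    seq = torch.full((1, seq_len), mask_id, dtype=torch.long)

    # Stage 1: predict anchors and commit the most confident positions.
    probs = anchor_model(seq).softmax(-1)          # (1, seq_len, vocab) logits -> probs
    conf, tok = probs.max(-1)                      # per-position confidence and token
    anchor_pos = conf.topk(max(1, seq_len // 8), dim=-1).indices
    seq.scatter_(1, anchor_pos, tok.gather(1, anchor_pos))

    # Stage 2: iteratively unmask remaining tokens, always conditioning
    # on the anchors fixed in stage 1.
    for _ in range(num_steps):
        probs = denoiser(seq).softmax(-1)
        conf, tok = probs.max(-1)
        masked = seq == mask_id
        if not masked.any():
            break
        conf = conf.masked_fill(~masked, -1.0)     # only reveal still-masked slots
        k = max(1, int(masked.sum()) // num_steps)
        reveal = conf.topk(k, dim=-1).indices
        seq.scatter_(1, reveal, tok.gather(1, reveal))
    return seq
```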
We propose CADMorph, an inference-time editing method for parametric CAD models that uses geometry-change signals to drive a Plan–Generate–Verify loop over pretrained priors, namely an LDM and an LLM, bypassing the need for editing data, which does not exist.
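The loop structure can be sketched as follows, assuming hypothetical wrappers `llm_plan` (LLM prior), `param_generator` (LDM prior), and `geometry_distance` (the verification signal); these names are illustrative, not CADMorph's actual API.

```python
def edit_cad(source_params, target_geometry, llm_plan, param_generator,
             geometry_distance, max_rounds=5, tol=1e-3):
    """Plan-Generate-Verify sketch: iterate until the edited geometry matches."""
    best_params, best_err = source_params, float("inf")
    feedback = None
    for _ in range(max_rounds):
        # Plan: ask the LLM prior which parameters to change and how.
        plan = llm_plan(source_params, target_geometry, feedback)
        # Generate: let the latent-diffusion prior propose edited parameters.
        candidate = param_generator(source_params, plan)
        # Verify: measure how close the edited model's geometry is to the target.
        err = geometry_distance(candidate, target_geometry)
        if err < best_err:
            best_params, best_err = candidate, err
        if err < tol:
            break
        feedback = {"error": err, "plan": plan}   # feed the verifier signal back
    return best_params
```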