5 papers across 3 sessions
We introduce a large-scale synthetic dataset and a fine-grained alignment framework for composed person retrieval, and provide a manually annotated benchmark test set for objective evaluation.
This paper introduces PANGEA, a method that leverages general-purpose data to generate diverse and high-quality synthetic data, improving LLM performance on domain-specific tasks.
A novel Structural Entropy-guided Knowledge Navigator (SENATOR) framework that addresses the intrinsic knowledge deficiencies of LLMs.
Time series generation using large language models and compact embeddings.
An agentic pipeline for multi-turn synthetic data generation that produces high-quality training data for AI agents.