2 papers across 2 sessions
We propose a novel inference-time personalized alignment method that elicits the user's preferences with a few preference queries.
We propose a curriculum strategy for guiding the training of agents that operate under strict trajectory constraints during deployment by adaptively tightening constraints based on the agent's performance.
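
To make the idea of adaptively tightening constraints concrete, here is a minimal illustrative sketch. It is not the paper's algorithm: it assumes the trajectory constraint can be modeled as a single scalar budget (for example, a maximum cost per trajectory), and the names `tighten_budget`, `run_curriculum`, `threshold`, and `step` are hypothetical. The budget moves toward the strict deployment value only when the agent's recent success rate under the current budget is high enough.

```python
def tighten_budget(budget, target, success_rate, threshold=0.8, step=1):
    """Return the constraint budget for the next training stage.

    Assumption (not from the source): the constraint is a scalar budget,
    and a smaller budget means a stricter constraint.
    """
    if success_rate >= threshold:
        # The agent copes with the current constraint, so tighten it,
        # but never below the strict deployment-time target.
        return max(target, budget - step)
    # Otherwise keep the current constraint until performance improves.
    return budget


def run_curriculum(success_rates, initial_budget, target, threshold=0.8, step=1):
    """Replay a sequence of measured success rates and record the budgets used."""
    budget = initial_budget
    history = [budget]
    for rate in success_rates:
        budget = tighten_budget(budget, target, rate, threshold, step)
        history.append(budget)
    return history


if __name__ == "__main__":
    # Hypothetical success rates measured after each training stage.
    print(run_curriculum([0.9, 0.5, 0.9, 0.95, 1.0], initial_budget=10, target=4, step=3))
```

In a real training loop the success rates would come from evaluating the agent under the current budget rather than from a fixed list, and the tightening rule could be any schedule that depends on performance.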