5 papers across 2 sessions
We present a time scheduler that selects sampling points based on entropy rather than uniform time spacing, ensuring each point contributes an equal amount of information to the final generation.
We introduce Feedback Guidance, a trajectory-specific guidance scheme that relies on the model's own predictions to dynamically adapt the guidance scale.
To advance the evaluation of RPOMDP policies, we (1) introduce a formalization of suitable benchmarks, (2) define a new evaluation method, and (3) lift existing POMDP value bounds to RPOMDPs.
We characterize and provide algorithms for multi-environment POMDPs.
We use mechanistic interpretability to reverse-engineer how neural networks break protected cryptographic implementations via side-channel analysis.
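The entropy-based time scheduler summarized above can be illustrated with a minimal sketch. The summary does not say how entropy is measured, so this assumes a hypothetical callable `entropy_rate` that returns an estimated entropy rate dH/dt at given times. It places sampling points by inverting the cumulative entropy H(t), so that each interval between consecutive points carries the same share of total entropy. The function name, parameters, and the trapezoidal estimate are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def entropy_time_schedule(entropy_rate, num_steps, t_min=0.0, t_max=1.0, grid_size=10_000):
    """Return num_steps + 1 time points that split total entropy into equal parts.

    entropy_rate is a hypothetical callable mapping an array of times to an
    estimated entropy rate dH/dt; how that rate is estimated is assumed here.
    """
    # Dense grid on which the entropy rate is evaluated.
    t_grid = np.linspace(t_min, t_max, grid_size)
    rate = np.maximum(np.asarray(entropy_rate(t_grid), dtype=float), 0.0)

    # Entropy gained on each grid interval (trapezoidal rule).
    increments = 0.5 * (rate[1:] + rate[:-1]) * np.diff(t_grid)
    if np.any(increments <= 0.0):
        raise ValueError("entropy rate must be positive on every grid interval")

    # Cumulative entropy H(t) on the grid, with H(t_min) = 0.
    cum_entropy = np.concatenate(([0.0], np.cumsum(increments)))

    # Equally spaced entropy targets, mapped back to times by linear
    # interpolation of the inverse (valid since cum_entropy is strictly increasing).
    targets = np.linspace(0.0, cum_entropy[-1], num_steps + 1)
    return np.interp(targets, cum_entropy, t_grid)


# Usage with a hypothetical entropy profile concentrated at early times:
# the resulting points cluster near t_min, where more information is gained.
schedule = entropy_time_schedule(lambda t: np.exp(-5.0 * t), num_steps=8)
```

This is the same idea as inverse-CDF sampling applied to cumulative entropy. A constant entropy rate recovers uniform time spacing, which is the baseline the summary contrasts against.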