5 papers across 3 sessions
We develop efficient algorithms for non-uniform sampling over directed acyclic graph structures and use these, together with results from online learning, to develop efficient algorithms for agnostically learning Bayes nets in KL divergence.
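The paper's non-uniform sampler isn't reproduced here; as a point of reference, a naive baseline for sampling DAG structures draws a random topological order and then includes forward edges independently. A minimal sketch (the function name and `edge_prob` are illustrative choices, not from the paper):

```python
import random

def sample_dag(n_nodes: int, edge_prob: float = 0.3, rng=random):
    """Naive DAG sampler: pick a uniformly random topological order,
    then include each order-respecting edge with probability edge_prob.
    A simple baseline for illustration, not the paper's non-uniform sampler."""
    order = list(range(n_nodes))
    rng.shuffle(order)
    edges = [
        (order[i], order[j])
        for i in range(n_nodes)
        for j in range(i + 1, n_nodes)
        if rng.random() < edge_prob
    ]
    return order, edges

order, edges = sample_dag(5)
print("topological order:", order)
print("edges:", edges)
```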
Transformer-based language models learn low-dimensional task manifolds across layers; similar trends in intrinsic dimension across layers reveal similar compression strategies despite varying architectures and sizes.
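Layer-wise intrinsic dimension is commonly measured with nearest-neighbor estimators such as TwoNN (Facco et al., 2017); a minimal sketch of that estimator applied to stand-in activations (the hidden states below are synthetic, not taken from a real model):

```python
import numpy as np

def twonn_id(X: np.ndarray) -> float:
    """TwoNN intrinsic-dimension estimate from the ratio of each point's
    second- to first-nearest-neighbor distance (Facco et al., 2017)."""
    # Pairwise distances; mask self-distances with +inf.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    d.sort(axis=1)
    mu = d[:, 1] / d[:, 0]                      # ratio r2 / r1 per point
    mu = mu[np.isfinite(mu) & (mu > 1.0)]       # drop degenerate ratios
    # Maximum-likelihood estimate: id = N / sum(log mu).
    return len(mu) / np.log(mu).sum()

# Example: 500 "token activations" on a 10-dim manifold embedded in 768 dims;
# the estimate should come out near 10.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(500, 10)) @ rng.normal(size=(10, 768))
print(f"estimated intrinsic dimension: {twonn_id(hidden):.1f}")
```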
We propose a new method for interpreting transformer circuits by performing SVD on the query-key and value-output matrices.
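In circuit terms, a head's value-output behavior is captured by the combined matrix W_OV = W_V W_O, and its singular vectors can be projected through the unembedding to read off what the component writes. A minimal sketch with random stand-in weights (`W_V`, `W_O`, `W_U`, and all dimensions are illustrative placeholders, not a trained model):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_head, vocab = 256, 64, 1000

# Stand-in weights for one attention head (random placeholders).
W_V = rng.normal(size=(d_model, d_head)) / np.sqrt(d_model)
W_O = rng.normal(size=(d_head, d_model)) / np.sqrt(d_head)
W_U = rng.normal(size=(d_model, vocab)) / np.sqrt(d_model)  # unembedding

# Value-output (OV) circuit of the head.
W_OV = W_V @ W_O                       # (d_model, d_model), rank <= d_head

# Each singular-vector pair is a candidate interpretable direction.
U, S, Vt = np.linalg.svd(W_OV)

# Project the top output direction onto the vocabulary to see which
# tokens it writes toward (token ids only, since weights are random).
logits = Vt[0] @ W_U
print("top singular value:", S[0])
print("most-promoted token ids:", np.argsort(-logits)[:5])
```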
We provide the first sample complexity bounds for a first-order bi-level optimization algorithm.
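Fully first-order bi-level methods typically replace the implicit hypergradient, which needs Hessian-vector products, with a value-function or penalty reformulation. A minimal sketch of one such penalty-style scheme on a toy quadratic bi-level problem; the specific updates, penalty weight, and step sizes are illustrative assumptions, not the paper's algorithm:

```python
# Toy bi-level problem:
#   outer:  min_x f(x, y*(x)),  f(x, y) = (x - 1)**2 + y**2
#   inner:  y*(x) = argmin_y g(x, y),  g(x, y) = (y - x)**2
# Closed form: y*(x) = x, so the outer optimum is x = 0.5.

def grad_f(x, y):          # (df/dx, df/dy)
    return 2 * (x - 1), 2 * y

def grad_g(x, y):          # (dg/dx, dg/dy)
    return -2 * (y - x), 2 * (y - x)

x, y, z = 0.0, 0.0, 0.0
lam, eta = 10.0, 0.05      # penalty weight and step size (assumed values)

for _ in range(2000):
    # y descends the penalized objective f + lam * g (first-order only).
    y -= eta * (grad_f(x, y)[1] + lam * grad_g(x, y)[1])
    # z tracks the inner minimizer argmin_z g(x, z).
    z -= eta * grad_g(x, z)[1]
    # Outer step on x uses only gradients, never Hessians:
    # d/dx [ f(x, y) + lam * (g(x, y) - g(x, z)) ]
    x -= eta * (grad_f(x, y)[0] + lam * (grad_g(x, y)[0] - grad_g(x, z)[0]))

print(f"x = {x:.3f}  (true bi-level optimum 0.5; gap shrinks as lam grows)")
```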
We prove global convergence at an order-optimal rate for average-reward constrained MDPs using a primal-dual natural actor-critic algorithm.
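The core primal-dual idea: ascend the Lagrangian L(theta, lam) = J_r(theta) - lam * (J_c(theta) - b) in the policy parameters while adjusting the dual variable lam to enforce the cost budget. The sketch below illustrates only that mechanism on a toy constrained bandit with vanilla (not natural) policy gradients and exact expectations; all numbers are assumed, and the critic and average-reward machinery of the actual algorithm are omitted:

```python
import numpy as np

rewards = np.array([1.0, 0.4])    # per-arm rewards (toy values)
costs   = np.array([0.9, 0.1])    # per-arm costs
b = 0.5                           # cost budget: require E[c] <= b

theta = np.zeros(2)               # softmax policy parameters
lam = 0.0                         # dual variable (Lagrange multiplier)
eta_p, eta_d = 0.2, 0.02          # two-timescale step sizes (assumed)

for _ in range(20000):
    pi = np.exp(theta - theta.max()); pi /= pi.sum()
    adv = rewards - lam * costs                     # Lagrangian "advantage"
    # Exact softmax policy gradient of E_pi[adv].
    theta += eta_p * pi * (adv - pi @ adv)          # primal ascent
    lam = max(0.0, lam + eta_d * (pi @ costs - b))  # dual ascent on violation

pi = np.exp(theta - theta.max()); pi /= pi.sum()
print(f"policy={pi.round(3)}  reward={pi @ rewards:.3f}  "
      f"cost={pi @ costs:.3f} (budget {b})  lam={lam:.3f}")
```

At the fixed point the dual variable prices the constraint: lam rises while expected cost exceeds the budget and relaxes toward zero once the policy is feasible.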