Poster Session 5 · Friday, December 5, 2025, 11:00 AM → 2:00 PM · Poster #3101

Sample complexity of data-driven tuning of model hyperparameters in neural networks with structured parameter-dependent dual function


Abstract

Modern machine learning algorithms, especially deep learning-based techniques, typically require careful hyperparameter tuning to achieve their best performance. Despite intense interest in practical techniques such as Bayesian optimization and random search for automating this laborious and compute-intensive task, the fundamental learning-theoretic complexity of tuning hyperparameters for deep neural networks remains poorly understood.
Motivated by this gap, we initiate the formal study of hyperparameter-tuning complexity in deep learning through a recently introduced data-driven setting. Here we are given a series of learning tasks drawn from an underlying distribution, and the goal is to tune the hyperparameters to perform well on average over that distribution.
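To make the setting concrete, the following minimal sketch (with hypothetical helpers `sample_task` and `train_and_validate` standing in for the task distribution and the inner training loop; neither is from the paper) performs empirical risk minimization over a grid of hyperparameter values:

```python
import numpy as np

# Data-driven hyperparameter tuning, sketched as ERM over a hyperparameter grid.
# `sample_task()` draws a problem instance from the task distribution;
# `train_and_validate(task, alpha)` trains the model parameters on `task` with
# hyperparameter `alpha` and returns the resulting validation utility.
def erm_over_hyperparameters(alpha_grid, num_tasks, sample_task, train_and_validate):
    tasks = [sample_task() for _ in range(num_tasks)]
    avg_utility = [
        np.mean([train_and_validate(task, alpha) for task in tasks])
        for alpha in alpha_grid
    ]
    # Return the hyperparameter with the best average utility over sampled tasks.
    return alpha_grid[int(np.argmax(avg_utility))]
```

Sample complexity here asks how many tasks `num_tasks` must be drawn before the empirically best hyperparameter is near-optimal over the whole distribution.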
A major difficulty is that the utility function, viewed as a function of the hyperparameter, is highly volatile; moreover, it is defined only implicitly, through an optimization problem over the model parameters. To tackle this challenge, we introduce a new technique for characterizing the discontinuities and oscillations of the utility function on any fixed problem instance as the hyperparameter varies; our analysis draws on tools from algebraic geometry, differential geometry, and constrained optimization. We use this characterization to show that the learning-theoretic complexity of the corresponding family of utility functions is bounded.
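The following toy example (our illustration, not the paper's construction) shows the kind of volatility at issue: the inner optimization selects the best of a few candidate parameter settings, so the utility jumps whenever the argmin switches as the hyperparameter `alpha` varies.

```python
import numpy as np

def inner_losses(alpha):
    # Three hypothetical candidate parameter settings, each with an
    # alpha-dependent training loss and a fixed validation utility.
    train = np.array([(alpha - 0.2) ** 2, (alpha - 0.5) ** 2 + 0.01, abs(alpha - 0.8)])
    valid = np.array([0.9, 0.3, 0.6])
    return train, valid

def utility(alpha):
    # The utility is defined implicitly through the inner optimization: it is
    # the validation utility of whichever candidate minimizes training loss.
    train, valid = inner_losses(alpha)
    return valid[np.argmin(train)]

alphas = np.linspace(0.0, 1.0, 1001)
u = np.array([utility(a) for a in alphas])
jumps = alphas[:-1][np.abs(np.diff(u)) > 1e-6]
print("discontinuities near alpha =", np.round(jumps, 3))
```

On this toy instance the utility is piecewise constant with jumps near alpha ≈ 0.367 and 0.735; the paper's technique bounds how many such discontinuities and oscillations the utility function can exhibit for the neural-network families it studies.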
We instantiate our results to provide sample complexity bounds for two concrete applications: tuning a hyperparameter that interpolates between neural activation functions, and setting the kernel parameter in graph neural networks.
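As an illustration of the first application, one natural way such an interpolation hyperparameter can enter the architecture is as a convex combination of two base activations; the sketch below assumes this form (the paper's exact parameterization may differ), interpolating between ReLU and tanh:

```python
import numpy as np

# A hypothetical interpolated activation sigma_alpha = alpha*sigma1 + (1-alpha)*sigma2.
# At alpha = 1 this is exactly ReLU, at alpha = 0 exactly tanh, and intermediate
# hyperparameter values trade off between the two nonlinearities.
def interpolated_activation(x, alpha, sigma1=lambda z: np.maximum(z, 0.0), sigma2=np.tanh):
    return alpha * sigma1(x) + (1.0 - alpha) * sigma2(x)

x = np.linspace(-2.0, 2.0, 5)
for alpha in (0.0, 0.5, 1.0):
    print(f"alpha={alpha}: {np.round(interpolated_activation(x, alpha), 3)}")
```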