Full Professor, Carnegie Mellon University
7 papers at NeurIPS 2025
We introduce a benchmark for measuring the safety of general computer-use agents across diverse categories of harm.
Antidistillation sampling strategically modifies a model's next-token probability distribution to poison reasoning traces, rendering them significantly less effective for distillation while preserving the model's practical utility.
We present a data-centric pretraining framework that builds safety into the model from the start.
We reliably predict the behavior of black-box language models by training predictors on their responses to follow-up questions.
Open-source framework for LLM unlearning supporting multiple benchmarks and methods.