2 papers across 2 sessions
We introduce a policy-grounded guardrail dataset and benchmark state-of-the-art guardrail models, offering novel insights into their capabilities and limitations.
We propose an LLM agent framework to automate red teaming.