4 papers across 3 sessions
We generalize CLIP training to worldwide web scale, scoring +0.8% over the English-only counterpart on zero-shot ImageNet classification (no compromise) and setting SoTA zero-shot multilingual results: 57.4% on CVQA and 50.2% on Babel-ImageNet.
We identify several factors that lead to token premium effects in monolingual tokenizers and provide two interventions that significantly reduce tokenizer inequities.
We introduce a systematic framework for interpreting the translation mechanisms of LLMs from the perspective of their computational components, an area previously unexplored.