1 paper across 1 session
We re-evaluate whether vision-and-language models (VLMs) exhibit a human-like bouba-kiki effect, using two methods modelled after human experiments. We find that the cross-modal associations made by VLMs fall short of aligning with human intuitions.