Do Vision-Language Models Think More Like Humans? Not Always, New Study Finds

Large language models (LLMs) are powerful tools for understanding how humans process language, but adding vision-language training doesn't always make their text representations more human-like. In a new study posted on arXiv, researchers Jinzhou Wu, Zhengwu Ma, Jixing Li, Baoping Tang, and Zitong Lu compared tightly matched LLMs and vision-language models (VLMs) to test whether multimodal pretraining improves alignment with human brain activity and eye movements during natural reading.

The Research

The team used a dataset of whole-cortex fMRI responses and synchronized eye-tracking saccades from humans reading natural sentences. They compared LLM and VLM pairs that were identical except for multimodal training history, so that any difference could be traced to that training rather than to online visual input or cross-modal fusion. The results showed that VLMs do not have a global, uniform advantage over LLMs in aligning with human neural and behavioral responses. Instead, language-internal representations in LLMs were the key factor for modeling human text processing. However, when sentences had stronger visual semantic content, VLMs showed selective improvements, with converging evidence from both fMRI and eye-tracking alignments.

Why It Matters

For anyone curious about their own cognition, this study suggests that while adding visual information can sometimes help AI models mimic human reading, the core of language understanding remains rooted in language itself. It highlights the importance of studying how different kinds of learning (from words alone versus from words and images) shape mental representations. Practically, it hints that if you want to improve your reading comprehension, focusing on language skills (like vocabulary and grammar) might be more effective than trying to visualize everything.

What You Can Do

To sharpen your language processing, try activities that challenge your verbal skills—like reading complex texts, learning new words, or doing crossword puzzles. These exercises strengthen the language networks in your brain, much like LLMs benefit from more language training. And if you tend to think visually, note that you might have an edge when processing descriptive, image-rich content.

Source: arXiv q-bio.NC
