Sparse Autoencoders Reveal How LLMs Mirror Brain's Semantic Map

May 25, 2026 · 2 min read · Research

Researchers at the University of Hong Kong have proposed a mechanistic explanation for why the intermediate layers of large language models (LLMs) best predict brain activity during language processing. By applying sparse autoencoders (SAEs) to GPT-2 XL and Llama-3.1-8B, they decomposed each layer's activations into 16,000–32,000 interpretable features and found that semantic features alone recover 94% of peak brain-encoding performance.
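For readers unfamiliar with SAEs, the following is a minimal sketch of the general idea, not the authors' implementation. It assumes the common formulation with a ReLU encoder, a linear decoder, and an L1 sparsity penalty. The sizes, weights, and function names (`sae_forward`, `sae_loss`) are hypothetical, and the weights are random rather than trained.

```python
import random

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]

def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    """Encode one activation vector into non-negative features, then reconstruct it."""
    centered = [a - b for a, b in zip(x, b_dec)]
    pre = [p + b for p, b in zip(matvec(W_enc, centered), b_enc)]
    features = [max(0.0, p) for p in pre]  # ReLU: each feature is either off or positive
    recon = [r + b for r, b in zip(matvec(W_dec, features), b_dec)]
    return features, recon

def sae_loss(x, features, recon, l1_coeff=1e-3):
    """Squared reconstruction error plus an L1 penalty that pushes features toward zero."""
    mse = sum((a - r) ** 2 for a, r in zip(x, recon))
    l1 = sum(abs(f) for f in features)
    return mse + l1_coeff * l1

# Toy sizes (assumption); the study reports 16,000-32,000 features per layer.
random.seed(0)
d_model, n_features = 8, 32
W_enc = [[random.gauss(0, 0.1) for _ in range(d_model)] for _ in range(n_features)]
W_dec = [[random.gauss(0, 0.1) for _ in range(n_features)] for _ in range(d_model)]
b_enc = [0.0] * n_features
b_dec = [0.0] * d_model

x = [random.gauss(0, 1) for _ in range(d_model)]  # stand-in for one layer activation
features, recon = sae_forward(x, W_enc, b_enc, W_dec, b_dec)
loss = sae_loss(x, features, recon)
active = sum(1 for f in features if f > 0)
print(f"{active} of {n_features} features active, loss = {loss:.3f}")
```

In practice the weights would be trained on real model activations with a tensor library. After training, only a few features fire for any given input, which is what makes individual features easier to interpret.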

The Research

The study, by Dongxin Guo, Jikun Wu, and Siu Ming Yiu and accepted at CoNLL 2025, uses SAEs to connect mechanistic interpretability with neural encoding models. The authors built a human-validated taxonomy of SAE features (κ ≥ 0.74) and found that the semantic features account for nearly all of the predictive power, far surpassing variance-matched baselines (p < 0.001, d = 1.31). Critically, they also tested a novel prediction: that five semantic subcategories, derived from three independent neuroscience programs, should map onto distinct brain regions. A formal convergence test confirmed this alignment (Spearman ρ = 0.72, p < 0.001; hypergeometric p = 0.007). In addition, SAE features predicted human reading times beyond lexical controls (ΔlogLik = 38.4, p < 0.001), and an exploratory analysis suggests that the brain encodes unexpected semantic content. The results generalized across English, Chinese, and French.
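The convergence test above is reported as a Spearman rank correlation. As a rough illustration of what that statistic measures, here is a minimal sketch that ranks two lists (averaging tied ranks) and then computes the Pearson correlation of the ranks. The `predicted` and `observed` values are invented for illustration and are not data from the study.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

def spearman(a, b):
    """Spearman rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(a), average_ranks(b))

# Hypothetical scores: how strongly a region was predicted to respond to a
# semantic subcategory versus how strongly it was observed to respond.
predicted = [0.9, 0.4, 0.7, 0.1, 0.5]
observed = [0.8, 0.5, 0.3, 0.2, 0.6]
rho = spearman(predicted, observed)
print(f"Spearman rho = {rho:.2f}")
```

Because it works on ranks, the statistic asks only whether the two orderings agree, not whether the raw values are on the same scale.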

Why It Matters

This work is a step toward solving the brain-language alignment puzzle: why LLM representations predict brain activity at all. For anyone curious about cognition, it suggests that the brain categorizes meaning in a highly organized way, a semantic topography that parallels the internal representations of AI language models. Your brain may use similar 'feature maps' to process ideas, which could inform future brain-training or learning strategies built around semantic organization.

What You Can Do

To put this insight to use, try organizing new information into semantic categories when you study. When learning a new concept, for example, mentally place it alongside related ideas. Doing so may work with your brain's natural semantic topography and could improve memory encoding and retrieval.

Source: arXiv q-bio.NC
