What 'spatially structured populations' means—and why it matters for interpreting ancient Indian genomes
Population structure—invisible barriers to gene flow across geography—shapes how we read ancient DNA. A look at what the term means and why it changes Harappan-era interpretations.

Dr. Anil Patel for Swaveda
May 25, 2026

When geneticists say a population is "spatially structured," they mean something specific: individuals mate more often with neighbors than with distant groups, even when no mountains or deserts separate them. Over time, this creates genetic gradients across a landscape. The pattern matters because it changes how we interpret ancient genomes—including those from Bronze Age South Asia.
A 2024 study in Nature Communications used wild house mice to demonstrate the principle. Researchers from the Max Planck Institute for Evolutionary Biology tracked mice across a barn in Germany and found that even within a single building, mating happened more often between mice from the same area. Gene flow barriers formed not from walls but from distance and behavior. The result: genetic variation organized itself into clusters that looked like distinct populations—but reflected geography, not separate origins.
The same logic applies to human populations, including those that left behind the Harappan (Indus Valley) archaeological record.
What creates structure without borders
Spatial structure emerges when gene flow—the movement of genetic material between groups through mating—occurs locally rather than randomly across a region. Three factors drive it: geographic distance, cultural practices that favor endogamy (marriage within a group), and ecological barriers that limit movement.
In the mouse study, distance alone was enough. Among humans, language, caste, clan affiliation, and occupational specialization add layers. Ancient DNA from South Asia shows evidence of all three factors operating over millennia.
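How distance alone produces structure can be sketched in a few lines of Python. This is a toy "stepping-stone" simulation, not an analysis from any study cited here. The number of groups (demes), the migration rate, the group size, and the Gaussian shortcut used for genetic drift are all assumptions chosen for illustration.

```python
import random

def simulate_stepping_stone(n_demes=8, n_loci=300, gene_copies=40,
                            migration=0.1, generations=80, seed=1):
    """Return freqs[locus][deme] after repeated neighbor migration and drift.

    All parameter values are hypothetical, chosen only to make the pattern visible.
    """
    rng = random.Random(seed)
    freqs = [[0.5] * n_demes for _ in range(n_loci)]
    for locus in freqs:
        for _ in range(generations):
            # Migration: each deme exchanges genes only with adjacent demes.
            mixed = []
            for i, p in enumerate(locus):
                exchange = sum(locus[j] - p for j in (i - 1, i + 1)
                               if 0 <= j < n_demes)
                mixed.append(p + (migration / 2) * exchange)
            # Drift: Gaussian approximation to binomial sampling, clamped to [0, 1].
            for i, p in enumerate(mixed):
                sd = (p * (1 - p) / gene_copies) ** 0.5
                locus[i] = min(1.0, max(0.0, rng.gauss(p, sd)))
    return freqs

def mean_sq_diff_by_distance(freqs):
    """Average squared allele-frequency difference for each deme separation."""
    n_demes = len(freqs[0])
    result = {}
    for d in range(1, n_demes):
        diffs = [(locus[i] - locus[i + d]) ** 2
                 for locus in freqs for i in range(n_demes - d)]
        result[d] = sum(diffs) / len(diffs)
    return result

freqs = simulate_stepping_stone()
by_distance = mean_sq_diff_by_distance(freqs)
for separation, value in by_distance.items():
    print(separation, round(value, 4))
```

Every deme starts out genetically identical, and nothing separates them except the rule that migrants come only from next door. Even so, the printed differences grow with separation: distant demes end up looking more distinct than neighbors.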
The Harappan cline
The 2019 paper by Narasimhan, Reich, and colleagues in Science analyzed 523 ancient genomes. Its evidence for Harappan-related ancestry came not from Indus cities but from 11 genetic outliers buried at Gonur in Turkmenistan and Shahr-i-Sokhta in eastern Iran, sites in cultural contact with the Indus Valley, whom the authors inferred to be migrants from the Harappan world. These individuals formed a genetic cline (a gradient): each carried a different mix of ancestry related to early Iranian populations and ancestry related to South Asian hunter-gatherers, and none carried Steppe pastoralist ancestry.
The cline was not a sharp boundary. It was a continuum of mixture proportions, consistent with a spatially structured source population in which gene flow occurred but was geographically constrained. Because the individuals were sampled outside the Indus Valley, the gradient cannot be mapped directly onto Harappan geography, but the paper treated them as points along a range of admixture rather than as products of discrete migration waves.
This interpretation contradicts two common misreadings. First, that the Harappans were genetically uniform—a single "Indus Valley population." Second, that later South Asian genetic diversity resulted solely from discrete migration events in the second millennium BCE. Spatial structure means the gradient already existed during the Harappan period, shaped by local gene flow patterns over centuries.
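To make "position along a cline" concrete, here is a minimal sketch of how a mixture proportion can be read from allele frequencies. It is a simple least-squares estimator applied to invented numbers, not the statistical machinery used in the papers discussed here. The two source populations, the five loci, and the five sites are all hypothetical.

```python
def estimate_ancestry_fraction(target, source_a, source_b):
    """Least-squares alpha such that target ~ alpha*source_a + (1 - alpha)*source_b."""
    num = sum((t - b) * (a - b) for t, a, b in zip(target, source_a, source_b))
    den = sum((a - b) ** 2 for a, b in zip(source_a, source_b))
    return num / den

# Hypothetical allele frequencies at five loci in two source populations.
source_a = [0.10, 0.80, 0.35, 0.60, 0.05]
source_b = [0.70, 0.20, 0.55, 0.10, 0.90]

def mix(alpha):
    """Allele frequencies of a group with fraction alpha of source-A ancestry."""
    return [alpha * a + (1 - alpha) * b for a, b in zip(source_a, source_b)]

# Five hypothetical sites along a transect, each with a different true mixture.
true_fractions = [0.9, 0.7, 0.5, 0.3, 0.1]
sites = [mix(alpha) for alpha in true_fractions]

estimates = [estimate_ancestry_fraction(s, source_a, source_b) for s in sites]
print([round(e, 3) for e in estimates])
```

The estimates fall steadily from one end of the transect to the other. A uniform population would return the same number at every site, and two discrete populations would return only two values.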
Post-Harappan continuity and change
The baseline comes from the 2019 Shinde paper in Cell, which reported a genome from Rakhigarhi dating to the mature Harappan period, around 2500 BCE. That individual lacked the Steppe pastoralist ancestry component found in later South Asian populations. After the Harappan urban phase declined around 1900 BCE, genetic structure persisted but shifted.
Narasimhan's data from later sites—post-2000 BCE—showed increasing Steppe-related ancestry in northwestern groups, forming a new cline. This was not replacement. It was gene flow into an already structured population, creating a revised gradient.
The distinction matters. If mating had been random across South Asia at the time, new ancestry would have spread evenly within a few generations. Instead, it appeared in a geographic pattern: more in the northwest, less in the south and east. That pattern is consistent with existing structure shaping how incoming gene flow dispersed.
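The contrast between the two scenarios can be sketched with a deterministic toy model. New ancestry arrives as a one-time pulse in the first deme of a chain, and then spreads either through neighbor-to-neighbor exchange or through region-wide random mating. The chain length, migration rate, pulse size, and time span are assumptions for illustration only.

```python
def spread_pulse(n_demes=10, generations=30, migration=0.2, structured=True):
    """Fraction of incoming ancestry per deme after a one-time pulse in deme 0."""
    # Hypothetical starting point: deme 0 (the "northwest") is 50% new ancestry.
    ancestry = [0.5] + [0.0] * (n_demes - 1)
    for _ in range(generations):
        if structured:
            # Each deme exchanges genes only with its immediate neighbors.
            ancestry = [
                x + (migration / 2) * sum(ancestry[j] - x
                                          for j in (i - 1, i + 1)
                                          if 0 <= j < n_demes)
                for i, x in enumerate(ancestry)
            ]
        else:
            # Random mating across the whole region: one shared gene pool.
            pooled = sum(ancestry) / n_demes
            ancestry = [pooled] * n_demes
    return ancestry

structured = spread_pulse(structured=True)
unstructured = spread_pulse(structured=False)
print([round(x, 3) for x in structured])
print([round(x, 3) for x in unstructured])
```

Both runs contain the same total amount of new ancestry. Only the structured run keeps it arranged as a gradient that declines with distance from the entry point.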
Why structure changes interpretation
When populations are structured, genetic clusters can reflect geography rather than separate ancestral groups. This complicates phylogenetic trees—the branching diagrams geneticists use to model ancestry. A tree assumes populations split cleanly; structure creates gradients that blur splits.
The Narasimhan paper used models that accounted for admixture (mixture between groups) rather than simple branching. The result: South Asian populations formed through continuous gene flow along geographic axes, not a series of isolated founding events. On that account, a simple tree-like history does not describe the region.
This affects how we read archaeological and linguistic evidence. If genetic structure was already present during the Harappan period, then later changes in pottery styles, settlement patterns, or language need not correlate with wholesale population replacement. Spatially structured populations can adopt new practices through diffusion—the spread of ideas or technologies—without large-scale migration.
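One standard way geneticists check whether a clean tree is adequate is the f3 statistic, which averages (t - a)(t - b) across genetic markers for a target population t and two candidate sources a and b. A clearly negative value is hard to explain without mixture. The sketch below uses invented allele frequencies and omits the sampling corrections that real analyses apply. It is offered as an illustration of the idea, not as the procedure followed in the papers discussed here.

```python
def f3(target, source_a, source_b):
    """Mean of (t - a) * (t - b) across loci (no finite-sample correction)."""
    products = [(t - a) * (t - b) for t, a, b in zip(target, source_a, source_b)]
    return sum(products) / len(products)

# Hypothetical allele frequencies at five loci.
pop_a = [0.10, 0.80, 0.35, 0.60, 0.05]
pop_b = [0.70, 0.20, 0.55, 0.10, 0.90]

# A population formed as 40% A and 60% B sits between its sources at every locus.
admixed = [0.4 * a + 0.6 * b for a, b in zip(pop_a, pop_b)]

# A contrived unadmixed relative of A that has drifted away from both sources.
drifted = [0.05, 0.90, 0.30, 0.70, 0.02]

print(round(f3(admixed, pop_a, pop_b), 5))  # negative: intermediate between sources
print(round(f3(drifted, pop_a, pop_b), 5))  # positive: not intermediate
```

For an exact mixture with proportion alpha, each product equals -alpha(1 - alpha)(a - b)^2, which can never be positive. That is why a negative average signals mixture rather than a clean split.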
Local adaptation within structure
Structured populations also adapt locally. When gene flow is limited, different groups experience different selective pressures, and the resulting differences are not immediately averaged away. Genetic studies of South Asian populations have reported signals of selection in genes related to diet, immunity, and skin pigmentation, consistent with adaptation to regional environments.
One example: lactase persistence, the genetic ability to digest milk into adulthood, shows frequency variation across South Asia. The trait is more common in northwestern populations with longer histories of pastoralism. The pattern fits spatial structure—selection acted differently on groups with limited gene flow between them.
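The interplay of local selection and limited gene flow can be illustrated with a two-group toy model. An allele is favored in one group (think of a pastoralist community) and neutral in the other, while the two exchange a fixed fraction of migrants each generation. The selection strength, migration rates, starting frequency, and time span below are hypothetical values, not estimates for lactase persistence or any real population.

```python
def two_deme_frequencies(selection=0.05, migration=0.01,
                         generations=150, start=0.05):
    """Allele frequency in (favored deme, neutral deme) after the given time."""
    p_favored, p_neutral = start, start
    for _ in range(generations):
        # Selection acts only in the favored deme (simple haploid model).
        p_sel = p_favored * (1 + selection) / (1 + selection * p_favored)
        # Symmetric migration between the two demes.
        p_favored = (1 - migration) * p_sel + migration * p_neutral
        p_neutral = (1 - migration) * p_neutral + migration * p_sel
    return p_favored, p_neutral

low = two_deme_frequencies(migration=0.01)   # limited gene flow
high = two_deme_frequencies(migration=0.3)   # nearly free gene flow
gap_low = low[0] - low[1]
gap_high = high[0] - high[1]
print(round(gap_low, 3), round(gap_high, 3))
```

With limited gene flow the two groups differ substantially in allele frequency over this time span. With heavy gene flow they stay nearly identical. Because nothing in this toy model opposes the allele in the second group, the gap would narrow over a much longer run.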
What the data doesn't say
Spatial structure does not tell us about language, religion, or identity. Genes record mating patterns, not culture. A genetic cline across the Harappan zone does not reveal whether people spoke one language or many, worshiped the same gods, or saw themselves as a unified civilization.
The prevailing scholarly model holds that Indo-Aryan languages, the branch that includes Vedic Sanskrit, entered South Asia in the second millennium BCE, carried by groups with Steppe ancestry. Comparative linguistics shows that Sanskrit shares systematic features with other Indo-European languages, which supports an origin to the northwest of the subcontinent. The genetic data showing Steppe ancestry in post-Harappan populations is consistent with that model, but consistency is not proof. Genes and languages can move independently.
Reading the landscape
Spatially structured populations leave a specific genetic signature: gradients rather than sharply bounded groups, even though patchy sampling can make a gradient look like separate clusters. When ancient DNA from South Asia shows gradients during the Harappan period and new gradients forming afterward, the simplest explanation is continuous, geographically constrained gene flow, not waves of invasion or isolated purity.
The mouse study's lesson applies: what looks like separate groups may be a single structured population stretched across space. For ancient India, that means the genetic landscape was already complex before any Steppe-related ancestry arrived, and remained complex after. The data shows mixture, movement, and local adaptation—not a story of origins written in sharp lines.