How Linkage Disequilibrium Helps Trace Ancient Population Mixtures in South Asia
Geneticists use linkage disequilibrium—the non-random pairing of DNA variants—to date ancient admixture events in South Asia, revealing when Steppe pastoralists mixed with indigenous groups.

Asha Naidu for Swaveda, May 25, 2026

When two populations that have been separated for thousands of years suddenly mix, their DNA tells the story. Geneticists can read that story by measuring linkage disequilibrium, a statistical pattern that acts like a molecular clock for admixture events.
The term itself is a linguistic curiosity. "Linkage" builds on "link," a Germanic word related to Old English hlence (a link in a chain or coat of mail), which suits genes that travel together on a chromosome. "Disequilibrium" pairs the Latin prefix dis- (apart, away from) with aequilibrium (balance). Together they describe a state of imbalance in how genetic variants associate with each other.
What Linkage Disequilibrium Measures
In any population, genes sit at fixed positions along chromosomes. During reproduction, chromosomes swap segments through recombination, shuffling parental DNA into new combinations. Over many generations, this shuffling breaks apart associations between variants that sit far apart on a chromosome.
Linkage disequilibrium measures whether two genetic variants appear together more often than random chance would predict. When a population forms from two previously separate groups, each chromosome in the first mixed generations is a patchwork of long segments inherited from one ancestral population or the other. Variants that sit on the same segment share an ancestral origin, so variants whose frequencies differ between the two source populations start out in high linkage disequilibrium, even across long stretches of a chromosome. Each generation, recombination breaks down some of these associations. The decay rate depends on the genetic distance between variants (how often recombination separates them, which tracks their physical distance only loosely) and on the number of generations since mixing occurred.
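To make the measurement concrete, here is a minimal Python sketch of two textbook quantities: the coefficient D (the excess of an observed haplotype frequency over what independent pairing would give) and its normalized form r squared. The haplotype counts, the recombination fraction, and the function names are illustrative assumptions, not data or code from any study discussed here.

```python
def ld_stats(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, r_squared) for two biallelic variants from haplotype counts.

    n_AB counts chromosomes carrying allele A at the first site and B at the
    second, and so on. All counts here are made up for illustration.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total
    # D is the observed AB frequency minus the frequency expected by chance.
    D = p_AB - p_A * p_B
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r_squared = D * D / denom if denom > 0 else 0.0
    return D, r_squared


def decayed_d(d0, recomb_fraction, generations):
    """Expected D after some generations of random mating: D0 * (1 - c)^t."""
    return d0 * (1 - recomb_fraction) ** generations


# Hypothetical counts: A and B travel together far more often than chance.
D, r2 = ld_stats(40, 10, 10, 40)
print(round(D, 3), round(r2, 3))  # D is about 0.15, r squared about 0.36

# The same association after 100 generations at a recombination fraction of 1%.
print(round(decayed_d(D, 0.01, 100), 4))
```

With equal counts in all four haplotype classes, both statistics come out to zero, which is the "equilibrium" the term refers to.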
Dating the Steppe Admixture
This principle lets researchers date admixture events. Evidence shows that between roughly 2000 and 1500 BCE, pastoralists from the Eurasian Steppe moved into South Asia and mixed with indigenous populations that included descendants of Indus Valley Civilization inhabitants. Tradition holds various accounts of ancient migrations, but genetic data provides independent evidence.
The Steppe migrants carried ancestry related to the Yamnaya culture of the Pontic-Caspian Steppe. Indigenous South Asians of that period carried a mix of ancestry from Iranian-related farmers and South Asian hunter-gatherers. When these groups mixed, Steppe-derived genetic variants and indigenous South Asian variants appeared together in individual genomes for the first time.
By measuring linkage disequilibrium between Steppe-ancestry variants and indigenous-ancestry variants in present-day populations, geneticists can estimate how many generations of recombination have occurred since the initial mixing. At a given distance along the chromosome, higher linkage disequilibrium suggests more recent admixture; lower values point to events further in the past.
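The dating logic can be sketched in a few lines of Python. The sketch assumes the standard single-pulse picture, in which admixture linkage disequilibrium at genetic distance d (in Morgans) decays as A * exp(-n * d) after n generations, and it recovers n with a simple log-linear least-squares fit. The synthetic curve, the amplitude, and the 28-year generation time are assumptions chosen for illustration; published tools such as ALDER and DATES use more careful statistics and error estimates on real, noisy data.

```python
import math


def fit_admixture_generations(distances_morgans, ld_values):
    """Fit ln(LD) = ln(A) - n * d by least squares; return (n, A).

    Assumes every LD value is positive, which holds for the noise-free
    synthetic curve below but not necessarily for real data.
    """
    logs = [math.log(v) for v in ld_values]
    count = len(distances_morgans)
    mean_d = sum(distances_morgans) / count
    mean_l = sum(logs) / count
    sxx = sum((d - mean_d) ** 2 for d in distances_morgans)
    sxy = sum((d - mean_d) * (l - mean_l)
              for d, l in zip(distances_morgans, logs))
    slope = sxy / sxx
    intercept = mean_l - slope * mean_d
    return -slope, math.exp(intercept)


# Synthetic, noise-free curve for a hypothetical pulse 120 generations ago,
# sampled from 0.5 to 10 centimorgans (0.005 to 0.1 Morgans).
TRUE_GENERATIONS = 120
distances = [0.005 * k for k in range(1, 21)]
ld_curve = [0.02 * math.exp(-TRUE_GENERATIONS * d) for d in distances]

n_hat, amplitude = fit_admixture_generations(distances, ld_curve)

YEARS_PER_GENERATION = 28  # assumed average generation time
print(round(n_hat, 2), "generations, roughly",
      round(n_hat * YEARS_PER_GENERATION), "years")
```

Because the input is an exact exponential, the fit returns the 120 generations that were put in. On real data the estimate carries uncertainty from sampling noise, from the recombination map, and from the assumed generation time.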
According to research published in Science in 2019 by a team including David Reich at Harvard Medical School, this approach helped establish that the primary Steppe admixture into South Asia occurred after the decline of the Indus Valley Civilization. Ancient DNA supports that ordering: an individual from the Indus Valley site of Rakhigarhi, dating to around 2500 BCE and described in a companion study published in Cell the same year, lacked Steppe ancestry entirely, while present-day South Asian populations carry between roughly 0 and 30 percent Steppe-related ancestry depending on region and caste group.
Why Better Statistical Tools Matter
Early linkage disequilibrium analyses used simplified models that assumed a single pulse of admixture at one point in time. Scholars debate whether this reflects reality. Archaeological and linguistic evidence suggests that population movements into South Asia likely occurred in waves over centuries, not as a single event.
Improved statistical methods can model more complex scenarios: multiple admixture events at different times, continuous gene flow over extended periods, or admixture followed by population structure that limited further mixing. These refinements produce different timelines. A single-pulse model might date an admixture event to 3,500 years ago, while a continuous-migration model analyzing the same data might spread the mixing across several centuries.
The distinction matters for correlating genetic evidence with archaeological and linguistic records. The composition and dating of Rigvedic Sanskrit, the material culture changes visible in the archaeological record after 2000 BCE, and the genetic signatures of population mixing must all align within a coherent historical framework. Refined dating of admixture events helps researchers test whether particular migration scenarios match the evidence across disciplines.
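A small Python sketch shows why the choice of model matters. It builds a linkage disequilibrium curve from two hypothetical admixture pulses, 80 and 160 generations ago, and then fits the single-pulse model to it. The pulse dates, amplitudes, and function names are illustrative assumptions, and two discrete pulses stand in here for the more general multi-wave or continuous scenarios.

```python
import math


def multi_pulse_ld(d, pulses):
    """Sum one exponential decay term per (amplitude, generations) pulse."""
    return sum(amp * math.exp(-gens * d) for amp, gens in pulses)


def single_pulse_estimate(distances, values):
    """Fit ln(LD) = ln(A) - n * d by least squares and return n."""
    logs = [math.log(v) for v in values]
    mean_d = sum(distances) / len(distances)
    mean_l = sum(logs) / len(logs)
    sxx = sum((d - mean_d) ** 2 for d in distances)
    sxy = sum((d - mean_d) * (l - mean_l) for d, l in zip(distances, logs))
    return -sxy / sxx


# Two hypothetical pulses of equal strength, 80 and 160 generations ago.
pulses = [(0.01, 80), (0.01, 160)]
distances = [0.005 * k for k in range(1, 21)]  # genetic distance in Morgans
curve = [multi_pulse_ld(d, pulses) for d in distances]

n_single = single_pulse_estimate(distances, curve)
print(round(n_single, 1))  # a single date that falls between 80 and 160
```

The single-pulse fit reports one date lying between the two true events, corresponding to neither. That is the kind of blurring that models with multiple pulses or continuous gene flow are designed to avoid.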
Reading the Molecular Clock
The decay of linkage disequilibrium follows a predictable mathematical pattern based on recombination rates, which vary along the genome. Regions of low recombination preserve linkage disequilibrium longer; regions of high recombination break it down faster. Geneticists account for these differences using recombination maps built from studying how DNA segments are inherited across generations in present-day families.
The clock metaphor comes with a caveat: this is a clock that ticks at different rates in different genomic neighborhoods. But averaged across thousands of genetic variants throughout the genome, the pattern becomes clear enough to distinguish admixture events separated by even a few centuries.
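To illustrate how a recombination map enters the calculation, the following Python sketch converts physical positions (in base pairs) into genetic distances by linear interpolation. The four-point map is entirely hypothetical and far coarser than real maps; the function names are likewise assumptions made for this example.

```python
from bisect import bisect_right

# Hypothetical map for one chromosome: (position in base pairs, cumulative
# genetic position in centimorgans). The middle megabase is a hotspot.
GENETIC_MAP = [(0, 0.0), (1_000_000, 0.5), (2_000_000, 3.0), (3_000_000, 3.4)]


def genetic_position_cm(bp, gmap=GENETIC_MAP):
    """Linearly interpolate the genetic position (cM) of a physical position."""
    positions = [pos for pos, _ in gmap]
    if bp <= positions[0]:
        return gmap[0][1]
    if bp >= positions[-1]:
        return gmap[-1][1]
    i = bisect_right(positions, bp)
    (p0, c0), (p1, c1) = gmap[i - 1], gmap[i]
    return c0 + (c1 - c0) * (bp - p0) / (p1 - p0)


def genetic_distance_morgans(bp_a, bp_b, gmap=GENETIC_MAP):
    """Genetic distance between two positions, converted from cM to Morgans."""
    cm = abs(genetic_position_cm(bp_b, gmap) - genetic_position_cm(bp_a, gmap))
    return cm / 100.0


# Two pairs of variants, each one megabase apart physically.
hot = genetic_distance_morgans(500_000, 1_500_000)     # spans the hotspot
cold = genetic_distance_morgans(2_000_000, 3_000_000)  # low-recombination region
print(round(hot, 4), round(cold, 4))
```

Both pairs are the same physical distance apart, yet the pair spanning the hotspot is several times farther apart in genetic distance, so its linkage disequilibrium is expected to decay correspondingly faster.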
Limits and Complications
Several factors complicate linkage disequilibrium dating. Small populations retain linkage disequilibrium longer than large ones because genetic drift keeps generating new associations between variants and leaves fewer distinct chromosomes for recombination to shuffle. Endogamy (marriage within a defined group) effectively shrinks population size, slowing the decay of linkage disequilibrium. India's caste system, which tradition holds emerged in the late Vedic period, created endogamous groups whose genetic isolation affects linkage disequilibrium patterns in ways that can be mistaken for more recent admixture.
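The effect of population size can be illustrated with a classical population-genetics approximation, usually attributed to Sved, for the background r squared expected between two variants in an isolated population where drift and recombination have reached a balance: 1 / (1 + 4 * Ne * c), with Ne the effective population size and c the recombination fraction. The Python sketch below applies it to two hypothetical population sizes. The numbers are assumptions for illustration and are not estimates for any real South Asian group.

```python
def expected_background_r2(effective_size, recomb_fraction):
    """Sved's approximation: E[r^2] is about 1 / (1 + 4 * Ne * c)."""
    return 1.0 / (1.0 + 4.0 * effective_size * recomb_fraction)


RECOMB_FRACTION = 0.01  # variants roughly one centimorgan apart

# Hypothetical effective sizes: a small endogamous group vs. a large population.
small_group = expected_background_r2(500, RECOMB_FRACTION)
large_group = expected_background_r2(10_000, RECOMB_FRACTION)
print(round(small_group, 4), round(large_group, 4))
```

The small group carries a much higher background level of linkage disequilibrium at the same distance. If that background is not modeled, it can inflate the apparent signal and make admixture look more recent than it was.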
Selection on particular genes also distorts linkage disequilibrium. If a Steppe-ancestry variant near a gene provided an advantage—such as lactase persistence, which allows adults to digest milk—natural selection would preserve that variant and the surrounding genetic neighborhood, maintaining higher linkage disequilibrium than neutral regions.
Despite these complications, linkage disequilibrium remains one of the most powerful tools for extracting chronological information from ancient DNA and present-day genomes. As statistical methods improve and ancient DNA samples from more South Asian sites become available, the technique will continue refining our understanding of when and how populations mixed in the subcontinent's deep past.