The Grierson Gap: Why India's Language Count Jumped From 179 to Nearly 20,000
Grierson's 1898–1928 survey found 179 languages. Today's counts range from 325 to 19,500. The gap reflects methodology, not error.
Asha Naidu for Swaveda, May 7, 2026

In the 1880s, an Irish-born linguist in the Indian Civil Service faced a stubborn problem: how many languages does India actually speak? The estimates circulating among European scholars ranged wildly, from 20 to 250. No one knew for sure. That linguist was George Abraham Grierson, and he would spend some three decades finding out.
Between 1898 and 1928, Grierson's Linguistic Survey of India documented 179 languages and 544 dialects across the subcontinent. The work was published in about twenty volumes, a staggering undertaking. The bulk of data collection took place in 1898–1903, when British civil servants, missionaries, and Indian translators were sent out to record the known languages and dialects in their regions. The project created the most comprehensive map of Indian speech that had ever existed.
Today, almost a hundred years later, we have a different problem: too many answers.
The 2001 Census of India reported 122 major languages, 22 of them "scheduled" languages listed in the Eighth Schedule of the Indian Constitution and the remaining 100 "non-scheduled." It recorded an additional 1,599 "other" languages. The SIL Ethnologue, a reference database maintained by SIL International, lists several hundred languages for India using its own classification system. Other tallies run far higher: the raw mother-tongue returns of the 2011 Census ran to more than 19,500 names before census linguists rationalized them, and how such speech varieties are classified (as dialects, languages, or sub-varieties) depends on the survey's methodology. The gap isn't a mistake; it's a window into how linguists define their terms.
What is a language? What is a dialect? These aren't innocent questions. The figures vary primarily because different surveys use different definitions. In Grierson's era, linguists largely relied on shared vocabulary and grammar. But the boundary between a language and a dialect is not written in stone—it shifts with methodology and sometimes with politics.
Consider Hindi and Urdu. Linguistically, Modern Standard Hindi and Modern Standard Urdu share the same grammatical structure and much vocabulary; many linguists classify them as registers (standardized varieties) of a single language sometimes called Hindustani. Yet the Indian government lists them as separate languages. The difference reflects not just linguistics but history: Hindi is associated with Hindu-majority India, Urdu with Muslim heritage and Pakistan. Mutual intelligibility (the ability of speakers to understand each other) matters. Politics matters more.
Even the concept of mutual intelligibility itself is contested among scholars. Two speakers may understand each other in conversation but disagree about whether they speak "the same language," a judgment that blends linguistic fact with social identity. Neither the comprehension test nor the identity judgment is purely objective.
The methodological limits of Grierson's survey also shaped what modern India inherited. Grierson's field workers, mostly magistrates and missionaries, were not trained phoneticians, and the quality of their observations varied widely. Linguist Murray Emeneau, writing in his 1956 article India as a Linguistic Area, assessed the Linguistic Survey's achievements and limitations, noting that it had captured much of value but had not fully mapped the complexity of Indian speech. South India was particularly under-represented: the Madras Presidency and the princely states of Hyderabad and Mysore fell outside the survey's coverage.
The numbers, then, reflect not just linguistic fact but methodological choice. Modern surveys set different thresholds for mutual intelligibility, apply different criteria for script and literary tradition, and give different weight to speaker identity and self-identification. A form of speech with five hundred fluent speakers in a tribal pocket of Maharashtra might be counted as a separate language under one system and absorbed into a regional classification under another.
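For readers who like to see the arithmetic, here is a minimal sketch of how the threshold alone can change the count. Everything in it is an assumption made for illustration: the five varieties (A to E), their pairwise intelligibility scores, and the merging rule (any pair scoring at or above the threshold is treated as one language) are invented, and no actual survey is claimed to work this way.

```python
# Hypothetical pairwise intelligibility scores (0 = none, 1 = complete).
# Variety names and numbers are invented for illustration only.
VARIETIES = ["A", "B", "C", "D", "E"]
SCORES = {
    ("A", "B"): 0.90,
    ("A", "C"): 0.70,
    ("B", "C"): 0.75,
    ("C", "D"): 0.50,
    ("A", "D"): 0.30,
    ("B", "D"): 0.35,
    ("D", "E"): 0.20,
    ("A", "E"): 0.10,
    ("B", "E"): 0.10,
    ("C", "E"): 0.15,
}


def count_languages(varieties, scores, threshold):
    """Count groups when every pair scoring >= threshold is merged.

    This is single-linkage grouping: a chain of mutually intelligible
    neighbours ends up in one group even if its two ends score low.
    """
    parent = {v: v for v in varieties}

    def find(v):
        # Follow parent links to the group's representative.
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # shorten the path as we go
            v = parent[v]
        return v

    for (a, b), score in scores.items():
        if score >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    return len({find(v) for v in varieties})


if __name__ == "__main__":
    for threshold in (0.95, 0.80, 0.60, 0.40, 0.15):
        n = count_languages(VARIETIES, SCORES, threshold)
        print(f"threshold {threshold:.2f}: {n} language(s)")
```

With these invented scores, the same five varieties count as five languages at a threshold of 0.95, three at 0.60, and one at 0.15. Nothing about the speech has changed between runs, only the rule for drawing the line.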
Scholars debate how to count, which means they debate what counts. The 2001 Census distinction between "scheduled" and "non-scheduled" languages reflects India's constitutional framework, not a linguistic boundary. The Ethnologue uses mutual intelligibility as a primary criterion. The Grierson survey prioritized administrative recognizability. Each system answers a slightly different question and gets a different answer.
India's linguistic landscape cannot be captured with a single number. Each census, each linguistic database, each field survey answers a different question. Grierson asked: What are the distinct speech forms that British administrators need to know about? The 2001 Census asked: What do people claim as their mother tongue? Modern linguistic databases ask: What forms are mutually unintelligible?
What changed is not the languages themselves. It is our willingness to see linguistic diversity as legitimate, to document regional and tribal speech with the same care Grierson gave to court languages, and to acknowledge that language boundaries are drawn by speakers and scholars together, never by nature alone. The field workers who came after Grierson were increasingly trained specialists. Modern survey questions such as "What is your mother tongue?" center speaker identity rather than administrative utility. And the sheer scale of documentation has exploded: we now have audio recordings, grammatical sketches, and digital corpora (text collections) that Grierson could never have assembled.
The "gap" between Grierson's count and today's counts is not a failure of linguistics. It is evidence of a century of more careful counting, wider reach into rural and tribal areas, and a shift in what we believe deserves to be documented. The difference tells us what changed: not the languages, but the people doing the counting, the tools they use, and the question they ask.
Grierson asked: How many languages matter? Modern linguists ask: How many ways do people speak? The answer to the second question is always larger than the answer to the first.