Languages of South Asia

Overview. The Indian subcontinent is an area of great linguistic diversity where close to 500 languages are spoken. Politically, it is divided into seven independent countries: India, Pakistan, Bangladesh, Nepal, Bhutan, Sri Lanka and the Maldives (not shown in the map). Population density is very high, the whole area being inhabited by some 1.5 billion people (more than 20% of the world's population).

    The languages of South Asia belong to four different families: Indo-European, represented mainly by the Indo-Aryan branch and to a lesser extent by Iranian and Nuristani; Dravidian; Austroasiatic, represented by the Munda branch and Khasi; and Tibeto-Burman. Language isolates, which are not related to any known family, include Burushaski, Kusunda and Nahali.

    Indo-European languages are spoken by 80% of the population of South Asia and are spread all over the north and center of the region. They are followed in numerical importance by Dravidian languages, spoken by 18% of the population, mainly in the four southern states of India (Karnataka, Kerala, Andhra Pradesh, Tamil Nadu) and in the north of Sri Lanka. Tibeto-Burman is spoken by 1.2% of the subcontinent's inhabitants, in Bhutan, Nepal and Ladakh (part of Jammu and Kashmir in northwestern India) as well as in the extreme northeast. Finally, Austroasiatic (0.8%) is found in pockets in the center and northeast of India.

Map of South Asian language families

Families and Languages. The oldest language groups are Dravidian, Munda and Indo-Aryan. They have coexisted for a long time and have therefore converged in some features of phonology, morphology, syntax and usage. Dravidian is the only language family entirely confined to South Asia. It probably once extended over the entire subcontinent, as suggested by the incorporation of Dravidian loanwords into the Sanskrit Rig Veda, which was composed in the northwest of ancient India more than three thousand years ago. The four largest Dravidian languages are Tamil, spoken in Tamil Nadu and northern Sri Lanka; Malayalam, spoken in Kerala; Kannada, spoken in Karnataka; and Telugu, spoken in Andhra Pradesh. All four are located in the south, but there are also smaller Dravidian languages distributed in the center and north of the subcontinent. Dravidian appears to be unrelated to languages beyond South Asia and can be considered indigenous to it.

    The Munda languages are atypical Austroasiatic languages whose speakers might have migrated westward into India in prehistoric times from their Southeast Asian homeland. Their marked divergence from mainstream Austroasiatic, coupled with the incorporation of a number of Munda words into Sanskrit, suggests that this migration is very ancient. The three largest Munda languages are Santali, Mundari and Ho.

    The oldest Indo-Aryan documents are the hymns of the Rig Veda, composed in archaic Sanskrit around 1300-1200 BCE and transmitted orally for a millennium or more until they were written down after the appearance of writing in India. From Sanskrit, or from related undocumented languages, descend the modern Indo-Aryan ones: Hindi and Urdu, spoken in vast areas of the north and center of South Asia; Nepali in Nepal; Bengali in West Bengal and Bangladesh; Oriya in Orissa; Assamese in Assam; Marathi in Maharashtra; Gujarati in Gujarat; Punjabi in the Punjab regions of India and Pakistan; Sindhi in Pakistani Sindh; and the Dardic languages in northwestern India and Pakistan, of which Kashmiri, spoken in Jammu and Kashmir, is the largest. Sinhalese, spoken in Sri Lanka, is cut off from the rest of the Indo-Aryan languages.


Phonology

- Retroflex consonants, made with the tip of the tongue bent back to touch the hard palate, are widespread in South Asia. They are found in all Dravidian and Indo-Aryan languages (except Assamese) as well as in most Munda languages (except Korku and Sora), in Pashto (an Indo-Iranian language of western Pakistan and Afghanistan) and in the isolate Burushaski (spoken in northern Pakistan).

  Some languages, like Sanskrit, Hindi and Tamil, have a complete set of retroflex sounds including stops, nasals and liquids (r-like and l-like sounds); others have a more limited set that generally includes at least one retroflex stop. Retroflex sounds are absent in Indo-European languages other than Indo-Aryan and are probably of Dravidian origin.

- Most Dravidian and Indo-Aryan languages have relatively simple vowel systems which include short and long vowels. In them, a basic five-vowel set, composed of a, e, i, o, u, can be recognized. Similarly, Munda vowel systems are generally much simpler than the highly complex ones typical of Austroasiatic, though Munda has no distinction between short and long vowels. Except in rare instances, South Asian languages lack tones, in contrast to the situation in neighboring Southeast Asia.

Morphology

- Dative subjects. In Dravidian and Indo-Aryan some sentences require the subject to be marked by the dative (otherwise used to mark the indirect object), particularly those in which the subject is the experiencer of feelings, perceptions or other psychological states that are not under the experiencer's control. In Eastern Indo-Aryan the genitive is used instead of the dative. In Munda and Tibeto-Burman this construction is less frequent, and it is absent in some languages that lack a dative marker. As it is not found in Sanskrit, its most likely source is Dravidian.

- Converbs. Also known as conjunctive participles or gerunds, these are non-finite verbal forms that express an action that occurred before the one expressed by the main verb. They frequently function as a sort of adverb. They exist in Indo-Aryan, Dravidian, Munda and Tibeto-Burman.

- Compound verbs. These consist of two verbs: the first (V1) is the main lexical verb, and it is followed by a second (V2) which carries the tense, aspect and agreement markers. The number of V2 verbs is limited; they usually include ‘go’ and ‘come’, ‘give’ and ‘take’, ‘throw’ and ‘send’, ‘fall’ and ‘rise’, etc.

- Causative verbs. These indicate that a subject causes someone or something else to do or be something. They occur all across South Asia.

- Echo compounds. Dravidian, Indo-Aryan, Munda and some Tibeto-Burman languages have ‘echo words’, formed by reduplicating a word and changing its first consonant (sometimes also the following vowel). The second word thus appears as an echo of the first one. It cannot exist independently and must be joined to the original word, forming a compound. The echo word has the meaning ‘and the like’, ‘and such things’.
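The echo-word rule described above can be sketched in a few lines of Python. This is only a rough illustration on romanized spellings, not a linguistic model: the function name `echo_compound`, the `VOWELS` set and the default replacement onset `v` (the pattern commonly cited for Hindi, as in chai-vai, ‘tea and the like’) are assumptions introduced here. The sketch also replaces the whole run of letters before the first vowel, since a romanized digraph such as "ch" stands for a single consonant, and it ignores the optional vowel change.

```python
# Illustrative sketch only: operates on romanized spellings, not real phonology.
VOWELS = set("aeiou")  # assumption: plain five-vowel romanization


def echo_compound(word: str, echo_onset: str = "v") -> str:
    """Build 'word-echo', where the echo copies the word but swaps
    its initial consonant letters for `echo_onset` (hypothetical default)."""
    # Index of the first vowel; everything before it is the initial onset.
    first_vowel = next(
        (i for i, ch in enumerate(word) if ch in VOWELS), len(word)
    )
    # The echo keeps the rest of the word and replaces only the onset.
    echo = echo_onset + word[first_vowel:]
    # The echo cannot stand alone, so it is always joined to the base word.
    return f"{word}-{echo}"


print(echo_compound("chai"))  # chai-vai
print(echo_compound("pani"))  # pani-vani
```

The replacement onset is passed as a parameter because it differs from language to language; the value used here is just one commonly described pattern.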


Syntax

- Word order. The vast majority of South Asian languages have Subject-Object-Verb (SOV) word order, including all Indo-Aryan (except Kashmiri), Dravidian and Munda languages. Kashmiri and Khasi (a non-Munda Austroasiatic language) are SVO instead. Munda syntax is very different from that of other Austroasiatic languages, in which SVO order is prevalent. There is evidence that a preexisting SVO order in Munda shifted to the current SOV order under Dravidian influence.

© 2013 Alejandro Gutman and Beatriz Avanzati

