
Alternative Names. Śaka, Śakan.

Classification. Indo-European, Indo-Iranian, Middle Iranian, Eastern. Khotanese and Tumshuqese are related to other Middle Iranian languages of Central Asia like Chorasmian and Sogdian.

Overview. Khotanese and Tumshuqese were the languages of two Buddhist city-states on the edge of the Taklamakan Desert, along the Silk Road, in the first millennium CE. Many of their written documents are religious, mostly translations of Buddhist texts, though there are also some original compositions. Although very similar in many ways, Khotanese and Tumshuqese are nevertheless distinct languages. Their mutual intelligibility was probably low, since they were spoken in geographically distant, politically independent areas with little or no contact between them.

Distribution. Khotanese was spoken in the kingdom of Khotan, and Tumshuqese in the region of Kucha, situated southwest and northwest of the Taklamakan desert, respectively, in what is now Xinjiang, China.

Status. Extinct. The extant documents date from the second half of the first millennium CE, though the languages must have been spoken much earlier.

Main Documents. A considerable number of Khotanese documents have survived, mainly as paper manuscripts. By contrast, comparatively little has survived in Tumshuqese. The most important documents are:

  1. Numerous Khotanese translations of Buddhist Sanskrit literature.

  2. Some Khotanese literary works.

  3. Several letters in Tumshuqese written by political and religious officials.

  4. A bilingual Tumshuqese text describing a Buddhist ceremony for laymen (karmavācanā).

Phonology (Khotanese)

Vowels (11). The Khotanese vowel system included 6 short vowels, all of which, except schwa (ə), had long counterparts. Vowel length was phonemic.


Consonants (40). A distinguishing feature of Khotanese is the presence of retroflex consonants, which are absent in other Middle Iranian languages. The other main places of articulation were labial, dental-alveolar, palatal, and velar. Stops and affricates showed a three-way contrast between voiceless unaspirated, voiceless aspirated, and voiced unaspirated. All fricatives except h formed voiceless-voiced pairs. The rhotics included a trill and two flaps (dental and retroflex).


Script. Khotanese and Tumshuqese were written in Central Asian varieties of the Indian Brāhmī script, an abugida.

Morphology. Like that of other Central Asian Middle Iranian languages, the morphology of Khotanese and Tumshuqese is rather conservative, preserving many features of Old Iranian.

  1. Nominal. Nouns, adjectives and pronouns were inflected for case, gender and number.

     - case: nominative, accusative, locative, genitive-dative, instrumental-ablative, vocative.

     - gender: masculine, feminine, neuter. The neuter was still in use in Khotanese, though it had become marginal; in Tumshuqese there is no clear evidence of its existence.

     - number: singular, plural.

     - pronouns: personal, demonstrative, interrogative-relative, indefinite.

       The 1st and 2nd person pronouns are genderless and are inflected in all cases. The demonstratives are inflected in the same way and distinguish gender and three degrees of deixis: neutral (‘the one’), near (‘this’), and remote (‘that’). The neutral demonstrative serves as the 3rd person pronoun.

       Interrogative pronouns can also function as relative pronouns. The main ones are animate kye/ce (‘who’) and inanimate cu (‘what’). Khotanese has only one indefinite pronoun, the invariable ye (‘one’).

     - compounds: Khotanese attests two-member nominal compounds combining nouns, adjectives, pronouns, and even verbal roots.


  2. Verbal. Verbs have a present stem and a past stem (the latter based on the past participle), and many have different transitive and intransitive stems.

     - person and number: 1s, 2s, 3s; 1p, 2p, 3p.

     - tense: present, simple perfect, periphrastic perfect, pluperfect.

       Future sense may be conveyed by the present or the subjunctive.

       The present tense is formed by attaching the personal endings to the present stem. Forms based on the present stem distinguish active and middle voice.

       The simple perfect adds the personal endings to the past stem, with distinct transitive and intransitive forms.

       The periphrastic perfect combines the past participle with the present tense of the auxiliary verb ‘to be’ (present stem ah-), inflected for person and number. It does not differ in function from the simple perfect.

       The pluperfect is formed similarly, but with the past forms of ‘to be’ (past stem vät-).

     - mood: indicative, subjunctive, optative, imperative, injunctive (rare).

       The subjunctive and optative have all tenses except the simple perfect. The imperative has only a present tense. The traditional functions of the individual moods have become blurred, and there is a tendency to use only the indicative.

     - voice: active, middle, passive.

     - non-finite forms: infinitive (present and past), present active participle, present middle participle, past passive participle, gerundive (expressing obligation or necessity).


Syntax. Khotanese and Tumshuqese word order is essentially Subject-Object-Verb. Both languages are head-final: adjectives, possessors, demonstratives, and numerals precede the nouns they modify. The indirect object precedes the direct object. Adjectives agree in case, number, and gender with the nouns they qualify. The copula is frequently omitted.


Lexicon. Most loanwords come from the Indian languages Sanskrit and the Prakrits, which supplied Buddhist terminology. At an early period some technical terms were adopted from Zoroastrian usage but applied in a Buddhist context. A handful of borrowings come from Tocharian, Chinese and Tibetan.

Khotanese numerals

one: śśau

two: duva

three: drraia

four: tcahora

five: paṃjsa

six: kṣäta, kṣei'

seven: hoda

eight: haṣṭa

nine: no, nau

ten: dasau

hundred: satä


Literature. Some literary texts in Khotanese have been discovered in Khotan and in the Caves of the Thousand Buddhas near Dunhuang. They include lyric verse, a substantial fragment of a poem about a love story, the description of a journey to Kashmir, and some literary epistles. Two major compositions show Buddhist influence: a version of the Indian epic Ramayana and the Book of Zambasta. The latter, whose original title is unknown, dates from around 600-700 CE; though fragmentary, it is the longest extant poem in Khotanese. It treats a number of Buddhist doctrinal matters.

© 2013 Alejandro Gutman and Beatriz Avanzati

Further Reading

-'Khotanese and Tumshuqese'. R. E. Emmerick. In The Iranian Languages, G. Windfuhr (ed.), 377-415. Routledge (2009).

-Saka Grammatical Studies. R. E. Emmerick. Oxford University Press (1968).

-A Guide to the Literature of Khotan, 2nd ed. R. E. Emmerick. Studia Philologica Buddhica, Occasional Paper Series III. The International Institute for Buddhist Studies (1993).



Khotanese and Tumshuqese
