Etymological Dictionary of the Iranian Verb

Edited by: Johnny Cheung

The present work gives a critical survey of all the verbs that may have existed in Proto-Iranian as deduced from the attested Iranian descendants and their archaic sister language, Sanskrit. This is accompanied by an analysis of the morphology and assessment of the provenance. The Iranian group within the Indo-European language family consists of languages that were and are still spoken in Western and Central Asia, among which Persian, Balochi, Kurdish, Pashto, Shughni and Ossetic are the best known today, and Avestan, Old and Middle Persian, Parthian, Bactrian, Khotanese, Sogdian and Choresmian in the past. This work aims to bridge the gap in knowledge that exists between Indo-Europeanists and scholars of Iranian languages with regard to each other's fields.

Johnny Cheung, Ph.D. (2000) in Comparative Linguistics, Leiden University, is research assistant at SOAS, London, and research fellow at Clare Hall, Cambridge. He has published extensively on Iranian linguistics, notably Studies in the Historical Development of the Ossetic Vocalism (2002).

1.0. Overview and aim

For a very long time, a dictionary which incorporates all Iranian languages has been a desideratum. For the time being we are still awaiting the arrival of a comprehensive dictionary which would be the Iranistic equivalent of Pokorny’s famous “Indogermanisches Etymologisches Wörterbuch” (IEW), although the eminent Russian Iranists V.S. Rastorgueva and D.I. Edel’man have started compiling an “Etymological Dictionary of the Iranian Languages” (“Ètimologičeskij slovar’ iranskix jazykov”, ESIJa). The first volume, covering a - ā, appeared in 2000, and a second volume, b - d, was published in 2003. Regrettably, they have ignored any progress in the research of Indo-European linguistics since Julius Pokorny. It is hoped that despite the demise of one of the editors we may be able to see the completion of their work. The arrival of such a comprehensive work is long overdue, considering the wealth of publications that have shed light on so many Middle and New Iranian languages barely known to many non-Iranists or even to not a few Iranists as well. In fact, we are not devoid of etymological dictionaries for Iranian. A milestone in lexical-historical research of any Iranian language is the publication of the “Historical-Etymological Dictionary of Ossetic” (“Istoriko-ètimologičeskij slovar’ osetinskogo jazyka”) by the late Ossetian scholar Vassilij Abaev, who completed this opus magnum in four decades. Another great accomplishment is the “Dictionary of Khotan Saka” (DKS) by the late Sir Harold W. Bailey, equally relevant for the linguistically oriented Iranist. Of a much more modest scope, yet certainly not to be overlooked, is Georg Morgenstierne’s “Etymological Vocabulary of the Shughni Group” (EVS). Etymological dictionaries for other languages have appeared since. Ivan Steblin-Kamenskij completed his “Etymological Dictionary of the Wakhi Language” (“Ètimologičeskij slovar’ vaxanskogo jazyka”) in 1999, whereas R.L. Cabolov treated the entries from A to M for Kurdish in his “Etymological Dictionary of the Kurdish Language” (“Ètimologičeskij slovar’ kurdskogo jazyka”), which appeared in 2001. Gharib’s “Sogdian Dictionary”, published in 1995, may be helpful to historical linguists too: although its aim is not etymological, it regularly refers to other Iranian cognate forms. Recently, Joseph Elfenbein completed “A New Etymological Vocabulary of Pashto Compiled and Edited from the Papers of Georg Morgenstierne”, which was initiated by the late Neil MacKenzie and finally published in 2003. As for Persian, notably New Persian, disappointingly little has been published since Paul Horn’s “Grundriss der neupersischen Etymologie” from 1893, especially when we consider the prominence of Persian language and culture within the Iranian group. For the time being we have to be content with glossaries with etymological elucidations and articles from journals and periodicals. Some attempts have been made, or are still being made, to come up with a comprehensive work. For example, Leonard Hertzenberg has been working on a full etymological dictionary of New Persian for some time now.


1.1. Scope

The present dictionary has a limited scope: only the attested verbal[2] Proto-Iranian[3] roots and their continuations are treated. The verbal forms present a rather manageable category that is relatively immune to (inner-Iranian) borrowing. Also, possible loanwords are easier to detect, as the verbal paradigm in most Iranian languages requires a separate present/past stem for the formation of the tenses: borrowed verbs therefore tend to have analytic or periphrastic present/past stems. In some instances I have separated forms that, although originally derived from a single root, clearly show a well-developed semantic differentiation of presumably Proto-Iranian date. I have avoided the reconstruction of roots that are solely supported by nominal (I)Ir. continuations.

Several major iranological reference books have been incorporated systematically throughout the book. For Avestan I have perused Jean Kellens, “Liste du verbe avestique” (Liste) and for Old Persian the classic handbook of Roland Kent, “Old Persian”. Ronald Emmerick, “Saka Grammatical Studies” (SGS), has been consulted for Khotanese, whilst for the Chorasmian forms, the standard work of M. Samadi, “Das chwaresmische Verbum”, has been used. For Middle Western Iranian, liberal use of the recently published and already indispensable reference work of Desmond Durkin-Meisterernst, “Dictionary of Manichaean Middle Persian and Parthian” (DMMPP) has been made. I have also gleaned from the works mentioned in 1.0.

Finally, with regard to modern Iranian languages I had to limit myself to a representative selection of Eastern and Western languages, such as Ossetic, Shughni, Wakhi, Yaghnobi, Pashto, New Persian, Balochi, Kurdish, and several modern dialects of Iran. Ormuri and Parachi forms have also been frequently cited. Three important, recently published contributions to the research of modern Ir. languages (not mentioned above) should not go unnoticed here, viz. Pierre Lecoq, “Recherches sur les dialectes kermaniens (Iran Central)” (2002), Charles Kieffer, “Grammaire de l’ōrmurī de Baraki-Barak (Lōgar, Afghanistan)” (2003), and last, but not least, Agnes Korn, “Towards a Historical Grammar of Balochi” (2005). Their works have been incorporated in the Dictionary as well.[4]

In addition, Iranian forms that have been borrowed (cited as such) in other languages have been referred to as well[5]. The provenance of a root has been assessed by comparing it in the first place to the attested Sanskrit form, as treated in Mayrhofer’s reference work, “Etymologisches Wörterbuch des Altindoarischen” (EWAia). Further afield, the possible Indo-European origin of the (Indo-)Iranian forms has been critically evaluated with the “Lexikon der indogermanischen Verben” (LIV) and IEW taken as reference.


1.2. Challenges and obstacles

As with other etymological dictionaries that cover a vast number of languages and/or attested forms, it can only be expected that the present Dictionary will have its own share of errors and misquoted forms. In some instances, the interpretation or allocation of the roots/forms may be disputed. This work should therefore be regarded, in the first place, as a starting point for future research. Copious bibliographical references have been given for this purpose, roughly in chronological order. At the present moment, the definitive edition of a well-balanced etymological dictionary of the Iranian languages is still hampered by several obstacles:

- our understanding of the Avestan texts is still imperfect, and many passages have not received a satisfactory interpretation;

- the absence of a comprehensive Pahlavi dictionary, which cannot be realised when so many Pahlavi texts, especially the Pahlavi commentary on the Avesta, are still unpublished;

- the research on the modern Iranian dialects, notably those of Iran, Afghanistan and Tajikistan, shows many gaps, and the “dialects” may yet yield archaic forms that are not attested in the older Iranian languages, especially in the daily life vocabulary;[6]

- a comprehensive, analytic Sogdian dictionary is wanting;

- a comprehensive etymological dictionary of New Persian has yet to appear.


It is for this reason that I have reconstructed Iranian roots that are sometimes based on little data, as it cannot be excluded that in the future one would find more, perhaps even more convincing, cognate forms.


1.3. Methodology

The roots have been reconstructed according to the principles of comparative (Indo‑)Iranian and Indo-European philology. It is inevitable that, as with many other works, this Dictionary has a certain degree of bias towards a particular theory or school, whether intentional or not. The results of the laryngeal theory[7], which is now generally accepted among Indo-Europeanists, have been systematically incorporated in the present work. It is conceivable though that in some instances and positions the laryngeal *H that has been reconstructed for Proto-Iranian had already disappeared at this stage, as can be observed in (Old) Avestan, our most archaic representative of the Iranian language group. With regard to the (non-)Indo-European etymology assigned to the (Indo-)Iranian forms in the major handbooks and recent articles I have tried to assess the assumptions made from the different perspectives of Iranists and Indo-Europeanists. For instance, numerous IE etymologies suggested or cited in the DKS proved to be untenable for Indo-Europeanists and should therefore be discarded.[8] On the other hand, several wrongly interpreted Iranian forms have found their way into Pokorny’s IEW and, recently, LIV.[9]


1.4. The reconstructed phonemes

The phonemes of the Iranian roots have been reconstructed on the basis of evidence provided by the Iranian languages and also, if attested, their Indo-Aryan (mainly Sanskrit) cognates.

The postulation of *H in Proto-(Indo-)Iranian sometimes has far-reaching consequences. The previously reconstructed *ī and *ū can now be analysed as *iH and *uH respectively. Other implications are that a root cannot begin with a vowel or (old) *r, and that all ablaut series consist of the pattern full grade *aC, lengthened grade *āC and zero grade *øC (C also includes *H, *i̯ and *u̯).[10]
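The ablaut pattern just described can be illustrated schematically. The following is only a minimal sketch in Python, not part of the Dictionary's method: the function name `ablaut_grades` is hypothetical, and it assumes a root written with exactly one short *a as its nucleus. It manipulates the notation only and does not model how resonants or laryngeals behave in the zero grade.

```python
import unicodedata

def ablaut_grades(root):
    """Return the three schematic ablaut grades of a root with one short *a.

    Assumption (not from the source): the root is given in its full grade,
    e.g. '*gam', and contains exactly one short a.
    """
    body = unicodedata.normalize("NFC", root.lstrip("*"))
    if body.count("a") != 1:
        raise ValueError("expected exactly one short *a in the root")
    return {
        "full": "*" + body,                                # *aC
        "lengthened": "*" + body.replace("a", "\u0101"),   # *āC
        "zero": "*" + body.replace("a", ""),               # *øC
    }

print(ablaut_grades("*gam"))
```

For the schematic input `*gam` this yields the full grade `*gam`, the lengthened grade `*gām` and the zero grade `*gm`. A root that is only cited with a long vowel is rejected, since no full grade can be derived mechanically from it.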

The diphthongs *au, *ai, etc. have been left out as they are actually combinations of two phonemes. The phonemic status of *ž and *l is doubtful.



Inventory of the Proto-Iranian phonemes

Vowels:                        i      u
                               a      ā

Stops:          voiceless      p      t      k
                voiced         b      d      g

Fricatives:                    f      ϑ      x

Affricates:     voiceless      č
                voiced         ǰ̣

Sibilants:      voiceless      š      s
                voiced         (ž)    z

Nasals and liquids:            m      n      r      (l)

Laryngeals:                    h      H



1.5. Presentation

The lemmata are the Proto-Iranian roots that have been reconstructed from the evidence of the attested Iranian languages. These reconstructed roots are presented in their full grade form (except in instances where there is no evidence for this ablaut) and sorted according to the order of the Latin alphabet. The letters with diacritics (haček, macron) are placed after their simple equivalents, *ϑ after *t and *H after *h. The attested prefixed formations of the roots are clearly marked. The languages are presented roughly in order of age and importance, similar to the presentation in the “Compendium Linguarum Iranicarum” (CLI): the oldest first (Avestan, Old Persian), followed by the Middle Iranian languages (Middle Persian, Parthian, Khotanese, Sogdian, etc.) and the New Iranian languages, the most prominent members first (New Persian, Kurdish and so on), finally followed by the Sanskrit cognate (if attested) and the (non-)Indo-European provenance with additional remarks and observations. This is completed by further bibliographical references (especially those not mentioned in the main entries).
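The ordering of the lemmata described above can be sketched as a sort key. This is a minimal illustration in Python under stated assumptions, not the procedure actually used for the Dictionary: the names `letter_key` and `root_key` are hypothetical, homonym indices are assumed to be written as trailing digits (e.g. `*baud1`), and a letter with a diacritic is treated as a separate letter placed immediately after its plain equivalent.

```python
import unicodedata

THETA = "\u03d1"  # the letter written as *ϑ in the reconstructions

def letter_key(ch):
    """Return (base letter, rank) for one character of a reconstructed root."""
    if ch == "H":        # laryngeal *H sorts directly after *h
        return ("h", 1)
    if ch == THETA:      # *ϑ sorts directly after *t
        return ("t", 1)
    decomposed = unicodedata.normalize("NFD", ch)
    # A letter with a haček or macron (č, š, ā, ...) follows its plain equivalent.
    return (decomposed[0], 1 if len(decomposed) > 1 else 0)

def root_key(root):
    """Sort key for a root such as '*čarH' or '*baud2'; the asterisk is ignored."""
    body = unicodedata.normalize("NFC", root.lstrip("*"))
    return [letter_key(ch) for ch in body]

roots = ["*tarp", "*čarH", "*Hraxš", "*hrau", "*baud2",
         "*baud1", "*čaud", "*dram", "*huar1"]
print(sorted(roots, key=root_key))
# ['*baud1', '*baud2', '*čarH', '*čaud', '*dram', '*hrau', '*huar1', '*Hraxš', '*tarp']
```

Comparing roots letter by letter with a (base, rank) pair keeps all roots in *h- before those in *H-, and all roots in *t- before those in *ϑ-, which is how the placement rules are read here.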

The entries of each language (or cluster of languages) are given as follows. For Avestan and Old Persian, the (postulated) full grade root is given, except in instances where such a root form is impossible as can be inferred from, for instance, the Sanskrit correspondence. As to the Middle Iranian languages, the usual practice is observed here: the entries are represented in their present stem (if attested), except for Chorasmian, which has mainly imperfect forms. The entries of the modern Iranian languages are also accompanied by their preterite/past stems, which are often in the guise of the infinitive or past participle. Nominal formations are sometimes cited in addition, particularly in the absence of verbal finite forms.

In the case of the extinct languages, references to the textual passages are also given. For Avestan and Old Persian, every effort has been made to be complete, on account of their importance for the reconstruction of Proto-Iranian: every attested form may be relevant. As to the Middle Iranian languages, this either proved to be superfluous (since the forms can be found in the standard handbooks) or compiling and correcting the attested forms turned out to be more time-consuming than initially anticipated. For Middle West Iranian and Sogdian I have decided just to indicate how widespread or scarce the attested verbal forms are.


1.6. Formal and semantic aspects

Many Iranian roots reconstructed here show great similarity in meaning and form. Hermann Güntert, Über Reimwortbildungen im Arischen und Altgriechischen (1914), already signalled a remarkable number of these so-called “rhyming formations” or Reimwortbildungen in (Indo-)Iranian: Skt. ched / bhed = Ir. *said / *baid ‘to split’, Skt. srav / plav = Ir. *hrau / *frau2 ‘to flow’, Skt. kram / gam / dram = Ir. *xramH / *gam1 / *dram ‘to go, run, walk’. In several cases this may be coincidental, but especially for the following roots that lack a convincing IE etymology one may assume a secondary origin, possibly having arisen as the result of contamination in Proto-Indo-Iranian or, at a later stage, Proto-Iranian: *rauč / *sauč ‘to burn, light’, *raiϑ1 / *frait/ϑ ‘to die, decompose’, *čaud / *paud ‘to run’.

The assignment of a meaning to the roots quite often proves to be complicated (cf. ESIJa I: 17 ff.). I have decided to look primarily at the meanings exhibited by the Old Iranian languages (and/or Sanskrit), but also to take into account the meanings found in the later Iranian languages. In many instances we see quite noticeable differences among the languages. The solutions have been either to reconstruct a separate (homonymous) root that would cover a specific meaning shown by several languages, e.g. *baud2 ‘to smell’ (*baud1 ‘to feel, sense’), *gaH2 ‘to have sexual intercourse’ (*gaH1 ‘to enter’), or to reconstruct a tentative, primordial meaning from which the meanings of these languages could have developed, notably *zarH3 ‘to bewail the deceased’, *čarH ‘to come and go, wander’. This approach is admittedly subjective, but I believe this is preferable to a long catalogue of meanings assigned to the root. In some instances the semantic deviation displayed by a particular form is such that it is more likely that it has a different provenance altogether. Such a form is preceded by a query in the Dictionary.

In a number of roots the semantic shift or departure from the original, inherited IE meaning can be explained in terms of euphemistic usage, notably *HraiH ‘to defecate’ < *‘to flow’, *tarp ‘to steal’ < *‘to enjoy’ and *raiϑ ‘to die’ < *‘to pass’. In other instances a root with a particular semantic specialisation may have largely replaced the older, inherited etymon, the latter being considered rather inappropriate or uncouth, e.g. *gaH2 (developed from *gaH1) has replaced *Hiab (Gr. οἴφω, Russ. ebát’). It is also interesting to note that the old IE “eat”-root (*H1ed-, Skt. ad, Gr. ἔδμεναι, Lat. edere, Goth. itan, etc.) has been supplanted by *huar3, which originally meant ‘to take’ and has developed the secondary meaning ‘to partake, eat, consume’ (*huar1).[11]


1.7. Stem formations

Some Iranian roots actually go back to specific IE stem formations (cf. LIV: 10 ff.), which may not always have an exact IE correspondence. The following roots go back to such an older stem:

*-so, e.g. *baxš ‘to divide, apportion’ (cf. *baǰ̣ ‘to divide’), *Hraxš ‘to protect, defend’ (Gr. ἀλέξω ‘I ward off, defend’);

*-dho, e.g. *fraHd ‘to increase’ (Gr. πλήθω ‘I fill up’), *pazd ‘to cause to tread, go’ (cf. *pad ‘to fall, stuck in’ < *‘to tread, go’);

reduplicative, e.g. *čaš1 ‘to teach, show’, *HaHh ‘to be seated, sit’ (Gr. ἧσται ‘he sits’);

*-eH1, one certain example: *darH ‘to have pain’ (Lat. doleō ‘I suffer, am in pain’).


1.8. Denominatives

In several instances originally nominal roots or formations became verbal in Indo-Iranian or Iranian, e.g. *diHp ‘to shine, light up’ (Skt. dīp), *uai(H)n ‘to see’ (Skt. ven) and *rauxšn ‘to shine’.


1.9. Provenance and substrate

A substantial part of the Iranian vocabulary cannot be traced back to Proto-Indo-European. Many of these forms, both verbal and nominal, are exclusively Indo-Iranian. At the 1999 conference in Helsinki, Lubotsky (2001) argued that they might be loanwords from an unknown language spoken in the towns of Central Asia in the second millennium BCE (p. 306). Iranian verbs such as *baru2 ‘to chew, swallow’ (Skt. bharv), *gauš ‘to hear’ (Skt. ghos) and *nard ‘to lament, moan’ (Skt. nard) would have been borrowed from this “substratum”.

Several roots are common Iranian without any known (or indisputable) etymology. Either they were borrowed from this non-Indo-European substrate language during the common Indo-Iranian period (the absence of a Sanskrit cognate would be purely coincidental) or they arose only during the Proto-Iranian phase (due to local borrowing, taboo, interference from semantically similar roots, blending, etc.). Examples are *čaxš ‘to drip, sip, eat’, *fšar1 ‘to shame, be ashamed’, *gaub ‘to say’, *huah ‘to strike, thresh’ and *xar ‘to go, pass’.

A few reconstructed roots are attested only in a limited area, for instance, exclusively in West Iranian or East Iranian. It cannot be excluded therefore that these roots are not of Proto-Iranian date, notably *dauč ‘to sew’, *fan ‘to move, pass (time?)’ and *gāz ‘to receive, accept’.


1.10. Transcription

The romanized transcription of the Iranian forms follows the practice as established among Iranists for the respective languages. I have generally adopted the spelling of the forms as transcribed in recently published major handbooks, cf. ESIJa I: 30 and Korn 2005: 29 ff.

[1] This introduction is a revised and expanded version of the paper I gave at the Conference of the Societas Iranologica Europaea, Ravenna 2003. Since 2003 many important works, which had to be included in my own Dictionary, have been published.

[2] This also includes originally nominal roots that became verbal in Proto-(Indo-)Iranian.

[3] This postulated ancestral language forms together with Indo-Aryan (and the little known group of languages spoken in Nuristan, Afghanistan) the Indo-Iranian branch of the Indo-European language family.

[4] I have further used on a large scale the dialect descriptions of Oskar Mann (“Kurdisch-persische Forschungen” = KPF), Arthur Christensen (“Contributions à la dialectologie iranienne”), R. Abrahamian (“Dialectologie iranienne, dialectes des israelites de Hamadan et d’Ispahan, et dialecte de Baba Tahir”), W. Eilers (“Westiranische Mundarten”) and “Jagnobskie teksty” by M.S. Andreev and E.M. Peščereva.

[5] Not included are the most recent, mainly New Persian, borrowings in languages such as Turkish, Uzbek, Urdu-Hindi, Indonesian, etc.

[6] I only recently discovered that the root *gar3 ‘to be/make warm’ has been preserved as a verb in Oss. (ænȝaryn) and several modern dialects spoken in Iran. Also the informal form NP šāš ‘pee’ has a more ancient pedigree than previously thought, once the connection with Av. š́ā- ‘to defecate’ is recognised.

[7] The main tenet of the laryngeal theory is the existence of three kinds of laryngeals, *H1, *H2 and *H3, in the Indo-European proto-language. These three laryngeals would have merged into a single laryngeal *H in Proto-(Indo-)Iranian.

[8] Bailey’s Indo-European reconstructions are not seldom based on isolated or even obscure Khotanese forms with no further (Indo-)Iranian correspondences. 

[9] A good example is the citation of BSogd. ’’y’np as an Ir. continuation of IE *H3i̯ebh- ‘to have sexual intercourse’, based on an erroneous meaning given by Henning 1939: 103, viz. ‘to commit adultery’. The Sogd. form should rather mean ‘to seduce, pervert’ and probably be connected to an IE root *i̯eb(h)- ‘to go (slowly)’, on which see the entry *i̯a(m)b/p.

[10] For further details on the development of the Proto-Indo-European laryngeals in Proto-(Indo-)Iranian see the most recent publication of Manfred Mayrhofer, Die Fortsetzung der indogermanischen Laryngale im Indo-Iranischen (Sitzungsberichte der phil.-hist. Klasse 730), Wien: Verlag der Österreichischen Akademie der Wissenschaften, 2005.

[11] Traces of the IE “eat”-root have been preserved in a few nominal forms, notably NP aspast ‘lucerne’, Oss. I. ad, D. adæ ‘taste’.

Symbols and abbreviations




C             consonant

T             stop

H             laryngeal

R             resonant (esp. l, r)

N             nasal

V             vowel

ø              zero (ending)

<              developed from

>              developed into

<<           analogically replacing

>>           replaced analogically by

*              reconstructed form

**           hypothetical form

< >          graphemic representation

[ ]            phonetic representation

/ /            phonemic representation

˚              part of (pre)form

+             and later (Skt.)


+              emended reading of a form

x              tentative reading of a form


