sci.lang FAQ (Frequently Asked Questions)

Archive-name: sci-lang-faq
Version: 2.8.3
Last-modified: 12 Oct 1995
[last posted 16 Oct 1995]

Except where noted, written by Michael Covington
Maintained by Mark Rosenfelder

      changes this month: a couple more books

NOTE: This FAQ file is fairly short.  Many good books and many important ideas
      are left unmentioned.  All readers should be aware that linguistics
      is a young science and that linguists rarely agree 100% on anything.

DISTRIBUTION: This file may be freely distributed electronically, or 
      as handouts in linguistics classes.  Before using it in print,
      please contact the authors.

 1. What is sci.lang for?
 2. What is linguistics?
 3. Does linguistics tell people how to speak or write properly?
 4. What are some good books about linguistics?
 5. How did language originate?
 6. What is known about prehistoric language?
 7. What do those asterisks mean?
 8. How are present-day languages related?
 9. Why do Hebrew and Yiddish [etc.] look alike if they aren't related?
10. How do linguists decide that languages are related?
11. What is Noam Chomsky's transformational grammar all about?
12. What is a dialect?  (Relation between dialects and languages.)
13. Are all languages equally complex, or are some more primitive than others?
14. What about artificial languages, such as Esperanto?          
15. What are some stories and novels that involve linguistics?
16. What about those Eskimo words for snow? (and other myths about language)
17. Where can I get an electronic IPA font (or other electronic resources)?
18. How do I subscribe to the LINGUIST list?
19. How can I represent phonetic symbols in ASCII?
1. What is sci.lang for?

Discussion of the scientific or historical study of human language(s).
Note the "sci." prefix.  The main concern here is with _facts_ and
theories accounting for them.

For advice on English usage, see alt.usage.english or misc.writing.
For casual chatter about other languages see soc.culture.<whatever>.
Discussion of or in Greek or Latin is available in sci.classics.
The sci.lang.translation newsgroup focusses on translation and issues of 
  concern to translators and interpreters.
The newsgroup focusses on natural language processing
  by computers.

Like all "sci." newsgroups, sci.lang is not meant to substitute for
a dictionary or even a college library.  If the answer to your question
can be looked up easily, then do so rather than using the net.
If you don't have a library, then ask away, but explain your situation.
2. What is linguistics?

  The scientific study of human language, including:
     Phonetics (physical nature of speech)
     Phonology (use of sounds in language)
     Morphology (word formation)
     Syntax (sentence structure)
     Semantics (meaning of words & how they combine into sentences)
     Pragmatics (effect of situation on language use)

  Or, carving it up another way:
     Theoretical linguistics (pure and simple: how languages work)
     Historical linguistics (how languages got to be the way they are)
     Sociolinguistics (language and the structure of society)
     Psycholinguistics (how language is implemented in the brain)
     Applied linguistics (teaching, translation, etc.)
     Computational linguistics (computer processing of human language)

  Some linguists also study sign language, non-verbal communication,
  animal communication, and other topics besides spoken language.
3. Does linguistics tell people how to speak or write properly?

No.  Linguistics is descriptive, not prescriptive.
Linguistics can often supply facts which help people arrive at a
recommendation or value judgement, but the recommendation or value
judgement is not part of linguistic science itself.
4. What are some good books about linguistics?

(These are cited by title and author only. Full ordering information
can be obtained from BOOKS IN PRINT, available at most bookstores and
at even the smallest public libraries.)

  CAMBRIDGE ENCYCLOPEDIA OF LANGUAGE, by David Crystal (1987) is a good place 
     to start if you are new to this field.
  LANGUAGE, by Edward Sapir (1921), is a readable survey of linguistics 
     that is still worthwhile despite its age.
  AN INTRODUCTION TO LANGUAGE, by Fromkin and Rodman (1974), is one of the 
     best intro linguistics survey texts.  There are many others.
  CAMBRIDGE TEXTBOOKS IN LINGUISTICS (a series) consists of good,
     modestly priced introductions to all the areas of linguistics.
  Any encyclopedia will give you basic information about widely studied
     languages, alphabets, etc.
5. How did language originate?

Nobody knows.  Very little evidence is available.
See however D. Bickerton, LANGUAGE AND SPECIES (1990).
6. What is known about prehistoric language?

Quite a lot, if by "prehistoric" you'll settle for maybe 2000 years
before the development of writing.  (Language is many thousands of years
older than that.)

Languages of the past can be recovered by comparative reconstruction
from their descendants.  The comparative method relies mainly on
pronunciation, which changes very slowly and in highly systematic
ways.  If you apply it to French, Spanish, and Italian, you 
reconstruct late colloquial Latin with a high degree of accuracy;
this and similar tests show us that the method works.

Also, if you use the comparative method on unrelated languages,
you get nothing. So comparative reconstruction is a test of whether 
languages are related (to a discernible degree).

The ancient languages Latin, Greek, Sanskrit, and several others form 
a group known as "Indo-European."  Comparative reconstruction from 
them gives a language called Proto-Indo-European which was spoken 
around 2500 B.C.  Many Indo-European words can be reconstructed with 
considerable confidence (e.g., *ekwos 'horse').  The grammar was 
similar to Homeric Greek or Vedic Sanskrit.  Similar reconstructions are
available for some other language families, though none has been as 
thoroughly reconstructed as Indo-European.
7. What do those asterisks mean?

An asterisk attached to a form marks one of two things:
an unattested, reconstructed word (such as Indo-European *ekwos);
or an ungrammatical sentence (such as *Himself saw me).

(In a generative rule, such as AP -> Adj (AP)*, it indicates that
an element may be repeated zero or more times.)
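
As a rough illustration of the starred notation, here is a minimal Python
sketch that treats the rule AP -> Adj (AP)* as a recognizer: an AP is one
adjective followed by zero or more further APs.  The function names and the
three-word lexicon are assumptions made up for this example; they do not
come from any particular grammar formalism.

```python
# Hypothetical mini-lexicon; any set of adjectives would do.
ADJECTIVES = {"big", "old", "red"}


def parse_ap(words, start=0):
    """Try to read an AP beginning at words[start].

    Implements AP -> Adj (AP)*: one adjective, then zero or more
    further APs.  Returns the index just past the AP, or None if
    no AP starts at that position.
    """
    if start >= len(words) or words[start] not in ADJECTIVES:
        return None
    pos = start + 1
    while True:  # the star: repeat the embedded AP zero or more times
        nxt = parse_ap(words, pos)
        if nxt is None:
            return pos
        pos = nxt


def is_ap(words):
    """True if the whole word list is exactly one AP."""
    return parse_ap(words) == len(words)
```

Under this sketch, is_ap(["big"]) and is_ap(["big", "old", "red"]) both
succeed, while an empty list or a list containing a non-adjective fails.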
8. How are present-day languages related?
                                                           [--Scott DeLancey]

This is an INCOMPLETE list of some of the world's language families.  More
detailed classifications can be found in Voegelin and Voegelin, CLASSIFICATION
AND INDEX OF THE WORLD'S LANGUAGES (1977), and M. Ruhlen, A GUIDE TO THE
WORLD'S LANGUAGES (1987).  (Note: Ruhlen's classification recognizes a 
number of higher-order groups which most linguists regard as speculative.)

A language family is a group of languages that have been proven to have
descended from a common ancestral language.  Branches of families likewise
represent groups of languages with a more recent common ancestor.  For 
example, English, Dutch, and German have a common ancestor which we label
Proto-West-Germanic, and thus belong to the West Germanic branch of Germanic.
Icelandic and Norwegian are descended from Proto-North Germanic, a separate 
branch of Germanic.  All the Germanic languages have a common ancestor, 
Proto-Germanic; farther back, this ancestor was descended from Proto-Indo-
European, as were the ancestors of the Italic, Slavic, and other branches.

Not all languages are known to be related to each other.  It is possible that 
they are related but the evidence of relationship has been lost; it's also 
possible they arose separately.  It is likely that some of the families 
listed here will eventually turn out to be related to one another.

While low-level close relationships are easy to demonstrate, higher-order 
classification proposals must rely on more problematic evidence and tend to 
be controversial.  Recently linguists such as Joseph Greenberg and Vitalij 
Shevoroshkin have attracted attention both in linguistic circles and in the 
popular press with claims of larger genetic units, such as Nostratic 
(comprising Indo-European, Uralic, Altaic, Dravidian, and Afroasiatic) or 
Amerind (to include all the languages of the New World except Na-Dene and
Eskimo-Aleut).  Most linguists regard these hypotheses as having a grossly 
insufficient empirical foundation, and argue that comparisons at that depth 
are not possible using available methods of historical linguistics.

This list isn't intended to be exhaustive, even for families like Germanic
and Italic.  Nor is it the last word on what's a "language"; see question 12.

  Note: English is not descended from Latin.
        English is a Germanic language with a lot of Latin vocabulary,
        borrowed from French in the Middle Ages.

INDO-EUROPEAN:
    GERMANIC:
      North Germanic:  Icelandic, Norwegian / Swedish / Danish
      East Germanic:  Gothic (extinct)
      West Germanic:  English, Dutch, German, Yiddish
    ITALIC:
      Osco-Umbrian:  Oscan, Umbrian (extinct languages of Italy)
      Latin and its modern descendants (Italian, Spanish, Portuguese, 
         Catalan, Rumanian, French, etc.)
    CELTIC:
      P-Celtic:  Welsh, Breton, Cornish
      Q-Celtic:  Irish, Scots Gaelic, Manx
      Some extinct European languages were also Celtic, notably those of Gaul
    HELLENIC:  Greek (ancient and modern)
    SLAVIC:  Russian, Bulgarian, Polish, Czech, Serbo-Croatian, etc. 
         (not Rumanian or Albanian)
    BALTIC:  Lithuanian and Latvian
    INDO-IRANIAN:
      Indic:  Sanskrit and its modern descendants (Hindi-Urdu, 
         Gypsy (Romany), Bengali, etc.)
      Iranian: Persian (ancient and modern), Pashto (Afghanistan), others
    ALBANIAN:  Albanian
    ARMENIAN:  Armenian
    TOKHARIAN (an extinct language of NW China)
    HITTITE (extinct language of Turkey)
AFRO-ASIATIC:
    SEMITIC:  Arabic, Hebrew (not Yiddish; see above), Aramaic, Amharic
      and other languages of Ethiopia
    CHADIC:  languages of northern Africa, e.g. Hausa
    CUSHITIC:  Somali, other languages of eastern Africa
    EGYPTIAN:  Ancient Egyptian
    BERBER:  languages of North Africa

NIGER-KORDOFANIAN:  includes most of the languages of sub-Saharan 
    Africa.  Most of the languages are in the NIGER-CONGO branch; the
    most widely known subgroup of N-C is BANTU (Swahili, Zulu, Xhosa, etc.) 

URALIC:
    Finnish, Estonian, Saami (Lapp), Hungarian, and several 
    languages of central Russia

MONGOL:  Mongolian, Buryat, Kalmuck, etc.
TURKIC:  Turkish, Azerbaijani, Kazakh, and other languages of Central Asia

    Some linguists group the Mongol and Turkic families together as ALTAIC.
    Rather more controversially, some add Korean and Japanese to this group.

    It has been claimed that URALIC and ALTAIC are related (as URAL-ALTAIC),
    but this idea is not widely accepted.

DRAVIDIAN:  languages of southern India, including Tamil, Telugu, etc.

SINO-TIBETAN:
    SINITIC:  Chinese (several "dialects", or arguably distinct languages:
      Mandarin, Wu (Shanghai), Min (Hokkien [Fujian], Taiwanese), 
      Yue (Cantonese), Hakka, Gan, Xiang)
    TIBETO-BURMAN: Tibetan, Burmese, various languages of Burma,
      China, India, and Nepal

AUSTROASIATIC:
    MON-KHMER:  Vietnamese, Khmer (Cambodian), and various minority 
      and tribal languages of Southeast Asia
    MUNDA:  tribal languages of eastern India

AUSTRONESIAN:
    Malay-Indonesian, other languages of Indonesia (Javanese, etc.)
    Philippine languages: Tagalog, Ilocano, Bontoc, etc.
    Aboriginal languages of Taiwan (Tsou, etc.)
    Polynesian languages: Hawaiian, Maori, Samoan, Tahitian, etc.
    Micronesian:  Chamorro (spoken in Guam), Yap, Truk, etc.
    Malagasy (spoken in Madagascar)
  Most of these languages fall in a branch called MALAYO-POLYNESIAN

JAPANESE:  A number of linguists argue that Japanese is ALTAIC; others,
    that it is most closely related to AUSTRONESIAN, or that it represents 
    a mixture of AUSTRONESIAN and ALTAIC elements.

TAI-KADAI:  Thai, Lao, and other languages of southern China and 
    northern Burma.  Possibly related to AUSTRONESIAN.  
    An outdated hypothesis that TAI is part of SINO-TIBETAN is still 
    often found in reference works and introductory texts.

AUSTRALIA:  the Aboriginal languages of Australia are conservatively 
    classified into 26 families, the largest being PAMA-NYUNGAN, consisting
    of about 200 languages originally spoken over 80-90% of Australia.

A large number of language families are found in North and South America.
There are numerous proposals which group these into larger units, some of
which will probably be demonstrated in time.  To date no New World language 
has been proven to be related to any Old World family.  The larger North 
American families include:

ESKIMO-ALEUT:  two Eskimo languages and Aleut.
ATHAPASKAN:  most of the languages of Alaska and northwestern Canada,
    also includes Navajo and Apache.  Eyak (in Alaska) is related to
    Athapaskan; some linguists put these together with Tlingit and Haida
    in a NA-DENE family.
ALGONQUIAN:  most of Canada and the Northeastern U.S., includes
    Cree, Ojibwa, Cheyenne, Blackfoot
IROQUOIAN:  the languages of NY state (Mohawk, Onondaga, etc.) and Cherokee
SIOUAN:  includes Dakota/Lakhota and other languages of the Plains
    and Southeast U.S.
MUSKOGEAN: Choctaw, Alabama, Creek, Mikasuki (Seminole) and other
    languages of the southeast U.S.
UTO-AZTECAN:  a large family in Mexico and the Southwestern U.S., 
    includes Nahuatl (Aztec), Hopi, Comanche, Paiute, etc.
SALISH:  languages of Washington and British Columbia
HOKAN:  languages of California and Mexico; a controversial grouping
PENUTIAN:  languages of California and Oregon; also controversial

Work on documentation and classification of South American languages still 
has a long way to go.  Generally recognized families include:

ARAWAKAN, TUCANOAN, TUPI-GUARANI (including Guarani, a national language
of Paraguay), CARIBAN, ANDEAN (including Quechua and Aymara)

LANGUAGE ISOLATES:  A number of languages around the world have never been
successfully shown to be related to any others-- in at least some cases 
because any related languages have long been extinct.  The most famous 
isolate is Basque, spoken in northern Spain and southern France; it is 
apparently a survival from before the Indo-Europeanization of Europe.
9. Why do  Hebrew and Yiddish
           Japanese and Chinese
           Persian and Arabic
   look so much alike if they aren't related?

In each of these cases one language has adopted part or all of the 
writing system of an unrelated language.

(To a Chinese reader, English and Finnish look alike, because they're
written in the same alphabet.  Yet they are not historically related.)

An excellent introduction to writing systems is Geoffrey Sampson's
10. How do linguists decide that languages are related?           [--markrose]

When linguists say that languages are related, they're not just remarking 
on their surface similarity; they're making a technical statement or claim
about their history-- namely, that they can be regularly derived from a 
common parent language.

Proto-languages are reconstructed using the comparative method.  The 
first stage is to inspect and compare large amounts of vocabulary from the 
languages in question.  Where possible we compare entire _paradigms_ (sets 
of related forms, such as those of the present active indicative in 
Latin), rather than individual words.

The inspection should yield a set of regular sound correspondences between 
the languages.  By regular, we mean that the same correspondences are 
consistently observed in identical phonetic environments.  Finally, _sound 
changes_ are formulated: language-specific rules which specify how the 
original common form changed in order to produce those observed in each 
descendent language.

Applying the comparative method to the Romance languages, we might find

  'I sense'  Sard /sento/  French /sa~/   Italian /sento/   Spanish /sjEnto/
  'sleep'         /sonnu/         /som/           /sonno/           /suEn^o/

  'hundred'       /kentu/         /sa~/           /tSento/          /sjEnto/
  'five'          /kimbe/         /sE~k/          /tSinkwe/         /sinko/

  'I run'         /kurro/         /kur/           /korro/           /korro/
  'story'         /kontu/         /ko~t@/         /(rak)konto/      /kuEnto/

and hundreds of similar examples.  We see some correspondences--

  (1)        Sard /s/      French /s/     Italian /s/       Spanish /s/
  (2)             /k/             /s/             /tS/              /s/
  (3)             /k/             /k/             /k/               /k/

but they seem to conflict: does Sard /k/ correspond to Spanish /s/ or /k/?
Does French /s/ correspond to Italian /s/ or /tS/?

In fact we will find that the correspondences are regular, once we observe
that (2) is seen before a front vowel (i or e), while (3) is seen in other
environments.  Alternations within paradigms, such as It. /diko/ 'I say' 
vs. /ditSe/ 'says', will help us make and confirm such generalizations.

We may interpret these now-regular correspondences as indicating that an 
initial /s/ in the proto-language has been retained in all four languages, 
and likewise initial /k/ in Sard; but that /k/ changed to /s/ or /tS/ in 
the other languages in the environment of a front vowel.
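
The reasoning above can be sketched in a few lines of Python.  This is only
an illustration under stated assumptions: it looks at initial segments only,
it uses the Sard form as a stand-in for the proto-language vowel (because
Sard looks most conservative in this data), and the function names are
invented for the example.  The forms are copied from the table above.

```python
from collections import defaultdict

# Forms copied from the table above, in the same ASCII transcription
# (Italian 'story' is given without its optional prefix).
COGNATES = [
    # gloss,     Sard,    French,  Italian,   Spanish
    ("I sense", "sento", "sa~",   "sento",   "sjEnto"),
    ("sleep",   "sonnu", "som",   "sonno",   "suEn^o"),
    ("hundred", "kentu", "sa~",   "tSento",  "sjEnto"),
    ("five",    "kimbe", "sE~k",  "tSinkwe", "sinko"),
    ("I run",   "kurro", "kur",   "korro",   "korro"),
    ("story",   "kontu", "ko~t@", "konto",   "kuEnto"),
]
FRONT_VOWELS = set("ie")


def initial(word):
    """First segment of a form; /tS/ counts as a single segment."""
    return "tS" if word.startswith("tS") else word[0]


def environment(sard_form):
    """Classify the vowel that follows the initial consonant in Sard."""
    vowel = sard_form[len(initial(sard_form))]
    return "front" if vowel in FRONT_VOWELS else "other"


def correspondences(conditioned):
    """Group the rows of initial segments (Sard, French, Italian, Spanish)
    by the Sard initial, optionally split by the following vowel."""
    table = defaultdict(set)
    for _gloss, *forms in COGNATES:
        row = tuple(initial(f) for f in forms)
        key = (row[0], environment(forms[0])) if conditioned else (row[0],)
        table[key].add(row)
    return dict(table)
```

Calling correspondences(False) shows Sard /k/ with two competing rows, which
is the apparent conflict; correspondences(True) leaves exactly one row per
environment, which is what "regular" means here.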

Actually, this process is iterative.  For instance, at first glance we 
might think that German _haben_ and Latin _habere_ 'have' are obvious 
cognates.  However, after noting the regular correspondence of German h to 
Latin c, we are forced to change our minds, and look to _capere_ 'seize' 
as a better cognate for _haben_.

Thus, similarity of words is only a clue, and perhaps a misleading one.
Linguists conclude languages are related, and thus derive from a common
ancestor, only if they find *regular* sound correspondences between them.
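
The haben/capere argument can be mimicked with a toy check.  The sketch
below is a deliberate simplification: it compares only the first letter of
each spelling (spelling standing in for sound), and the list of established
Latin/German cognates is illustrative data added for this example, not taken
from the text above.  The function names are likewise made up.

```python
from collections import Counter

# Commonly cited Latin/German cognate pairs (illustrative data only).
ESTABLISHED = [
    ("centum", "hundert"),  # 'hundred'
    ("cornu", "horn"),      # 'horn'
    ("caput", "haupt"),     # 'head'
    ("canis", "hund"),      # 'dog'
    ("cor", "herz"),        # 'heart'
]


def initial_correspondences(pairs):
    """Count (Latin initial, German initial) pairs among the cognates."""
    return Counter((latin[0], german[0]) for latin, german in pairs)


def fits(latin, german, pairs=ESTABLISHED):
    """True if the candidate pair shows an initial correspondence that is
    already attested among the established cognates."""
    return (latin[0], german[0]) in initial_correspondences(pairs)
```

With this data, fits("capere", "haben") holds because Latin c : German h is
attested five times, while fits("habere", "haben") fails: the look-alike pair
has no regular correspondence behind it.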

To complicate things, derivations may be obscured by irregular changes,
such as dissimilation, borrowing, or analogical change.  For instance, 
the normal development of Middle English _kyn_ is 'kine', but this word
has been largely replaced by 'cows', formed from 'cow' (ME _cou_) on the 
analogy of word-pairs like stone : stones.  Analogy often serves to reduce 
irregularities in a language (here, an unusual plural).

_Borrowing_ refers to taking words from other languages, as English has
taken 'search' and 'garage' from French, 'paternal' from Latin, 'anger' from 
Old Norse, and 'tomato' from Nahuatl.  How do we know that English doesn't
derive from French or Nahuatl?  The latter case is easy to eliminate: 
regular sound correspondences can't be set up between English and Nahuatl.

But English has borrowed so heavily from French that regular correspondences 
do occur.  Here, however, we find that the French borrowings are thickest in 
government, legal, and military domains; while the basic vocabulary (which 
languages borrow less frequently) is more akin to German.  Paradigmatic 
correspondences like sing/sang/sung vs. singen/sang/gesungen also help show
that the Germanic words are inherited, the French ones borrowed.
11. What is Noam Chomsky's transformational grammar all about?

Several things; it really comprises several layers of theory:

(1) The hypothesis that much of the structure of human language is
inborn ("built-in") in the human brain, so that a baby learning to
talk only has to learn the vocabulary and the structural "parameters"
of his native language -- he doesn't have to learn how language works
from scratch.

This is widely (but not universally) believed; the main evidence consists of:
   - The fact that babies learn to talk remarkably well from what seems
     to be inadequate exposure to language; it can be shown in detail
     that babies acquire some rules of grammar that they could never
     have "learned" from what is available to them, if the structure of
     language were not partly built-in.
   - The fact that the structure of language on different levels
     (vocabulary, ability to connect words, etc.) can be lost by injury
     to specific areas of the brain.
   - The fact that there are unexpected structural similarities between
     all known languages.
For detailed exposition see Cook, CHOMSKY'S UNIVERSAL GRAMMAR (1988), 

(2) The hypothesis that to adequately describe the grammar of a human
language, you have to give each sentence at least two different structures, 
called "deep structure" and "surface structure", together with rules
called "transformations" that relate them. 

This is hotly debated.  Some theories of grammar use two levels and
some don't.  Chomsky's original monograph, SYNTACTIC STRUCTURES (1957),
is still well worth reading; this is what it deals with.

(3) Chomsky's name is associated with specific flavors of transformational
grammar.  The model elaborated over the last few years is called GB
(government and binding) theory; however, Chomsky's 1992 paper on Minimalism
contains significant departures from earlier work in GB.

Bill Turkel runs a mailing list on Minimalism; e-mail
him for more information. 

(4) Some people think Chomsky is the source of the idea that grammar ought
to be viewed with mathematical precision.  (Thus there are occasional
vehement anti-Chomsky polemics such as THE NEW GRAMMARIAN'S FUNERAL, which
are really polemics against grammar per se.)

Although Chomsky contributed some valuable techniques, grammarians have
_always_ believed that grammar was a precise, mechanical thing.  They
are highly divided, however, on the nature and function of those mechanisms!
12. What is a dialect?
                                                              [--M.C. + M.R.]
A dialect is any variety of a language spoken by a specific community of
people. Most languages have many dialects.

Everyone speaks a dialect.  In fact everyone speaks an _idiolect_, i.e.,
a personal language.  (Your English language is not quite the same as
my English language, though they are probably very, very close.)

A group of people with very similar idiolects are considered to be
speaking the same dialect.  Some dialects, such as Standard American
English, are taught in schools and used widely around the world.
Others are very localized.  

Localized or uneducated dialects are _not_ merely failed attempts to speak
the standard language.  William Labov and others have demonstrated, for
example, that the speech of inner-city blacks has its own intricate
grammar, quite different in some ways from that of Standard English.

It should be emphasized that linguists do not consider some dialects 
superior to others-- though speakers of the language may do so;
and linguists do study people's attitudes toward language, since 
these have a strong effect on the development of language.

Linguists call varieties of language "dialects" if the speakers can
understand each other and "languages" if they can't.  For example,
Irish English and Southern American English are dialects of English,
but English and German are different languages (though related).

This criterion is not always as easy to apply as it sounds.
Intelligibility may vary with familiarity and interest, or may depend
on the subject.  A more serious problem is the _dialect continuum_: a
chain of dialects such that any two adjoining dialects are mutually
intelligible, but the dialects at the ends are not.  Speakers of
Belgian Dutch, for instance, can't understand Swiss German, but
between them there lies a continuum of mutually intelligible dialects.
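
Why a continuum defeats the criterion can be seen in a small Python sketch.
Everything in it is hypothetical: the five varieties, their positions along
a line, and the intelligibility cutoff are invented numbers, not
measurements of any real dialects.

```python
# Hypothetical varieties placed along a line; the numbers are invented
# "linguistic distances", not measurements.
POSITIONS = {"A": 0, "B": 1, "C": 2, "D": 3, "E": 4}
THRESHOLD = 1  # assumed cutoff for mutual intelligibility


def intelligible(x, y):
    """Two varieties are mutually intelligible if they are close enough."""
    return abs(POSITIONS[x] - POSITIONS[y]) <= THRESHOLD


def is_transitive(names):
    """Check whether 'mutually intelligible' could serve as 'same language',
    i.e. whether x~y and y~z always implies x~z."""
    return all(
        intelligible(x, z)
        for x in names
        for y in names
        for z in names
        if intelligible(x, y) and intelligible(y, z)
    )
```

Every adjoining pair in the chain A, B, C, D, E is intelligible, yet A and E
are not, so is_transitive fails.  No partition into "languages" can respect
every intelligible pair at once.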

Sometimes the use of the terms "language" or "dialect" is politically
motivated.  Norwegian and Danish, being mutually intelligible, would count
as dialects of the same language by this criterion, but are considered
separate languages because of their political independence.  By contrast,
Mandarin and
Cantonese, which are mutually unintelligible, are often referred to
as "dialects" of Chinese, due to the political and cultural unity of
China, and because they share a common _written_ language.

Because of such problems, some linguists reject the mutual
intelligibility criterion; but they do not propose to return to
arguments on political and cultural grounds.  Instead, they prefer
not to speak of dialects and languages at all, but only of different
varieties, with varying degrees of mutual intelligibility.
13. Are all languages equally complex, or are some more primitive than others?
                                                              [--M.C. + M.R.]
In the nineteenth century many people believed that so-called "primitive 
peoples" would have primitive languages, and that Latin and Greek--
or their own languages-- were inherently superior to other tongues.

In fact, however, there is no correlation between type or complexity of
culture and any measure of language complexity.  Peoples of very simple
material culture, such as the Australian Aborigines, are often found to 
speak very complex languages.

Obviously, the size of the vocabulary and the variety and sophistication of
literary forms will depend on the culture.  The _grammar_ of all languages,
however, tends to be about equally complex-- although the complexity may 
be found in different places.  Latin, for instance, has a much richer
system of inflections than English, but a less complicated syntax.

As David Crystal puts it, "All languages meet the social and psychological
needs of their speakers, are equally deserving of scientific study, and can
provide us with valuable information about human nature and society."

The only really simple languages are _pidgins_, which result when speakers
of different languages come to live and work together.  Vocabulary is drawn
from one or both languages, and a very forgiving grammar devised.  Grammars
of pidgins from around the world have interesting similarities (e.g. they
are likely to use repetition to express plurals).  

A pidgin becomes a _creole_ when children acquire it as a native language;
as it evolves to meet the needs of a primary language, its vocabulary and
grammar become much richer.  If a pidgin is used over a long period (for
example, Tok Pisin in Papua New Guinea), it may similarly develop into a 
more complex language known as an _extended pidgin_.
14. What about artificial languages, such as Esperanto?          [--markrose]

Hundreds of constructed languages have been devised in the last few centuries.
Early proposals, such as those of Lodwick (1647), Wilkins, or Leibniz, were 
attempts to devise an ideal language based on philosophical classification 
of concepts, and used wholly invented words.  Most were too complex to learn,
but one, Jean Francois Sudre's Solresol, achieved some popularity in the
nineteenth century; its entire vocabulary was built from the names of the
notes of the musical scale, and could be sung as well as spoken.

Later the focus shifted to languages based on existing languages, with a 
polyglot (usually European) vocabulary and a simplified grammar, whose purpose
was to facilitate international communication.  Johann Schleyer's Volapu"k 
(1880) was the first to achieve success; its name is based on English 
("world-speech"), and reflects Schleyer's notions of phonetic simplicity.  

It was soon eclipsed by Ludwig Zamenhof's Esperanto (1887), whose grammar 
was simpler and whose vocabulary was more recognizable.  Esperanto has remained 
the most successful and best-known artificial language, with a million or 
more speakers and a voluminous literature; children of Esperantists have 
even learned it as a native language.

Its relative success hasn't prevented the appearance of new proposals, such
as Ido, Interlingua, Occidental, and Novial.  There have also been attempts
to simplify Latin (Latino Sine Flexione, 1903) and English (Basic English, 
1930) for international use.  The recent Loglan and Lojban, based on 
predicate logic, may represent a revival of a priori language construction.

See also Andrew Large, THE ARTIFICIAL LANGUAGE MOVEMENT (1985); Mario Pei, 

There is a newsgroup, soc.culture.esperanto, dedicated to Esperanto.

The ConLang mailing list is devoted to the discussion of constructed 
languages.  To subscribe, e-mail a message to consisting
of the single line
    subscribe conlang firstname lastname
15. What are some stories and novels that involve linguistics?    [--markrose]

The following list is by no means exhaustive.  It's based on James Myers'
list of books, which was compiled the last time the subject came up on 
sci.lang.  Additions and corrections are welcome; please suggest the
approximate category and give the publication date, if possible.

ALIENS AND LINGUISTS: Language Study and Science Fiction, by Walter Meyers
(1980) contains a general discussion and lists more works.

alien languages

	"Tlon, Uqbar, Orbis Tertius" in FICCIONES - Jorge Luis Borges (1956)
	40000 IN GEHENNA - C.J. Cherryh
	BABEL-17 - Samuel R. Delany (1966)
	FLIGHT OF THE DRAGONFLY - Robert L. Forward (1984)
	THE HAUNTED STARS - Edmond Hamilton
	"Omnilingual", in FEDERATION - H. Beam Piper
	CONTACT - Carl Sagan (1985)
	PSYCHAOS - E. P. Thompson
	"A Martian Odyssey" in SF HALL OF FAME - Stanley Weinbaum (1934)
	"A Rose for Ecclesiastes" in SF HALL OF FAME - Roger Zelazny (1963)

futuristic varieties of English

	A CLOCKWORK ORANGE - Anthony Burgess (1962)
	HELLFLOWER - eluki bes shahar
	THE INHERITORS - William Golding (1955)
	THE MOON IS A HARSH MISTRESS - Robert Heinlein (1966)
	RIDDLEY WALKER - Russell Hoban (1980)
	1984 - George Orwell (1949)

other invented languages

	NATIVE TONGUE - Suzette Haden Elgin (1984)
	"Gulf" in ASSIGNMENT IN ETERNITY - Robert A. Heinlein (1949)
	PALE FIRE - Vladimir Nabokov
	THE KLINGON DICTIONARY - Marc Okrand (1985)
	THE LORD OF THE RINGS - J R R Tolkien (1954-55)
	THE MEMORANDUM - Vaclav Havel (1966)
	THE LANGUAGES OF PAO - Jack Vance (1957)

linguist heroes

	DOUBLE NEGATIVE - David Carkeet
	PYGMALION - George Bernard Shaw (1912)
	THE POISON ORACLE - Peter Dickinson (1974)
	HANDS ON - Andrew Rosenheim (1992)

animal language

	WATERSHIP DOWN - Richard Adams
	TARZAN OF THE APES - Edgar Rice Burroughs (1912)
	CONGO - Michael Crichton

use of linguistic theory

	SNOW CRASH - Neal Stephenson (1992)
	GULLIVER'S TRAVELS - Jonathan Swift (1726)
	THE EMBEDDING - Ian Watson (1973)
	Ozark trilogy - Suzette Haden Elgin


	THE TROIKA INCIDENT - James Cooke Brown (1969)   [Loglan]
	LOVE ME TOMORROW - Robert Rimmer (1976)   [Loglan]
	ETXEMENDI - Florence Delay  [Chomsky ref]
	TONGUES OF THE MOON - Philip Jose Farmer
	DUNE - Frank Herbert (1965)
	THE DISPOSSESSED - Ursula Le Guin (1974)
16. What about those Eskimo words for snow? (and other myths about language)

   "The Eskimos have hundreds of words for snow."

This story is constantly being repeated, with various numbers given,
despite the fact that it has no basis at all.  No one who repeats this
pseudo-factoid can list the hundreds of words for you, or even cite a 
work that does.  They just heard it somewhere.

The anthropologist Laura Martin has traced the development of this myth
(including the steady growth in the number of words claimed).  Geoffrey
Pullum summarizes her report in THE GREAT ESKIMO VOCABULARY HOAX (1991).

How many words are there really?  Well, the Yup'ik language in particular
has about two dozen roots describing snow or things related to snow.  This
is not particularly significant; English can amass about the same total:
snow, sleet, slush, blizzard, flurry, avalanche, powder, hardpack,
snowball, snowman, and other derivatives.

The Yup'ik total could be greatly expanded by other derived words, since
the Eskimo languages can form hundreds of words from a single root.  But
this is true of all words in the language (and indeed of all agglutinative
languages), not just the words for snow.

   "There's a town in Appalachia that speaks pure Elizabethan English."

There isn't.  All languages, everywhere, are constantly changing.  Some
areas speak more conservative dialects, but we know of no case where 
people speak exactly as their ancestors spoke centuries ago.

Of course, ancient languages are sometimes revived: biblical Hebrew has
been revived (with some modifications) in modern Israel, and there's a
village in India in which Sanskrit is being taught as an everyday
language.  But these are conscious revivals of languages which have
otherwise died out in everyday use, not survivals of living languages.

   "Chinese characters directly represent ideas, not spoken words."

Westerners have been taken by this notion for centuries, ever since
missionaries started describing the Chinese writing system.  However, it's
quite false.  Chinese characters represent specific Chinese words.  

(To be precise, almost all characters represent a particular syllable with
a particular meaning; about 10%, however, represent one syllable of a
particular two-syllable word.)

The vast majority of characters consist of a _phonetic_ giving the
approximate pronunciation of the word, plus a _signific_ giving a clue to
its meaning (thus distinguishing different syllables having different
meanings).  As an added difficulty, many of the phonetics are no longer
helpful, because of sound changes since the characters were devised, over
2000 years ago.  However, it is estimated that 60% of the phonetics still
give useful information about the character's pronunciation.

To be sure, Japanese (among other languages) uses Chinese characters too,
and it is a very different language from Chinese.  However, we must look
at exactly how the Japanese use the Chinese characters.  Generally they
borrowed both the characters and the words represented; it's rather as if
when we borrowed words like _psychology_ from Greek, we wrote them in the
Greek alphabet.  Native Japanese words are also written using the Chinese
characters for the closest Chinese words: if the Japanese word overlaps
several Chinese words, different characters must be written in different
contexts, according to the meanings in Chinese.

A good demythologizing of common notions about Chinese writing is found in

   "German lost out to English as the US's official language by 1 vote."

This entertaining story is also told of Greek, Latin, and even Hebrew.

There was never any such vote.  Dennis Baron, in THE ENGLISH ONLY QUESTION
(1990), thinks the legend may have originated with a 1795 vote concerning
a proposal to publish federal laws in German as well as English.  At one
point a motion to table discussion (rather than referring the matter back
to committee) was defeated 41-40.  The proposal was eventually defeated. 

   "Sign language isn't really a language."
   "ASL is a gestural version of English."

Sign languages are true languages, with vocabularies of thousands of words,
and grammars as complex and sophisticated as those of any other language,
though with fascinating differences from speech.  If you think they are
merely pantomime, try watching a mathematics lecture, a poetry reading, or
a religious service conducted in Sign, and see how much you understand.

ASL (American Sign Language) is not an invented system like Esperanto; it
developed gradually and naturally among the Deaf.  It has no particular
relation to English; the best demonstration of this is that it is quite
different from British Sign.  Curiously enough, it is most closely related
to French Sign Language, due to the influence of Laurent Clerc, who came
from Paris in 1817 to be the first teacher of the Deaf in the US.

ASL is not to be confused with Signed English, which is a word-for-word
signed equivalent of English.  Deaf people tend to find Signed English
tiring, because its grammar, like that of spoken languages, is linear,
while that of ASL is primarily spatial.

For more on Sign and the Deaf community, see Oliver Sacks' SEEING VOICES
(1989), or Harlan Lane's WHEN THE MIND HEARS.
17. Where can I get an electronic IPA font (or other electronic resources)?
     [Adapted from information posted to sci.lang by Sean Redmond,
     Evan Antworth, J"org Knappen, Alex Rudnicky, Enrico Scalas,
     and Mark Kantrowitz.

     If you know of other publicly available (and legal) fonts or 
     other linguistic resources, please e-mail me or post to sci.lang, 
     so they can be listed here.]

* A number of Postscript Type 1 and TrueType fonts (including IPA, Greek,
  Cyrillic, Armenian, etc.) are available by ftp from 

     host: []
     directory: pub/pc/win3/fonts/truetype

  List (ls) the directory to see what's available.  The files are zipped; 
  a version of unzip is usually available on whatever host you use 
  to ftp with.

  Note: TrueType files can be used under Windows or on the Macintosh.
  I'm not sure if the unzipped files can be inserted directly into the
  Mac's Fonts folder; I ran them through Fontographer first.

* The SIL IPA fonts (also in PostScript Type 1 and TrueType versions)
  are also available by ftp from

     host: []
        Windows version: /msdos/windows/fonts/truetype/sil-ipa12.exe
        Mac version:     /mac/system.extensions/font/type1/silipa1.2.cpt.hqx

* They are also available on diskette for $5.00 plus postage: $2.00 in U.S. 
  or $5.00 outside U.S.  Order from:

     SIL Printing Arts Department
     7500 W. Camp Wisdom Road
     Dallas TX  75236   USA

     tel:    214-709-2495, -2440
     fax:    214-709-3387.

* Some IPA fonts for TeX can be found in the CTAN archives

  in the directories


* The Carnegie-Mellon 100,000-word English dictionary can be retrieved
  as follows.

     host: []
     directory: project/fgdata/dict

  Retrieve the following files:

     cmudict.0.2.Z (compressed)
     cmulex.0.1.Z (compressed)

* If you have access to the World Wide Web, the following servers contain
  information of interest on linguistics or languages.

  Brown University linguistics page

  Summer Institute of Linguistics Ethnologue (languages database)

  University of Virginia electronic text center

  University of Stuttgart - Institute for Natural Language Processing

  Tyler Jones' Human-Languages Page

  sci.lang faq on the Web (Netherlands)

* sci.lang (since October 1994) is archived at

18. How do I subscribe to the LINGUIST list?

The LINGUIST list is a mailing list dedicated to linguistics; it's more
technical than sci.lang.  To subscribe, send an e-mail message to

containing the single line

     subscribe linguist firstname lastname

for example:

     subscribe linguist Edward Sapir

Presumably the same message with "unsubscribe" will take you off the
mailing list.
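
If you would rather script the request than type it, the steps above can
be sketched roughly as follows.  This is only an illustration: the function
names are made up for this example, and the list address is left as a
parameter because it is not given here.  The address in the sample call is
a placeholder, not the real one.

```python
from email.message import EmailMessage


def subscribe_command(first_name, last_name):
    """Return the single line that the message should contain."""
    return f"subscribe linguist {first_name} {last_name}"


def build_subscribe_message(first_name, last_name, list_address):
    """Build an e-mail whose body is just the subscribe line.

    list_address must be supplied by the caller (hypothetical parameter).
    """
    msg = EmailMessage()
    msg["To"] = list_address
    msg.set_content(subscribe_command(first_name, last_name))
    return msg


# "listserv@example.invalid" is a placeholder address for illustration only.
msg = build_subscribe_message("Edward", "Sapir", "listserv@example.invalid")
print(msg.get_content().strip())
```

The message is only built, not sent; hand it to whatever mail program or
library you normally use.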
19. How can I represent phonetic symbols in ASCII?

The following table is a summary of Evan Kirshenbaum's IPA/ASCII schema,
which a number of posters have been using in sci.lang and alt.usage.english.
Evan has available a fuller explanation of the system.
This summary is presented for convenience only, and is not intended to
forestall discussion of alternative systems.

     blb-- -lbd-- --dnt-- --alv-- -rfx- -pla-- --pal--- --vel-- -----uvl-----

nas    m      M       n[      n      n.            n^      N           n"
stp  p b           t[ d[    t d   t. d.        c   J     k g      q    G
frc  F V    f v    T  D     s z   s. z.  S Z   C C<vcd>  x Q      X    g"
apr        r<lbd>     r[      r      r.            j       j<vel>      g"
lat                   l[      l      l.            l^      L
trl  b<trl>                r<trl>                                      r"
flp                           *      *. 
ejc  p`            t[`      t`                 c`        k'
clk  p!            t!       c!                   c!      k!
imp    b`             d`      d`                   J`      g`     q`   G`

     ---- lbv ----   --phr--  ---glt---

nas         n<lbv>                               alv lat frc: s<lat> z<lat>
stp  t<lbv> d<lbv>              ?                    lat flp: *<lat>
frc  w<vls>   w      H H<vcd>   h<?>                 lat clk: l!
apr           w                 h

    ----- unr -----     unr     ----- rnd -----
    fnt   cnt   bck     cnt     fnt   cnt   bck
hgh  i     i"    u-              y     u"    u
smh  I                           I.          U
umd  e   @<umd>  o-    R<umd>    Y           o
mid        @             R             @.
lmd  E     V"    V               W     O"    O
low  &     a     A               &.    a.    A.

      Vowels:     Consonants:            + =   ad hoc diacritic
  ~   nasalized   velarized              [     dental
  :   long                               !     click  
  -   unrounded   syllabic               <H>   pharyngealized
  .   rounded     retroflex              <h>   aspirated
  `               ejective/implosive     <o>   unexploded or voiceless
  ^               palatal                <r>   rhotacized
  ;               palatalized            <w>   labialized
  "   centered    uvular                 <?>   murmured

Other symbols:
  $ %     ad hoc segment
  []      phonetic transcription
  //      phonemic transcription
  #       syllable or word boundary
  space   word/segment separator
  ' ,     primary and secondary stress
  0-9     tones
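
For readers who want to turn such ASCII transcriptions back into IPA
symbols on a system that can display them, here is a minimal sketch.  It
is not part of Evan's schema: the names ASCII_TO_IPA and kirshenbaum_to_ipa
are invented for this example, and it covers only a handful of the
single-character symbols from the table above.  Diacritics, the <...>
modifiers, stress marks, and multi-character symbols such as n^ or t[ are
passed through unchanged.

```python
# Partial mapping from single ASCII symbols in the table above to IPA
# characters (given as Unicode escapes so this file stays plain ASCII).
ASCII_TO_IPA = {
    "T": "\u03b8",  # voiceless dental fricative (theta)
    "D": "\u00f0",  # voiced dental fricative (eth)
    "S": "\u0283",  # voiceless palato-alveolar fricative (esh)
    "Z": "\u0292",  # voiced palato-alveolar fricative (ezh)
    "N": "\u014b",  # velar nasal (eng)
    "?": "\u0294",  # glottal stop
    "I": "\u026a",  # semi-high front unrounded vowel
    "U": "\u028a",  # semi-high back rounded vowel
    "E": "\u025b",  # lower-mid front unrounded vowel
    "O": "\u0254",  # lower-mid back rounded vowel
    "@": "\u0259",  # mid central unrounded vowel (schwa)
    "&": "\u00e6",  # low front unrounded vowel (ash)
    "A": "\u0251",  # low back unrounded vowel
    ":": "\u02d0",  # length mark
}


def kirshenbaum_to_ipa(text):
    """Replace each mapped ASCII symbol; leave everything else as it is."""
    return "".join(ASCII_TO_IPA.get(ch, ch) for ch in text)


print(kirshenbaum_to_ipa("/TIN/"))   # English "thing"
print(kirshenbaum_to_ipa("/Si:p/"))  # English "sheep"
```

A character-by-character lookup is the simplest possible design; a fuller
converter would need to read ahead for diacritics and bracketed modifiers
before deciding how to render each segment.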