The following is excerpted from Vocabulary: Description, Acquisition and Pedagogy (《詞匯:描述、習得與教學》), a classic work on English language teaching published in China in a licensed edition by Shanghai Foreign Language Education Press. Chinese translation by Wu Taibai (武太白).

How many words are needed to do the things a language user needs to do?

Although a language makes use of a large number of words, not all of these words are equally useful. One measure of usefulness is word frequency, that is, how often the word occurs in normal use of the language. From the point of view of frequency, the word “the” is a very useful word in English. It occurs so frequently that about 7 per cent of the words on a page of written English, and the same proportion of the words in a conversation, are repetitions of the word “the”. Look back over this paragraph and you will find an occurrence of “the” in almost every line.

The good news for second language learners and second language teachers is that a small number of the words of English occur very frequently, and a learner who knows these words will know a very large proportion of the running words in a written or spoken text. Most of these words are content words, and knowing enough of them allows a good degree of comprehension of a text. Here are some figures showing what proportion of a text is covered by certain numbers of high frequency words.

Table 1 Vocabulary size and text coverage in the Brown corpus
(taken from Francis and Kucera, 1982)

The figures in Table 1 refer to written texts and are from Francis and Kucera (1982), who used a very diverse corpus of over 1,000,000 running words made up of 500 texts of around 2,000 running words each. As we shall see, the more diverse the texts in a corpus are, the greater the number of different words, and the high frequency words cover slightly less of the text, so these figures are a conservative estimate. The figures in the last line of the table are from Kucera (1982). The Collins COBUILD English Language Dictionary (1987) claims that 15,000 words cover 95 per cent of the running words of its corpus. The figures in Table 1 are for lemmas and not word families. (A lemma is a base word and its inflected forms.) Word families would give fractionally higher coverage. Table 1 assumes that high frequency words are known before lower frequency words and shows that knowing about 2,000 word families gives close to 80 per cent coverage of written text. The same number of words gives greater coverage of informal spoken text, around 96 per cent (Schonell, Meddleton and Shaw, 1956). (McCarthy and Carter discuss other differences between spoken and written discourse in the next chapter.)

With a vocabulary size of 2,000 words, a learner knows 80 per cent of the words in a text, which means that one word in every five (approximately two words in every line) is unknown. Research by Liu Na and Nation (1985) has shown that this ratio of unknown to known words is not sufficient to allow reasonably successful guessing of the meaning of the unknown words. At least 95 per cent coverage is needed for that. Research by Laufer (1988a) suggests that 95 per cent coverage is sufficient to allow reasonable comprehension of a text. A larger vocabulary size is clearly better. Table 2 is based on research by Hirsh and Nation (1992) about novels written for teenage or younger readers.
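The coverage arithmetic in the preceding paragraphs can be illustrated with a rough sketch. The function names `coverage_by_top_n` and `unknown_every` and the sample sentence are hypothetical, invented here for illustration. The sketch also counts raw word forms, not lemmas or word families, so it only approximates the kind of counting behind the published figures.

```python
from collections import Counter
import re


def coverage_by_top_n(text: str, n: int) -> float:
    """Share of running words covered by the n most frequent word forms.

    Assumption: words are lower-cased runs of letters and apostrophes,
    and each distinct form counts separately (no lemmas, no word families).
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    covered = sum(freq for _, freq in counts.most_common(n))
    return covered / len(tokens)


def unknown_every(coverage: float) -> float:
    """Average number of running words per unknown word at a given coverage."""
    if not 0.0 <= coverage < 1.0:
        raise ValueError("coverage must be in [0, 1)")
    return 1.0 / (1.0 - coverage)


# Invented sample: 13 running words, of which "the", "sat", "on" account for 8.
sample = "the cat sat on the mat and the dog sat on the rug"
print(round(coverage_by_top_n(sample, 3), 3))  # 8 of 13 running words

# 80 per cent coverage: roughly one unknown word in every 5.
print(round(unknown_every(0.80), 2))
# 95 per cent coverage: roughly one unknown word in every 20.
print(round(unknown_every(0.95), 2))
```

At 80 per cent coverage an unknown word turns up about every fifth word. At the 95 per cent threshold the gap widens to about every twentieth word, which matches the gap between the two coverage levels discussed above.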
The Hirsh and Nation (1992) study looked at such novels because they might provide the most favourable conditions for second language learners to read unsimplified texts. These conditions could come about because the novels are aimed at a non-adult audience, so the writer may tend to use simpler vocabulary, and because a continuous novel on one topic by one writer provides opportunity for the repetition of vocabulary. Table 2 shows that under favourable conditions, a vocabulary size of 2,000 to 3,000 words provides a very good basis for language use.

Table 2 Vocabulary size and coverage in novels for teenagers
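The significance of this information is that although there are well over 54,000 word families in English, and although educated adult native speakers know around 20,000 of these word families, a much smaller number of words, say between 3,000 and 5,000 word families, is needed to provide a basis for comprehension. It is possible to make use of a smaller number, around 2,000 to 3,000, for productive use in speaking and writing. Hazenberg and Hulstijn (1996), however, suggest a figure nearer to 10,000 for Dutch as a second language. Sutarsyah, Nation and Kennedy (1994) found that a single long economics text was made up of 5,438 word families and a corpus of similar length made up of diverse short academic texts contained 12,744 word families. Within narrowly focused areas of interest, such as in an economics text, a much smaller vocabulary is needed than if the reader wishes to read a wide range of texts on a variety of different topics.

Although a language uses a very large number of words, not all of these words are equally useful. One measure of usefulness is word frequency, that is, how often a word occurs in normal use of the language. In terms of frequency, the word “the” is a very useful word in English. It occurs so often that it accounts for 7 per cent of the words on a page of written English, and it is repeated at the same rate in speech. Look back over this paragraph and you will find “the” in almost every line.

For second language learners and teachers, the good news is that English has only a small number of very high frequency words, and a learner who has mastered them can understand most of the running words in a written or spoken text. The great majority of these words are content words, and knowing enough of them makes it possible to understand a text well. The figures below show the relationship between the number of high frequency words known and text coverage (the tables appear above):

The figures in Table 1 refer to written texts and come from Francis and Kucera (1982), a highly varied corpus of over 1,000,000 words made up of 500 texts of about 2,000 words each. As we shall see, the more varied the texts in a corpus, the greater the number of different words and the slightly lower the proportion of text that the high frequency words can cover, so these figures are a fairly conservative estimate. The figures in the last row of the table come from Kucera (1982). The Collins COBUILD English Language Dictionary (1987) states that 15,000 words cover 95 per cent of the running words in its corpus. The figures in Table 1 count lemmas rather than word families. (A “lemma” is a base word together with its inflected forms.) Counted by word family, the same numbers would cover a slightly higher proportion of text. Table 1 assumes that high frequency words are learned before low frequency words, and it also shows that knowing 2,000 word families is enough to understand 80 per cent of written text. The same number of word families covers a much higher proportion of informal spoken text, around 96 per cent (Schonell, Meddleton and Shaw, 1956). (In the next chapter McCarthy and Carter discuss other differences between spoken and written language.)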
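A learner with a vocabulary of 2,000 words can understand 80 per cent of a text, which means that one word in every five is unknown (about two words per line). Liu Na and Nation (1985) have shown that this proportion of unknown words is too high for learners to guess the meanings of unknown words with reasonable success. At least 95 per cent coverage is needed for that. Research by Laufer (1988a) indicates that 95 per cent coverage is enough for reasonable comprehension of a text. A larger vocabulary is clearly better. Table 2 is based on research by Hirsh and Nation (1992) on novels written for teenagers or younger readers.

Hirsh and Nation (1992) studied novels of this kind because they may offer second language learners the most favourable conditions for reading unsimplified text. These conditions arise because such novels are not aimed at adults, so the writer may tend to use simpler vocabulary, and because a continuous work on a single topic by a single writer gives learners repeated exposure to the same vocabulary. Table 2 shows that under favourable conditions 2,000 to 3,000 words provide a good basis for language use.

The significance of this finding is that although English has well over 54,000 word families, and although educated native speakers know about 20,000 of them, a much smaller vocabulary, say 3,000 to 5,000 word families, is enough to provide a basis for comprehension. For productive use in speaking and writing, a still smaller vocabulary of around two to three thousand words will serve. Hazenberg and Hulstijn (1996), however, suggest that learners of Dutch as a second language need close to 10,000 words (for comprehension and use).

Sutarsyah, Nation and Kennedy (1994) found that one long economics text was made up of 5,438 word families, whereas a corpus of similar length made up of varied short academic texts contained 12,744 word families. In a narrowly focused subject area, such as an economics text, the vocabulary needed is much smaller than that needed to read many texts on a range of different topics.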