Word lists grounded in principles of frequency and/or functionality have been advocated in instructed SLA since the early 20th century (González-Fernández & Schmitt 2017), though they have at times proved controversial (Coste 2006). Policy-makers in England are currently adopting a frequency-led approach for foreign languages in schools; a recent specification (Department for Education 2022) requires that learners aged 16 have a vocabulary of 1700 items, and that 85% of the lexis taught be drawn from the "2000 most frequent words" in the language.
Vocabulary theorists concur that instructed learners require a mix of intensive focused instruction and extensive exposure to build vocabulary (Webb 2020). But little is known about the lexis to which young instructed learners are actually exposed, and how far authentic classroom discourse reflects the frequency distributions in reference corpora.
This paper explores relationships between lexical frequency in a reference corpus (RC) for French developed for educational purposes (Lonsdale & Le Bras 2009) and an input corpus (IC) of authentic classroom L2 French. The IC comprises a 33-hour sequence of French lessons recorded with a single class of 7-8-year-old beginners (AUTHOR 2012, 2019). All speech was transcribed and analysed with CHAT/CLAN (MacWhinney 2000); analysis showed that the learners were exposed over time to just under 700 word types (with token frequencies ranging from many hundreds to single occurrences). Using the tool MultiLingProfiler (Finlayson et al. 2021), frequency band positions according to the RC were calculated for the lexis of the IC. This analysis showed that 61.0% of input items fell within the 2000-word band in the RC; detailed analysis showed that these items were well distributed across word classes, and that the commonest RC function words were well represented. However, the IC also included many less frequent words, e.g. 16.5% of word types had frequencies below 1 in 5000 words. These "rare" words were mainly nouns and adjectives relating to particular topics which the teacher used to motivate language use (e.g. animals, body parts, foods). Further analysis of learner achievement data and of instructional strategies showed that such words were just as learnable as more frequent words, given reasonable input frequency and multimodal support.
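The band-coverage calculation described above can be illustrated with a minimal sketch. This is not the MultiLingProfiler procedure, only an assumed simplification: the function name `band_coverage`, the toy word list, and the rank values are all hypothetical and are not taken from the RC or the IC.

```python
def band_coverage(input_tokens, rc_rank, band_limit=2000):
    """Proportion of input word TYPES whose reference-corpus rank
    falls within the top `band_limit` positions.

    input_tokens: list of word forms observed in the input corpus
    rc_rank: dict mapping word -> frequency rank in the reference corpus
             (1 = most frequent); words absent from the dict count as
             outside every band
    """
    types = set(input_tokens)
    if not types:
        return 0.0
    in_band = sum(
        1 for word in types
        if rc_rank.get(word, float("inf")) <= band_limit
    )
    return in_band / len(types)


# Toy illustration with invented ranks (NOT real reference-corpus values).
rc_rank = {"le": 1, "de": 2, "être": 5, "chat": 3138, "girafe": 9000}
input_tokens = ["le", "chat", "le", "de", "girafe", "être", "zèbre"]

coverage = band_coverage(input_tokens, rc_rank)
print(f"{coverage:.1%} of input types fall within the 2000-word band")
```

In this toy case three of the six types (le, de, être) fall within the band. The sketch counts types rather than tokens; a token-based coverage figure would weight each word by its frequency in the input.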
Conclusions are drawn regarding the relevance of reference corpora for early instructed SLA, and the design features which may maximise their usefulness.
References
AUTHOR 2012
AUTHOR 2019
Coste, D. (2006). Français élémentaire, débats publics et représentations de la langue. Documents pour l'histoire du français langue étrangère ou seconde, 36.
Department for Education. (2022). French, German and Spanish: GCSE subject content. https://www.gov.uk/government/publications/gcse-french-german-and-spanish-subject-content
Finlayson, N., Marsden, E., Anthony, L., Bovolenta, G., & Hawkes, R. (2021). MultiLingProfiler (Version 2) [Computer software]. University of York. https://www.multilingprofiler.net/
González-Fernández, B., & Schmitt, N. (2017). Vocabulary acquisition. In S. Loewen & M. Sato (Eds.), The Routledge handbook of instructed second language acquisition (pp. 280-298). Routledge.
Lonsdale, D., & Le Bras, Y. (2009). A frequency dictionary of French: Core vocabulary for learners. Routledge.
MacWhinney, B. (2000). The CHILDES project: Tools for analyzing talk (3rd ed.). Lawrence Erlbaum Associates.
Webb, S. (Ed.) (2020). Routledge handbook of vocabulary studies. Routledge.