In recent years, the freely available LLAMA tests (Meara, 2005) have been widely used as a measure of language learning aptitude. However, they have been criticised for their reliability (Bokander & Bylund, 2020) and for their construct validity, namely whether the aptitude tests are really memory tests. Wen (2016) suggests that aptitude and working memory (WM) may overlap, whereas Buffington and Morgan-Short (2019) argue for a role for declarative and procedural long-term memory (LTM). These criticisms relate to the original LLAMA tests, but in 2019 Meara and Rogers released an updated, online version (v.3). This paper empirically investigates two research questions (RQs):
1. Are the LLAMA v.3 tests (more) reliable?
2. Do the LLAMA v.3 tests measure the same thing as common WM and LTM tests?
The four LLAMA tests, namely LLAMA B (vocabulary), LLAMA D (sound recognition), LLAMA E (sound-symbol correspondence) and LLAMA F (grammatical inferencing), were re-programmed in gorilla.sc for online administration. The test battery also included a Flanker test and auditory digits forwards and backwards tests (various WM components), Tower of Hanoi (procedural memory) and CVMT (declarative memory). A total of 210 participants (145 female; mean age = 23, range 18-84) completed the 8 tasks and a background questionnaire.
Data analysis for the LLAMA D, Hanoi and digits tasks is ongoing. Preliminary analysis of the remaining tasks for RQ1 shows that the Cronbach's alpha values are all above .80, suggesting that the new version improves on the previous one in terms of internal consistency. Table 1 compares the Cronbach's alpha values obtained in this study with the new LLAMA tests against those reported by Bokander and Bylund (2020) for the original tests.
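For readers less familiar with the statistic, Cronbach's alpha for a test with k items is k/(k-1) multiplied by (1 minus the sum of the item variances divided by the variance of participants' total scores). The following is a minimal Python sketch of that formula. The function name `cronbach_alpha` and the toy scores are illustrative assumptions; this is not the analysis code or data used in the study.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a participants-by-items score table.

    item_scores: list of rows, one row per participant,
    each row holding one score per test item (hypothetical layout).
    """
    n_items = len(item_scores[0])
    if n_items < 2:
        raise ValueError("Cronbach's alpha needs at least two items")

    # Sample variance of each item across participants.
    item_variances = [
        variance([row[i] for row in item_scores]) for i in range(n_items)
    ]
    # Sample variance of each participant's total score.
    total_variance = variance([sum(row) for row in item_scores])

    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Toy data (invented): three participants, two items.
toy_scores = [[2, 1], [4, 3], [6, 8]]
alpha = cronbach_alpha(toy_scores)
```

The sketch is undefined when all participants obtain the same total score, because the total variance is then zero.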
Table 1: Comparison of Cronbach's alpha scores
In terms of RQ2: a principal components analysis was carried out to establish if the Flanker, CVMT and LLAMA B, E & F tests were measuring the same construct. A chi-squared test showed the model was significant (p < .001) and two components were produced (see Table 2).
Table 2: Component loadings
Note. Applied rotation method is promax.
The results suggest that the declarative memory measure (CVMT) loads on the same component as the three LLAMA tests, in line with Buffington and Morgan-Short (2019), whereas the executive function test (Flanker) loads on a separate component, contra Wen (2016). Additional analysis will include the remaining tests and the reaction time measures collected.
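As an illustration of how component loadings arise in a principal components analysis, the sketch below extracts unrotated loadings from a correlation matrix using power iteration with deflation. It rests on several assumptions: the function name `component_loadings` is hypothetical, the correlation values are invented rather than taken from this study, and the promax rotation applied in the reported analysis is omitted, so the output is not comparable to Table 2.

```python
import math
import random


def _matvec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def _normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def component_loadings(corr, n_components, n_iter=1000, seed=0):
    """Unrotated PCA loadings from a correlation matrix.

    Each loading is an eigenvector entry scaled by the square root
    of its eigenvalue. Assumes n_components does not exceed the rank
    of the matrix.
    """
    rng = random.Random(seed)
    size = len(corr)
    residual = [row[:] for row in corr]
    loadings = []
    for _component in range(n_components):
        # Random start vector, refined by repeated multiplication.
        vec = _normalize([rng.uniform(0.5, 1.5) for _i in range(size)])
        for _step in range(n_iter):
            vec = _normalize(_matvec(residual, vec))
        # Rayleigh quotient gives the eigenvalue for the unit vector.
        product = _matvec(residual, vec)
        eigenvalue = max(sum(v * w for v, w in zip(vec, product)), 0.0)
        loadings.append([v * math.sqrt(eigenvalue) for v in vec])
        # Deflation: remove the extracted component before the next pass.
        residual = [
            [residual[i][j] - eigenvalue * vec[i] * vec[j] for j in range(size)]
            for i in range(size)
        ]
    return loadings


# Invented correlations: measures A and B correlate at .80,
# measure C is uncorrelated with both.
toy_corr = [
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
toy_loadings = component_loadings(toy_corr, n_components=2)
```

In this toy case the two correlated measures load on the first component and the uncorrelated measure loads on the second, which mirrors the kind of pattern the analysis above is looking for. The sign of each component is arbitrary.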
References:
Bokander, L., & Bylund, E. (2020). Probing the internal validity of the LLAMA language aptitude tests. Language Learning, 70(1), 11-47.
Buffington, J., & Morgan-Short, K. (2019). Declarative and procedural memory as individual differences in second language aptitude. In Language Aptitude (pp. 215-237). Routledge.
Meara, P. (2005). LLAMA language aptitude tests: The manual. Swansea: Lognostics.
Meara, P., & Rogers, V. (2019). The LLAMA Tests v3. Cardiff: Lognostics.
Wen, Z. E. (2016). Working memory and second language learning. Multilingual Matters.