Empowering autonomy in language learning: the sustainable impact of data-driven learning on noun collocation acquisition
The participants
This study investigated 70 Chinese learners of English, all aged 16, from a senior high school in Hubei Province, China. These students, who were in their Senior Two year at the time of the study, had passed the English language subject with at least 95 out of 150 marks during their Senior One year. This threshold indicates a good foundational understanding of English and suggests above-average proficiency compared to their peers. This proficiency level reflects their ability to grasp intermediate-level English language concepts, perform well in reading comprehension, construct grammatically correct sentences, and understand spoken English at a moderate level (Fang et al. 2008).
The group consisted of 39 males and 31 females, reflecting the school’s gender distribution. To ensure consistency in educational backgrounds, these students were selected from two classes that followed the same curriculum and used identical teaching materials. All participants were native Mandarin speakers learning English as a foreign language. A pre-study questionnaire confirmed that none of the participants had prior experience using corpus tools. The students were divided into two groups of 35: a control group and an experimental group. The control group continued their regular English classes and could use any resources except ‘Corpusmate’, an online corpus platform, during their revision before the collocation test. In contrast, the experimental group was instructed to use ‘Corpusmate’ during their treatment period.
Corpus tool and corpora
The study utilised ‘Corpusmate’, an innovative online corpus platform, to facilitate access to a vast collection of English language texts. Developed by Crosthwaite and Baisa (2023), ‘Corpusmate’ offers a streamlined and simplified experience in language data exploration tailored specifically for younger learners. Its design integrates the most effective features of existing tools into a cohesive digital environment, making it particularly suitable for secondary school students. This user-friendly platform is ideal for fostering independent language learning, providing an intuitive interface for linguistic exploration. The corpus has been compiled from six different spoken and written resources: the British Academic Written English (BAWE), TED talks, Simple English Wikipedia, BBC Teach, Elsevier corpus, and BNC 2014 Spoken. Instructions were given to the learners on how to use ‘Corpusmate’ to explore noun collocations. The interface of ‘Corpusmate’ is shown in Fig. 1.
DDL training
For the two-month training, the learners were provided with selected nouns from the Academic Word List (AWL) to explore noun collocations, i.e. pre-modifiers and nouns that form collocations. The AWL, developed by Coxhead (1998) at Victoria University of Wellington, New Zealand, comprises 570 word families. These words were included in the AWL for their high frequency across a wide array of academic texts. Significantly, the AWL excludes words found in the top 2000 most common English words, known as the General Service List, thereby tailoring it to academic settings. Designed primarily for academic purposes, the AWL serves as a resource for teachers preparing students for tertiary education and for students independently aiming to acquire vocabulary critical for college and university studies. The words in the AWL are categorised into ten distinct Groups. This categorisation is based on frequency, with the words in the first Group being the most common and those in the tenth Group being the least common within academic texts. This systematic arrangement assists learners in prioritising their study of these words in accordance with their frequency of use in academic contexts. In this study, learners were asked to explore 10 nouns from each Group in the AWL for each session. The DDL training sessions were conducted twice a week, with each session lasting 80 minutes.
Steps in conducting DDL training
To conduct DDL training using specific nouns from the AWL developed by Coxhead (1998), we followed several steps to create an engaging and effective educational experience.
Step 1: Selection of Nouns from AWL
We selected nouns from the AWL that aligned with the learners’ proficiency level to ensure that the vocabulary was challenging enough to facilitate learning without being overly difficult or discouraging. This selection process was guided by two English teachers who provided their expertise to ensure the appropriateness of the chosen nouns for the students’ academic level. We chose the following noun as an example:
Policy
Step 2: Creating Collocations
We asked the learners to use the selected noun to form three types of collocations: verb-noun, noun-noun, and adjective-noun. Here are examples for each type using the selected noun:
Verb-Noun Collocations:
Implement policy
Formulate policy
Noun-Noun Collocations:
Policy maker
Policy revision
Adjective-Noun Collocations:
National policy
Effective policy
Step 3: Compilation of authentic examples
We asked learners to extract authentic sentences from ‘Corpusmate’ that showcase how these collocations are used in different contexts.
Step 4: Analysis and discovery
We asked learners to identify patterns or rules about the use of these nouns in various collocations. This discovery-based approach helps deepen their understanding of the language.
Step 5: Guided practice
After learners had explored these examples, we guided them through structured practice activities. This involved using collocations to complete sentences and creating their own sentences using these collocations.
Step 6: Feedback and revision
We provided feedback on learners’ attempts and encouraged them to revise their sentences or try new combinations based on the feedback.
It is worth noting that during the DDL training phase, students were engaged in forming and analysing a wide variety of collocations, including both standard and non-standard examples. This approach was intentionally chosen to encourage a deeper understanding of collocational patterns and to foster flexibility in language use. The training was not limited to a predetermined set of collocations; instead, it was designed to expose students to a broad spectrum of collocational possibilities, reflecting natural language use.
English teachers in the control group were informed about the study’s focus on noun collocations and were instructed to integrate this focus into their regular teaching practices. This means that while the teachers continued to deliver the standard curriculum, they also placed additional emphasis on noun collocations during their lessons. This ensured that the control group received exposure to collocations comparable to that emphasised in the study. The participants were given the same set of nouns extracted from the AWL as the experimental group. To ensure that their learning was aligned with the collocations tested in the assessment, the control group was instructed to focus specifically on identifying and learning collocations involving these given noun heads. They were guided to explore various resources, such as collocation dictionaries, thesauruses, and the internet, to find common nouns, verbs, and adjectives that can precede each noun head provided. This guided exploration aimed to ensure that the control group’s learning activities were focused on the same noun collocations emphasised in the test, despite not using the ‘Corpusmate’ tool or any other corpus-based resources.
While the control group did not use corpus-based tools, the learning process was carefully structured to parallel the objectives of the DDL training. This ensured that both groups were studying the same types of collocations, allowing for a fair comparison of the effectiveness of the DDL approach versus more traditional, resource-based methods. By aligning the content of the control group’s instruction with the test items, we aimed to provide a valid measure of the impact of DDL training on collocation acquisition.
Collocation test
To measure the effectiveness of the DDL approach, a 60-item test was developed and administered to all learners. The items focused on three prevalent types of noun collocations: verb-noun (e.g., “tackle [the] issues”), noun-noun (e.g., “emergency response”), and adjective-noun (e.g., “social context”), with 20 items dedicated to each category. This distribution ensured equal representation for each type of noun pre-modifier. These collocation types were chosen for their pedagogical significance, as they are commonly used in the language yet pose unique challenges for ESL and EFL learners due to their varied levels of fixedness and potential for cross-linguistic influence, such as L1 transfer (Nesselhauf, 2004; Pérez-Paredes and Sánchez-Tornel, 2014). The test design drew inspiration from previous studies on collocation knowledge assessment (such as Boers et al. 2014; Gyllstad and Schmitt, 2019) and adopted a binary-choice format, requiring learners to choose the correct pair of collocating words. To develop the distractor items for the test, two methods were employed: firstly, incorporating common collocational mistakes found in prior research on ESL and EFL learners (for example, in Lu, 2017), and secondly, integrating input from three native English language instructors from an English language centre in Hubei Province, China. The test was administered three times to ensure a comprehensive assessment: before, immediately after, and three months post-training. This approach aimed to gauge both immediate and long-term learning outcomes. Additionally, we randomised the order of items in each session to minimise biases related to question sequence or random guessing.
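The per-session randomisation of item order described above can be illustrated with a minimal Python sketch. It is not the procedure actually used in the study: the function name, the session labels, and the use of a fixed seed per session are assumptions made purely for illustration.

```python
import random


def randomised_item_order(item_ids, session_seed):
    """Return a new list holding the same items in a shuffled order.

    session_seed is a hypothetical per-session seed; seeding makes the
    order reproducible, which the source does not state was done.
    """
    rng = random.Random(session_seed)
    return rng.sample(item_ids, k=len(item_ids))


# 60 items, matching the length of the test described above.
item_ids = list(range(1, 61))

# Hypothetical labels for the three administrations.
sessions = ["pre-test", "post-test", "delayed post-test"]
orders = {
    session: randomised_item_order(item_ids, seed)
    for seed, session in enumerate(sessions)
}
```

Each session receives the same 60 items, so scores remain comparable across administrations; only the presentation order changes.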
To ensure the binary-choice test accurately reflected the content and activities covered during the training, a systematic approach was employed in the test design. The test items were developed based on the specific types of collocations and patterns that were emphasised during the training sessions. For example, if students practised forming noun + noun, adjective + noun, and verb + noun collocations, the test included a proportional number of items from each of these categories. Furthermore, the test items were selected to mirror the level of difficulty and the types of collocational combinations that students encountered during the training. In addition, the test was designed to assess not just the recall of specific collocations but also the students’ ability to apply the principles and patterns they learned to new collocations. This was done to evaluate their understanding of collocational usage beyond the examples directly covered in the training. By aligning the test content with the training focus in this manner, we aimed to ensure that the test items were representative of the learning objectives and that students were fairly assessed on their ability to form and recognise collocations as practised during the training.
The study included both an experimental group, which received DDL training, and a control group, which did not. This design allowed for a comparison of the impact of the DDL training on noun collocation learning between the two groups. The collocation test was carefully designed to align with the DDL training and conventional classroom teaching. The selected nouns from AWL were provided to both groups, ensuring that the test items reflected the specific types of collocations and activities learners engaged with during the training and conventional classroom learning. This alignment allowed for an accurate assessment of the DDL training’s effectiveness in comparison with conventional classroom learning. This approach ensured that any improvements observed in the experimental group’s test scores could be attributed to the DDL training, as the test directly assessed the content covered during the training.
By comparing the pre- and post-test results of the experimental group with those of the control group, which did not receive DDL training, the study could isolate the effect of the DDL intervention. This comparative analysis provided a clear measure of the DDL approach’s impact on learners’ ability to understand and use noun collocations. The control group served as a baseline to determine the natural progression in collocational competence, while the experimental group’s results highlighted the added value of the DDL training.
Learner experience and perception with corpus-based tools: questionnaire survey
We used questionnaires to obtain feedback from learners in the experimental group to evaluate the DDL technique’s effectiveness. These questionnaires were designed to capture learners’ perspectives on several key aspects: the overall utility of the DDL approach, their experience with the online corpora and the software tool ‘Corpusmate’, and their confidence in independently learning English collocations. This approach aligns with the findings of Chang and Sun (2022), who emphasised the importance of learner feedback in assessing the impact of innovative teaching methods on student motivation and autonomy. The questionnaire was adapted from Crosthwaite and Steeples (2022) and administered directly following the completion of the post-training test. It was divided into two sections, covering perceptions of corpus training and perceptions of DDL for improving knowledge or use of collocations, to provide a comprehensive view of the learners’ experiences and opinions.
Data analysis
In addressing Research Question 1 regarding DDL’s effects on noun collocation learning, the scores for the collocation tests were calculated by assigning one point for each correct answer. The dataset was reviewed to confirm that it met essential criteria (such as normal distribution) for an Analysis of Variance (ANOVA) test. This study used a repeated-measures ANOVA to compare scores from three different testing times: before, immediately after, and three months following the training. This method was chosen as it effectively examines differences in average scores across various experimental conditions over multiple time points (Larson-Hall, 2015). Additionally, it accounts for individual variances within and between groups, allowing for adjustments for any initial differences in knowledge, as measured in the pre-test. In analysing the data, both the p-value (with a standard threshold of 0.05) and the effect sizes (using partial eta-squared, ηp²) were calculated. For effect sizes, ηp² values of 0.01 were considered small, 0.06 medium, and 0.14 large (Cohen, 1988). The analysis focused on overall changes in collocation knowledge across the entire test. Cohen’s d was to be used to measure the effect size of the intervention between the experimental and control groups, should there be any differences in the pre-test scores. Cohen’s d is a standardised measure of effect size that expresses the mean difference between two groups in units of standard deviation, allowing for the comparison of effect sizes across different studies and contexts (Cohen, 1988). Cohen (1988) also provides guidelines for interpreting its magnitude: a small effect size indicates a modest difference between groups, while a large effect size indicates a substantial difference.
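The scoring rule and the two effect-size measures named above can be sketched in Python as follows. This is an illustrative sketch rather than the analysis code used in the study. It assumes the pooled-standard-deviation form of Cohen’s d and the usual definition of partial eta-squared as the effect sum of squares divided by the effect plus error sums of squares. The function names and the “negligible” label for values below 0.01 are hypothetical additions.

```python
from math import sqrt
from statistics import mean, variance


def score_test(responses, answer_key):
    """Award one point for each response that matches the answer key."""
    return sum(1 for given, correct in zip(responses, answer_key) if given == correct)


def cohens_d(group_a, group_b):
    """Mean difference between two groups in pooled standard deviation units."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_variance)


def partial_eta_squared(ss_effect, ss_error):
    """Share of variance attributable to an effect, excluding other effects."""
    return ss_effect / (ss_effect + ss_error)


def label_partial_eta_squared(value):
    """Apply the 0.01 / 0.06 / 0.14 benchmarks cited in the text."""
    if value >= 0.14:
        return "large"
    if value >= 0.06:
        return "medium"
    if value >= 0.01:
        return "small"
    return "negligible"  # hypothetical label; the text names only three bands
```

For instance, `cohens_d([2, 4, 6], [1, 3, 5])` gives 0.5, because the group means differ by 1 and the pooled standard deviation is 2. These numbers are invented for illustration and are not study data.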
To address Research Question 2, which aimed to evaluate learners’ perceptions of the DDL training and the use of the ‘Corpusmate’ tool, the questionnaire data were analysed by examining learners’ levels of agreement with each statement provided in the survey. This analysis utilised a five-point Likert scale, ranging from “Strongly Disagree” to “Strongly Agree,” allowing for a nuanced understanding of the participants’ attitudes and experiences regarding the DDL approach and the specific functionalities of the ‘Corpusmate’ tool.
Likert scale responses
The questionnaire employed a five-point Likert scale for each statement, where participants could indicate their level of agreement. The scale is presented in Table 1.
Questionnaire data collection and analysis
Learners were asked to respond to a series of statements regarding their experiences and perceptions of the DDL training and the ‘Corpusmate’ tool. These statements were designed to capture various dimensions of their learning experience, including engagement, usefulness, ease of use, and overall satisfaction. Once collected, the responses were checked for completeness. Each statement’s responses were analysed to assess the levels of agreement or disagreement. This involved calculating the percentage of participants who selected each response option on the Likert scale. For example, 58% of the participants agreed or strongly agreed that the ‘Corpusmate’ tool was useful in learning noun collocations, while only 11% disagreed or strongly disagreed. Such findings indicate a general consensus on certain aspects of the tool’s usefulness. The analysis also looked at statements with higher neutrality levels to identify areas where perceptions were less definitive, indicating potential areas for further training or tool improvement. This method allowed for a clear representation of the distribution of responses across the different levels of the Likert scale. The percentages were calculated for each statement to understand the overall sentiment of the learners.
Steps in the analysis
Data tabulation
All responses were first tabulated, categorising each response according to the five-point Likert scale. This step involved organising the data into a structured format suitable for percentage calculation.
Percentage computation
To standardise the data and allow for easier comparison across different statements, the raw counts were converted into percentages. This was done by dividing the number of responses in each category by the total number of respondents for that statement, then multiplying by 100. The formula used for this calculation is:
$$\mathrm{Percentage}=\left(\frac{\mathrm{Number}\,\mathrm{of}\,\mathrm{responses}\,\mathrm{in}\,\mathrm{category}}{\mathrm{Total}\,\mathrm{number}\,\mathrm{of}\,\mathrm{respondents}}\right)\times 100$$
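The tabulation and percentage steps above can be sketched in Python as follows. This is a minimal illustration, not the authors’ analysis script. The function name is hypothetical, and the exact wording of the three middle scale labels is assumed, since only the end points of the scale are quoted in the text.

```python
from collections import Counter

# Assumed labels for the five-point scale; only the two end points
# are quoted verbatim in the text.
LIKERT_OPTIONS = [
    "Strongly Disagree",
    "Disagree",
    "Neutral",
    "Agree",
    "Strongly Agree",
]


def likert_percentages(responses):
    """Convert raw responses for one statement into a percentage per option."""
    if not responses:
        raise ValueError("at least one response is required")
    counts = Counter(responses)  # data tabulation
    total = len(responses)       # total number of respondents for the statement
    return {option: 100 * counts[option] / total for option in LIKERT_OPTIONS}


# Invented responses for illustration only (not study data).
example = likert_percentages(["Agree", "Agree", "Neutral", "Strongly Agree"])
```

Options that nobody selected are still reported, with a value of 0, so every statement yields the same five categories and can be compared directly.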
Visualisation and interpretation
The calculated percentages were then visualised using bar charts to provide an easily interpretable overview of the data. This visualisation helped to quickly identify trends and patterns in the learners’ responses. For instance, a high percentage of “Agree” and “Strongly Agree” responses would indicate a positive perception of a particular aspect of the DDL training. Once the percentages were calculated and visually represented, the next step was to analyse patterns and trends in the data. This involved identifying statements with high levels of agreement or disagreement, as well as those with a significant proportion of neutral responses. For example, if a large percentage of respondents consistently selected “Agree” or “Strongly Agree” across multiple statements, this pattern would suggest a generally positive perception of the DDL training and the ‘Corpusmate’ tool. Conversely, a high percentage of “Neutral” responses could indicate areas where learners were undecided or had mixed feelings, suggesting potential areas for further investigation or improvement.
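The pattern identification described above can be sketched as a short Python function that combines the agreement and disagreement categories and flags statements with many neutral responses. The function name, the scale labels, and the 30% cut-off for a “high” neutral share are assumptions made for illustration. The text does not specify a threshold.

```python
def summarise_statement(percentages, neutral_threshold=30.0):
    """Summarise one statement's Likert percentages.

    neutral_threshold is a hypothetical cut-off for flagging statements
    where learners appear undecided.
    """
    agreement = percentages["Agree"] + percentages["Strongly Agree"]
    disagreement = percentages["Disagree"] + percentages["Strongly Disagree"]
    neutral = percentages["Neutral"]
    return {
        "agreement": agreement,
        "disagreement": disagreement,
        "neutral": neutral,
        "mostly_positive": agreement > disagreement,
        "high_neutral": neutral >= neutral_threshold,
    }


# Invented percentages for illustration only (not study data).
summary = summarise_statement(
    {
        "Strongly Disagree": 5.0,
        "Disagree": 10.0,
        "Neutral": 35.0,
        "Agree": 30.0,
        "Strongly Agree": 20.0,
    }
)
```

In this invented case the statement is mostly positive (50% agreement against 15% disagreement), but it is also flagged for its 35% neutral share. That combination is the kind of result the text treats as a candidate for further investigation.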