Growth in Grammar

Growth in Grammar was a three-year project studying how English children’s written language develops as they progress through their school careers. We hope that this research will provide new understandings of writing development that will inform teaching and curriculum design.

The Growth in Grammar project aimed to further our understanding of how the language of children’s writing develops as they progress through the education system in England, from ages six to sixteen.

We collected 2,898 texts from 983 children in 24 schools and used a number of computer-assisted methods to understand differences in the use of grammar and vocabulary across year groups and text types.

An accessible description of our findings and links to the full downloadable corpus can be found at gigcorpus.com (registration required - please contact Philip Durrant for access - p.l.durrant@exeter.ac.uk).

Introduction

The Growth in Grammar Corpus is a collection of texts written by children at schools in England as part of their regular school work. This page describes the process of text collection, transcription and annotation and summarizes the contents of the corpus. 

The full corpus can be accessed at gigcorpus.com (registration required - contact Phil Durrant for access details: p.l.durrant@exeter.ac.uk).

Corpus procedures

Collecting the corpus

Our research team contacted schools from across the country, briefing them on the project and inviting them to participate. All writing was obtained subject to the students’ voluntary informed consent, with additional consent obtained from the head teacher, the relevant subject teachers, and the students’ legal guardians.

Teachers collected texts from participating students and either photocopied these texts and mailed them to us or invited us into their schools to make photocopies ourselves.

Transcription

All of the texts were received in handwritten form, so we employed a small team of transcribers to type them up. Transcribers received two days of training and worked closely with a member of the core project team to deal with issues that arose during the process.

Transcription proceeded in two phases. In the first phase, each transcriber was assigned a set of photocopies to type up in accordance with our transcription conventions. They were also asked to make two types of change to the original texts: 1) replace any proper names which might compromise participants’ or institutions’ anonymity with anonymisation markers; 2) where a word was misspelled, was incorrectly capitalized or was an abbreviation, insert a tag recording both the original form and a ‘correction’ giving the correct spelling, the correct capitalization or the expanded form of the abbreviation.

In the second phase, each transcriber was assigned texts which had originally been transcribed by someone else. They both reviewed the original transcription for accuracy and added annotations related to punctuation and grammar.

Linguistic Annotation

The conventions set out above describe the ‘basic’ version of the corpus. For the purposes of analysis, further versions were created incorporating different types of additional linguistic information.

Part-of-speech-tagged corpus

We used the CLAWS tagger to automatically add information about the part of speech of each word in the corpus. To achieve more accurate classifications, misspelled words were corrected and unclear/illegible material was removed prior to tagging. Material appearing inside tables was also removed.
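As a rough illustration of this clean-up step, the Python sketch below replaces each correction tag with its corrected form and drops unclear material before text is passed to a tagger. The tag names (`corr`, `unclear`), the `orig` attribute and the function `prepare_for_tagging` are invented for this example; they are assumptions, not the corpus's actual transcription conventions, which should be consulted before processing the real files.

```python
import re

# Hypothetical markup, invented for this sketch (NOT the real corpus conventions):
#   <corr orig="frend">friend</corr>   original form plus its correction
#   <unclear/>                          unclear or illegible material
CORRECTION = re.compile(r'<corr orig="[^"]*">(.*?)</corr>')
UNCLEAR = re.compile(r'<unclear\s*/>')


def prepare_for_tagging(text: str) -> str:
    """Keep corrected forms, drop unclear material, and tidy whitespace."""
    text = CORRECTION.sub(r'\1', text)   # keep only the corrected form
    text = UNCLEAR.sub(' ', text)        # remove unclear/illegible markers
    return re.sub(r'\s+', ' ', text).strip()


sample = ('I went to the <corr orig="shopp">shop</corr> <unclear/> '
          'with my <corr orig="frend">friend</corr>.')
print(prepare_for_tagging(sample))
```

Keeping the original forms in the basic version and resolving them only at this stage means the same transcription can serve both spelling-focused and grammar-focused analyses.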

Syntactically-tagged corpus

The corpus was tagged with syntactic information in two ways. First, the entire corpus was tagged for part of speech and grammatical relations using the Stanford CoreNLP suite of tools (as with the part-of-speech tagging, misspellings were corrected and unclear/illegible material and tables were removed prior to parsing).

Second, a subset of the corpus was manually tagged by a team of trained annotators. This analysis focused specifically on tagging syntactic elements within noun phrases and subordinate clauses. Read about the Procedures and conventions used in this process. The hand-parsed version of the corpus is available upon request. Please contact Phil Durrant (p.l.durrant@exeter.ac.uk) for more information.

Corpus contents

The Growth in Grammar corpus comprises nearly 3,000 texts, written by 983 children in 24 different schools. See Summary of corpus contents for quantitative summaries of the corpus contents. See Corpus metadata for metadata describing the full contents of the corpus in detail.

Our primary points of data collection were years 2, 6, 9 and 11. We were also sent some texts from year 4, which are included as a supplement to the main corpus.

Publications

P. Durrant, M. Brenchley & L. McCallum. (forthcoming). Understanding development and proficiency in writing: quantitative corpus linguistic approaches. Cambridge: Cambridge University Press.

P. Durrant & M. Brenchley (forthcoming). Development of vocabulary sophistication across genres in English children's writing. Reading and Writing.

P. Durrant & M. Brenchley (forthcoming). The development of academic collocations in children’s writing. In P. Szudarski & S. Barclay (Eds). Vocabulary Theory, Patterning and Teaching. Bristol: Multilingual Matters.

P. Durrant & M. Brenchley (forthcoming). Corpus research on the development of children's writing in L1 English. In A. Glaznieks, A. Abel, V. Lyding & V. Nicolas (Eds), Corpora and Language in Use. Proceedings of the Learner Corpus Research Conference, 2017. Louvain: Presses Universitaires de Louvain.

Presentations 

Lexical development in English school children's writing from six to sixteen. American Association of Corpus Linguistics. Georgia State University, USA, September 2018.

Lexical development in English school children's writing from six to sixteen. BAAL Vocab SIG. University College London, UK, July 2018. 

Lexical development in English school children's writing from six to sixteen. IVACS International Biennial Conference. University of Malta, Malta, June 2018.

Corpus research on the development of children’s school writing. Plenary talk at Fifth International Conference on Writing Analytics: Data Mining and the Teaching of Writing. University of South Florida, USA, January 2018.

Corpus research on the development of children’s writing in L1 English. Plenary talk at Learner Corpus Research Conference. Bozen/Bolzano, Italy, October 2017.

Language development in children's writing from six to sixteen. Invited talk at Department of Education, University of Oxford, UK, June 2018.

Investigating linguistic development in children’s writing: The Growth in Grammar project. Invited talk at School of English Studies, University of Nottingham, UK, March 2017.

Development in L1 Written Vocabulary between 6 and 14. BAAL Vocabulary SIG, Reading, July 2017.

The Grammar of School-Age Writing: What do we Already (Think we Maybe (Don't)) Know? Exeter Staff-Student Conference, March 2017.

Growing Grammar - Mapping the Dimensions. UCL Symposium, February 2017.

The GiG Corpus: Corpus Linguistics Progress goes 'boink'. Trondheim Symposium, January 2017.

Growth in Grammar: A multidimensional analysis of student writing between 5 and 16. International Conference on Writing Analytics. University of South Florida, USA, January 2017.

The GiG corpus: On working with children (but not animals). University of Pennsylvania, September 2016.

The grammar of school-age writing development. University of Pennsylvania, September 2016.

Growth in Grammar: A multi-dimensional analysis of student writing between 5 and 16. Invited talk at Faculty of Education, University of Hong Kong, Hong Kong, May 2016.

Growth in Grammar: A multi-dimensional analysis of student writing between 5 and 16. The 37th ICAME Conference. The Chinese University of Hong Kong, China, May 2016.

Core team

Professor Debra Myhill: Debra’s research interests centre upon composing processes in writing, the role of grammar and metalinguistic understanding in writing, and the relationship between talk and writing.
Dr Philip Durrant: The majority of Philip's research uses corpus-linguistic methods to study the language of academic writing, both at school and university levels. He also has ongoing interests in language testing and in the learning and use of formulaic language.
Mark Brenchley: Mark's main interest lies in the nature of syntactic knowledge, focusing on its acquisition and later development within a wider framework of “communicative competence”, with a particular emphasis on the use of corpus-based analytical techniques to better understand this knowledge. He is also interested in the general relationship between language and education, together with the possibility of helping develop what might be termed a genuinely "educational" theory of language.

News

Biggest ever archive of children's writing created to help experts assess language skills