Corpus-based approaches for research students: Using student-made corpora to promote autonomous learning
Margaret Cargill, Michelle Picard and Cally Guerin
25 November 2011
Corpus linguistics approaches
• Corpus: a body of text selected for analysis using appropriate software
• The software: a concordancer
• Available as web-based applications
  • e.g. Springer Exemplar www.springerexemplar.com
• or stand-alone programs, e.g.
  • www.adelaide.edu.au/red/adtat/
Concordancing
• A concordancer is software that searches a group of texts (a corpus) for all examples of a particular item.
• It displays the results as lines of text across the screen for easy comparison.
• Results can be sorted according to what is on the left or the right of the search item.
• This can provide data to ‘drive’ language learning and improve written texts.
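The behaviour described above can be sketched in a few lines of Python. This is a minimal illustration only, not AdTAT's implementation: the function names (`concordance`, `sort_left`, `sort_right`, `show`) and the 30-character context width are assumptions made for the example.

```python
import re

def concordance(text, term, width=30, case_sensitive=True):
    """Return (left, match, right) context tuples for every hit of `term`."""
    text = re.sub(r"\s+", " ", text)          # flatten line breaks to single spaces
    flags = 0 if case_sensitive else re.IGNORECASE
    hits = []
    for m in re.finditer(re.escape(term), text, flags):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        hits.append((left, m.group(), right))
    return hits

def _last_word(context):
    words = context.split()
    return words[-1].lower() if words else ""

def sort_left(hits):
    """Order hits by the word immediately to the left of the search item."""
    return sorted(hits, key=lambda h: _last_word(h[0]))

def sort_right(hits):
    """Order hits by the text immediately to the right of the search item."""
    return sorted(hits, key=lambda h: h[2].lower())

def show(hits, width=30):
    """Format hits as aligned lines with the search item in one column."""
    return [f"{left:>{width}}{match}{right}" for left, match, right in hits]
```

For example, `print("\n".join(show(sort_right(concordance(corpus_text, "soil")))))` would list every occurrence of "soil" ordered by what follows it, where `corpus_text` is whatever string holds your corpus.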
An example of concordancing output
to utilise existing available soil water, unlike the perennial gr
es (4 g oven dry wt basis) of soil were weighed into 40 ml polypr
required 9 kg P/ha, whereas a soil with a high P sorption capacit
concentration by 1 mg/kg on a soil with a low P sorption capacity
00, it was expected that this soil would have consistently been t
 capacity (PBC), which is the soil's capacity to moderate changes
and buffering capacity of the soil-an attempt to test Schofield's
nisms that are present in the soil-plant microcosm environment. T
etermined in a growth-chamber soil-plant microcosm study. Nodding
84) Lime and phosphate in the soil-plant system. Advances in Agro
a where crops rely heavily on soil-stored water accrued in summer
ns Between Herbicides and the Soil. Academic Press, London, pp. 2
fertility on these particular soils. Although this aberration has
over in a range of allophanic soils amended with 14Clabelled gluc
alues for 9 different pasture soils, 6 and 12 months after P fert
Data-driven language learning
• Very relevant for discipline-specific English and usage conventions
• Empowers users to research their own language issues
• Appeals particularly to research students
• Hypothesised to contribute to autonomous learning approaches
Making your own corpus
• To make this tool optimally effective for writing/self-editing, you need to (find or) make a corpus that is specific to the task at hand
• Texts need to be in .txt files
• ‘Save as’ .txt from .doc, .html or .pdf files
• ‘Cleanup’ may be needed after saving
Using a self-made corpus?
• A simple concordancer called AdTAT (Adelaide Text Analysis Tool) is available at http://www.adelaide.edu.au/red/adtat/
• Recently developed at the University of Adelaide and made freely available for use
• Designed for authors focused on writing texts, not for linguistic researchers
Roundtable structure
• Margaret: Self-made and online corpora in an EFL ESP context
• Michelle: Use with Turn-it-in to investigate intertextuality
• Cally:
• Discussion and questions
• Uptake options within ALL
Self-made and online corpora in an EFL ESP context
• China Academy of Engineering Physics, Mianyang, Sichuan
• Consecutive 5-day workshops, 40+ participants (5 to date)
• Mixed disciplines within engineering physics
• Working researchers, some without higher degrees, English level variable
• Strong and increasing pressure to publish in English
Resulting workshop design features
• Emphasis required on listening, speaking, reading and writing, but all integrated with publication focus
• Day 1, afternoon: ‘Developing discipline-specific English writing skills’
  • Re-using language vs plagiarism (sentence templates)
  • Noun phrases and articles
  • Identifying vocabulary for learning → concordancing
“Selecting noun phrases to learn
• Extending vocabulary is an ongoing need for EAL scientist authors.
• One effective way to select vocabulary to learn is to use a word frequency list from your own discipline.
• Such a word list can be created by using concordancing software to search a collection of discipline-specific texts such as research articles.
• These text collections are called corpora (sing. = corpus)”
To make a frequency list
• Open AdTAT
• Load a corpus
• From top menu bar select Corpus, Word frequency
• Resulting screen lists all words in the corpus in order of frequency of use
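For readers who want to see what such a frequency list involves, the following is a rough Python sketch of the same idea, not a description of how AdTAT works internally. The names `load_corpus` and `word_frequency` are made up for this example, and two assumptions are built in: the corpus is a folder of UTF-8 .txt files, and words are lower-cased before counting.

```python
import re
from collections import Counter
from pathlib import Path

def load_corpus(folder):
    """Read every .txt file in `folder` and join the contents into one string."""
    files = sorted(Path(folder).glob("*.txt"))
    return "\n".join(p.read_text(encoding="utf-8") for p in files)

def word_frequency(text):
    """Return (word, count) pairs, most frequent first.

    Assumption: words are lower-cased, and internal hyphens or
    apostrophes (soil-plant, soil's) are kept as part of the word.
    """
    words = re.findall(r"[A-Za-z]+(?:[-'][A-Za-z]+)*", text.lower())
    return Counter(words).most_common()
```

A call such as `word_frequency(load_corpus("my_corpus"))[:50]` would give the 50 most frequent words, with `my_corpus` standing in for your own folder name.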
Needed: a discipline-specific corpus
• To demonstrate effectiveness, I organise preparation of a CAEP corpus
• Each participant is asked to prepare a single file for homework
“Homework: Building a CAEP corpus
• Each participant will prepare one research article for the corpus tonight
• Prepared files should be sent as an email attachment to 刘希 <evey324@qq.com>
• Include full bibliographic details of the article (the full reference) in the body of the email”
“Preparing text for a corpus
• Select articles written by native-speakers of English wherever possible.
• Texts in a corpus for concordancing must be stored as plain text files (.txt).
• Remove un-needed parts before saving: biodata, keywords, tables, figures, reference list, acknowledgements.
• An easy way to do this is to download your selected articles as .html files, remove unwanted parts and ‘save as’ .txt
• If you can only get .pdf versions, use the ‘Save as text’ option – OR copy the desired text one page/column at a time into a Word document, correct spacing, and save as .txt
• Label the file with name of journal, first author and year of paper”
Issues re student preparation of texts for corpora
• Following directions!
• Selection (especially author language status)
• Cleaning of text (headers, footers, page numbers/labels, figures/tables, author affiliations, etc.)
• Provision of reference data for record keeping purposes
• Consistent file labelling
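Some of the cleaning and labelling issues listed above can be partly automated. The sketch below is only one possible approach, using simple heuristics; it is not part of AdTAT. The function names, the underscore-separated file-name pattern and the author name "Smith" in the test are all assumptions for illustration. Note that the hyphen rule will also wrongly join genuine compounds that happen to break across lines (e.g. soil-plant), so results should still be checked by eye.

```python
import re

def clean_text(raw):
    """Apply simple cleanup heuristics to text saved from .pdf or .html."""
    kept = []
    for line in raw.splitlines():
        stripped = line.strip()
        # Drop lines that hold only a page number, e.g. "12" or "Page 12".
        if re.fullmatch(r"(page\s+)?\d+", stripped, flags=re.IGNORECASE):
            continue
        kept.append(stripped)
    text = "\n".join(kept)
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)   # rejoin words split by a hyphen at a line end
    text = re.sub(r"[ \t]+", " ", text)            # collapse runs of spaces and tabs
    text = re.sub(r"\n{3,}", "\n\n", text)         # collapse runs of blank lines
    return text.strip()

def corpus_filename(journal, first_author, year):
    """Build a consistent file label: journal, first author, year."""
    parts = [journal, first_author, str(year)]
    safe = [re.sub(r"[^A-Za-z0-9]+", "", part) for part in parts]
    return "_".join(safe) + ".txt"
```

Removing tables, figure captions, reference lists and author affiliations still needs human judgement, which is why manual checking remains part of the homework task.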
My response
• Use files as received
• Conduct a search every time an issue arises in student drafts that can be addressed this way
• When anomalies occur in search outcomes, point out the link to inappropriate corpus preparation
• Use Exemplar as a comparison
Demonstration/discussion
• I will load into AdTAT the 3 corpora made by CAEP participants in 2010 and 2011
• Together we will run some searches to see
  • What questions we can answer
  • What anomalies we may find in the corpus
Types of searches: Collocations
• Sort left to find verbs or adjectives that go with a search noun (e.g. reason) or adverbs that go with a search verb (vary)
• Sort right to find prepositions that go with a search noun (role) or verb (compare)
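Sorting left or right is a way of seeing by eye which words sit next to the search item. The same information can be tallied directly, as in this rough Python sketch. The function name `collocates` is hypothetical, and the code makes two simplifying assumptions: the text is lower-cased, and punctuation is ignored, so a neighbour may come from across a sentence boundary.

```python
import re
from collections import Counter

def collocates(text, node, side="left"):
    """Count the words found immediately left or right of `node`."""
    tokens = re.findall(r"[A-Za-z]+", text.lower())
    target = node.lower()
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != target:
            continue
        j = i - 1 if side == "left" else i + 1
        if 0 <= j < len(tokens):
            counts[tokens[j]] += 1
    return counts.most_common()
```

For instance, `collocates(corpus_text, "role", side="right")` would show which prepositions most often follow role in your corpus, with `corpus_text` standing in for your own corpus string.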
Types of searches: Usage conventions
• Search for we to see if it is used in the genre of interest
• Search for also to see if it appears at the start of a sentence (i.e. with a capital letter – AdTAT is case-sensitive)
• bioinformatical or bioinformatic?
  • No examples of -al in self-made corpus – but strong supervisor preference
  • Check Exemplar site
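The logic of these usage checks, counting competing forms and using case to separate sentence-initial Also from mid-sentence also, can be illustrated with a short Python sketch. The function name `count_whole_word` is an assumption for this example and is not an AdTAT feature.

```python
import re

def count_whole_word(text, word, case_sensitive=True):
    """Count occurrences of `word` as a whole word (not as part of a longer word)."""
    flags = 0 if case_sensitive else re.IGNORECASE
    pattern = rf"\b{re.escape(word)}\b"
    return len(re.findall(pattern, text, flags))
```

Comparing `count_whole_word(corpus_text, "bioinformatic")` with `count_whole_word(corpus_text, "bioinformatical")` gives a quick picture of which form the corpus prefers; a count of zero may simply mean the corpus is too small, which is where a larger comparison corpus helps.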
“Springer Exemplar
• Go to www.springerexemplar.com
• Choose to search a field or a journal
• It is web-based, so no software download
• BUT, you cannot
  • sort output to answer your own questions
  • see more context than a few words
  • know if the English is native-speaker or not
• These problems are solved with AdTAT”
Uses of the Exemplar site
• When your discipline-specific corpus has no or too few examples of a term
• When you do not have an appropriate corpus to search
• When you want to compare usage more widely than your own small corpus allows but still specific to the discipline
  • e.g. evolvement (geology); bioinformatical (plant science)
Other web concordancers: Uses
• See Virtual Language Centre Web Concordancer at http://www.edict.biz/concordance/WWWConcappE.htm
• Allows choice of corpora to search, including the Brown Corpus of US general English
• To demonstrate differences between discipline-specific and general English usage
  • e.g. And at start of sentence, or use of contractions ending in n’t
US general English
But a similar AdTAT search of a corpus from the New Phytologist journal (plant science, impact factor over 5) finds 0 examples …
To summarise…
• High potential usefulness of both self-made and online corpora, especially where ‘native-speaker’ models are lacking
• Labour of constructing a corpus can be seen as a disincentive to use of self-made corpora
• Hands-on demonstration to address students’ actual errors can
  • help overcome this perception, and
  • provide needed training in search construction
You can use the same text as long as you cite
We all use the same words
The research reversal?
‘Obligatory intertextuality’
• Document structure intertextuality
• Engaging with the literature
• Co-authored texts
• Discipline-specific language (Eira, 2007)
Researcher Education & Development, Adelaide Graduate Centre
Explicit instruction
• Take focussed notes
• Separate the “English” from the “science/content”
• Assemble notes related to topics
• “Story” each paragraph in dot-points
• Identify acceptably recyclable text
Intelligently reading reports
Too original: Hamda
For students who are producing unidiomatic (awkward, non-standard usage) and/or non-academic English
• Step 1 Concordancer (focussing on collocations, standard phrases)
• Step 2 Text-matching to check for originality
• Unrelated matches
• Standard strings
• Sentence templates
• Some discipline-specific language
Not original: Weimin
For students who are patchwriting with little understanding of referencing and citation conventions
• Step 1 Training in the use of bibliographic software
• Step 2 Instruction in note-taking & organising writing
• Step 3 Text-matching to check for originality
• Step 4 Concordancer (focussing on general usage and discipline-specific language)
• Phrases commonly used in general academic English
• Discipline-specific language
• Unacceptable recycling
Refining authorial voice: Liang
For students who have a good grasp of both citation conventions and reasonably high-level English expression
• Step 1 Text-matching to check for originality
• Step 2 Concordancer (focussing on general usage and discipline-specific language)
• Step 3 General Google Scholar search
• Matches with unrelated student writing
• Discipline-specific language
• Springer Exemplar
Concrete Outcomes
• Increased participants’ understanding of acceptable and unacceptable intertextuality
• Enhanced participants’ knowledge of disciplinary language
• Developed participants’ autonomy through stages of a reflective process, enabling them to “critically review [their] suppositions of subject discipline and existing knowledge” (Chan et al. 2002:515)
Your thoughts
• How much instruction is required from ALL staff? Will this process promote autonomy?
• Difficulties in implementation in your own situation?
• Other uses for the software?