Skoufaki, S. (2009) ‘An exploratory application of Rhetorical Structure Theory to detect coherence errors in L2 English writing: possible implications for Automated Writing Evaluation software.’ International Journal of Computational Linguistics and Chinese Language Processing, 14(2), pp. 181-204.
Computational Linguistics and Chinese Language Processing
Vol. 14, No. 2, June 2009, pp. 181-204
© The Association for Computational Linguistics and Chinese Language Processing
[Received August 10, 2009; Revised December 6, 2009; Accepted December 22, 2009]

An Exploratory Application of Rhetorical Structure Theory to Detect Coherence Errors in L2 English Writing: Possible Implications for Automated Writing Evaluation Software

Sophia Skoufaki∗

Abstract

This paper presents an initial attempt to examine whether Rhetorical Structure Theory (RST) (Mann & Thompson, 1988) can be fruitfully applied to the detection of the coherence errors made by Taiwanese low-intermediate learners of English. This investigation is considered warranted for three reasons. First, other methods for bottom-up coherence analysis have proved ineffective (e.g., Watson Todd et al., 2007). Second, this research provides a preliminary categorization of the coherence errors made by first language (L1) Chinese learners of English. Third, second language discourse errors in general have received little attention in applied linguistic research. The data are 45 written samples from the LTTC English Learner Corpus, a Taiwanese learner corpus of English currently under construction. The rationale of this study is that diagrams which violate some of the rules of RST diagram formation will point to coherence errors. No reliability test has been conducted since this work is at an initial stage. Therefore, this study is exploratory and results are preliminary. Results are discussed in terms of the practicality of using this method to detect coherence errors, their possible consequences for claims of a typical inductive content order in the writing of L1 Chinese learners of English, and their potential implications for Automated Writing Evaluation (AWE) software, since discourse organization is one of the essay characteristics assessed by this software.
In particular, the extent to which the kinds of errors detected through the RST analysis match those located by Criterion (Burstein, Chodorow, & Leacock, 2004), a well-known AWE software by Educational Testing Service (ETS), is discussed.

∗ Graduate Institute of Linguistics, National Taiwan University. E-mail:

Keywords: Automated Writing Evaluation, Discourse Organization, Coherence Errors, Rhetorical Structure Theory.

1. Introduction

Research findings indicate that English language learners produce various kinds of discourse errors in their writing, such as inductive patterns (e.g., Kaplan, 1966) and inappropriate coordination (e.g., Soter, 1988). However, the discourse errors of second language (L2) learners of English have not been examined in detail, partly because at least some of them are more difficult to detect than other kinds of errors (e.g., syntactic, spelling). This paper describes an initial attempt to examine whether Rhetorical Structure Theory (RST) (Mann & Thompson, 1988) can be fruitfully applied to the detection of the coherence errors made by Taiwanese low-intermediate learners of English. In particular, this paper reports on a pilot study where 45 written samples from the LTTC English Learner Corpus, a Taiwanese learner corpus of English currently under construction, were analysed according to RST. It is hoped that this pilot study will provide some preliminary indication of the viability of this approach to coherence error detection.

The results of this analysis will also serve as a preliminary list of coherence errors which may prove typical or not in further large-scale studies of this kind. A categorization of second language (L2) English coherence errors in general and of the coherence errors of particular learner populations has not been provided yet by applied linguists.
Therefore, this pilot study is warranted because of its possible utility for research on English L2 discourse and the instruction of writing in English as an L2.

Another aim of this study is to examine whether the most frequent of the errors detected through the RST analysis can be located by Criterion, a well-known AWE software by the Educational Testing Service (ETS). Automated Writing Evaluation (AWE) software such as Criterion (e.g., Burstein, Chodorow, & Leacock, 2004) and My Access! (e.g., Vantage Learning, 2007) locate and give diagnostic feedback only for a limited number of discourse errors. This issue has been pointed out by the computational linguists involved in the creation of AWE software (e.g., Higgins, Burstein, Marcu, & Gentile, 2004), but no study has been conducted with specific English learner populations to examine what discourse errors should be added to the inventory of discourse errors currently located via AWE software. Being a pilot study, the study reported here does not purport to fill this research gap but only to provide an initial step towards this goal.

In the following two sections, this paper will offer further information on the motivation of this study. Then, it will offer some background information on RST. Third, it will provide an overview of the LTTC English Learner Corpus and will describe the data and method of the study. Fourth, it will describe findings from a qualitative and quantitative perspective. Fifth, these results will be discussed in relation to a) whether RST analysis seems a viable method for coherence error detection, b) which factors seem to affect the coherence errors located in the data, c) whether results indicate inductive order patterns, and d) how much they overlap with the coherence errors that can be located via Criterion.
The paper will end with a summary of conclusions and directions for future research.

2. RST and Discourse Coherence Error Detection

It is difficult to reliably identify coherence errors because readers of the same text may form different interpretations of the coherence relations among elements of the text (Mann & Thompson, 1988). Therefore, a bottom-up method of coherence error detection should be used so that coherence errors will be identified as reliably and objectively as possible.

RST was chosen first because the output of other methods of locating coherence breaks, such as topical structure analysis and genre analysis in Watson Todd et al. (2007), has been shown to have little relationship with English teachers’ judgments. Second, strong correlations have been found between RST analyses which show that a text is coherent and subjective judgments that a text is coherent (Taboada & Mann, 2006a). Finally, RST has not been applied to the location of coherence errors (Higgins, Burstein, Marcu, & Gentile, 2004: 185), so an evaluation of its application for this purpose is interesting from a methodological perspective.

3. Discourse Coherence and L1 Chinese Learners of English

Given the paucity of discourse error tagging in learner corpora (Díaz-Negrillo & Fernández-Domínguez, 2006) and the sparse research on discourse errors by learners of English, this pilot study aims to provide a preliminary categorization of discourse errors in the writing of low-intermediate Taiwanese learners of English. This list of errors will be supplemented and refined through further research.

L1 Chinese learners of English make similar discourse errors to learners with other native tongues, but there have also been claims for typical L1 Chinese errors. However, these claims have not been examined sufficiently through quantitative methods. Therefore, the pilot study reported in this paper also partly functions as a preliminary quantitative test for one of these claims.
This claim is that the paragraphs and essays of L1 Chinese learners of L2 English have an inductive rather than deductive order. It has been claimed that these learners present the main point of their writing only at the end of a paragraph or essay, whereas in L1 English writing the main point is presented first (e.g., Kaplan, 1966; Matalene, 1985).

The claim for the use of an inductive order only by L1 Chinese learners of English (and not by native speakers of English) has been challenged. For example, Scollon and Scollon (1995) used ethnomethodology to show that inductive and deductive patterns both exist in the speech of both native speakers of English and native speakers of Chinese. The only difference between the two languages is that these patterns are used for different pragmatic purposes. However, their analysis relates only to spoken discourse, so one cannot draw any conclusions about the existence of inductive patterns in written native English. This research gap is filled by Chen (2008). In a quantitative study, he found, among other things, that a minority of the native speakers of English preferred essays written with an inductive rather than deductive pattern and nearly half of them preferred paragraphs written in an inductive rather than a deductive order. This finding indicates that inductive patterns can be used in written English but they are more acceptable in paragraphs than in essays. Finally, Mohan and Lo (1985) review Chinese writing textbooks and analyse Classical Chinese texts to show that the deductive pattern is the most usual and prescribed essay writing pattern in Chinese.¹

From a theoretical perspective, if the RST analysis of the texts in the pilot study can point to instances of inductive order, the controversial issue of whether the English discourse of L1 Chinese learners is characterized by inductive order will be able to be examined in more detail in later research.
Moreover, if the present study indicates that inductive-order errors occur frequently in the data, this may be seen as a preliminary indication that AWE software should try to detect and categorize as errors cases of inductive content order.

4. Discourse Errors and Criterion

The pilot study reported here is also motivated by one of the criticisms made about AWE software, that is, that the effectiveness of AWE software should not be tested only through “a posteriori statistical validation” but also through an “a priori investigation of what should be elicited by the test before its actual administration” (Weir, 2005: 17). In other words, high levels of agreement in the grades assigned to essays between human judges and software should not be the only criterion for software evaluation; the kinds of errors which are located by software should also match those located by human judges. Such concerns are warranted for practical reasons as well, since it has been shown that learners can fool AWE software, that is, they can get high scores although the content of their essays is inadequate (Herrington & Moran, 2001; Powers, Burstein, Chodorow, Fowles, & Kukich, 2002; Ware, 2005). Therefore, if AWE software is designed so as to locate the errors that a human judge would locate, wrong essay evaluations will be prevented.

To examine this issue, this section of the paper will summarize the kinds of discourse errors located by Criterion.² Then, section 9.3 will compare them with the discourse errors located through the RST analysis in the present study. The rationale is that any discrepancies between the two lists of errors should warrant large-scale empirical work testing whether these discrepancies really exist.

The main discourse errors which are located by Criterion are those of absence or insufficient number of discourse structures considered necessary in expository and argumentative essays, which are the input of this software. This is a valuable feature because no other AWE software has it (Burstein, 2009: 15). These structures are introductory information which forms the background for the rest of the essay (‘Introductory Material’), the statement which expresses the opinion of the writer (‘Thesis Statement’), the main point(s) made by the writer (‘Main Point’), the statement(s) which support(s) each main point (‘Support’), and the conclusion (‘Conclusion’). For example, if a learner has not included a thesis statement in his or her essay, the software is likely to locate this error and inform the learner about it.

Apart from the aforementioned discourse-structure tags, the creators of Criterion had initially used separate tags for cases where learners had written a title for their essay, for opening and closing salutations in essays in letter format, and for content which could not be tagged with any of the other tags. These tags occurred infrequently, so such cases were lumped under the tag ‘Other’ (Burstein, 2009: 15; Burstein, Marcu, & Knight, 2003: 33). However, this practice obscures the number of times when the software could not categorize structures through any of the existing labels.

¹ Controversy also exists over the cause of inductive patterns whenever they are found in the writing of L1 Chinese learners of English. For example, one possible reason is the influence from L1 rhetorical structure, as contrastive rhetoric theorists claim (e.g., Chen, 2001; Kaplan, 1966; Matalene, 1985). Another is the lack of relevant or useful feedback and instruction from teachers (e.g., Gonzales, Chen, & Sanchez, 2001; Mohan & Lo, 1985). Yet another possible reason is the inability to properly structure an essay not only in the L2 but also in the L1 because one has not reached the right developmental stage in his/her writing ability (e.g., Mohan & Lo, 1985).
This problem could be important because perhaps structures could not be labeled by the software because they violated the usual order of discourse structures (that is, Introductory Material, Thesis, Main Points, Conclusion), an error which should occur whenever information is ordered unusually in an essay. This possibility is likely because in Criterion one of the modules used to identify the discourse structures in essays is the ‘global language model’. It predicts the sequence of discourse elements in an essay by seeing how well the predictions which stem from a ‘local language model’ - which predicts which discourse structure is likely to appear after two sections which have already been tagged as specific kinds of discourse structures - fit a finite-state grammar manually created by the software creators (Burstein, Marcu, & Knight, 2003: 36).

As we have seen in section 3 of this paper, inductive, rather than deductive, content order has been claimed to characterize the writing of Chinese L1 learners of English; therefore, the

² Criterion, rather than My Access!, was chosen because the research reports on the latter do not give enough information about its workings for its discourse organization evaluation function to be assessable.
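The idea behind the ‘global language model’ described above — checking a sequence of already-tagged discourse elements against a hand-built finite-state grammar of the expected deductive order — can be illustrated with a minimal sketch. This is not ETS’s actual implementation; the transition table and label names below are illustrative assumptions based only on the element inventory and the usual order (Introductory Material, Thesis, Main Points, Conclusion) mentioned in the text.

```python
# Illustrative sketch only: a hand-built finite-state grammar of the
# expected deductive essay order. The allowed transitions are assumptions,
# not Criterion's actual grammar.
EXPECTED_NEXT = {
    "START": {"IntroductoryMaterial", "ThesisStatement"},
    "IntroductoryMaterial": {"IntroductoryMaterial", "ThesisStatement"},
    "ThesisStatement": {"MainPoint"},
    "MainPoint": {"Support"},
    "Support": {"Support", "MainPoint", "Conclusion"},
    "Conclusion": set(),
}

def order_violations(labels):
    """Walk a tagged sequence of discourse elements and return
    (position, previous_label, current_label) triples wherever the
    sequence departs from the expected deductive order -- e.g. a
    thesis statement appearing only after the main points, which
    would be one possible surface signature of inductive order."""
    violations = []
    prev = "START"
    for i, label in enumerate(labels):
        if label not in EXPECTED_NEXT.get(prev, set()):
            violations.append((i, prev, label))
        prev = label
    return violations

# A deductive essay passes with no violations...
deductive = ["IntroductoryMaterial", "ThesisStatement",
             "MainPoint", "Support", "Conclusion"]
# ...while an inductive one (thesis delayed to the end) is flagged.
inductive = ["IntroductoryMaterial", "MainPoint", "Support",
             "ThesisStatement", "Conclusion"]

print(order_violations(deductive))
print(order_violations(inductive))
```

Under such a scheme, sections that cannot be reached by any path through the grammar would be exactly the ones a labeller might dump into an ‘Other’-style category, which is why the collapsing of those tags discussed above would hide potential inductive-order evidence.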