Alumni Directory

Deepan Babu
MS, Spring 2013

Home Dept: Computer Science

MS Dissertation title: Automated analysis of Learner's Research Article writing and feedback generation through Machine Learning and Natural Language Processing

MS Advisor(s): Stephen Gilbert and Jin Tian

Area of MS research: Teaching academic writing in English to native and non-native speakers is a challenging task. A variety of computer-aided instruction tools have emerged, in the form of Automated Writing Evaluation (AWE) systems, to help students with this task. This thesis describes my contribution to the implementation of the Research Writing Tutor (RWT), an AWE tool that helps students with academic research writing by analyzing a learner's text at the discourse level. After this analysis, RWT offers tailored feedback based on discipline-aware corpora. At the core of RWT are two computational models, built with machine learning algorithms, that identify the rhetorical structure of a text. RWT extends previous research on a similar AWE tool, the Intelligent Academic Discourse Evaluator (IADE) (Cotos, 2010), which was designed to analyze articles at the move level of discourse. As a result of the present research, RWT also analyzes text at the finer level of discourse steps, the granular communicative functions that together constitute a particular move. Using features extracted from a corpus of expert-annotated research article introductions, the learning algorithm labels each sentence of a document with a rhetorical move and a step. RWT currently analyzes only the introduction section of a research article, but this work generalizes to the other sections, including Methods, Results, and Discussion/Conclusion. This research also describes RWT's software architecture for analyzing academic writing, which consists of a database schema, a specific choice of classification features, the procedure for training the computational models, the approach to testing for performance evaluation, and the method for applying the models to a learner's writing sample.
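The sentence-level labeling described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not RWT's implementation: the abstract does not name the learning algorithm, so a multinomial Naive Bayes classifier with add-one smoothing stands in for it, using the unigram and trigram feature types the abstract reports. The move labels (M1 to M3), the step names, and the training sentences are hypothetical toy data rather than the expert-annotated corpus, and sentences are naively split on periods.

```python
import math
from collections import Counter, defaultdict


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def features(sentence):
    """Unigram + trigram features (lowercased, whitespace-tokenized)."""
    tokens = sentence.lower().split()
    return ngrams(tokens, 1) + ngrams(tokens, 3)


class NaiveBayesSentenceClassifier:
    """Multinomial Naive Bayes with add-one smoothing (a stand-in, not RWT's algorithm)."""

    def fit(self, sentences, labels):
        self.label_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for sentence, label in zip(sentences, labels):
            feats = features(sentence)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, sentence):
        n_sentences = sum(self.label_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            total = sum(self.feature_counts[label].values())
            score = math.log(count / n_sentences)  # log prior
            for f in features(sentence):
                if f in self.vocab:  # skip features never seen in training
                    score += math.log((self.feature_counts[label][f] + 1) / (total + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training data: (sentence, move, step).
TRAIN = [
    ("writing instruction is an important area of research", "M1", "claiming centrality"),
    ("many studies have examined feedback in writing", "M1", "reviewing previous research"),
    ("however little is known about automated feedback", "M2", "indicating a gap"),
    ("however few studies have examined learner writing", "M2", "indicating a gap"),
    ("in this paper we present a new writing tutor", "M3", "announcing present research"),
    ("this study describes a tool for learner writing", "M3", "announcing present research"),
]

sentences = [s for s, _, _ in TRAIN]
move_model = NaiveBayesSentenceClassifier().fit(sentences, [m for _, m, _ in TRAIN])
step_model = NaiveBayesSentenceClassifier().fit(sentences, [st for _, _, st in TRAIN])


def analyze(text):
    """Label each sentence (naively split on '.') with a (move, step) pair."""
    results = []
    for sentence in text.split("."):
        sentence = sentence.strip()
        if sentence:
            results.append((sentence, move_model.predict(sentence), step_model.predict(sentence)))
    return results
```

For example, `analyze("However little is known about learner writing. In this paper we present a writing tutor.")` labels the first sentence as M2 / "indicating a gap" and the second as M3 / "announcing present research". Training separate move and step models mirrors the two-model design described above; a production system would also need real sentence segmentation and tokenization.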
Experiments were conducted on the annotated corpus to study the relationship between the features and the rhetorical structure of the documents. Finally, I report the performance of our 23 computational models and their ability to identify rhetorical structure in user-submitted writing. The final move classifier was trained on a total of 5,828 unigrams and 11,630 trigrams and achieved a maximum accuracy of 72.65%. Similarly, the step classifier was trained on a total of 27,689 unigrams and 27,160 trigrams and achieved a maximum accuracy of 72.01%. The revised architecture also increased the speed of both training (a 9x speedup) and real-time analysis (a 2x speedup). These performance rates are sufficient for satisfactory use of RWT in the classroom. The overall goal of RWT is to empower students to write better by helping them view writing as a series of rhetorical strategies for conveying functional meaning. This research will enable RWT to be deployed in a wider range of classrooms.
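The feature counts and accuracy figures above can be made concrete with a short sketch. It assumes whitespace tokenization and a simple frequency threshold (`min_count`, a hypothetical parameter); the abstract does not say how RWT tokenized text or whether it pruned rare n-grams, so the vocabulary sizes this produces are not comparable to the reported ones. Accuracy here is the plain fraction of sentences whose predicted label matches the annotation; how the reported maximum was selected is not stated.

```python
from collections import Counter


def ngrams(tokens, n):
    """Contiguous n-grams of a token list, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_vocabulary(sentences, n, min_count=1):
    """Distinct n-grams occurring at least min_count times across the corpus."""
    counts = Counter()
    for sentence in sentences:
        counts.update(ngrams(sentence.lower().split(), n))
    return {gram for gram, c in counts.items() if c >= min_count}


def accuracy(predicted, gold):
    """Fraction of sentences whose predicted label matches the annotated label."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Hypothetical two-sentence corpus for illustration only.
corpus = [
    "however little is known about automated feedback",
    "however few studies have examined learner writing",
]
unigrams = ngram_vocabulary(corpus, 1)
trigrams = ngram_vocabulary(corpus, 3)
print(len(unigrams), len(trigrams))  # prints: 13 10
```

Raising `min_count` shrinks the vocabulary (here, `min_count=2` keeps only the unigram "however"), which is one common way to trade feature count against training speed.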

Employment upon graduating:
