Introduction to: Automated Essay Scoring (AES) • Anat Ben-Simon, National Institute for Testing & Evaluation • Tbilisi, Georgia, September 2007
Merits of AES • Psychometric • Objectivity & standardization • Logistic • Saves time & money • Allows for immediate reporting of scores • Didactic • Immediate diagnostic feedback
AES - How does it work? • Humans rate sample of essays • Computer extracts relevant text features • Computer generates model to predict human scores • Computer applies prediction model to score new essays
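The four steps above can be sketched end to end. This is a deliberately minimal toy: the single feature (essay length), the training data, and the least-squares fit are all illustrative assumptions, not any particular commercial system's method.

```python
# Minimal AES pipeline sketch: fit a model on human-scored essays,
# then score a new essay. Feature set and data are hypothetical.

def extract_features(essay):
    """Step 2: reduce an essay to quantitative text features."""
    words = essay.split()
    return [len(words),                               # essay length
            sum(len(w) for w in words) / len(words)]  # avg word length

def fit_model(essays, human_scores):
    """Step 3: least-squares slope/intercept predicting the human
    score from one feature (essay length) for simplicity."""
    xs = [extract_features(e)[0] for e in essays]
    n = len(xs)
    mx, my = sum(xs) / n, sum(human_scores) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, human_scores))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def score(essay, model):
    """Step 4: apply the prediction model to a new essay."""
    slope, intercept = model
    return slope * extract_features(essay)[0] + intercept
```

Real systems use dozens to hundreds of features and richer models, but the train-on-human-scores, predict-on-new-essays loop is the same.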
AES – Model Determination Feature determination • Text driven – empirically based quantitative (computational) variables • Theoretically driven Weight determination • Empirically based • Theoretically based
AES - Examples of Text Features Surface variables • Essay length • Av. word / sentence length • Variability of sentence length • Av. word frequency • Word similarity to prototype essays • Style errors (e.g., repetitious words, very long sentences) NLP-based variables • Number of "discourse" elements • Word complexity (e.g., ratio of different content words to total no. of words) • Style errors (e.g., passive sentences)
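Several of the surface variables listed above can be computed with the standard library alone. The exact definitions here (sentence splitting on terminal punctuation, punctuation-stripped word length) are my assumptions; production systems define these more carefully.

```python
# Sketch: computing a few surface variables from the slide above.
import re
from statistics import mean, pstdev

def surface_features(essay):
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    words = essay.split()
    sent_lens = [len(s.split()) for s in sentences]
    return {
        "essay_length": len(words),                                # word count
        "avg_word_length": mean(len(w.strip(".,!?")) for w in words),
        "avg_sentence_length": mean(sent_lens),
        "sentence_length_sd": pstdev(sent_lens),                   # variability
    }
```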
AES: Commercially Available Systems • Project Essay Grade (PEG) • Intelligent Essay Assessor (IEA) • Intellimetric • e-rater
PEG (Project Essay Grade) Scoring Method • Uses NLP tools (grammar checkers, part-of-speech taggers) as well as surface variables • Typical scoring model uses 30-40 features • Features are combined to produce a scoring model through multiple regression Score Dimensions • Content, Organization, Style, Mechanics, Creativity
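Combining features "through multiple regression," as PEG does, amounts to finding least-squares weights for a feature matrix. A sketch with toy numbers (PEG's actual 30-40 features and their weights are not public):

```python
# Multiple-regression score combination, PEG-style, on toy data.
import numpy as np

def fit_scoring_model(feature_matrix, human_scores):
    """Least-squares weights, with an intercept column prepended."""
    X = np.column_stack([np.ones(len(feature_matrix)), feature_matrix])
    weights, *_ = np.linalg.lstsq(X, np.asarray(human_scores), rcond=None)
    return weights

def predict(features, weights):
    """Intercept plus weighted sum of the essay's features."""
    return float(weights[0] + np.dot(features, weights[1:]))
```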
Intelligent Essay Assessor Scoring Method • Focuses primarily on the evaluation of content • Based on Latent Semantic Analysis (LSA) • Based on a well-articulated theory of knowledge acquisition and representation • Features combined through hierarchical multiple regression Score Dimensions • Content, Style, Mechanics
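The core of LSA is a truncated SVD of a term-by-document matrix: essays become vectors in a reduced "semantic" space, and content quality is judged by similarity to well-scored reference essays. A rough illustration of that idea (toy matrix and rank; not IEA's actual procedure):

```python
# LSA sketch: project documents into a rank-k latent space and
# compare them by cosine similarity.
import numpy as np

def lsa_space(term_doc, rank):
    """Truncated SVD of a term x document matrix; returns one
    latent-space row vector per document."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    return (s[:rank] * Vt[:rank].T)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Documents that share vocabulary land near each other in the latent space even when their exact word choices differ, which is why LSA-based scoring focuses on content rather than mechanics.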
Intellimetric Scoring Method • “Brain-based” or “mind-based” model of information processing and understanding • Appears to draw more on artificial intelligence, neural net, and computational linguistic traditions than on theoretical models of writing • Uses close to 500 features Score Dimensions • Content, Creativity, Style, Mechanics, Organization
E-rater v2 Scoring Method • Based on natural language processing and statistical methods • Uses a fixed set of 12 features that reflect good writing • Features are combined using hierarchical multiple regression Score Dimensions • Grammar, usage, mechanics, and style • Organization and development • Topical analysis (content) • Word complexity • Essay length
Reliability Studies • Comparing human-human (inter-rater) agreement with human-computer agreement
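Such reliability comparisons are typically reported as exact agreement, adjacent agreement (scores within one point), and quadratic weighted kappa. A sketch of all three for two raters on an integer scale (the 1-6 scale is an assumption):

```python
# Agreement statistics for two raters scoring the same essays.
import numpy as np

def agreement_stats(r1, r2, n_levels=6):
    r1, r2 = np.asarray(r1), np.asarray(r2)
    exact = float(np.mean(r1 == r2))
    adjacent = float(np.mean(np.abs(r1 - r2) <= 1))
    # Quadratic weighted kappa: observed vs. chance-expected
    # disagreement, weighted by squared score distance.
    O = np.zeros((n_levels, n_levels))
    for a, b in zip(r1, r2):
        O[a - 1, b - 1] += 1
    E = np.outer(O.sum(axis=1), O.sum(axis=0)) / O.sum()
    i, j = np.indices((n_levels, n_levels))
    W = (i - j) ** 2 / (n_levels - 1) ** 2
    kappa = 1 - (W * O).sum() / (W * E).sum()
    return exact, adjacent, kappa
```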
AES: Validity Issues • To what extent are the text features used by AES programs valid measures of writing skills? • To what extent is the AES inappropriately sensitive to irrelevant features and insensitive to relevant ones? • Are human grades an optimal criterion? • Which external criteria should be used for validation? • What are the wash-back effects (consequential validity)?
Weighting Human & Computer Scores • Automated scoring used only as a quality control (QC) check • Automated scoring and human scoring combined • Human scoring used only as a QC check
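When both scores are operational, a common resolution rule is to average them if they agree closely and send the essay to a second human read otherwise. The threshold and averaging rule below are illustrative assumptions; programs differ.

```python
# Sketch of a human/computer score resolution rule.
def resolve(human, machine, max_gap=1):
    """Average the two scores when they agree within max_gap;
    otherwise flag the essay for adjudication by a second human."""
    if abs(human - machine) <= max_gap:
        return (human + machine) / 2, False   # (score, needs_adjudication)
    return None, True
```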
AES: To use or not to use? • Are the essays written by hand or composed on computer? • Is there enough volume to make AES cost-effective? • Will students, teachers, and other key constituencies accept automated scoring?
Criticism and Reservations • Insensitive to some important features relevant to good writing • Fail to identify and appreciate unique writing styles and creativity • Susceptible to construct-irrelevant variance • May encourage writing for the computer as opposed to writing for people
How to choose a program? • Does the system work in a way you can defend? • Is there a credible research base supporting the use of the system for your particular purpose? • What are the practical implications of using the system? • How will the use of the system affect students, teachers, and other key constituencies?