From Discussions VOL. 9 NO. 2
Timed Writing Assessment as a Measure of Writing Ability: A Qualitative Study
Throughout the American education system, the assessment of writing skill and general academic performance through timed essay examinations has become increasingly pervasive, contributing to the determination of grades and course placements and ultimately affecting college admissions through their use in standardized tests. In March 2005, the College Board introduced a new writing section for the SAT that incorporates a 25-minute impromptu essay component, as well as traditional multiple-choice questions on grammar and usage (Hass; “SAT Test Sections”). Likewise, timed writing assessment holds a prominent position in the ACT, which features an optional 30-minute essay section that is mandatory for students applying to some institutions, and in the College Board’s Advanced Placement program, whose English Literature examination requires three essays written over a two-hour period (“The ACT Plus Writing”; “English Literature: The Exam”). As Nancy Hass reports in the New York Times, the introduction of timed writing in the SAT has generated substantial public controversy, with many colleges deciding not to consider the essay scores in the admissions process. At the same time, a number of universities have elected to utilize the essay section results, not only for admissions, but also for the determination of placement in composition courses, sometimes provoking passionate opposition from their own writing faculty members (Isaacs and Molloy 518-20).
Employing the SAT essay section as an illustration of the debate surrounding timed essay examinations, this paper seeks to investigate the accuracy, instructional usefulness, and social implications of the widespread use of timed writing assessment as a measure of writing ability at the high school and collegiate levels. To supplement a review of the published literature, this study integrates material from interviews conducted by the author with five experienced instructors in composition and literature programs at Stanford University and the University of California (UC), Davis. Both in standardized examinations and in the classroom setting, timed writing assessment can offer a rough and imprecise, but most often fairly accurate, prediction of a student’s performance on longer, traditional writing assignments. Nevertheless, as this paper will attempt to demonstrate, the imposition of severe time constraints induces an altogether different mode of writing, called an “assessment genre” by one of the instructors, that renders questionable the comparability of the writing skills displayed in timed and untimed contexts. Given this finding, teachers and institutional administrators should carefully consider the potentially objectionable social values and attitudes toward writing communicated by the choice of timed writing as an assessment technique, especially when used to identify excellence rather than to certify basic competence.
In recent decades, the accuracy and appropriateness of timed writing assessment as a measure of writing ability have been subject to progressively rising doubts from English instructors and scholars of composition. As Kathleen Yancey discusses in an article on the history of writing assessment, in the period between 1950 and 1970, the evaluation of writing was frequently conducted through ‘objective,’ multiple-choice tests on grammar and vocabulary (485). In the 1970s, trends in standardized writing assessment gradually shifted to the holistically scored essay test, and by the 1980s and 1990s, many universities had again changed their evaluation methods to adopt broader, multi-element writing portfolios for such purposes as assigning course placement (Yancey 487-94). Yancey observes that beneath these fluctuations were evolving views on the relative importance of reliability and validity, the two central concepts of psychometric theory. Reliability is associated with objective assessments, while validity is associated with ‘direct’ tests of writing skill that provide samples of actual writing (487-94). An assessment is reliable to the extent that it yields consistent scores across readers and test administrations, while it is valid insofar as it successfully measures the particular characteristic, in this case writing ability, that it purports to measure (Yancey 487; Moss 6).
Current scholarship opposing the use of timed writing assessment continues to voice the same concerns about the validity of essay tests originally raised by those advocating the introduction of portfolios. For example, in her 1991 paper encouraging the further spread of the newly developed portfolio assessment method, Sarah Freedman argues that timed essay examinations involve “unnatural” writing conditions, with students producing work that has no function except for external evaluation, on topics that may not interest them (3). Similarly, as Ann Del Principe and Jeanine Graziano-King contend in their 2008 article, the testing environment created by timed writing assessment undermines the authenticity of essay examinations because it inhibits the complex processes of thinking and articulation that enable students to produce quality writing (297-98). Of course, many more specific arguments can be adduced under the general heading of authenticity in assessments. For Kathy Albertson and Mary Marwitz, the dangers of high-stakes standardized writing examinations are dramatically exemplified by students who seek security in uninspired, formulaic essays and students who unknowingly imperil their prospects of reader approval by engaging with challenging topics through more sophisticated pieces that they lack the time to complete (146-49). Finally, in an inversion of the usual call for more valid assessment techniques, Peter Cooper reprises the common contention that a single writing sample on a particular topic cannot fully represent a student’s abilities, only to present this consideration as evidence in support of multiple-choice writing tests (25).
In defense of timed writing assessment, noted composition theorist Edward White invokes the historical attractions of objective tests of writing, which remain in use in a large proportion of colleges (32). As he writes in his widely referenced article “An Apologia for the Timed Impromptu Essay Test,” the scoring of timed essays provides significant cost savings relative to the labor-intensive evaluation of portfolios, allowing institutions that would otherwise employ even less expensive multiple-choice assessments to include some form of direct writing evaluation in reviewing student performance (43-44). He also remarks on the utility of timed in-class writing in preventing plagiarism and in helping students to focus on the task of composition (34-36). Importantly, in response to concerns about a lack of opportunities for revision, White asserts that, even though an impromptu essay test may encourage first-draft writing, a first draft still constitutes a form of writing: thus the use of timed writing rather than multiple-choice tests emphasizes the value of writing (35-38). Extending this reasoning further, Marie Lederman argues that, despite the rightful focus on revision and the writing process in the curriculum, only the final product of that process holds any significance or communicative potential for the reader, lending legitimacy to the product-oriented nature of timed writing assessment (40-42).
A second major line of thought in favor of timed essay tests, separate from the pragmatic and conceptual arguments surveyed above, relates to their empirical capacity to predict the future academic performance of students. College Board researchers claim that, of all the components of the SAT, the writing section most accurately predicts students’ grade point average in the first year of college, demonstrating its validity as an assessment instrument (Kobrin et al. 1, 5-6). A crucial point of weakness in this argument, however, is the idea that a strong correlation between scores and later academic success in isolation can show the validity of a given assessment. For as Rexford Brown insisted as early as 1978, in the course of his opposition to objective writing tests, the fact that parental income and education might also correlate with writing ability and predict performance in college does not mean that they should form the basis for judging students’ aptitude (qtd. in Yancey 490-91). Validity requires that an assessment measures what it is intended to measure, so the question remains whether timed writing examinations truly reflect writing ability. In order to explore this issue in greater detail, one can turn to the material gathered in interviews with Brenda Rinard, a member of the University Writing Program at UC Davis, and postdoctoral fellows Roland Hsu, Barbara Clayton, Patricia Slatin, and Jeffrey Schwegman, all teaching in Stanford’s Introduction to the Humanities (IHUM) program. Though all interviewees had experience with timed essay tests in their respective courses, it is a limitation of this study that the four participating IHUM fellows, unlike Dr. Rinard, were seeking to evaluate student examinations not explicitly in terms of writing quality but rather in terms of content. 
All of these instructors nevertheless offered valuable information while answering questions regarding the accuracy and social implications of timed writing assessment and their motivations for using it in their courses (see the Appendix).
The interviewees were first asked about the accuracy of timed writing assessment as a measure of writing ability, where the standard for writing skill is assumed to be students’ performance in producing traditional argumentative papers. All subjects reported that timed essay tests generally provided a fairly accurate indication of students’ writing ability as demonstrated in regular paper assignments, although they all mentioned some exceptions or variations in accuracy as well. In particular, Dr. Clayton stated that students who had previously submitted papers of lower quality would sometimes show a surprising level of proficiency on essay tests, perhaps on account of additional preparation for the examination. In contrast, Dr. Schwegman noted that the students most skilled in composing extended papers would not usually produce the highest-quality timed essays in the class. Dr. Rinard emphasized the adverse effects of the testing environment for students with test anxiety and students for whom English was not the first language. Interestingly, Dr. Slatin and Dr. Rinard both affirmed, when asked, that timed writing assessment could offer only an imprecise measurement of writing ability, one that would not accommodate fine distinctions in skill or provide for the display of the full range of variation in writing ability.1 Their observations agree in this respect with the conjecture of Leo Ruth and Sandra Murphy that “short, timed writing tests are likely to truncate severely the range of performance elicited,” as suggested by surveys indicating that more sophisticated writers often consider time allocations inadequate due to their use of a greater amount of time for planning their work (151-54).
At this point, given that timed writing assessment does not seem grossly inaccurate in evaluating broader writing skill, one might be inclined to accept White’s contention that the use of standardized essay tests is justified by their practical efficiency and the fact that they at least require first-draft writing. Once again, however, this conclusion can be warranted only by a demonstration of the validity of timed writing examinations in measuring the same sort of writing ability that manifests itself in regular paper assignments, not simply by a correlation between the two forms of writing. From this standpoint, the true importance of the notion that timed writing is first-draft writing becomes evident: it embodies the idea that timed writing is fundamentally similar to the writing involved in extended composition. Only if timed writing is sufficiently continuous with, and therefore comparable to, writing without such time constraints can the validity of timed writing assessment be maintained. Indeed, as Murphy notes, assessment specialist Roberta Camp has argued that standardized writing tests implicitly assume that timed, impromptu writing can be considered representative of writing in general and that writing involves a uniform set of skills regardless of its purpose or circumstances (Murphy 38). One of the criticisms offered by Dr. Rinard challenges the core assumptions underlying the use of timed essays to determine writing ability. In particular, she believes that the timed writing on standardized examinations constitutes a distinct “assessment genre” with its own unique rhetorical situation, implying that judgments of writing skill obtained using timed writing may not be generalizable to writing in other contexts.
One must now resolve the question of whether timed writing should be regarded as representative of all academic writing or should instead be classified as a narrow and artificial “assessment genre.” Insight on this topic is supplied by the other interviewees’ remarks on their motivations for employing timed essay examinations. With a notion of timed writing as essentially continuous with other forms of writing, one might expect that they would conceive of essay examinations as simply compressed versions of regular papers, assigned because they require less time to grade and offer greater protection against plagiarism. On the contrary, the four IHUM fellows tended not to express any of these practical motivations for using timed essay examinations. The exceptions to this trend were Dr. Hsu, who cited the necessity of ensuring that work submitted was a student’s own, and Dr. Schwegman, who briefly remarked on the issue of time available for grading, but even these two instructors spoke at length about other reasons for employing the essay test format. Dr. Hsu, for instance, contended that timed essays were useful for encouraging students to construct a “less developed synthesis” of the material, meaning, as he explained, that they would not be influenced by the interchange of ideas with the teacher or other students and would therefore need to “take ownership” of their work in a way not facilitated by traditional papers. On the other hand, Dr. Slatin emphasized the importance of timed essay examinations as another mode of evaluation different from longer paper assignments, contributing to the diversity of assessment measures and thus ensuring fairness to all students in grading. Likewise, Dr. Schwegman found his principal motivation in the idea of achieving fairness by employing a broad spectrum of assessment methods, each engaging a distinct skill set and a different type of ability.
All of these perspectives on the utility of timed writing assessment crucially presuppose a fundamental dissimilarity between timed writing and the extended composition demanded by regular papers.
Moreover, a majority of the interviewees indicated that the writing produced on the essay tests that they had used generally failed by a large margin to satisfy the standards of a decent first draft for any other assignment. This finding, in addition to the previously developed suggestion of a divergence in the skills and processes involved in timed writing and other forms of writing, further challenges White’s assertion that timed writing should be regarded as first-draft writing. Dr. Schwegman, for instance, freely admitted that the writing submitted for final examinations was often “atrocious” in quality, and Dr. Clayton related that she would expect students to spend far more time than that permitted in essay examinations on the first draft of even a short paper. For a piece comparable to one on the AP tests, where a student might receive approximately 40 minutes per essay, Dr. Rinard estimated that a student might require anywhere from one to three hours to produce a draft of reasonable quality.