How Do You Ensure Quality Scoring?

What steps do you take to ensure accurate and consistent scoring of constructed-response items in large-scale assessment?

Most credible large-scale assessment programs continuously monitor scoring accuracy and consistency during scoring sessions. Scoring accuracy is often measured through the use of validity items. Prior to the start of a scoring session, validity items are scored by experts to derive their “true” scores. These validity items are then seeded into the flow of items that raters score throughout the scoring session. By comparing the raters’ scores with those assigned by the experts, the accuracy of scoring (by individual raters, groups or tables of raters, scoring rooms, and the scoring site overall) can be judged. Validity targets, the expected percentages of exact and adjacent agreement for rubrics of different sizes (e.g., 3-, 4-, 5-, and 6-point rubrics), can be established, in consideration of industry standards, for ongoing rater monitoring and remediation.
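For illustration, here is a minimal sketch of how exact and adjacent agreement against expert scores might be computed for a batch of seeded validity items. The function names, sample scores, and target values are hypothetical assumptions, not figures from any particular program.

```python
# Minimal sketch: checking a rater's accuracy on seeded validity items.
# All names, scores, and targets below are illustrative assumptions.

def validity_agreement(rater_scores, true_scores):
    """Return (exact, adjacent) agreement rates against expert "true" scores."""
    n = len(true_scores)
    exact = sum(r == t for r, t in zip(rater_scores, true_scores)) / n
    adjacent = sum(abs(r - t) <= 1 for r, t in zip(rater_scores, true_scores)) / n
    return exact, adjacent

# Hypothetical validity targets by rubric size: (exact, exact-plus-adjacent).
# Wider rubrics typically tolerate somewhat lower exact-agreement rates.
TARGETS = {3: (0.80, 0.95), 4: (0.75, 0.95), 5: (0.70, 0.95), 6: (0.65, 0.95)}

rater = [3, 4, 2, 4, 3, 5, 4, 3]   # rater's scores on seeded validity items
truth = [3, 4, 3, 4, 3, 4, 4, 3]   # expert-assigned "true" scores
exact, adjacent = validity_agreement(rater, truth)
target_exact, target_adjacent = TARGETS[6]  # assume a 6-point rubric
print(f"exact = {exact:.2f} (target {target_exact}); "
      f"adjacent = {adjacent:.2f} (target {target_adjacent})")
```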

Scoring consistency can be determined by calculating the exact and adjacent agreement between pairs of raters who independently rate the same response (inter-rater reliability). Inter-rater reliability can be reported for individual raters, groups or tables of raters, scoring rooms, and the scoring site overall.
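The same arithmetic applies to double-scored responses. The sketch below, with illustrative scores, summarizes agreement between two raters who independently scored the same set of responses.

```python
# Minimal sketch of inter-rater reliability as exact and adjacent
# agreement between two independent raters; the scores are illustrative.

def pairwise_agreement(scores_a, scores_b):
    """Exact and adjacent agreement between two independent raters."""
    n = len(scores_a)
    exact = sum(a == b for a, b in zip(scores_a, scores_b)) / n
    adjacent = sum(abs(a - b) <= 1 for a, b in zip(scores_a, scores_b)) / n
    return exact, adjacent

rater_1 = [2, 3, 3, 4, 1, 5, 3, 2]  # first reads of eight responses
rater_2 = [2, 3, 4, 4, 1, 4, 3, 2]  # independent second reads
exact, adjacent = pairwise_agreement(rater_1, rater_2)
print(f"exact agreement: {exact:.0%}; exact plus adjacent: {adjacent:.0%}")
```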

A variety of additional scoring statistics (e.g., daily and cumulative mean score and score-point distribution data) can be used to monitor for potential scoring drift on the part of individual raters, groups or tables of raters, scoring rooms, and the scoring site. These data, combined with other test statistics, can be used to compute the classification accuracy of an assessment (the probability of an examinee being classified into the same achievement category using the same or alternate forms of the assessment). When scoring and overall test statistics are strong, reporting them publicly can promote stakeholder confidence in the scoring process, the performance results, and the assessment program overall.
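A drift check of this kind can be as simple as comparing each rater’s daily statistics with the cumulative figures to date, as in the sketch below. The flagging threshold of a quarter of a score point is an assumption chosen for illustration, not an industry standard.

```python
# Minimal sketch of drift monitoring: compare a rater's daily mean score
# and score-point distribution against the cumulative figures to date.
from collections import Counter

def score_stats(scores):
    """Mean score and score-point distribution (proportion at each point)."""
    mean = sum(scores) / len(scores)
    dist = {point: count / len(scores)
            for point, count in sorted(Counter(scores).items())}
    return mean, dist

cumulative = [3, 3, 4, 2, 3, 4, 3, 5, 3, 4, 2, 3]  # all of this rater's scores to date
today = [4, 4, 5, 4, 3, 4]                          # today's scores only

cum_mean, cum_dist = score_stats(cumulative)
day_mean, day_dist = score_stats(today)
if abs(day_mean - cum_mean) > 0.25:  # hypothetical drift threshold
    print(f"possible drift: daily mean {day_mean:.2f} vs cumulative {cum_mean:.2f}")
print("daily score-point distribution:", day_dist)
```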

When large-scale assessments are administered at regular intervals (e.g., annually or on a multi-year cycle), it is also important to monitor scoring consistency over time by having raters re-score student responses from previous administrations and comparing the new scores with those previously assigned.
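A trend check of this kind might be summarized as in the sketch below; the scores are illustrative, and the signed mean difference is one simple way to show whether re-scores are running high or low relative to the originally assigned scores.

```python
# Minimal sketch of a trend (re-score) check: current raters re-score
# responses from a previous administration. The scores are illustrative.

original = [3, 4, 2, 5, 3, 4, 3, 2, 4, 3]   # scores assigned last cycle
rescored = [3, 4, 3, 5, 3, 4, 4, 2, 4, 3]   # this cycle's re-scores

n = len(original)
exact = sum(o == r for o, r in zip(original, rescored)) / n
shift = sum(r - o for o, r in zip(original, rescored)) / n  # signed mean difference
print(f"exact agreement with prior scores: {exact:.0%}; mean shift: {shift:+.2f}")
```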

Many other processes contribute to high-quality assessment scoring, including the following:

  • Designing an effective, efficient scoring plan
  • Selecting the “right” scoring leaders, supervisors, and raters (with regard to qualifications, expertise, experience, and background [e.g., age, gender, urban/rural])
  • Conducting quality pre-range-finding and range-finding activities (selecting representative student responses that illustrate performance at each code, or score point, of the scoring rubrics)
  • Preparing high-quality scoring materials (e.g., student exemplars [example responses showing the different score points, as previously described] and training papers, previously scored by scoring leaders)
  • Providing effective leader, supervisor, and rater training

More information about scoring best practices can be found in my book, Large-scale Assessment Issues and Practices: An Introductory Handbook (2014), which can be accessed at https://www.amazon.ca/-/fr/Richard-Merrick-Jones/dp/B012U0YHCY.

RMJ Assessment provides a wide range of assessment and evaluation services, which are outlined on our website: https://www.rmjassessment.com/large-scale-assessment/.

 
