Publisher: Journal of Educational Measurement, 36 (1)
Page Numbers: 61-71
In 1993, the authors reported in the Journal of Educational Measurement (JEM) that task-sampling variability was the Achilles' heel of science performance assessment. To reduce measurement error, tasks needed to be stratified before sampling, sampled in large numbers, or both. However, Cronbach, Linn, Brennan, and Haertel (1997) pointed out that a task-sampling interpretation of a large person x task (pt) variance component might be incorrect. Task and occasion sampling are confounded because tasks are typically given on only a single occasion, so the person x task source of measurement error is confounded with the person x task x occasion (pto) source. If pto variability accounts for a substantial part of the commonly observed pt interaction, stratifying tasks into homogeneous subsets (a cost-effective way of addressing task-sampling variability) might not increase accuracy, because stratification does not address the pto source of error. Another conclusion reported in JEM was that only the direct observation and notebook methods of collecting performance assessment data were exchangeable; the computer simulation, short-answer, and multiple-choice methods were not. If Cronbach et al. were right, however, this exchangeability conclusion might also be incorrect. After re-examining and re-analyzing their data, the authors found support for Cronbach et al.: the large task-sampling variability was due to both the person x task interaction and the person x task x occasion interaction. Moreover, the authors found that the direct observation, notebook, and computer simulation methods were equally exchangeable, but that their exchangeability was limited by the volatility of student performance across tasks and occasions.
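The confounding argument can be illustrated with the standard generalizability-theory decomposition for a fully crossed, random person x task x occasion design, where the relative error variance for scores averaged over n_t tasks and n_o occasions is σ²_pt/n_t + σ²_po/n_o + σ²_pto,e/(n_t·n_o). The Python sketch that follows is a minimal illustration under that assumption: the variance-component values are hypothetical, chosen only to show the mechanism, and are not estimates from the authors' data. The names `VarianceComponents`, `relative_error`, `g_coefficient`, and `single_occasion_view` are illustrative, and "stratification halves σ²_pt" is likewise an assumed effect.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class VarianceComponents:
    """Hypothetical variance components for a fully crossed, random p x t x o design."""
    p: float      # universe-score (person) variance
    pt: float     # person x task interaction
    po: float     # person x occasion interaction
    pto_e: float  # person x task x occasion interaction, confounded with residual error


def relative_error(vc: VarianceComponents, n_t: int, n_o: int) -> float:
    """Relative error variance when scores are averaged over n_t tasks and n_o occasions."""
    return vc.pt / n_t + vc.po / n_o + vc.pto_e / (n_t * n_o)


def g_coefficient(vc: VarianceComponents, n_t: int, n_o: int) -> float:
    """Generalizability coefficient for relative (rank-ordering) decisions."""
    return vc.p / (vc.p + relative_error(vc, n_t, n_o))


def single_occasion_view(vc: VarianceComponents) -> dict:
    """Components a p x t study run on one occasion would estimate.

    With the occasion facet hidden, po is absorbed into the person term and
    pto,e into the person x task term, so the two sources cannot be separated.
    """
    return {"p": vc.p + vc.po, "pt,e": vc.pt + vc.pto_e}


if __name__ == "__main__":
    vc = VarianceComponents(p=4.0, pt=2.0, po=0.5, pto_e=6.0)  # hypothetical values
    print(single_occasion_view(vc))
    print(round(g_coefficient(vc, n_t=1, n_o=1), 3))
    # Assume stratification halves the pt component; pto,e is left untouched.
    stratified = replace(vc, pt=vc.pt / 2)
    print(round(g_coefficient(stratified, n_t=1, n_o=1), 3))
    # A second occasion instead shrinks the po and pto,e terms.
    print(round(g_coefficient(vc, n_t=1, n_o=2), 3))
```

With these made-up values, a single-occasion study would report a person x task component of 8.0, of which 6.0 is actually pto variability. Halving σ²_pt through stratification raises the single-task coefficient only from 0.32 to about 0.35, while adding a second occasion raises it to about 0.43, which mirrors the abstract's point that stratifying tasks does not touch the pto source of error.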