• 09.20.12

How Do We Measure What Really Counts In The Classroom?

A new generation of assessment tools is hoping to piggyback on the wealth of online rating software to find a better, more efficient way of assessing students.

The world is caught up in an Information Age revolution in which we all evaluate products, restaurants, doctors, books, hotels, and everything else online, but education has not yet moved past the standardized assessment, which was invented in 1914. Frederick Kelly, a doctoral student in Kansas, was looking for a mass-produced way to address a teacher shortage caused by World War I. If Ford could mass-produce Model Ts, why not come up with a test of “lower order thinking” for the masses of immigrants coming into America, just as secondary education was made compulsory and the female teachers were working in factories while their men went to the European front? Even Kelly was dismayed when his emergency system, which he called the Kansas Silent Reading Test, was retained after the war ended. By 1926, a variation of Kelly’s test was adopted by the College Entrance Examination Board as the Scholastic Aptitude Test (SAT). The rest is history.


So when Kyle Peck (from Penn State) and Khusro Kidwai (of the University of Southern Maine) demoed their nonprofit, free eRubric assessment tool at Duke recently, we were all surprised at the flexibility it allowed, in a customizable and highly automated form.

An art history teacher and a professor teaching geographical information systems were both beta-testing it to grade essay and short-answer exams given to hundreds of students. eRubric allowed them to assess everything from the accuracy of the specific content of individual answers to logical thinking, verbal expression, and imaginative, outside-the-box application of the material: in other words, originality. In a different kind of assignment, the professors might have added categories for collaborative work, or for the ability to take an idea from the beginning to the conclusion of a project. These are the kinds of skills good teachers discover but rarely have a chance to test, measure, or provide any good feedback on, especially if there are 90 or 400 students in a course. eRubric lets anyone evaluating others customize the categories to be evaluated and weight the individual categories differently on different assignments. It could be used in informal or formal education, from kindergarten through college and beyond, and in any Human Resources department at any corporation too.

That’s just the beginning. If a teacher wished, she could even begin the first day of class with a blank eRubric and have the students write the categories, and the feedback for each category, together. They would then know, on each challenge or test or essay they were given, how they would be judged: the terms of the assessment that would, in the end, determine their grade. Research on assessment shows we learn more if we understand, participate in, and agree with the basic learning or work goals we’re aiming at. That kind of investment in outcomes improves learning.

With eRubric, the teacher decides, on any assignment, which categories apply and how to weight them. When the test papers, problem sets, or essays come in, the teacher reads each one and then clicks a box in each category to generate detailed, pre-written feedback for that category. If he thinks a pre-written comment could use more precision, eRubric lets him write an individual comment in any category or on the assignment as a whole. eRubric then automatically sends all of this feedback (probably a page-long assessment in the end) to the student in an email: summary grade, breakdown, general comments, specific comments. A week or so later, eRubric sends a reminder email to any student who hasn’t opened the assessment document, and sends the professor an email indicating whether or not the student has bothered to open it.
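For readers who want to see the moving parts, here is a minimal sketch in Python of the workflow just described: weighted categories, a pre-written comment per level, an optional teacher-written comment, and a reminder check after a week. It is an illustration only, not eRubric's actual code. Every name in it (`Category`, `Assessment`, `weighted_score`, `build_feedback`, `needs_reminder`), the four-level scale, the 0-100 summary grade, and the seven-day wait are assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class Category:
    name: str
    weight: float   # relative importance on this assignment (hypothetical scale)
    comments: dict  # pre-written feedback keyed by level, here 1 to 4


@dataclass
class Assessment:
    student: str
    sent_on: date
    levels: dict                                # level the teacher clicked, per category
    custom: dict = field(default_factory=dict)  # optional teacher-written comments
    opened: bool = False                        # has the student opened the feedback?


def weighted_score(categories, levels, max_level=4):
    """Return a 0-100 grade: the weight-averaged fraction of max_level earned."""
    total_weight = sum(c.weight for c in categories)
    earned = sum(c.weight * levels[c.name] / max_level for c in categories)
    return round(100 * earned / total_weight, 1)


def build_feedback(categories, assessment, max_level=4):
    """Assemble the summary grade plus one comment line per category."""
    score = weighted_score(categories, assessment.levels, max_level)
    lines = [f"Summary grade: {score}/100"]
    for c in categories:
        level = assessment.levels[c.name]
        # A teacher-written comment, if present, replaces the pre-written one.
        comment = assessment.custom.get(c.name, c.comments[level])
        lines.append(f"{c.name} ({level}/{max_level}): {comment}")
    return "\n".join(lines)


def needs_reminder(assessment, today, wait_days=7):
    """True if the feedback is still unopened wait_days after it was sent."""
    return not assessment.opened and today - assessment.sent_on >= timedelta(days=wait_days)


# Illustrative data only: three categories weighted 5 : 3 : 2.
generic = {1: "Needs work.", 2: "Developing.", 3: "Solid.", 4: "Excellent."}
categories = [
    Category("Accuracy", 5, generic),
    Category("Originality", 3, generic),
    Category("Expression", 2, generic),
]
a = Assessment(
    student="student@example.edu",
    sent_on=date(2012, 9, 20),
    levels={"Accuracy": 3, "Originality": 4, "Expression": 2},
    custom={"Originality": "A genuinely fresh reading of the material."},
)
report = build_feedback(categories, a)
print(report)
```

Changing the weights from one assignment to the next only means building a different `categories` list. The same clicked levels then produce a different summary grade, which is the flexibility the tool is described as offering.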

When we hear every year that the U.S. has fallen in the OECD rankings to, say, 14th in reading, 17th in science, and 25th in math in the world, as we did in 2010, we’re always alarmed. Isn’t that a problem? It may well be, but the problem is far more complex. Americans use standardized tests earlier and more often than any other nation on the planet. Research shows that high-stakes, after-the-fact or end-of-grade, multiple-choice testing has little impact on motivation to learn and little real quantitative relationship to content mastery.

When I talk to corporate trainers, they insist that, in this job market, they can hire the smartest students in the country, those who have had the highest grades through the entire school system. But because the national No Child Left Behind law has required standardized tests for all students since 2002, it takes them one to two years to retrain these great students not to think in terms of single-best-answer (multiple choice) options. They have to make them “unlearn” the skill of guessing the best answer from five available ones (a pretty useless skill in the workplace), and begin to “relearn” how to think about what they do or don’t really understand about a situation, who to go to in order to find out, and what they need to do to get the best results. In other words, whether we are 1st or 17th, we’re failing at testing what we really value in the workplace. There is an extreme mismatch between what we value and how we count.


On September 20 and 21, the 30 recipients of grants from our MacArthur Foundation-Gates Digital Media and Learning Competition will be meeting at Duke to show off how far they have gotten on the badging systems they are creating. One institutional representative and one software systems developer from each team will be there to demo, discuss, learn, and innovate in a group un-conference. The institutions include Intel, the Department of Veterans Affairs, Disney, the Smithsonian Museum of Natural History, the Girl Scouts, 4-H, Carnegie Mellon, the Urban Affairs Coalition, Microsoft, Boise State University, and several K-12 schools and teachers’ groups. All are working to find systems that, like eRubric, allow for real-time feedback, peer contribution to an evaluation system, flexibility, and customizability, all of which inspire learning. They are also looking for ways to automate their systems and to make them consistent enough that results can be meaningfully compared within and across institutions.

Standardized testing is our past, but it doesn’t have to be our future. We’re hoping that pioneers like the developers of eRubric, or those who are coming together this week at Duke from institutions large and small, can build systems that work better for our age, taking advantage of the technology we now have. In this, they can draw on two decades of work by the worldwide community of web developers, who have already developed peer-awarded badging systems (on Top Coder, Stack Exchange, and other online accreditation sites) that they use to find the collaborative partners on whom their systems and livelihoods depend. If computer programmers can figure out reliable systems for rewarding everything from Python coding skills to the “fire starter” ability to breathe creativity into a project when everyone else is stuck, so can our schools. Soon, we may well have automated, easy, teacher-friendly, student-inspiring assessment systems that actually measure what we value and count the kind of knowledge and thinking that really do count in the classroom and in the real world.