Comparative Judgment
How it can be used to enhance teachers’ formative assessment skills and students’ learning
Comparative judgment (CJ) is an assessment methodology based on comparing two pieces of work at a time. CJ can be used both as a professional development tool to sharpen assessment skills and develop shared standards, and as a way of identifying exemplars of quality that help students better understand learning goals and expectations.
Learning is unpredictable, and students do not learn everything they are taught; therefore simply providing learning opportunities in school is not by itself sufficient. Assessment must be embedded within educational settings to bridge teaching and learning. Teachers who fail to assess what pupils do cannot determine whether they are contributing to or impeding pupils’ progress. Assessment data must be elicited, interpreted and used to adapt classroom practice to better meet students’ needs.
We know that teachers make inferences based on what happens in classroom activities. My colleague, Professor Inga-Britt Skogh, and I were curious to find out what teachers focus on while assessing student progress in the context of Swedish technology education. Our study involved six teachers and a class of 11-year-old students who were undertaking an open-ended design scenario. The students designed a model robot to help with various tasks at home. They identified problems such as recording NHL games, walking the dog, completing homework, scanning and submitting homework, and baking cupcakes. During classroom activities, the students built Web-based synchronous e-portfolios of their learning and product development, using text, photos, moving pictures and sketches on their iPads. In order to unpack what teachers emphasized as criteria for success, we decided to use a methodology called comparative judgment.1
Comparative judgment is an assessment methodology in which judges compare two pieces of student work and identify which of the two is better, without saying how much better it is. Their decision is based on the quality of the work.
To identify the motives behind teachers’ choices, the research team asked them to describe the reasons for their choices by speaking into an MP3 recorder while doing the pairwise comparisons. These think-aloud protocols were recorded and transcribed. They provided valuable insights into the rationale for each choice. The results showed judge consistency above .9, and the qualitative data revealed what the assessors agreed upon: the importance of seeing the narrative of the portfolio/design process. The teachers – our judges – were also invited to a session where we interviewed them.
Comparative judgment
Comparative judgment has been used in different settings, such as psychology and perfume making, and also quite recently (the last 10–15 years) in educational settings. Comparative judgment stems from the work of Louis Thurstone who, in the 1920s, tried to find methods for measuring things that are difficult to measure – such as attitudes and opinions, for example how serious a crime is considered to be. Thurstone argued that while people find it hard to say how serious a crime is, they can compare one crime to another relatively easily and reliably in terms of which crime they think is more serious. He explained that when two phenomena are placed in comparison with one another, individuals can use their knowledge to compare and identify which qualities are superior with high fidelity. He showed that by repeatedly comparing pairs of items, all items assessed could be ranked with very high reliability. Based on his studies, he formulated the Law of Comparative Judgment, which in short means that people are more reliable when comparing two stimuli, such as two crimes, than when giving an absolute value to a stimulus.2 Laming built on Thurstone’s work and argued that all assessment is a comparison of one thing to something else.3
How does comparative judgment work?
Comparative judgment is an iterative process in which assessors are presented with a series of pairs of objects and select the better of each pair. Rather than awarding an absolute score, they make a holistic judgment based on their expertise, previous experience and the quality of the objects.
This iterative process may be undertaken manually: you pick, for example, two random essays from your pile of student work, compare them, and pick one as the winner; you then repeat the process until every essay has been compared, much like a Swiss-system tournament. This manual process is cumbersome, especially when you want to work with others. It can be facilitated with comparative judgment software, where student work is presented two pieces at a time and a statistical model then ranks all the pieces based on the outcomes of the comparisons. Such software generates quantitative data of high reliability, usually above 80 percent, and also makes it easy to include multiple assessors.
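To make the ranking step concrete, here is a minimal sketch of how a set of pairwise "this one is better" decisions can be turned into a quality ranking. The comparison data is invented for illustration, and the Bradley–Terry model used here is one common statistical approach to this problem – it is not necessarily the formula used by any particular comparative judgment software.

```python
# Hypothetical outcomes of pairwise judgments of four student essays
# (A-D). Each tuple is (winner, loser) from one comparison; in practice
# these would come from many judges over many rounds.
comparisons = [
    ("A", "B"), ("A", "B"), ("A", "C"), ("A", "D"),
    ("B", "C"), ("B", "D"),
    ("C", "D"), ("C", "A"),
    ("D", "B"),
]

def bradley_terry(comparisons, iterations=100):
    """Estimate a quality score per item from pairwise wins using the
    classic Bradley-Terry iterative (minorization-maximization) update."""
    items = sorted({x for pair in comparisons for x in pair})
    wins = {i: 0 for i in items}
    pair_counts = {}  # times each unordered pair was compared
    for winner, loser in comparisons:
        wins[winner] += 1
        key = frozenset((winner, loser))
        pair_counts[key] = pair_counts.get(key, 0) + 1

    scores = {i: 1.0 for i in items}
    for _ in range(iterations):
        new = {}
        for i in items:
            # Sum, over every pair involving i, of n_ij / (p_i + p_j)
            denom = sum(
                n / (scores[i] + scores[j])
                for key, n in pair_counts.items()
                if i in key
                for j in key - {i}
            )
            new[i] = wins[i] / denom if denom else scores[i]
        # Rescale so scores stay comparable across iterations
        total = sum(new.values())
        scores = {i: s * len(items) / total for i, s in new.items()}
    return scores

scores = bradley_terry(comparisons)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # A, which won four of its five comparisons, ranks first
```

Note that no essay ever receives an absolute mark: the scores emerge entirely from the pattern of wins and losses, which is what allows reliability to stay high even though each individual judgment is a quick, holistic decision.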
Research studies have been conducted with comparative judgment, regarding both validity and reliability, in Ireland, England, Belgium, Sweden and the U.S. In these studies, comparative judgment has been used primarily to assess creative work in, for example, technology education and essay writing. The high reliability achieved reflects a professional consensus in the group of assessors. The software systems allow assessors to leave comments and explain why they judged one example to be better than the other. These comments can be used to identify criteria that describe what teachers consider to be important competencies in the subject and can also be given as feedback to both teachers and students during learning.
The comparative judgment process can be undertaken wherever is convenient, something that I and my American friends Dr. Scott Bartholomew and Dr. Greg Strimel, from Purdue University, took advantage of when we wanted to investigate differences and commonalities among teachers’ assessment practices in open-ended design scenarios across nations.
Teachers and educational researchers from the U.S., U.K. and Sweden were invited to assess an open-ended design scenario in engineering/technology education (a pill dispenser for a fictional forgetful client) made by 760 high school students. The judges assessed 175 portfolios and 175 products with comparative judgment via the cloud-based software Compare Assess. We undertook the whole study via the Internet and the judges did their assessment from their home couches or wherever they liked.4
Clarifying criteria for success
I strongly believe that teachers do everything in their power to move their pupils forward in their learning journeys. However, the direction of forward movement is not always obvious! Still, it is crucial for teachers to be clear about what they expect of their students. Such clarity benefits all students, and especially low achievers, and thus it may dramatically reduce the gap between low and high achievers.
Clarifying the learning intentions, consequences, and results of an assessment increases validity and reliability. But this clarity can be hard to achieve without spoiling the joy of learning. Furthermore, students’ perceptions of learning intentions may not match teachers’ expectations. Addressing this discrepancy is crucial to reducing the gap between low and high achievers, since low achievers generally find it more difficult to interpret what their teachers consider as criteria for success.
How do we overcome the discrepancy between teachers’ intention and students’ comprehension, and at the same time promote thinking by encouraging pupils to express themselves, reflect upon their own and others’ ideas, and expand their horizons? The Irish Technology Education Research Group (TERG) approached this challenge in the technology teacher education program at the University of Limerick by letting students peer assess one another’s work using comparative judgment. Specifically, students were asked to compare two pieces of work, choose which one was better and provide peer feedback comments in an iterative process. Feedback was matched to each exemplar and given back to the students, who were then given time to consider the feedback and develop their work before handing it in to the teacher for final assessment. The research team was overwhelmed by the students’ positive response to this intervention, reporting that the iterative process of comparative judgment was valuable for improving their understanding of the nature of technology – much better than with rubrics, according to the students themselves.
The TERG study5 also reported how valuable students found providing and receiving peer feedback via comparative judgment. Follow-up of students’ progress suggested that low-achieving students benefitted more than high-achieving students from seeing exemplars of other students’ work, as the low-achieving students made the greatest leap.
Rubrics vs exemplars to convey learning intentions
There are different ways to share learning intentions. Using comparative judgment to identify exemplars of quality work is one example. More traditionally, a teacher can post learning intentions on the blackboard and then have the pupils copy them into their workbooks, where they will likely never be reviewed again.
One popular approach involves the use of rubrics. However, rubrics are often written in teacher-friendly language, such that students and teachers may interpret them in different ways. Furthermore, I wonder why rubrics are so often divided into three columns. Is learning always a three-step process? Therefore, I prefer exemplars to rubrics. This preference is not just based on what I like – the advantage of exemplars is considerable, as they articulate learning intentions in a richer way.
Using exemplars is like wine tasting, where you actually taste and discuss the wine. Rubrics, by comparison, are like reading a review of a wine without smelling or tasting it. By sharing exemplars from different contexts, educators can help students explore the true construct more deeply. Annotated exemplars give students an understanding of what quality looks like, especially when exemplars of different quality can be contrasted. Exemplars of student work may also promote discussion among learners. Using exemplars to explicate expectations and criteria for success for students is not cheating; instead, it is a way to invite students into a discussion of quality. Exemplars are valuable for learning, especially when used as part of instruction and in open-ended and problem-solving tasks, and have been found to reduce cognitive load.6 They have the greatest impact on learners at a lower level of mastery, and the effect on learning decreases as expertise grows. Therefore, evidence suggests that students gain the most when exemplars are presented at the beginning of the learning journey.
Working with exemplars
Using comparative judgment software systems is one way of working with exemplars, but the software is not required. Figure 1 is from a Japanese secondary classroom. Technology teachers used these exemplars in a dialogue with their pupils to articulate different levels of quality in electronics work, comparing the three exemplars with each other and with the students’ own work.
Figure 2 is from an arts classroom in Sweden. The teacher has illustrated the national criteria for grading with sunflowers of different quality. I showed these exemplars at a workshop on formative assessment and at first the participating teachers all agreed – but then suddenly a man raised his hand and objected to the shared consensus that the sunflower at the top was of highest quality. He informed us that he was more into abstract art, and therefore thought the sunflower at the bottom should be rated highest. Then the discussion about quality in artwork really took off; I wish I could have recorded it. The discussion ended in an agreement that is summed up by Winnie the Pooh when he says, “It’s best to know what you’re looking for before you look for it.” With this particular exemplar, the purpose of the task should be clarified. The sad part of this story is that this was the first time these teachers had had the opportunity to discuss this in depth with their peers.
Knowing where they are going makes it easier for students to get there, especially when they know what next step to take and in which direction. Conversely, pupils who are left on their own, trying to decode the mysterious path of learning without guidance or opportunities for reflection, may lose both interest and opportunities. When students are able to consider exemplars in dialogue with their peers, they may gain a richer understanding of what quality work looks like – just as teachers do when discussing exemplars with their professional peers.
Comparative judgment for professional learning
I believe that teachers can develop their assessment literacy and their nose for quality by being exposed to exemplars via the comparative judgment process and by being “forced” to justify their choices and discuss them with others within the profession. And why not start this journey during teacher education, by reviewing authentic exemplars and practicing how to provide feedback while taking teacher training courses? How often did you get a chance to see student work during your teacher training, and how often have you had the opportunity to share exemplars with peers? My experience tells me it is not common.
Comparative judgment via digital software is also a fairly easy way to invite others within the profession into your classroom practice. The teachers I have worked with in Sweden were particularly fond of seeing work by students other than their own, as it expanded their horizons. The interviews in the Hartell and Skogh study7 showed that teachers felt the comparative judgment method answered their need to collaborate with other teachers in the assessment process. Comparative judgment is useful for both training and ongoing refinement of teachers’ assessment practices. For example, you can investigate whether your standards have changed by blending last year’s students’ essays with the ones you have now, and then checking your comparative judgment outcomes against how you graded the work. To discover how your standards compare to your peers’, you can invite others to participate, then share and discuss your results together. A school in Oxford used this model to build consensus on the quality of student work. The project was initiated by the school head, not for accountability purposes but with the aim of strengthening teachers’ assessment practices to enhance equity for their pupils.
It is easy to get carried away with new approaches, and even though there are multiple applications of comparative judgment, appropriate use should always be kept in mind. First ask what decisions are to be made, then choose what data to collect and present. Depending on what a teacher wants his or her students to learn, the teacher must choose appropriate tasks and exemplars.
The foremost value I see in comparative judgment and exemplars is their capacity to serve as a catalyst for discussion. Just as wine connoisseurs taste and discuss wine, I see the potential of comparative judgment to foster teachers’ assessment literacy and self-efficacy. Comparative judgment is a useful tool to unpack teachers’ assessment practices, to uncover epistemological values and constructs, and to explicate criteria for success in a much deeper way. Above all, I see great potential in it as a way to invite learners into the mystery of learning.
Original illustrations: iStock
First published in Education Canada, March 2019
1 A. Pollitt, “The Method of Adaptive Comparative Judgment,” Assessment in Education: Principles, Policy & Practice 19, no. 3 (2012): 281–300.
2 L. L. Thurstone, “A Law of Comparative Judgment,” Psychological Review 34 (1927).
3 D. Laming, Human Judgment: The eye of the beholder (London: Thomson Learning, 2004).
4 See e.g. S. R. Bartholomew, E. Yoshikawa-Ruesch, E. Hartell, and G. J. Strimel, “Design Values, Preferences, Similarities, and Differences across Three Global Regions,” in PATT 36. Research and Practice in Technology Education: Perspectives on human capacity and development, eds. Seery, Buckley, Canty and Phelan (Athlone, Ireland: TERG, 2018), 432–440.
5 N. Seery, J. Buckley, T. Delahunty, and D. Canty, “Integrating Learners into the Assessment Process Using Adaptive Comparative Judgment with an Ipsative Approach to Identifying Competence Based Gains Relative to Student Ability Levels,” International Journal of Technology and Design Education (2018); N. Seery, D. Canty, and P. Phelan, “The Validity and Value of Peer Assessment Using Adaptive Comparative Judgment in Design Driven Practical Education,” International Journal of Technology and Design Education 22, no. 2 (2012): 205–226.
6 J. Sweller, “Cognitive Load During Problem Solving: Effects on learning,” Cognitive Science 12, no. 2 (1988): 257–285.
7 E. Hartell and I.-B. Skogh, “Criteria for Success: A study of primary technology teachers’ assessment of digital portfolios,” Australasian Journal of Technology Education 2, no. 1 (2015).