Over the last four semesters, beginning in the spring of 2011, I have been using a badge system that allows for peer review and the awarding of badges that can then be shared on the Open Badge Infrastructure. As with many of my experiments with educational technologies, I figured the best way to learn what works is just to dive in and muddle through. I initially intended to start without any specific infrastructure, just running through the process via a wiki, but instead I coded a simple system for managing the badge process, and have tweaked it over time.
It doesn’t really work, but it works well enough, and thanks to some patient and very helpful students, I now know a great deal more about how badges can work in higher education. I make no claim to my successes being best practices, but I at least know more now than when I started, and figured I would share some of this experience.
Why did you do that?
More than a decade ago, I coded my first blog system for a course, though the term was not widely used then. I did it because there were particular kinds of interactions I wanted to encourage, and existing applications didn’t do quite what I wanted them to. I created my BadgePost system for the same reason. I am not really a coder (I dabble), but what I wanted did not exist, and so I took a shot at prototyping something that might work. (As an aside, I also hope that what happened with blogs happens with badges, and I can download the equivalent of WordPress soon instead of having to roll my own.) I knew I wanted:
Peer assessment. I wanted to get myself out of the role of sole reviewer. In many cases peers can give better advice than I can. One of the main difficulties of teaching is rewinding to the perspective of the student, and that can be easier, in some cases, for those who have just learned something. I wanted to enable that kind of open peer review in both hybrid courses and those taught entirely online.
Mastery. I also wanted desperately to get away from letter grades, as they seemed like a plague, not just for undergrad courses, but for grad as well. Students seemed far more interested in the grade than they were in learning something, a refrain I’ve heard frequently from a lot of my colleagues. I wanted to move the focus off of the grade.
Peers as cases. Students often ask me for models of good work, and because I change assignments so frequently, I rarely have a “model.” The advantage to open assessment that travels beyond a single course is that there are exemplars to look at, and (hopefully) they are diverse enough not to stifle creative interpretations by new students.
Unbundling the credential from the course. I had a number of problems that seemed to swirl around the equation of course time to learning objectives. For one, in the required technical courses, some people came in with nothing and others with extensive knowledge, and I wanted to try to address the issue of not all students moving through a program in lock-step. I wanted a back door to reduce redundancy and have instructors know that their students were coming into a course with certain skills. Finally, I wanted to give students a range of choices so that they could pursue the areas they were most interested in.
I also wanted non-paying non-Quinnipiac students participating in my courses to have a portable credential to show for it. And I wanted paying, matriculating students to have an easier way of communicating the kinds of things they had learned in the program.
I won’t cover all of these in detail, but will expound a bit more on the assessment piece…
Peer Assessment
There have been suggestions that the credentialing aspect of badges is separate from the process of assessment that leads to the badge, but in practice I think it is both likely that the two get rolled together, and beneficial when they are. Frankly, students don’t see the distinction, and the two can reinforce each other in interesting ways. So, while I have done peer critique in the past, from the outset here I wanted to get students involved in the process of granting badges via peer critique.
A lot of this was influenced by discussions with Philipp Schmidt and the application of badges in Peer2Peer University. I have long stated the goal of “disappearing” as an instructor in a course, and the place where the instructor’s presence remains most obvious is grading. (And assessment, not the same thing, but bound together.) From the outset, I saw the authority of a badge as vested in the material presented as evidence of learning, and the open endorsement/assessment of that work by peers.
There were lots of reasons for this, but part of it was as a demotivator. That is, my least favorite question on the first day of classes is “how do I get an A?” I am always tempted to tell the truth: “I don’t care, and I wish you didn’t either.” So, I wanted badges to provide a way of getting away from that linear grading scale. I went so far as to basically throw grades out, saying that if you showed up on something approaching a regular basis, you’d get an A.
I should say that this was a failure. If anything, students paid more attention to grades because the unique system made them have to think about it. It wasn’t onerous, but a lot more of the course became about the assessment process. And it’s funny: my desire to escape grading as a focus and process did a 180, and I am now all about assessment. I should explain…
I hate giving traditional tests (I don’t think they show anything), and hate empty work. And while I now know I like the ideas around authentic assessment, from the outside they seemed a lot like more of the same. Now, not only do I think formative assessment is the key element of learning, but that the skill of assessing work in any field is what essentially defines expertise. Being able to tell what constitutes good work allows you to improve the work of others, and, importantly, of yourself. At the core of teaching is figuring out what in a piece of work is good, what needs improvement, and how the creator can improve her work.
Beyond Binary
I had expected students to do the work, apply for a badge, and then either get it or not. A lot of other people new to badges seem to have a similar expectation. Just the opposite occurred, and a lot of the changes to my badge system have been to accommodate this.
First, a lot of work that really was not ready for a badge was submitted. I kind of expected students to be very sure of the work they submitted for a badge, in part because of my experience with blogging in classes, where I saw that students were more careful about their writing when it was for a peer audience. Instead, students often presented work that was not enough, or barely enough, for a badge. I was pleasantly surprised by how much feedback, and in what detail, students gave to their peers.
One of the more concrete changes I made to the system was to move from a binary endorsement (qualified or not, on a number of factors), to a sliding scale, with the center point being passing, and the ability of reviewers to come back and revise their “vote.” As a result, you can see from the evidence of a badge not just what the student has done, but whether their peers thought this was acceptable or awesome.
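Roughly, an endorsement under this scheme looks something like the sketch below. This is simplified Python rather than the actual BadgePost code; the field names and the 0–100 scale (with 50 as the passing midpoint) are mine for illustration.

```python
from dataclasses import dataclass, field

PASSING = 50  # center point of an assumed 0-100 scale; at or above means "passing"

@dataclass
class Endorsement:
    reviewer: str
    score: int       # sliding scale rather than a binary qualified/not
    comment: str
    history: list = field(default_factory=list)

    def revise(self, new_score: int, new_comment: str) -> None:
        """A reviewer can come back and change their 'vote'; keep the old one."""
        self.history.append((self.score, self.comment))
        self.score = new_score
        self.comment = new_comment

    @property
    def passing(self) -> bool:
        return self.score >= PASSING

# A reviewer rates a submission as barely passing, then revises after improvements:
review = Endorsement("alice", 55, "Barely there; the methods section needs work.")
review.revise(80, "Much stronger after revision.")
print(review.passing)  # True
```

Keeping the revision history around is what lets the evidence show the trajectory, not just the final verdict.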
I’ve also been surprised by how many students nominated themselves for “aspirational” badges. When a user selects a badge, it is moved into their “pending” category, and I was initially confused by the many pending badges that had no evidence uploaded. But students seem to click on these as a kind of note to themselves that this is what they are pursuing. This, incidentally, frustrates reviewers who look at a pending badge before it is ready; communicating such progress is one of the things the system needs to do better. I hadn’t planned on needing to do that, since I saw badges as an end point rather than a process.
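One way I might communicate that progress is to make the state explicit rather than inferring it from whether evidence has been uploaded. A rough sketch (the state names are mine, not what the system currently does):

```python
from enum import Enum, auto

class BadgeState(Enum):
    ASPIRATIONAL = auto()  # selected as a goal; no evidence yet
    IN_PROGRESS = auto()   # evidence being assembled; not ready for review
    READY = auto()         # the applicant is explicitly asking for peer review
    AWARDED = auto()       # sufficient peer endorsement received
```

Reviewers would then only be pointed at badges in the READY state, and an aspirational click would stay a note-to-self rather than an apparent submission.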
The Reappearing Teacher
The other surprise was just how interested students were in getting my imprimatur. But the reason, in this case, was not the grade–they had that. They actually valued my response as an expert a bit more, I think. This was a refreshing change from students turning to the back page of a graded paper to see the grade, and then throwing it out before reading any of my comments. No doubt, some of this comes from a lack of confidence in their peers as well, and I’ve found that in some cases this lack is reasonable.
In some ways, I’m trying to encourage the senpai/kohai relationship, of those who have “gone before” and therefore have more to say about a particular badge. I’ve been reluctant to limit approval to only those who actually have the badge (in part for reasons I’ll note below regarding encouraging reviews), but I may do more of that. There are some kinds of assessment, though, that don’t require having the badge. I don’t need to know how to create a magic trick to be amazed by it, for example. So I don’t want to rule out this kind of “audience assessment.” There is also space for automated assessment. For example, for some badges you need to show a minimum number of tweets, or comments, or responses to comments, or (e.g.) valid HTML. There is no reason to have a human do these pieces of the assessment, though I would hate to see badges that did not involve human assessment, in large part because, again, I think building the capacity to do assessments is an important part of the system.
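For the countable criteria, the automated piece can be as simple as threshold checks, as in the sketch below. The thresholds are illustrative, not the ones any badge actually uses, and something like HTML validity would be handed off to an external validator rather than reimplemented.

```python
def meets_activity_minimums(tweets: int, comments: int, replies: int,
                            min_tweets: int = 20, min_comments: int = 10,
                            min_replies: int = 5) -> bool:
    """True if the evidence clears the simple countable thresholds for a badge."""
    return (tweets >= min_tweets
            and comments >= min_comments
            and replies >= min_replies)

print(meets_activity_minimums(tweets=25, comments=12, replies=6))  # True

# A check like "valid HTML" would be delegated to an external validation
# service rather than counted here.
```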
The Other Motivation
I began by hoping students would ignore the grading process, and have evolved to think that they should pay a lot of attention to assessment. In some courses, students have jumped into peer assessment. In others–and particularly the undergraduate course I’m teaching this semester–they were slow to get started. I want to think about why people assess, and how to motivate them to be involved.
When I did peer assessments in the pre-badge world, I assigned a grade for the quality of the assessment provided. I want to do something similar here, and a lot of this comes out of a discussion with Philipp Schmidt in Chicago last year. The meta-project here is getting students to be able to analytically assess work and communicate that assessment. Yes, you could do an “expert assessor” badge, or something similar, but really the skill itself is more essential to the overall project.
One way to do this is inter-coder reliability. If I am considered an expert in the area (and in the current system, this is defined as having badges at a higher level than the one in question, within the same “vertical”), those with less experience should be able to spot the same kinds of things I do, and arrive at a similar quantitative result on the assessments.
So, for example, if someone submits the write-up of a content analysis, two of her peers might look at it and come up with two very different assessments of the methods section of the article. Alice may say that it is outstanding, 90/100 on the scale of a particular rubric. Frank might disagree, putting it at 25/100. Of course, both would provide some textual explanation for why they reached these conclusions. Then I come along and give it a 30/100, along with my own critique.
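To put a number on that agreement, the crudest possible measure is just distance on the rubric scale. Real inter-coder reliability statistics (Krippendorff’s alpha and the like) need more than one item, but this captures the single-item intuition using the scores above:

```python
def agreement(peer_score: int, expert_score: int) -> float:
    """1.0 means identical scores; 0.0 means maximally far apart on a 0-100 scale."""
    return 1 - abs(peer_score - expert_score) / 100

print(agreement(90, 30))  # Alice: 0.40 -- far from the expert's read
print(agreement(25, 30))  # Frank: 0.95 -- close to the expert's read
```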
The dynamics of getting students to do peer assessments (in some courses they did a lot, in others they did not), and of my own involvement in the assessment, are an interesting piece for me. In this case, Frank should receive some sort of indication within the system that he has done a good job of performing the assessment.
I’m still working out a way to do this that isn’t unnecessarily complex. Right now there is a karma system that gives users karma for performing assessments, with multipliers for agreeing with more experienced assessors, but this is complicated to “tune” and non-intuitive.
There is also the issue of when assessors at various levels of experience weigh in. For the above process to work, Alice and Frank both need to get their assessments in before I do, and shouldn’t get the same kind of kudos for “me too” assessments after the fact.
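Putting those two paragraphs together, the karma logic amounts to something like the following sketch. It is simplified from what is actually running, and the constants are illustrative; “tuning” them is exactly the complicated part.

```python
BASE_KARMA = 10
AGREEMENT_MULTIPLIER = 2.0  # bonus for agreeing with a more experienced assessor
AGREEMENT_BAND = 10         # within 10 points of the expert counts as "agreeing"

def karma_for_assessment(peer_score: int, expert_score: int,
                         filed_before_expert: bool) -> float:
    """Base karma for assessing at all; a bonus only for independent agreement."""
    karma = float(BASE_KARMA)
    agrees = abs(peer_score - expert_score) <= AGREEMENT_BAND
    if agrees and filed_before_expert:  # no bonus for "me too" votes after the fact
        karma *= AGREEMENT_MULTIPLIER
    return karma

# Frank, assessing before I did and landing near my score, earns the bonus:
print(karma_for_assessment(25, 30, filed_before_expert=True))   # 20.0
print(karma_for_assessment(25, 30, filed_before_expert=False))  # just 10.0
```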
Badges
None of this is necessarily about badges, but it leaves a trail of evidence, conversation, and assessment behind. One of the big questions is whether badge records should be formative or summative. As I said, the degree to which students have engaged in badges as a process rather than an outcome came as a bit of a surprise to me. Right now, much of that process happens pretty openly, but I can fully understand how someone farther along in their career may not want to fully expose their learning process. (“May” is operative here–I think doing so is valuable for the learning community!)
On the other hand, I think badges that appeal to authority undermine the whole reason badges are not evil. Badges that make an authoritative appeal (“Yale gave me this badge so it must be good.”) simply reinforce many of the bad structures of learning and credentialing that currently exist. Far better is a record of the work done to show that you understand something or can do something, along with the peers that helped you get there, pointed to and easily found via a digital badge.
Balancing the privacy needs with the need to authentically vest the badge with some authority will be an interesting feat. I suspect I may provide ways of hiding “the work” and only displaying the final version (and final critiques) to the outside world, while preserving the sausage-making process for the learning community itself. But this remains a tricky balance.
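One possible shape for that balance, sketched very loosely: flag each critique (and each piece of evidence) as final or not, and filter what outsiders see. The names here are hypothetical, not how the system currently stores things.

```python
from dataclasses import dataclass

@dataclass
class Critique:
    text: str
    is_final: bool = False  # final critiques get shown to the outside world

def visible_critiques(critiques: list, community_member: bool) -> list:
    """The learning community sees all the sausage-making; outsiders see
    only the final version and final critiques."""
    if community_member:
        return critiques
    return [c for c in critiques if c.is_final]
```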