Formative assessment – A Thaumaturgical Compendium

BadgePost Progress Report: peer assessment

alex — Wed, 09 May 2012 15:54:00 +0000

Over the last four semesters, beginning in the spring of 2011, I have been using a badge system that allows for peer review and the awarding of badges that can then be shared on the open badge infrastructure. As with many of my experiments with educational technologies, I figured the best way to learn what works is just to dive in and muddle through. I initially intended to start without any specific infrastructure, just running through the process via a wiki, but instead I coded a simple system for managing the badge process, and have tweaked it over time.

It doesn’t really work, but it works well enough, and thanks to some patient and very helpful students, I now know a great deal more about how badges can work in higher education. I make no claim to my successes being best practices, but I at least know more now than when I started, and figured I would share some of this experience.

Why did you do that?

More than a decade ago, I coded my first blog system for a course, though the term was not widely used then. I did it because there were particular kinds of interactions I wanted to encourage, and existing applications didn’t do quite what I wanted them to. I created my BadgePost system for the same reason. I am not really a coder (I dabble) but what I wanted did not exist, and so I took a shot at prototyping something that might work. (As an aside, I also hope that what happened with blogs happens with badges, and I can download the equivalent of WordPress soon instead of having to roll my own.) I knew I wanted:

Peer assessment. I wanted to get myself out of the role of sole reviewer. In many cases peers can give better advice than I can. One of the main difficulties of teaching is rewinding to the perspective of the student, and that can be easier, in some cases, for those who have just learned something. I wanted to enable that kind of open peer review in both hybrid courses and those taught entirely online.

Mastery. I also wanted desperately to get away from letter grades, as they seemed like a plague, not just for undergrad courses, but for grad as well. Students seemed far more interested in the grade than they were in learning something, a refrain I’ve heard frequently from a lot of my colleagues. I wanted to move the focus off of the grade.

Peers as cases. Students often ask me for models of good work, and because I change assignments so frequently, I rarely have a “model.” The advantage to open assessment that travels beyond a single course is that there are exemplars to look at, and (hopefully) they are diverse enough not to stifle creative interpretations by new students.

Unbundling the credential from the course. I had a number of problems that seemed to swirl around the equation of course time to learning objectives. For one, in the required technical courses, some people came in with nothing and others with extensive knowledge, and I wanted to try to address the issue of not all students moving through a program in lock-step. I wanted a back door to reduce redundancy and have instructors know that their students were coming into a course with certain skills. Finally, I wanted to give students a range of choices so that they could pursue the areas they were most interested in.

I also wanted non-paying non-Quinnipiac students participating in my courses to have a portable credential to show for it. And I wanted paying, matriculating students to have an easier way of communicating the kinds of things they had learned in the program.

I won’t cover all of these in detail, but will expound a bit more on the assessment and assessing piece…

Peer Assessment

There have been suggestions that the credentialing aspect of badges is separate from the process of assessment that leads to the badge, but in practice I think it’s both likely that they get rolled together, and beneficial when they are. Frankly, students don’t see the distinction, and they can reinforce each other in interesting ways. So, while I have done peer critique in the past, from the outset here, I wanted to get students involved in the process of granting badges via peer critique.

A lot of this was influenced by discussions with Philipp Schmidt and the application of badges in Peer2Peer University. I have long stated the goal of “disappearing” as an instructor in a course, and the place where that appearance is most obvious is when it comes to grading. (And assessment, not the same thing, but bound together.) From the outset, I saw the authority of a badge as vested in the material presented as evidence of learning, and the open endorsement/assessment of that work by peers.

Lots of reasons for this, but part was as a demotivator. That is, my least favorite question on the first day of classes is “how do I get an A?” I am always tempted to tell the truth: “I don’t care, and I wish you didn’t either.” So, I wanted badges to provide a way of getting away from that linear grading scale. I went so far as to basically throw grades out, saying that if you showed up on something approaching a regular basis, you’d get an A.

I should say that this was a failure. If anything, students paid more attention to grades because the unique system made them have to think about it. It wasn’t onerous, but a lot more of the course became about the assessment process. And it’s funny, my desire to escape grading as a focus and process turned a 180, and I am now all about assessment. I should explain…

I hate giving traditional tests (I don’t think they show anything), and hate empty work. And while I now know I like ideas around authentic assessment, from the outside these seemed a lot like more of the same. Now, not only do I think formative assessment is the key element of learning, but that the skill of assessing work in any field is what essentially defines expertise. Being able to tell what constitutes good work allows you to improve the work of others, and importantly, of yourself. At the core of teaching is figuring out what in a piece of work is good, what needs improvement, and how the creator can improve her work.

Beyond Binary

I had expected students to do the work, apply for a badge, and then either get it or not. A lot of other people new to badges seem to have a similar expectation. Just the opposite occurred, and a lot of the changes to my badge system have been to accommodate this.

First, a lot of work that really was not ready for a badge was submitted. I kind of expected students to be very sure of the work that they submitted for a badge, in part because of my experience with blogging in classes, and seeing that students were more careful about their writing when it was for a peer audience. Instead, students often presented work that was not enough for a badge, or barely enough for a badge. I was pleasantly surprised by how much feedback, and in what detail, students gave to their peers.

One of the more concrete changes I made to the system was to move from a binary endorsement (qualified or not, on a number of factors), to a sliding scale, with the center point being passing, and the ability of reviewers to come back and revise their “vote.” As a result, you can see from the evidence of a badge not just what the student has done, but whether their peers thought this was acceptable or awesome.
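A minimal sketch of what such a sliding-scale endorsement could look like. Everything here is an assumption made for illustration, not BadgePost's actual code: the 0-100 range, the `BadgeApplication` class, and the choice to combine votes by simple averaging are all hypothetical. What it shows is the two properties described above: the center of the scale counts as passing, and a reviewer's later vote replaces the earlier one.

```python
from dataclasses import dataclass, field
from statistics import mean

# Assumed scale; the post only describes a sliding scale with a passing midpoint.
SCALE_MIN, SCALE_MAX = 0, 100
PASSING = (SCALE_MIN + SCALE_MAX) / 2


@dataclass
class BadgeApplication:
    """Hypothetical record of one student's application for one badge."""
    applicant: str
    badge: str
    # One current vote per reviewer; voting again overwrites the earlier vote.
    votes: dict = field(default_factory=dict)

    def vote(self, reviewer: str, score: int) -> None:
        if not SCALE_MIN <= score <= SCALE_MAX:
            raise ValueError(f"score must be between {SCALE_MIN} and {SCALE_MAX}")
        self.votes[reviewer] = score

    def average(self) -> float:
        return mean(list(self.votes.values()))

    def passes(self) -> bool:
        # Averaging is an assumption; no votes means no decision yet.
        return bool(self.votes) and self.average() >= PASSING
```

Keeping the individual votes, rather than only a pass/fail flag, is what lets the badge evidence show whether peers found the work merely acceptable or well above the bar.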

I’ve also been surprised by how many nominated themselves for “aspirational” badges. When a user selects a badge, it is moved into their “pending” category, and I was confused by so many pending badges that had no evidence uploaded. But students seem to click on these as a kind of note to themselves that this is what they are pursuing. This, incidentally, leads to a problem for reviewers who look at a pending badge before it is ready, and find that process frustrating, but one of the things that needs to improve in the system is communicating such progress. I didn’t plan to need to do that, since I saw badges as an end point rather than a process.

The Reappearing Teacher

The other surprise was just how interested students were in getting my imprimatur. But the reason, in this case, was not the grade–they had that. They actually valued my response as an expert a bit more, I think. This was a refreshing change from students turning to the back page of a graded paper to see the grade, and then throwing it out before reading any of my comments. No doubt, some of this comes from a lack of confidence in their peers as well, and I’ve found that in some cases this lack is reasonable.

In some ways, I’m trying to encourage the sempai/kohai relationship, of those who have “gone before” and therefore have more to say about a particular badge. I’ve been reluctant to limit approval to only those who actually have the badge (in part for reasons I’ll note below regarding encouraging reviews), but I may do more of that. There are some kinds of assessment, though, that don’t require having the badge. I don’t need to know how to create a magic trick to be amazed by it, for example. So I don’t want to rule out this kind of “audience assessment.” There is also space for automated assessment. For example, for some badges you need to show a minimal number of tweets, or comments, or responses to comments, or (e.g.) valid HTML. There is no reason to have a human do these pieces of the assessments, though I would hate to see badges that did not involve human assessment, in large part because, again, I think building the capacity to do assessments is an important part of the system.
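The countable parts of a badge could be checked along these lines. This is a sketch under assumptions: the requirement names (`tweets`, `comments`) and both function names are hypothetical, and it deliberately covers only minimum counts, leaving anything like HTML validation and all qualitative judgment to other tools and to human reviewers.

```python
def automated_checks(activity: dict, minimums: dict) -> dict:
    """Report, for each counted requirement, whether the minimum is met.

    `activity` maps a requirement name to the count observed so far;
    a requirement missing from `activity` is treated as a count of zero.
    """
    return {
        name: activity.get(name, 0) >= needed
        for name, needed in minimums.items()
    }


def ready_for_human_review(activity: dict, minimums: dict) -> bool:
    """True only when every automated minimum has been met."""
    return all(automated_checks(activity, minimums).values())
```

For example, with minimums of 10 tweets and 5 comments, a student with 12 tweets and 3 comments would see the tweet requirement met and the comment requirement still open, and the work would not yet be passed along to human assessors.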

The Other Motivation

I began by hoping students would ignore the grading process, and have evolved to think that they should pay a lot of attention to assessment. In some courses, students have jumped into peer assessment. In others–and particularly the undergraduate course I’m teaching this semester–they were slow to get started. I want to think about why people assess, and how to motivate them to be involved.

When I did peer assessments in the pre-badge world, I assigned a grade for the quality of the assessment provided. I want to do something similar here, and a lot of this comes of a discussion with Philipp Schmidt in Chicago last year. The meta-project here is getting students to be able to analytically assess work and communicate that. Yes, you could do an “expert assessor” badge, or something similar, but really it is more essential to the overall project.

One way to do this is inter-coder reliability. If I am considered an expert in the area (and in the current system, this is defined as having badges at a higher level than the one in question, within the same “vertical”), those with less experience should be able to spot the same kinds of things I do, and arrive at a similar quantitative result on the assessments.
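The definition of "expert" given in parentheses above can be sketched as a small check. The `Badge` fields and the function name are hypothetical, chosen only to mirror the wording: a higher level than the badge in question, within the same vertical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Badge:
    """Hypothetical badge record: a name, the vertical it belongs to, and a level."""
    name: str
    vertical: str
    level: int


def is_expert_for(held_badges: list, target: Badge) -> bool:
    """True if the holder has a higher-level badge in the target's vertical."""
    return any(
        badge.vertical == target.vertical and badge.level > target.level
        for badge in held_badges
    )
```

Under this reading, holding the same badge, or a higher-level badge in a different vertical, does not make someone an expert for the badge under review.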

So, for example, if someone submits the write-up of a content analysis, two of her peers might look at it and come up with two very different assessments of the methods section of the article. Alice may say that it is outstanding, 90/100 on the scale of a particular rubric. Frank might disagree, putting it at 25/100. Of course, both would provide some textual explanation for why they reached these conclusions. Then I come along and give it a 30/100, along with my own critique. In this case, Frank should receive some sort of indication within the system that he has done a good job of performing the assessment.

The dynamics of getting students to do peer assessments (in some courses they did a lot, in others they have not), and my involvement in the assessment, are an interesting piece for me.
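The Alice and Frank example can be worked through with a simple closeness measure. This is only a sketch: the `agreement` function is a hypothetical stand-in (scaled absolute difference), not a formal inter-coder reliability statistic and not something the system is known to use. The scores are the ones from the example.

```python
SCALE_MAX = 100  # the rubric scale used in the example


def agreement(peer_score: float, expert_score: float) -> float:
    """Return 1.0 for identical scores, falling to 0.0 at opposite ends of the scale."""
    return 1 - abs(peer_score - expert_score) / SCALE_MAX


expert = 30
peers = {"Alice": 90, "Frank": 25}

# How close each peer came to the expert's score.
closeness = {name: agreement(score, expert) for name, score in peers.items()}

# The peer whose assessment most resembles the expert's.
closest = max(closeness, key=closeness.get)
```

Frank is 5 points from the expert score and Alice is 60 points away, so Frank comes out as the closer assessor. A measure like this says nothing about the quality of the written critique, which would still need a human reader.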

I’m still working out a way to do this that isn’t unnecessarily complex. Right now there is a karma system that gives users karma for performing assessments, with multipliers for agreeing with more experienced assessors, but this is complicated to “tune” and non-intuitive.

There is also the issue of when various levels perform the assessment. For the above process to work, Alice and Frank both need to get their assessments in before I do, and shouldn’t get the same kind of kudos for “me too” assessments after the fact.
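One way the karma multiplier and the timing rule might fit together is sketched below. All of the numbers (`BASE_KARMA`, `AGREEMENT_MULTIPLIER`, `AGREEMENT_TOLERANCE`) and the function itself are assumptions for illustration; the actual tuning is exactly the part described above as complicated.

```python
from datetime import datetime

BASE_KARMA = 1.0            # assumed karma for performing any assessment
AGREEMENT_MULTIPLIER = 2.0  # assumed bonus for matching a more experienced assessor
AGREEMENT_TOLERANCE = 10    # assumed: scores within 10 points count as agreeing


def karma_for(peer_score: int, peer_time: datetime,
              expert_score: int, expert_time: datetime) -> float:
    """Karma earned by one peer assessment, given the expert's later or earlier one."""
    agrees = abs(peer_score - expert_score) <= AGREEMENT_TOLERANCE
    # Only assessments submitted before the expert's can earn the bonus,
    # so a "me too" score entered afterwards gets base karma only.
    independent = peer_time < expert_time
    if agrees and independent:
        return BASE_KARMA * AGREEMENT_MULTIPLIER
    return BASE_KARMA
```

With these assumed values, an early assessment that lands near the expert's score earns double karma, while an early assessment that disagrees and a late assessment that agrees both earn the base amount.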

Badges

None of this is necessarily about badges, but it leaves a trail of evidence, conversation, and assessment behind. One of the big questions is whether badge records should be formative or summative. As I said, the degree to which students have engaged in badges as a process rather than an outcome came as a bit of a surprise to me. Right now, much of that process happens pretty openly, but I can fully understand how someone well on in their career may not want to expose fully their learning process. (“May” is operative here–I think doing so is valuable for the learning community!)

On the other hand, I think badges that appeal to authority undermine the whole reason badges are not evil. Badges that make an authoritative appeal (“Yale gave me this badge so it must be good.”) simply reinforce many of the bad structures of learning and credentialing that currently exist. Far better is a record of the work done to show that you understand something or can do something, along with the peers that helped you get there, pointed to and easily found via a digital badge.

Balancing the privacy needs with the need to authentically vest the badge with some authority will be an interesting feat. I suspect I may provide ways of hiding “the work” and only displaying the final version (and final critiques) to the outside world, while preserving the sausage-making process for the learning community itself. But this remains a tricky balance.

Badges: The Skeptical Evangelist

alex — Tue, 06 Mar 2012 06:53:57 +0000

I have been meaning to find a moment to write about learning badges for some time. I wanted to respond to the last run of criticisms of learning badges, and the most I managed was a brief comment on Alex Reid’s post. Now, with the announcement of the winners of this year’s DML Competition, there comes another set of criticisms of the idea of badges in learning. This isn’t an attempt to defend badges–I don’t think such a defence is necessary. It is instead an attempt to understand why they are worthy of such easy dismissal by many people.

Good? Bad?

My advisor one day related the story of a local news crew that came to interview him in his office. This would have been in the mid-1990s. The first question the reporter asked him was: “The Internet: Good? Or Bad?”

Technologies have politics, but the obvious answer to that obvious question is “Yes.” Just as when people ask about computers and learning, the answer is that technology can be a force for oppressive, ordered, adaptive multiple-choice “Computer Aided Teaching,” or it can be used to provide a platform for autonomous, participatory, authentic interaction. If there is a tendency, it is one that is largely reflective of existing structures of power. But that doesn’t mean you throw the baby out with the bathwater. On the whole, I think computers provide more opportunities for learning than threats to it, but I’ll be the first to admit that outcome was neither predestined nor obvious. It still isn’t.

Are there dangers inherent to the very idea of badges? I think there are. I’ve written a bit about them in a recent article on the genealogy of badges. But just as I can find Herb Schiller’s work on the role of computer technology in cultural hegemony compelling, but still entertain its emancipatory possibilities, I can acknowledge that badges have a long and unfortunate past, and still recognize in them a potential tool for disrupting the currently dominant patterns of assessment in institutionalized settings, and building bridges between informal and formal learning environments.

Ultimately, what is so confusing to me is that I agree wholeheartedly with many of the critics of badges, and reach different conclusions. To look at how some badges have been used in the past and not be concerned about the ways they might be applied in the future would require a healthy amount of selective perception. I have no doubt that badges, badly applied, are dangerous. But so are table saws and genetic engineering. The question is whether they can also be used to positive ends.

Over the last year, I’ve used badges to such positive ends. My own experience suggests that they can be an effective way of improving and structuring peer learning communities and forms of authentic assessment. I know others have had similar successes. So, I will wholeheartedly agree with many of the critics: badges can be poorly employed. Indeed, I suspect they will be poorly employed. But the same can be said of just about any technology. The real question is if there is also some promise that they could represent an effective tool for opening up learning, and providing the leverage needed to create new forms of assessment.

Gold Stars

One of the main critiques of badges suggests that they represent extrinsic forms of motivation to the natural exclusion of intrinsic motivation. Mitch Resnick makes the case here:

I worry that students will focus on accumulating badges rather than making connections with the ideas and material associated with the badges – the same way that students too often focus on grades in a class rather than the material in the class, or the points in an educational game rather than the ideas in the game.

I worry about the same thing. I will note in passing that at worst, he is describing a situation that does no harm: replacing a scalar (A-F letter grades) with a system of extrinsic motivation that is more multidimensional. But the problem remains: if badges are being used chiefly as a way of motivating students, this is probably not going to end well.

And I will note that many educators I’ve met are excited about badges precisely because they see them as ways of motivating students. I think that if you had to limit the influences of using badges to three areas, they would be motivation, assessment, and credentialing. The first of these is often seen as the most important, and not just by the “bad” badgers, but by many who are actively a part of the community promoting learning badges.

(As an aside, I think there are important applications of badges beyond these “big three.” I think they can be used, for example, as a way for a community to collaboratively structure and restructure their view of how different forms of local knowledge are related and I think they can provide a neophyte a map of this knowledge, and an expert a way of tracing their learning autobiography over time. I suspect there are other implications as well.)

Perhaps my biggest frustration is the ways in which badges are automatically tied to gamification. I think there are ways that games can be used for learning, and I know that a lot of the discussion around badges comes from their use in computer games, but for a number of reasons I think the tie is unfortunate; not least, badges in games are often seen primarily as a way of motivating players to do something they would otherwise not do.

Badges and Assessment

The other way in which I worry about computer gaming badges as a model is the way they are awarded. I think that both learning informatics and “stealth assessment” have their place, but if misapplied they can be very dangerous. My own application of badges puts formative assessment by actual humans (especially peers) at the core. Over time I have come to believe that the essential skill of the expert is an ability to assess. If someone can effectively determine whether something is “good”–a good fit, a good solution, aesthetically pleasing, interesting, etc.–she can then apply that to her own work. Only through this critical view can learning take place.

For me, badges provide a framework for engaging effectively in assessment within a learning community. This seems also to be true for Barry Joseph, who suggests some good and bad examples of badge use here. Can this kind of re-imagination of assessment happen outside of a “badge” construct? Certainly. But badges provide a way of structuring assessment that provides scaffolding without significant constraints. This is particularly true when the community is involved in the continual creation and revision of the badges and what they represent.

Boundary Objects

Badges provide the opportunity to represent knowledge and power within a learning community. Any such representation comes with a dash of danger. The physical structuring of communities: who gets to talk to whom and when, where people sit and stand, gaze–all these things are dangerous. But providing markers of knowledge is not inherently a bad thing, and particularly as learning communities move online and lose some of the social and cultural context, finding those who know something can be difficult.

This becomes even more difficult as people move from one learning community to another. Georg Simmel described the intersection of such social circles as the quintessential property of modern society. You choose your circles, and you have markers of standing that might travel with you to a certain degree. We know what these are, and the college degree is one of the most significant.

I went to graduate school with students who finished their undergraduate degrees at Evergreen State College, and have been on admissions committees that considered Evergreen transcripts in making admissions decisions. Evergreen provides narrative assessments of student work, and while I wholeheartedly stand by the practice–as a great divergence if not a model–it makes understanding a learning experience difficult for those outside the community. Wouldn’t it be nice to have a table of contents? A visual guide through a learning portfolio and narrative evaluation? A way of representing abilities and learning to those unfamiliar with the community in which it occurred?

I came to badges because I was interested in alternative ways of indicating learning. I think that open resources and communities of learning are vitally important, but I know that universities will cling to the diploma as a source of tuition dollars and social capital. Badges represent one way of nibbling at the commodity of the college diploma.

Badges, if done badly, just become another commodity: a replacement of authentic learning with a powerful image. To me, badges when done well are nothing more than a pointer. In an era when storing and transmitting vast amounts of content is simple, there is no technical need for badges as a replacement. But as a way of structuring and organizing a personal narrative, and relating knowledge learned in one place to the ideas found in another, badges represent a bridge and a pointer.

This is one reason I strongly endorsed the inclusion of an “evidence” url in the Mozilla Open Badge Infrastructure schema. Of course, the OBI is not the only way of representing badges, nor does it intend to represent only learning badges–there is a danger here of confusing the medium and the message. Nonetheless, it does make for an easier exchange and presentation of badges, and importantly, a way of quickly finding the work that under-girds a personal learning history.

All the Cool Kids Are Doing It

Henry Jenkins provides one of the most compelling cases against badges I’ve seen, though it’s less a case against badges and more a case against the potential of a badgecopalypse, in which a single sort of badging system becomes ubiquitous and totalizing. Even if such a badge system followed more of the “good” patterns on Barry Joseph’s list than the “bad,” it would nonetheless create a space in which participation was largely expected and required.

Some of this comes of the groups that came together around the badge competition. If it were, like several years ago, something that a few people were experimenting with on the periphery, I suspect we would see little conversation. But when foundations and technologists, the Department of Education and NASA, all get behind a new way of doing something, I think it is appropriate to be concerned that it might obliterate other interesting approaches. I share Jenkins’ worry that interesting approaches might easily be cast aside by the DML Competition (though I will readily concede that may be because I was a ~~loser~~“unfunded winner” in the competition) and hope that the projects that move forward do so with open, experimental eyes, allowing their various communities to help iteratively guide the application of badges to their own ends. I worry that by winnowing 500 applications to 30, we may have already begun to centralize what “counts” in approaches to badges. But perhaps the skeptical posts I’ve linked to here provide evidence to the contrary: that the competition has encouraged a healthy public dialog around alternative assessment, and badges represent a kind of “conversation piece.”

Ultimately, it is important that critical voices of approaches to badges remain at the core of the discussion. My greatest concern is that the perception that there are badge evangelists and skeptics is in fact true. I certainly think of myself as both, and I hope that others feel the same way.