“Leon Rothberg, Ph.D., a 58-year-old professor of English Literature at Ohio State University, was shocked and saddened Monday after receiving a sub-par mid-semester evaluation from freshman student Chad Berner. The circles labeled 4 and 5 on the Scan-Tron form were predominantly filled in, placing Rothberg’s teaching skill in the ‘below average’ to ‘poor’ range.”
So begins an article in what has become one of the truthiest sources of news on the web. But it is no longer time for mid-semester evals. In most of the US, classes are wrapping up, and professors are chest-deep in grading. And the students–the students are also grading.
Few faculty are great fans of student evaluations, and I think with good reason. Even the best designed instruments–and few are well designed–treat the course like a marketing survey. How did you feel about the textbook that was chosen? Were the tests too hard? And tell us, were you entertained?
Were the student evals used for marketing, that would probably be OK. At a couple of the universities where I taught, evals were made publicly available, allowing students a glimpse of what to expect from a course or a professor. While that has its own problems, it’s not a bad use of the practice. It can also be helpful for a professor who is student-centered (and that should be all of us) and wants to consider this response when redesigning the course. I certainly have benefited from evaluations in that way.
Their primary importance on the university campus, however, is as a measure of teaching effectiveness, and often as the main measure of it. That is especially true for tenure and, as many universities incorporate more rigorous post-tenure evaluation, increasingly true there as well.
Teaching to the Test
A former colleague, who shall remain nameless, noted that priming the student evals was actually pretty easy, and that it started with the syllabus. You note why your text choice is appropriate and how you are making sure grading is fair, and you indicate the methods you use to stay well organized and speak clearly. Throughout the semester, you keep using the terms that appear on the evals to make clear what an outstanding professor you really are. While not all the students may fall for this, a good proportion would, he surmised.
(Yes, this faculty member had ridiculously good teaching evaluations. But from what I knew, he was also an outstanding teacher.)
Or you could just change your wardrobe. Or do one of a dozen other things the literature suggests improves student evaluations.
Or you could do what my car dealership does and prominently note that you are going to be surveyed and if you can’t answer “Excellent” to any item, to please bring it to their attention so they can get to excellent. This verges on slimy, and I can imagine, in the final third of the semester, that if I said this it might even cross over into unethical. Of course, if I do the same for students–give them an opportunity to get to the A–it is called mastery learning, and can actually be a pretty effective use of formative assessment.
Or you could do what an Amazon seller has recently done for me, and offer students $10 to remove any negative evaluations. But I think that clearly crosses the line, both in Amazon’s case and in the classroom. (That said, I have on one occasion had students fill out evals in a bar after buying them a pitcher of beer.)
It is perhaps a testament to the general character of the professoriate that in an environment where student evaluations have come to be disproportionately influential on our careers, such manipulation–if it occurs at all–is extremely rare.
It’s the nature of the beast, though: we focus on what is measured. If what is being measured is student attitudes toward the course and the professor, we will naturally focus on those attitudes. While such attitudes are related to the ability to learn new material, they are not equivalent.
Doctor Feelgood
Imagine a hospital that promoted doctors (or dismissed them) based largely on patient reviews. Some of you may be saying “that would be awesome.” Given the way many doctors relate to patients, I am right there with you. My current doctor, Ernest Young, actually takes time to talk to me, listens to me, and seems to care about my health, which makes me want to care about my health too. So, good. And frankly, I do think that student (and patient) evaluation serves an important role.
But–and mind you I really have no idea how hospitals evaluate their staff–I suspect there are other metrics involved. Probably some metrics we would prefer were not (how many patients the doctor sees in an hour) and some that we are happy about (how many patients manage to stay alive). As I type this, I strongly suspect that hospitals are not making use of these outcome measures, but I would be pleased to hear otherwise.
A hospital that promoted only doctors who made patients think they were doing better, and who made important medical decisions for them, and who fed them drugs on demand would be a not-so-great place to go to get well. Likewise, a university that promotes faculty who inflate grades, reduce workload to nil, and focus on entertainment to the exclusion of learning would also be a pretty bad place to spend four years.
If we are talking about teaching effectiveness, we should measure outcomes: do students walk out of the classroom knowing much more than they did when they walked in? And we may also want to measure performance: are professors following practices that we know promote learning? The worst people to determine these things: the legislature. The second worst: the students. The third worst: fellow faculty.
Faculty should have their students’ learning evaluated by someone else. They should have their teaching performance peer reviewed–and not just by their departmental colleagues. And yes, well-designed student evaluations could remain a part of this picture, but they shouldn’t be the whole thing.
Buffet Evals
I would guess that 95% of my courses are in the top half on average evals, and that a slightly smaller percentage are in the top quarter. (At SUNY Buffalo, our means were reported against department, school, and university means, as well as weighted against our average grade in the course. Not the case at Quinnipiac.) So, my student evals tend not to suck, but there are also faculty who much more consistently get top marks. In some cases, this is because they are young, charming, and cool–three things I emphatically am not. But in many cases it is because they really care about teaching.
These are the people who need to lead reform of how teaching evaluations are used in tenure and promotion. It’s true, a lot of them probably like reading their own reviews, and probably agree with their students that they do, indeed, rock. But a fair number I’ve talked to recognize that these evals are given far more weight than they deserve. Right now, the most vocal opponents of student evaluations are those who are–both fairly and unfairly–consistently savaged by their students at the end of the semester.
We need those who have heart-stoppingly perfect evaluations to stand up and say that we need to not pay so much attention to evaluations. I’m not going to hold my breath on that one.
Short of this, we need to create systems of evaluating teaching that are at least reasonably easy and can begin to crowd out the student eval as the sole quantitative measure of teaching effectiveness.
BadgePost Progress Report: peer assessment
Over the last four semesters, beginning in the spring of 2011, I have been using a badge system that allows for peer review and the awarding of badges that can then be shared on the open badge infrastructure. As with many of my experiments with educational technologies, I figured the best way to learn what works is just to dive in and muddle through. I initially intended to start without any specific infrastructure, just running through the process via a wiki, but instead I coded a simple system for managing the badge process, and have tweaked it over time.
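For context, “sharing on the open badge infrastructure” ultimately means publishing a small, machine-readable assertion that points to the badge definition and to the evidence behind it. Below is a rough sketch of the shape such a hosted assertion might take; the field names follow the early Open Badges assertion format as I understand it, and every URL, identifier, and value is a made-up placeholder rather than output from my actual system.

```python
# A rough sketch of a hosted badge assertion, shaped roughly like the early
# Open Badges assertion format. All URLs, identifiers, and values are
# hypothetical placeholders, not output from my actual system.
assertion = {
    "uid": "example-2012-042",
    "recipient": {
        "type": "email",
        "hashed": True,
        "identity": "sha256$6f3c2e...",  # hashed student email
    },
    "badge": "https://example.edu/badges/peer-review.json",  # badge class definition
    "verify": {
        "type": "hosted",
        "url": "https://example.edu/assertions/042.json",
    },
    "issuedOn": "2012-12-10",
    # The part I care most about: a pointer to the student work and the open
    # peer assessments that justify the badge.
    "evidence": "https://example.edu/evidence/042",
}
```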
It doesn’t really work, but it works well enough, and thanks to some patient and very helpful students, I now know a great deal more about how badges can work in higher education. I make no claim to my successes being best practices, but I at least know more now than when I started, and figured I would share some of this experience.
Why did you do that?
More than a decade ago, I coded my first blog system for a course, though the term was not widely used then. I did it because there were particular kinds of interactions I wanted to encourage, and existing applications didn’t do quite what I wanted them to. I created my BadgePost system for the same reason. I am not really a coder (I dabble) but what I wanted did not exist, and so I took a shot at prototyping something that might work. (As an aside, I also hope that what happened with blogs happens with badges, and I can download the equivalent of WordPress soon instead of having to roll my own.) I knew I wanted:
Peer assessment. I wanted to get myself out of the role of sole reviewer. In many cases peers can give better advice than I can. One of the main difficulties of teaching is rewinding to the perspective of the student, and that can be easier, in some cases, for those who have just learned something. I wanted to enable that kind of open peer review in both hybrid courses and those taught entirely online.
Mastery. I also wanted desperately to get away from letter grades, as they seemed like a plague, not just for undergrad courses, but for grad as well. Students seemed far more interested in the grade than they were in learning something, a refrain I’ve heard frequently from a lot of my colleagues. I wanted to move the focus off of the grade.
Peers as cases. Students often ask me for models of good work, and because I change assignments so frequently, I rarely have a “model.” The advantage to open assessment that travels beyond a single course is that there are exemplars to look at, and (hopefully) they are diverse enough not to stifle creative interpretations by new students.
Unbundling the credential from the course. I had a number of problems that seemed to swirl around the equation of course time to learning objectives. For one, in the required technical courses, some people came in with nothing and others with extensive knowledge, and I wanted to try to address the issue of not all students moving through a program in lock-step. I wanted a back door to reduce redundancy and have instructors know that their students were coming into a course with certain skills. Finally, I wanted to give students a range of choices so that they could pursue the areas they were most interested in.
I also wanted non-paying non-Quinnipiac students participating in my courses to have a portable credential to show for it. And I wanted paying, matriculating students to have an easier way of communicating the kinds of things they had learned in the program.
I won’t cover all of these in detail, but will expound a bit more on the assessment and assessing piece…
Peer Assessment
There have been suggestions that the credentialing aspect of badges is separate from the process of assessment that leads to the badge, but in practice I think it’s both likely that they get rolled together, and beneficial when they are. Frankly, students don’t see the distinction, and they can reinforce each other in interesting ways. So, while I have done peer critique in the past, from the outset here, I wanted to get students involved in the process of granting badges via peer critique.
A lot of this was influenced by discussions with Philipp Schmidt and the application of badges in Peer2Peer University. I have long stated the goal of “disappearing” as an instructor in a course, and the place where my presence remains most obvious is when it comes to grading. (And assessment, not the same thing, but bound together.) From the outset, I saw the authority of a badge as vested in the material presented as evidence of learning, and in the open endorsement/assessment of that work by peers.
Lots of reasons for this, but part was as a demotivator. That is, my least favorite question on the first day of classes is “how do I get an A?” I am always tempted to tell the truth: “I don’t care, and I wish you didn’t either.” So, I wanted badges to provide a way of getting away from that linear grading scale. I went so far as to basically throw grades out, saying that if you showed up on something approaching a regular basis, you’d get an A.
I should say that this was a failure. If anything, students paid more attention to grades because the unique system made them have to think about it. It wasn’t onerous, but a lot more of the course became about the assessment process. And it’s funny, my desire to escape grading as a focus and process turned a 180, and I am now all about assessment. I should explain…
I hate giving traditional tests (I don’t think they show anything), and hate empty work. And while I now know I like ideas around authentic assessment, from the outside these seemed a lot like more of the same. Now, not only do I think formative assessment is the key element of learning, but that the skill of assessing work in any field is what essentially defines expertise. Being able to tell what constitutes good work allows you to improve the work of others, and importantly, of yourself. At the core of teaching is figuring out what in a piece of work is good, what needs improvement, and how the creator can improve her work.
Beyond Binary
I had expected students to do the work, apply for a badge, and then either get it or not. A lot of other people new to badges seem to have a similar expectation. Just the opposite occurred, and a lot of the changes to my badge system have been to accommodate this.
First, a lot of work that really was not ready for a badge was submitted. I kind of expected students to be very sure of the work that they submitted for a badge, in part because of my experience with blogging in classes, and seeing that students were more careful about their writing when it was for a peer audience. Instead, students often presented work that was not enough for a badge, or barely enough for a badge. I was pleasantly surprised by how much feedback, and in what detail, students gave to their peers.
One of the more concrete changes I made to the system was to move from a binary endorsement (qualified or not, on a number of factors), to a sliding scale, with the center point being passing, and the ability of reviewers to come back and revise their “vote.” As a result, you can see from the evidence of a badge not just what the student has done, but whether their peers thought this was acceptable or awesome.
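To make that concrete, here is a minimal sketch in Python (with hypothetical names and thresholds; the actual BadgePost code differs) of a revisable, sliding-scale endorsement, with the midpoint of the scale treated as passing.

```python
from dataclasses import dataclass, field

PASSING = 50  # midpoint of the 0-100 slider; at or above counts as passing

@dataclass
class Endorsement:
    reviewer: str
    score: int        # position on the 0-100 sliding scale
    comment: str = ""

@dataclass
class BadgeApplication:
    student: str
    badge: str
    evidence_url: str
    endorsements: dict = field(default_factory=dict)  # reviewer name -> Endorsement

    def vote(self, reviewer: str, score: int, comment: str = "") -> None:
        """Record a vote; voting again simply revises the reviewer's earlier vote."""
        self.endorsements[reviewer] = Endorsement(reviewer, score, comment)

    def status(self) -> str:
        """Summarize peer opinion: not just pass/fail, but acceptable vs. awesome."""
        if not self.endorsements:
            return "pending"
        avg = sum(e.score for e in self.endorsements.values()) / len(self.endorsements)
        if avg < PASSING:
            return "not yet"
        return "awesome" if avg >= 85 else "acceptable"

# A reviewer can come back and revise a vote after the student improves the work.
app = BadgeApplication("pat", "content-analysis", "https://example.edu/evidence/7")
app.vote("alice", 40, "Methods section needs more detail.")
app.vote("alice", 70, "Much better after revision.")  # replaces the earlier vote
print(app.status())  # "acceptable"
```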
I’ve also been surprised by how many students nominated themselves for “aspirational” badges. When a user selects a badge, it is moved into their “pending” category, and at first I was confused by so many pending badges that had no evidence uploaded. But students seem to click on these as a kind of note to themselves that this is what they are pursuing. This, incidentally, leads to a problem for reviewers who look at a pending badge before it is ready and find that frustrating; one of the things that needs to improve in the system is communicating such progress. I didn’t plan for that, since I saw badges as an end point rather than a process.
The Reappearing Teacher
The other surprise was just how interested students were in getting my imprimatur. But the reason, in this case, was not the grade–they had that. They actually valued my response as an expert a bit more, I think. This was a refreshing change from students turning to the back page of a graded paper to see the grade, and then throwing it out before reading any of my comments. No doubt, some of this comes from a lack of confidence in their peers as well, and I’ve found that in some cases this lack is reasonable.
In some ways, I’m trying to encourage the sempai/kohai relationship, of those who have “gone before” and therefore have more to say about a particular badge. I’ve been reluctant to limit approval to only those who actually have the badge (in part for reasons I’ll note below regarding encouraging reviews), but I may do more of that. There are some kinds of assessment, though, that don’t require having the badge. I don’t need to know how to create a magic trick to be amazed by it, for example. So I don’t want to rule out this kind of “audience assessment.” There is also space for automated assessment. For example, for some badges you need to show a minimum number of tweets, or comments, or responses to comments, or (e.g.) valid HTML. There is no reason to have a human do these pieces of the assessment, though I would hate to see badges that did not involve human assessment, in large part because, again, I think building the capacity to do assessments is an important part of the system.
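To illustrate those automatable pieces, here is a minimal, hedged sketch in standard-library Python. The names and thresholds are mine, and the HTML check is deliberately crude; a real system would hand the markup off to an actual validator rather than doing this itself.

```python
from html.parser import HTMLParser

def meets_threshold(items: list, minimum: int) -> bool:
    """Simple count check, e.g. 'at least 20 tweets' or 'at least 5 comments'."""
    return len(items) >= minimum

class TagBalanceChecker(HTMLParser):
    """Crude well-formedness check: flags obviously unclosed or mismatched tags.
    This is not real HTML validation; a production system would call an
    external validator instead."""
    VOID = {"br", "img", "hr", "meta", "link", "input"}

    def __init__(self):
        super().__init__()
        self.stack = []
        self.balanced = True

    def handle_starttag(self, tag, attrs):
        if tag not in self.VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.balanced = False

def roughly_well_formed(html_source: str) -> bool:
    checker = TagBalanceChecker()
    checker.feed(html_source)
    return checker.balanced and not checker.stack

# Example: automated prechecks before human reviewers ever see the evidence.
tweets = ["tweet one", "tweet two", "tweet three"]
print(meets_threshold(tweets, minimum=3))                  # True
print(roughly_well_formed("<p>hello <em>world</em></p>"))  # True
print(roughly_well_formed("<p>hello <em>world</p>"))       # False
```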
The Other Motivation
I began by hoping students would ignore the grading process, and have evolved to think that they should pay a lot of attention to assessment. In some courses, students have jumped into peer assessment. In others–and particularly the undergraduate course I’m teaching this semester–they were slow to get started. I want to think about why people assess, and how to motivate them to be involved.
When I did peer assessments in the pre-badge world, I assigned a grade for the quality of the assessment provided. I want to do something similar here, and a lot of this comes out of a discussion with Philipp Schmidt in Chicago last year. The meta-project here is getting students to be able to analytically assess work and communicate that assessment. Yes, you could do an “expert assessor” badge, or something similar, but really this skill is more essential to the overall project.
One way to do this is inter-coder reliability. If I am considered an expert in the area (and in the current system, this is defined as having badges at a higher level than the one in question, within the same “vertical”), those with less experience should be able to spot the same kinds of things I do, and arrive at a similar quantitative result on the assessments.
So, for example, if someone submits the write-up of a content analysis, two of her peers might look at it and come up with two very different assessments of the methods section of the article. Alice may say that it is outstanding, 90/100 on the scale of a particular rubric. Frank might disagree, putting it at 25/100. Of course, both would provide some textual explanation for why they reached these conclusions. Then I come along and give it a 30/100, along with my own critique.
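In code, that comparison is just a matter of distance from the expert’s score. A hedged sketch (not how the system actually computes it) using the numbers from the example:

```python
def agreement_with_expert(peer_score: int, expert_score: int, scale: int = 100) -> float:
    """Agreement in [0, 1]: 1.0 means the peer matched the expert exactly."""
    return 1 - abs(peer_score - expert_score) / scale

# Alice said 90, Frank said 25, and I came in at 30.
print(agreement_with_expert(90, 30))  # 0.4  -- Alice was far off the expert view
print(agreement_with_expert(25, 30))  # 0.95 -- Frank was very close
```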
In this case, Frank, whose assessment landed close to mine, should receive some sort of indication within the system that he has done a good job of performing the assessment. The dynamics of getting students to do peer assessments (in some courses they did a lot, in others they did not), and of my own involvement in the assessment, remain an interesting piece for me.
I’m still working out a way to do this that isn’t unnecessarily complex. Right now there is a karma system that gives users karma for performing assessments, with multipliers for agreeing with more experienced assessors, but this is complicated to “tune” and non-intuitive.
There is also the issue of when various levels perform the assessment. For the above process to work, Alice and Frank both need to get their assessments in before I do, and shouldn’t get the same kind of kudos for “me too” assessments after the fact.
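Putting the karma and the timing rule together, here is a simplified sketch; the base value, multiplier, and threshold are placeholders of mine, not what the system actually uses.

```python
BASE_KARMA = 10           # awarded for doing an assessment at all
AGREEMENT_MULTIPLIER = 2  # bonus factor for tracking the expert's judgment
AGREEMENT_THRESHOLD = 0.8 # how close to the expert counts as "agreement"

def karma_for_assessment(peer_score: int,
                         expert_score: int,
                         submitted_before_expert: bool) -> int:
    """Karma for a single peer assessment on the 0-100 scale.

    The agreement bonus only applies if the peer's assessment was filed before
    the expert's, so "me too" assessments after the fact earn no extra credit.
    """
    karma = BASE_KARMA
    agreement = 1 - abs(peer_score - expert_score) / 100
    if submitted_before_expert and agreement >= AGREEMENT_THRESHOLD:
        karma *= AGREEMENT_MULTIPLIER
    return karma

# Frank assessed before I did and landed close to my 30: bonus applies.
print(karma_for_assessment(25, 30, submitted_before_expert=True))   # 20
# Alice also assessed early but was far off: base karma only.
print(karma_for_assessment(90, 30, submitted_before_expert=True))   # 10
# A near-identical score filed after mine gets no multiplier, however close.
print(karma_for_assessment(31, 30, submitted_before_expert=False))  # 10
```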
Badges
None of this is necessarily about badges, but it leaves a trail of evidence, conversation, and assessment behind. One of the big questions is whether badge records should be formative or summative. As I said, the degree to which students have engaged in badges as a process rather than an outcome came as a bit of a surprise to me. Right now, much of that process happens pretty openly, but I can fully understand how someone well on in their career may not want to expose fully their learning process. (“May” is operative here–I think doing so is valuable for the learning community!)
On the other hand, I think badges that appeal to authority undermine the whole reason badges are not evil. Badges that make an authoritative appeal (“Yale gave me this badge so it must be good.”) simply reinforce many of the bad structures of learning and credentialing that currently exist. Far better is a record of the work done to show that you understand something or can do something, along with the peers that helped you get there, pointed to and easily found via a digital badge.
Balancing the privacy needs with the need to authentically vest the badge with some authority will be an interesting feat. I suspect I may provide ways of hiding “the work” and only displaying the final version (and final critiques) to the outside world, while preserving the sausage-making process for the learning community itself. But this remains a tricky balance.