I talked to someone today who had a list of names in Excel, and wanted to sort them by last name. Unfortunately, they were not split into first / last names. What is the (non-programmer) to do? It’s actually a good opportunity to talk about doing stuff in Excel, since it tends to be a tool I end up using a lot to work with data. And this is a surprisingly common problem. It is also a nice way to talk a little bit about functions in Excel beyond “SUM”. (If you want to cut to the chase, hereis the Excel document.) So, let’s get started:
For reasons that may not appear immediately obvious, I’m going to treat this a bit like a programming exercise. Before starting to write it out in Excel, let’s figure out what we need to do. Take, for example, the following name:
How is it that we figure out what we need to do. Even though we are going for the surname, let’s start with the slightly easier version of finding out the first name.
First, we need to state what we want to do as clearly as possible:
In B1, put the left-hand word from A1
The LEFT Function
That’s good, but unfortunately, Excel doesn’t have a function “left-hand word”–it doesn’t even know what a word is! It does, however, have the function LEFT which takes a number of characters from the left-hand side of a string of characters. So, we have a solution:
If we type this in and hit ENTER, we end up with “George” in B1. Which is great for people with 6-letter surnames, but not-so-great for people with 5-letter surnames. So we need to rethink our specification, and make it more detailed:
In B1, put a string of characters representing everything left of the space in the string of characters in A1
OK, Excel doesn’t know what a “word” is, but we can tell it explicitly what the space is. Unfortunately, again, there isn’t a function “everything left of the space.” We have “LEFT” but it needs a specific number. So clearly, we need to find out how far from the left the space is. Luckily, there is a function that does that: FIND. Let’s try this:
|1||George Bush||=find(” “,A1)|
(Note that there is a SPACE between those two quotation marks.) If you hit enter, you will have a 7, since the space first shows up in position seven in the string. Now we’re getting somewhere.
So we know where the space is, and we have a way of getting strings of arbitrary sizes from the left-hand side of something. What if we put these two together. Let’s add a column.
So, what the formula in C1 does is take the string in A1, and chop of the first n letters, where n is taken from the formula in B1. Why are we subtracting 1 from that number? A first name doesn’t include the space, right? So we take one shy of the number up to the space.
Putting It All Together
In this case, you can probably see how to smoosh everything together. As we shall see later on, this isn’t always possible. Sometimes, you need interim solutions. But here we can put it all together in a single formula in a single cell:
|1||George Bush||=left(A1,find(” “,A1)-1)|
If you enter this formula and press enter, you have the first name.
What about the last name? If you think there might be a function RIGHT to go with LEFT, you would be right. So, all we need to do is figure out how many characters on the right we need. First, let’s make a useful mistake:
If you enter this formula, you will end up with “e Bush.” The FIND function only gives us the location of the space as a number of characters from the start (left-hand side) of a string of characters. How do we find the number from the right-hand side. Naturally, it is the total length of the string, minus the number of characters on the left-hand side. Luckily, we can get the total length of the string with the LEN function:
This gets us the length of the string. We still need to know where that space is:
|1||George Bush||11||=find(” “,A1)|
Now we have the two pieces we need, and we can get the right-hand part of the string we want:
And we end up with Bush. More importantly, just about any first-last combo you put in A1 will yield the correct last name. (If you try “Sting” or “George W. Bush” you run into troubles, but we’ll deal with one of those in a moment.)
Of course, we can again smoosh this all into a single-cell for economy and neatness:
|1||George Bush||=right(A1,len(A1)-find(” “,A1))|
Assuming that your list of names has only one space, you are in good shape. Just copy the formula in B1, and copy it down the whole list of names:
|5||George W. Bush||W. Bush|
Uh, oh! Some problems toward the bottom there. But generally, we have a solution as long as there is a single space with a first and last name. If that’s your case, you can stop now. Otherwise, things are going to get…
A Little Bit More Complicated
What do we do with a problem like “George W. Bush”? Since FIND looks for the first occurrence of a space, we end up taking everything right of the first space. Well, we could do that again and again until we had no space. That might work. What would that look like? Well, we start with our initial plan:
|1||George W. Bush||=right(A1,len(A1)-find(” “,A1)|
This gets us “W. Bush”. Now, we can copy the formula in B1 to C1. When we do this, it automatically adjusts so that it is looking at B1 instead of A1, so we end up with:
|1||George W. Bush||W. Bush||Bush|
Alright, so we’ve solved the problem of the middle initial!
Mixing It Up
But what if, like in the earlier example, some of our names have middle initials and some do not. Let’s try copying our formulas in B1 and C1 to a list of names:
|1||George W. Bush||W. Bush||Bush|
Well, it is getting the names, but they are all over the place. Sometimes, the correct name shows up in the B column, and sometimes the C. But we’re nearly there.
Getting In Line
OK, so shouldn’t we be able to put these all in the same column somehow? Of course we can! Let’s try again to state what we want to do as clearly as we can in English:
If there is an error in the C column, use the name in B; otherwise, use the one in the C column.
Naturally, computers are really good with statements like that. We have two new functions we’ll use to make this happen: IF, and ISERROR. IF is just that, it checks whether something is true and if it is, it puts something in a cell. If it isn’t, it does something else. ISERROR tells you whether an error appears in a cell. So we now have everything we need to make that statement. (We could do this in two separate cells, but I’m going to make a great leap forward, and just put it all in one. Generally, though, if something isn’t working, for example, you want to try in multiple cells first.)
|1||George W. Bush||W. Bush||Bush||=if(iserror(C1),B1,C1)|
In D1, up pops “Bush.” Why? Because the cell checks to see whether an error appears in C1. If it did, it would print out the contents of B1. But since it doesn’t, it prints out the contents of C1. What happens if it does find an error in C1? Well, we can copy and paste to find out.
|1||George W. Bush||W. Bush||Bush||Bush|
If you are thinking that there is a way to mush this all into a single cell, you are right, but I’ll leave that as an exercise to the reader. Also, we haven’t made the world safe for the single named, like Madonna and Sting. If either of these people are on your list, you should just remove them; they are already on plenty of lists. (NB: If you extend the logic we have used above, you can avoid deleting them.) Also, if you use Che’s real name (Ernesto Guevara de la Serna), you run into real problems, but parsing out extended names in the Hispanic tradition is something best left to humans.
Cleaning it up
Truth is, in many organizations, if you give this to your boss, he or she will be unhappy. Trust me, I know from experience: folks often do not like formulas in their spreadsheets. Especially if they didn’t write them, they often find them to be confusing or capricious. So, it may be worth locking these down as values. Highlight all the formulas that have generated your last name. Copy, and then Edit>Paste Special, selecting “Values.” Now you have a column that looks to all the world like you spent the afternoon typing surnames.