Three ways to manipulate strings without retyping your data
by Stephen Nebel
Microsoft Excel 2000/2002/2003/2007
I have an Excel list that contains course names. The individual names contain both the course titles and the PDU (Professional Development Units) values. How can I extract the PDU values and transfer them as numeric values into column B while also deleting the PDU labels and values from the names in column A?
We’ve created a short sample of what your list might look like, as shown in Figure A. You can manipulate the text in column A without retyping your data, which can prove extremely time-consuming when you’re working with larger data sets.
In the course of this article, we’ll explore three different approaches to string manipulations of this sort: formulas, VBA and user-defined functions.
You can split the PDU information from its corresponding course name with a little string work.
Find a pattern
The first step is generally to look for a pattern in the data that you can exploit. Fortunately, the course names shown in Figure A are all consistent in these respects:
1. The second to the last character cluster always consists of digits and a possible decimal point representing the course’s PDU value.
2. The last character cluster always consists of the PDU label itself.
But the course names are inconsistent in these respects:
1. The digits representing the PDU values do not follow any consistent fixed decimal format.
2. The PDU label could be either singular or plural.
Utilize string functions to get the job done
One approach to the problem is to use Excel string functions. The Excel string functions aren’t a particularly rich set, but you’ve got a shot. Let’s take it one step at a time.
Set an anchor in your string
The Excel FIND() function can get you going by seeking out the text you want to get rid of. Figure B shows the use of the FIND() and LEFT() functions to clip the PDU acronym from the end of the course name.
Use the FIND() function in conjunction with the LEFT() function to delete the PDU description from the Course Name column.
The FIND() function takes three parameters:
1. The string you’re looking for.
2. The string you’re searching in.
3. The start position for the search.
Cell B2 in our example contains the formula =FIND("PDU",A2,1). The formula returns 24, which is the position in the string of the P in PDU. You’ve now got a foothold in the string.
Enter stage LEFT()
The LEFT() function takes two parameters:
1. The target string.
2. The character count.
In cell C2, we use the formula =LEFT(A2,B2-2) to store the left segment of the string, clipping “PDUs” from the right end of the string. This gives us our course name minus “PDUs” with the PDU value now in the last position. So where do you go from here?
If would be really nice if FIND() allowed you to search backwards, because then you could just search backwards to the first space from the end and extract the PDU value digits with the Excel MID() function. FIND() won’t scan backwards, though.
FIND() does allow you to start the search at an advanced point in the string, however. So take another look at the original data in Figure A. It looks like our longest digit string is 2.25 and the shortest is 1. Furthermore, each string is preceded by words five and six characters in length. So, if you start the search for a space from the fifth character from the end of the string, it looks like there’s a high probability we’ll hit the last space. Figure C shows the final phase of our conversion.
The FIND() function in column D locates the last space in the string so Excel knows where you want to split the string text.
Cell D2 contains the formula =FIND(" ",C2,LEN(C2) - 4). This searches for a space beginning from the fifth character from the end of the string. The value returned, 19, then makes it easy to do the final split. The formula in cell E2 is =LEFT(C2,D2), which returns the course name without the digits.
Combine MID() and VALUE()
The MID() function takes three parameters:
1. The source string.
2. The start point in the string.
3. The number of characters to return from the start point.
The formula in cell F2 is =VALUE(MID(C2,D2,5)), which returns the digits, converting them to numeric value. Note that even if you specify 5 characters, if the string has fewer than 5 characters from the start point to the end of the string, MID() just returns a shorter string.
But using all of these string functions really isn’t the most graceful solution. Worse, it’s fragile. It gets you by if you can depend on consistent conventions in the text, but PDU values wider than four places, or unexpected spaces in descriptions such as “Course A 1 PDU” will cause the #VALUE! error to show up in many of the columns.
Get a more reliable solution with VBA
Let’s take a look at a better solution for inconsistent string data using VBA.
To enter a VBA solution:
1. Create a sample worksheet based on the data shown in Figure A and save the workbook as Strings.xls.
2. Press [Alt][F11] to access the Visual Basic Editor (VBE).
3. In the Projects window, find VBAProject(Strings.xls). Click the box to the left as needed to open the project’s object tree.
4. Right-click in the general area below Strings.xls, and then choose Insert | Module from the shortcut menu.
5. Enter the code in Listing A, minus the line numbers, which we’ll use only to reference particular lines in the article.
Learn how the code works
We’ll do a general overview of the code first, then do a walk-through.
Listing A contains two procedures. The first, Parser(), is public and is primarily a loop that moves the cursor down through the list of course names. Parser() continues until it encounters a blank cell, then terminates execution. Each time through, Parser() calls the private procedure BreakUpString(). It passes the course name and length as parameters. BreakUpString() does just that — it strips the PDU value from the course name in column A, and then stores it as a numeric value in column B. Let’s look at the code.
Step through the Parser() procedure
At lines 4 and 6, the code declares the public integers ColumnOffset and RowOffset. These will be your location coordinates in the worksheet (in a really big worksheet you might want to make them long data types rather than integers). At lines 18 and 20, you set those integers to their start values — the equivalent of cell A2.
At lines 14 and 16, the code declares variables to get the cell contents — that is, the course names — and the course name string lengths. The code retrieves the names and establishes the lengths in the main loop, and then passes these values to the string break procedure.
At line 22, Excel enters the loop that will move down column A, retrieving and processing course names until it encounters a blank cell. At line 24, the code retrieves the cell contents using the Cells collection of the ActiveSheet object and our row and column pointers. At line 27, the code establishes the string length. At line 29, the code tests to see if the cell’s blank, and if so, the code exits the procedure.
At line 35, the code tests whether the string “PDU” is contained in the current string in the variable CellContents. VBA can only process strings that match the intended pattern, and this test helps ensure that the code doesn’t turn BreakUpString() loose on inappropriate material.
Move on to the BreakUpString() procedure
Assuming “PDU” does exist in the string, the code calls BreakUpString() at line 37, passing both the string and its length as parameters. Now the procedure jumps[...]