Question 1

The primary reason for using Excel to set up data frames is that people like to have the columns aligned. However, if there are not too many columns, it may be faster to do the job in a plain text editor first and align the columns with tabs. In your text editor, type in (or copy and paste from here) the following lines of text:

First String Second 1.22 3.4
Second More Text 1.555555 2.2220
Third x 3 124

Don’t worry about how many tab spaces are needed to set this up, just make sure the columns are aligned. Now, using a single regular expression, transform these lines into what we need for a proper .csv file:

First String,Second,1.22,3.4
Second,More Text,1.555555,2.2220
Third,x,3,124

Answer

Find: \h{2,}
Replace: ,

Question 2

A True Regex Story. I am preparing a collaborative NSF grant with a colleague at another university. One of the pieces of an NSF grant is a listing of potential conflicts of interest. NSF wants to know the first and last name of the collaborator and their institution. Here are a few lines of my conflict list:

Ballif, Bryan, University of Vermont
Ellison, Aaron, Harvard Forest
Record, Sydne, Bryn Mawr

However, my collaborator asked me to please provide to her the list in this format:

Bryan Ballif (University of Vermont)
Aaron Ellison (Harvard Forest)
Sydne Record (Bryn Mawr)

Write a single regular expression that will make the change.

Answer

Find:(\w+),\s(\w+),\s(.+)
Replace:\2 \1 (\3)

Question 3

A Second True Regex Story. A few weeks ago, at Radio Bean’s Sunday afternoon old-time music session, one of the mandolin players gave me a DVD with over 1000 historic recordings of old-time fiddle tunes.

The list of tunes (shown here as a single line of text) looks like this:

0001 Georgia Horseshoe.mp3 0002 Billy In The Lowground.mp3 0003 Cherokee Shuffle.mp3 0004 Walking Cane.mp3

Unfortunately, in this form, you can’t re-order the file names to put them in alphabetical order. I thought I could just strip out the leading numbers, but this will cause a conflict, because, for wildly popular tunes such as “Shove That Pig’s Foot A Little Further In The Fire”, there are multiple copies somewhere in the list.

All of these files are on a single line, so first write a regular expression to place each file name on its own line:

0001 Georgia Horseshoe.mp3
0002 Billy In The Lowground.mp3
0003 Cherokee Shuffle.mp3
0004 Walking Cane.mp3

Answer

Find: mp3\s
Replace: mp3\n

Question 4

Now write a regular expression to grab the four digit number and put it at the end of the title:

Georgia Horseshoe_0001.mp3
Billy In The Lowground_0002.mp3
Cherokee Shuffle_0003.mp3
Walking Cane_0004.mp3

Answer

Find: (\d{4})\s(.+)(\.mp3)
Replace: \2_\1\3

Question 5

Here is a data frame with genus, species, and two numeric variables.

Camponotus,pennsylvanicus,10.2,44
Camponotus,herculeanus,10.5,3
Myrmica,punctiventris,12.2,4
Lasius,neoniger,3.3,55

Write a single regular expression to rearrange the data set like this:

C_pennsylvanicus,44
C_herculeanus,3
M_punctiventris,4
L_neoniger,55

Answer

Find: (\w)\w+,(\w+),.+,(\d+)
Replace: \1_\2,\3

Question 6

Beginning with the original data set, rearrange it to abbreviate the species name like this:

C_penn,44
C_herc,3
M_punc,4
L_neon,55

Answer

Find: (\w)\w+,(\w{4})\w+,.+,(\d+)
Replace: \1_\2,\3

Question 7

Beginning with the original data set, rearrange it so that the species and genus names are fused with the first 3 letters of each, followed by the two columns of numerical data in reversed order:

Campen, 44, 10.2
Camher, 3, 10.5
Myrpun, 4, 12.2
Lasneo, 55, 3.3

Answer

Find: (\w{3})\w+,(\w{3})\w+,(.+),(\d+)
Replace: \1\2,\4,\3