I’m revisiting this topic because in my last post about translating Excel data into a text file for database import I used a php script and yet still had some manual steps (the one that bugged me most was adding custom line termination characters for multiline data). I mentioned the idea that if I wanted to put more time in I could use a Perl script to do the whole process. Well my next set of data I needed to import was almost 700 records spanning almost ten times as many lines in the tab delimited text file I exported from Excel (due to the description fields having newline characters). So here’s my latest solution:
- First I added a numeric column in the left-most position (within excel) and filled it with incremental integers.
- Then I exported the spreadsheet as tab delimited text
- Then I wrote a perl script to loop through the file and add line termination characters before each new record, this time I had to use double equal signs (==) because the data contained single equal signs
- I also found a perl module similar to the php htmlentities() function which actually worked for me, as opposed to the php function which I abandoned in favor of hard-coded string replacement commands
- Then I ran the perl script on my tab delimited text file to create a tab delimited file with double equal signs as the line terminator and all the funny characters encoded so they would show up correctly in a browser
- Finally I imported my file using the load data infile option in the phpmyadmin interface, specifying the delimiter and line termination as \t and == respectively.
- Then I looked at a webpage which pulled that data and it looked like it did in the excel file!
Here is the code for the perl script I created:
#!/usr/bin/perl # Filename: prepareTextImport.pl - this script will take a list of data # including multiline data where the first field is a line count. # It will add == before each line count to enable mysql importing and # will encode html entities and convert newline characters to <br> use HTML::Entities; my $input = $ARGV[0]; #this is the file to parse my $output = $ARGV[1]; #this is the file to create my $lineTerminator = "=="; my $count=1; my $line = ''; open (IN,$input) or die "can't open file $input\n"; open (OUT, ">$output") or die "Can't open outfile : $output\n"; while (<IN>) { # each line of the current input file chomp; $line .= "<br>"; # include this here so the <br> will be encoded too $line = encode_entities($_); #This is the function which encodes html symbols # Check if this is the beginning of a new record. If so then add the line terminator if ($line =~ /^$count/) { $count++; $line = $lineTerminator.$line; } print OUT $line; } close IN; close OUT; print "there were $count records processed.\n"; exit;
The syntax for running the script (assuming perl is installed on your system – I use activeperl on my PC which is free):
perl prepareTextImport.pl inputFileName.txt outputFileName.txt