How to Easily Prepare Database Import Using Perl

I’m revisiting this topic because in my last post about translating Excel data into a text file for database import I used a php script and yet still had some manual steps (the one that bugged me most was adding custom line termination characters for multiline data). I mentioned the idea that if I wanted to put more time in I could use a Perl script to do the whole process. Well my next set of data I needed to import was almost 700 records spanning almost ten times as many lines in the tab delimited text file I exported from Excel (due to the description fields having newline characters). So here’s my latest solution:

First I added a numeric column in the left-most position (within excel) and filled it with incremental integers.
Then I exported the spreadsheet as tab delimited text
Then I wrote a perl script to loop through the file and add line termination characters before each new record, this time I had to use double equal signs (==) because the data contained single equal signs
I also found a perl module similar to the php htmlentities() function which actually worked for me, as opposed to the php function which I abandoned in favor of hard-coded string replacement commands
Then I ran the perl script on my tab delimited text file to create a tab delimited file with double equal signs as the line terminator and all the funny characters encoded so they would show up correctly in a browser
Finally I imported my file using the load data infile option in the phpmyadmin interface, specifying the delimiter and line termination as \t and == respectively.
Then I looked at a webpage which pulled that data and it looked like it did in the excel file!

Here is the code for the perl script I created:
[sourcecode language=”perl”]
#!/usr/bin/perl
# Filename: prepareTextImport.pl – this script will take a list of data
# including multiline data where the first field is a line count.
# It will add == before each line count to enable mysql importing and
# will encode html entities and convert newline characters to
use HTML::Entities;

my $input = $ARGV[0]; #this is the file to parse
my $output = $ARGV[1]; #this is the file to create
my $lineTerminator = “==”;
my $count=1;
my $line = ”;
open (IN,$input) or die “can’t open file $input\n”;
open (OUT, “>$output”) or die “Can’t open outfile : $output\n”;
while () { # each line of the current input file
chomp;
$line .= “
“; # include this here so the
will be encoded too
$line = encode_entities($_); #This is the function which encodes html symbols
# Check if this is the beginning of a new record. If so then add the line terminator
if ($line =~ /^$count/) {
$count++;
$line = $lineTerminator.$line;
}
print OUT $line;
}
close IN;
close OUT;
print “there were $count records processed.\n”;
exit;
[/sourcecode]
The syntax for running the script (assuming perl is installed on your system – I use activeperl on my PC which is free):
[sourcecode language=”sh”]
perl prepareTextImport.pl inputFileName.txt outputFileName.txt
[/sourcecode]

How to Easily Prepare Database Import Using Perl

Leave a comment

Cancel reply