How to Easily Prepare Database Import Using Perl

I’m revisiting this topic because in my last post about translating Excel data into a text file for database import I used a php script and yet still had some manual steps (the one that bugged me most was adding custom line termination characters for multiline data). I mentioned the idea that if I wanted to put more time in I could use a Perl script to do the whole process. Well my next set of data I needed to import was almost 700 records spanning almost ten times as many lines in the tab delimited text file I exported from Excel (due to the description fields having newline characters). So here’s my latest solution:

  1. First I added a numeric column in the left-most position (within excel) and filled it with incremental integers.
  2. Then I exported the spreadsheet as tab delimited text
  3. Then I wrote a perl script to loop through the file and add line termination characters before each new record, this time I had to use double equal signs (==) because the data contained single equal signs
  4. I also found a perl module similar to the php htmlentities() function which actually worked for me, as opposed to the php function which I abandoned in favor of hard-coded string replacement commands
  5. Then I ran the perl script on my tab delimited text file to create a tab delimited file with double equal signs as the line terminator and all the funny characters encoded so they would show up correctly in a browser
  6. Finally I imported my file using the load data infile option in the phpmyadmin interface, specifying the delimiter and line termination as \t and == respectively.
  7. Then I looked at a webpage which pulled that data and it looked like it did in the excel file!

Here is the code for the perl script I created:

#!/usr/bin/perl
# Filename: prepareTextImport.pl - this script will take a list of data 
#	including multiline data where the first field is a line count.
#	It will add == before each line count to enable mysql importing and 
#	will encode html entities and convert newline characters to <br>
use HTML::Entities;

my $input = $ARGV[0]; #this is the file to parse
my $output = $ARGV[1]; #this is the file to create
my $lineTerminator = "==";
my $count=1;
my $line = '';
open (IN,$input) or die "can't open file $input\n";  
open (OUT, ">$output") or die "Can't open outfile : $output\n";
while (<IN>) {	# each line of the current input file
	chomp;
	$line .= "<br>"; # include this here so the <br> will be encoded too
	$line = encode_entities($_); #This is the function which encodes html symbols
	# Check if this is the beginning of a new record.  If so then add the line terminator
	if ($line =~ /^$count/) {
		$count++;
		$line = $lineTerminator.$line;
	}
	print OUT $line;
}
close IN;
close OUT;
print "there were $count records processed.\n";
exit;

The syntax for running the script (assuming perl is installed on your system – I use activeperl on my PC which is free):

perl prepareTextImport.pl inputFileName.txt outputFileName.txt

Leave a comment

Your email address will not be published. Required fields are marked *

CommentLuv badge