How to import and export large datasets in mysql

originally posted at http://www.goodfeelingplace.com/how-to-import-and-export-large-datasets-in-mysql
I’ve seen many questions around the importing and exporting of large tables and/or databases in mysql, so I’d like to share how I’ve learned to manage those processes.  Using phpmyadmin for these tasks is not suitable because web servers are not designed for running large scripts or uploading/downloading large files.  You can changes the limits in phpmyadmin but there are several different limitations in that environment which you may run into.

For exporting large databases you can use the mysqldump command from a prompt (unix shell or dos prompt).  For exporting large tables, I login to mysql and use a query with the syntax “SELECT column1name, column2name, column3name, … FROM myDatabaseName.myTableName INTO OUTFILE ‘/path/to/outfilename.csv’ FIELDS TERMINATED BY ‘,’ ENCLOSED BY ‘\” ESCAPED BY ‘\\’;” – this gives you a csv file in the location you specify with the fields surrounded by single quotes, separated by commas, and escaped by a backslash.  The only thing that needs to be escaped in this context is a single quote.

For importing large databases created from mysqldump or phpmyadmin, I save the sql dump file into a local directory, cd to that directory (in unix shell or windows dos), login to mysql from the command line, then type the following command: source dbdump.sql (or whatever the dump file name is with your db data).

For importing large tables that were exported using the export syntax I mentioned above, I login to mysql and then type something like “LOAD DATA INFILE ‘/path/to/outfile.csv’ INTO TABLE myDatabaseName.myTableName FIELDS TERMINATED BY ‘,’ ENCLOSED BY ‘\” ESCAPED BY ‘\\’;”

Free MDB file to MySQL Conversion

originally posted at http://www.goodfeelingplace.com/free-mdb-file-to-mysql-conversion
This week I created a pretty useful tool I’d like to share. I was given an MS Access file (.mdb) and I needed a way to transfer the structure and the data into a MySQL database. I don’t have MS Access and I have no plans to purchase it, so I did a lot of searching to find free tools for getting at the data.

I found a few ways that worked ok, two ways that did what I needed and one that I liked the best. The one I liked the best was MDB Viewer Plus, an exe file that doesn’t need to be installed – you can get the latest version at http://www.alexnolan.net/software/mdb_viewer_plus.htm. I got the 1.5.1 version from another site found through google, but later discovered that I couldn’t get into an mdb file that required a password using that version. Once I went to the source of the program I got the 1.5.3 version and the password authentication worked great. The other program that worked fairly well was called Open Office Base, but exporting the data proved a bit tricky, as I had to first create reports that queried the tables before I could save them as text. I liked MDB Viewer Plus better because exporting a table is very easy, you can do it by clicking a tab with the table name you want to view it, then click the “Export Table” button to export it in one of nine different formats.

So once I got the tables exported from the mdb file into text files, then I looked around for an easy way to get it into mysql. Unfortunately phpmyadmin requires more than just field delimiters to import csv files, but also some character has to “enclose” every field. I did some more searching and found a simple php script posted on the codewalkers site that converted a type of csv file into a bunch of sql insert queries. This looked promising, so I modified it to ignore empty fields, convert dates into valid mysql date format, take tab delimited text as the input instead of semicolon delimited fields (I don’t know where you would get that kind of csv file anyway), and I added a table creation query with a user definable field type. I had a table with over 100 fields in it that I didn’t want to type out, so this was the script I used to generate the table creation query as well as the insert statements.

Below you will see a screenshot of the script with a small example tab delimited text file that I pasted into the main input box:

Download the zipped php file here

So the summarize, here are the steps to follow for this MDB file to MySQL transfer or migration:

  1. Use one of the free programs mentioned above like the MDB Viewer Plus to open the Access file and see what’s in it
  2. Save each table as a TXT file (which happens to be tab delimited when you use the MDB Viewer Plus)
  3. Download the php file I put together and shared above, and put it in the webroot of either a web hosting account you have on a remote server (like Blue Host, which I use for this site) or on your local webserver root. If you are interested in installing a free webserver easily you can try XAMPP, which I use on my PC.
  4. For each table you want to add to MySQL, type in the name and default field type (the type that most of your fields will be, so you will have to manually set as few of them as possible afterwards), and paste the tab delimited text of your table (with column names on the first line) into the big input box.
  5. Push the button at the bottom, and then use the first generated sql statement to create your table
  6. Change the field types that don’t match the default (like an autoincrement integer field for the index, or a few date fields)
  7. Then run the rest of the sql statements to populate your table and you’re done:)

Update: I just figured out how to do something that makes the whole process much easier for windows users. Using php you can connect to just about any database that exists, and query it directly. And for Access, an .mdb file counts as a database. So here’s the code to connect to a local mdb file using php on a windows machine:

$db_connection = new COM("ADODB.Connection", NULL, 1251);
$db_connstr = "DRIVER={Microsoft Access Driver (*.mdb)};" .
" DBQ=C:\Path to mdb file\MyAccessFileName.mdb; Password=MyPasswordHere; DefaultDir=C:\DataDir";
$db_connection->open($db_connstr);
$rs = $db_connection->execute("SELECT * FROM TableName");
$rs_fld0 = $rs->Fields(0);
$rs_fld1 = $rs->Fields(1);
while (!$rs->EOF) { // loop through the result set from the mdb file
$field1Value    = $rs_fld0->value;
$field2Value    = $rs_fld1->value;
$rs->MoveNext();
// Do something with the data here, like put it into mysql, put it into a file, or print it to the screen
echo "the first field value is $field1Value, the second field value is $field2Value<br>";
}
$rs->Close();
$db_connection->Close();

You can read more about it in the PHP manual at http://us3.php.net/manual/en/function.odbc-connect.php

How to enable SOAP on a LAMP server

originally posted at http://www.goodfeelingplace.com/how-to-enable-soap-on-a-lamp-server
I wanted to use the MSN search API on a VPS (Virtual Private Server) I have setup for a client, but the existing CentOS 5 operating system only had PHP5.1.6 and I needed PHP5.2.1 (at least) with SOAP enabled for the search API to work. So I downloaded all my code files, created a dump of the MySQL database (I had to use mysqldump from the command line for this, as the DB was 147Mb, way over the limit for phpmyadmin to export) and downloaded the db dump. Then I re-imaged the server using the OpenSUSE 10.3 (with plesk) operating system, which I was told would have at least PHP5.2.4 on it by the 1and1 server support person I spoke with.

After the re-imaging process was done (it takes about an hour), I used plesk to easily recreate my accounts, domain, database, and scheduled job. Then I used winscp to upload my gzipped database dump and the code files, logged into the newly created db from the command line through ssh and used the command “source dbdump.sql” from a mysql prompt (and from the same directory as where I gunzipped the dbdump.sql file) to populate the large database in under a minute. I also had to change the ownership of my webroot and everything in it to “default”, since that’s the name of the user I have running my scheduled job which reads and writes to the server. One requirement of running php in safe mode is that scripts can only read and write to files/directories owned by the same user who is running the script. There’s probably a way for me to turn off the PHP safe mode but the workaround was easy so I haven’t messed with it.

Once I got everything put back together I checked the phpinfo page and I was pleased to see that I now had php5.2.6 running. But when I tried using MSN search it still didn’t work, because SOAP wasn’t enabled. Then I did a bunch of searching and found several people in similar situations but I didn’t find any good answers. I tried to update the php configuration using yum and apt-get, but those commands were not on my system. Then I found an RPM file (from http://rpm.pbone.net) that contained a soap.so file in it, but when I tried to install it I got the message “rpmlib(PayloadIsLzma) <= 4.4.2-1”, which I learned was some kind of dependency that wasn’t met by my system. I was starting to get frustrated and the kids were playing kind of loud behind me, so I decided it would be best to take a break and try to feel better about it.

So I turned around and saw all 5 of my kids jumping on the couch. It was our basement couch so I didn’t mind, but I began to feel better and grabbed my camera. I started taking pictures of them and showing them the pictures, which they enjoyed, so we all had a good time. Here’s one with all of them jumping on the couch:

It reminded me of the children’s book, Eight Little Monkeys Jumping on the Bed. The last page (after they all fall off and bump their head and the doctor says “No more monkeys jumping on the bed!”) has them all on the couch and says “But no one said anything about jumping on the couch!”. So later after putting the children to bed I was feeling better and got an idea.

I remembered reading a post which suggested going to the specific operating system home page to get the appropriate version of the rpm file needed, so I searched for the opensuse home page and found http://software.opensuse.org/search, where I just chose my operating system from the dropdown and searched for soap. Actually they have many operating systems there, including RHEL, CentOS, Debian, Ubuntu, openSUSE, Fedora, Mandriva, and SLES. From the results page I just chose the php5-soap rpm for my version of php (5.2.6) and my system (I’m using a 64 bit system so I chose the x86_64 link, I assumed the other link was for 32 bit systems), downloaded it to my PC, then uploaded it to my server, used gunzip to decompress it, and ran the “rpm -i php5-soap-5.2.6-35.6.x86_64.rpm” command to install soap. This file passed all the dependency checks, then put the soap.so file in the right place on my system, and modified my php.ini file. Then all I had to do was restart the apache service and my LiveSearch sample page (using SOAP to interact with the MSN search engine) worked! By the way, for dedicated servers and virtual private servers I’ve had good experiences with 1and1, their server department is usually very helpful and quick to respond to questions or concerns. For shared hosting, I prefer Blue Host (which I use for this site).

How to allow up and down arrows to navigate your form fields

originally posted at http://www.goodfeelingplace.com/how-to-allow-up-and-down-arrows-to-navigate-your-form-fields
I learned something pretty cool last week. I fielded a request from a client who wanted to be able to use the up and down arrows on his keyboard to navigate a rather large form on the site I created for him. I believed it was possible but I had never done it. So after doing some research I figured out how to do it. Below is a simplistic version (put your cursor in one of the text boxes below and push the up or down arrow keys to move between them):

1-1 1-2 1-3
2-1 2-2 2-3
3-1 3-2 3-3

The HTML for the simple table is as follows:

<table border="1">
<tr>
<td>1-1</td>
<td><input id="ItemCost11" onkeydown="return checkKey(event,3,3,'ItemCost',1,1)" maxlength="30" name="ItemCost11" size="3" /></td>
<td>1-2</td>
<td><input id="ItemCost12" onkeydown="return checkKey(event,3,3,'ItemCost',1,2)" maxlength="30" name="ItemCost12" size="3" /></td>
<td>1-3</td>
<td><input id="ItemCost13" onkeydown="return checkKey(event,3,3,'ItemCost',1,3)" maxlength="30" name="ItemCost13" size="3" /></td>
</tr>
<tr>
<td>2-1</td>
<td><input id="ItemCost21" onkeydown="return checkKey(event,3,3,'ItemCost',2,1)" maxlength="30" name="ItemCost21" size="3" /></td>
<td>2-2</td>
<td><input id="ItemCost22" onkeydown="return checkKey(event,3,3,'ItemCost',2,2)" maxlength="30" name="ItemCost22" size="3" /></td>
<td>2-3</td>
<td><input id="ItemCost23" onkeydown="return checkKey(event,3,3,'ItemCost',2,3)" maxlength="30" name="ItemCost23" size="3" /></td>
</tr>
<tr>
<td>3-1</td>
<td><input id="ItemCost31" onkeydown="return checkKey(event,3,3,'ItemCost',3,1)" maxlength="30" name="ItemCost31" size="3" /></td>
<td>3-2</td>
<td><input id="ItemCost32" onkeydown="return checkKey(event,3,3,'ItemCost',3,2)" maxlength="30" name="ItemCost32" size="3" /></td>
<td>3-3</td>
<td><input id="ItemCost33" onkeydown="return checkKey(event,3,3,'ItemCost',3,3)" maxlength="30" name="ItemCost33" size="3" /></td>
</tr>
</table>

The javascript function used is as follows:

<script type="text/javascript"><!--
function checkKey(event, numRecs, numCols, myName, myRow, myCol) {
  var key = event.keyCode;
  var bSetFocus = false;
  var myId = '';
// if the left arrow has been pressed and we're not in the first column
  if( key == 37) {
  if (myCol-1 > 0) {
  myId = myName + myRow + (myCol-1);
  bSetFocus = true;
  }
  }
  // if the up arrow has been pressed and we're not in the first row
  if( key == 38) {
  if (myRow-1 > 0) {
  myId = myName + (myRow-1) + myCol;
  bSetFocus = true;
  }
  }
  // if the right arrow has been pressed and we're not in the last column
  if( key == 39) {
  if (myCol+1 <= numRecs) {
  myId = myName + myRow + (myCol+1);
  bSetFocus = true;
  }
  }
  // if the down arrow has been pressed and we're not in the last row
  else if( key == 40 ) {
  if (myRow+1 <= numCols) {
  myId = myName + (myRow+1) + myCol;
  bSetFocus = true;
  }
  }
  if (bSetFocus) document.getElementById(myId).focus();
} // end checkKey function
// --></script>

In my code I generate the HTML using PHP so when I pass the number of records to the checkKey function (3 in this example) it is the number of rows returned from a database query, and the current row for any given row of the table is the counter in a loop I use to print the table.

By the way, if you want to allow for the left and right arrows to move left and right in a form, you can do that by expanding the checkKey function to look for key code 37 (left arrow) and key code 39 (right arrow), and you’ll need to implement a column ordering scheme (like the counter I used for the rows) so you’ll know where to take the cursor.

Four Direction Code

(submitted by Mr Arrows below, but the comment fields do not accept code so I put it in the main post. I actually had trouble putting two working code samples in the same post so I just updated the main example above to include the left and right arrows)

What if there are values in each field?

These two javascript functions need to be defined, which will tell you the index of the start of the current selection and the end of the current selection (which will be the same if nothing is selected and you just have a blinking cursor in the input box). They both take as input the same object (called o below) that is passed into checkKey as the last param (called myObj there):

function getSelectionStart(o) {
  if (o.createTextRange) {
  var r = document.selection.createRange().duplicate();
  r.moveEnd('character', o.value.length);
  if (r.text == '') return o.value.length;
  return o.value.lastIndexOf(r.text);
  } else return o.selectionStart;
}
function getSelectionEnd(o) {
  if (o.createTextRange) {
  var r = document.selection.createRange().duplicate();
  r.moveStart('character', -o.value.length);
  return r.text.length;
  } else return o.selectionEnd;
}

Then you need to define some variables inside the checkKey function, to get the current input value and the index of the cursor position within that value:

  var inputVal = myObj.value,
  startIndex = getSelectionStart(myObj);
  endIndex = getSelectionEnd(myObj);

I only use startIndex because it will be the same as endIndex when nothing is highlighted or selected in the input box.

Next you can use a simple regular expression (or any other string function you want) to see if there is any text before (for the left arrow condition) or after (for the right arrow condition) the cursor. For example, this is the code I use when when the left arrow has been pressed:

  var testBC = inputVal.substring(0, startIndex);
  var blankRE=/^\s*$/;
  //alert("beforeCursor is '"+testBC+"');
  // if the regular expression test passes, then there is nothing to the left and we can move cells
  if (blankRE.test(testBC)) {
  var nextField = document.getElementById(columns[colIndex-1]+String(myRow));
  nextField.focus();
  //If we are moving into a select dropdown, restore the previously selected value half a second after it switches to the first or last option (I don't know how to prevent that, so this is my workaround):
  if (nextField.tagName=="SELECT") {
  var cur_ind = nextField.options.selectedIndex;
  var cur_val = nextField.options[nextField.options.selectedIndex].value;
  var cur_text = nextField.options[nextField.options.selectedIndex].text;
  setTimeout(function(){nextField.options.selectedIndex=cur_ind}, 5);
  }
  return false;
  }

Source Control Feels Good

originally posted at http://www.goodfeelingplace.com/source-control-feels-good
Today I want to sing praises for source control.  I was working on an inventory application this morning, putting a few final touches on the display of the data to make it more crisp and clean looking.  In this application I used the extjs grid view library (see www.extjs.com for more info) to display the data so the columns could be resized dynamically and it would look and act more like excel than a web page.  Unfortunately something I did this morning caused the column headers to get left justified and lose connection with the centered data, and I lost the scroll bars even though the data was bigger than the box (div) containing it.

After digging myself in a hole and getting angry and frustrated, my wise wife Candice convinced me to take a break and have lunch.  After she helped me move up the emotional scale a bit (and I had a good lunch), I was able to figure out why everything went crazy – it was because I took out my carefully calculated container sizing code and set the width and height to “auto”.  But at the time I was so excited to find a way to make the size exactly match the contents that I had deleted my calculations (eight different calculations, 2 for each of 4 different data grids).  This is where source control came to my rescue.  I use bazaar (see http://bazaar-vcs.org/) to save changes to my code, so once I figured out (based on my log entries) which revision I had deleted the code that I needed, I typed this simple command to create a text file with the code that I had deleted from 3 different files:

bzr diff -r156..157>heights.txt

(bzr diff shows the differences between versions, -r156..157 specifies the two versions I want to see the differences in, and >heights.txt causes a text file called heights.txt to be created in the current directory with the results of the request)

Then I just opened up the created text file and copied over the code that I wanted to put back in my current files, and created a new version with a command like this:

bzr commit -m “restored column header centering and scrollbars by setting the div height and width explicitly using calculations based on the number of records to display”

This whole procedure took a few minutes and felt great, because it saved me an hour of recreating the code while kicking myself for deleting it.  I love source control 🙂