Importing County Voting Records Into Advokit - by John Barta, Morro Bay , California
Advokit works with many tables and fields of data within those tables. The Advokit system is highly structured. Unfortunately, local voter files come in all shapes, sizes, and qualities. So, you will first use your own computer to put that information into shape before it can be imported into Advokit. The process involves getting your local data and then your conversion of that data into a "comma separated values" (CSV) file that is usable by Advokit. The county data must be cleaned up and often parsed in order for Advokit to be able to convert and import it without error into the Advokit system data file which is a MySQL file. You upload your "cleaned up" CSV file to the proper location on the web server where it can be acted upon. The actual importation of data from the CSV file to the MySQL file is done on the web server using a perl script called "import.pl" which is part of the Advokit utilities.
The basic steps in the process are as follows:
1. Obtain the voter file from your registrar of voters. In our case the county provided the file on a CD in a "tab delimited" format. You may also wish to get from the county a printout or other copy of the file structure. This will contain valuable information when you start playing with the data on your personal computer. Sometimes the county information is kept in more than one data table. For example the voter information may not be in the same table as the precinct and district information. If you can get the district and precinct information table from the county you should do that to as it will be useful when later figuring out which candidates are running in which precincts within that jurisdiction. In our case the county voter file only contains information about which local precinct the voter is located in. The information about which districts contain that precinct is contained in a separate file. Keep a friendly relationship with your local voter registrar, it may pay dividends down the road.
2. Study the structure of the local voter file. You should try to understand what the information in each (of the about 60) fields represents. In our case I was able to speak to the "computer guy" at the county office and he was quite helpful in my understanding the purposes of each field in the file. Make a written description of the structure of the county file and each field that it contains. Explain what each field is used for. Once you look around at the data within the file, you will have a much better idea of how it is designed to work. You may find that some fields have no data inside at all. You may also find that two similar-sounding fields actually are not 100 percent the same data. This a good time to ask questions of your registrar because you may make a mistake if you rely on one field and later learn that the data is not exactly what you thought it was.
3. Study the fields that are requested in the Advokit mapping utility. You will need to know what Advokit is looking for so you can make the best of your local voter file data before trying to import it into Advokit.
The fields that Advokit will ask you about for its use in the "_voter", "_residence", "_mailaddress", and "_respolis" tables are as follows:
As you study the Advokit field information compare it to what your local voter file is offering. It is highly likely that you will discover that you'll have to make some changes to your local voter file data in order to use it with Advokit. For example, our local data has a single field for birthdate. This will need to be turned into 3 fields of data – birthyear, birthmonth, and birthdayofmonth. Advokit turns these three fields into the simple birthdate field itself. Also, it turns out that Advokit only stores party affiliation as a single letter – the first letter shown in the CSV field for party affiliation. Locally we have two affiliations that begin with the letter "D" Democrats (DEM) and "Declined to State" (DTS). So, we better change the DTS data in the local file to some other letter or we'll have an inaccurate picture of the number of Democrats in our county. Anyway, there will very likely be a number of opportunities to optimize the local data to make things work better for Advokit.
When you feel confident about what you need to do to your local data you should be able to list what massaging you will want to do to the local voter data before sending it off to Advokit.
4. Clean the county file. This is not the massaging that was mentioned above. It is the process of finding out-of-place or troublesome characters in the local data file. For example, local data containing double-quote (") could be extremely troublesome during the CSV process since the double-quote character is used extensively in that process. The advokit documentation has a link to the CSV standard which is useful. What I discovered is that if I cleaned out the double-quotes, single-quote (apostrophe)('), percent (%) character, semicolon(;), etc. I did not have a problem later. The CVS standard allows quite a number of things to remain in the file, but why take a chance on the odd stuff since it is easy to clean and rare or non-existent anyway?
Open the county file on your personal computer. In our case I used Microsoft Word to load the file. Our file was about 50 megabytes at that point in time and so the operations were a little slow at first. Since the tab character was only used as a field delimiter I was able to search through all fields of the entire file to look for "dirty" data – characters that might cause problems later in the process. The one thing that will for sure cause problems is the double-quote character " The other things I searched to eliminate were the tilde~, the backward apostrophe `, the semi colon; and the apostrophe '. The Advokit help files point you to a CSV standard which will show that most of these are okay, but I didn't want to take chances on how they might be handled by MySQL after being imported into Advokit, so I removed them from the file, too. Once I finished cleaning the file in MS Word I saved it for use by my local database program. In my case, I prefer to use good old Dbase for data manipulation. You may want to use some other program – Access or a local copy of MySQL, for example. Just make sure you save it in the correct form so that it can be imported by your data base program. In my case I saved it in Word as a "Text-only" file that was comma-delimited (with double-quotes around the data for each field).
[Note on how you clean your county file – Use your choice of programs to accomplish that goal, but beware that your early choices can create issues later, if you aren't careful. For example, I could import a tab-delimited county file directly into MS Access but Access will make certain assumptions about the type of data that is being imported – fields that always have numbers in them will be assumed to be integers and not text, for example. So, if you are importing into Access, you need to specify that all fields are text fields and not other data types (date and time, integer, floating, etc.) before or during the importation into Access. Access will not export the data later to a workable CSV file if the fields are not defined as text going in.]
5. Massage/Parse the clean local data file. Once you have loaded your local data into your personal database manipulation program you can start optimizing it for use by Advokit. This is where you will make the changes you planned for at the end of Step 3, previously. So, go in – split the birthdates into three fields, makes adjustments to the party affiliation codes, etc. This is the step where you will be able to add some real value to the county data by good parsing or concantation, etc. If you plan to add additional fields of data, this is the time to do it. In my Dbase case I saved my successful command lines in a file. I was able to then turn this file into a Dbase program script. So, the next time I need to massage the county file I'll be able to save much time by just running the Dbase program. If you use Access to do the local data work – be sure that you end up with all of the data in text fields – otherwise you will not end up with a usable CSV file.
[Note on the CSV standard – the standard calls for each field of data to be separated from the next field with a comma. If the data within the field is text data it is supposed to be surrounded by double quotes. If it is integer data there are no double quotes around the data. Here's the rub – The Advokit import utility wants to receive all of the data as text, regardless of its final disposition. So, a usable CSV file will be in the form of "field1data","field2data","field3data","field4data" etc. If you create a (perfectly legal) CSV file that looks like this: "field1data","field2data",field3data,"field4data" you will have problems (note that field3data is not surrounded by quotes because it was presumably an integer value). You need to be careful to export the data from your database program as all text with no integer, numeric, etc data types in it.]
6. Do final preparations for loading the CSV file onto the server. Make sure that the first record in the CSV file contains all of the field names in CSV format, i.e. "fieldname1","fieldname2","fieldname3", etc. Also, make sure that there are no extra carriage returns at the end of the last record – it may cause problems at the end of the import process. Load the CSV file onto the server. In my case I used a freeware FTP program to upload it into the appropriate directory (/voterfiles/). My file was 80 MB and this took about 45 minutes to upload to the server on my cable connection. It would have taken considerably longer on a dialup connection.
7. Download and alter the import.pl file Use your text file editor to include the information which may be unique to your situation- near the top of the file you will add appropriate information to the
my $CFG_DBPORT= 3306
my CFG $DBNAME='yourAdvokitDBDSN'
my $CFG_user='root'
my $CFG_DBPASS='yourMySQL_DBpassword'
my $CFG_DBPREFIX='ak'
lines. (Several of these parameters are unlikely to change. I.e. 'localhost', 3306, 'root' are examples of information that is not likely to change).
When you have completed editing the import.pl file – save it (it's a text file) and upload it to the server.
8. Mapping your data to the Advokit standard MySQL data. Log into Advokit as a Team Leader. This should have been set up previously from an administrative level. When you are logged in as a Team Leader you will see an option on the left menu bar to "Upload Voterfile". Take the link and follow your nose. You will first identify the CSV file that you uploaded to the server. When you click the "proceed" button you will be presented with a mapping utility. On the left side will be a list of all of the fieldnames from your CSV file. (Folks who have more fields will have a longer list.) This is when your knowledge about the Advokit fields will come in very handy. Using the selection lists in each column you can show the connection between specific fields in your CSV datafile and the fields that Advokit has arranged. When you are done you will click your way through (You'll have to confirm a check box along the way). This is also where you can add some custom fields if your CSV file has them in it. When you have completed the mapping process Advokit will have created a text file with the extension ".map" in the /voterfile/ directory where you previously had the CSV file and the import.pl file.
[Note on Mapping file: If you take a look at your map file you will see that all Advokit did was to create a CSV list of several columns – one column for each of the four tables and an extra one for the custom fields if you added them. Here's the first row of that file in my case:
"csv_column","respolis_column","residence_column","mailaddress_column","voter_column","custom_column"
what follows is a whole list of every field in the CSV file followed by the various Advokit fieldname(s) that correspond to that field in the order shown above. Thus, if you were a careful text editor of your mapping file, you could bypass the Mapping utility]
9. Time to Rock and Roll. All of the pieces are now in place to do the import itself. This is best done from the command line of the server by the server administrator due to permissions issues and server security policies which might interfere or make impossible your executing the command. It may be possible to arrange a Cron job to do this, if you have the appropriate permissions on the server. In our case we were on a shared hosting server with many other accounts. So, for simplicity, we emailed the system administrator and asked them to run the perl command from the server's command line. We gave them the exact command syntax to make it easier for them to just cut and paste the command onto their command line. The basic syntax is:
perl import.pl <mappingfilename.map> <csvfilename.csv>
(note the three spaces in this commandline)
In practice, the command can be quite long since it needs to have path information in it. Ours (ignore the wrapping, but not the spaces) was:
/usr/bin/perl /home/httpd/vhosts/ourdomain.org/aboverootdirectory/advokit/voterfiles/import.pl /home/httpd/vhosts/ourdomain.org/aboverootdirectory/voterfiles/SLOcsvTestSnippet060205.csv-2006-02-05-17-54-16.map /home/httpd/vhosts/ourdomain.org/aboverootdirectory/voterfiles/SLOvotersAll060206.csv
That's It! The import.pl script will do its thing and report what is/has happened on the system. This may take some time, so be patient. I am told that really long files can take some time to handle (ones in the hundreds of thousands of voter and up category) If all goes well you will be in business with voter data at that point. Our file had 150,000 records in it and it took about 50 minutes. (There were no complaints about how long it took from the system administrator, so we assume it was a background task. We also asked him to run it in the middle of the night)
10. Examine the results. When you have been notified of the outcome of the import.pl script, log on to the server and open phpMyAdmin where you can look at the data and all of the Advokit tables. You should see the same number of voters in the "_voters" table as your CSV file if this is your first upload. Note that the number of records in the "_residence" "_mailaddress" and "_respolis" tables will be a lesser number. This is because Advokit only stores one record in "_residence" for each house. Thus, because there are often several voters at the same address – that address will only appear once in the "_residence" table. Make sure the data is in the right places. Congratulations!
