Program in GIS

GEOG 5223 Project 4:
Database Design

Brenton White
 

 

Proposed Design Summary

In order to:

  • Minimize use of disk storage space

  • Lessen the effort required of the summer intern to enter the data

  • Decrease the chance of data entry mistakes

  • Allow for easy and efficient querying within ArcMap

I decided to have one table for voter registration information and two lookup tables, one for ethnicity and one for party, even though I used coded values for the ethnicity and party ID fields.

Voter Registration Table (VoterReg)

Name Key Type Length/Precision
VoterID PK Long Integer 5
PinA   Text 3
PinB   Text 3
PinC   Text 3
EthnicityID FK Short Integer 2
Party FK Short Integer 2

Table 1: Voter registration table attributes.

Ethnicity Table (Ethnicity)

Name Key Type Length/Precision
EthnicityID PK Short Integer 2
Ethnicity   Text 15

Table 2: Ethnicity lookup table attributes.

Party Table (Party)

Name Key Type Length/Precision
PartyID PK Short Integer 2
Party   Text 10


Table 3: Party lookup table attributes.

Source:  Brenton White ArcMap Project 4 data.

Tables 1, 2 and 3 summarize the attributes of the three tables that I decide on to meet the objectives of the database.  I broke up the PIN number into three fields to keep from having a field with multiple values and to allow easy searches based on any of the three numbers (assuming that the numbers have some significant  meaning, such as geographic district, sub district, or the like).  I did use coded values for the EthnicityID and PartyID fields in the VoterID table to minimize data entry errors, and I included the lookup table as an example of what I would need if I didn't use coded values.

Why I Used the Attributes I Used

VoterReg Table

VoterID—I chose Long Integer for this field under the assumption that the values would not exceed five digits, nor 99999 in value.  Since I made this field a number, I didn't want to use the Short Integer since it would only allow values up to 32,767.  I gave the field a Length of 5 to conserve disk space and account of the values up to 99999.  I made this the Primary Key since there should only be one VoterID value for each person (as opposed to PIN, for which there could be multiple voters living on a particular parcel).  I did not allow NULL values since each voter MUST have a voter ID.  I also did not specify a Default Value as each value in this field is unique.

PinA
—I chose Text for this field with a Length of 3.  There did not seem to be a reason to have this be an integer since it didn't appear we'd need to do any arithmetic operations on the field and the third element of the PIN was alpha-numeric.  I did not allow NULL values as each person has to live somewhere.  I did not specify a Default Value since there are a minimal number of duplicate values. 

PinB
—I chose Text for this field with a Length of 3.  There did not seem to be a reason to have this be an integer since it didn't appear we'd need to do any arithmetic operations on the field and the third element of the PIN was alpha-numeric.  I did not allow NULL values as each person has to live somewhere.  I did not specify a Default Value since there are a minimal number of duplicate values.

PinC
—I chose Text for this field with a Length of 3.  There did not seem to be a reason to have this be an integer since it didn't appear we'd need to do any arithmetic operations on the field and the third element of the PIN was alpha-numeric.  I did not allow NULL values as each person has to live somewhere.  I did not specify a Default Value since there are a minimal number of duplicate values.

EthnicityID
—I chose Short Integer with a Length of 2 since the values for this field will not exceed 99.  This is a Foreign Key to join with the EthncityID field in the Ethnicity table.  I did not allow NULL values as the sample table had ethnicity identified for all voters and the information is important for the purpose of querying the data.  I could have specified a default value since "white" appeared to be the most common, but I did not specify a Default value to ensure that the operator had to select the ethnicity of the individual.  Also, there is a bias toward the most common ethnicity if it is the default and forgetting to change the default will cause mistaken identity (call me politically correct if you wish!). 

PartyID
—I chose Short Integer with a Length of 2 since the values for this field will not exceed 99.  This is a Foreign Key to join with the PartyID field in the Party table.  I did not allow NULL values as the sample table had party identified for all voters and the information is important for the purpose of querying the data.  I could have specified a default value since "Mugwumps," for example, appeared to be common, but I did not specify a Default value to ensure that the operator had to select the party of the individual. 

Ethnicity Table

EthnicityID—I chose Short Integer with a Length of 2 since the values for this field will not exceed 99.  This is the Primary Key since it is unique and the code for the ethnicity.  I did not allow NULL values as the primary key cannot have null values.  I did not specify a Default value since this is a look up table and the values are different for each record.

Ethnicity—I chose Text of Length 15 since the longest description had less than 15 characters and if I were to add new descriptions, I just abbreviate to fit within 15 characters.  I did not allow NULL values as each record should indicate an ethnicity.  I did not specify a Default value for the same reason as the Null value.

Party Table

PartyID—I chose Short Integer with a Length of 2 since the values for this field will not exceed 99.  This is the Primary Key since it is unique and the code for party.  I did not allow NULL values as the primary key cannot have null values.  I did not specify a Default value since this is a look up table and the values are different for each record.

Party—I chose Text of Length 10 since the longest description had less than 10characters and if I were to add new descriptions, I just abbreviate to fit within 10 characters.  I did not allow NULL values as each record should indicate an ethnicity.  I did not specify a Default value for the same reason as the Null value.

A word about why I didn't allow NULL values:  I could have allowed null values for the ethnicity and party fields in the VoterID table, but specifically chose not to because the purpose of a database it to store data—good data.  Null values just allow folks to be lazy and not fill in the data.  It is possible that for some records the ethnicity or party is not known and not allowing a null value might be a problem.  For this case, I would add a "Not Known" record to my ethnicity and party tables to ensure that something is recorded.

How I Adhered to Each of the Tenets

Minimize use of disk storage space

I minimized the use of disk storage space by keeping the tables to a minimum, reducing the size of redundant data (ethnicity and party ID fields), and specifying each field with an appropriate type and length.  I must admit that I am not sure if a Long Integer with length of 5 uses less data than Text of length 5, but I still chose to have the VoterID field be an integer instead of text since it was a number and that seemed most familiar. 

Lessen the effort required of the summer intern to enter the data

Did I lessen the effort of the summer intern?  Not really, since I didn't specify any default values.  I could (should?) have defaulted ethnicity on "White" since that APPEARED to be the most common, but the ACCURACY of the data was more important to me and, I believe, forcing the intern to make a selection was better for the goals of the database.

I did help the intern by keeping the design simple and providing coded values for the ethnicity and party fields.

Decrease the chance of data entry mistakes

The coded values for the ethnicity and party fields kept the intern from having to remember the code for each entry and, thus, should have decreased the chance of errors.  Also, splitting the PIN into three fields and not having them type the period (.), I believe, also helped to lower the chance of errors by removing two additional keystrokes per record.

Allow for easy and efficient querying within ArcMap

Whether using the coded values or joined lookup values, the design allows for a simple query to select all records of a particular ethnicity, race, PIN, etc.  For example, it is trivial to find all Hispanic voters who are Whigs.  Assuming that the values in the PIN have significance, I've also made it very easy to identify voters by the elements of the PIN.
 

Sources
 

GEOG 5223: Project 4: Database Design. August 2004.

 


This document is published in fulfillment of an assignment by a student enrolled in an educational offering of The Pennsylvania State University. The student, named above, retains all rights to the document and responsibility for its accuracy and originality.