|
Proposed Design Summary
In order to:
-
Minimize use of disk storage
space
-
Lessen the effort required of
the summer intern to enter the data
-
Decrease the chance of data
entry mistakes
-
Allow for easy and efficient
querying within ArcMap
I
decided to have one table for voter registration information and two lookup
tables, one for ethnicity and one for party, even though I used coded values for
the ethnicity and party ID fields.
Voter Registration Table (VoterReg)
| Name |
Key |
Type |
Length/Precision |
| VoterID |
PK |
Long Integer |
5 |
| PinA |
|
Text |
3 |
| PinB |
|
Text |
3 |
| PinC |
|
Text |
3 |
| EthnicityID |
FK |
Short Integer |
2 |
| Party |
FK |
Short Integer |
2 |
Table
1: Voter registration table attributes.
Ethnicity Table (Ethnicity)
| Name |
Key |
Type |
Length/Precision |
| EthnicityID |
PK |
Short Integer |
2 |
| Ethnicity |
|
Text |
15 |
Table 2: Ethnicity lookup table attributes.
Party Table (Party)
| Name |
Key |
Type |
Length/Precision |
| PartyID |
PK |
Short Integer |
2 |
| Party |
|
Text |
10 |
Table 3: Party lookup table attributes.
Source: Brenton White ArcMap Project 4 data.
Tables 1, 2 and 3 summarize the attributes of the
three tables that I decide on to meet the objectives of the database. I
broke up the PIN number into three fields to keep from having a field with
multiple values and to allow easy searches based on any of the three numbers
(assuming that the numbers have some significant meaning, such as
geographic district, sub district, or the like). I did use coded values
for the EthnicityID and PartyID fields in the VoterID table to minimize data
entry errors, and I included the lookup table as an example of what I would need
if I didn't use coded values.
Why I Used the Attributes I Used
VoterReg Table
VoterID—I chose
Long Integer for this field under the assumption that the values
would not exceed five digits, nor 99999 in value. Since I made this
field a number, I didn't want to use the Short Integer since it would only
allow values up to 32,767. I gave the field a Length of 5 to
conserve disk space and account of the values up to 99999. I made this
the Primary Key since there should only be one VoterID value for each
person (as opposed to PIN, for which there could be multiple voters living
on a particular parcel). I did not allow NULL values since each
voter MUST have a voter ID. I also did not specify a Default Value
as each value in this field is unique.
PinA—I chose Text for this field with a Length of 3.
There did not seem to be a reason to have this be an integer since it didn't
appear we'd need to do any arithmetic operations on the field and the third
element of the PIN was alpha-numeric. I did not allow NULL
values as each person has to live somewhere. I did not specify a
Default Value since there are a minimal number of duplicate values.
PinB—I chose Text for this field with a Length of 3.
There did not seem to be a reason to have this be an integer since it didn't
appear we'd need to do any arithmetic operations on the field and the third
element of the PIN was alpha-numeric. I did not allow NULL
values as each person has to live somewhere. I did not specify a
Default Value since there are a minimal number of duplicate values.
PinC—I chose Text for this field with a Length of 3.
There did not seem to be a reason to have this be an integer since it didn't
appear we'd need to do any arithmetic operations on the field and the third
element of the PIN was alpha-numeric. I did not allow NULL
values as each person has to live somewhere. I did not specify a
Default Value since there are a minimal number of duplicate values.
EthnicityID—I chose Short Integer with a Length of 2
since the values for this field will not exceed 99. This is a
Foreign Key to join with the EthncityID field in the Ethnicity table.
I did not allow NULL values as the sample table had ethnicity
identified for all voters and the information is important for the purpose
of querying the data. I could have specified a default value since
"white" appeared to be the most common, but I did not specify a Default
value to ensure that the operator had to select the ethnicity of the
individual. Also, there is a bias toward the most common ethnicity if
it is the default and forgetting to change the default will cause mistaken
identity (call me politically correct if you wish!).
PartyID—I chose Short Integer with a Length of 2 since
the values for this field will not exceed 99. This is a Foreign Key
to join with the PartyID field in the Party table. I did not allow
NULL values as the sample table had party identified for all voters and
the information is important for the purpose of querying the data. I
could have specified a default value since "Mugwumps," for example, appeared
to be common, but I did not specify a Default value to ensure that
the operator had to select the party of the individual.
Ethnicity Table
EthnicityID—I
chose Short Integer with a Length of 2 since the values for
this field will not exceed 99. This is the Primary Key since it
is unique and the code for the ethnicity. I did not allow NULL
values as the primary key cannot have null values. I did not
specify a Default value since this is a look up table and the values are
different for each record.
Ethnicity—I chose Text of Length 15 since the
longest description had less than 15 characters and if I were to add new
descriptions, I just abbreviate to fit within 15 characters. I did
not allow NULL values as each record should indicate an ethnicity.
I did not specify a Default value for the same reason as the Null
value.
Party Table
PartyID—I chose Short
Integer with a Length of 2 since the values for this field will
not exceed 99. This is the Primary Key since it is unique and
the code for party. I did not allow NULL values as the primary
key cannot have null values. I did not specify a Default value
since this is a look up table and the values are different for each record.
Party—I chose Text of Length 10 since the longest description
had less than 10characters and if I were to add new descriptions, I just
abbreviate to fit within 10 characters. I did not allow NULL
values as each record should indicate an ethnicity. I did not
specify a Default value for the same reason as the Null value.
A word about why I didn't allow
NULL values: I could have allowed null values for the ethnicity and party
fields in the VoterID table, but specifically chose not to because the purpose
of a database it to store data—good data. Null values just allow folks to
be lazy and not fill in the data. It is possible that for some records the
ethnicity or party is not known and not allowing a null value might be a
problem. For this case, I would add a "Not Known" record to my ethnicity
and party tables to ensure that something is recorded.
How I Adhered to Each of the Tenets
Minimize use of disk storage
space
I minimized the use of disk
storage space by keeping the tables to a minimum, reducing the size of
redundant data (ethnicity and party ID fields), and specifying each field
with an appropriate type and length. I must admit that I am not sure
if a Long Integer with length of 5 uses less data than Text of length 5, but
I still chose to have the VoterID field be an integer instead of text since
it was a number and that seemed most familiar.
Lessen the effort required of
the summer intern to enter the data
Did I lessen the effort of the
summer intern? Not really, since I didn't specify any default values.
I could (should?) have defaulted ethnicity on "White" since that APPEARED to
be the most common, but the ACCURACY of the data was more important to me
and, I believe, forcing the intern to make a selection was better for the
goals of the database.
I did help the intern by
keeping the design simple and providing coded values for the ethnicity and
party fields.
Decrease the chance of data
entry mistakes
The coded values for the
ethnicity and party fields kept the intern from having to remember the code
for each entry and, thus, should have decreased the chance of errors.
Also, splitting the PIN into three fields and not having them type the
period (.), I believe, also helped to lower the chance of errors by removing
two additional keystrokes per record.
Allow for easy and efficient
querying within ArcMap
Whether using the coded values
or joined lookup values, the design allows for a simple query to select all
records of a particular ethnicity, race, PIN, etc. For example, it is
trivial to find all Hispanic voters who are Whigs. Assuming that the
values in the PIN have significance, I've also made it very easy to identify
voters by the elements of the PIN.
|