Primary keys: Natural versus Surrogate Keys

From Robs_Wiki
Jump to: navigation, search

Introduction

When you design tables with SQL Server, a table typically has a column or a number of columns that are known as the primary key. The primary key is a unique value that identifies each record. Sometimes the primary key is made up of real data and these are normally referred to as natural keys, while other times the key is generated when a new record is inserted into a table. When a primary key is generated at runtime, it is called a surrogate key. A surrogate key is typically a numeric value. Within SQL Server, Microsoft allows you to define a column with an identity property to help generate surrogate key values.

  • Natural Keys: A natural key is a single column or set of columns that uniquely identifies a single record in a table, where the key columns are made up of real data, data that has meaning and occurs naturally in the world of data. A natural key is a column value that has a relationship with the rest of the column values in a given data record. Some examples of natural keys values are: Social Security Number, ISBN, and TaxId.
  • Surrogate Keys: A surrogate key like a natural key is a column that uniquely identifies a single record in a table. But this is where the similarity stops. Surrogate keys are keys that don’t have a natural relationship with the rest of the columns in a table. The surrogate key is just a value that is generated and then stored with the rest of the columns in a record. The key value is typically generated at run time right before the record is inserted into a table. It is sometimes also referred to as a dumb key, because there is no meaning associated with the value. Surrogate keys are commonly a numeric number.

What to use as your Primary Key: A natural key or a surrogate key?

The opinions on this question vary with about 50% of the designers choosing the first and 50 % of the designers choosing the latter. Let's see the pro's and con's of each option:

Natural Key as Primary Key : Pros

  • Will require less joins when you only need to return the key value of a foreign key table. This is because the natural key will already be imbedded in your table.
  • Easier to search because natural keys have meaning and will be stored in your table. Without the natural key in your table, a search for records based on a natural key would require a join to the foreign key table to get the natural key.

Natural Key as Primary Key : Cons

  • Almost always, more columns to join on. If Table B is a detail table of master A and C is a detail to B and D is a detail to C, you will need at least 3 columns to join D to C in an SQL query. Als, since natural keys are typically larger in size then surrogate keys and are strings instead of integers joins between two tables on a natural key will take more time.
  • Your primary key index will be larger because natural keys are typically larger in size then surrogate keys.
  • If your uniqueness of a table changes (Ex: the number of Columns making your Table unique changes from 2 to 3), with Natural Keys, all of your SQL (Stored Procedures, Reports, Views, SQL Scripts, Application Code) will have to be re-written and your foreign keys relating to that table will have to be changed. With Surrogate Keys – usually as simple as modifying that table.
  • If the data type of a Primary Key column changes (Ex: you used a varchar(20), now it's not big enough and has to be changed to varchar(100)), with Natural Keys, all Foreign Keys related to that Table will have to be changed (may also impact SQL Code). With Surrogate Keys – usually as simple as modifying that table.
  • The Classic: a value changes in the Primary Key (Like Last Name). Now you've got to update that in every Foreign Key. Which means you'll have a big headache when you have to temporarily drop the constraint(s).


In sum, natural keys may be appropriate then you have small,rarely changing, unique values (such as lookup tables),while surrogate keys are desirable for large, highlychanging data key columns with duplicate key values (like"people" tables).

Surrogate Key : Pros

  • When two tables are joined by a surrogate key there is an immediate loss of semantic information which would exist if natural keys were used instead. For example, what's the "real" relationship between the two tables and which of the "real" columns are involved in this relationship?
  • No business logic in key so no changes based on business requirements. For example, if the Employee table above used a integer surrogate key you couldsimply add a separate column for SIN if you added an office in Canada (to beused in place of SSN)
  • Less code if maintaining same key strategy across all entities. Forexample, application code can be reused when referencing primary keys if theyare all implemented as a sequential integer.
  • Better performance since key value is smaller. Less disk IO is requiredon when accessing single column indexes.
  • Surrogate key is guaranteed to be unique. For example, when movingdata between test systems you don't have to worry about duplicate keyssince new key will be generated as data is inserted.
  • If a sequence used then there is little index maintenance required sincethe value is ever increasing which leads to less index fragmentation.

Surrogate Key : Cons

  • Extra column(s)/index for surrogate key will require extra disk space
  • Extra column(s)/index for surrogate key will require extra IO when insert/updatedata
  • Requires more table joins to child tables since data has no meaning on itsown.
  • Can have duplicate values of natural key in table if there is no other uniqueconstraint defined on the natural key
  • Difficult to differentiate between test and production data. For example,since surrogate key values are just auto-generated values with no business meaningit's hard to tell if someone took production data and loaded it into atest environment.
  • Key value has no relation to data so technically design breaks 3NF
  • The surrogate key value can't be used as a search key