In my last post, I talked about GUIDs and if they were good or bad items to use for Primary Keys, today, lets talk about when we would want to use a GUID.
Usages
So, why would someone want to use a GUID compared to a integer primary key? I can think of a couple of good examples for this…
When to Use
Large Franchises (AKA Big Data)
Well, lets assume that we are working for a project with a “large” franchise basis (hypothetically like either a sub-shop or hotel), and lets also assume that each of the store fronts have to send in a daily accounting of what they have sold as a summary (For sub-shop, how may of each type of item and for the hotel, how many rooms and room occupants such as “12 rooms with 2 occupants were rented today”). Now, I have no problem envisioning a solution that would take the data from a file and do an individual import of that data into a table that contained integer values for primary keys, and linking that to a specific record for each store. To me, this would look something like this below.
Integer Possible
When I go to buy something from a certain sub-shop they usually make my sandwich and then I go over to the register where they ring me up with all of the items that I purchase (such as a foot-long roast beef sandwich, a bag of chips, a large drink, a cookie and a milk) and they will usually mark it as 1 – 12″ roast beef combo (LG), 1 – cookie, 1 milk. At the end of the day, the store owner will want to look at all of the sales for the day, but he probably won’t want to look at each individual sale, he will want to look at a summary, such as X – 12″ Roast Beef Combo (LG), Y – 12″ Roast Beef Combo (MD), Z – 12″ Roast Beef Combo and so on. As such, I am expecting that the table structure would look something like this…
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 |
CREATE TABLE Products( ProductID INT NOT NULL, ProductDescription VARCHAR(50), ProductTaxable BIT NOT NULL, CONSTRAINT PK_Products PRIMARY KEY CLUSTERED( ProductID ASC ) ) CREATE TABLE DailySummary( DailySummaryTransactionID INT NOT NULL, DailySummaryProductID INT NOT NULL, DailySummaryQuantity INT NOT NULL, DailySummaryTransmitted BIT NOT NULL DEFAULT(0), CONSTRAINT PK_DailySummary PRIMARY KEY CLUSTERED( DailySummaryTransactionID, DailySummaryProductID ), CONSTRAINT FK_DailySummary_Products FOREIGN KEY( DailySummaryProductID ) REFERENCES Products ( ProductID ) ) CREATE TABLE POSSales( POSTransactionID INT NOT NULL, POSTransactionDate DATETIME NOT NULL, POSSubTotal NUMERIC(5, 2) NOT NULL, POSTax NUMERIC(5, 2) NOT NULL, CONSTRAINT PK_POSSales PRIMARY KEY CLUSTERED( POSTransactionID ) ) CREATE TABLE POSDetails( POSDetailsPOSTransactionID INT NOT NULL, POSDetailsID INT NOT NULL, POSDetailsQuantity INT NOT NULL, POSDetailsProductID INT NOT NULL, POSDetailsPrice NUMERIC(5, 2) NOT NULL, POSDetailsTaxed BIT DEFAULT(0), POSDetailsDailySummaryTransactionID INT, CONSTRAINT PK_POSDetails PRIMARY KEY CLUSTERED( POSDetailsPOSTransactionID ASC, POSDetailsID ASC ), CONSTRAINT FK_POSDetails_POSSales FOREIGN KEY( POSDetailsPOSTransactionID ) REFERENCES POSSales ( POSTransactionID ), CONSTRAINT FK_POSDetails_Products FOREIGN KEY( POSDetailsProductID ) REFERENCES Products ( ProductID ), CONSTRAINT FK_POSSales_DailySummary FOREIGN KEY( POSDetailsDailySummaryTransactionID ) REFERENCES DailySummary ( DailySummaryTransactionID ) ) |
As the cashier starts ringing me up, they create a POSSales record, and for each item that they add to the order becomes a new record within the POSDetails table. At the end of the day the boss clicks a button that will go through all of the POSSales Records for a specific day and summarize the POSDetails into the DailySummary table and then update the POSDetails POSDetailsDailySummaryTransactionID to be the new record. After they have reviewed it, they press another button that will transmit the details to the corporate office which will then assimilate those summaries to their own system.
When the Corporate Folder receives this file they then insert their data to their database, which I can assume to look like this…
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 |
CREATE TABLE Stores( StoreID INT NOT NULL, StoreName VARCHAR(50) NOT NULL, StoreAddress VARCHAR(50) NOT NULL, StoreCity VARCHAR(50) NOT NULL, StoreState VARCHAR(2) NOT NULL, StoreZipCode VARCHAR(9) NOT NULL, CONSTRAINT PK_Stores PRIMARY KEY CLUSTERED( StoreID ASC ) ) CREATE TABLE Products( ProductID INT NOT NULL, ProductDescription VARCHAR(50), ProductTaxable BIT NOT NULL, CONSTRAINT PK_Products PRIMARY KEY CLUSTERED( ProductID ASC ) ) CREATE TABLE SandwichesSoldPerDay( SandwichTransactionID INT NOT NULL, SandwichStoreID INT NOT NULL, SandwichProductID INT NOT NULL, SandwichTransactionDate DATE NOT NULL, SandwichQuantitySold INT NOT NULL, CONSTRAINT PK_SandwichesSoldPerDay PRIMARY KEY CLUSTERED( SandwichTransactionID ASC, SandwichProductID ASC ), CONSTRAINT FK_SandwichesSoldPerDay_Stores FOREIGN KEY( SandwichStoreID ) REFERENCES Stores ( StoreID ), CONSTRAINT FK_SandwichesSoldPerDay_Products FOREIGN KEY( SandwichProductID ) REFERENCES Products ( ProductID ) ) |
This would work in a “perfect” world, but alas, we don’t live in a perfect world and there will be times where the Corporate will see something “strange” such as selling 1,000,000 Roast Beef Combo (LG) for a single store, and they would probably have to ask a question. In this construct, they would have to call up the owner of the franchise and ask them, please send me a detailed listing of all Roast Beef Combo (LG) for the date of mm/dd/yyyy that you transmitted to us on mm/dd/yyyy, and then the franchise owner would have to open his records and then look at it and report back. The issue is that as the data was translated from DailySummary to SandwichesSoldPerDay any reference to the POSDetails was lost.
GUID Approach
If we did a slight change to the construction of these two databases, such as converting the fields POSDailySummaryTransactionID, DailySummaryTransactionID and SandwichTransactionID to UNIQUEIDENTIFIERs, we could have just picked up the data from the file and plopped it into the Corporate database, maintaining the referential link to all of the POSDetails for the daily summary in the unconnected database. Within the Application, we can also include a command that would take the DailySummaryTransactionID and return all the POSDetails records that are associated with it, allowing corporate to investigate without the need of contacting the Franchise Owner (in case he is in some sort of fraud) and without having to constantly transfer all of the POSDetail Records for every day.
Here is how the tables would be affected for the client side…
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 |
CREATE TABLE Products( ProductID INT NOT NULL, ProductDescription VARCHAR(50), ProductTaxable BIT NOT NULL, CONSTRAINT PK_Products PRIMARY KEY CLUSTERED( ProductID ASC ) ) CREATE TABLE DailySummary( DailySummaryTransactionID UNIQUEIDENTIFIER NOT NULL, DailySummaryProductID INT NOT NULL, DailySummaryQuantity INT NOT NULL, DailySummaryTransmitted BIT NOT NULL DEFAULT(0), CONSTRAINT PK_DailySummary PRIMARY KEY CLUSTERED( DailySummaryTransactionID ), CONSTRAINT FK_DailySummary_Products FOREIGN KEY( DailySummaryProductID ) REFERENCES Products ( ProductID ) ) CREATE TABLE POSSales( POSTransactionID INT NOT NULL, POSTransactionDate DATETIME NOT NULL, POSSubTotal NUMERIC(5, 2) NOT NULL, POSTax NUMERIC(5, 2) NOT NULL, CONSTRAINT PK_POSSales PRIMARY KEY CLUSTERED( POSTransactionID ), POSDailySummaryTransactionID UNIQUEIDENTIFIER, CONSTRAINT FK_POSSales_DailySummary FOREIGN KEY( POSDailySummaryTransactionID ) REFERENCES DailySummary ( DailySummaryTransactionID ) ) CREATE TABLE POSDetails( POSDetailsPOSTransactionID INT NOT NULL, POSDetailsID INT NOT NULL, POSDetailsQuantity INT NOT NULL, POSDetailsProductID INT NOT NULL, POSDetailsPrice NUMERIC(5, 2) NOT NULL, POSDetailsTaxed BIT DEFAULT(0), POSDailySummaryTransactionID UNIQUEIDENTIFIER, CONSTRAINT PK_POSDetails PRIMARY KEY CLUSTERED( POSDetailsPOSTransactionID ASC, POSDetailsID ASC ), CONSTRAINT FK_POSDetails_POSSales FOREIGN KEY( POSDetailsPOSTransactionID ) REFERENCES POSSales ( POSTransactionID ), CONSTRAINT FK_POSDetails_Products FOREIGN KEY( POSDetailsProductID ) REFERENCES Products ( ProductID ), CONSTRAINT FK_POSSales_DailySummary FOREIGN KEY( POSDailySummaryTransactionID ) REFERENCES DailySummary ( DailySummaryTransactionID ) ) |
And this is the changes necessary for the Corporate Side
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 |
CREATE TABLE Stores( StoreID INT NOT NULL, StoreName VARCHAR(50) NOT NULL, StoreAddress VARCHAR(50) NOT NULL, StoreCity VARCHAR(50) NOT NULL, StoreState VARCHAR(2) NOT NULL, StoreZipCode VARCHAR(9) NOT NULL, CONSTRAINT PK_Stores PRIMARY KEY CLUSTERED( StoreID ASC ) ) CREATE TABLE Products( ProductID INT NOT NULL, ProductDescription VARCHAR(50), ProductTaxable BIT NOT NULL, CONSTRAINT PK_Products PRIMARY KEY CLUSTERED( ProductID ASC ) ) CREATE TABLE SandwichesSoldPerDay( SandwichTransactionID UNIQUEIDENTIFIER NOT NULL, SandwichStoreID INT NOT NULL, SandwichProductID INT NOT NULL, SandwichTransactionDate DATE NOT NULL, SandwichQuantitySold INT NOT NULL, CONSTRAINT PK_SandwichesSoldPerDay PRIMARY KEY CLUSTERED( SandwichTransactionID ASC ), CONSTRAINT FK_SandwichesSoldPerDay_Stores FOREIGN KEY( SandwichStoreID ) REFERENCES Stores ( StoreID ), CONSTRAINT FK_SandwichesSoldPerDay_Products FOREIGN KEY( SandwichProductID ) REFERENCES Products ( ProductID ) ) |
Granted, there are ways to also do this in the Integer methodology, but there are also some serious gotchas that could occur.
Semi-Random Data
There are also times that you will want to send across semi-random data, such as in a Password Reset process, where the user will have to click a link that you don’t want to have living long afterwards, or have a logical pattern. In such an instance, I would write the table as similar to this…
1 2 3 4 5 6 7 8 9 10 11 12 |
CREATE TABLE UserAccount( UserName VARCHAR(50) NOT NULL, UserPassword VARCHAR(255) NOT NULL, UserEmail VARCHAR(200) NOT NULL, UserFailedLoginAttempts INT NOT NULL, UserLastLoginAttempt DATETIME, UserResetLink UNIQUEIDENTIFIER, UserResetRequestDate DATETIME, CONSTRAINT PK_UserAccount PRIMARY KEY CLUSTERED( UserName ASC ) ) |
This allows for multiple options, my personal favorite would be when the user clicks the forgot password link, it creates a new GUID and emails it to them as part of the password recovery phase, after the user has successfully reset their password, you can then store a NULL into the table making the previous email link invalid. You can also set it up so that if they have waited for more then 30 minutes of time from the UserResetRequestDate, we can then assume that this is not an appropriate connection and reset the GUID and have it
Really LARGE Data
When working in a table that you expect would go past the typical size that is possible for an integer type (2,147,483,647). In my limited exposure, I can’t think of anything that would easily go past that number simply.
Why Not to Use
Small Data
If you are building something that is anticipating to be living and breathing on a single server where flexibility is and user friendliness is important, I would be more inclined to handle Integers. As we saw in the last post, they have a very slight advantage when it comes to performance, but it has a huge advantage in handling physical spaces. For the small to mid sized business (SMB), this is an important trait, but I wouldn’t use this one piece as the pivotal point of my argument. Instead, the more pivotal point would be easier for users to read. In SMBs, the biggest issue revolves around people. Large organizations I would expect to have teams of people that are extremely well versed in how to build software maintaining the data and handling update/upgrades, while in the SMB environment, most of the time it will be individuals who are doing this in addition to their normal jobs, so understanding how a GUID Works probably won’t work out to well for them.
Normal Data
Unless there is a particular reason why you would have a two identical Sales Orders or AR Invoice numbers, then I would stick to a standard Integer Primary Key. Typically, if the primary key is going to be human readable, it probably should not be a GUID for a primary key.
Summary
Basically, if I am anticipating a database structure that will cross multiple databases, or have something that will possibly walk out to the general public that I don’t want to have a sequential or repeatable URL I should use GUIDs. If I am expecting users to be able to interact with the data easily, or drive space is a concern, I would expect to use Integers, otherwise, they both are awesome choices for the Primary Key of a database table.