GUIDs, What can they be for?


In my last post, I talked about GUIDs and if they were good or bad items to use for Primary Keys, today, lets talk about when we would want to use a GUID.


So, why would someone want to use a GUID compared to a integer primary key? I can think of a couple of good examples for this…

When to Use

Large Franchises (AKA Big Data)

Well, lets assume that we are working for a project with a “large” franchise basis (hypothetically like either a sub-shop or hotel), and lets also assume that each of the store fronts have to send in a daily accounting of what they have sold as a summary (For sub-shop, how may of each type of item and for the hotel, how many rooms and room occupants such as “12 rooms with 2 occupants were rented today”).  Now, I have no problem envisioning a solution that would take the data from a file and do an individual import of that data into a table that contained integer values for primary keys, and linking that to a specific record for each store.  To me, this would look something like this below.

Integer Possible

When I go to buy something from a certain sub-shop they usually make my sandwich and then I go over to the register where they ring me up with all of the items that I purchase (such as a foot-long roast beef sandwich, a bag of chips, a large drink, a cookie and a milk) and they will usually mark it as 1 – 12″ roast beef combo (LG), 1 – cookie, 1 milk. At the end of the day, the store owner will want to look at all of the sales for the day, but he probably won’t want to look at each individual sale, he will want to look at a summary, such as X – 12″ Roast Beef Combo (LG), Y – 12″ Roast Beef Combo (MD), Z – 12″ Roast Beef Combo and so on. As such, I am expecting that the table structure would look something like this…

As the cashier starts ringing me up, they create a POSSales record, and for each item that they add to the order becomes a new record within the POSDetails table. At the end of the day the boss clicks a button that will go through all of the POSSales Records for a specific day and summarize the POSDetails into the DailySummary table and then update the POSDetails POSDetailsDailySummaryTransactionID to be the new record. After they have reviewed it, they press another button that will transmit the details to the corporate office which will then assimilate those summaries to their own system.

When the Corporate Folder receives this file they then insert their data to their database, which I can assume to look like this…

This would work in a “perfect” world, but alas, we don’t live in a perfect world and there will be times where the Corporate will see something “strange” such as selling 1,000,000 Roast Beef Combo (LG) for a single store, and they would probably have to ask a question. In this construct, they would have to call up the owner of the franchise and ask them, please send me a detailed listing of all Roast Beef Combo (LG) for the date of mm/dd/yyyy that you transmitted to us on mm/dd/yyyy, and then the franchise owner would have to open his records and then look at it and report back. The issue is that as the data was translated from DailySummary to SandwichesSoldPerDay any reference to the POSDetails was lost.

GUID Approach

If we did a slight change to the construction of these two databases, such as converting the fields POSDailySummaryTransactionID, DailySummaryTransactionID and SandwichTransactionID to UNIQUEIDENTIFIERs, we could have just picked up the data from the file and plopped it into the Corporate database, maintaining the referential link to all of the POSDetails for the daily summary in the unconnected database. Within the Application, we can also include a command that would take the DailySummaryTransactionID and return all the POSDetails records that are associated with it, allowing corporate to investigate without the need of contacting the Franchise Owner (in case he is in some sort of fraud) and without having to constantly transfer all of the POSDetail Records for every day.

Here is how the tables would be affected for the client side…

And this is the changes necessary for the Corporate Side

Granted, there are ways to also do this in the Integer methodology, but there are also some serious gotchas that could occur.

Semi-Random Data

There are also times that you will want to send across semi-random data, such as in a Password Reset process, where the user will have to click a link that you don’t want to have living long afterwards, or have a logical pattern. In such an instance, I would write the table as similar to this…

This allows for multiple options, my personal favorite would be when the user clicks the forgot password link, it creates a new GUID and emails it to them as part of the password recovery phase, after the user has successfully reset their password, you can then store a NULL into the table making the previous email link invalid. You can also set it up so that if they have waited for more then 30 minutes of time from the UserResetRequestDate, we can then assume that this is not an appropriate connection and reset the GUID and have it

Really LARGE Data

When working in a table that you expect would go past the typical size that is possible for an integer type (2,147,483,647). In my limited exposure, I can’t think of anything that would easily go past that number simply.

Why Not to Use

Small Data

If you are building something that is anticipating to be living and breathing on a single server where flexibility is and user friendliness is important, I would be more inclined to handle Integers. As we saw in the last post, they have a very slight advantage when it comes to performance, but it has a huge advantage in handling physical spaces. For the small to mid sized business (SMB), this is an important trait, but I wouldn’t use this one piece as the pivotal point of my argument. Instead, the more pivotal point would be easier for users to read. In SMBs, the biggest issue revolves around people. Large organizations I would expect to have teams of people that are extremely well versed in how to build software maintaining the data and handling update/upgrades, while in the SMB environment, most of the time it will be individuals who are doing this in addition to their normal jobs, so understanding how a GUID Works probably won’t work out to well for them.

Normal Data

Unless there is a particular reason why you would have a two identical Sales Orders or AR Invoice numbers, then I would stick to a standard Integer Primary Key.  Typically, if the primary key is going to be human readable, it probably should not be a GUID for a primary key.


Basically, if I am anticipating a database structure that will cross multiple databases, or have something that will possibly walk out to the general public that I don’t want to have a sequential or repeatable URL I should use GUIDs. If I am expecting users to be able to interact with the data easily, or drive space is a concern, I would expect to use Integers, otherwise, they both are awesome choices for the Primary Key of a database table.