MS SQL GUIDs, Curse or Benefit?

MS SQL GUIDs

What are the benefits and detriments of having a GUIDs within your database? Can you rely upon them to be truly unique? Are there any hidden “gotcha’s” associated with GUIDs? Today, lets dig into that and find out.

The other day, I was talking to a friend of mine and we were discussing GUIDs, and his observation was that they were messy to use as Primary Keys and also thought that they had some serious disadvantages, such as being very slow in large numbers. I, being the everything must have some benefit and detriment, pointed out some of the good things about them such as their extreme uniqueness, but I wasn’t to positive about their performance, although I did agree that they were very messy for trying visually see how things were linked and also could be a huge waste of physical space. Being the curious person that I am, I decided that I would dig into this and try to figure out exactly what a GUID was, a benefit or a curse. Now for this article, I am going to reference a lot of other websites, so it might be fast and loose and for that, and I will apologize for that in advance.

What is a GUID ({3cecbfbb-b53e-42e6-ae85-8b543f4b1ac3})?

Now the first question is, what is a GUID? A GUID, according to The Quick Guide to GUIDs, is a 1039 long hexadecimal number that is broken down into five sections (broken out in hex as 8, 4, 4, 4 and 12), and is supposed to be random with hints about how, when and where it was made. For me, that doesn’t cut it, so I dug further in and found a post by the Raymond Chen over at Microsoft’s MSDN are where he posted GUIDs are globally unique, but substrings of GUIDs aren’t, and describes how they are built.

  • 60 bits of timestamp
  • 48 bits of computer identifier
  • 14 bits of uniquifier
  • 06 bits are fixed

He goes onto describe how if a computer is generating two or more GUIDs within rapid succession (i.e. within the same second) or someone’s computer is back/forward date then the GUID counter will not be in the same place allowing the 14 bits of “uniquifier” to be different. To me, this means that the only way to get two GUIDS the same is if the records were 1) created at the exact same second, 2) the computer identifier has to equate out to be the same, 3) the GUID Counter must be in the same exact location on both machines.

OK, being at the exact same second is a difficult but not impossible thing for computers to create a record, but what about the hash of the computer identifier, how is that made? According to RFC 4122 Section 4.1.6, this is supposed to be generated from the IEEE 802 MAC Address, and if the system doesn’t have a MAC Address a pseudo-random value can be used. So in order to handle this, the computer must have either a) no MAC address and the pseudo-random number generator had to duplicate the values or b) you have to have installed a network card that had a hacked MAC address that is spoofing another computer (which would not allow you to be on the same network as the spoofed computer) or c) the same exact network card as another computer.

Lastly, the uniquifier, how does that work? Well, according to Ken Simmons who wrote Understanding and Examining the Uniquifier in SQL Server, the uniquifier is only hit when the record will be duplicated and the field is marked as Unique. This seems to be a safe bet the the GUID will be unique across different machines and times, even if the times are the same you should be pretty safe as long as the generating computer isn’t a virtualized computer, sharing physical access to the network card, running to a different SQL Server that is creating records at the same exact time and the same exact information.

Performance?

To test this out, I decided to build a program that would insert any number of records I wanted into one of three different table sets, a standard Integer controlled environment, using an integer key generated by the application, a GUID Automatically entered controlled environment and a GUID that the application would generate and manage. Very simple tables, almost identical except for the primary keys and the SQL Statements are as such

The program itself is to large to paste the Source Code, you can download it here, Database GUID Test; however, it will

  1. Drop the tables it anticipates working on
  2. Create the tables (ah, new table smell)
  3. Create the records one at a time and record the entire time it took for all records
    1. Create a “Header” Record, returning the Key Field using the SQL Output Clause (SQL 2008 and above)
    2. Create 10 “Detail” Records
  4. Join all the records for both the “Header” table and the “Detail” table and record the entire time it took for all records
  5. Get a listing of all the “Header” record Primary Keys,
  6. If selected, delete the records and record the entire time it took for all records
    1. Loop through all of the “Header” records and
      1. Delete all of the “Detail” records for the Header
      2. Delete the “Header” record
  7. If selected, drop the tables and act like it never occurred.
  8. Display the results of each event in milliseconds

There is some expected overhead for writing to the console the current record it is working on although I would suspect that is minimal and also I suspect that it should add to all three tests at the same time because that is in effect for all three. For the test, I am running a Windows 10 Intel Core i5 (2.40 GHz) with 8 Gigs of ram and SQL 2016, I had it generate 100,000 records into all three types at the same time in different instances of the application (try and press it hard) to see what would happen. Below are the results.

Insertions

  • 100,000 Records
    • Integer Based Records: 1666678.0463 Milliseconds = 27:46 Minutes
    • GUID Based Records – SQL Managed: 1716539.7326 Milliseconds  = 28:36 Minutes
    • GUID Based Records – Application Managed: 1683044.8951 Milliseconds  = 28:03 Minutes
  • 1 Record
    • Integer Based Records: 23.8278 Milliseconds
    • GUID Based Records – SQL Managed: 22.3039 Milliseconds
    • GUID Based Records – Application Managed: 23.6874 Milliseconds

Joinings

  • 100,000 Records
    • Integer Based Records: 7849.4239 Milliseconds = 00:07 Minutes
    • GUID Based Records – SQL Managed: 13602.463 Milliseconds = 00:13 Minutes
    • GUID Based Records – Application Managed: 12598.6037 Milliseconds = 00:12 Minutes
  • 1 Record
    • Integer Based Records: 7.1296 Milliseconds
    • GUID Based Records – SQL Managed: 7.5144 Milliseconds
    • GUID Based Records – Application Managed: 7.4296 Milliseconds

Deletions

  • 100,000 Records
    • Integer Based Records: 314318.4881 Milliseconds = 05:13 Minutes
    • GUID Based Records – SQL Managed: 345862.0843 Milliseconds = 05:45 Minutes
    • GUID Based Records – Application Managed: 334389.0808 Milliseconds = 05:34 Minutes
  • 1 Record
    • Integer Based Records: 4.4671 Milliseconds
    • GUID Based Records – SQL Managed: 4.265 Milliseconds
    • GUID Based Records – Application Managed: 4.0305 Milliseconds

Physical Space

  • 100,000 Records
    • Integer Based Records:
      • Header Record: 2.13 Mb
      • Detail Record: 27.69 Mb
    • GUID Based Records – SQL Managed:
      • Header Record: 4.57 Mb
      • Detail Record: 86.94 Mb
    • GUID Based Records – Application Managed:
      • Header Record: 4.50 Mb
      • Detail Record: 87.07 Mb
  • 1 Record
    • Integer Based Records:
      • Header Record: 8 KB
      • Detail Record: 8 KB
    • GUID Based Records – SQL Managed:
      • Header Record: 8 KB
      • Detail Record: 8 KB
    • GUID Based Records – Application Managed:
      • Header Record: 8 KB
      • Detail Record: 8 KB

Summary

All in all, the only two downsides to having a GUID as a primary key are Readability and Disk Space, Performance wise, it appears that it is nearly on par with each other, and even on a single entry the GUID appears to have a slight advantage. From these observations, a GUID is a realistic alternative for Integer values for a reference as Primary Keys.