Blog Home  Home Feed your aggregator (RSS 2.0)  
What did you learn today? - Tables with GUIDs as keys are just as fast as tables with ints
Phil Denoncourt's Technology Rants
 
 Wednesday, September 13, 2006

I was sitting in this meeting in which I was only peripherally involved and there's only so much doodling you can do.  So I did what most other people in the meeting were doing, I started daydreaming. 

I'm a big fan of GUIDs, but there's all this noise about how slow they are because they're bigger than an int making for inefficient searching.  So I started to wonder, how much slower are GUIDs than ints? So I started to do the math, and for a table with 4 billion rows, I was coming out with only one more logical read if the key was a GUID than an int.... I know.. Didn't make sense to me.  So I put it to a practical test.

I created two tables.  One called IntWithData and the other called GuidWithData.


I then populated them with a million rows.  --As a side note, I started the scripts to populate the tables at about the same time, and they finished at the same time.  Something I think I will research further is how much slower it is to insert a record in a GUID table compared to an INT table.  Common sense tells you it's supposed be way slower to populate a GUID--

I took a look at the statistics for the index by using the following statements:
SELECT * FROM sys.dm_db_index_physical_stats(db_id(),object_id('dbo.GuidWithData'),null,null,'DETAILED')
SELECT * FROM sys.dm_db_index_physical_stats(db_id(),object_id('dbo.IntWithData'),null,null,'DETAILED')

That produced these results:

database_id object_id index_id partition_number index_type_desc alloc_unit_type_desc index_depth index_level avg_fragmentation_in_percent fragment_count avg_fragment_size_in_pages page_count avg_page_space_used_in_percent record_count ghost_record_count version_ghost_record_count min_record_size_in_bytes max_record_size_in_bytes avg_record_size_in_bytes forwarded_record_count
6 2105058535 1 1 CLUSTERED INDEX IN_ROW_DATA 3 0 99.037 5919 1 5919 68.85677 1000000 0 0 31 31 31 NULL
6 2105058535 1 1 CLUSTERED INDEX IN_ROW_DATA 3 1 96.2963 27 1 27 67.68663 5919 0 0 23 23 23 NULL
6 2105058535 1 1 CLUSTERED INDEX IN_ROW_DATA 3 2 0 1 1 1 8.314801 27 0 0 23 23 23 NULL

 

database_id object_id index_id partition_number index_type_desc alloc_unit_type_desc index_depth index_level avg_fragmentation_in_percent fragment_count avg_fragment_size_in_pages page_count avg_page_space_used_in_percent record_count ghost_record_count version_ghost_record_count min_record_size_in_bytes max_record_size_in_bytes avg_record_size_in_bytes forwarded_record_count
6 2073058421 1 1 CLUSTERED INDEX IN_ROW_DATA 3 0 0.423403 317 8.195584 2598 99.84113 1000000 0 0 19 19 19 NULL
6 2073058421 1 1 CLUSTERED INDEX IN_ROW_DATA 3 1 0 9 1 9 46.33886 2598 0 0 11 11 11 NULL
6 2073058421 1 1 CLUSTERED INDEX IN_ROW_DATA 3 2 0 1 1 1 1.420806 9 0 0 11 11 11 NULL

This is tough to read, but what you should be getting out of this is: Guid table is very fragmented.  Int table is not.  Also the Guid table had about 2.25 more pages than the Int table.  Ideally you would be running an overnight process that reorganized pages in your DB, and that would help with the fragmentation.

So now the test query.  The question I wanted to answer was:  How many logical reads are needed to read a record out of the Int table and how many are needed to read out of the Guid table?

SET STATISTICS IO ON

SELECT * FROM IntWithData WHERE ID = 619284

SELECT * FROM GuidWithData
WHERE ID = '7DD78950-D3CD-4016-8D92-738A6E0666F2' -- I had to find this value in my data

The results:
....

(1 row(s) affected)

Table 'IntWithData'. Scan count 0, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

Table 'GuidWithData'. Scan count 0, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

3 logical reads. That means in a table with 1 million rows, there should be no difference between a table with Guid as a key and one that has an int.  This is because the records are searched for using a binary search algorithm.  In a perfectly balanced tree, it shouldn't take the algorithm more than 20 tries to find the key with many of those "tries" are contained in the same index page.  At some volume, and I haven't calculated exactly where, there will be a difference of one logical read, but that's it.

Hopefully, I haven't misinterpreted the results of this test.  I don't think I have.  The bottom line is that Guids take up more physical space, but you're able to find them just as fast as ints.

 

Wednesday, September 13, 2006 9:51:55 PM (GMT Standard Time, UTC+00:00)  #    Comments [0]   Development | SQL  | 
Comments are closed.
Copyright © 2008 Phil Denoncourt III. All rights reserved.
DasBlog 'Portal' theme by Johnny Hughes.
Pick a theme: