Home Microsoft SQL Server Universal method to work with sets

Polls

Who administers your corporate database infrastructure?
 

related links:

Universal method to work with sets
User Rating: / 0
PoorBest 
Written by Administrator   
Tuesday, 10 February 2009 06:40

Recently I have published the article Working with sets in Transact-SQL. This is useful for those who write high performance production code for Microsoft SQL Server. But the inconvenience in those methods is in peculiarity. If you wrote a script to intersect two tables and than decided that you want to actually merge or except them you need to significantly rewrite the script (especially if you use early version of SQL Server without intersect and except operators). Another inconvenience is that you need the same script for another DBMS you need to rewrite the script again. The following approach is universal regarding set operation to change it from except to intersect or union you need only to change a parameter. And it is cross-platform - it uses only ANSI SQL-92 union all, group by and having clauses so works n all versions of Microsoft SQL Server (SQL Server 7, SQL Server 2000, SQL Server 2005, SQL Server 2008), Oracle 8i, 9i , 10g, 11g, Sybase, ASE, MySQL, DB2, PostgreSQL etc.

 

 How to compare two tables or resultsets

 Most of DBAs know this trick with union/union all selects (based on the fact that union does not return duplicate rows while union all does).

Basically this method looks like

select count(*) as CountTableA from tableA 
select count(*) as CountTableB from tableB
select count(*) as UnionCount
from ( select * from TableA union select * from tableB) a
select count(*) as UnionAllCount
from ( select * from TableA union all select * from tableB) a

 if your UnionCount and UnionAllCount are not equal than your tables have duplicates. UnionAllCount is always less or equal to UnionCount (because UnionCount does not include duplicates). So you see that there is unwanted difference or duplicates in your tables, most likely you want see these records to make a decision about what to do with them. That's where you need the universal approach to work with sets in sql.

 How to see the difference between tables or resultsets

 Let's create three test tables and fill them in to have the visual understanding of the method

 create table tableA (Number int, Value varchar(100), Name varchar(100))  
create table tableB (Number int, Value varchar(100), Name varchar(100))
create table tableC (Number int, Value varchar(100), Name varchar(100))
insert into tableA
select 2, 'aaaa', 'bbbbbb'
union all select 3, 'cccc', 'dddddd'
union all select 4, 'yyyy', 'uuuu'
insert into tableB
select 7, 'tyyyyi', 'pppp'
union all select 3, 'cccc', 'dddddd'
union all select 67, 'llll', 'union'
insert into tableC
select 78, 'expr', 'pppppddd'
union all select 3, 'cccc', 'dddddd'
union all select 4, 'yyyy', 'union'

 Now we need to assign to each set a binary identifier as a degree of number two.

TableA - 20  - 1

TableB - 21  -2

TableA - 22  - 4

 When we need to intersect the sets we need to sum up its binary identifier values using SUM aggregation function. Thus any basic intersection will have its own unique identifier. In other words binary identifiers are jut bitmasks and to get an intersection of sets you need to set necessary bits of the resulting bitmask.

So the universal construction looks like

select Number, Value, Name, SUM(BinaryID) 
from (
select *, BinaryID = 1
from tableA
union all
select *, BinaryID = 2
from tableB
union all
select *, BinaryID = 4
from tableC
) allinter
group by Number, Value, Name
having SUM(BinaryID) in (setoperation parameter)

 where setoperation parameter  depends on the set operation we wand to implement.

For instance Intersection of all tables

select Number, Value, Name, SUM(BinaryID) 
from (
select *, BinaryID = 1
from tableA
union all
select *, BinaryID = 2
from tableB
union all
select *, BinaryID = 4
from tableC
) allinter
group by Number, Value, Name
having SUM(BinaryID) in (1+2+4)

 returns only one record - 3    cccc    dddddd    7

So main thing is that any record of participating tables will have a resulting sum of BindaryIDs and this number will be only 1 for a record.

Thus assume you want to see the records that exist only in TableA, records that exist in TableB and TableC but don't exist in TableA.

You need the next BinaryID sums - TableA - 1, TableB+TableC = 2+4 = 6.

select Number, Value, Name, SUM(BinaryID) 
from (
select *, BinaryID = 1
from tableA
union all
select *, BinaryID = 2
from tableB
union all
select *, BinaryID = 4
from tableC
) allinter
group by Number, Value, Name
having SUM(BinaryID) in (1, 6)

 You can deal with up to 32 tables using this approach. Additionally you are not limited to compare tables using this. You can also compare resultsets. For example you optimized a view and doubt if it still returns the same results - all you need to see if there are any difference run the query

 select field1, field2, field3, SUM(BinaryID)  
from (
select *, BinaryID = 1
from viewversion1
union all
select *, BinaryID = 2
from viewversion2
) allinter
group by field1, field2, field3
having SUM(BinaryID) in (1, 2)

 The performance of described method is not too excellent (you can ensure looking at performance plans) so I don't recomment you use it in you production code. But this is really usuful when you need to compare two talbes or resultsets quickly.

 

Comments (0)

 
SPONSORED LINKS:
Content View Hits : 265874
DB-staff Remote DBA Services, Powered by DB-Staff 2008