| Universal method to work with sets |
| Written by Administrator |
| Tuesday, 10 February 2009 06:40 |
|
Recently I have published the article Working with sets in Transact-SQL. This is useful for those who write high performance production code for Microsoft SQL Server. But the inconvenience in those methods is in peculiarity. If you wrote a script to intersect two tables and than decided that you want to actually merge or except them you need to significantly rewrite the script (especially if you use early version of SQL Server without intersect and except operators). Another inconvenience is that you need the same script for another DBMS you need to rewrite the script again. The following approach is universal regarding set operation to change it from except to intersect or union you need only to change a parameter. And it is cross-platform - it uses only ANSI SQL-92 union all, group by and having clauses so works n all versions of Microsoft SQL Server (SQL Server 7, SQL Server 2000, SQL Server 2005, SQL Server 2008), Oracle 8i, 9i , 10g, 11g, Sybase, ASE, MySQL, DB2, PostgreSQL etc. How to compare two tables or resultsetsMost of DBAs know this trick with union/union all selects (based on the fact that union does not return duplicate rows while union all does). Basically this method looks like select count(*) as CountTableA from tableA if your UnionCount and UnionAllCount are not equal than your tables have duplicates. UnionAllCount is always less or equal to UnionCount (because UnionCount does not include duplicates). So you see that there is unwanted difference or duplicates in your tables, most likely you want see these records to make a decision about what to do with them. That's where you need the universal approach to work with sets in sql. How to see the difference between tables or resultsetsLet's create three test tables and fill them in to have the visual understanding of the method create table tableA (Number int, Value varchar(100), Name varchar(100)) Now we need to assign to each set a binary identifier as a degree of number two. TableA - 20 - 1 TableB - 21 -2 TableA - 22 - 4 When we need to intersect the sets we need to sum up its binary identifier values using SUM aggregation function. Thus any basic intersection will have its own unique identifier. In other words binary identifiers are jut bitmasks and to get an intersection of sets you need to set necessary bits of the resulting bitmask. So the universal construction looks like select Number, Value, Name, SUM(BinaryID) where setoperation parameter depends on the set operation we wand to implement. For instance Intersection of all tables select Number, Value, Name, SUM(BinaryID) returns only one record - 3 cccc dddddd 7 So main thing is that any record of participating tables will have a resulting sum of BindaryIDs and this number will be only 1 for a record. Thus assume you want to see the records that exist only in TableA, records that exist in TableB and TableC but don't exist in TableA. You need the next BinaryID sums - TableA - 1, TableB+TableC = 2+4 = 6. select Number, Value, Name, SUM(BinaryID) You can deal with up to 32 tables using this approach. Additionally you are not limited to compare tables using this. You can also compare resultsets. For example you optimized a view and doubt if it still returns the same results - all you need to see if there are any difference run the query select field1, field2, field3, SUM(BinaryID) The performance of described method is not too excellent (you can ensure looking at performance plans) so I don't recomment you use it in you production code. But this is really usuful when you need to compare two talbes or resultsets quickly.
Comments (0) |