| Working with sets in Transact-SQL (UNION, INTERSECT, EXCEPT) |
|
One of disadvantages of earliest versions of sql server (before sql server 2005) in comparison with Oracle is the lack of embedded functions to work with sets. Here I will describe how to deal with sets depending on your version of Microsoft SQL Server because in 2005 and 2008 Microsoft significantly extended functionality of T-SQL and lightened the work for some cases. This article provides a uniform approach to work with sets. It is possible to work with sets using the same template for all cases. Set operationsA set in our context means a bunch of unique records that we are going to compare somehow with another set of unique records that have the same column list. More information about set operations definitions you can find by link The easiest way of set representation is graphical so I will use this representation. Suppose we work with two sets - people who have experience in MS SQL Server and people who are experienced in Oracle.
So graphically some obvious operations on this sets we can represent as
Traditional approaches working with sets using T-SQL
Lets create the necessary tables and fill them with the needed values
CREATE TABLE #A(Name varchar(10) primary key clustered) INSERT #A(Name) values('Bill') INSERT #A(Name) values('John') CREATE TABLE #B(Name varchar(10) primary key clustered) INSERT #B(Name) values('Bill') INSERT #B(Name) values('Jim')
UNION This is the easiest for user and sql server engine operation. SQL Server has the same name operator UNION inherited yet from Sybase. So the query will look like
SELECT Name FROM #A UNION SELECT Name FROM #B
which returns Bill, Jim , John There is also option ALL that you may use with union operator (UNION ALL). Withour "all" keyword sql server filters output for duplicate rows and returns olnly unique ones. If you use UNION ALL it does not perform this check and returns all rows. You should use union all carefully, only in case you are completely confident in uniquness of the output. But you should also tend to use it on your production environment because it works much faster than union without "all" option. The script is
SELECT Name FROM #A UNION ALL SELECT Name FROM #B
Output Bill, John, Bill, Jim . Notice that Bill made it to the output twice.
INTERSECTION This task may be performed using JOIN-operator or subquery
SELECT a.Name FROM #A as a INNER JOIN #B as b on a.Name = b.Name The same result may be received using subquries
SELECT Name FROM #A WHERE Name IN(SELECT Name FROM #B)
SELECT Name FROM #A a WHERE EXISTS ( SELECT 1 FROM #B b WHERE b.Name = a.Name)
The first query is a common subquery and the second - correlated subquery Starting from Microsoft SQL Server 2005 you may use INTERSECT operator
SELECT Name FROM #A INTERSECT SELECT Name from #B All of quries return only Bill of course.
DIFFERENCE
In Microsoft SQL Server 2000 there is no special operator for this operation. So you need to use sophisticated approaches.
SELECT Name FROM #A WHERE Name NOT IN(SELECT Name FROM #B)
Explanation: get all names from #A and check for every name that there is no such name in #B. This aprroach works only for sets of records containing only one column
SELECT Name FROM #A a WHERE NOT EXISTS (SELECT 1 FROM #B b WHERE b.Name = a.Name)
Explanation: get all records from #A and check for every record that there is no such record in #B. You may list as many columns as you need using this approach.
SELECT a.Name FROM #A a LEFT OUTER JOIN #B b ON a.Name = b.Name WHERE b.Name IS NULL Explanation: join two tables and then filter by rows that don't have corresponding row in #B Starting from Microsoft SQL Server 2005 you can use EXCEPT operator
SELECT NAME FROM #A EXCEPT SELECT NAME FROM #B
Off course all these quries return only John.
SYMMETRIC DIFFERENCE In MS SQL Server 2000 you can get this result by using a combination of previous approaches. For intance (A-B) + (B-A)
SELECT Name FROM #A WHERE Name NOT IN(SELECT Name FROM #B) UNION SELECT Name FROM #B WHERE Name NOT IN(SELECT Name FROM #A)
and in Microsoft SQL Server 2005 there is a much more convenient method
SELECT NAME FROM #A UNION SELECT NAME FROM #B EXCEPT SELECT NAME FROM #A INTERSECT SELECT NAME FROM #B
All the quries return Jim, John.
For more information about Union, Except and Intersect operator address EXCEPT and INTERSECT (Transact-SQL).
Comments (1) |



