Home Microsoft SQL Server Working with sets in Transact-SQL (UNION, INTERSECT, EXCEPT)

Polls

Who administers your corporate database infrastructure?
 

related links:

Working with sets in Transact-SQL (UNION, INTERSECT, EXCEPT)
User Rating: / 44
PoorBest 

 One of disadvantages of earliest versions of sql server (before sql server 2005) in comparison with Oracle is the lack of embedded functions to work with sets.

    Here I will describe how to deal with sets depending on your version of Microsoft SQL Server because in 2005 and 2008 Microsoft significantly  extended functionality of T-SQL and lightened the work for some cases.

This article provides a uniform approach to work with sets. It is possible to work with sets using the same template for all cases.

Set operations

A set  in our context means a bunch of unique records that we are going to compare somehow with another set of unique records that have the same column list.

More information about set operations definitions you can find by link The easiest way of set representation is graphical so I will use this representation.

Suppose we work with two sets - people who have experience in MS SQL Server and people who are experienced in Oracle.

SQL Sever(Set A)

Oracle (Set B)

John

Jim

Bill

Bill

So graphically some obvious operations on this sets we can represent as 

Union Intersection Difference A-B Symmetric difference
union intersection difference symmetric difference
All people experienced in a DBMS
People experienced in both systems People who are familiar to SQL Server but are not gurus in Oracle
People who are experienced in DBMS but are not experienced in both systems
John, Jim, Bill Bill John John, Jim

 

 Traditional approaches working with sets using T-SQL

 

Lets create the necessary tables and fill them with the needed values

  CREATE TABLE #A(Name varchar(10) primary key clustered)

  INSERT #A(Name) values('Bill')

  INSERT #A(Name) values('John')

  CREATE TABLE #B(Name varchar(10) primary key clustered)

  INSERT #B(Name) values('Bill')

  INSERT #B(Name) values('Jim')

 

 

 UNION

 union

 This  is the easiest for user and sql server engine operation. SQL Server has the same name operator UNION inherited yet from Sybase. So the query will look like

  SELECT Name FROM #A

  UNION

  SELECT Name FROM #B

 

which returns Bill, Jim , John

  There is also option ALL that you may use with union operator (UNION ALL). Withour "all" keyword sql server filters output for duplicate rows and returns olnly unique ones. If you use UNION ALL it does not perform this check and returns all rows. You should use union all carefully, only in case you are completely confident in uniquness of the output. But you should also tend to use it on your production environment because it works much faster than union without "all" option. The script is

  SELECT Name FROM #A

  UNION ALL

  SELECT Name FROM #B

 

  Output   Bill, John, Bill, Jim . Notice that Bill made it to the output twice.

 

 INTERSECTION

 intersection

This task may be performed using JOIN-operator or subquery

 

  SELECT a.Name

  FROM #A as a

      INNER JOIN #B as b on a.Name = b.Name

 The same result may be received using subquries

 

  SELECT Name

  FROM #A

  WHERE Name IN(SELECT Name FROM #B)

 

  SELECT Name

  FROM #A a

  WHERE EXISTS ( SELECT 1

                         FROM #B b

                         WHERE b.Name = a.Name)

 

The first query is a common subquery and the second - correlated subquery

Starting from Microsoft SQL Server 2005 you may use INTERSECT operator

 

  SELECT Name

  FROM #A

  INTERSECT

  SELECT Name

  from #B

  All of quries return only Bill of course.

 

DIFFERENCE

 difference

 

 In Microsoft SQL Server 2000 there is no special operator for this operation. So you need to use sophisticated approaches.

 

  SELECT Name

  FROM #A

  WHERE Name NOT IN(SELECT Name FROM #B)

 

   Explanation: get all names from #A and check for every name that there is no such name in #B. This aprroach works only for sets of records containing only one column

 

  SELECT Name

  FROM #A a

  WHERE

  NOT EXISTS (SELECT 1

              FROM #B b

              WHERE b.Name = a.Name)

 

    Explanation: get all records from #A and check for every record that there is no such record in #B. You may list as many columns as you need using this approach.

 

  SELECT a.Name FROM #A a

  LEFT OUTER JOIN #B b ON a.Name = b.Name

  WHERE b.Name IS NULL

  Explanation: join two tables and then filter by rows that don't have corresponding row in #B

Starting from Microsoft SQL Server 2005 you can use EXCEPT operator

  SELECT NAME

  FROM #A

  EXCEPT

  SELECT NAME

  FROM #B

 

 Off course all these quries return only John.

 

SYMMETRIC DIFFERENCE

 symmetric difference

In MS SQL Server 2000 you  can get this result by using a combination of previous approaches. For intance (A-B) + (B-A)

 

  SELECT Name

  FROM #A

  WHERE Name NOT IN(SELECT Name FROM #B)

  UNION

  SELECT Name

  FROM #B

  WHERE Name NOT IN(SELECT Name FROM #A)

 

 and in Microsoft SQL Server 2005 there is a much more convenient method

 

      SELECT NAME

      FROM #A

      UNION

      SELECT NAME

      FROM #B

      EXCEPT

      SELECT NAME

      FROM #A

      INTERSECT

      SELECT NAME

      FROM #B

 

All the quries return Jim, John.

 

 For more information about Union, Except and Intersect operator address EXCEPT and INTERSECT (Transact-SQL).

 

 

Comments (1)

 
SPONSORED LINKS:
Content View Hits : 265872
DB-staff Remote DBA Services, Powered by DB-Staff 2008