Jun/090
Advanced SQL – Sub-Queries
This advanced SQL tutorial focuses on the design of more complex SQL statements and the strategies available for implementing them, it concentrates on sub queries and joins because they are often inter-changeable, and views because these are often used to hide the complexity of queries involving sub-queries and joins.
As this is an advanced tutorial there is some consideration of performance issues, but this aspect is more thoroughly explored in our series on Oracle performance tuning.
This article talks about the following:
- Introduction to Sub-queries
- Use of sub-queries.
- Nested sub queries
- Correlated sub-queries.
Introduction to Sub-queries
Sub queries are also known as nested queries and are used to answer multi-part questions. Sub queries and joins are often interchangeable and in fact the Oracle optimiser may well treat a query containing a sub-query exactly as if it were a join.
Let’s use a trivial example of finding the names of everybody who works in the same department as a person called Jones to illustrate this point. The SQL could be written using a sub query as follows:
SELECT name FROM emp WHERE dept_no =
(SELECT dept_no FROM emp WHERE name = ‘JONES’)
or as a join statement, like this:-
SELECT e1.name FROM emp e1,emp e2
WHERE e1.dept_no = e2.dept_no AND e2name = ‘JONES’
With a trivial example like this there would probably be very little difference in terms of performance of the SQL for such a simple query, but with more complex queries there could well be performance implications. For this reason it is always worth trying a few variations of the SQL for a query and examining the execution plans before deciding on a particular approach, unless they’re very simple queries.
Learn more about Oracle performance tuning here.
Non Correlated Sub-Queries
There are, in fact, two types of sub query: correlated and non-correlated. The example shown above is a non-correlated sub query. The difference between them is that a correlated sub query refers to a column from a table in the parent query, whereas a non-correlated sub query doesn’t. This means that a non-correlated sub query is executed just once for the whole SQL statement, whereas correlated sub queries are executed once per row in the parent query.
Uses of Sub Queries
The most common use of sub queries is in the WHERE clause of queries to define the limiting condition for the rows returned (i.e. what value(s) the rows must have to be of interest), as in the previous example. However, they can also be used in other parts of the query. Specifically, sub queries can be used:
* to define the limiting conditions for SELECT, UPDATE and DELETE statements in the following clauses:-
- WHERE
- HAVING
- START WITH
Instead of a table name in
- INSERT statements
- UPDATE statements
- DELETE
- statements the FROM clause of SELECT statements
* To define the set of rows to be created in the target table of a CREATE TABLE AS or INSERT INTO sql statement.
* To define the set of rows to be included by a view or a snapshot in a CREATE VIEW or CREATE SNAPSHOT statement.
* To provide the new values for the specified columns in an UPDATE statement
The first example of sub query in SQL shown above, used a simple equality expression as we were interested in only one row, but we can also use the sub query to provide a set of rows.
For example, to find the names of all employees in the same departments as Smith and Jones, we could use the following SQL statement :-
SELECT name FROM emp WHERE dept_no IN
(SELECT dept_no FROM emp WHERE name = ‘JONES’ OR name = ‘SMITH’)
In fact, the original example could also return more than one row from the sub query if there were two or more people that were called Jones working in different departments. In the first example a run-time SQL error would be generated in that case, because the first example, by using ‘=’, specified that the sub query should produce no more than one row (it is perfectly legitimate for a sub query to return no rows).
We can reverse the question to ask for the names of all the employees that are NOT in the same department as Jones, To do this, the sense of the sub query just has to be reversed by prefixing it with ‘NOT’ or ‘!’. Again depending on whether there might be more than one Jones, we would either use ‘IN’ or ‘=’
SELECT name FROM emp WHERE dept_no NOT IN
( SELECT dept_no FROM emp WHERE name = ‘JONES’)
Or
SELECT name FROM emp WHERE dept_no !=
( SELECT dept_no FROM emp WHERE name = ‘JONES’)
Nested Sub-Queries
The SQL syntax allows queries to be nested, meaning that a sub query itself can contain a sub query, enabling very complex queries to be built as there is no syntacttical limit to the level of besting. However, very complex queries should be avoided as they are difficult to understand and to maintain and may not perform that well either.
For example, the SQL statement to find the departments that have employees with a salary higher than the average employee salary could be written as:
SELECT name FROM dept
WHERE id IN
(
SELECT dept_id FROM emp
WHERE sal >
(SELECT avg(sal)FROM emp)
)
Any of the other comparison operators instead of ‘=’ or ‘IN’ such as ‘>’, or ‘<’ can also be used with a sub query.
Sub Queries In The From Clause
The examples so far in this advanced SQL tutorial all had sub queries in the where clause, but sub queries can also be used in the from clause instead of a table name. In these circumstances the sub query acts as if it had been predefined as a view.
For example, the following SQL statement returns the amount of used space, the free space and the total allocated space for all tablespaces in a database.
SELECT ts.tablespace_name
,ROUND(fs.mbytes,2) “Free (Mbytes)”
FROM dba_tablespaces ts
,(
SELECT tablespace_name
,SUM(bytes)/1024/1024 mbytes
FROM dba_free_space
GROUP BY tablespace_name
) fs
WHERE ts.tablespace_name = fs.tablespace_name
Note that the sub query is given an alias so that results can be used in the main body of the query.
Sub Queries That Return No Rows
Up until now the queries shown have all been expected to produce a result, but when creating tables, it can be very useful to write the SQL to use a sub query which will not return any rows – when just the table structure is required and not any of the data.
In the following example we create a copy of the policy table with no rows:
CREATE TABLE new_policy AS
SELECT * from policy WHERE 1=0;
The sub query returns no data but does return the column names and data types to the ‘create table’ statement.
Correlated Sub-Queries
As we’ve seen already, there are two types of sub query: correlated and non-correlated. We’ve already looked at non-correlated sub queries (see advanced SQL tutorial part 1). All of the examples of sub queries up until now have been non-correlated sub queries.
Just like non-correlated sub queries, correlated sub queries are used to answer multi-part questions, but they are most often used to check for existence or absence of matching records in the parent table and the related table in the sub query.
A correlated sub query refers to a column from a table in the parent query. As mentioned in part 1 sub-queries (both correlated and non-correlated) and joins are usually interchangeable. However the SQL may be significantly faster when a correlated sub-query is used as correlated sub queries refer to a column from their parent queries, they are executed once per row in the parent query whereas non-correlated sub queries are executed once for the whole statement.
For example, using the emp and dept tables from before, to find out which departments have no employees assigned to them, we can write the SQL statement in 3 different ways – as a non-correlated sub query, as an outer join, or as a correlated sub-query.
Example 1 – non-correlated sub query
SELECT dept.name FROM dept
WHERE dept.id NOT IN
(
SELECT dept_id
FROM emp
WHERE dept_id IS NOT NULL
)
Example 2- outer join
SELECT dept.name FROM dept,emp
WHERE emp.dept_id (+) = dept.id
Example 3 – correlated sub query
SELECT dept.name FROM dept
WHERE NOT EXISTS (SELECT dept_id
FROM emp
WHERE emp.dept_id = dept.id)
The second example is an outer join SQL statement which may produce differnt reults to the other 2 queries as it returns both matching rows and the non-matching rows on one side of the join. In this case the query would return the names of departments which have no employees assigned to them plus the names of those departments that do have employees assigned to them.
The first and the third SQL statements would produce exactly the same results, but the first would probably be slower than the third if the dept_id column in the emp table were indexed (depending on the sizes of the tables).
The first SQL statement can not use any indexes – the where clause of the sub query is just checking for NOT NULL rows – so a full table scan would be performed. Also the sub query would be executed once for each row in the dept table.
On the other hand, the sub query in the third example can use the index and since only the dept_id is returned by the sub query, there is no need for any subsequent table access. For these reasons, the third query would normally perform better than the first.
As you can see there are nearly always several ways in which the SQL for a query may be written, and it is therefore best to try alternative SQL statements particularly for complex queries before deciding on the preferred one.