Dec/090
Oracle Performance Tuning Tactics Part 2: Using Indexes
To recap, the “weapons” available in your “arsenal” for performance tuning are as follows:- Full table scans ; Indexes; Joins; Views; De-normalization; Oracle Stored Procedures; Sub-queries.
2. Indexes
There are several types of indexes available with Oracle databases:
* normal/B-Tree (balanced/ binary tree) indexes;
* bitmap indexes (which literally use a map of bits to represent the whole index and are
* function-based indexes (which are based on the value of an expression instead of the value of a column);
* partitioned indexes (tables and indexes may be partitioned separately)
* domain indexes (application-specific indexes created on complex data types such as spatial data).
This article only discusses bitmap and normal (b-tree) indexes.
2.1 B-Tree Indexes
B-Tree Indexes are most appropriate when retrieving a small amount of data from a very large table. The bigger the table and the lower the number of rows that you want to retrieve then the more effective the index.
If you think of the printed material in the real world, this makes perfect sense, you wouldn’t bother to create an index for a short article, but you would certainly want one for a lengthy reference book. The reasons are analogous too. When reading a short article it would take longer to find the areas of interest using the index then it would to read the whole article. It’s the same when you’re querying the database – for a small table with Oracle’s multi-block read feature of Oracle, the whole table would be read in one go, so by going to the index first, the database would have to read more data overall.
Indexes may improve read access to data and update operations (including insert and delete) using sub-queries. However update performance will be impaired because every time a row is inserted into, deleted from or updated (if the indexed column(s) is/are updated) in the database, the indexes have to be adjusted accordingly, thereby increasing the amount of i/o that has to performed.
Therefore having many indexes is ideal when the database is mostly read only, but if there are a high proportion of inserts and deletes and only a few read operations, then you would find that adding more indexes would degrade rather than improve performance. You also need to ensure that the column(s) you are proposing to index is/are used frequently as the limiting conditions for a query, if not you will waste a lot of space and degrade database performance.
Another factor to consider with b-tree indexes in Oracle databases, is how specific they are – their cardinality. The ideal b-tree index is one that refers to only one row in the table, as a primary key does by definition. This reduces considerably the amount of i/o required to retrieve the data, but if the index value relates to many rows because it is not selective, then a lot more data will be retrieved resulting in unnecessary i/o.
Dec/090
Oracle Performance Tuning Part 1: Using Full Table Scans
It is a common misconception that all SQL queries on all tables in Oracle databases should be index driven. In fact, using full-table scans can improve performance in two scenarios: when querying very small tables and when querying very large tables.
The Effect of Full Table Scans When Querying Small Tables
Let’s suppose your using your Oracle database to run in-house designed and built HR application. Consider a reference table such as a list of department ids and the associated department names. Even a large company is likely to have only a few departments – HR, Sales, Marketing, Finance, IT, so the table is going to be quite small.
Now let’s suppose the table has just 2 columns – department id and department name – with an index on the department id. To find the department name for a given id, we would have to read the index and then read the table, but because the table is so small and because Oracle reads multiple database blocks in one read operation the whole table is scanned in just one read, so however efficient the index, by using it we will be performing unnecessary i/o.
In this case, therefore, a full table scan is faster than an index scan and table lookup. The exception to this of course is when the table has been created as an index-only table (available since Oracle
which means that the whole table is stored in a B-tree structure (although you may have pointers to overflow areas).
The Effect of Full Table Scans When Querying Very Large Tables
Let’s look at using this technique for querying very large tables in your Oracle database. Surely they should use an index? Otherwise you might have to read thousands of blocks. It is correct to say that a full table scan of a very large table could read many thousands of data blocks, but as we shall see it may be better to do this than to perform an index scan and table lookup.
The situation when the a full table scan is very likely to perform better than an index scan and table lookup is when you are retrieving 10% or more of the data in the table and it may perform better even when you are retrieving as little as 1% of the table data. Of course if you only want to retrieve one row in the table, then you would want to use an index.
Index Scan And Table Lookup Vs. Full Table Scan For Very Large Tables in Oracle Databases
Let’s look at the 2 scenarios then – retrieving 10% of the table data by index scan and table lookup vs. full table scan.
To make the maths easy, assume our table has 10,000,000 rows with 10 data rows per block and 100 index entries per block. Therefore to read 10% of the table via an index scan and table lookup, we would have to read 10,000 (1,000,000/100) index blocks plus 100,000 (1,000,000/10) data blocks. That’s 110,000 blocks in total.
However this assumes that the data is stored in order which means that we only retrieve the blocks of data that we want. If the data is not sorted then the worst case is that we would have to read 1 block for each row of data i.e. 1,000,000 blocks, which would would give us a worst case total of 1,010,000 blocks.
For a full table scan the maths is easy: (10,000,000 rows)/(10 rows per block) = 1,000,000 blocks. This is less than the worst case scenario for an indexed read, but more than the best case scenario for an indexed read. This would seem to suggest that if you sort your data before loading, an indexed read would be faster than a full-table scan.
Whilst it is true that pre-sorting the data of very large table will improve performance , it is not necessarily correct that the read via the index will be better overall. We also need to take into account what happens to the blocks stored in the buffer cache of the Oracle SGA and the impact this will have on other users of the database.
Let’s look at the effect on the buffer cache of reading many data blocks via an index. As we know, data and index blocks are stored in the buffer cache by Oracle for reuse by other queries by being marked as least recently used when we do an indexed read. However, those data blocks read by a full table scan are quickly aged out of the buffer cache, because they are not marked as least recently used.
What this means is that the large number of blocks (1,010,000) of index and table data read via the indexed read of our table will be saved in the SGA flushing practically all other data from it – which will obviously have an effect on other users.
Conversely, when a full table scan is performed only the last blocks read are held in the SGA (the actual number is determined by the multi-block read count) so the impact on other users of the database would be minimal.
Summary
To decide whether or not a full-table scan would be better than an indexed read, for a large table you need to consider what proportion of the table the query will retrieve from your Oracle database and consider the likely effect of that on other users. The denser the data, the more efficient a full table scan is for very large tables, but generally if you’re reading more than 1-10% of a very large table, a full table scan would be more efficient than an index scan and table lookup.
For small tables you will get much better performance from your Oracle database by caching the table in its entirety (so that it is always in memory), or by using an index-organised table then you will by relying on indexes. Having said that, every table should have a primary key index to guarantee uniqueness, but you don’t have to use it.
Dec/090
Oracle Performance Tuning Overview
There isn’t the time or space to provide an exhaustive study – whole books have been written about Oracle performance tuning. Instead we’ll just consider a few of the most important tactics and there’ll be suggestions for further reading, so you can expand your knowledge of this important subject.
One point that needs to be made clear is that performance tuning is not an exact science – you can’t always predict that a certain tactic or technique will improve performance. Every change has to be tested, because whilst it may improve performance in one area, it may also degrade performance in another area. Adding an index is a classic example of this – it may improve query performance but the performance of inserts and deletes become slower because the index entries also have to be updated.
The other thing to remember about performance tuning is that you need to decide on your goals before you start – do you need a 10% decrease in response time for queries ? Is adding new records or updating existing records 20% slower than targets ?
You also need to be aware that improving response time is not the only aim that you might have – you might want to reduce memory usage instead – which might mean having to redesign or rewrite some stored procedures. All these issues need to be considered before you start looking at performance.
Let’s start with a list of the topics that we’ll cover:
Use of full-table scans
How and when to use indexes.
How you can optimise joins.
How to use views to get a high performance database.
Why your database should NOT be normalised.
How to use stored procedures to sky-rocket Oracle performance.
How to use sub queries to boost Oracle performance
Jul/090
What is a Database
Definitions:
A database is a collection of information organized into interrelated tables of data and specifications of data objects.
A table in a relational database is a predefined format of rows and columns that define an entity.
Database tables are composed of individual columns corresponding to the attributes of the object.
In a relational database, a row consists of one set of attributes (or one tuple) corresponding to one instance of the entity that a table schema describes.
Also Known As: Record
A single data item related to a database object. The database schema associates one or more attributes with each database entity.
Also Known As: field, column
A database record consists of one set of tuples for a given relational table. In a relational database, records correspond to rows in each table.
Databases are designed to offer an organized mechanism for storing, managing and retrieving information. They do so through the use of tables. If you’re familiar with spreadsheets like Microsoft Excel, you’re probably already accustomed to storing data in tabular form. It’s not much of a stretch to make the leap from spreadsheets to databases. Let’s take a look.
Database Tables
Just like Excel tables, database tables consist of columns and rows. Each column contains a different type of attribute and each row corresponds to a single record. For example, imagine that we were building a database table that contained names and telephone numbers. We’d probably set up columns named “FirstName”, “LastName” and “TelephoneNumber.” Then we’d simply start adding rows underneath those columns that contained the data we’re planning to store.
If we were building a table of contact information for our business that has 50 employees, we’d wind up with a table that contains 50 rows.
Databases and Spreadsheets
At this point, you’re probably asking yourself an obvious question – if a database is so much like a spreadsheet, why can’t I just use a spreadsheet? Databases are actually much more powerful than spreadsheets in the way you’re able to manipulate data. Here are just a few of the actions that you can perform on a database that would be difficult if not impossible to perform on a spreadsheet:
* Retrieve all records that match certain criteria
* Update records in bulk
* Cross-reference records in different tables
* Perform complex aggregate calculations
Jul/090
What is Oracle PLSQL
PL/SQL is Oracle’s SQL++ programming language providing structure and flow control extensions to SQL. The name PLSQL is derived from the term “Procedural Language extensions to SQL”.
On its own, SQL enables you to specify what you want done but not how it is done. However, you often need more control over how data is retrieved and manipulated and this is where PL/SQL comes in.
The procedural capabilities combined with standard SQL in Oracle PLSQL gives developers far more control of how their SQL statements interact with the database and makes using PL/SQL an excellent alternative to developing applications in other languages such as Java or C or VB.
The language itself is modeled on Ada, so Java/C/C++ programmers will find the syntax rather strange and probably won’t like the single”=” being used for comparison, but anyone who’s used Pascal or Ada or Modula2 will fell right at home.
PL/SQL is not a pure object-oriented language like Java or Ada, but it does support some obect-oriented features such as classification, polymorphism and, in the later versions, inheritance.
Jun/090
Advanced SQL – Sub-Queries
This advanced SQL tutorial focuses on the design of more complex SQL statements and the strategies available for implementing them, it concentrates on sub queries and joins because they are often inter-changeable, and views because these are often used to hide the complexity of queries involving sub-queries and joins.
As this is an advanced tutorial there is some consideration of performance issues, but this aspect is more thoroughly explored in our series on Oracle performance tuning.
This article talks about the following:
- Introduction to Sub-queries
- Use of sub-queries.
- Nested sub queries
- Correlated sub-queries.
Introduction to Sub-queries
Sub queries are also known as nested queries and are used to answer multi-part questions. Sub queries and joins are often interchangeable and in fact the Oracle optimiser may well treat a query containing a sub-query exactly as if it were a join.
Let’s use a trivial example of finding the names of everybody who works in the same department as a person called Jones to illustrate this point. The SQL could be written using a sub query as follows:
SELECT name FROM emp WHERE dept_no =
(SELECT dept_no FROM emp WHERE name = ‘JONES’)
or as a join statement, like this:-
SELECT e1.name FROM emp e1,emp e2
WHERE e1.dept_no = e2.dept_no AND e2name = ‘JONES’
With a trivial example like this there would probably be very little difference in terms of performance of the SQL for such a simple query, but with more complex queries there could well be performance implications. For this reason it is always worth trying a few variations of the SQL for a query and examining the execution plans before deciding on a particular approach, unless they’re very simple queries.
Learn more about Oracle performance tuning here.
Non Correlated Sub-Queries
There are, in fact, two types of sub query: correlated and non-correlated. The example shown above is a non-correlated sub query. The difference between them is that a correlated sub query refers to a column from a table in the parent query, whereas a non-correlated sub query doesn’t. This means that a non-correlated sub query is executed just once for the whole SQL statement, whereas correlated sub queries are executed once per row in the parent query.
Uses of Sub Queries
The most common use of sub queries is in the WHERE clause of queries to define the limiting condition for the rows returned (i.e. what value(s) the rows must have to be of interest), as in the previous example. However, they can also be used in other parts of the query. Specifically, sub queries can be used:
* to define the limiting conditions for SELECT, UPDATE and DELETE statements in the following clauses:-
- WHERE
- HAVING
- START WITH
Instead of a table name in
- INSERT statements
- UPDATE statements
- DELETE
- statements the FROM clause of SELECT statements
* To define the set of rows to be created in the target table of a CREATE TABLE AS or INSERT INTO sql statement.
* To define the set of rows to be included by a view or a snapshot in a CREATE VIEW or CREATE SNAPSHOT statement.
* To provide the new values for the specified columns in an UPDATE statement
The first example of sub query in SQL shown above, used a simple equality expression as we were interested in only one row, but we can also use the sub query to provide a set of rows.
For example, to find the names of all employees in the same departments as Smith and Jones, we could use the following SQL statement :-
SELECT name FROM emp WHERE dept_no IN
(SELECT dept_no FROM emp WHERE name = ‘JONES’ OR name = ‘SMITH’)
In fact, the original example could also return more than one row from the sub query if there were two or more people that were called Jones working in different departments. In the first example a run-time SQL error would be generated in that case, because the first example, by using ‘=’, specified that the sub query should produce no more than one row (it is perfectly legitimate for a sub query to return no rows).
We can reverse the question to ask for the names of all the employees that are NOT in the same department as Jones, To do this, the sense of the sub query just has to be reversed by prefixing it with ‘NOT’ or ‘!’. Again depending on whether there might be more than one Jones, we would either use ‘IN’ or ‘=’
SELECT name FROM emp WHERE dept_no NOT IN
( SELECT dept_no FROM emp WHERE name = ‘JONES’)
Or
SELECT name FROM emp WHERE dept_no !=
( SELECT dept_no FROM emp WHERE name = ‘JONES’)
Nested Sub-Queries
The SQL syntax allows queries to be nested, meaning that a sub query itself can contain a sub query, enabling very complex queries to be built as there is no syntacttical limit to the level of besting. However, very complex queries should be avoided as they are difficult to understand and to maintain and may not perform that well either.
For example, the SQL statement to find the departments that have employees with a salary higher than the average employee salary could be written as:
SELECT name FROM dept
WHERE id IN
(
SELECT dept_id FROM emp
WHERE sal >
(SELECT avg(sal)FROM emp)
)
Any of the other comparison operators instead of ‘=’ or ‘IN’ such as ‘>’, or ‘<’ can also be used with a sub query.
Sub Queries In The From Clause
The examples so far in this advanced SQL tutorial all had sub queries in the where clause, but sub queries can also be used in the from clause instead of a table name. In these circumstances the sub query acts as if it had been predefined as a view.
For example, the following SQL statement returns the amount of used space, the free space and the total allocated space for all tablespaces in a database.
SELECT ts.tablespace_name
,ROUND(fs.mbytes,2) “Free (Mbytes)”
FROM dba_tablespaces ts
,(
SELECT tablespace_name
,SUM(bytes)/1024/1024 mbytes
FROM dba_free_space
GROUP BY tablespace_name
) fs
WHERE ts.tablespace_name = fs.tablespace_name
Note that the sub query is given an alias so that results can be used in the main body of the query.
Sub Queries That Return No Rows
Up until now the queries shown have all been expected to produce a result, but when creating tables, it can be very useful to write the SQL to use a sub query which will not return any rows – when just the table structure is required and not any of the data.
In the following example we create a copy of the policy table with no rows:
CREATE TABLE new_policy AS
SELECT * from policy WHERE 1=0;
The sub query returns no data but does return the column names and data types to the ‘create table’ statement.
Correlated Sub-Queries
As we’ve seen already, there are two types of sub query: correlated and non-correlated. We’ve already looked at non-correlated sub queries (see advanced SQL tutorial part 1). All of the examples of sub queries up until now have been non-correlated sub queries.
Just like non-correlated sub queries, correlated sub queries are used to answer multi-part questions, but they are most often used to check for existence or absence of matching records in the parent table and the related table in the sub query.
A correlated sub query refers to a column from a table in the parent query. As mentioned in part 1 sub-queries (both correlated and non-correlated) and joins are usually interchangeable. However the SQL may be significantly faster when a correlated sub-query is used as correlated sub queries refer to a column from their parent queries, they are executed once per row in the parent query whereas non-correlated sub queries are executed once for the whole statement.
For example, using the emp and dept tables from before, to find out which departments have no employees assigned to them, we can write the SQL statement in 3 different ways – as a non-correlated sub query, as an outer join, or as a correlated sub-query.
Example 1 – non-correlated sub query
SELECT dept.name FROM dept
WHERE dept.id NOT IN
(
SELECT dept_id
FROM emp
WHERE dept_id IS NOT NULL
)
Example 2- outer join
SELECT dept.name FROM dept,emp
WHERE emp.dept_id (+) = dept.id
Example 3 – correlated sub query
SELECT dept.name FROM dept
WHERE NOT EXISTS (SELECT dept_id
FROM emp
WHERE emp.dept_id = dept.id)
The second example is an outer join SQL statement which may produce differnt reults to the other 2 queries as it returns both matching rows and the non-matching rows on one side of the join. In this case the query would return the names of departments which have no employees assigned to them plus the names of those departments that do have employees assigned to them.
The first and the third SQL statements would produce exactly the same results, but the first would probably be slower than the third if the dept_id column in the emp table were indexed (depending on the sizes of the tables).
The first SQL statement can not use any indexes – the where clause of the sub query is just checking for NOT NULL rows – so a full table scan would be performed. Also the sub query would be executed once for each row in the dept table.
On the other hand, the sub query in the third example can use the index and since only the dept_id is returned by the sub query, there is no need for any subsequent table access. For these reasons, the third query would normally perform better than the first.
As you can see there are nearly always several ways in which the SQL for a query may be written, and it is therefore best to try alternative SQL statements particularly for complex queries before deciding on the preferred one.
Jun/090
Short Oracle Tutorial For Beginners
Introduction
This is just a brief Oracle tutorial for beginners, to provide a short history of databases and Oracle’s role in them, explain relational theory and give you an idea on how relational databases work with a few examples. There is also a very brief discussion of object-oriented design as it applies to databases.
Time and space don’t permit an in-depth discussion of all the features available in Oracle, but if you would like to learn more just contact us and ask for our free Oracle tutorial mini-course.
History of Databases – From Trees To Objects
The storage and management of data is probably the biggest headache for all businesses. It has been so for a long while and is likely to continue for a long while too. As companies try to store more and more details about their customers and their buying habits and as regulatory requirements for storing more data for longer, so companies will need to store and manage more and more data. The only way this can be done at a reasonable cost is by the use of computers.
In the late 1960s/early 1970s, specialised data management software appeared – the first database management systems (DBMS). These early DBMS were either hierarchical (tree) or network (CODASYL) databases. These early systems were very complex and inflexible and so it adding new applications or reorganising the data was very difficult and time-consuming.
In 1970 the relational data model was defined by E.F. Codd (see “A Relational Model of Data for Large Shared Data Banks” Comm. ACM. 13 (June 6, 1970), 377-387). This delivered a solution to the problems of tree and network databases due to the concept of normalisation which involves the separation of the logical and physical representation of data.
In 1974 IBM started a project called System/R to prove the theory of relational databases. This led to the development of a query language called SEQUEL (Structured English Query Language) later renamed to Structured Query Language (SQL) for legal reasons and now the query language of all databases.
In 1978 a prototype System/R implementation was evaluated at a number of IBM customer sites. By 1979 the project finished with the conclusion that relational databases were a feasible commercial product.
IBM’s research into relational databases had also come to the attention of a group of engineers in California who were so convinced of the potential that they formed a company called Relational Software, Inc. in 1977 to build such a database. Their product was called Oracle and the first version for VAX/VMS was released in 1979, thereby becoming the first commercial rdbms, beating IBM to market by 2 years.
In the 1980s the company was renamed to Oracle Corporation. Throughout the 1980s, new features were added and performance improved as the price of hardware came down and Oracle became the largest independent rdbms vendors. By 1985 they boasted of having more than 1000 installations.
As relational databases became accepted, companies wanted to expand their use to store images, spreadsheets, etc. which can’t be described in 2-dimensional terms. This led to the Oracle database becoming an object-relational hybrid in version 8.0, i.e. a relational database with object extensions, enabling you to have the best of both worlds.
What is a relational database?
As mentioned before, a relational database is based on the separation and independence of the the logical and physical representations of the data. This provides enormous flexibility and means you can store the data physically in any way without affecting how the data is presented to the end user. The separation of physical and logical layers means that you can change either layer without affecting the other.
A relational database can be regarded as a set of 2-dimensional tables which are known as “relations” in relational database theory. Each table has rows (”tuples”) and columns (”domains”). The relationships between the tables is defined by one table having a column with the same meaning (but not necessarily value) as a column in another table.
For example consider a database with just 2 tables :
emp(id number
,name varchar2(30)
,job_title varchar2(20)
,dept_id number)
holding employee information and
dept(id number
,name varchar2(30))
holding department information.
There is an implied relationship between these tables because emp has a column called dept_id which is the same as the id column in dept. In Oracle this is usually implemented by what’s called a foreign-key relationship which prevents values being stored that are not present in the referenced table.
Relational databases obtain their flexibility from being based on set theory (also known as relational calculus) which enables sets or relations to be combined in various ways, including:
* join/intersection
* union (i.e. the sum of 2 sets);
* exclusive “OR” (i.e. the difference between 2 sets)
* and outer-join which is a combination of intersecting and exclusive or ing.
The intersection or join between 2 sets (in this case, tables) produces only those elements that exist in both sets.
Therefore, if we join Emp and Dept on department id, we will be left with only those employees who work for a department that is in the dept table and only those departments which have employees who are in the emp table.
The union produces the sum of the tables – meaning all records in Emp and all records in Dept. and this may be with or without duplicates.
Let’s use the following data to provide specific examples:
Dept
| Id | Name |
| 1 | HR |
| 2 | IT |
| 3 | Marketing |
| 4 | Sales |
| 5 | Finance |
Emp
| Id | Name | Dept Id |
| 1 | Bill Smith | 3 |
| 2 | Mike Lewis | 2 |
| 3 | Ray Charles | 3 |
| 4 | Andy Mallory | 4 |
| 5 | Mandy Randall | 6 |
| 6 | Allison White | 1 |
The join of Emp and Dept. on the department id would produce the following result:
| Emp.Id | Emp.Name | Dept.Id | Dept.Name |
| 1 | Bill Smith | 3 | Marketing |
| 2 | Mike Lewis | 2 | IT |
| 3 | Ray Charles | 3 | Marketing |
| 4 | Andy Mallory | 4 | Sales |
| 6 | Allison White | 1 | HR |
The union of Emp and Dept. would produce the following results
| Id | Name |
| 1 | Bill Smith |
| 2 | Mike Lewis |
| 3 | Ray Charles |
| 4 | Andy Mallory |
| 5 | Mandy Randall |
| 1 | HR |
| 2 | IT |
| 3 | Marketing |
| 4 | Sales |
| 5 | Finance |
The union operator is only allowed when the number and data types of the columns in the 2 sets are the same. It is not normally be used to combine sub sections from one or more tables rather than entire tables.
There are other operators and variations but there isn’t the space or the time to provide full details in this short Oracle tutorial.
The later versions of Oracle (Oracle 8 onwards) support both relational and object-oriented features. The relational features are more prominent at the moment, but this is beginning to change. In this context an object has both attributes and methods (programs stored with the object that performs a certain action or task) and in a true object-oriented database would belong to a class and would allow multilevel inheritance.
Jun/090
A Few Hints And Tips On Optimizing Your SQL
This was intended to be a list of hints and tips that you might find useful when using SQ,L but a mere list of tips would be of little benefit without the knowledge to make use of them, so we’ve expanded the list to include the explanations to increasing its usefulness and to make it a proper tutorial.
One important point to remember is that Oracle caches the compiled form of SQL and is therefore able to re-use queries which are the same as previously executed queries. This saves the time and resources required to parse the statement and determine the execution plan. How can you do this ?
1. Use Views
Views are a good way to ensure the same query is re-used as much as possible.
Remember that even just changing the case and spacing of the words could prevent a query from being reused. A view is merely a pre-defined query, the text of which is stored in the database. Therefore by using views you are using exactly the same queries and eliminating the re-parsing overhead. As the load on the database increases this re-parsing overhead becomes more and more significant. Materialized views take the concept one stage further by actually running the query and storing the results in a table
2. Use Stored Procedures
Another way is to use stored procedures which are program units that contain both SQL and logic statements and are stored in the database. Oracle allows the use of PL/SQL and Java stored procedures. Stored procedures and views also have the advantage that the queries in the views/stored procedures have to be tuned only once, not in every place where they’re used. Like views, stored procedures also eliminate the overhead of sending the queries from the client to the server as the queries are already on the server.
3. Use Bind Variables
The use of bind variables in queries makes them generic and therefore re-usable.
For example, instead of writing a query like :-
SELECT name,addr FROM custs WHERE id = 12345;
Change it to:-
SELECT name,addr FROM custs WHERE id = <cust_id>;
The first query will only be re-used when you request the details for customer number 12345, whereas the second query will be re-used for any other customer.
4. Use Selective Indexes
Ensure that tables are accessed via selective indexes, unless the table is very small or very large, in which case it may be better not to use the indexes.
If the table were very small it could be cached completely, or all the columns could be indexed which means only the index would have to be read to satisfy any query.
Also make sure that you’re not disabling the use of an index by:-
* using an operator on the column (eg. <indexed_col> + 1);
* the use of hints, if you’re running Oracle(only applies if you’re using the cost based optimizer);
* using NULL and not equal checks. (eg. <indexed_col> <> 12345 ; or <indexed_col> IS NULL)
5. Use Full-Table Scans
If the table is very large, depending on how many blocks are read, using an index may remove everything else from the buffer cache and degrade the performance of all other queries. In which case a full-table scan is better – only the last few blocks read are kept in the buffer cache.
6. Optimize Joins
* All other things being equal, the driving table is the one listed LAST in the FROM clause, when using the rule-based optimizer. Changing the order of the columns in the join condition does not change which table is used as the driving table. Choose the driving table carefully to ensure the minimum number of rows are returned.
When using the cost-based optimizer, ensure that all the tables in the join have been analyzed (ask your dba), if they haven’t, this may well cause poor performance. The most usual way to optimize queries when using the cost-based optimizer is to use hints, which instruct the parser as to which indexes should or should not be used, or which tables should be scanned in full. You can also experiment with the order of the tables in the join.
* Indexes – these can still be used even if the where clause contains a “like” condition but not if there is a “not like” condition.
* Outer joins – the correct syntax for outer joins using Oracle syntax is:
tab1.col1(+)= tab2.col1
or tab1.col1 = tab2.col1(+)
The bracketed plus sign follows the column of the table which has/may have the missing row.
An alternative is to use the ANSI standard outer join format which has the advantage that it enables you to perform a full outer join in one statement:
tab1.col1 left outer join tab2.col1 (return all rows from tab1)
tab1.col1 right outer join tab2.col1 (return all rows from tab2)
tab1.col1 full outer join tab2.col1 (return all rows from tab1 and tab2)
The final tip for this short tutorial is:
7. Name The Columns In A Query
There are three good reasons why it is better to name the columns in a query rather than to use “select * from …”.
1. Network traffic is reduced. This can have a significant impact on performance if the table has a large number of columns, or the table has a long or long raw column or have in-line clob or blob columns (all of which can be up to 2 Gigabytes in length). These types of columns will take a long time to transfer over the network and so they should not be fetched from the database unless they are specifically required.
2. The code is easier to understand, which means you need fewer comments!
3. It could save the need for changes in the future. If you are using views, not only might columns be added to or removed from the view, but the order of the columns could well change – in which case using “SELECT *” at best would fetch the wrong data and at worst would fail with an Oracle error which might take a long while to understand.
Jun/090
Oracle Performance Tuning Tactics Part 1
Use Full Table Scans To Boost The Performance Of Your Oracle Database
Before we start, let me make it clear that performance tuning is not always an exact science – by which I mean that you can’t always predict that a certain tactic or technique will improve performance. Every change has to be tested, because whilst it may improve performance in one area, it may also degrade performance in another area. Adding an index is a classic example of this – it may improve query performance but the performance of inserts and deletes become slower because the index entries also have to be updated.
The other thing to remember about performance tuning is that you need to decide on your goals before you start – do you need a 10% decrease in response time for queries ? Is adding new records or updating existing records 20% slower than targets ?
You also need to be aware that improving response time is not the only aim that you might have – you might want to reduce memory usage instead – which might mean having to redesign or rewrite some stored procedures. All these issues need to be considered when you you start looking at performance.
I’ll start by listing the “weapons” you have available in your “arsenal” which are as follows:
1. Full table scans
2. Indexes
3. Joins
4. Views
5. De-normalization
6. Stored Procedures
7. Sub-queries
1. Full table scans
This might seem counter-intuitive, surely all queries on a table should be index driven ? Actually, no, they shouldn’t. If the table is very large or very small using an index may increase the amount of i/o that is required to access the data and as i/o is the slowest operation, this will cause a degradation in performance.
1.1 The Effect On Small Tables
Consider a very small (reference) table such as a list of department ids and the associated department names in a HR application. Even a large company is likely to have only a few departments – hr, sales, marketing, finance, IT, so the table is going to be quite small. Now let’s suppose the table has just 2 columns – department id and department name – with an index on the department id. To find the department name for a given id, we would have to read the index and then read the table, but because the table is so small and because Oracle reads multiple database blocks in one read operation the whole table is scanned in just one read, so however efficient the index, by using it we will be performing unnecessary i/o.
In this case, therefore, a full table scan is faster than an index scan and table lookup. The exception to this of course is when the table has been created as an index-only table (available since Oracle Cool which means that the whole table is stored in a B-tree structure (although you may have pointers to overflow areas).
1.2 The Effect On Very Large Tables
Let’s look at using this technique for very large tables now. Surely they should use an index ? Otherwise you might have to read thousands and thousands of blocks. It is correct to say that a full table scan of a very large table could read many thousands of data blocks, but as we shall see it may be better to do this than to perform an index scan and table lookup.
The situation when the a full table scan is very likely to perform better than an index scan and table lookup is when you are retrieving 10% or more of the data in the table and it may perform better even when you are retrieving as little as 1% of the table data. Of course if you only want to retrieve one row in the table, then you would want to use an index.
1.2.1 Index Scan And Table Lookup Vs. Full Table Scan For Very Large Tables
Let’s look at the 2 scenarios then – retrieving 10% of the table data by index scan and table lookup vs. full table scan.
To make the maths easy, assume our table has 10,000,000 rows with 10 data rows per block and 100 index entries per block. Therefore to read 10% of the table via an index scan and table lookup, we would have to read 10,000 (1,000,000/100) index blocks plus 100,000 (1,000,000/10) data blocks. That’s 110,000 blocks in total.
However this assumes that the data is stored in order which means that we only retrieve the blocks of data that we want. If the data is not sorted in the table then the worst case is that we would have to read 1 block for each row of data i.e. 1,000,000 blocks, which would would give us a worst case total of 1,010,000 blocks.
For a full table scan the maths is easy: (10,000,000 rows)/(10 rows per block) = 1,000,000 blocks. This is less than the worst case scenario for an indexed read, but more than the best case scenario for an indexed read. This would seem to suggest that if you sort your data before loading, an indexed read would be faster than a full-table scan.
Whilst it is true that pre-sorting the data of very large table will improve performance , it is not necessarily correct that the read via the index will be better overall. We also need to take into account what happens to the blocks stored in the buffer cache of the Oracle SGA and the impact this will have on other users of the database.
Let’s look at the effect on the buffer cache of reading many data blocks via an index. As we know, data and index blocks are stored in the buffer cache for reuse by other queries by being marked as least recently used when we do an indexed read. However, those data blocks read by a full table scan are quickly aged out of the buffer cache, because they are not marked as least recently used.
What this means is that the large number of blocks (1,010,000) of index and table data read via the indexed read of our table will be saved in the SGA flushing practically all other data from it – which will obviously have an effect on other users.
Conversely, when a full table scan is performed only the last blocks read are held in the SGA (the actual number is determined by the multi-block read count) so the impact on other users of the database would be minimal.
1.3 Summary
In conclusion, therefore, to decide whether or not a full-table scan would be better than an indexed read, for a large table you need to consider what proportion of the table the query will retrieve from your Oracle database and consider the likely effect of that on other users. The denser the data, the more efficient a full table scan is for very large tables, but generally if you’re reading more than 1-10% of a very large table, a full table scan would be more efficient than an index scan and table lookup.
For very small tables you will get much better performance from your Oracle database by caching the table in its entirety (so that it is always in memory), or by using an index-organised table then you will by relying on indexes. Having said that, every table should have a primary key index to guarantee uniqueness, but you don’t have to use it.