You’ll hear it from many people, the biggest problem in many applications is the non-use of bind variables. Oracle bind variables are a super important way to make Oracle SQL reentrant.
Why is the use of bind variables such an issue? Well, Oracle uses a signature generation algorithm to assign a hash value to each SQL statement based on the text of the SQL statement. Any change in a statement (including whitespaces or lower-uppercase change) will result in a new hash and thus Oracle assumes it is a new statement. Each new statement must be verified, parsed and have an execution plan generated and stored, all high overhead that might be avoided by using bind variables.
Bind variables does well on most of the cases, but there are some rare cases that can reduce performance. Remember, the main idea on bind variables is to preserve the sql text, so, get the same hash value, so, skip parsing, so, use the same execution plan for the *same* (or it's better to say "similar", but neither the same, nor similar, let's invent a new word: "sameilar" :) ) queries. But there are some cases, especially for tables that has large clustering factor, or the data is not homogenous, or for partitioned tables, the optimizer will generate different execution plans for that "sameilar" queries. So with using bind variables, we force the optimizer to use the same execution plan, usually.
Here, for this article, I just have a scenario which the bind variables fails.
Here is the code:
SQL> drop table bind_is_bad;
Table dropped.
SQL> create table bind_is_bad (id number, name varchar2(200), surname varchar2(200));
Table created.
SQL>
SQL> declare
2 ran varchar2(200);
3 begin
4 for i in 1..100000
5 loop
6 ran:=dbms_random.string('l',200);
7 insert into bind_is_bad values(i,ran,ran);
8 insert into bind_is_bad values(100001,ran,ran);
9 end loop;
10 end;
11 /
PL/SQL procedure successfully completed.
SQL>
SQL> create index idx_bind_is_bad_id on bind_is_bad(id,name);
Index created.
We just created a new table called "bind_is_bad" and fill the table with id's of 1 to 100000 and randomly generated strings, plus 100000 times the record of the id=100001. Also, we insert them in nested order (1,100001,2,100001,3,100001 ...) to have a great clustering factor value. Then, we have id values of 1..100000 once and 100001 for 100000 times :) Then we created an index on it.
So, let's look at the statistics:
SQL> select table_name,num_rows,blocks,empty_blocks,last_analyzed
from all_tables where table_name='BIND_IS_BAD';
TABLE_NAME NUM_ROWS BLOCKS EMPTY_BLOCKS LAST_ANAL
------------------------------ ---------- ---------- ------------ ---------
BIND_IS_BAD
SQL> select index_name, table_name, blevel, leaf_blocks,
clustering_factor, num_rows, last_analyzed
from all_indexes where index_name='IDX_BIND_IS_BAD_ID';
INDEX_NAME TABLE_NAME BLEVEL LEAF_BLOCKS
------------------------------ ------------------------------ ---------- -----------
CLUSTERING_FACTOR NUM_ROWS LAST_ANAL
----------------- ---------- ---------
IDX_BIND_IS_BAD_ID BIND_IS_BAD 2 6241
111757 200000 28-JUL-07
We see there is no statistics about table, and *some* for index. Clustering factor=111757 is really bad for a 200000-row table :)
Now, let's examine the executing plan's for different values of the same query (which are actually the "sameilar" queries :) )
SQL> set autotrace traceonly explain;
SQL> select * from bind_is_bad where id=1;
Execution Plan
----------------------------------------------------------
Plan hash value: 1198113483
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 217 | 4 (0)|
| 1 | TABLE ACCESS BY INDEX ROWID| BIND_IS_BAD | 1 | 217 | 4 (0)|
|* 2 | INDEX RANGE SCAN | IDX_BIND_IS_BAD_ID | 1 | | 3 (0)|
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("ID"=1)
Note
-----
- dynamic sampling used for this statement
SQL> select * from bind_is_bad where id=2;
Execution Plan
----------------------------------------------------------
Plan hash value: 1198113483
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 217 | 4 (0)|
| 1 | TABLE ACCESS BY INDEX ROWID| BIND_IS_BAD | 1 | 217 | 4 (0)|
|* 2 | INDEX RANGE SCAN | IDX_BIND_IS_BAD_ID | 1 | | 3 (0)|
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("ID"=2)
Note
-----
- dynamic sampling used for this statement
SQL> select * from bind_is_bad where id=100000;
Execution Plan
----------------------------------------------------------
Plan hash value: 1198113483
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 217 | 4 (0)|
| 1 | TABLE ACCESS BY INDEX ROWID| BIND_IS_BAD | 1 | 217 | 4 (0)|
|* 2 | INDEX RANGE SCAN | IDX_BIND_IS_BAD_ID | 1 | | 3 (0)|
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("ID"=100000)
Note
-----
- dynamic sampling used for this statement
SQL> select * from bind_is_bad where id=100001;
Execution Plan
----------------------------------------------------------
Plan hash value: 4063739968
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 97122 | 20M| 2619 (1)| 00:00:32 |
|* 1 | TABLE ACCESS FULL| BIND_IS_BAD | 97122 | 20M| 2619 (1)| 00:00:32 |
---------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("ID"=100001)
Note
-----
- dynamic sampling used for this statement
SQL> select /*+ INDEX(bind_is_bad idx_bind_is_bad_id) */ * from bind_is_bad
where id=100001;
Execution Plan
----------------------------------------------------------
Plan hash value: 1198113483
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 97122 | 20M| 59708 (1)|
| 1 | TABLE ACCESS BY INDEX ROWID| BIND_IS_BAD | 97122 | 20M| 59708 (1)|
|* 2 | INDEX RANGE SCAN | IDX_BIND_IS_BAD_ID | 97122 | | 3221 (1)|
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("ID"=100001)
Note
-----
- dynamic sampling used for this statement
SQL>
The plans are as expected, CBO uses the index for 1,2,3 and 100000, but full table scan for 100001. That is actually the best performance way. The decision is so, because of dynamic sampling. The optimizer sees most of the values are 100001, so use full table scan for 100001. We also forced the optimizer to use index scan for 100001 and the difference between FTS(full table scan) and IRS(index range scan) is really giant.
But now, the story begins. When we use bind variables, we save for the cost of parsing and use the same (actually the first parsed) execution plan for this "sameilar" queries, but we map the optimizer's choice for the value 100001 and that will also executed with the others' execution plan(using index).
SQL> set timing on
SQL> declare
2 TYPE test1_tab IS TABLE OF bind_is_bad%ROWTYPE;
3 t_tab test1_tab;
4 begin
5 for i in 99999 .. 100001
6 loop
7 execute immediate 'select * from bind_is_bad where id=:a'
bulk collect into t_tab using i;
8 end loop;
9 end;
10 /
PL/SQL procedure successfully completed.
Elapsed: 00:05:03.62
The code with bind variables lasts so much, isn't it? It was just 3 simple queries in a loop! But if we don't use bind variables:
SQL> alter system flush buffer_cache;
System altered.
Elapsed: 00:00:00.26
SQL> declare
2 TYPE test1_tab IS TABLE OF bind_is_bad%ROWTYPE;
3 t_tab test1_tab;
4 begin
5 for i in 99999 .. 100001
6 loop
7 execute immediate 'select * from bind_is_bad where id='||i
bulk collect into t_tab;
8 end loop;
9 end;
10 /
PL/SQL procedure successfully completed.
Elapsed: 00:00:07.18
Download the complete pure code suitable for just copy/paste and execute, from here.
You see the drama? Bind variable really fails for a stupid query that just executed 3 times :))
CONCLUSION
This is just a scenario to show you nothing is *always* best to use for all cases. Even the bind variables has some worse cases. So, know the tools, know how to use the tools, but not copy & paste the ideas, just plan your own situation, your own solution for your specific problem.
And, don't guess, don't estimate, just test different solutions and different approaches. That is the developers' responsibility to write an efficient code, not the DBAs', to optimize the code block, package or query itself.
|