The Oracle RDBMS allows you to collect
statistics of many different kinds as an aid to improving performance. This
package is concerned with optimizer statistics only. Given that Oracle sets
automatic statistics collection of this kind on by default, this package is
intended for only specialized cases.
The statistics of interest to be viewed or
modified can reside in the dictionary or in a table created in the user's
schema for this purpose. You can also collect and manage user-defined
statistics for tables and domain indexes using this package.
When a SQL
statement is executed, the database must convert the query into an execution
plan and choose the best way to retrieve the data. For Oracle, each SQL query
has many choices for execution plans, including which index to use to retrieve
table row, what order in which to join multiple tables together, and which
internal join methods to use (Oracle has nested loop joins, hash joins, star
joins, and sort merge join methods). These execution plans are computed by the
Oracle cost-based SQL optimizer commonly known as the CBO.
The choice of
executions plans made by the Oracle SQL optimizer is only as good as the Oracle
statistics. To always choose the best execution plan for a SQL query, Oracle
relies on information about the tables and indexes in the query.
Starting with the introduction of the dbms_stats package, Oracle provides a simple way for the Oracle professional to collect statistics for the CBO. The old-fashioned analyze table and dbms_utility methods for generating CBO statistics are obsolete and somewhat dangerous to SQL performance because they don't always capture high-quality information about tables and indexes. The CBO uses object statistics to choose the best execution plan for all SQL statements.
The dbms_stats utility does a far better job in estimating statistics, especially for large partitioned tables, and the better stats result in faster SQL execution plans. Andrew Holdsworth of Oracle Corporation notes that dbms_stats is essential to good SQL performance, and it should always be used before adjusting any of the Oracle optimizer initialization parameters:
The parameter options
Using one of the four provided methods, this option governs the way Oracle statistics are refreshed:
·
gather?Reanalyzes the whole schema
·
gather empty?Only analyzes tables that have no existing statistics
·
gather stale?Only reanalyzes tables with more than 10% modifications (inserts,
updates, deletes).
·
gather auto?Reanalyzes objects which currently have no statistics and objects with
stale statistics (Using gather auto is like combining gather stale
and gather empty.)
Note that both gather stale and gather
auto require monitoring. If you issue the alter table xxx monitoring
command, Oracle tracks changed tables with the dba_tab_modifications
view, which allows you to see the exact number of inserts, updates, and deletes
tracked since the last analysis of statistics
The estimate percent
option
The following estimate_percent
argument is a new way to allow Oracle’s dbms_stats to
automatically estimate the best percentage of a segment to sample when
gathering statistics:
estimate_percent => dbms_stats.auto_sample_size
You can verify the accuracy of the automatic statistics sampling by looking at the dba_tables sample_size column. It is interesting to note that Oracle chooses between 5 and 20 percent for a sample size when using automatic sampling. Remember, the better the quality of your statistics, the better the decision of the CBO.
The method_opt
option
The method_opt parameter for dbms_stats
is very useful for refreshing statistics when the table and index data change.
The method_opt parameter is also very useful for determining which
columns require histograms.
In some cases, the distribution of values within an
index will effect the CBOs decision to use an index versus perform a full-table
scan. This happens when a where clause has a disproportional amount of values,
making a full-table scan cheaper than index access.
Oracle histograms statistics can be created when you have a highly skewed index, where some values have a disproportional number of rows. In the real world, this is quite rare, and one of the most
Oracle histograms statistics can be created when you have a highly skewed index, where some values have a disproportional number of rows. In the real world, this is quite rare, and one of the most
common mistakes with the CBO is the unnecessary
introduction of histograms in the CBO statistics. As a general rule, histograms
are used when a column's values warrant a change to the execution plan.
To aid in intelligent histogram generation, Oracle uses the method_opt parameter of dbms_stats. There are also important new options within the method_opt clause, namely skewonly, repeat and auto:
To aid in intelligent histogram generation, Oracle uses the method_opt parameter of dbms_stats. There are also important new options within the method_opt clause, namely skewonly, repeat and auto:
method_opt=>'for all indexed columns size skewonly'
method_opt=>'for all columns size repeat'
method_opt=>'for columns size auto'
The skewonly option is very
time-intensive because it examines the distribution of values for every column
within every index.
If dbms_stats discovers an index whose columns are unevenly distributed, it will create histograms for that index to aid the cost-based SQL optimizer in making a decision about index versus full-table scan access. For example, if an index has one column that is in 50 percent of the rows, a full-table scan is faster than an index scan to retrieve these rows.
--*************************************************************
-- SKEWONLY option?Detailed
analysis
--
-- Use this method for a
first-time analysis for skewed indexes
-- This runs a long time because
all indexes are examined
--*************************************************************
begin
dbms_stats.gather_schema_stats(
ownname => 'SCOTT',
estimate_percent => dbms_stats.auto_sample_size,
method_opt => 'for all columns size skewonly',
cascade => true,
degree => 7
);
end;
If you need to reanalyze your statistics,
the reanalyze task will be less resource intensive with the repeat
option. Using the repeat option will only reanalyze indexes with
existing histograms, and will not search for other histograms opportunities.
This is the way that you will reanalyze you statistics on a regular basis.
--**************************************************************
-- REPEAT OPTION - Only reanalyze
histograms for indexes
-- that have histograms
--
-- Following the initial analysis,
the weekly analysis
-- job will use the ?repeat?
option. The repeat option
-- tells dbms_stats that no
indexes have changed, and
-- it will only reanalyze
histograms for
-- indexes that have histograms.
--**************************************************************
begin
dbms_stats.gather_schema_stats(
ownname => 'SCOTT',
estimate_percent => dbms_stats.auto_sample_size,
method_opt => 'for all columns size repeat',
cascade => true,
degree => 7
);
end;
Parallel collection
Oracle allows for parallelism when collecting CBO statistics, which can greatly speed up the time required to collect statistics. A parallel statistics collection requires an SMP server with multiple CPUs.
Better
execution speed
The dbms_stats utility is a great way to
improve SQL execution speed. By using dbms_stats to collect top-quality
statistics, the CBO will usually make an intelligent decision about the fastest
way to execute any SQL query. The dbms_stats utility continues to improve and
the exciting new features of automatic sample size and automatic histogram
generation greatly simplify the job of the Oracle professional.
Example – 01:
Collect Statistics on Table Level
sqlplus scott/tiger
exec dbms_stats.gather_table_stats (
ownname => 'HR',
tabname => 'EMPLOYEES',
estimate_percent => dbms_stats.auto_sample_size,
method_opt => 'for all columns size auto',
cascade => true,
degree => 5
)
/
PL/SQL procedure successfully completed.
Collect Statistics on Schema Level
sqlplus scott/tiger
exec dbms_stats.gather_schema_stats (
ownname => 'HR',
options => 'GATHER',
estimate_percent => dbms_stats.auto_sample_size,
method_opt => 'for all columns size auto',
cascade => true,
degree => 5
)
/
PL/SQL procedure successfully completed.
Execute
the below two queries to find out the values for the statistics gathered.
SQL- 01:
select ind.table_name,
ind.uniqueness,
col.index_name,
col.column_name,
ind.distinct_keys,
ind.sample_size
from dba_ind_columns col, dba_indexes ind
where ind.table_owner = 'HR'
and ind.table_name in ('EMPLOYEES', 'DEPARTMENTS')
and col.index_owner = ind.owner
and col.index_name = ind.index_name
and col.table_owner = ind.table_owner
and col.table_name = ind.table_name
order by col.table_name, col.index_name, col.column_position;
SQL- 02:
select table_name,
column_name,
data_type,
num_distinct,
sample_size,
to_char(last_analyzed, 'HH24:MI:SS')
last_analyzed,
num_buckets buckets
from dba_tab_columns
where table_name in ('EMPLOYEES', 'DEPARTMENTS')
order by table_name, column_id;
No comments:
Post a Comment