Pages

Wednesday, November 25, 2015

Querying Data from Flat Files in Oracle | External Tables



External tables allow Oracle to query data that is stored outside the database in flat files. The ORACLE_LOADER driver can be used to access any data stored in any format that can be loaded by SQL*Loader. No DML can be performed on external tables but they can be used for query, join and sort operations. Views and synonyms can be created against external tables. They are useful in the ETL process of data warehouses since the data doesn't need to be staged and can be queried in parallel. They should not be used for frequently queried tables.
 


1. Create the directory, MY_DATA which will contain the data file.
CREATE OR REPLACE DIRECTORY my_data AS ‘D:\DATA_FILE’;

2. The data file, emp.csv contains the employee number, employee name, job, manager number, salary, commission and department number in the following format:

7369,SMITH,CLERK,7902,800,,20
7499,ALLEN,SALESMAN,7698,1600,300,30
7521,WARD,SALESMAN,7698,1250,500,30
7566,JONES,MANAGER,7839,2975,,20
7654,MARTIN,SALESMAN,7698,1250,1400,30
7698,BLAKE,MANAGER,7839,2850,,30
7782,CLARK,MANAGER,7839,2450,,10
7788,SCOTT,ANALYST,7566,3000,,20
7839,KING,PRESIDENT,,5000,,10
7844,TURNER,SALESMAN,7698,1500,0,30
7876,ADAMS,CLERK,7788,1100,,20
7900,JAMES,CLERK,7698,950,,30
7902,FORD,ANALYST,7566,3000,,20
7934,MILLER,CLERK,7782,1300,,10

3. Create the external table, emp_external.

CREATE TABLE emp_external
(
  EMPNO NUMBER(4),
  ENAME VARCHAR2(10),
  JOB VARCHAR2(9),
  MGR NUMBER(4),
  SAL NUMBER(7,2),
  COMM NUMBER(7,2),
  DEPTNO NUMBER(2)
)
ORGANIZATION EXTERNAL
    (TYPE ORACLE_LOADER
     DEFAULT DIRECTORY my_data
     ACCESS PARAMETERS
          (RECORDS DELIMITED BY NEWLINE
           LOGFILE my_data:'data.log'
           BADFILE my_data:'data.bad'
           DISCARDFILE my_data:'data.disc'
           fields terminated by ','
           MISSING FIELD VALUES ARE NULL)
    LOCATION (my_data:'emp.csv')
    );

The DEFAULT DIRECTORY points to the directory which contains the data file. LOGFILE, BADFILE and DISCARDFILE are the files which will be created when the data is loaded into the database. LOCATION indicates the name of the data file in the directory.

4. You can now access the data in the file by querying the external table.
SQL> select * from emp_external;

Output:

Query output of External Table
Query output of External Table

Wednesday, May 20, 2015

Wrapping PLSQL Source Code



Wrapping is the process of hiding PLSQL source code. Wrapping helps to protect your source code from business competitors and others who might misuse it. You can wrap PLSQL source code with either the wrap utility or DBMS_DDL subprograms. The wrap utility wraps a single source file, such as a SQL*Plus script. The DBMS_DDL subprograms wrap a single dynamically generated PLSQL unit, such as a single CREATE PROCEDURE statement.

Wrapped source files can be moved, backed up, and processed by SQL*Plus and the Import and Export utilities, but they are not visible through the static data dictionary views *_SOURCE.

Guidelines for Wrapping

  1. Wrap only the body of a package or object type, not the specification. This allows other developers to see the information they must use the package or type, but prevents them from seeing its implementation.
  2. Wrap code only after you have finished editing it. You cannot edit PL/SQL source code inside wrapped files. Either wrap your code after it is ready to ship to users or include the wrapping operation as part of your build environment. To change wrapped PL/SQL code, edit the original source file and then wrap it again.
  3. Before distributing a wrapped file, view it in a text editor to be sure that all important parts are wrapped.
Limitations of Wrapping

  1. Wrapping is not a secure method for hiding passwords or table names. Wrapping a PL/SQL unit prevents most users from examining the source code, but might not stop all of them.
  2. Wrapping does not hide the source code for triggers. To hide the workings of a trigger, write a one-line trigger that invokes a wrapped subprogram.
  3. Wrapping does not detect syntax or semantic errors. Wrapping detects only tokenization errors (for example, runaway strings), not syntax or semantic errors (for example, nonexistent tables or views). Syntax or semantic errors are detected during PL/SQL compilation or when executing the output file in SQL*Plus.
  4. Wrapped PL/SQL units are not downward-compatible. Wrapped PL/SQL units are upward-compatible between Oracle Database releases, but are not downward-compatible. 
Wrapping PL/SQL Code with wrap Utility

The wrap utility processes an input SQL file and wraps only the PL/SQL units in the file, such as a package specification, package body, function, procedure, type specification, or type body. It does not wrap PL/SQL content in anonymous blocks or triggers or non-PL/SQL code.

To run the wrap utility, enter the wrap command at your operating system prompt using the following syntax (with no spaces around the equal signs):

wrap iname=input_file [ oname=output_file ]

input_file is the name of a file containing SQL statements, that you typically run using SQL*Plus. If you omit the file extension, an extension of .sql is assumed. For example, the following commands are equivalent:

wrap iname=/mydir/myfile
wrap iname=/mydir/myfile.sql

You can also specify a different file extension:
wrap iname=/mydir/myfile.src

output_file is the name of the wrapped file that is created. The defaults to that of the input file and its extension default is .plb. For example, the following commands are equivalent:

wrap iname=/mydir/myfile
wrap iname=/mydir/myfile.sql oname=/mydir/myfile.plb

You can use the option oname to specify a different file name and extension:
wrap iname=/mydir/myfile oname=/yourdir/yourfile.out


Input and Output Files for the PL/SQL wrap Utility
The input file can contain any combination of SQL statements. Most statements are passed through unchanged. CREATE statements that define subprograms, packages, or object types are wrapped; their bodies are replaced by a scrambled form that the PLSQL compiler understands.

The following CREATE statements are wrapped:

CREATE [OR REPLACE] FUNCTION function_name
CREATE [OR REPLACE] PROCEDURE procedure_name
CREATE [OR REPLACE] PACKAGE package_name
CREATE [OR REPLACE] PACKAGE BODY package_name
CREATE [OR REPLACE] TYPE type_name AS OBJECT
CREATE [OR REPLACE] TYPE type_name UNDER type_name
CREATE [OR REPLACE] TYPE BODY type_name

The CREATE [OR REPLACE] TRIGGER statement, and [DECLARE] BEGIN-END anonymous blocks, are not wrapped. All other SQL statements are passed unchanged to the output file.

All comment lines in the unit being wrapped are deleted, except for those in a CREATE OR REPLACE header and C-style comments (delimited by /* */). The output file is a text file, which you can run as a script in SQL*Plus to set up your PL/SQL subprograms and packages. Run a wrapped file as follows:

SQL> @wrapped_file_name.plb;


Using DBMS_DDL.CREATE_WRAPPED Procedure
In the below example CREATE_WRAPPED is used to dynamically create and wrap a package specification and a package body in a database.

Example: Using DBMS_DDL.CREATE_WRAPPED Procedure to Wrap a Package
DECLARE
  package_text VARCHAR2(32767); -- text for creating package spec & body
 
  FUNCTION generate_spec(pkgname VARCHAR2) RETURN VARCHAR2 AS
  BEGIN
    RETURN 'CREATE PACKAGE ' || pkgname || ' AS
            PROCEDURE raise_salary (emp_id NUMBER, amount NUMBER);
            PROCEDURE fire_employee (emp_id NUMBER);
            END ' || pkgname || ';';
  END generate_spec;
 
  FUNCTION generate_body(pkgname VARCHAR2) RETURN VARCHAR2 AS
  BEGIN
    RETURN 'CREATE PACKAGE BODY ' || pkgname || ' AS
            PROCEDURE raise_salary (emp_id NUMBER, amount NUMBER) IS
            BEGIN
              UPDATE employees
              SET salary = salary + amount WHERE employee_id = emp_id;
            END raise_salary;
           
            PROCEDURE fire_employee (emp_id NUMBER) IS
            BEGIN
              DELETE FROM employees WHERE employee_id = emp_id;
              END fire_employee;
            END ' || pkgname || ';';
  END generate_body;
 
BEGIN
  -- Generate package spec
  package_text := generate_spec('emp_actions')
 
  -- Create wrapped package spec
  DBMS_DDL.CREATE_WRAPPED(package_text);
 
  -- Generate package body
  package_text := generate_body('emp_actions');
 
  -- Create wrapped package body
  DBMS_DDL.CREATE_WRAPPED(package_text);
 
END;

-- Invoke procedure from wrapped package
CALL emp_actions.raise_salary(120, 100);
When you check the static data dictionary views *_SOURCE, the source is wrapped, or hidden, so that others cannot view the code details. For example:

SELECT text FROM USER_SOURCE WHERE name = 'EMP_ACTIONS';

The resulting output is similar to the following:

TEXT
--------------------------------------------------------------------
PACKAGE emp_actions WRAPPED
a000000
1f
abcd
...

Sunday, April 26, 2015

SQL Tuning

paid surveys

There are 4 main areas of performance tuning:
 
1        SQL Tuning – Responsibility of the Developer
2        Database Tuning – Responsibility of the Database Administrator
3        System Tuning – Responsibility of the System Administrator
4        Network Tuning – Responsibility of the Network / LAN / WAN Administrator.
    
      Before you begin

Every Oracle developer should have a copy of the Oracle Online Documentation – a series of books in HTML format forming a complete reference of the RDBMS and other Oracle products. Critical to understanding performance tuning are the manuals:
  1. Concepts
  2. Performance Tuning Guide and Reference
  3. SQL Reference

Do this before you start individual SQL statement tuning

This broad-brush approach can save thousands of hours of tedious SQL tuning because you can hundreds of queries at once.  Remember, you MUST do this first, else later changes to the optimizer parameters or statistics may un-tune your SQL.



SQL Tuning 


Remember, you must ALWAYS start with system-level SQL tuning, else later changes might undo your tuned execution plans:
  1. Optimize the server kernel - You must always tune your disk and network I/O subsystem (RAID, DASD bandwidth, network) to optimize the I/O time, network packet size and dispatching frequency.
     
  2. Adjusting your optimizer statistics - You must always collect and store optimizer statistics to allow the optimizer to learn more about the distribution of your data to take more intelligent execution plans.  Also, histograms can hypercharge SQL in cases of determining optimal table join order, and when making access decisions on skewed WHERE clause predicates.
     
  3. Adjust optimizer parameters - Optimizer optimizer_mode, optimizer_index_caching, optimizer_index_cost_adj.
     
  4. Optimize your instance - Your choice of db_block_size, db_cache_size, and OS parameters (db_file_multiblock_read_count, cpu_count, &c), can influence SQL performance.
     
  5. Tune your SQL Access workload with physical indexes and materialized views - Just as the 10g SQLAccess advisor recommends missing indexes and missing materialized views, you should always optimize your SQL workload with indexes, especially function-based indexes, a benefit for SQL tuning. 

Oracle SQL tuning goals:

Oracle SQL tuning is a phenomenally complex subject. There are some general guidelines that every Oracle DBA follows in order to improve the performance of their systems.
The goals of SQL tuning focus on improving the execution plan to fetch the rows with the smallest number of database "touches" (LIO buffer gets and PIO physical reads). 
1.    Remove unnecessary large-table full-table scans—Unnecessary full-table scans cause a huge amount of unnecessary I/O and can drag-down an entire database. The tuning expert first evaluates the SQL based on the number of rows returned by the query. The most common tuning remedy for unnecessary full-table scans is adding indexes. Standard b-tree indexes can be added to tables, and bitmapped and function-based indexes can also eliminate full-table scans. In some cases, an unnecessary full-table scan can be forced to use an index by adding an index hint to the SQL statement.

2.    Cache small-table full-table scans—In cases where a full-table scan is the fastest access method, the administrator should ensure that a dedicated data buffer is available for the rows.  In Oracle8 and beyond, a small table can be cached by forcing it into the KEEP pool.

3.    Verify optimal index usage—Oracle sometimes has a choice of indexes, and the tuning professional must examine each index and ensure that Oracle is using the proper index.

4.    Materialize your aggregations and summaries for static tables - One features of the Oracle 10g SQLAccess advisor is recommendations for new indexes and suggestions for materialized views.  Materialized views pre-join tables and pre-summarize data, a real silver bullet for data mart reporting databases where the data is only updated daily. Again, see the book "Oracle Tuning: The Definitive Reference", for complete details on SQL tuning with materialized views.


A strategic plan for Oracle SQL tuning
Step 1:  Identify high-impact SQL

The SQL statements will be ranked according the number of executions and will be tuned in this order. The executions column of the v$sqlarea view and the stats$sql_summary or the
dba_hist_sql_summary table can be used to locate the most frequently used SQL.  Note that we can display SQL statements by:
  1. Rows processed:  Queries that process a large number of rows will have high I/O and may also have impact on the TEMP tablespace.
     
  2. Buffer gets:  High buffer gets may indicate a resource-intensive query.
     
  3. Disk reads:  High disk reads indicate a query that is causing excessive I/O.
     
  4. Memory KB:  The memory allocation of a SQL statement is useful for identifying statements that are doing in-memory table joins.
     
  5. CPU secs:  This identifies the SQL statements that use the most processor resources.
     
  6. Sorts:  Sorts can be a huge slowdown, especially if they're being done on a disk in the TEMP tablespace.
     
  7. Executions:  The more frequently executed SQL statements should be tuned first, since they will have the greatest impact on overall performance.
Step 2:  Determine the execution plan for SQL
As each SQL statement is identified, it will be 'explained' to determine its existing execution plan. There are a host of third-party tools on the market that show the execution plan for SQL statements. The most common way of determining the execution plan for a SQL statement is to use Oracle's explain plan utility. By using explain plan, the Oracle DBA can ask Oracle to parse the statement and display the execution class path without actually executing the SQL statement.
To see the output of an explain plan, you must first create a 'plan table.' Oracle provides a script in $ORACLE_HOME/rdbms/admin called utlxplan.sql. Execute utlxplan.sql and create a public synonym for the plan_table:
Most relational databases use an explain utility that takes the SQL statement as input, runs the SQL optimizer, and outputs the access path information into a plan_table, which can then be interrogated to see the access methods. Listing 1 runs a complex query against a database.
This syntax is piped into the SQL optimizer, which will analyze the query and store the plan information in a row in the plan table identified by RUN1. Please note that the query will not execute; it will only create the internal access information in the plan table. The plan tables contains the following fields:
  1. operation:  The type of access being performed. Usually table access, table merge, sort, or index operation
     
  2. options:  Modifiers to the operation, specifying a full table, a range table, or a join
     
  3. object_name:  The name of the table being used by the query component
     
  4. Process ID:  The identifier for the query component
     
  5. Parent_ID:  The parent of the query component. Note that several query components may have the same parent.

 Step 3:  Tune the SQL statement
For those SQL statements that possess a sub-optimal execution plan, the SQL will be tuned by one of the following methods:
  1. Adding SQL 'hints' to modify the execution plan  
  2. Re-write SQL with Global Temporary Tables
  3. Rewriting the SQL in PL/SQL. For certain queries this can result in more than a 20x performance improvement. The SQL would be replaced with a call to a PL/SQL package that contained a stored procedure to perform the query.
Using hints to tune Oracle SQL

Among the most common tools for tuning SQL statements are hints. A hint is a directive that is added to the SQL statement to modify the access path for a SQL query.
Oracle publishes many dozens of SQL hints, and hints become increasingly more complicated through the various releases of Oracle and on into Oracle.
Note:  Hints are only used for de-bugging and you should adjust your optimizer statistics to make the CBO replicate the hinted SQL.  Let's look at the most common hints to improve tuning:
1.    Mode hints:  first_rows_10, first_rows_100
3.    Dynamic sampling: dynamic_sampling 
4.    Oracle SQL undocumented tuning hints - Guru's only
5.    The cardinality hint
6.    Self-order the table joins - If you find that Oracle is joining the tables together in a sub-optimal order, you can use the ORDERED hint to force the tables to be joined in the order that they appear in the FROM clause.
7.    Try a first_rows_n hint. Oracle has two cost-based optimizer modes, first_rows_n and all_rows. The first_rows mode will execute to begin returning rows as soon as possible, whereas the all_rows mode is designed to optimize the resources on the entire query before returning rows.

SELECT /*+ first_rows */

A case study in SQL tuning
One of the historic problems with SQL involves formulating SQL queries. Simple queries can be written in many different ways, each variant of the query producing the same result, but with widely different access methods and query speeds.

For example, a simple query such as 'What students received an A last semester'' can be written in three ways, as shown in below, each returning an identical result.

A standard join:
 
SELECT *
FROM STUDENT, REGISTRATION
WHERE
    STUDENT.student_id = REGISTRATION.student_id
AND
    REGISTRATION.grade = 'A';
 
A nested query:

SELECT *
FROM STUDENT
WHERE
    student_id =
    (SELECT student_id
        FROM REGISTRATION
        WHERE
       grade = 'A'
    );

A correlated subquery:

SELECT *
FROM STUDENT
WHERE
    0 <
    (SELECT count(*)
        FROM REGISTRATION
        WHERE
        grade = 'A'
        AND
        student_id = STUDENT.student_id
    );

Let's wind up with a review of the basic components of a SQL query and see how to optimize a query for remote execution.


Tips for writing more efficient SQL

Space doesn't permit me to discuss every detail of Oracle tuning, but I can share some general rules for writing efficient SQL in Oracle regardless of the optimizer that is chosen. These rules may seem simplistic but following them in a diligent manner will generally relieve more than half of the SQL tuning problems that are experienced:
  1. Rewrite complex subqueries with temporary tables - Oracle created the global temporary table (GTT) and the SQL WITH operator to help divide-and-conquer complex SQL sub-queries (especially those with with WHERE clause subqueries, SELECT clause scalar subqueries and FROM clause in-line views).  Tuning SQL with temporary tables (and materializations in the WITH clause) can result in amazing performance improvements.
     
  2. Use minus instead of EXISTS subqueries - Some say that using the minus operator instead of NOT IN and NOT Exists will result in a faster execution plan. 
  3. Use SQL analytic functions - The Oracle analytic functions can do multiple aggregations (e.g. rollup by cube) with a single pass through the tables, making them very fast for reporting SQL.
  4. Re-write NOT EXISTS and NOT EXISTS subqueries as outer joins - In many cases of NOT queries (but ONLY where a column is defined as NULL), you can re-write the uncorrelated subqueries into outer joins with IS NULL tests.  Note that this is a non-correlated sub-query, but it could be re-written as an outer join.
select book_key from book
where
book_key NOT IN (select book_key from sales);

Below we combine the outer join with a NULL test in the WHERE clause without using a sub-query, giving a faster execution plan.

select b.book_key
from book b, sales s
where  b.book_key = s.book_key(+)
and  s.book_key IS NULL;
  1. Index your NULL values - If you have SQL that frequently tests for NULL, consider creating an index on NULL values.  To get around the optimization of SQL queries that choose NULL column values (i.e. where emp_name IS NULL), we can create a function-based index using the null value built-in SQL function to index only on the NULL columns.   
  2. Leave column names alone - Never do a calculation on an indexed column unless you have a matching function-based index (a.k.a. FBI).  Better yet, re-design the schema so that common where clause predicates do not need transformation with a BIF:
where salary*5            > :myvalue
where substr(ssn,7,4)     = "1234"
where to_char(mydate,mon) = "january"
  1. Avoid the use of NOT IN or HAVING. Instead, a NOT EXISTS subquery may run faster (when appropriate). 
  2. Avoid the LIKE predicate = Always replace a "like" with an equality, when appropriate.
     
  3. Never mix data types - If a WHERE clause column predicate is numeric, do not to use quotes. For char index columns, always use quotes. There are mixed data type predicates:
where cust_nbr = "123"
where substr(ssn,7,4) = 1234
  1. Use decode and case - Performing complex aggregations with the "decode" or "case" functions can minimize the number of times a table has to be selected.  
  2. Don't fear full-table scans - Not all OLTP queries are optimal when they uses indexes.  If your query will return a large percentage of the table rows, a full-table scan may be faster than an index scan.  This depends on many factors, including your configuration (values for db_file_multiblock_read_count, db_block_size), query parallelism and the number of table/index blocks in the buffer cache. 
  3. Use those aliases - Always use table aliases when referencing columns.

paid surveys