Oracle® Data Mining
1.1 Intended Audience
This administrator’s guide is intended for anyone planning to install and run Oracle Data Mining, whether a database administrator or a system administrator.
1.2 Structure
This guide is organized as follows:
■ Section 2, "Overview": Briefly describes Oracle Data Mining 10g Release 1 (10.1).
■ Section 3, "Oracle Data Mining Installation": Describes the generic installation steps and upgrade information. Platform-specific information is in the platform-specific README file.
■ Section 4, "Database Configuration Issues": Describes the database configuration issues that can affect ODM performance.
■ Section 5, "Oracle Data Mining Administration": Describes topics of interest to administrators, including improving Oracle Data Mining performance and detecting errors.
■ Section 6, "ODM Native Model Export and Import": Describes using the PL/SQL interface to perform Model Export and Import, including requirements and restrictions.
■ Section 7, "Documentation Accessibility": Describes Oracle documentation accessibility standards.
Oracle is a registered trademark, and Oracle9i, PL/SQL, and SQL*Plus are trademarks or registered trademarks of Oracle Corporation. Other names may be trademarks of their respective owners.
Copyright 2003, Oracle.
1.3 Where to Find Further Information
The documentation set for Oracle Data Mining is part of the Oracle10g
Database Documentation Library; the ODM document set consists of the
following documents:
■ Oracle Data Mining Administrator’s Guide, 10g Release 1 (10.1) (this document). Includes generic installation information.
README files
■ Oracle Database 10g Installation Guide for your platform.
■ Oracle Data Mining Concepts, 10g Release 1 (10.1)
■ Oracle Data Mining Application Developer’s Guide, 10g Release 1 (10.1)
For detailed information about the ODM Java API, see the ODM Javadoc in the file $ORACLE_HOME/dm/doc/odmjdoc.zip (for Windows, %ORACLE_HOME%\dm\doc\odmjdoc.zip) on any system where ODM is installed. To prepare the Javadoc for user access, unzip this file so that users can display it in a browser.
1.3.1 Related Manuals
For more information about the Oracle database, see:
■ Oracle Database Administrator's Guide
■ Oracle Universal Installer Concepts Guide
■ Oracle Database Migration
■ PL/SQL Packages and Types Reference
1.4 Conventions
In this manual, Windows refers to the Windows 2000 and Windows XP operating systems.
In examples, an implied carriage return occurs at the end of each line, unless otherwise noted. You must press the Return key at the end of a line of input.
2 Overview
Oracle Data Mining (ODM) embeds data mining within the Oracle database. The data never leaves the database: the data, data preparation, model building, and model scoring results all remain in the database. This enables Oracle to provide an infrastructure for application developers to integrate data mining seamlessly with database applications.
Data mining functions such as model building, testing, and scoring are provided via a Java API and a PL/SQL API.
Oracle Data Mining supports the following features:
Vector Machines
For detailed information about the classes that constitute the ODM Java API, see the Javadoc descriptions of classes.
For detailed information about the subprograms and functions that constitute the ODM PL/SQL API, see the PL/SQL Packages and Types Reference.
3 Oracle Data Mining Installation
This section specifies generic ODM requirements and provides a description of the generic installation steps.
There are three common cases for installing ODM:
■ No Oracle database is installed on your system (Section 3.2.1)
■ Oracle9i release 1 (or earlier) is installed on your system (Section 3.2.2)
■ Oracle9i release 2 is installed on your system (Section 3.2.2)
3.2.1 No Database Installed
If this is a first-time installation of ODM on a system where the current release of Oracle is not installed, there are two basic ways to install the Oracle Enterprise Edition:
1. Create a database with the starter database (Section 3.2.1.1)
2. Create a customized database (Section 3.2.1.2)
database that automatically includes features that result in a highly effective database that is easy to manage
See the release notes for late-breaking information that may affect the installation steps or your choices. After you have specified the source and destination, continue with the following steps in OUI:
which configuration to choose, select "Create a starter database" and select "General-purpose database", or see Section 3.2.1.2 for information about installing ODM with a customized database
SID, a database character set, and indicate whether you would like to install example schemas
Management or Raw Devices
location
Install
After successful installation, all ODM software is located in the $ORACLE_HOME/dm (for Windows, %ORACLE_HOME%\dm) directory. Perform the following post-installation steps:
passwords
appropriate privileges set for that user
Windows, %ORACLE_HOME%\dm\admin) and run odmuser.sql
the user has the privileges specified in the SQL script
odmuser.sql
parameter. The value should be the path name of a directory that the database can write to.
3.2.1.2 ODM Installation with a Customized Database
Installing and creating a customized database involves more steps than creating a starter database, but gives you full control over which database components are installed.
These are the major steps required to install ODM without using a starter database:
Section 3.2.3 for information about recommended database parameter
improved performance
install the ODM option; DBCA is described in the Oracle Database Administrator’s Guide. You will have the option of selecting the ODM
(for Windows, %ORACLE_HOME%\dm\admin\dmuserld.sql)
3.2.2 Upgrade from Oracle9i Releases
If Oracle9i Release 1 (9.0.1) or Release 2 (9.2.0) with the ODM option is installed on your system, you can choose to upgrade your system to the current release. ODM is upgraded as part of the database upgrade process.
For detailed information about upgrading the database, see Oracle Database Migration. For information about upgrading ODM, see Section 3.6.
3.2.3 Database Initialization Parameters for Oracle Data Mining
The default values of initialization parameters in an Oracle starter database are generally sufficient for running ODM.
Make sure that job_queue_processes is set to a value appropriate for your application (a minimum of 2).
The parameter utl_file_dir must be set to a directory path specific to your site.
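The two requirements above can be checked mechanically once the parameter values have been read from the database. The following is a minimal sketch in Python, not part of the ODM distribution: the function name check_odm_init_params and the dictionary layout are hypothetical, and the values are assumed to have been retrieved by the DBA beforehand.

```python
def check_odm_init_params(params):
    """Return a list of problems found in a dict of init parameter values.

    `params` maps parameter names to values as strings or numbers.
    This helper is hypothetical; it only encodes the two rules stated
    in this section (job_queue_processes >= 2, utl_file_dir set).
    """
    problems = []

    # job_queue_processes must be at least 2 for ODM.
    jqp = int(params.get("job_queue_processes", 0))
    if jqp < 2:
        problems.append(
            f"job_queue_processes is {jqp}; ODM needs a minimum of 2"
        )

    # utl_file_dir must name a site-specific directory path.
    utl_dir = str(params.get("utl_file_dir") or "").strip()
    if not utl_dir:
        problems.append("utl_file_dir is not set to a directory path")

    return problems


if __name__ == "__main__":
    # Example values are made up for illustration.
    print(check_odm_init_params(
        {"job_queue_processes": "10", "utl_file_dir": "/u01/odm_out"}
    ))
    print(check_odm_init_params({"job_queue_processes": "1"}))
```

An empty list means both rules are satisfied; each string in a non-empty list names one rule that is violated.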
3.3 Verifying ODM Installation
Oracle10g Data Mining is an option to the Oracle10g Enterprise Edition. If ODM is part of your installation, the following query should return a value of TRUE:
SELECT value
FROM v$option
WHERE parameter = 'Oracle Data Mining';
This query is usually run by the DBA, logged in as dba.
3.4 ODM Installation on a Real Application Cluster
ODM installation on a Real Application Cluster (RAC) is similar to ODM installation on a non-RAC system. If you use Oracle Universal Installer to create the preconfigured database on RAC, ODM will be installed in this database just as it is in a non-RAC environment.
If you choose to create a customized database on your Real Application Cluster (RAC) and install ODM there, we recommend that you configure the ODM tablespace with a raw device partition of at least 250 MB.
3.5 Data Mining Scoring Engine Installation
Data Mining Scoring Engine is a custom installation option for Oracle Data Mining. Select this option to install the ODM Scoring Engine as an alternative to installing Oracle Data Mining.
For more information about the Oracle Data Mining Scoring Engine, see Oracle Data Mining Concepts.
3.6 Upgrading ODM
ODM upgrade is part of the Oracle RDBMS 9.2.0 to 10.1.0 upgrade process. When the database server upgrade completes, ODM is upgraded to the 10.1.0 release level.
To upgrade ODM 9.2.0 to the ODM 10.1 release, you must first upgrade your RDBMS to the latest RDBMS 9.2.0.4 patch set release level, before starting the migration from 9.2 to 10.1. ODM is part of the RDBMS 9.2.0.4 patch set release. For detailed information about upgrading an Oracle database, see the Oracle Database Migration manual.
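The patch-level prerequisite described above amounts to a version comparison. The following Python sketch illustrates it under stated assumptions: the function name is hypothetical, and the version is assumed to be available as a dotted string such as the one reported by the database banner.

```python
def meets_odm_upgrade_prerequisite(version):
    """Return True if a 9.2 database is at patch level 9.2.0.4 or later.

    `version` is a dotted version string, for example "9.2.0.4.0".
    Hypothetical helper: it only encodes the rule stated in this
    section for the 9.2 to 10.1 ODM upgrade path.
    """
    parts = tuple(int(p) for p in version.split("."))
    # Only 9.2 releases are covered by this rule.
    if parts[:2] != (9, 2):
        return False
    # Tuples compare element by element, so (9, 2, 0, 3) < (9, 2, 0, 4).
    return parts >= (9, 2, 0, 4)


if __name__ == "__main__":
    for v in ("9.2.0.1.0", "9.2.0.4.0", "9.2.0.5.0"):
        print(v, meets_odm_upgrade_prerequisite(v))
```

A result of False for a 9.2 database means the patch set must be applied before the migration is started.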
3.6.1 ODM Schema Object Upgrade
There are major schema changes between ODM 9.2 and the current release. These changes are required to fully support the ODM multi-user environment and to implement Oracle Advanced Security features.
In ODM 9.2, there were two ODM-required database schemas, namely, ODM and ODM_MTR. In the current release, these two schemas have been upgraded to DMSYS and the DM user schema (the former ODM schema). The DMSYS schema is the ODM repository, which contains data mining metadata. The ODM schema becomes the DM user schema that holds user input and output/result data sets. Customers can choose to either use the upgraded ODM schema or create one or more data mining user schemas to perform data mining activities.
When you upgrade to the current release, the existing ODM 9.2 data mining models, settings, and results are upgraded to the current release format. Customers can continue to conduct various data mining activities using objects upgraded from the 9.2 release. There are schema definition changes in the current release schema.
New objects created in the ODM 10.1 environment are subject to a naming restriction: names of objects must be 25 bytes or less. This restriction applies across DM user database schemas. However, after upgrading, 9.2 object names (models, settings, and results) are retained in the current release environment. It is recommended that users follow the new ODM naming convention when creating objects in the future.
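Because the limit is expressed in bytes rather than characters, a name that looks short can still exceed it in a multi-byte character set. The following Python sketch shows the check; the function name is hypothetical, and the encoding argument is an assumption that stands in for the database character set.

```python
MAX_ODM_NAME_BYTES = 25  # limit stated in this section


def is_valid_odm_object_name(name, encoding="utf-8"):
    """Return True if `name` is non-empty and at most 25 bytes long.

    Hypothetical helper. `encoding` is an assumption: pass the Python
    codec that corresponds to your database character set.
    """
    size = len(name.encode(encoding))
    return 0 < size <= MAX_ODM_NAME_BYTES


if __name__ == "__main__":
    print(is_valid_odm_object_name("CHURN_MODEL"))
    print(is_valid_odm_object_name("A" * 26))
    # 13 two-byte characters occupy 26 bytes in UTF-8.
    print(is_valid_odm_object_name("é" * 13))
```

The same 13-character name would pass in a single-byte character set, which is why the encoding has to match the database.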
In the 9.2 release, all mining activities are conducted through the ODM schema (with definer’s rights). In the current release, data mining activities are performed in the DM user schema (with invoker’s rights). In an upgraded ODM environment, the ODM schema has been upgraded from a definer’s schema to an invoker’s schema.
If necessary, ODM schema objects can be downgraded to the 9.2.0.4 final patch set release.
3.6.2 Category Data Type in 9.2 and in the Current Release
In ODM 9.2, the category data type was not stored in the dm_category_matrix_entry table. In the current release, the data type is stored. As a result, when you migrate from 9.2 to the current release, all restored categories have a string data type, regardless of their actual data type.
3.7 Sample Programs for Oracle Data Mining
The directory $ORACLE_HOME/dm/demo/sample (on UNIX) or %ORACLE_HOME%\dm\demo\sample (on Windows) contains sample programs for ODM. This directory contains the following subdirectories:
Property-based ODM Java sample programs have been removed from the product shipment in 10g. They can be downloaded from OTN.
ODM PL/SQL packages DBMS_DATA_MINING and DBMS_DATA_MINING_TRANSFORM (described in the PL/SQL Packages and Types Reference). The directory plsql contains a subdirectory utl, which contains sample programs illustrating how to export and import ODM models.
The data used by all the sample programs is in $ORACLE_HOME/dm/demo/data on UNIX or %ORACLE_HOME%\dm\demo\data on Windows. ODM sample data sets need to be loaded into a user schema before you use the sample programs. Refer to the following scripts for creating an Oracle tablespace and user schema and for loading the ODM sample data sets:
For 10g, ODM Java and PL/SQL sample programs also use data sets shipped with the Oracle Common Schema (SH). To use these data sets, the sample schema SH must be installed by a site DBA in the target
$ORACLE_HOME/dm/admin/dmshgrants.sql
$ORACLE_HOME/dm/admin/dmsh.sql
3.8 Downgrading ODM
ODM 10.1 can be downgraded if customers are not satisfied with the results of upgrading ODM 9.2 to 10.1. The downgrade must comply with the RDBMS downgrade policy. The initialization parameter COMPATIBLE needs to be retained as 9.2.0 in the database during the upgrade process.
Once the RDBMS downgrade process completes, ODM is downgraded to the latest 9.2.0 patch set release level. The ODM repository schema in the database will be ODM. The ODM_MTR schema will be retained.
3.9 Deinstalling ODM
You can use the OUI to deinstall ODM.
4 Database Configuration Issues
This section summarizes the database configuration issues that can influence ODM performance on a given set of hardware resources.
Many Oracle initialization parameters are tunable via the initSID.ora file, which is located in the $ORACLE_HOME/dbs directory. A preconfigured database (SeedDB, also referred to as the starter database) sets many parameters to default values. ODM users can tune these values based on site-specific circumstances.
For detailed descriptions, refer to Oracle SQL Reference and Oracle Database Administrator’s Guide.
4.1 System Global Area (SGA)
Subject to physical memory capacity, the database System Global Area (SGA) should be sized adequately to enhance database performance. A DBA should determine how much total memory on the system is available for the Oracle database to consume (referred to as "available memory"). A certain amount of physical memory on the system needs to be reserved for buffering and process memory consumption.
SGA size consists of the following init parameter settings:
Table 1 Init Parameter Settings for SGA Size
Parameter             Description
shared_pool_size      Specifies (in bytes) the size of the shared pool.
db_cache_size         Specifies the size of the cache of standard block size
                      buffers, where the standard block size is specified by
                      DB_BLOCK_SIZE. The size should be set to 20-80% of the
                      available memory.
log_buffer            Specifies the amount of memory (in bytes) used when
                      buffering redo entries to a redo log file. Redo log
                      entries contain a record of the changes that have been
                      made to the database block buffers.
v$sgastat records SGA dynamic allocation statistics. For details, refer to the Oracle Database Administrator’s Guide.
More memory-related tunable parameters are described below:
Table 2 Tunable Parameters Related to Memory
Parameter             Description
java_pool_size        Specifies the size (in bytes) of the Java pool, from
                      which the Java memory manager allocates most Java
                      state during runtime execution.
large_pool_size       Specifies the size (in bytes) of the large pool
                      allocation heap. The large pool allocation heap is
                      used in shared server systems for session memory, by
                      parallel execution for message buffers, and by backup
                      processes for disk I/O buffers.
sort_area_size        Specifies (in bytes) the maximum amount of memory
                      Oracle will use for a sort.
hash_area_size        Specifies (in bytes) the maximum amount of memory to
                      be used for hash joins.
pga_aggregate_target  Introduced in 9i. The parameter manages runtime
                      memory allocation. It replaces the hash_area_size,
                      sort_area_size, create_bitmap_area_size, and
                      bitmap_merge_area_size parameters. Recommended to be
                      set to 20-80% of the available memory.
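The 20-80% guideline given for db_cache_size and for the PGA aggregate parameter translates into a byte range once the available memory is known. The following Python sketch shows the arithmetic only; the function name is hypothetical, and the 4 GB figure in the example is made up for illustration.

```python
def recommended_range_bytes(available_bytes, low_pct=20, high_pct=80):
    """Return (low, high) byte values for the 20-80% guideline.

    Hypothetical helper. `available_bytes` is the "available memory"
    determined by the DBA, that is, the memory Oracle may consume.
    Integer arithmetic is used so that results are exact and rounded
    down to whole bytes.
    """
    if available_bytes < 0:
        raise ValueError("available_bytes must not be negative")
    low = available_bytes * low_pct // 100
    high = available_bytes * high_pct // 100
    return low, high


if __name__ == "__main__":
    four_gb = 4 * 1024 ** 3  # example value only
    low, high = recommended_range_bytes(four_gb)
    print(f"20% of available memory: {low} bytes")
    print(f"80% of available memory: {high} bytes")
```

The range is wide by design. Where to settle within it depends on the workload, and the settings for the different memory areas must still fit together within the available memory.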
4.2 Parallel Queries (PQ)
The tunable PQ parameters are listed in Table 3.
Most PQ settings depend on the number of CPUs available on the host. On machines with a single CPU, parallel execution is limited. In most cases, ODM algorithms use the default degree of parallelism. The number of CPUs and their capacity largely determine the achievable parallelism.
The v$process view records the status of all slave processes.
4.3 Multi-Threaded Server (MTS)
Multi-Threaded Server configuration enables a large number of user sessions to share the same server process; workload is distributed via the dispatcher. In an MTS configuration, the User Global Area is part of the SGA; hence a larger SGA configuration is recommended. The actual degree of increase is subject to the values of MTS-related init parameters.
Table 3 Tunable PQ Parameters
Parameter                  Description
parallel_max_servers       Maximum number of parallel server processes (the
                           value is subject to the number of CPUs on the
                           host).
parallel_min_servers       Minimum number of parallel server processes.
parallel_min_percent       Operates in conjunction with parallel_max_servers
                           and parallel_min_servers. It sets the minimum
                           percentage of parallel execution processes (of
                           the value of parallel_max_servers) required for
                           parallel execution.
parallel_automatic_tuning  Setting parallel_automatic_tuning to TRUE results
                           in the database configuring itself to support
                           parallel execution (default is FALSE).
parallel_threads_per_cpu   Describes the number of parallel execution
                           processes or threads that a CPU can handle during
                           parallel execution. If the machine appears to be
                           overloaded, decrease the value of this parameter;
                           if the system is I/O bound, increase the value.
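As described in Table 3, parallel_min_percent is a percentage of parallel_max_servers. The following Python sketch converts it into a process count. The function name is hypothetical, and rounding up to a whole process is an assumption made for illustration, not documented database behavior.

```python
def min_parallel_processes(parallel_max_servers, parallel_min_percent):
    """Return the process count implied by parallel_min_percent.

    Hypothetical helper. Computes the given percentage of
    parallel_max_servers and rounds up to a whole process (the
    rounding direction is an assumption).
    """
    if not 0 <= parallel_min_percent <= 100:
        raise ValueError("parallel_min_percent must be between 0 and 100")
    if parallel_max_servers < 0:
        raise ValueError("parallel_max_servers must not be negative")
    # Integer ceiling division avoids floating-point rounding.
    return (parallel_max_servers * parallel_min_percent + 99) // 100


if __name__ == "__main__":
    # Example values are made up for illustration.
    print(min_parallel_processes(20, 50))
    print(min_parallel_processes(8, 30))
```

A value of 0 for parallel_min_percent yields 0, meaning that no minimum number of parallel execution processes is required.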