Importing data into an Oracle database can be a challenge at times. The Oracle loader utility (sqlldr) can be a bit problematic and does not allow you to view the data prior to importing it.
Creating an external table has the advantage of allowing manipulation of the data as if it was an actual database table. However, like sqlldr, data that is even slightly nonconforming to the expected format will cause a myriad of problems. Three common formatting problem are incorrect line terminators, inconsistent quotes, and embedded commas.
When a comma delimited flat file is exported from a UNIX database and transferred for loading into a Windows based Oracle database the text lines are terminated with 0x0A in UNIX and not the 0x0A 0x0D pair utilized in the DOS/Windows OS. Although many technical text editors will correct this for you or a program could be quickly written to make the changes this is, as the example below shows, not necessary.
Another problem often encountered is that the fields are randomly double quoted, and some contain commas internal to the field's data. Here is an example of a typically messy format to import. This is from a real life situation I was working on. All of the data in the example is, of course, altered for privacy:
100001,"7123 HIGHLAND DR","03/28/1965"," Mercer Island WA 98040","Morgan, Mary A", "","","63034630752", 14,1,"F","T","06/28/2000","",0,"","01/01/1999","N", "01/01/1999",""
100020,"5432 SOUTH 28TH ST","01/01/1951"," Mercer Island WA, 98040", "Burgess,Frank", "","2566400",ZPW345070938,64,1,"M",B,"02/23/2000","",0,"", "12/31/1799","P","12/31/1979",""
The external table definition used to load this data is:
CREATE TABLE some_data (
ACCOUNT_NUM VARCHAR2(10),
ADDRESS VARCHAR2(30),
BIRTH_DATE VARCHAR2(10),
CITY_STATE_ZIP VARCHAR2(40),
NAME VARCHAR2(30),
EMPLOYER_NAME VARCHAR2(30),
GROUP_NAME VARCHAR2(30),
ID_NO VARCHAR2(50),
PLAN_NO VARCHAR2(50),
VAR_REC VARCHAR2(50),
SEX VARCHAR2(2),
TYPE VARCHAR2(10),
UPDATE_DATE VARCHAR2(10),
EXD_ID VARCHAR2(30),
PCP VARCHAR2(30),
PCP_EFFECT VARCHAR2(30),
CANCEL_DATE VARCHAR2(10),
COV_DEP VARCHAR2(30),
EFFECT_DATE VARCHAR2(10),
ADDL_INFO VARCHAR2(50))
ORGANIZATION EXTERNAL
(TYPE oracle_loader
DEFAULT DIRECTORY ext
ACCESS PARAMETERS
(RECORDS DELIMITED BY 0x'0A'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
MISSING FIELD VALUES ARE NULL)
LOCATION ('UNIX_DATA.DAT'))
REJECT LIMIT unlimited;
desc some_data
set linesize 121
col account_num format a10
col address format a20
col birth_date format a20
col city_state_zip format a20
col name format a20
SELECT account_num, address, birth_date, city_state_zip, name
FROM some_data;
Defining the record access parameters with
RECORDS DELIMITED BY 0x'0A'
stops the Oracle loader from seeing the file as one single record due to expecting lines to be terminated in the same manner as the host OS. If the data is a large file that exceeds the half MB limit, Oracle loader would simply produce an error.
The record access parameters
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
allows fields like "Doe,Jane R" to be correctly interpreted as a single field instead of two containing "Doe and Jane R" while not rejecting fields that do not have quotes.
Creating an external table has the advantage of allowing manipulation of the data as if it was an actual database table. However, like sqlldr, data that is even slightly nonconforming to the expected format will cause a myriad of problems. Three common formatting problem are incorrect line terminators, inconsistent quotes, and embedded commas.
When a comma delimited flat file is exported from a UNIX database and transferred for loading into a Windows based Oracle database the text lines are terminated with 0x0A in UNIX and not the 0x0A 0x0D pair utilized in the DOS/Windows OS. Although many technical text editors will correct this for you or a program could be quickly written to make the changes this is, as the example below shows, not necessary.
Another problem often encountered is that the fields are randomly double quoted, and some contain commas internal to the field's data. Here is an example of a typically messy format to import. This is from a real life situation I was working on. All of the data in the example is, of course, altered for privacy:
100001,"7123 HIGHLAND DR","03/28/1965"," Mercer Island WA 98040","Morgan, Mary A", "","","63034630752", 14,1,"F","T","06/28/2000","",0,"","01/01/1999","N", "01/01/1999",""
100020,"5432 SOUTH 28TH ST","01/01/1951"," Mercer Island WA, 98040", "Burgess,Frank", "","2566400",ZPW345070938,64,1,"M",B,"02/23/2000","",0,"", "12/31/1799","P","12/31/1979",""
The external table definition used to load this data is:
CREATE TABLE some_data (
ACCOUNT_NUM VARCHAR2(10),
ADDRESS VARCHAR2(30),
BIRTH_DATE VARCHAR2(10),
CITY_STATE_ZIP VARCHAR2(40),
NAME VARCHAR2(30),
EMPLOYER_NAME VARCHAR2(30),
GROUP_NAME VARCHAR2(30),
ID_NO VARCHAR2(50),
PLAN_NO VARCHAR2(50),
VAR_REC VARCHAR2(50),
SEX VARCHAR2(2),
TYPE VARCHAR2(10),
UPDATE_DATE VARCHAR2(10),
EXD_ID VARCHAR2(30),
PCP VARCHAR2(30),
PCP_EFFECT VARCHAR2(30),
CANCEL_DATE VARCHAR2(10),
COV_DEP VARCHAR2(30),
EFFECT_DATE VARCHAR2(10),
ADDL_INFO VARCHAR2(50))
ORGANIZATION EXTERNAL
(TYPE oracle_loader
DEFAULT DIRECTORY ext
ACCESS PARAMETERS
(RECORDS DELIMITED BY 0x'0A'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
MISSING FIELD VALUES ARE NULL)
LOCATION ('UNIX_DATA.DAT'))
REJECT LIMIT unlimited;
desc some_data
set linesize 121
col account_num format a10
col address format a20
col birth_date format a20
col city_state_zip format a20
col name format a20
SELECT account_num, address, birth_date, city_state_zip, name
FROM some_data;
Defining the record access parameters with
RECORDS DELIMITED BY 0x'0A'
stops the Oracle loader from seeing the file as one single record due to expecting lines to be terminated in the same manner as the host OS. If the data is a large file that exceeds the half MB limit, Oracle loader would simply produce an error.
The record access parameters
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
allows fields like "Doe,Jane R" to be correctly interpreted as a single field instead of two containing "Doe and Jane R" while not rejecting fields that do not have quotes.
No comments:
Post a Comment