What is the problem?
Every Oracle database has 2 different character sets for "string" data types. These character sets are defined whenever a database is created.
- CHARACTER SET - This is the main character set for the database. It is set for specific environment and application needs. This character set is normally used to store and handle all string data.
- NATIONAL CHARACTER SET - This is the NLS (National Language Support) character set and is available for multi-lingual string data storage on all Oracle databases. It can be set to one of these UNICODE character sets: "AL16UTF16" or "UTF8".
The problem is that text files will generally have string data for the database's main character set. A different technique is needed to unload and load the NLS string data. Oracle's DB Sample Schemas uses the UNISTR format to segregate NLS string data from string data that uses the database's main character set.
We will be using the UNISTR format as well.
How to Unload NLS String Data to a Text File?
The US7ASCII character set is a common subset of almost all main character sets for an Oracle database. This is important because any NLS string data saved to a text file needs a common format with the database's main character set. Also note the US7ASCII character set only uses the 7 LSBs (least significant bits) for each character. Characters in this range can be found when Oracle's ASCII function returns decimal values between 0 and 127. Additionally, the "\" character needs to be "escaped" with another "\" character.
We used the following function to encode database NLS string data to UNISTR format. The resulting string data from this function will only contain US7ASCII characters and can be safely stored in a text file.
function unistr_encode(in_nclob in nclob)return clobisacode pls_integer;oclob clob;beginfor i in 1 .. length(in_nclob)loopacode := ascii(substr(in_nclob,i,1));casewhen acode = 92 -- "\"thenoclob := oclob || '\\';when acode > 127thenoclob := oclob || '\' || to_char(acode,'FM000X');elseoclob := oclob || chr(acode);end case;end loop;return oclob;end;
We chose to unload our data to a CSV file. We also chose to load our data from the CSV file using SQL*Loader. Below is our control file to load the OE.PRODUCT_DESCRIPTIONS data from Oracle's DB Sample Schemas.
- Our CSV file includes a Header Record with the names of the columns in the file.
- We don't know why the "to_char" conversion is required for the UNISTR function. However, if it is omitted, the error "ORA-30186: '/' must be followed by four hexdecimal characters or another '/'" will be thrown.
- 5x storage space is allocated for the TRANSLATED_NAME and TRANSLATED_DESCRIPTION fields to allow for UNISTR encoding characters.
OPTIONS (SKIP=1)LOAD DATAAPPEND INTO TABLE "OE"."PRODUCT_DESCRIPTIONS"FIELDS CSV WITH EMBEDDEDTRAILING NULLCOLS(PRODUCT_ID FLOAT EXTERNAL,LANGUAGE_ID CHAR(3),TRANSLATED_NAME CHAR(250) "UNISTR(to_char(:TRANSLATED_NAME))",TRANSLATED_DESCRIPTION CHAR(10000) "UNISTR(to_char(:TRANSLATED_DESCRIPTION))")
No comments:
Post a Comment