Friday, October 4, 2024

Unloading/Loading NLS (National Language Support) Data in Oracle

What is the problem?

Every Oracle database has 2 different character sets for "string" data types.  These character sets are defined whenever a database is created.

  • CHARACTER SET - This is the main character set for the database. It is set for specific environment and application needs.  This character set is normally used to store and handle all string data.
  • NATIONAL CHARACTER SET - This is the NLS (National Language Support) character set and is available for multi-lingual string data storage on all Oracle databases.  It can be set to one of these UNICODE character sets: "AL16UTF16" or "UTF8".
Oracle's DB Sample Schemas use the NLS character set for multi-lingual string data.  This data is loaded with SQL INSERT statements.  However, we have a need to unload and load this data to/from text files.

The problem is that text files will generally have string data for the database's main character set.  A different technique is needed to unload and load the NLS string data.  Oracle's DB Sample Schemas uses the UNISTR format to segregate NLS string data from string data that uses the database's main character set.

We will be using the UNISTR format as well.

How to Unload NLS String Data to a Text File?

The US7ASCII character set is a common subset of almost all main character sets for an Oracle database.  This is important because any NLS string data saved to a text file needs a common format with the database's main character set.  Also note the US7ASCII character set only uses the 7 LSBs (least significant bits) for each character.  Characters in this range can be found when Oracle's ASCII function returns decimal values between 0 and 127.  Additionally, the "\" character needs to be "escaped" with another "\" character.

We used the following function to encode database NLS string data to UNISTR format.  The resulting string data from this function will only contain US7ASCII characters and can be safely stored in a text file.

function unistr_encode
      (in_nclob  in nclob)
   return clob
is
   acode  pls_integer;
   oclob  clob;
begin
   for i in 1 .. length(in_nclob)
   loop
      acode := ascii(substr(in_nclob,i,1));
      case
      when acode = 92  -- "\"
      then
         oclob := oclob || '\\';
      when acode > 127
      then
         oclob := oclob || '\' || to_char(acode,'FM000X');
      else
         oclob := oclob || chr(acode);
      end case;
   end loop;
   return oclob;
end;


How to Load NLS String Data from a Text File

We chose to unload our data to a CSV file.  We also chose to load our data from the CSV file using SQL*Loader.  Below is our control file to load the OE.PRODUCT_DESCRIPTIONS data from Oracle's DB Sample Schemas.
  1. Our CSV file includes a Header Record with the names of the columns in the file.
  2. We don't know why the "to_char" conversion is required for the UNISTR function.  However, if it is omitted, the error "ORA-30186: '/' must be followed by four hexdecimal characters or another '/'" will be thrown.
  3. 5x storage space is allocated for the TRANSLATED_NAME and TRANSLATED_DESCRIPTION fields to allow for UNISTR encoding characters.
OPTIONS (SKIP=1)
LOAD DATA
APPEND INTO TABLE "OE"."PRODUCT_DESCRIPTIONS"
FIELDS CSV WITH EMBEDDED
TRAILING NULLCOLS
   (PRODUCT_ID                     FLOAT EXTERNAL
   ,LANGUAGE_ID                    CHAR(3)
   ,TRANSLATED_NAME                CHAR(250) "UNISTR(to_char(:TRANSLATED_NAME))"
   ,TRANSLATED_DESCRIPTION         CHAR(10000) "UNISTR(to_char(:TRANSLATED_DESCRIPTION))"
   )

 

No comments:

Post a Comment