Wednesday, July 25, 2018

Teradata - FastLoad - DataLoad Utility


Fastload is Teradata utility to load a large amount of data (mostly flat file)into Teradata table. Fastload is a command-driven utility. 

1) It loads data into an empty table only.

2) It doesn't allow duplicate rows, even if the table is defined as MULTISET table.

3) It doesn't use TRANSIENT JOURNALS, so data loads quickly.

4) Teradata FastLoad uses multiple sessions and block level operation to load data.

5) One Fastload job could load data into one table only. 

6) Data can be loaded from Disk/tape, network etc. or any other device which can provide the properly formatted source data

7) Target table should not have the secondary index, join index and foreign key reference or any trigger on the table. Primary index is permissible.  

8) AMP should not be down during fastload.

9) Only one datatype conversion is allowed per column. 



Command to run Teradata utility: Type "fastload" on the command prompt.  If Teradata "fastload" in installed on the system, it will return FASTLOAD UTILITY versions number and other information. 



Component of FastLoad :

1) Log Table: Log table stores the processing records information during load. This table contains one rows for each FastLoad job.

2) Empty Target Table: It is the mandatory requirement. 

3) Error tables (2):   Two tables are required to log error messages.  These are automatically created during runtime (you can define the error table name). First error table contains any data type (conversion error) issue, unavailability of AMP, constraint violations (like not null etc.).      Second table contains unique primary key index violations.

Both error tables have the same structure.  Say example your error table names are errorlog1 & errorlog2. To see the error message, run the query -

SEL  DataParcel,ErrorCode, ErrorFieldName FROM    errorlog1  ORDER BY ErrorCode ;
DataParcel: is kind of primary key of this table.  It stores entire data records (up to 64K). 

How FastLoad Works: It loads data into two phases.

Phase I (Acquisition Phase): Primary purpose of this phase to load data from the host computer into Teradata System. PI (Parsing engine) reads the records from the input file and send the block (size 64 KB) to each(random) AMP.


Phase II (Application Phase): This phase starts when FastLoad receives the END LOADING statement.  Each AMP hash each record, later sorts the records and sends to correct AMP and writes them to disk. Locks on target table get released and error tables are dropped.


Steps for fastload script : 

1)Logging onto Teradata
2)Define the target table.
3)Declare the input file delimiter.(Default delim is pipe)
4)Map the column with input file column sequence
5)Definethe input data file
6)Instruct system to start loading
7)Instruct the system to insert data into the target table
8)End the session.


Does hashing happens at PE or at AMP ?
Could be either AMP or PE. In particular, FastLoad sends data blocks to AMPs round-robin and the AMPs hash the rows and send to the correct target AMP. But for a single-row fetch or insert, the PE computes the hash to determine which AMP should handle the request.

Fastload Script : 

>fastload < D:\fastloadtest.txt

<<Script Starts Here >>
/***** Section 1 *****/
/* In this section we give the LOGIN credentials which is required to connect to TD system.
Sessions command is used to restrict the number of sessions Fastload will make to connect to TD.
Default is one session per AMP. */

SESSIONS 10;
errlimit 25;
.LOGON  HostName or IP Address/UserID,Password;

/***** Section 2 *****/
/* In this section we are defining the table which we want to load from Fastload. DROP commands are optional.
There is no need to define the structure of ERROR tables they’ll be created itself by Fastload. */

DROP TABLE  DBNAME.Errortable1;
DROP TABLE  DBNAME.Errortable2;
DROP TABLE  DBNAME.fastloadtest;

CREATE multiset DBNAME.fastloadtest
(
  att1 VARCHAR(50)
, att2 VARCHAR(100)
, col3 VARCHAR(90)
, col4 VARCHAR(200)
, col5 VARCHAR(300)
, col6 VARCHAR(200)
)no primary index
;

/***** Section 3 *****/
/*RECORD command is used to skip the starting rows from the data file. RECORD THRU command is used to skip the last rows of data file.
SET RECORD command is used to define the records layout and the “,” is the delimiter which we are using in our data file to separate columns. */

SET RECORD VARTEXT ",";
RECORD 1;


/***** Section 4 *****/
/* DEFINE statement is used to define the structure of the data file. This should be in accordance with the actual target table structure.
Fastload DEFINE statement allows only VARCHAR format. */
DEFINE
col1(VARCHAR(50)) ,
col2(VARCHAR(100)) ,
col3(VARCHAR(90)) ,
col4(VARCHAR(200)) ,
col5(VARCHAR(300)) ,
col6(VARCHAR(200))


/***** Section 5 *****/
/* FILE command defines the data file path and name. */

FILE=D:\Amrendra\FastLoadData.csv;
SHOW;

/***** Section 6 *****/
/* In this section we give the BEGIN loading statement. As soon as Fastload receives this statement it starts PHASE 1. */

BEGIN LOADING DBNAME.Errortable1
   ERRORFILES DBNAME.Errortable2,
              DBNAME.fastloadtest;


/***** Section 7 *****/
/* INSERT command is used to load the data file into actual target table. NOTE – For DATE columns we can use the data conversion by the syntax given below. */
INSERT INTO ndw_ebi_work.fastloadtest
(
att1,
att2,
col3,
col4,
col5,
col6
)
VALUES (
:col1,
:col2,
:col3,
:col4,
:col5,
:col6

);

/***** Section 8 *****/
/* END LOADING ends PHASE 1 and starts the execution of PHASE 2. LOGOFF is required to close all the sessions created by Fastload. */



END LOADING;
LOGOFF;


<<Script Ends Here>>


<<Scirpt you could do some customoziation also >>

INSERT INTO retail.emp_test
(
:emp_id ,
:emp_name,
:dept_id ,
:salary,
:dob (format ‘yyyy-mm-dd’)
); 
In above code, it dob which is varchar , is casting to date data type.

.RECORD 1;
.RECORD THRU 3;
SET RECORD VARTEXT “,”;
RECORD command is used to skip the starting rows from the data file. RECORD THRU command is used to skip the last rows of data file. SET RECORD command is used to define the records layout and the “,” is the delimiter which we are using in our data file to separate columns.


YOu can define checkpoint also - 
BEGIN LOADING EMPDB.EMP_TABLE
ERRORILES EMPDB.EMP_ERR1, EMPDB.EMP_ERR2
CHECKPOINT 10000000;


No comments:

Post a Comment