Thursday, July 26, 2018

Teradata Database Hashing Algorithm



How does data distribution work in Teradata? It depends on the primary index. The primary index is the set of table columns used as input to the hashing algorithm.

When a row is assigned to an AMP, Teradata applies the hashing algorithm to the row's primary index value to determine which AMP gets the row.


How does the hashing algorithm work?
When the parsing engine (PE) or an AMP receives a row, it passes the record to the hashing algorithm. The algorithm hashes the primary index value of each record and returns a 32-bit number (the row hash), which is later used to store the row on an AMP.


Rows that are inserted into NoPI tables using single-row inserts or Teradata Parallel Data Pump ARRAY INSERTs are allocated to the AMPs based on the hashed value of the Query ID for their request.

Rows that are inserted into NoPI tables using INSERT … SELECT operations are, by default, appended AMP-locally to their target table, so there is no need to hash them to a destination AMP.

The rows that are FastLoaded into nonpartitioned NoPI tables (see NoPI Tables, Column-Partitioned NoPI Tables, and Column-Partitioned NoPI Join Indexes) are assigned to their destination AMPs based on a round-robin protocol.


Hash Bucket Number
The first 16 or 20 bits of the row hash form the hash bucket number, which identifies the hash bucket to which the hashed row belongs.


A 32-bit row hash provides about 4.2 billion possible values, which reduces hash collisions to a level that, for all practical purposes, eliminates their impact on retrieval performance.
Because the number of possible values in a given domain may exceed 4.2 billion, further information, called the uniqueness ID, is required to make each row uniquely identifiable. This situation occurs for tables hashed on a nonunique primary index or primary AMP index, where different rows can have the same index value and therefore the same row hash, and also for tables hashed on a unique primary index when there is a hash collision.
If a record arrives with the same row hash, Teradata increments the uniqueness ID, a 32-bit number. For a new row hash, the uniqueness ID starts at 1 and is incremented each time another record with the same row hash is inserted.
The combination of row hash and uniqueness ID is called the Row ID.

The Row ID prefixes each record on disk.

Within each AMP, table rows are logically sorted by their Row IDs.
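The flow above (primary index value → 32-bit row hash → hash bucket → AMP, plus a uniqueness ID per duplicate row hash) can be sketched in Python. This is only an illustrative model: CRC32 stands in for Teradata's proprietary hash function, and a round-robin bucket-to-AMP assignment stands in for the real hash map.

```python
import zlib

NUM_AMPS = 4
BUCKET_BITS = 20                    # newer systems use 20-bit buckets; older used 16

# Stand-in hash map: assigns every hash bucket to an AMP (round-robin here;
# Teradata maintains a real hash map).
hash_map = {b: b % NUM_AMPS for b in range(2 ** BUCKET_BITS)}

uniqueness = {}                     # row hash -> last uniqueness ID assigned

def place_row(primary_index_value: str):
    """Return (amp, row_id) for a row, mimicking hash-based placement."""
    # CRC32 is only a stand-in for Teradata's hashing algorithm.
    row_hash = zlib.crc32(primary_index_value.encode()) & 0xFFFFFFFF
    bucket = row_hash >> (32 - BUCKET_BITS)       # first bits = hash bucket number
    amp = hash_map[bucket]
    uniqueness[row_hash] = uniqueness.get(row_hash, 0) + 1
    return amp, (row_hash, uniqueness[row_hash])  # Row ID = row hash + uniqueness ID

amp, row_id = place_row("cust_1001")
```

Inserting the same primary index value again yields the same AMP and row hash but uniqueness ID 2, matching the NUPI behavior described above.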



Query to check data distribution in Teradata -

SELECT HASHAMP(HASHBUCKET(HASHROW(<PRIMARY INDEX>))) AS "AMP"
,COUNT(*) as "#row"
FROM <your_table>
GROUP BY HASHAMP(HASHBUCKET(HASHROW(<PRIMARY INDEX>)))
ORDER BY 2 DESC;


Teradata uses these hash values to distribute data across the AMPs (Access Module Processors) in the Teradata system.

Wednesday, July 25, 2018

Teradata - FastLoad - DataLoad Utility


FastLoad is a Teradata utility that loads large volumes of data (typically flat files) into a Teradata table. It is a command-driven utility.

1) It loads data into an empty table only.

2) It does not allow duplicate rows, even if the table is defined as a MULTISET table.

3) It does not use the TRANSIENT JOURNAL, so data loads quickly.

4) Teradata FastLoad uses multiple sessions and block-level operations to load data.

5) One FastLoad job can load data into only one table.

6) Data can be loaded from disk, tape, network, or any other device that can provide properly formatted source data.

7) The target table must not have secondary indexes, join indexes, foreign key references, or triggers. A primary index is permissible.

8) No AMP should be down during a FastLoad.

9) Only one data type conversion is allowed per column.



Command to run the Teradata utility: type "fastload" at the command prompt. If Teradata FastLoad is installed on the system, it will return the FASTLOAD utility version number and other information.



Component of FastLoad :

1) Log Table: The log table stores processing information during the load. It contains one row for each FastLoad job.

2) Empty Target Table: This is a mandatory requirement.

3) Error tables (2): Two tables are required to log error messages. They are created automatically at runtime (you can define the error table names). The first error table captures data type (conversion) errors, AMP unavailability, and constraint violations (such as NOT NULL). The second table captures unique primary index violations.

Both error tables have the same structure. Say, for example, your error table names are errorlog1 and errorlog2. To see the error messages, run the query -

SEL  DataParcel,ErrorCode, ErrorFieldName FROM    errorlog1  ORDER BY ErrorCode ;
DataParcel acts as a kind of primary key for this table. It stores the entire data record (up to 64 KB).

How FastLoad works: it loads data in two phases.

Phase I (Acquisition Phase): The primary purpose of this phase is to move data from the host computer into the Teradata system. The parsing engine (PE) reads records from the input file and sends 64 KB blocks to the AMPs without hashing, so each block lands on an arbitrary AMP.


Phase II (Application Phase): This phase starts when FastLoad receives the END LOADING statement. Each AMP hashes its records, sorts them, forwards them to the correct destination AMP, and writes them to disk. Locks on the target table are then released and the error tables are dropped.
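The two phases above can be sketched as a toy model in Python. This is not FastLoad internals, just the shape of the protocol: a fixed AMP count, Python's built-in hash in place of Teradata's row hash, and an arbitrary block size.

```python
NUM_AMPS = 3

def phase1_acquire(rows, block_size=2):
    """Phase 1: spray fixed-size blocks across AMPs round-robin, no hashing yet."""
    amps = [[] for _ in range(NUM_AMPS)]
    blocks = [rows[i:i + block_size] for i in range(0, len(rows), block_size)]
    for i, block in enumerate(blocks):
        amps[i % NUM_AMPS].extend(block)
    return amps

def phase2_apply(amps):
    """Phase 2: each AMP hashes its held rows and forwards them to the owning AMP."""
    final = [[] for _ in range(NUM_AMPS)]
    for held in amps:
        for row in held:
            final[hash(row) % NUM_AMPS].append(row)
    for rows in final:
        rows.sort()          # rows are stored sorted (by Row ID in Teradata)
    return final
```

Running `phase2_apply(phase1_acquire(rows))` shows every row ending up on exactly one AMP, sorted, regardless of which AMP received its block in Phase 1.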


Steps for a FastLoad script:

1) Log on to Teradata.
2) Define the target table.
3) Declare the input file delimiter (the default delimiter is the pipe character).
4) Map the columns to the input file column sequence.
5) Define the input data file.
6) Instruct the system to start loading.
7) Instruct the system to insert data into the target table.
8) End the session.


Does hashing happen at the PE or at the AMP?
It can be either. In particular, FastLoad sends data blocks to the AMPs round-robin, and the AMPs hash the rows and forward them to the correct target AMP. For a single-row fetch or insert, however, the PE computes the hash to determine which AMP should handle the request.

Fastload Script : 

>fastload < D:\fastloadtest.txt

<<Script Starts Here >>
/***** Section 1 *****/
/* In this section we give the LOGIN credentials which is required to connect to TD system.
Sessions command is used to restrict the number of sessions Fastload will make to connect to TD.
Default is one session per AMP. */

SESSIONS 10;
errlimit 25;
.LOGON  HostName or IP Address/UserID,Password;

/***** Section 2 *****/
/* In this section we define the table we want to load with FastLoad. The DROP commands are optional.
There is no need to define the structure of the ERROR tables; they are created automatically by FastLoad. */

DROP TABLE  DBNAME.Errortable1;
DROP TABLE  DBNAME.Errortable2;
DROP TABLE  DBNAME.fastloadtest;

CREATE MULTISET TABLE DBNAME.fastloadtest
(
  att1 VARCHAR(50)
, att2 VARCHAR(100)
, col3 VARCHAR(90)
, col4 VARCHAR(200)
, col5 VARCHAR(300)
, col6 VARCHAR(200)
)no primary index
;

/***** Section 3 *****/
/* The RECORD command is used to skip rows at the start of the data file. The RECORD THRU command is used to skip rows at the end of the data file.
The SET RECORD command defines the record layout, and "," is the delimiter used in our data file to separate columns. */

SET RECORD VARTEXT ",";
RECORD 1;


/***** Section 4 *****/
/* DEFINE statement is used to define the structure of the data file. This should be in accordance with the actual target table structure.
Fastload DEFINE statement allows only VARCHAR format. */
DEFINE
col1(VARCHAR(50)) ,
col2(VARCHAR(100)) ,
col3(VARCHAR(90)) ,
col4(VARCHAR(200)) ,
col5(VARCHAR(300)) ,
col6(VARCHAR(200))


/***** Section 5 *****/
/* FILE command defines the data file path and name. */

FILE=D:\Amrendra\FastLoadData.csv;
SHOW;

/***** Section 6 *****/
/* In this section we give the BEGIN loading statement. As soon as Fastload receives this statement it starts PHASE 1. */

BEGIN LOADING DBNAME.fastloadtest
   ERRORFILES DBNAME.Errortable1,
              DBNAME.Errortable2;


/***** Section 7 *****/
/* INSERT command is used to load the data file into actual target table. NOTE – For DATE columns we can use the data conversion by the syntax given below. */
INSERT INTO DBNAME.fastloadtest
(
att1,
att2,
col3,
col4,
col5,
col6
)
VALUES (
:col1,
:col2,
:col3,
:col4,
:col5,
:col6

);

/***** Section 8 *****/
/* END LOADING ends PHASE 1 and starts the execution of PHASE 2. LOGOFF is required to close all the sessions created by Fastload. */



END LOADING;
LOGOFF;


<<Script Ends Here>>


<< You could also do some customization in the script >>

INSERT INTO retail.emp_test
VALUES (
:emp_id ,
:emp_name,
:dept_id ,
:salary,
:dob (format 'yyyy-mm-dd')
);
In the code above, dob, which arrives as VARCHAR, is cast to the DATE data type.

RECORD 1;
RECORD THRU 3;
SET RECORD VARTEXT ",";
The RECORD command skips rows at the start of the data file; RECORD THRU skips rows at the end. The SET RECORD command defines the record layout, and "," is the delimiter used to separate columns in the data file.


You can also define a checkpoint:
BEGIN LOADING EMPDB.EMP_TABLE
ERRORFILES EMPDB.EMP_ERR1, EMPDB.EMP_ERR2
CHECKPOINT 10000000;


Thursday, October 26, 2017



Changing default passwords

https://www.ibm.com/support/knowledgecenter/en/SSULQD_7.1.0/com.ibm.nz.npds.doc/t_change_password.html

Wednesday, April 5, 2017

Creating an administrative user group



Procedure

  1. Connect to the System database as the admin user. For example:
     [nz@nzhost ~]$ nzsql -d system -u admin -pw password

  2. Create a group for your administrative users. For example:
     SYSTEM.ADMIN(ADMIN)=> CREATE GROUP administrators;

  3. Grant the group all administrative permissions. For example:
     SYSTEM.ADMIN(ADMIN)=> GRANT ALL ADMIN TO administrators WITH GRANT OPTION;

  4. Grant the group all object permissions. For example:
     SYSTEM.ADMIN(ADMIN)=> GRANT ALL ON DATABASE, GROUP, SCHEMA, SEQUENCE, SYNONYM, TABLE, EXTERNAL TABLE, FUNCTION, AGGREGATE, USER, VIEW, PROCEDURE, LIBRARY TO administrators WITH GRANT OPTION;

  5. Grant the group select permissions on all system and management objects. For example:
     SYSTEM.ADMIN(ADMIN)=> GRANT LIST, SELECT ON SYSTEM TABLE, SYSTEM VIEW, MANAGEMENT TABLE, MANAGEMENT VIEW TO administrators WITH GRANT OPTION;

  6. Add users to the group to grant them the permissions of the group. For example:
     SYSTEM.ADMIN(ADMIN)=> ALTER USER nzuser WITH IN GROUP administrators;


GRANT LIST,SELECT,DELETE,UPDATE,INSERT,TRUNCATE,GROOM,ALTER,ABORT ON SANDBOX..TABLE TO  SANDBOX_ADMIN; 

Tuesday, February 21, 2017

is date function in Netezza

Check for valid date/time-stamp
There is no such built-in function, but you can use a regexp to validate the value. Below is the code; it returns true if the value is a valid timestamp, and false otherwise.

SELECT SQLEXT.regexp_like(current_timestamp, '\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}$');
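The same check can be prototyped in Python before wiring it into SQL. Note that the pattern only validates the shape of the string, not calendar validity (for example, month 99 would still pass):

```python
import re

# Same pattern as the Netezza query above (fullmatch makes anchors implicit).
TS_PATTERN = re.compile(r'\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}')

def looks_like_timestamp(value: str) -> bool:
    """Shape check only: does not reject impossible dates like month 13."""
    return TS_PATTERN.fullmatch(value) is not None
```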



CASE WHEN REGEXP_LIKE(Col1, '^[0-9]+$') THEN 'NUMBER ONLY' ELSE Col1 END AS Col1

nz_migrate - Migrating data from one netezza box to another (or same) box


nz_migrate 

The purpose of this command is to copy or migrate data from one NPS server to another. It can also be used to make a copy of a database or table on the same server.


Sample Command : 
./nz_migrate -shost <source Host name>  -thost <Target Host name> -sdb  <Source Database Name> -tdb  <Target Database Name> -suser  <Source User Name> -spassword <Source Password> -tuser <Target User Name> -tpassword <Target Password>  -t <list of tables, separated by commas>   -cksum fast -genStats Full -threads 5  -CreateTargetTable yes -CreateTargetDatabase yes   >> /tmp/nz_migrate_0220_3.log &


If you want to migrate all tables in a database, omit the -t flag -

./nz_migrate -shost <source Host name>  -thost <Target Host name> -sdb  <Source Database Name> -tdb  <Target Database Name> -suser  <Source User Name> -spassword <Source Password> -tuser <Target User Name> -tpassword <Target Password>    -cksum fast -genStats Full -threads 5  -CreateTargetTable yes -CreateTargetDatabase yes   >> /tmp/nz_migrate_0220_3.log &



If you are migrating data to a new box, I recommend running this command on the new box; doing so saves some memory and processing on the old box.


Note: You might see error messages such as "group/user "username" not found" or "function 'COVAR_POP(FLOAT8, FLOAT8)' does not exist"; you can ignore them during the migration. These errors typically have no impact on the data transfer.


Tuesday, January 24, 2017

SQuirreL SQL Client - for netezza on window/linux,mac


SQuirreL SQL Client


SQuirreL SQL Client is a graphical Java program that will allow you to view the structure of a JDBC compliant database, browse the data in tables, issue SQL commands etc. The minimum version of Java supported is 1.6.x as of SQuirreL version 3.0.
SQuirreL's functionality can be extended through the use of plugins.

My client had trouble connecting to Netezza on a Mac using any SQL client tool. He was forced to use Parallels to install a Windows client tool just to connect to Netezza.

SQuirreL SQL Client is a Java-based tool that can be installed on any OS, including macOS, Windows, and Linux.

Download and Install client -




After installing the SQuirreL tool, the first step is to configure it for Netezza.
Below are the steps to configure the tool -

Create new Driver for Netezza

You will find the “Aliases” and “Drivers” tabs in the left pane. Click the “Drivers” tab, then click the “+” icon to add a new driver.

Drivers

  • Name : Give it a user-friendly name
  • Example URL : jdbc:netezza://hostname or ip address:5480/databasename
  • Website URL : optional; can be left blank
  • Extra Class Path : Click “Add” to browse to the Netezza JDBC jar file. If you don't have the Netezza JDBC driver (jar file), you can download it from IBM.
  • Class Name : org.netezza.Driver


Click OK


Once you have configured and created the driver, it's time to create and configure an alias.


  • Click the “Aliases” tab in the left pane
  • Click the “+” icon to add a new alias
  • The Add Alias window will open
  • Name : Give it a user-friendly name
  • Driver : Select the driver you created in the previous step
  • URL : jdbc:netezza://hostname or ip address:5480/database name
  • User name : Your Netezza user name
  • Password : Your Netezza password
  • Click “Test” to test the connection



  • Once the test connection succeeds, it's time to run your query
  • Select your Netezza alias name
  • Click the “Connect to Selected Alias (Ctrl+O)” icon
  • Go to the “SQL” tab and run your query


Thanks and good luck. Post a question if you have any problems configuring this tool.