Friday, September 11, 2009

Skewness

How will you determine which column will be appropriate for Index, that can avoid skewness?

I have given below the syntax. Before assigning a column as Index, make sure you check data in that column is not skewed.

SELECT
HASHAMP(HASHBUCKET(HASHROW(col whose amp distribution that needs to be checked; eg: PI)))
,CAST(COUNT(*) AS DECIMAL(18,0))
FROM
DBNAME.TBLNAME
GROUP BY 1
ORDER BY 1
;

PS: This alone cannot let you determine PI. As, you may sometimes have to decide based on how frequently it is used for Joining purposes etc.

3 comments:

  1. Hi meena here iam not comment
    but i want to ask you some questions on that
    1 what is statistics in teradata.
    2 what is the use of it
    3 on what col's we hava to create stats
    4 can we have same kind of stats in other RDBMS like oracle ,sqlserver and db2
    thank for reply

    ReplyDelete
  2. Hello Brady,

    #1 Statistics is metadata information, by which a Parsing Engine would understand how to access data most efficiently. The most updated your statistics are, the better performance of your query will be.
    #2 Like I mentioned before, it is used internally by Parsing Engine. By collecting statistics, you tell the parsing Engine how the data is distributed across available AMPS.
    #3 You need to decide based on frequency of access/Join
    #4 To my knowledge, this is specific to the database architecture, so the usage would vary accordingly.

    ReplyDelete
  3. HI Meena

    Yhis is Venkat
    May i know impartant functions which are using in real time.Can you please give those functions with the scenarios.No blog is providing these functions.So please provide the functions with the used situations.

    Thank you

    ReplyDelete