Sunday, October 31, 2010

Forcing Case Sensitivity in Teradata

Teradata is case insensitive by default. But there are certain scenarios where you want the WHERE clause to match a string only in the specified case. You can achieve this by adding the (CASESPECIFIC) attribute to the comparison.

Eg:

SELECT CURRENT_DATE
WHERE UPPER('hello') = 'hello' (CASESPECIFIC); -- will not return the current_date

SELECT CURRENT_DATE
WHERE LOWER('HELLO') = 'hello' (CASESPECIFIC); -- will return the current_date
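
A more typical use is forcing case sensitivity on a column comparison. Here is a minimal sketch; the table and column names (emp_tbl, ename) are hypothetical and used only for illustration:

-- emp_tbl and ename are made-up names, not from any real schema
SELECT ename
FROM emp_tbl
WHERE ename = 'Smith' (CASESPECIFIC); -- matches 'Smith' only, not 'SMITH' or 'smith'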

(Please leave a comment if you find my posts helpful)

Friday, October 8, 2010

Top vs Sample

There was a question raised in my team's discussion forum about which of these is better to use to get a sample of the data. There were different suggestions on various online forums, but none explained why one is better than the other.
I read through the TD documentation, and this is what I put forth to my group:

TOP:
1. The system evaluates the TOP n operator after all other clauses in the SELECT request have been evaluated. (So I think it does not reduce data processing time; it should only affect the I/O time.)
2. It gives a ranked result only when you specify an ORDER BY clause. So, if you do not specify ORDER BY, the result can be any n rows.

SAMPLE:
1. The SAMPLE clause has a RANDOMIZED ALLOCATION option that specifies whether retrieved rows are sampled randomly across the AMPs or proportionately to the number of qualified rows per AMP. Proportional allocation is the default.
2. SAMPLE also operates on the evaluated output, so it does not improve query processing time either.

My inference:
Comparing TOP and SAMPLE: since both execute the query completely, it is better to use TOP (without ORDER BY), because then we do not give the system the overhead of producing a proportional output from each AMP.
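
For reference, here is the basic syntax of both; my_tbl is a hypothetical table name used only for illustration:

SELECT TOP 10 * FROM my_tbl;                          -- any 10 rows, evaluated after everything else
SELECT * FROM my_tbl SAMPLE 10;                       -- 10 rows, proportional allocation across AMPs (default)
SELECT * FROM my_tbl SAMPLE RANDOMIZED ALLOCATION 10; -- 10 rows, sampled randomly across AMPs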

Just a couple more interesting finds about TOP vs RANK:
1. The QUALIFY clause with the RANK or ROW_NUMBER ordered analytical functions returns the same results as the TOP n operator (see the example below).
2. For best performance, use the TOP option instead of the QUALIFY clause with RANK or ROW_NUMBER. In best-case scenarios the TOP option provides better performance; in worst-case scenarios it provides equivalent performance.
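
As a quick illustration, both of the following return the 10 highest sale amounts (sales_tbl and sale_amt are made-up names used only for illustration), but the TOP form is the one the documentation recommends:

-- sales_tbl and sale_amt are hypothetical names
SELECT TOP 10 * FROM sales_tbl ORDER BY sale_amt DESC;

SELECT *
FROM sales_tbl
QUALIFY ROW_NUMBER() OVER (ORDER BY sale_amt DESC) <= 10;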

Hope this helps!

Wednesday, August 4, 2010

Timestamp to time

I recently encountered an issue when I tried to typecast a timestamp into a time in SQL Assistant (it ran fine in BTEQ scripts).
Eg:
SELECT CAST((CURRENT_TIMESTAMP(0) + INTERVAL '7:00' HOUR TO MINUTE) AS TIME WITH TIME ZONE );
When I executed it, I got the following error:
SELECT Failed. 5402: WITH TIME ZONE is incompatible with specified data type.
 
My version details:
Teradata 12.00.00.0235
ODBC 13.00.00.04
 
Resolution:
Control Panel -> Administrative Tools -> ODBC -> click Configure on the DSN -> Options -> Datetime Format -> set it to AAA
(Mine was set to III)
 
I decided to write a blog entry because I had searched quite extensively on this issue but was not able to find a resolution on the internet. My DBA team helped me by filing a TD service request. Hope this helps.
 
(P.S.: Sorry for the discontinuity in the performance enhancement suggestions; I will continue the series when I find some spare time.)
 
 

Wednesday, July 7, 2010

Performance tuning #1

I have taken a long break from updating this blog.
To appease my mind, I am going to write a series of posts on performance tuning, as that is what is going on in my project work.
 
When it comes to performance tuning, we cannot stick to a fixed set of rules; it varies based on the data you are dealing with.
We can, however, create a baseline and address issues based on the scenarios we face on a day-to-day basis.
 
Feel free to comment and ask questions. I will try and address them as soon as I can.
 
  1. Utilizing Teradata's Parallel Architecture:
If you understand what happens in the background, you will be able to make your query work at its best. So, run an explain plan on your query before executing it and see how the PE (Parsing Engine) has planned to execute it (see the sketch at the end of this post).
Understand the key words in the explain plan. I will have to write a more detailed post on this topic, but for now let us go on with the highlights.
  2. Understanding resource consumption:
The resources you consume can be directly related to dollars. Be aware of, and frugal with, the resources you use.
The following are the factors you need to know and check from time to time:
      1. CPU consumption
      2. Parallel efficiency / hot AMP percentage
      3. Spool usage
  3. Help the parser:
Since the architecture has been made to be intelligent, we have to give it some respect :). You can help the parser understand the data you are dealing with by collecting statistics (see the sketch at the end of this post). But you need to be careful when you do so, for two reasons:
      1. Incorrect stats are worse than not collecting stats at all, so make sure your stats are not stale (old).
      2. If the data set in your table changes rapidly and you are dealing with a lot of data, then collecting stats can itself be resource consuming. So, based on how frequently your table will be accessed, you will have to make the call.
  4. Since the same SQL can be written in different ways, you will have to know which method is better than another. For example: creating a volatile table vs. a global temporary table vs. a working table. You cannot directly point out which is the best, but I can touch on the pros and cons and a comparison of them.
  5. Take a step back and look at the whole process. Consider how much data you need to keep, how critical it is for your business to get the data soon, and how frequently you need to run your SQL. Most of the time, the 'big picture' will give you a lot of answers.
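
To make points 1 and 3 concrete, here is a minimal sketch. The table and column names (orders_tbl, cust_id, order_dt) are made up for illustration; the statements only show the general pattern:

-- orders_tbl, cust_id and order_dt are hypothetical names
-- Point 1: see the plan the Parsing Engine builds before running the query
EXPLAIN
SELECT cust_id, COUNT(*)
FROM orders_tbl
WHERE order_dt >= DATE '2010-01-01'
GROUP BY cust_id;

-- Point 3: help the parser by collecting statistics on join/filter columns
COLLECT STATISTICS ON orders_tbl COLUMN (cust_id);
COLLECT STATISTICS ON orders_tbl COLUMN (order_dt);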