KEMBAR78
DSFAQ | PDF | Data Warehouse | Subroutine
0% found this document useful (0 votes)
253 views33 pages

DSFAQ

Source: flatfiles,oracle,(any database). -> target: db2(8.1px),orcle,sql-server. What are the ETL stages you have used in your project? -> transformer,sort,join,dataset,lookup,change capture,aggrigrater. Which stage gives good performance? -> dataset stage.

Uploaded by

aleemdstg
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOC, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
253 views33 pages

DSFAQ

Source: flatfiles,oracle,(any database). -> target: db2(8.1px),orcle,sql-server. What are the ETL stages you have used in your project? -> transformer,sort,join,dataset,lookup,change capture,aggrigrater. Which stage gives good performance? -> dataset stage.

Uploaded by

aleemdstg
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOC, PDF, TXT or read online on Scribd
You are on page 1/ 33

TCS

1.What are the sources systems and target systems in your project?
-> source: flatfiles,oracle,(any database).
-> target: db2(8.1px),orcle,sql-server.

2.what are the ETL stages you have used in your project?

-> transformer,sort,join,dataset,lookup,change capture,aggrigrater.

3.what are the difference between oracle and odbc enterprise stages?

4.which stage gives good performance?

-> dataset stage

5.what is the diff between dataset and sequentical stages?

->SEQ FILE:

1.extract/load from/to seq file.max==2GB.

2.when used as a source at the time of compilation it will be converted into native

format from ASCII.


3.By default it will be processed in sequence only.
4.doesn't support null values.
5.processed at the server.
6.supports .csv,.txt,.xls etc...
7.does not support lkp fileset.

-> DATASET:

1.used as an intermediate stage.


2.at compile time conversion is not required.
3.datasets get processed in our local system.so performance is improved as the
server

is not loaded.
4.supports only .ds
5. 2GB limit is not thete
6.support lkp

6.what are the diff processing stages you have used in your project?

-> transformer,modify,changeapply,merge,sort,survgate.

7.what is the shared container? what is the purpose of using this stage?
-> create reusable object that many jobs within the project can include.

-> When we go for parallel shared container the logic can be reusable across many
jobs

8.what is the funnel stage?

-> this is stage is use to comibne maltiple input data(same metadata) to single
outputdta.

1.CONTINOUS FUNNEL: NO PARTCULAR ORDER( LOAD WITHOUT ORDERING).

2.SORT FUNNEL: IT LOAD IN PARTCULAR ORDER(ASCENDING OR DECENDING).

3.SEQVANCE FUNNEL: IT READ FIRST INPUT FIRST AFTER SECOND INPUT SECOND
LOAD

9.what is the diff between joner and merge stage?


10.what is the diff between joiner and lookup stage?
11.suppose if i have souce data records and lookup data is about 50000
records.which stage is preferred to used for this requirement?

-> look up

12.suppose if i have 50000 records in source and 100 records in lookup stage,
then what is the stage preferred for this requirement?
->lookup

13.what is the diff between normal transformer and basic transformer?

-> A Basic transformer compiles in "Basic Language" whereas a Normal Transformer

compiles in "C++".

-> Basic transformer does not run on multiple nodes wheras a Normal Transformer
can

run on multiple nodes giving better performance.

-> Basic transfomer takes less time to compile than the Normal Transformer.

-> Basic transformer stage can only be used for SMP(Symmetric Multiprocessors)

systems and
not for MPP(Massively Parellel Processing) system or clusters systems.
USAGE :
-> A basic transformer should be used in Server Jobs.
-> Normal Transformer should be used in Parallel jobs as it will run on multiple
nodes

here giving better performance.

14. what is the diff between parallel and server jobs?

-> The basic difference between server and parallel jobs is the degree of parallelism

-> Server job stages do not have in built partitoning and parallelism mechanism
for

extracting and loading data between different stages.

-> Datastage compiled in to BASIC(interpreted pseudo code) and Parallel compiled


to

OSH(Orchestrate Scripting Language).

-> Server Transformer supports basic transforms only, but in parallel both basic and

parallel transforms.

-> In File stages, Hash file available only in Server and Complex falat file , dataset ,

lookup file set avail in parallel only.


-> Parallel Jobs haveing a huge amount of data to maintains.
-> server Jobs haveing a low amount of data to maintains.
-> Parallel Jobs it having a more stages comperies server Jobs .
-> Parallel Jobs ARE supports smp,mpp.
-> server Jobs support smP.

15.what is the stage you have used to load the data into teradata table?

-> teradeta api stage


16.what is the multinode concept in teradata?
17.what is the slowly changing dimension?

-> it explain how to capture the changes in the target over the period of the time.

-> it explain change data capture.

18.how do u capture the changes in slowly changing?

-> Type I: Replace the old record with a new record with updated data there by we
lose

the history.

But data warehouse has a responsibility to track the history effectively where
Type I implementation fails.

-> Type II: Create a new additional dimension table record with new value. By this
way

we can keep the history.

We can determine which dimension is current by adding a current record flag or

by time stamp on the dimensional row.

-> Type III: In this type of implementation we create a new field in the dimension
table

which stores the old value of the dimension.

When an attribute of the dimension changes then we push the updated value to

the current field and old value to the old field.

19.what is the surrogate key?

-> A surrogate key is a system-generated (non-meaningful from a business

perspective).it specified by uniqness

-> It is the system generated key which cannot be edited by the user.
-> It is the primary key of the dimension table in warehouse.
-> It is nothing but the sequence which generates the numbers for primary key
column.

20.what is the diff between narmalized model and dimensional model?

-> normalized data is held in a very simple structure. The data is stored in tables.

-> Each table has a primary key and should contain data relating to one entity – so
a

normalized customer table contains only data about customers.

-> We need to make logical connections between the entities (for example, this

customer placed these orders).

-> To do this we use a foreign key in the orders table to point to the primary key in
the

customer table.

21.what is the star schema?


-> A relational database schema organized around a central table (fact table) joined
to

a few smaller tables (dimension tables) using foreign key references.


-> The fact table contains raw numeric items that represent relevant business facts.
-> Star schema is a type of organising the tables such that we can retrieve the
result

from the database easily and fastly in the warehouse environment.

Usually a star schema consists of one or more dimension tables around a fact table

which looks like a star,so that it got its name.

22.what is the ods and edw?

-> ods mean Operational Data Store


-> it maintain the transactional date(current data)
-> in ods tha data is volatile(it can be changed)
-> EDW means enterprize datawarehouse
-> it maintain histrocial data for long time
-> in edw data is non-volatile(it cant chane)

23.when do u use ods and edw?

-> ODS can be described as a snap shot of the OLTP system.It acts as a source for

EDW(Enterprise datawarehouse).

-> ODS is more normalised than the EDW.Also ODS doesnt store any

history.Normally the Dimension tables remain at the ODS

(SCD types can be applied in ODS)where as the Facts Flow till the EDW.
-> edw maintain histrocialdata.
-> it is very use ful to decision making for enterprize.

24.WHAT IS DATAMART?

-> A data mart is a subset of an organizational data store, usually oriented to a


specific

purpose or major data subject,

that may be distributed to support business needs.

25.if i want to delete dataset file in the unix, how do i do that?

-> YA, ORCHADMIN RM <DATASET NAME>


26.how do u schedule the jobs?

-> using datastahe director-> tools

27.how do you scedule the jobs in job sequence?

-> 1. Open Datastage


-> 2. Open New Job Sequence
-> 3. Save it with some name.
-> 4. A your job sequence is created.
-> 5. Now drag the job sequence stages-->as per your business requirement.

28.what is the normal view and materialized view? when do u used which view?
VIEW
-> A view takes the output of a query and makes it appear like a virtual table. You
can

use a view in most places where a table can be used.

-> All operations performed on a view will affect data in the base table and so are

subject to the integrity constraints and triggers of the base table.

-> A View can be used to simplify SQL statements for the user or to isolate an

application from any future change to the base table definition.

-> A View can also be used to improve security by restricting access to a


predetermined

set of rows or columns.


MATERLIZED VIEW
-> Materialized views are schema objects that can be used to summarize,
precompute,

replicate, and distribute data. E.g. to construct a data warehouse.

-> A materialized view provides indirect access to table data by storing the results
of a

query in a separate schema object.

-> Unlike an ordinary view, which does not take up any storage space or contain
any

data.

-> The existence of a materialized view is transparent to SQL, but when used for
query
rewrites will improve the performance of SQL execution
-> An updatable materialized view lets you insert, update, and delete.

29.what it the buildops?

-> Buildops are good if users need custom coding but do not need dynamic
(runtime-

based) input and output interfaces.

-> Buildop provides a simple means of extending beyond the functionality provided
by

PX,but does not use an existing executable (like the wrapper).

Reasons to use Buildop include:

-> Speed / Performance

-> Complex business logic that cannot be easily represented using existing stages,
like

advanced lookups across a range of values,


custom surrogate key generation, calculating olling aggregates
-> Build once and reusable everywhere within project, no shared container
necessary
-> Can combine functionality from different stages into one

30.In datastage, which language code will be used?

-> OSH(orcistrate shell lang)

31.why do u use copy stage? can do sort in copy stage?

-> to copy tha data in multiple targets

-> Yes we do sort in copy stage.

32.in what other stage we can do copy?

-> transformer

33.what is sort stage? in what other stage we can do sort?

-> soting(ascending or descending) the data based on key.

-> using copy stage we can sort.

34.why transformer stage so costlier in development? how this stage different from
other stage?

-> ref: OSH : Archestrate shell script


-> multi purpose
-> counters and hold values for previous rows to make comparison
-> hold dervvation to be used in multiple field dervations
-> can be used to control execution of constraints
-> it compile on c++ genrated object mode

35.what is modify stage? what are all diff things we can in modify stage?

-> modify column types


-> null handling
-> date/time handling
-> string handling
-> add or drop columns

36.what is column generate stage?

-> it adds column to incoming data and generates mock data for these columns for
each

data row processed

37.instead of using column generate stage can we generate a column in any stage?

-> runtime column propagation (RCP)

38.what is RCP?

-> runtime column propagation (RCP)

-> IF Your job counters extra columns that are not defined in the metadata when it

actually runs

it will adopt these extra columns and propagate them through the rest of the job.
39.diff between lookup,join and merge stage
40.can we track rejections in join stage?

-> NO

41.why do we use funnel stage?

-> the funnel stage copies multiple input data(same meta data) to a ingle output
dataset
-> the operation is usepul for combining separate datasets into a single large
dataset
-> a processing stage that combines datafrom multiple input links to a single output
link

42.which one will give good performance between ODBC & Oracle Enterprise
stages?

-> oracle enterprize(parell and easy connct)

43.How do we reject the records in sequential file stage?

-> using reject port

44.In what other stage we can reject the records?

-> all except join

45.what is filter stage?

-> to select aportion of data for checkng

-> filter the data based on condtion

46.what is difference between job parameters and environment variables?

-> Basically Environment variable is predefined variable those we can use while

creating DS job.
-> We can set eithere as Project level or Job level.
-> Once we set specific variable that variable will be availabe into the project/job

47.what is the aggregator stage?

-> which is use to calculate the summrized for a group og records

-> columns to be aggregated

48.what is shared container stage?

-> create reusable object that many jobs within the project can include.

-> When we go for parallel shared container the logic can be reusable across many
jobs

49.Have you used sequencer jobs? what are the different triggers you have used?

-> ok condition,failed warning,custom,userstatus,unconditional

50.Explain about yourself, what is your work experience and how do you rate
yourself on

datawaresing?
51.what model we are doing here?

->star shema

52.suppose if we have two flat files and if we want to build one scenario where one
fact
table
and two dimensions then how do you build this and what trasformations you use
there?

-> one transformer,one lookup,two targets

53.why you use shared container to read read unix flat file why not some other
stage?

54.what is that some other stage?


55.what is the teradata version you are using?
56.what is ur target,sources?

-> target:db2

-> source:flatfiles

57.who will create environment variables?

-> admin

58.how do you register a plug-in?

59.suppose take three jobs and if we want to run those jobs using sequencer how
do you
use that and
explain.
-> arrange seqvansilly with links

60.how do the jobs run in sequecer? parallelly or sequecially?

-> we can run ant type( both server and parelle)

61.what are different stages involve in the sequencer job and explain them?

62.what is the difference between star & snow flake schemas?

for what purpose u will use them?


63.do you know narmalization techniques?
64.suppose u have build summarised tables and if we need to improve performance
what

do u do?
65.what type performance u will get?

66.how do you improve load performance?

->increase memory size

67.what is an aggregator?What it do?

POLARIS-INTERVIEW

68. What is the size of database in your project?

-> 110gb

69. What is the size of data extracted in the extraction process?

-> lee then 1gb

70. How many data marts are there in your project?

-> 2

71. How many fact and dimension tables in your project?

-> 4 fact tables 10 dimension table

72. What is the size of fact table in your project?

73. How many dimension tables did you had in your project and name some
dimensions?

-> 3 book,country,counter,subaccount,counterparty,security,legal
entity,intercompany,trnstype,project.
74. How many measures you have created?

-> 45

75. How do you enter into oracle?

JDA
76.What is difference between local & environment varialbles?
77.what is the routine? have you used the routines in your project?
78.what are after and before routines?
79.what are stage variables? where do we create stage variables? why do we use
stage
variables?
80.how do we access the oracle database from datastage?

-> orcle enterprize,odbc


81.suppose if there is no client version in your local machine,the oracle is been
installed
in unix box,then how do you get access the oracle connection.

82.can create an environment variable from your designer?

->no

83.suppose there is no datastage client components installed in your local machine


then

how do you compile and run the job?

84.suppose a single job is accessed in three sequencer jobs, can we run three
sequencer
jobs at a time?
-> all

85.what is job controls? when do we specify these controls and where?

86.how sequential file execute during the run time in PX? Either sequential or
parallel

mode?

->seqvential

87.how do we make a sequential file stage to run in parallel mode?

-> no of readers for node =2

IBM INTERVIEW QUESTIONS (15TH OCT’07)


88. Have you built separate data marts for financial and manufacturing aspects?
89. Which schema have you used?
90. Wether you were loading data into DWH or a staging area?If loading the date
into

DWH, then how do you load the data into target table?
91. How many fact and dimension tables you have used?
92. Tell me about funnel stage
93. Which stage in server jobs is equivalent to funnel stage?
94. Difference between merge stage and funnel stage
95. Explain about SCD type-2?
96. What is the order of execution: stage variable,constraints,derivation?

->yes
97. How you used to get the source?
98. Difference between data set stage and file set stage
99. If we are using SFS as source and hash files as reference, with transformer
stage and

pulling data

into target without using any constraints, then which kind of join simulation we are

doing?

100. What is a factless fact? Give example?

101. What is a semi-additive fact. Give example


102. How to go for SCD type 2 in server jobs?
WIPRO INTERVIEW QUESTIONS (26TH OCT’07)

103. Tell me about your project. Was it a Datawarehousing project?

104. How many dimensions and fact tables you have developed?name some of the
fact

and dimension tables

105. Have you been a part of the production team?

->no
106. Name some of the stages used by you
107. Difference between join and look-up stage
108. Difference between funnel and merge stage
109. What is the difference between stage variable and parameters?
110.If the look-up table has more than 10,000 records and the source table has

considerably less number of records,which stage will you use?look-up or join

'-> sprase lokup


111. What is configuration file?
112. What is a node?
113. What are the different types of partitions? Hwat is the default?
114. Difference between same and entire partition
115. What different things can be done by using transformer stage

116. What all things you can do using copy stage


117. If I have a file containing date as one of the columns and I have 5 years of
data.
Now I have to load the data at run time according to user requirements which is 6
months

of data. How to go for it?


-> using job parmater

TERRADATA INTERVIEW QUESTIONS (26th Oct ’07)

DATA STAGE

118. What is a configuration file?

119. have you ever created a configuration file

-> no

120. If you have a 4CPU box, what kind of configuration file you suggest i.e, how
many

number of processing nodes


121. What is a pipeline parallesism? How is is different from partition parallelism
122. What is a data set?How is it different from SFS?
123. In data stage where do we store jobnames,parameters etc. what is the name of
the

place/database where we store them


124. What do you mean by restartability of sequencer
125. What is meant by multiple instances
126. Have you developed any routines or custom transforms
127. What is meant by shared containers? What are the different types of shared

containers?what is the difference between them


128. Difference between look-upstage,join stage and merge stage?
129. How to optimize and tune the parallel jobs for better performance
130. What is an environment variable

ORACLE
131. What is a correlated subquery
132. What is bitmap-indexing
133. What is a materialized view
UNIX

134. What is shell scripting


DWH
135. What is the difference between a fact table and a dimension table
136. What is confimed dimension

137. What is a factless fact


138. What is the best architecture?
139. Explain SCD typeIII
IBM Interview 14th and 15th June-2007
140. What is invocation id?
141. What is RCP?
142. Difference between stage and environmental variables?
143. What is array size and transaction size?
144. In a sequence of 5 jobs, if aborts at 3rd job. If we run the sequence, from
where it

starts

processing?

145. Difference between sparse and normal lookup and where we use sparse. Can
give

one situation?
146. Data set will be stored in two formats. What are they?
147. How do you invoke the UNIX scripts in Data stage jobs?
148. What is Email task?
149. When do you use separate sort stage and inbuilt sort utility in a stage? What is
the

difference between these two?


150. Have you run the jobs with odd number of nodes? If not why?
151. How many nodes have you used in your project?
152. Have you created environmental variables?

153. Difference between sequential file and data set?


154. If a job is aborted. Can you run it with out compiling?
155. How do you committ the records while loading?
156. What is the purpose of check points?

157. What are job Controls?

158. How to call the stored procedures in data stage jobs? Can you specify
constraints?
159. How do you compile stored procedures?
160. If a job has 3 stages and nodes=3 then how many process you get?
161. A sequential file has 100 records. How to use first 19 records in the job?
162. How to check the number of nodes while running the job in UNIX environment?
163. Have you used the load option?
164. A transformer stage is running in parallel. How can you change to sequential?
165. What is preserve partitioning?
166. Configuration file is specified at project level and at job level. Which one will

override the other?


-> may be job level

->

167. In UNIX you don’t know some commands. While running the jobs you need the

commands, then how can you get them?

168. What is isolation?

-> see below

170. What is NLS?

-> national lang support

171. What command is used to search a string?

-> grep command:-> is used for finding any string in the file.

172. What commands do you know in UNIX?

-> grep ( searchinf file)

-> echo (display file)

-> chmod ( change mode)


-> rm ( remove)
-> cp(copy)
-> ls (list of files)

173. How do you start UNIX server and stop it?

-> n a UNIX environment, you can also use the pd_start script to manually start and

stop the server processes.

-> syn: # pd_start {start|restart|stop|status}

174. Performance difference between join and merge. Which takes more memory?

-> performance = join

-> join takes more memory

175. What is Email loop activity?

-> Notification Activity - used for sending emails to user defined recipients from

within Datastage
176. A sequential file has 4 duplicate records. How do you get 3rd record?

-> no not posible in seq file

177. What types of joins can be specified in look up stage?

-> inner and outer

178. What routines you wrote?

179. If the job is aborted while running in UNIX, then what error message you get?

180. Command VI is used for what?

-> The default editor that comes with the UNIX operating system is called vi (visual

editor)

-> editing

181. Which version of UNIX you are using?

-> 5.2

182. How do you connect to UNIX?

-> using command line in admin


-> osh commands
-> orchadmin

183. Link collector is present in parallel jobs or not?

-> ya

184. To join 2 sequential files which stage you use?

-> join or merge

185. Performance difference between ODBC, Oracle Enterprise, Dynamic RDBMS?

-> may be oracle enterprize bcoz of native connection

186. What performance techniques do you use while developing jobs?

-> using partioning methods


-> types of parallel( pipeline and partion)
-> using hash key
-> enablin and increase cashe memory

INTERVIEW QUESTIONS
ORACLE

Satyam Teynampet 17/04/2004

187. What are inline views?

-> An inline view is a SELECT statement in the FROM-clause of another SELECT

statement.

188. What is a subquery & correlated subquery. Tell with example?

-> In sub query the inner query is executed only once. Depeding upon the results of

inner query outer query is evaluated.

-> ex: select * from emp where deptno=( select deptno from dept where

dname='hyd');

-> In correlated subquery the inner query is evaluated once for each row processed

by the parent statement or outer query.

-> SELECT empnum, name FROM empl e WHERE sal > (SELECT avg(sal)

FROM emp WHERE department = e.department);

189. What is normalisation. Please state all forms with example?

-> Normalization is process for reducing the redency of the data.there 5 normal

forms mainly used in 3 forms4th one boyesscoded normal form.

-> First Normal form (1NF): A relation is said to be in 1NF if it has only single

valued attributes, neither repeating nor

arrays are permitted.

-> Second Normal Form (2NF): A relation is said to be in 2NF if it is in 1NF and

every non key attribute is fully functional

dependent on the primary key.

-> Third Normal Form (3NF): We say that a relation is in 3NF if it is in 2NF and has

no transitive dependencies.

-> Eliminate Columns Not Dependent On Key - If attributes


do not contribute to a description of the key, remove them to a

separate table.

190. If I select from two tables, how do I assure which table is searched first?

->

191. What are triggers. What are Row level and Statement Level triggers?

-> Triggers are an action that is performed is a condition is met within the code.
-> Row-Level Triggers
-> Statement-Level Triggers
-> Before Trigger
-> After Trigger
-> Schema Triggers
-> Database level Triggers

-> row-level trigger: A row-level trigger fires once for each row that is affected by a

triggering event

-> Statement-Level Triggers: fires once per triggering statement regardless of the

number of rows affected by the triggering event

192. Implicit and Explicit Cursors. Does it work for and update statement?

-> PL/SQL declares a cursor implicitly for all SQL data manipulation statements,

including quries that return only one row

-> explicit cursor or use a cursor FOR loop However,queries that return more than

one row.

-> An implicit cursor is used for all SQL statements Declare, Open, Fetch, Close. An

explicit cursors are used to process multirow

SELECT statements An implicit cursor is used to process INSERT, UPDATE,

DELETE and single row SELECT. .INTO statements.

Satyam G.N.Chetty Road

28/03/2004

193. Write a statement using inline views?

-> small example


-> select * from ( select * from dept);

194. Write a sql to fetch the sumof top 5 salaries department wise?

->

195. What is a snapshot to old error?

-> If you have lots of updates, long running SQL and too small UNDO, the ORA-

01555 error will appear.

-> The SNAPSHOT_TOO_OLD erro will come ... the query will goves into

cartesion product or infinite loop

-> Other case is suppose ur updated loarge no of rows at atime without saveing the

records .

In this case u can use commit statement for every 500 records then u can avoid this

problam.

Or ask DBA to extend the table space for this segment.

196. What is the difference between Function and Procedure?

-> 1. Functions must return a value while procedure may or may not return any

value.

-> 2. Function can return only a single value at a time while procedure can return

one, many or none.

-> 3. Function can be called from sql statements while procedures can't.

197. What if I give a return parameter in Function as NULL?

198. How do I complile Pl/Sql procedure from Unix Prompt?

-> x := OSCommand_Run('/home/test/myoscommand.sh')

199. A view has three base tables. One of the column of a table is removed. What

happens to the view.

-> a pattel error occur

200. How do I enable the above view again.


201. How to read a flat file using utl file?

-> "UTL_FILE" allows PL/SQL programs to both read and write to any operating

system files that are accessible from the server

on which your database instance is running.

ex: utl_file_dir = c:sampledata

202. What are Ref cursors?

-> CURSOR IN WHICH QUERY CAN BE CHANGED AT RUNTIME IS

CALLED REF CURSOR

-> ref cusor is a dynamic type of cursor which is used to provide Reusability of

cursor.......

-> Ref Cursors also known as Dynamic cursors cane be associated with as many

SELECT statements you want at different times.

You can associate with different SELECT stataments at dynamic time


-> two types in dynamic cur
1) Strong- Return type
2) weak- No Return type

203. What are the Build in packages you have used?

204. What does a change capture do

->same question below

205. What does a transformer stage do?

-> dervations

-> constrains

-> experssions for constraints and dervations can referance


-> job parameters
-> funcations
-> system varaiables and cinstraints
-> stage variables
-> external routines

206. What is a sequencer job and why do we use it


-> to control the flow of jobs

-> it excutes jobs in seqvantially.

207. Why do we run jobs thru shell scripts

208. What is making a job as of multiple instances ?

-> same question below

209. What is the link ordering in the datastage?

-> -> same question below

210. What is the combinality mode in the teradata stage?

211. What is Hashed file and where do we use it.

-> same question below

235.what is multiple instance of making the job?

-> same question below

236. what is the fast multi node & fast default node

->

237. what is the routine?what it does?

same question below

238. what is the shared container?how & where do u use shared containers?

-> same question below

239. Tell me few runtime errors u struggled with?

-> if transformer used check c++ compiler


-> if buildop errors try bulidop from command line
-> chek enveronment variables settings
-> examine genrated osh
-> some stages not supported rcp
-> null handling errors...........

240. what change capture stage do?

-> change capture stage catch holds of changes from two different datasets and

generates a new column called change


code....change code has values

0-copy
1-insert
2-delete
3-edit/update

241. Tell few stages u used in ur development environment?

-> explain any stages

242. what is the defferences between server job & parallel job & main frame jobs?

-> Server jobs. These are available if you have installed DataStage Server.

They run on the DataStage Server connecting to other data sources as

necessary.

-> Parallel jobs. These are only available if you have installed Enterprise Edition.

Server jobs can be run on SMP MPP machines.Here performance is low

i.e speed is less

-> Parallel jobs can be run only on cluster machines .Here performance is high i.e

speed is high

-> These run on DataStage servers that are SMP MPP or cluster systems. They can

also run on a separate z/OS (USS) machine if required.

243.what is the defferences between server shared container & parallel shared
containers?

-> SERVER SHARED CONTAINERS CAN BE USED IN BOTH SERVER & PARALLEL
JOBS,BUT PARRALLEL CONTAINERS CAN NOT BE USED IN SERVER JOBS.

-> When we go for parallel shared container the logic can be reusable across many

jobs

-> Server shared containers contain server stage types, parallel shared containers

contain parallel stage types.

244. where is the repository comes into picture?

-> Repository resides in a spcified data base. it holds all the meta data rawdata
mapping information and all the respective mapping information.

-> it contain jobs,routines,tabledefinations etc.......


245. what is the reject file ordering?
246. what are the responsibilities of Datastage Manager?

-> import and export metadata

-> backup and recovery the project

-> use the mmanager to store and manage reusable metadata for the jobs you
define in

desiner

-> this meta data includes table and file layouts and routines for transforming
extracted

data.

247. what is the Analysis tool? what we do with that?

-> may be usage analysis

248. What do we do with Reporting Assistant?

249. What change capture stage do?

-> change capture stage catch holds of changes from two different datasets and

generates a new column called change

code....change code has values

0-copy
1-insert
2-delete
3-edit/update

250. what is the link ordering in the datastage?

-> U can edit the order of the input and output links from the Link ordering tab in
the

stages.

251. compare the informatica & datastage tools in view of performance

-> Main difference lies in parellism Datastage uses parellism concept through node
configuration where Informatica does not

-> Partitioning - Datastage PX provides many more robust partitioning options than

informatica.

You can also re-partition the data whichever way you want.

-> Parallelism - Informatica does not support full pipeline parallelism (although it

claims).

-> File Lookup - Informatica supports flat file lookup but the caching is horrible.

DataStage supports hash files lookup filesets datasets for much more

efficient lookup.

-> Merge/Funnel - Datastage has a very rich functionality of merging or


funnelling the

streams.

In Informatica the only way is to do a Union which by the way is always

a Union-all.

252. how many jobs u have been created?

-> 100 + jobs in last six months

253. Explain few of the jobs u created?

-> sort
-> aggrigrater
-> remove duplicate
-> surogate key
-> transformer

254. how do u create stage variables in transformer stage?


-> transformerstage -> stage properties -> varables
255. what is the combinality mode in the teradata stage?

256. what is the datastage debugger? where it runs?

-> it check various locations


-> it check and debug the portion of data or data
-> like peek,head,tail......

257. how do u test your jobs at client place?


-> using datastage director

258. if input & output column datatypes are not matched and not used proper
conversion

process

then there is an error like this.

-> pattel error datatypes mis match

-> bad conversion error

259. When checking operator: When binding output schema variable "outRec":
When

binding output

interface field "CPSC_CODE" to field "CPSC_CODE": Converting nullable source to

non-nullable

-> result; fatal runtime error could occur (use modify operator to specify value to

which nullshould be converted)

260. what is the architecture of datastage?

-> client-server architeck

261. how do you define and use the job parameters?

-> create in job properties

-> used at runtime

262. what is stage variables,system variables,environment varaibles?

-> Stage Variable - An intermediate processing variable that retains value during
read

and doesnt pass the value into target column.

-> system variable - System variables have a predefined type and structure that
cannot

be changed. When an expression is stored into a system variable,

it is converted to the variable type, if necessary and possible.

263. how to use routines in datastage?


-> write the routine in C or C++ create the object file and place object in lib
directory.
-> now open disigner and goto routines configure the path and routine names
-> Routines are stored in the Routines branch of the DataStage Repository, where
you

can create, view or edit.


1) Transform functions
2) Before-after job subroutines
3) Job Control routines

264. what is difference between shared-container and local-container?

-> A shared container can be shared between jobs and is great for re-usability.

-> local container can only be used in the job in which it is created,

it is useful for simplifying job design by breaking the job into different sections.

265. how do you connect to oracle ?

-> using odbc connections

266. what is TrimB and TrimF functions in datastage?

-> Trim - removes spaces and tabs (or %stripchar%)


-> TrimB - removes all trailing spaces and tabs
-> TrimF - removes all leading spaces and tabs

267. please explain any ETL process that you have developed?

-> u explain any stages ( including ur job)

268. What is hash file ? Types of hash files.

-> used for look-ups it is like a referance table


-> it is also used in-place od odbc,oci tables fo better performance
-> two types, static hashfile,dynamic hash file

269. If ur are doing any changes in shared-container will it reflect in all the jobs
wherever

you used this shared- container?

-> ya

270. have u written any custom routines in ur project? if so explain?

271. how to make read-only routines in datastage?is it possible if so how?


-> ya one aption is there
272. what are different field operators in routines?
273. How do you get log info into a file?
275. What is isolation level and when do you use them?

-> Transactions not only ensure the full completion (or rollback) of the statements
that

they enclose but also isolate the data modified by the statements.

-> The isolation level describes the degree to which the data being updated is
visible to

other transactions.

276. What is before job subroutine/After job subroutine? When do you use them?

-> Before-stage subroutine Contain the name (and value) of a subroutine that is

executed before the stage starts to process any data.

-> After-stage subroutine Contain the name (and value) of a subroutine that is

executed after the stage has processed the data

277. How do you clear the lock of a job?

->ds dirctor jobs cleanup resources

278. How do you backup and restore the project?

-> using datastage manager ( datastage web sphered 7.5px)

-> using datastage director ( datastahe information server8.1)

279. What is Clear Status File and when do you use it?

-> ds dirctor-> job -> clear status file

280. What is Cleanup Resources and when do you use it?

-> ds dirctor jobs cleanup resources ( we use at when job locking)

281. can i join a flat file and oracle and load into oracle?is this possible?

-> ya

282. how can u do incremental aggregation in datastage?

-> may be using change data capture (cdc)


283. while loading some data into target suddenly thier is a problem loading
process

stopped how can u start loading from the records that were left?

-> startin at error stoping( using restatble check points)

......................

284. what are the general problems that u face in datastage?

-> if transformer used check c++ compiler


-> if buildop errors try bulidop from command line
-> chek enveronment variables settings
-> examine genrated osh
-> some stages not supported rcp
-> null handling errors...........

285. Is it possible to rollback a set of jobs when after some jobs are executed, if the
latest

job fails?

-> no

286. Could DS generate data dictionary of source database and target database ?

-> ya

287. What are the various reports that could be generated using this tool ?

-> usage analysis

288. Could DS show record length, total db size for source and target based on
existing

source data to arrive at target

database sizing?

-> no

289. Does DS support data scrambling?

-> ya

290. Compare ETL features of DS with that of its competitors ? Chart is essential.
-> yes parallel
291. Does DS generate Audit Trail?
292. What other reports could DS provide apart from Impact Analysis Document ? Is

customizable reporting feature available?

-> ya (one option is there)

293. Could DS generate test cases to verify the veracity of mapping etc at design
time to

validate the mapping and transforms?

-> ya

294. Is there any mechanism available in Transformer Stage to show the left out

nand un mapped fields from source and target stages?

-> RCP

295. Could DS handle object relational database table structures?

-> yes

296. Does Meta Data export of Copy book and that data in the same available in
DS ?

-> yes

297. How is data compression handled in DS ?

-> compare stage

peter.zeglis@ascentialsoftware.com

Learn it from ValueCap. They are ex-developers of PX. They can conduct on-site px

course (with all bell and whistles).

lucy.luzza@ascentialsoftware.com

CSC (Hyderabad) – 01/06/07.

298. How you are getting your source?

-> flat files

299. How many routines you have made so far?

-> 4
300. How will you connect to ODBC?

-> computer controle pannel -> admin tools -> data source -> system dsn-> dsn

name,tsn name, pwd

301. Is ODBC is used to connect the database or to store the data?

-> ya

302. Your output from ETL is used to analyses the data for others? So, what type of

data ware house it is?

-> enterpize datawarehouse

303. Your project is slowly changing dimension, then how frequently you are getting

your data? i.e weekly/monthly

-> weekly or monthly for user rquirement

304. What are different source systems?

-> flatfiles,oracle
WIPRO (Chennai) – 01/06/07.
305. What is surrogate key?

-> A surrogate key is a system-generated (non-meaningful from a business

perspective).it specified by uniqness


-> It is the system generated key which cannot be edited by the user.
-> It is the primary key of the dimension table in warehouse.
-> It is nothing but the sequence which generates the numbers for primary key

column.

306. What is the difference between primary key and surrogate key?

-> primary key is combitnation of uniqe and notnull.

-> surrogate key is system genrated seqvance key

307. Where we use the primary key and where we use surrogate key?

-> primary key is combitnation of uniqe and notnull.( ex: empid,customerid....)

-> surrogate key is system genrated seqvance key ( suppose the primary key
repeted

in many times we genrate surrgote key)


308. How you get the timestamp in date type in Oracle?

-> SELECT CURRENT_TIMESTAMP FROM dual;

-> TIMESTAMP datatype which stores all the information that the DATE datatype

stores, but also includes fractional seconds.

309. What is multiple instances?

-> same job run in simultaneously on the same computer.

310. How you compare two tables using multiple instances?

-> If your goal is to compare two tables, send the output of a select * from my_table

order by my_key to a file

for each table and, then, use a compare utility (like UNIX diff).

311. How you control jobs in data stage?

-> ya, using job properties

-> job control lanuage it is used to run more number of jobs at a time with or

without using loops.

312. What is the use of parameters?

-> Dynamic value passing at runtime(while running of job)

-> so ur job wil be dynamic take that parameters and do perform the work.

313. How do you give the parameters? Is it by hard coding or passing?

-> passing
314. If the data in text file, is you use to hard code the parameters or pass ?
315. How you pass the parameters?

-> IN JOBPARAMATERS AT RUN TIME


316. How you trigger the job in windows?
317. If the data is in 10 files and to load in table, which stage will you prefer?

-> same metadata (SEQ FILE SET)

318. What is the difference between dataset and sequential file?

SEQFILE:
-> excute seqvance mode( read and write also seqvance)
-> stored capcity : 2gb stored in hard disk
-> did't support lookupfile
-> support any extention like .txt,.csv....
-> performance less compare to dataset

dataset:

-> excute parall (read and write also parall)


-> srored capacity more syored in logical memory
-> support lookup file
-> support only .ds extention
-> performance high compare to seq file

319. How can you see the dataset in UNIX and windows?

-> orchadmin dump <dataset>.ds

320. How can you delete the dataset in UNIX?

-> create reusable object that many jobs within the project can include.

-> When we go for parallel shared container the logic can be reusable across many

jobs
321. What is the function of shared container?
322. You have the sequential files in funnel stage, the records of first file has to
come

first in target

and second file records next and finally third. How you can achieve this?

-> seqvance funnel

323. What is the difference between remove duplicate & sort stage?

-> REMOVE DUPLICATE: just remove the the duplicate data based on key

-> SORT: just sot the data(ascending or desending) based on key. remove
duplicates

also posible in sort stage.

324. Is it not possible to delete the duplicate in sort stage?

-> no

You might also like