Informatica FAQ: Data Warehousing Basics
1. What is a Data Warehouse?
A Data Warehouse is a collection of data marts representing historical data from different operational data sources (OLTP). The data from these OLTP systems is structured and optimized for querying and data analysis in the Data Warehouse.

2. What is a Data Mart?
A Data Mart is a subset of a data warehouse that provides data for reporting and analysis on a section, unit or department, such as the Sales Dept or HR Dept. Data Marts are sometimes also called HPQS (Higher Performance Query Structure).

3. What is OLAP?
OLAP stands for Online Analytical Processing. It uses database tables (Fact and Dimension tables) to enable multidimensional viewing, analysis and querying of large amounts of data.

4. What is OLTP?
OLTP stands for Online Transaction Processing. Except for data warehouse databases, the other databases are OLTP databases. They use a normalized schema structure and are designed for recording the daily operations and transactions of a business.

5. What are Dimensions?
Dimensions are categories by which summarized data can be viewed. For example, a profit Fact table can be viewed by a time dimension.

6. What are Conformed Dimensions?
Conformed Dimensions (written "Confirmed Dimensions" in some notes) are dimensions which are reusable and fixed in nature. Examples: customer, time and geography dimensions.

7. What are Fact Tables?
A Fact Table is a table that contains summarized numerical data (facts) and historical data. The Fact Table has a foreign key-primary key relation with a dimension table and maintains its information in 3rd normal form. In a star schema (see question 15) the fact table is the centrally located table surrounded by one or more dimension tables.

8. What are the types of Facts?
1. Additive Facts: A Fact which can be summed up across any of the dimensions available in the fact table.
2. Semi-Additive Facts: A Fact which can be summed up across a few dimensions but not across all dimensions available in the fact table.
3. Non-Additive Facts: A Fact which cannot be summed up across any of the dimensions available in the fact table.

9. What are the types of Fact Tables?
1. Cumulative Fact Table: This type of fact table generally describes what has happened over a period of time. It contains additive facts.
2. Snapshot Fact Table: This type of fact table deals with a particular period of time. It contains non-additive and semi-additive facts.

10. What is Grain of Fact?
The Grain of Fact is the level at which the fact information is stored in a fact table. It is also called Fact Granularity or Fact Event Level.

11. What is a Factless Fact Table?
A Fact Table which does not contain facts is called a Factless Fact Table. Generally, when we need to combine two data marts, one data mart will have a factless fact table and the other a common fact table.

12. What are Measures?
Measures are numeric data based on columns in a fact table.

13. What are Cubes?
Cubes are data processing units composed of fact tables and dimensions from the data warehouse. They provide multidimensional analysis.

14. What are Virtual Cubes?
Virtual Cubes are combinations of one or more real cubes and require no disk space to store them. They store only the definition and not the data.

15. What is a Star schema design?
A Star schema is a logical database design in which a centrally located fact table is surrounded by one or more dimension tables. This design is best suited for a Data Warehouse or Data Mart.

16. What is Snow Flake schema design?
In a Snow Flake design the dimension table (de-normalized table) is further divided into one or more dimensions (normalized tables) to organize the information in a better structural format. To design a snow flake schema we should first design the star schema.

17. What is an Operational Data Store [ODS]?
It is a collection of integrated databases designed to support operational monitoring. Unlike the OLTP databases, the data in the ODS is integrated, subject oriented and enterprise wide.

18. What is Denormalization?
Denormalization means a table with multiple duplicate keys. The dimension table follows the Denormalization method with the technique of the surrogate key.

19. What is a Surrogate Key?
A Surrogate Key is a sequence-generated key which is assigned to be the primary key in the system (table).

20. What are the client components of Informatica 7.1.1?
1. Informatica Designer
2. Informatica Work Flow Manager
3. Informatica Work Flow Monitor
4. Informatica Repository Manager
5. Informatica Repository Server Administration Console
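The star schema and additive facts described in questions 7, 8 and 15 can be illustrated with a small in-memory database. This is only a sketch using Python's standard sqlite3 module; the table and column names (fact_sales, dim_time, dim_product, profit) are invented for the example and do not come from any Informatica product.

```python
import sqlite3

# Hypothetical star schema: one central fact table that references
# two dimension tables through foreign keys.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_time (time_key INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_sales (
    time_key INTEGER REFERENCES dim_time(time_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    profit REAL  -- an additive fact: it can be summed across any dimension
);
INSERT INTO dim_time VALUES (1, 2023, 'Q1'), (2, 2023, 'Q2');
INSERT INTO dim_product VALUES (10, 'Widget'), (20, 'Gadget');
INSERT INTO fact_sales VALUES (1, 10, 100.0), (1, 20, 50.0), (2, 10, 70.0);
""")


def profit_by_quarter(connection):
    """View the profit fact by the time dimension (question 5)."""
    return connection.execute("""
        SELECT t.quarter, SUM(f.profit)
        FROM fact_sales f
        JOIN dim_time t ON f.time_key = t.time_key
        GROUP BY t.quarter
        ORDER BY t.quarter
    """).fetchall()


print(profit_by_quarter(conn))
```

Grouping by dim_product instead of dim_time would work the same way, which is what makes profit an additive fact in this toy schema.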
21. What are the server components of Informatica 7.1.1?
1. Informatica Server
2. Informatica Repository Server

22. What is Metadata?
Data about data is called Metadata. The Metadata contains the definition of the data.

23. What is a Repository?
A Repository is a centrally stored container which holds the metadata used by the Informatica PowerCenter server and the PowerCenter client tools. Informatica stores the Repository in relational database format.
Informatica 7.1.1 Repository has 247 database objects
Informatica 6.1.1 Repository has 172 database objects
Informatica 5.1.1 Repository has 145 database objects
Informatica 4.1.1 Repository has 111 database objects
24. What is the Data Acquisition Process?
It is the process of extracting data from different source systems (operational databases), integrating and transforming the data into a homogeneous format, and loading it into the target warehouse database. It is simply called ETL (Extraction, Transformation and Loading). Data Acquisition process designs are named differently by different ETL vendors:
Informatica ----> Mapping
Data Stage ----> Job
Ab Initio ----> Graph

25. What are the GUI based ETL tools?
1. Informatica
2. DataStage
3. Data Junction
4. Oracle Warehouse Builder
5. Ab Initio
6. Business Objects Data Integrator
7. Cognos Decision Stream

26. What are programmatic based ETL tools?
1. PL/SQL
2. SAS BASE
3. SAS ACCESS
4. Teradata Utilities
   a. BTEQ
   b. Fast Load
   c. Multi Load
   d. Fast Export
   e. T (Trickle) Pump

27. What is a Transformation?
A transformation is a repository object that generates, modifies, or passes data. Transformations in a mapping represent the operations the PowerCenter Server performs on the data. Data passes into and out of transformations through ports that you link in a mapping or mapplet. Transformations can be active or passive. An active transformation can change the number of rows that pass through it. A passive transformation does not change the number of rows that pass through it.

28. The following are the Transformations available in Informatica, with their type and description.
Aggregator (Active / Connected): Performs aggregate calculations.
Application Source Qualifier (Active / Connected): Represents the rows that the PowerCenter Server reads from an application, such as an ERP source, when it runs a session.
Custom (Active or Passive / Connected): Calls a procedure in a shared library or DLL.
Expression (Passive / Connected): Calculates a value.
External Procedure (Passive / Connected or Unconnected): Calls a procedure in a shared library or in the COM layer of Windows.
Filter (Active / Connected): Filters data.
Input (Passive / Connected): Defines mapplet input rows. Available in the Mapplet Designer.
Joiner (Active / Connected): Joins data from different databases or flat file systems.
Lookup (Passive / Connected or Unconnected): Looks up values.
Normalizer (Active / Connected): Source qualifier for COBOL sources. Can also be used in the pipeline to normalize data from relational or flat file sources.
Output (Passive / Connected): Defines mapplet output rows. Available in the Mapplet Designer.
Rank (Active / Connected): Limits records to a top or bottom range.
Router (Active / Connected): Routes data into multiple transformations based on group conditions.
Sequence Generator (Passive / Connected): Generates primary keys.
Sorter (Active / Connected): Sorts data based on a sort key.
Source Qualifier (Active / Connected): Represents the rows that the PowerCenter Server reads from a relational or flat file source when it runs a session.
Stored Procedure (Passive / Connected or Unconnected): Calls a stored procedure.
Transaction Control (Active / Connected): Defines commit and rollback transactions.
Union (Active / Connected): Merges data from different databases or flat file systems.
Update Strategy (Active / Connected): Determines whether to insert, delete, update, or reject rows.
XML Generator (Active / Connected): Reads data from one or more input ports and outputs XML through a single output port.
XML Parser (Active / Connected): Reads XML from one input port and outputs data to one or more output ports.
XML Source Qualifier (Active / Connected): Represents the rows that the PowerCenter Server reads from an XML source when it runs a session.
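The active/passive distinction from question 27 can be shown with two plain Python functions. This is a conceptual sketch only, not Informatica code; the function names and the row fields (qty, price, total) are made up for the illustration.

```python
def expression_transform(rows):
    """Passive: emits exactly one output row per input row,
    adding a calculated 'total' column."""
    return [{**row, "total": row["qty"] * row["price"]} for row in rows]


def filter_transform(rows, min_total):
    """Active: may change the number of rows, because rows that
    fail the condition are dropped."""
    return [row for row in rows if row["total"] >= min_total]


source_rows = [
    {"qty": 2, "price": 5.0},
    {"qty": 1, "price": 3.0},
    {"qty": 10, "price": 1.5},
]

calculated = expression_transform(source_rows)  # still 3 rows
kept = filter_transform(calculated, 10.0)       # fewer rows may come out

print(len(source_rows), len(calculated), len(kept))
```

The Expression-like step always returns as many rows as it receives, while the Filter-like step can return fewer, which is the property the "Active" label in the list above refers to.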
29. What are the features of the Informatica Repository Server?
1. The Informatica client applications and the Informatica server access the repository database tables through the Repository Server.
2. The Informatica client connects to the Repository Server through the host name / IP address and its port number.
3. The Repository Server can manage multiple repositories on different machines on the network.
4. For each repository database registered with the Repository Server, it configures and manages a Repository Agent process.
5. The Repository Agent is a multi-threaded process that performs the actions needed to retrieve, insert and update metadata in the repository database tables.

30. How many types of Repositories are there?
There are three types of Repositories:
1. Standalone Repository
2. Global Repository
3. Local Repository

31. What are the types of metadata stored in the Repository?
1. Database connections
2. Global objects
3. Mappings
4. Mapplets
5. Multi-dimensional metadata
6. Reusable Transformations
7. Sessions and Batches
8. Shortcuts
9. Source Definitions
10. Target Definitions
11. Transformations

32. What are the types of locks in the Repository?
There are five types of Locks in the Repository:
1. Read Lock
2. Write Lock
3. Execute Lock
4. Fetch Lock
5. Save Lock

33. What are the Repository objects which we can export?
1. Sources
2. Targets
3. Transformations
4. Mapplets
5. Mappings
6. Sessions

34. What is a Work Flow?
A Work Flow is a set of instructions on how to execute tasks such as sessions, emails and shell commands. A Work Flow is created in the Workflow Manager.

35. What are the actions which can be performed by the pmcmd command?
1. Check whether the Informatica server is running.
2. Start and stop sessions and batches.
3. Recover sessions.
4. Stop the Informatica server.
pmcmd returns zero on success and non-zero on failure.

36. What is a commit interval?
A commit interval is the interval at which the Informatica Server commits data to relational targets during a session.

37. What is the use of the Stored Procedure Transformation?
We use the Stored Procedure Transformation for populating and maintaining databases.

38. What is the use of partitioning the sessions?
Partitioning a session increases session performance by reducing the time taken to read the source data and load the data into the target.

39. What are the uses of the Lookup Transformation?
1. Getting a related value from a table using a key column value.
2. Updating a slowly changing dimension table.
3. Checking whether records already exist in the table.

40. What is Polling?
Polling displays updated information about the session in the monitor window. The monitor window displays the status of each session when you poll the Informatica Server.

41. What is a Parameter File?
The Parameter File is used to define the values of the parameters and variables used in a session. It is a text file, for example created in Notepad, and saved with the .prm extension.

42. What is the Metadata Reporter?
It is a web based application that enables you to run reports against the repository metadata. With the Metadata Reporter you can access information about your repository without having knowledge of SQL.

43. What is meant by Lookup Cache?
The Informatica server builds a cache in memory when it processes the first row of data in a cached Lookup transformation.
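The idea of a lookup cache that is built when the first row is processed can be sketched in a few lines of Python. This is only a conceptual model under assumed names (CachedLookup, fetch_all); it does not reflect how the Informatica server implements its caches internally.

```python
class CachedLookup:
    """Builds an in-memory cache of the lookup table on first use,
    then answers every later lookup from that cache."""

    def __init__(self, fetch_all):
        # fetch_all is a hypothetical callable that returns
        # (key, value) pairs read from the lookup table.
        self._fetch_all = fetch_all
        self._cache = None
        self.builds = 0  # counts how often the table was actually read

    def lookup(self, key):
        if self._cache is None:  # first row: build the cache
            self._cache = dict(self._fetch_all())
            self.builds += 1
        return self._cache.get(key)  # None when the key is not found


def read_customer_table():
    # Stand-in for a query against the lookup table.
    return [(1, "Ann"), (2, "Bob")]


customers = CachedLookup(read_customer_table)
names = [customers.lookup(k) for k in (2, 1, 3)]
print(names, customers.builds)
```

The lookup table is read once, no matter how many rows pass through, which is the benefit of caching compared with querying the table for every row.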
44. What tasks does the Load Manager perform?
1. Locks the session and reads the session properties.
2. Reads the parameter file.
3. Expands the server and session variables and parameters.
4. Verifies permissions and privileges.
45. What are the tasks performed by Sequence Generator Transformation? 1. Create keys 2. Replace missing values 3. Cycle through a sequential range of numbers.
46. What is the end value of the Sequence Generator? The end value of the Sequence Generator is 2147483647.
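The three tasks from question 45 and the end value from question 46 can be modelled with a small class. This is a sketch with assumed parameter names (start, increment, end, cycle); it only imitates the behaviour described above and is not the Sequence Generator's real implementation.

```python
MAX_END_VALUE = 2147483647  # end value quoted in question 46


class SequenceGenerator:
    """Hands out sequential numbers, optionally cycling back to the start."""

    def __init__(self, start=1, increment=1, end=MAX_END_VALUE, cycle=False):
        self.start = start
        self.increment = increment
        self.end = end
        self.cycle = cycle
        self._next = start

    def next_value(self):
        if self._next > self.end:
            if not self.cycle:
                raise OverflowError("sequence passed its end value")
            self._next = self.start  # cycle through the range again
        value = self._next
        self._next += self.increment
        return value


# Creating keys for incoming rows.
keys = SequenceGenerator()
surrogate_keys = [keys.next_value() for _ in range(3)]

# Cycling through a short sequential range.
cycler = SequenceGenerator(start=1, end=3, cycle=True)
cycled = [cycler.next_value() for _ in range(5)]
print(surrogate_keys, cycled)
```

Without cycle=True, the sketch raises an error once the end value is passed; that choice is an assumption made for the example.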
47. What are the variables supplied by the Transaction Control Transformation?
1. TC_COMMIT_BEFORE
2. TC_COMMIT_AFTER
3. TC_ROLLBACK_BEFORE
4. TC_ROLLBACK_AFTER
5. TC_CONTINUE_TRANSACTION [Default]

48. How to implement Update Strategy?
To implement the Update Strategy Transformation, the source and target tables should have primary keys so that the records can be compared and the latest changes found.

49. What are the constants of the Update Strategy Transformation?
The constants of the Update Strategy Transformation are:
DD_INSERT = 0
DD_UPDATE = 1
DD_DELETE = 2
DD_REJECT = 3
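The way rows are flagged by comparing source and target primary keys (questions 48 and 49) can be sketched as follows. The constants use the numeric values 0 to 3 listed above; the function flag_row, the "id" key column and the rule of rejecting rows without a key are assumptions for the example, not Informatica behaviour.

```python
DD_INSERT, DD_UPDATE, DD_DELETE, DD_REJECT = 0, 1, 2, 3


def flag_row(source_row, target_by_key, key="id"):
    """Return the flag for one source row, based on whether its
    primary key already exists in the target."""
    if source_row.get(key) is None:
        return DD_REJECT  # assumed rule: no key, no load
    if source_row[key] in target_by_key:
        return DD_UPDATE  # key exists in the target: update it
    return DD_INSERT      # new key: insert it


target = {1: {"id": 1, "name": "Ann"}}
source = [
    {"id": 1, "name": "Anna"},
    {"id": 2, "name": "Bob"},
    {"id": None, "name": "?"},
]

flags = [flag_row(row, target) for row in source]
print(flags)
```

DD_DELETE is not produced here; a real mapping would need an explicit rule for deletions, for example a delete indicator in the source.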
50. What are the benefits of Star Schema Design?
1. Fewer tables
2. Designed for analysis across time

51. What is Data Scrubbing?
Data Scrubbing is the process of cleaning up the junk in legacy data to make it accurate and useful. Simply put, it is making good data out of bad data.

52. What are Bad Rows (Rejected Rows)?
The Informatica Server dumps the bad or rejected rows sent out by a transformation into a text file with the tablename.bad extension.
53. The Normalizer Transformation is mainly used to extract and format the Cobol files.
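COBOL records often hold repeating (multiple-occurring) fields in a single record, and normalizing means returning one row per occurrence. A minimal Python sketch of that idea follows; the function normalize and the field names (store, sales_q1 to sales_q4) are hypothetical and only illustrate the reshaping, not the transformation's actual interface.

```python
def normalize(record, key_field, repeating_fields):
    """Turn one record with repeating columns into one row per occurrence."""
    return [
        {key_field: record[key_field], "occurrence": index, "value": record[field]}
        for index, field in enumerate(repeating_fields, start=1)
    ]


# One denormalized record with four occurrences of a sales figure.
record = {"store": "S1", "sales_q1": 100, "sales_q2": 120, "sales_q3": 90, "sales_q4": 150}

rows = normalize(record, "store", ["sales_q1", "sales_q2", "sales_q3", "sales_q4"])
for row in rows:
    print(row)
```

One input record becomes four output rows, which is also why the Normalizer is classed as an active transformation.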
54. We can apply Distinct clause only in Source Qualifier and Sorter Transformations.
55. What are types of Dimensional Modeling? 1. Conceptual Modeling 2. Physical Modeling 3. Logical Modeling
56. What is Forward Engineering?
Using the Erwin tool, the data modeler converts the .SQL script (logical structure of tables) into physical tables at the database level. This is called Forward Engineering.

57. What is the common use of creating a Factless Fact Table?
The most common use of a Factless Fact Table is to capture date transaction events.

58. What are the different source systems of a Data Warehouse?
1. RDBMS
2. Flat Files
3. XML Files
4. SAP R/3
5. PeopleSoft
6. SAP BW
7. Web Methods
8. Web Services
9. Siebel
10. Cobol Files
11. Legacy Systems

59. You cannot use XML source qualifier in a mapplet and Joiner and Normalizer Transformations.
60. What are the Session Partitions types? 1. Round-robin 2. Hash keys 3. Key range 4. Pass-through 5. Database partitioning
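Three of the partition types listed above differ only in how a row is assigned to a partition, which a short Python sketch can make concrete. The function names, the use of CRC32 as the hash and the boundary convention are assumptions chosen for the example; pass-through and database partitioning are not modelled.

```python
import bisect
import zlib


def round_robin(rows, partitions):
    """Deal rows out evenly, one partition after another."""
    parts = [[] for _ in range(partitions)]
    for index, row in enumerate(rows):
        parts[index % partitions].append(row)
    return parts


def hash_keys(rows, partitions, key):
    """Rows with the same key value always land in the same partition."""
    parts = [[] for _ in range(partitions)]
    for row in rows:
        bucket = zlib.crc32(str(row[key]).encode("utf-8")) % partitions
        parts[bucket].append(row)
    return parts


def key_range(rows, key, boundaries):
    """boundaries is a sorted list of lower bounds for partitions 1..n;
    values below the first boundary go to partition 0."""
    parts = [[] for _ in range(len(boundaries) + 1)]
    for row in rows:
        parts[bisect.bisect_right(boundaries, row[key])].append(row)
    return parts


rows = [{"id": value} for value in (50, 100, 150, 250, 100, 50)]
print([len(p) for p in round_robin(rows, 3)])
print([len(p) for p in key_range(rows, "id", [100, 200])])
```

Round-robin balances row counts regardless of content, while hash and key-range partitioning keep related rows together, which matters for steps such as aggregation.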
61. You can use Incremental Aggregation only when the mapping includes an Aggregator transformation.

62. While importing a source definition, the metadata that will be imported is:
1. Source Name
2. Database Location
3. Column Names
4. Data Types
5. Key Constraints

63. We can stop a Batch in two ways:
1. Server Manager
2. The pmcmd command

64. What is a Batch and what are the types of Batches?
A grouping of sessions is known as a Batch. There are two types of batches:
1. Sequential
2. Concurrent

65. What is a tracing level and what are the types of tracing level?
The tracing level represents the amount of information that the Informatica server writes to a log file. The types of tracing level are:
1. Normal
2. Verbose
3. Verbose Init
4. Verbose Data

66. What is the default join that the Source Qualifier provides?
Inner Join.

67. Types of Slowly Changing Dimensions:
1. Type I (Recent updates)
2. Type II (Full historical information)
3. Type III (Partial historical information)

68. What are the Update Strategy's target table options?
1. Update as Update: Updates each row flagged for update if it exists in the table.
2. Update as Insert: Inserts a new row for each update.
3. Update else Insert: Updates the row if it exists, else inserts it.

69. What is a Data in a database? This includes the source of tables, the meaning of the keys and the relationships between the tables.

70. In Conceptual Modeling and Logical Modeling the tables are called entities.
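The difference between Type I (recent updates only) and Type II (full history) in question 67 can be sketched in Python. The row layout used for Type II (version, start_date, end_date, current flag) is one common convention chosen for the example, and the function names are hypothetical.

```python
from datetime import date


def apply_type1(dimension, key, new_attrs):
    """Type I: overwrite in place, so only the most recent values survive."""
    dimension[key] = dict(new_attrs)


def apply_type2(history, key, new_attrs, effective):
    """Type II: close the current row and append a new version,
    keeping the full history."""
    version = 1
    for row in history:
        if row["key"] == key and row["current"]:
            row["current"] = False
            row["end_date"] = effective
            version = row["version"] + 1
    history.append({
        "key": key,
        **new_attrs,
        "version": version,
        "start_date": effective,
        "end_date": None,
        "current": True,
    })


type1_dim = {}
apply_type1(type1_dim, 7, {"city": "Pune"})
apply_type1(type1_dim, 7, {"city": "Delhi"})

type2_history = []
apply_type2(type2_history, 7, {"city": "Pune"}, date(2020, 1, 1))
apply_type2(type2_history, 7, {"city": "Delhi"}, date(2021, 6, 1))

print(type1_dim)
print(len(type2_history), type2_history[-1]["version"])
```

The version number, current flag and date range kept by the Type II sketch are the kinds of columns that let you tell old rows from the latest one.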
71. What does a Mapping document contain?
The Mapping document contains the following information:
1. Source Definition: from where the data has to be loaded.
2. Target Definition: to where the data has to be loaded.
3. Business Logic: what logic has to be implemented in the staging area.

72. What does the Top Down Approach say?
The Top Down Approach was coined by Bill Inmon. According to this approach, we first implement the Enterprise data warehouse by extracting the data from individual departments, and from the Enterprise data warehouse we then develop subject oriented databases called Data Marts.

73. What does the Bottom Up Approach or Ralph Kimball Approach say?
The Bottom Up Approach was coined by Ralph Kimball. According to this approach, we first develop subject oriented databases called Data Marts and then integrate all the Data Marts to develop the Enterprise data warehouse.

74. Who is the first person in the organization to start the Data Warehouse project?
The first person to start the Data Warehouse project in an organization is the Business Analyst.

75. What is Dimensional Modeling?
Dimensional Modeling is a high level methodology used to implement the star schema structure. It is done by the Data Modeler.

76. What are the types of OLAP?
1. DOLAP: An OLAP tool which works with desktop databases is called DOLAP. Examples: Cognos EP 7 Series, Business Objects, MicroStrategy.
2. ROLAP: An OLAP tool which works with relational databases is called ROLAP. Examples: Business Objects, MicroStrategy, Cognos ReportNet and BRIO.
3. MOLAP: An OLAP tool which is responsible for creating multidimensional structures called cubes is called MOLAP. Example: Cognos ReportNet.
4. HOLAP: An OLAP tool which uses the combined features of ROLAP and MOLAP is called HOLAP. Example: Cognos ReportNet.

77. What is the extension of a Repository backup?
The extension of the Repository backup is .rep

78. Which join is not supported by the Joiner Transformation?
Non-equi joins are not supported by the Joiner Transformation.

79. What is SQL Override?
Applying the joining condition in the Source Qualifier is called SQL override.

80. What is Rank Index?
When you create a Rank Transformation, a Rank Index port is created by default to store the number of ranks specified.

81. What is a Sort Key?
The column on which the sorting takes place in the Sorter Transformation is called the Sort Key column.

82. What is the default group in the Router Transformation?
In the Router Transformation, the rows rejected by the group conditions are captured by the default group, and the data can be passed on to a target table.

83. What is an Unconnected Transformation?
A transformation which is not involved in the mapping data flow is called an Unconnected Transformation.

84. What is a Connected Transformation?
A transformation which is involved in the mapping data flow is called a Connected Transformation. By default all transformations are connected transformations.

85. Which Transformation is responsible for maintaining updates in the warehouse database?
The Update Strategy Transformation.

86. What are the caches contained by the Lookup Transformation?
1. Static Lookup Cache
2. Dynamic Lookup Cache
3. Persistent Lookup Cache
4. Data Cache
5. Index Cache

87. What are the Direct and Indirect methods in flat file extraction?
In the direct method we extract a flat file by using its own metadata. In the indirect method we extract all the flat files by using the metadata of one flat file.

88. What is a Mapplet?
A Mapplet is a type of metadata object which contains a set of reusable transformation logic that can be reused in multiple mappings. A mapplet contains one mapplet Input Transformation and one mapplet Output Transformation.

89. What is the basic difference between a reusable transformation and a mapplet?
Mapplets are sets of reusable transformation logic, whereas a reusable transformation is created from a single transformation's logic.

90. What is the Target Load Plan?
The Target Load Plan is the order in which we should load the targets to implement the Data Acquisition Process.

91. What is Constraint Based Load ordering?
Constraint Based Load ordering specifies the loading of the dimension tables based on the constraints designed in the dimension tables. The Constraint Based Load order is used for implementing snow flake schema data loading.

92. How many loading criteria are there?
There are three types of loading criteria:
1. Parallel loading
2. Sequential loading
3. Control flow loading

93. What is a File Watch Event?
The Event Wait activity of a session has an event called File Watch, which watches whether the file has been copied or not.

94. What is a worklet?
A worklet is a group of sessions. To execute a worklet we have to create a workflow.

95. Why do we use the Stored Procedure transformation?
For populating and maintaining databases.

96. Why do we partition a session in Informatica?
Partitioning improves session performance by reducing the time taken to read the source and load the data into the target.

97. Why do we use the Lookup transformation?
Lookup Transformations can access data from relational tables that are not sources in the mapping. With a Lookup transformation we can accomplish the tasks listed under question 39.

98. Which transformation should we use to normalize COBOL and relational sources?
The Normalizer transformation. When you drag a COBOL source into the Designer workspace, the Normalizer transformation automatically appears, creating input and output ports for every column in the source.

99. Which tool do you use to create and manage sessions and batches and to monitor and stop the Informatica server?
The Informatica Server Manager.

100. What are the types of data that pass between the Informatica server and a stored procedure?
There are three types of data:
1. Input/output parameters
2. Return values
3. Status code

101. What are the groups available in the Router Transformation?
1. User defined group
2. Default group

102. What are the join types in the Joiner Transformation?
1. Normal Join
2. Master Outer Join
3. Detail Outer Join
4. Full Outer Join

103. What are the Designer tools available for the creation of Transformations?
1. Mapping Designer
2. Transformation Developer
3. Mapplet Designer

104. What are the basic needs to join two sources in a Source Qualifier?
The two source tables should have a primary key-foreign key relationship and matching data types.

105. What is a Status code?
The Status code provides an error handling facility during session execution.

106. What is Data Driven?
Data Driven is the instruction fed to the Informatica Server on whether to insert, delete or update rows when using the Update Strategy Transformation.

107. What are the tasks to be done to partition a session?
1. Configure the session to partition the source data.
2. Install Informatica on a machine with multiple CPUs.

108. In which circumstances does Informatica create a reject file (bad file)?
1. When it encounters DD_REJECT in an Update Strategy Transformation.
2. When a row violates database constraints.
3. When a field in the row was truncated or overflowed.
109. In a sequential batch, can you run a session if the previous session fails?
Yes, by setting the option "Always runs the session".

110. In how many ways can you create ports?
Two ways:
1. Drag the port from another transformation.
2. Click the Add button on the Ports tab.

111. How can you stop a batch?
By using the Server Manager or pmcmd.

112. How can you improve session performance in the Aggregator transformation?
Use sorted input.

113. Can you use the mapping parameters or variables created in one mapping in any other reusable transformation?
Yes, because a reusable transformation is not contained within any mapplet or mapping.

114. Can you use the mapping parameters or variables created in one mapping in another mapping?
No. We can use mapping parameters or variables only in the transformations of the same mapping or mapplet in which the mapping parameters or variables were created.

115. Can you start a session inside a batch individually?
We can start the required session individually only in the case of a sequential batch. In the case of a concurrent batch we cannot do this.

116. Can you start batches within a batch?
You cannot. If you want to start a batch that resides in a batch, create a new independent batch and copy the necessary sessions into the new batch.

117. Can you generate reports in Informatica?
Yes. By using the Metadata Reporter we can generate reports in Informatica.

118. Can you copy batches?
No.

119. After dragging the ports of three sources (SQL Server, Oracle, Informix) to a single Source Qualifier, can you map these three ports directly to the target?
No. Unless and until you join those three ports in the Source Qualifier, you cannot map them directly.

120. What are the Target Types on the Server?
The Target Types are File, Relational and ERP.

121. What is the Aggregator transformation?
The Aggregator transformation allows you to perform aggregate calculations, such as averages and sums.

122. What are the Target Options on the Server?
The Target Options for the File target type are FTP File, Loader and MQ. There are no target options for the ERP target type.

123. How do you identify existing rows of data in the target table using the Lookup transformation?
You can identify existing rows of data using an unconnected Lookup transformation.

124. What is a Code Page used for?
A Code Page is used to identify characters that might be in different languages. If you are importing Japanese data into a mapping, you must select the Japanese code page for the source data.

125. What is a Source Qualifier?
It represents all data queried from the source.

126. Where should you place the flat file to import the flat file definition into the Designer?
Place it in a local folder.

127. What are the settings that you use to configure the Joiner transformation?
1. Master and detail source
2. Type of join
3. Condition of the join

128. What are session parameters?
Session parameters are like mapping parameters; they represent values you might want to change between sessions, such as database connections or source files.

129. What are the methods for creating reusable transformations?
There are two methods:
1. Design it in the Transformation Developer.
2. Promote a standard transformation from the Mapping Designer. After you add a transformation to the mapping, you can promote it to the status of reusable transformation.

130. What are the Joiner caches?
When a Joiner transformation occurs in a session, the Informatica Server reads all the records from the master source and builds index and data caches based on the master rows. After building the caches, the Joiner transformation reads records from the detail source and performs the joins.

131. What are the different options used to configure sequential batches?
There are two options:
1. Run the session only if the previous session completes successfully.
2. Always run the session.

132. What are the data movement modes in Informatica?
Data movement modes determine how the Informatica server handles character data. You choose the data movement mode in the Informatica server configuration settings. Two types of data movement modes are available in Informatica:
1. ASCII mode
2. Unicode mode

133. What is the difference between the Stored Procedure transformation and the External Procedure transformation?
In the case of the Stored Procedure transformation, the procedure is compiled and executed in a relational data source. You need a database connection to import the stored procedure into your mapping. In the External Procedure transformation, the procedure or function is executed outside the data source; that is, you need to build it as a DLL to access it in your mapping. No database connection is needed in the case of the External Procedure transformation.

135. To achieve session partitioning, what are the necessary tasks you have to do?
1. Configure the session to partition source data.
2. Install the Informatica server on a machine with multiple CPUs.
136. What is performance tuning in Informatica?
The goal of performance tuning is to optimize session performance so that sessions run during the available load window for the Informatica server.

137. When can a session fail?
A session fails when:
1. The server cannot allocate enough system resources.
2. The session exceeds the maximum number of sessions.
3. The server cannot obtain an execute lock for the session.
4. The server encounters database errors.
5. There are network related errors.

138. In how many ways can you update a relational source definition?
There are two ways to update a relational source definition:
1. Edit the definition.
2. Reimport the definition.

139. In how many ways can you create a Reusable Transformation?
There are two ways to create a Reusable Transformation:
1. By designing it in the Transformation Developer.
2. By promoting an already existing transformation to reusable from its properties.

140. What is the aggregator cache in the Aggregator transformation?
The Aggregator transformation stores the data in the aggregator cache until it completes the aggregate calculations.

141. What are the data cache and index cache?
When you use an Aggregator transformation in your mapping, the Informatica server creates data and index caches in memory to process the transformation.

142. What is a Mapping Variable?
A Mapping Variable represents a value that can change throughout the session. The Informatica server saves the value of the mapping variable in the repository at the end of the session and uses it for the next session run.

143. In which scenarios is the Update Strategy Transformation best suited?
Within a session: When you configure a session, you can instruct the Informatica server either to treat all records in the same way (treat all as insert / treat all as update / treat all as delete) or to use instructions coded into the session mapping to flag records for different database operations.
Within a mapping: Within a mapping, you use the Update Strategy transformation to flag records for insert, update, delete or reject.

144. What are the types of mappings in the Getting Started Wizard?
1. Simple pass through mapping: Loads a static fact or dimension table by inserting all rows. Use this mapping when you want to drop all existing data from the table before loading new data.
2. Slowly growing target: Loads a slowly growing fact or dimension table by inserting new rows. Use this mapping to load new data without disturbing the existing data.

145. How can you recognize whether or not data has been added to the table in a Type II dimension?
1. By version number
2. By flag value
3. By effective date range

146. Why do you use Repository connectivity?
Each time you edit or schedule a session, the Informatica server communicates directly with the repository to check whether or not the session and users are valid.

147. What are the data movement modes in Informatica?
The data movement modes determine how the Informatica server handles character data. There are two types of data movement modes:
1. ASCII mode
2. Unicode mode

148. Can you copy a session to a different folder or Repository?
Yes. By using the Copy Session wizard you can copy a session to a different folder or Repository. But you should first copy the mapping of that session before you copy the session.

149. What is the difference between partitioning of a relational target and partitioning of a file target?
If you partition a session with a relational target, the Informatica server creates multiple connections to the target database to write target data concurrently. If you partition a session with a file target, the Informatica server creates one target file for each partition.

150. What are the Transformations that restrict the partitioning of sessions?
1. Advanced External Procedure Transformation
2. External Procedure Transformation
3. Aggregator Transformation
4. Joiner Transformation
5. Normalizer Transformation
6. XML Targets

151. What is a PowerCenter Repository?
The PowerCenter Repository allows you to share metadata across repositories to create a data mart domain. In a data mart domain, you can create a single global repository to store metadata used across an enterprise and a number of local repositories to share the global metadata as needed.
[Figure: PowerCenter sources and targets. Legacy: mainframes (DB2, VSAM, IMS, IDMS, Adabas), AS/400 (DB2, flat file). Remote sources and remote targets.]
This is sufficient knowledge to start with Informatica, so let's go straight to development.
Repository Service: The Repository Service understands the content of the repository, fetches data from it, and sends that data back to the requesting components (mostly the client tools and the Integration Service).
PowerCenter Client Tools: The PowerCenter Client consists of multiple tools. They are used to manage users, define sources and targets, build mappings and mapplets with the transformation logic, and create workflows to run the mapping logic. The PowerCenter Client connects to the repository through the Repository Service to fetch details, and it connects to the Integration Service to start workflows. In essence, the client tools are used to write code and give instructions to the PowerCenter servers.
PowerCenter Administration Console: This is a web-based administration tool that you can use to administer the PowerCenter installation.
Some further components that are less essential to know are described below:
Web Services Hub: The Web Services Hub exposes PowerCenter functionality to external clients through web services.
SAP BW Service: The SAP BW Service extracts data from and loads data to SAP BW.
Data Analyzer: Data Analyzer is a reporting layer for performing analytics on data warehouse or ODS data.
Metadata Manager: Metadata Manager is a metadata management tool that you can use to browse and analyze metadata from disparate metadata repositories. In readable reports, it shows how the data is acquired, what business rules are applied, and where the data is populated.
PowerCenter Repository Reports: PowerCenter Repository Reports are a set of prepackaged Data Analyzer reports and dashboards that help you analyze and manage PowerCenter metadata.
Informatica Transformations
Normalizer Transformation: Active & Connected. The Normalizer transformation processes multiple-occurring columns or multiple-occurring groups of columns in each source row and returns a row for each instance of the multiple-occurring data. It is used mainly with COBOL sources, where data is usually stored in denormalized format. You can create the following Normalizer transformations:
* VSAM Normalizer transformation. A non-reusable transformation that is a Source Qualifier transformation for a COBOL source. VSAM stands for Virtual Storage Access Method, a file access method for IBM mainframes.
* Pipeline Normalizer transformation. A transformation that processes multiple-occurring data from relational tables or flat files. This is the default when you create a Normalizer transformation.
Components: Transformation, Ports, Properties, Normalizer, Metadata Extensions.
Rank Transformation: Active & Connected. It is used to select the top or bottom rank of data. You can use it to return the largest or smallest numeric value in a port or group, or to return the strings at the top or the bottom of a session sort order. For example, you can select the top 10 regions by sales volume or the 10 lowest-priced products. As an active transformation, it might change the number of rows passed through it. For example, if you pass 100 rows to the Rank transformation but choose to rank only the top 10, only 10 rows pass from the Rank transformation to the next transformation. You can connect ports from only one transformation to the Rank transformation. You can also create local variables and write non-aggregate expressions.
Router Transformation: Active & Connected. It is similar to the Filter transformation because both let you apply a condition to test data. The difference is that the Filter transformation drops the rows that do not meet the condition, whereas the Router transformation can capture those rows and route them to a default output group.
If you need to test the same input data based on multiple conditions, use a Router transformation in a mapping instead of creating multiple Filter transformations to perform the same task. The Router transformation is more efficient.
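The difference between the two transformations can be sketched in a few lines of Python. This is only an illustration, not PowerCenter code: the function names `filter_rows` and `route`, the group names, and the sample rows are all hypothetical. The sketch assumes that a row meeting several group conditions is passed to every matching group, and that a row meeting none of them goes to the default group.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Condition = Callable[[Row], bool]


def filter_rows(rows: List[Row], condition: Condition) -> List[Row]:
    """Filter-style behavior: rows that fail the condition are dropped."""
    return [row for row in rows if condition(row)]


def route(rows: List[Row], groups: Dict[str, Condition]) -> Dict[str, List[Row]]:
    """Router-style behavior: one pass over the input, one output list per group."""
    output: Dict[str, List[Row]] = {name: [] for name in groups}
    output["DEFAULT"] = []
    for row in rows:
        matched = False
        for name, condition in groups.items():
            if condition(row):
                output[name].append(row)
                matched = True
        if not matched:
            # Rows that meet no condition are kept in the default group.
            output["DEFAULT"].append(row)
    return output


# Hypothetical sample data and group conditions.
rows = [
    {"region": "EAST", "sales": 100},
    {"region": "WEST", "sales": 50},
    {"region": "NORTH", "sales": 70},
]
groups = {
    "EAST": lambda row: row["region"] == "EAST",
    "HIGH_SALES": lambda row: row["sales"] >= 70,
}
routed = route(rows, groups)
```

Getting the same result with filters alone would take one `filter_rows` call per condition, each reading the full input again, which is the inefficiency the Router avoids.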
Sequence Generator Transformation: Passive & Connected. It is used to create unique primary key values, to cycle through a sequential range of numbers, or to replace missing primary keys. It has two output ports: NEXTVAL and CURRVAL. You cannot edit or delete these ports, and you cannot add ports to the transformation. The NEXTVAL port generates a sequence of numbers when you connect it to a transformation or target. CURRVAL is NEXTVAL plus the Increment By value (NEXTVAL plus one when Increment By is 1). You can make a Sequence Generator reusable and use it in multiple mappings. You might reuse a Sequence Generator when you perform multiple loads to a single target. For non-reusable Sequence Generator transformations, Number of Cached Values is set to zero by default, and the Integration Service does not cache values during the session. For non-reusable
Sequence Generator transformations, setting Number of Cached Values greater than zero can increase the number of times the Integration Service accesses the repository during the session. It also causes sections of skipped values, since unused cached values are discarded at the end of each session. For reusable Sequence Generator transformations, you can reduce Number of Cached Values to minimize discarded values; however, it must be greater than one. When you reduce Number of Cached Values, you might increase the number of times the Integration Service accesses the repository to cache values during the session.
Sorter Transformation: Active & Connected. It is used to sort data in either ascending or descending order according to a specified sort key. You can also configure the Sorter transformation for case-sensitive sorting and specify whether the output rows should be distinct. When you create a Sorter transformation in a mapping, you specify one or more ports as a sort key and configure each sort key port to sort in ascending or descending order.
Source Qualifier Transformation: Active & Connected. When you add a relational or flat file source definition to a mapping, you need to connect it to a Source Qualifier transformation. The Source Qualifier is used to join data originating from the same source database, filter rows when the Integration Service reads source data, specify an outer join rather than the default inner join, and specify sorted ports. It is also used to select only distinct values from the source and to create a custom query that issues a special SELECT statement for the Integration Service to read source data.
SQL Transformation: Active/Passive & Connected. The SQL transformation processes SQL queries midstream in a pipeline. You can insert, delete, update, and retrieve rows from a database. You can pass the database connection information to the SQL transformation as input data at run time.
The transformation processes external SQL scripts or SQL queries that you create in an SQL editor. The SQL transformation processes the query and returns rows and database errors.
Stored Procedure Transformation: Passive & Connected or Unconnected. It is useful for automating time-consuming tasks, and it is also used for error handling, dropping and recreating indexes, determining the space available in a database, performing specialized calculations, and so on. The stored procedure must exist in the database before you create a Stored Procedure transformation, and it can exist in a source, target, or any database with a valid connection to the Informatica Server. A stored procedure is an executable script with SQL statements, control statements, user-defined variables, and conditional statements.
Transaction Control Transformation: Active & Connected. You can control commit and rollback of transactions based on a set of rows that pass through a Transaction Control transformation. Transaction control can be defined within a mapping or within a session. Components: Transformation, Ports, Properties, Metadata Extensions.
Union Transformation: Active & Connected. The Union transformation is a multiple input group transformation that you use to
merge data from multiple pipelines or pipeline branches into one pipeline branch. It merges data from multiple sources much as the UNION ALL SQL statement combines the results of two or more SELECT statements.
Rules:
1) You can create multiple input groups, but only one output group.
2) All input groups and the output group must have matching ports. The precision, datatype, and scale must be identical across all groups.
3) The Union transformation does not remove duplicate rows. To remove duplicate rows, you must add another transformation downstream, such as a Router or Filter transformation, or a Sorter transformation configured for distinct output.
4) You cannot use a Sequence Generator or Update Strategy transformation upstream from a Union transformation.
5) The Union transformation does not generate transactions.
Components: Transformation tab, Properties tab, Groups tab, Group Ports tab.
Unstructured Data Transformation: Active/Passive & Connected. The Unstructured Data transformation processes unstructured and semi-structured file formats, such as messaging formats, HTML pages, and PDF documents. It also transforms structured formats such as ACORD, HIPAA, HL7, EDI-X12, EDIFACT, AFP, and SWIFT. Components: Transformation, Properties, UDT Settings, UDT Ports, Relational Hierarchy.
Update Strategy Transformation: Active & Connected. It is used to update data in a target table, either to maintain the history of the data or to keep only recent changes. It flags rows for insert, update, delete, or reject within a mapping.
XML Generator Transformation: Active & Connected. It lets you create XML inside a pipeline. The XML Generator transformation accepts data from multiple ports and writes XML through a single output port.
XML Parser Transformation: Active & Connected.
The XML Parser transformation lets you extract XML data from messaging systems, such as TIBCO or MQ Series, and from other sources, such as files or databases. Its functionality is similar to that of the XML source, except that it parses the XML in the pipeline.
XML Source Qualifier Transformation: Active & Connected. The XML Source Qualifier is used only with an XML source definition. It represents the data elements that the Informatica Server reads when it executes a session with XML sources. It has one input or output port for every column in the XML source.
External Procedure Transformation: Passive & Connected or Unconnected. Sometimes the standard transformations, such as the Expression transformation, do not provide the functionality that you want. In such cases, an External Procedure transformation lets you develop complex functions within a dynamic link library (DLL) or UNIX shared library instead of creating the necessary Expression transformations in a mapping.
Advanced External Procedure Transformation: Active & Connected. It operates in conjunction with procedures that are created outside of the Designer interface to extend PowerCenter/PowerMart functionality. It is useful for creating external transformation applications, such as sorting and aggregation, which require all input rows to be processed before emitting any output rows.
Q. Which databases can the PowerCenter Server on UNIX connect to? A. The PowerCenter Server on UNIX can connect to the following databases: IBM DB2, Informix, Oracle, Sybase, Teradata.
Informatica Mapping Designer
Q. How do you execute a PL/SQL script from an Informatica mapping? A. The Stored Procedure (SP) transformation can be used to execute PL/SQL scripts. The PL/SQL procedure name is specified in the SP transformation. Whenever the session is executed, it calls the PL/SQL procedure.
Q. How can you define a transformation? What are the different types of transformations available in Informatica? A. A transformation is a repository object that generates, modifies, or passes data. The Designer provides a set of transformations that perform specific functions. For example, an Aggregator transformation performs calculations on groups of data. The transformations available in Informatica are: Aggregator, Application Source Qualifier, Custom, Expression, External Procedure, Filter, Input, Joiner, Lookup, Normalizer, Output, Rank, Router, Sequence Generator, Sorter, Source Qualifier, Stored Procedure, Transaction Control, Union, Update Strategy, XML Generator, XML Parser, XML Source Qualifier.
Q. What is a Source Qualifier? What is meant by query override? A. The Source Qualifier represents the rows that the PowerCenter Server reads from a relational or flat file source when it runs a session. When a relational or flat file source definition is added to a mapping, it is connected to a Source Qualifier transformation. The PowerCenter Server generates a query for each Source Qualifier transformation whenever it runs the session. The default query is a SELECT statement containing all the source columns. The Source Qualifier can override this default query when you change the default settings of the transformation properties. The list of selected ports and the order in which they appear in the default query should not be changed in the overridden query.
Q. What is the Aggregator transformation? A. The Aggregator transformation performs aggregate calculations, such as averages and sums. Unlike the Expression transformation, which permits calculations on a row-by-row basis only, the Aggregator transformation performs calculations on groups of rows. It contains group by ports that indicate how to group the data. While grouping the data, the Aggregator transformation outputs the last row of each group unless otherwise specified in the transformation properties. The aggregate functions available in Informatica are: AVG, COUNT, FIRST, LAST, MAX, MEDIAN, MIN, PERCENTILE, STDDEV, SUM, VARIANCE.
Q. What is incremental aggregation? A. When a session is created for a mapping that contains an Aggregator transformation, the session option for incremental aggregation can be enabled. When PowerCenter performs incremental aggregation, it passes new source data through the mapping and uses historical cache data to perform the new aggregation calculations incrementally.
Q. How is the Union transformation used? A.
The Union transformation is a multiple input group transformation that can be used to merge data from various sources (or pipelines). It works just like the UNION ALL statement in SQL, which combines the result sets of two SELECT statements.
Q. Can two flat files be joined with the Joiner transformation? A. Yes, the Joiner transformation can be used to join data from two flat file sources.
Q. What is a Lookup transformation? A. This transformation is used to look up data in a flat file or a relational table, view, or synonym. It compares Lookup transformation ports (input ports) to the source column values based on the lookup
condition. The returned values can then be passed to other transformations.
Q. Can a lookup be done on flat files? A. Yes.
Q. What is the difference between a connected lookup and an unconnected lookup? A. A connected lookup takes input values directly from other transformations in the pipeline. An unconnected lookup does not take inputs directly from any other transformation, but it can be used in any transformation (such as an Expression transformation) and invoked as a function using a :LKP expression. An unconnected lookup can therefore be called multiple times in a mapping.
Q. What is a mapplet? A. A mapplet is a reusable object that is created using the Mapplet Designer. It contains a set of transformations and lets you reuse that transformation logic in multiple mappings.
Q. What does reusable transformation mean? A. A reusable transformation can be used in multiple mappings. It is stored as metadata separate from any mapping that uses it. When a reusable transformation is changed, every mapping that uses it inherits the change, and those mappings can be invalidated as a result.
Q. What is update strategy and what are the options for update strategy? A. Informatica processes the source data row by row. By default, every row is marked to be inserted into the target table. If a row has to be updated or inserted based on some logic, the Update Strategy transformation is used. A condition can be specified in the Update Strategy transformation to mark the processed row for update or insert. The following options are available for update strategy:
DD_INSERT: Flags the row for insertion. The equivalent numeric value is 0.
DD_UPDATE: Flags the row for update. The equivalent numeric value is 1.
DD_DELETE: Flags the row for deletion. The equivalent numeric value is 2.
DD_REJECT: Flags the row for rejection. The equivalent numeric value is 3.
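As a rough illustration of how a row-by-row condition maps to these four flags, here is a minimal Python sketch. It is not Informatica expression syntax: the `RowFlag` enum, the `flag_row` function, the `customer_id` column, and the flagging rule itself are assumptions made up for this example. Only the flag names and their numeric values come from the list above.

```python
from enum import IntEnum
from typing import Dict, Optional, Set


class RowFlag(IntEnum):
    """Update strategy constants with the numeric values listed above."""
    DD_INSERT = 0
    DD_UPDATE = 1
    DD_DELETE = 2
    DD_REJECT = 3


def flag_row(row: Dict[str, Optional[int]], existing_keys: Set[int]) -> RowFlag:
    """Hypothetical rule: reject rows without a key, update known keys, insert new ones."""
    key = row.get("customer_id")
    if key is None:
        return RowFlag.DD_REJECT
    if key in existing_keys:
        return RowFlag.DD_UPDATE
    return RowFlag.DD_INSERT


# Hypothetical keys already present in the target table.
existing_keys = {101, 102}
flags = [
    flag_row({"customer_id": 101}, existing_keys),
    flag_row({"customer_id": 205}, existing_keys),
    flag_row({"customer_id": None}, existing_keys),
]
```

Because the flags are plain integers, a condition can return either the constant name or its numeric value, which is why the list gives both.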