Informatica Basic Interview Questions
https://www.naukri.com/code360/library/informatica-interview-questions
1. What is Informatica PowerCenter?
Informatica PowerCenter is a tool used for extraction, transformation, and
loading (ETL). It is used in building enterprise data warehouses. The
components within Informatica PowerCenter help users connect to, fetch,
and process data from various sources, transform it as per business
requirements, and load it into a target data warehouse. Users can, for
instance, connect to an Oracle or SQL Server database, or both, and
integrate the data from the two databases into a third system.
2. What are the components of Informatica?
The components of Informatica include:
PowerCenter Designer: A tool for designing and developing data
integration mappings
PowerCenter Workflow Manager: A tool for scheduling and
executing data integration workflows
PowerCenter Workflow Monitor: A tool for monitoring the
execution of data integration workflows
PowerCenter Repository Manager: A tool for managing the
Informatica repository, which stores metadata about data sources,
targets, and mappings
Informatica Integration Service: A service that executes data
integration mappings
3. What are data types in Informatica?
The data types in Informatica are:
Numeric: Integer, decimal, float, double
Character: String, char
Date and Time: Date, time, timestamp
Boolean: True, false
Binary: Binary data
4. What is Lookup transformation?
It is used to look up data in a relational table, view, or flat file through a
mapping. The lookup definition can be imported from any relational
database to which both the Informatica client and server can connect.
5. What is the meaning of Lookup transformation?
The primary use of the Lookup transformation is to look up a source,
source qualifier, or target to get the relevant data. Both flat files and
relational tables can be looked up. Multiple Lookup transformations can be
used in a mapping, and the lookup source values are compared with the
lookup input port values.
6. Name the different types of ports in the lookup
transformation.
The different types of ports in the lookup transformation are:
Input port
Output port
Lookup port
Return port
7. How many repositories are created in the Informatica workflow
manager?
In Informatica, repositories can be created in the Workflow Manager as
required; there is no fixed limit on their number.
8. What do you mean by a domain?
A domain is a collection of nodes and services that helps improve data
management. These resources and services are administered as a single
unit.
9. Can all mappings in the repository be validated
simultaneously?
All the mappings in the repository cannot be validated simultaneously;
only one mapping can be validated at a time.
10. Explain the Aggregator transformation.
It allows the execution of aggregate calculations such as sums, averages,
and counts on groups of rows. In contrast, the Expression transformation
only allows row-level (non-aggregate) calculations.
11. How are duplicate rows deleted from flat files?
Duplicate rows in flat files can be deleted by comparing each row with the
others and removing any duplicates based on the data in the rows.
Alternatively, we can use the Sorter transformation with the distinct
option to delete duplicate rows.
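The Sorter's distinct option is configured in the Designer rather than
written as code, but a rough SQL equivalent of what it achieves
(CUSTOMER_STG and its columns are assumed names for illustration) is:
-- Keep only one copy of each distinct row
SELECT DISTINCT customer_id, customer_name, email
FROM CUSTOMER_STG;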
12. What is meant by the term Staging Area?
The staging area holds the temporary tables connected to the work area,
or fact tables, that are used to prepare and cleanse data before it is
loaded into the warehouse.
13. List the use cases of Informatica?
Informatica is a powerful tool that can be used to address a wide range of
data integration and management challenges, including data migration,
data warehousing, data integration across databases and file-based
systems, and data cleansing. By using Informatica, organizations can
improve the quality, consistency, and accessibility of their data, which
leads to better business decisions and improved
14. What are mapplets?
• It is a reusable object created in the Mapplet Designer
• It consists of a set of transformations and lets us reuse that
transformation logic in multiple mappings
15. Explain the difference between Informatica and DataStage.
Below is a table that summarizes the key differences between Informatica
and DataStage:
Feature | Informatica | DataStage
Comprehensiveness | More comprehensive | Less comprehensive
User-friendliness | More user-friendly | Less user-friendly
Cost | More expensive | Less expensive
Performance | Good | Excellent
Scalability | Good | Excellent
Flexibility | Good | Excellent
Customization | Good | Excellent
16. What is the role of a repository manager?
A repository manager is an administrative tool used to manage repository
folders, objects, groups, etc.
It allows you to navigate multiple folders and repositories and to manage
groups and user permissions.
17. What are data-driven sessions?
When you set up a session with an update strategy, the data-driven
session property instructs the Informatica server to use the instructions
coded in the mapping to flag the rows for insert, update, delete, or reject.
This is done by specifying DD_INSERT, DD_UPDATE, DD_DELETE, or
DD_REJECT in the Update Strategy transformation.
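For example, a typical Update Strategy expression might look like the
sketch below; EXISTING_CUSTOMER_ID is an assumed port returned by a
lookup, used here to decide whether a row is new or already in the target.
-- Insert rows the lookup did not find; update the ones it did
IIF(ISNULL(EXISTING_CUSTOMER_ID), DD_INSERT, DD_UPDATE)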
18. What is the target load order?
Target load order generally specifies the order in which an integration
service loads target tables. You can select a target load order based on
the source qualifier transformations in a mapping.
In Informatica, you can specify the order in which data is loaded into
targets when multiple source qualifier transformations connect to various
targets.
19. Differentiate between Mapping and Mapplet.
Mapping | Mapplet
Mapping is a collection of sources, targets, and transformations. | Mapplet is a collection of only transformations.
Mapping is developed with different transformations but is not reusable. | Mapplet can be reused with other mappings and mapplets.
It is developed around what data moves to the target and what modifications are performed. | It is developed for complex calculations used in multiple mappings.
20. Explain the difference between active and passive
transformation.
Transformation can be classified into two types:
Active transformation | Passive transformation
The number of rows that pass from the source to the target can change; for example, rows that do not meet the transformation condition are eliminated. | Passive transformations do not change the number of rows, so all rows pass from source to target.
Additionally, it can change the transaction boundary or row type. | Additionally, it maintains the transaction boundary and row type.
21. Explain the code page compatibility.
Data loss cannot occur when data is moved between code pages, as long
as the target code page is a superset of the source code page; the target
page must contain all the characters of the source page.
If all the characters of the source page are not present in the target page,
the target is only a subset, and there will be data loss during
transformation because the two code pages are not compatible.
22. How do pre-session and post-session shell commands
function?
For a session task, a command task can be called as a pre-session or
post-session shell command. The user can run it as a pre-session
command, a post-session success command, or a post-session failure
command. The application of the shell commands can be changed or
modified based on the use case.
23. How many input parameters can exist in an unconnected
lookup?
Any number of input parameters can exist in an unconnected lookup. For
example, you can provide input parameters like column 1, column 2,
column 3, column 4, and so on. However, the return value is always
exactly one.
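As an illustration, an unconnected lookup taking two input parameters
could be called from an output port of an Expression transformation like
this (LKP_GET_PRICE, PRODUCT_ID, and ORDER_DATE are assumed names):
-- Two inputs go in, but only the single return port value comes back
:LKP.LKP_GET_PRICE(PRODUCT_ID, ORDER_DATE)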
24. Describe Expression transformation.
In this form of transformation, values can be calculated in a single row
before writing to the target. It can be used to carry out non-aggregate
calculations. Conditional statements can also be tested before the output
results are sent to the target tables.
25. What is Joiner transformation?
While a source qualifier transformation can combine data originating from
a common source, a joiner transformation combines data from two
related heterogeneous sources residing in different locations.
26. What is the purpose of the Source Qualifier Transformation?
The Source Qualifier Transformation is used to represent the rows that the
Integration Service reads from a source when running a session. It
converts the source data types to Informatica data types and provides
options to filter rows, join data, or sort data at the source level.
27. Explain the difference between Normal and Bulk loading in
Informatica.
Normal loading writes data to the target one row at a time and logs each
row, making it slower but recoverable. Bulk loading writes data in bulk,
which is faster but does not log individual rows, making recovery difficult
in case of failure.
28. What is the difference between a Repository and a Repository
Database?
A Repository is a collection of metadata and rules stored in a Repository
Database, which is a relational database used to store metadata about
mappings, sessions, and other Informatica objects.
29. What are mapplets and how are they different from
mappings?
Mapplets are reusable objects created with multiple transformations,
which can be reused in multiple mappings. Mappings, on the other hand,
are specific ETL workflows created for individual data processing tasks.
https://www.simplilearn.com/top-informatica-interview-questions-and-
answers-article#informatica_interview_questions_for_experienced
1. What are the advantages of Informatica over other ETL tools?
Informatica is the world’s most popular data integration tool. It
interoperates with the widest range of different standards, systems, and
applications; it’s fast; and it is designed to adapt to the constant change
in the field/market, the organization, and the system. Also, you can easily
monitor jobs, and it’s easy to identify the causes of failed jobs.
2. What are the main components of Informatica?
The main components of Informatica are the client tools/applications, the
server, the repository server, and the repository.
3. What can Informatica be used for in an organization?
Informatica can be used for data migration—for example, a company is
transitioning from an older mainframe system to a new database system;
data warehousing—an ETL tool would be needed for moving data from the
production system to the warehouse; data integration—incorporating data
from multiple databases or file-based systems, for example; and cleaning
up data.
4. What is an enterprise data warehouse?
An enterprise data warehouse is a single unified database that holds an
organization’s business information and distributes it throughout the
company. There are variations, but it likely includes a unified approach to
sorting and presenting data, and data can be classified and accessed
according to the subject.
5. Describe an Informatica workflow.
In the workflow manager, you build a workflow by logically connecting
tasks to execute code (for example, scripts). The final workflow will
automatically run all the tasks within it in the specified order.
6. What is the domain?
A domain is composed of interlinked nodes and services covered by one
organizational point of administration.
7. What are some of the types of transformation?
Some transformation types are aggregator, expression, filter, joiner,
lookup, rank, router, and normalizer.
8. What’s the difference between active and passive
transformation?
An active transformation can change the number of rows that pass
through it, can change the transaction boundary and can change the
actual row type. A passive transformation doesn’t change either the
number of rows that pass through it or the row type and doesn’t change
the transaction boundary.
9. Why might router transformation be better than filter
transformation?
With a router transformation, you can test data for multiple conditions in
a single transformation and route rows to several targets, giving better
performance and less complexity than chaining multiple filter
transformations.
10. Why would you want to partition a session?
Partitioning improves the server’s efficiency because the transformations
are carried out in parallel.
11. What’s the difference between a mapping parameter and a
mapping variable?
Mapping variables, as the name implies, are values that change during a
session’s execution. Values that don’t change are called parameters.
12. How would you self-join in an Informatica mapping?
To self-join, you use the same source twice and place at least one
transformation between the source qualifier and the joiner in at least one
branch. You must pre-sort the data and then configure the joiner to
accept sorted input.
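For reference, the result is equivalent to a SQL self-join such as the
hypothetical example below, where an employee table is joined to itself
to pair each employee with their manager:
-- emp, emp_id, emp_name, and manager_id are assumed names
SELECT e.emp_id, e.emp_name, m.emp_name AS manager_name
FROM emp e
JOIN emp m ON e.manager_id = m.emp_id;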
13. What are the different join types within a joiner
transformation?
There are four join types: normal join, master outer join, detail outer join
and full outer join.
14. What are the different dimensions in Informatica?
Three dimensions are available in Informatica: junk, degenerate, and
conformed.
15. What is the difference between a session and a batch?
A session is a set of commands by which the server moves data to the
target. A batch is a set of individual tasks.
16. How many sessions can be grouped in a batch?
There is no limit to the number of sessions that can comprise a batch. But
the fewer the sessions, the easier the migration.
17. Describe the modes of data movement.
In the normal mode of data movement, a separate DML statement is
prepared and executed for each record. In bulk mode, a DML statement is
prepared and executed for multiple records at a time, improving
efficiency.
18. What is the aggregator cache used for?
It stores intermediate values in the local buffer memory and writes
transformation values to cache files on disk if extra memory is required.
19. What is the persistent lookup cache?
This data is stored on the server, saving time because a database query
doesn’t need to happen when a lookup occurs.
20. What are Mapplets?
In the Mapplet Designer, you create mapplets, which are reusable objects
that contain a set of transformations.
21. Describe the differences between a mapplet and a mapping.
Mapplets contain only transformations, can be reused and are developed
for complex calculations. Mappings include source, target, and
transformations; they are not reusable; and are used for less complex
calculations than mapplets, such as for what data to move to a target.
22. How does a pipeline partition improve performance?
A pipeline partition lets you divide a pipeline into different
reader/transformation/writer threads. The integration service can run the
different partitions within the mapping at the same time, increasing
efficiency.
23. What are some other types of partitioning aside from pipeline
partitioning?
Other types of partitioning include database partitioning, round-robin
partitioning, key-range partitioning, pass-through partitioning, hash user-
keys partitioning and hash auto-keys partitioning.
24. Describe the differences between an SQL override and a
lookup override.
When you want to limit the number of rows entering a mapping pipeline,
you’d use an SQL override. When you want to limit the number of lookup
rows to avoid scanning an entire table, you’d use the lookup override.
Lookup override provides only one record even if multiple records for a
condition exist. Also, SQL override doesn’t use the “order by” clause—you
have to manually enter it in the query.
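As a sketch, a Source Qualifier SQL override that limits the rows entering
the pipeline might look like the query below (orders is a hypothetical
table, and the ORDER BY has to be typed in manually because the
override does not add it for you):
SELECT order_id, customer_id, order_total
FROM orders
WHERE order_total > 500
ORDER BY order_id;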
25. What are the configurable commit types?
There are three configurable commit types: target-based, source-based
and user-defined.
26. What is ETL (Extract, transform, Load) and write some ETL
tools?
ETL stands for Extract, Transform, and Load. It is a process used to move
data from one or more sources to a destination database, data warehouse,
or other data repository. Common ETL tools include Informatica
PowerCenter, IBM DataStage, Microsoft SSIS, and Talend.
27. What is Informatica PowerCenter? Write its components.
Informatica PowerCenter is a data integration tool used for extracting,
transforming, and loading data from various sources to a target system.
Its main components are the PowerCenter Designer, Workflow Manager,
Workflow Monitor, Repository Manager, and the Integration Service.
28. Write the difference between connected lookup and
unconnected lookup.
A connected lookup refers to a lookup transformation that is linked to the
pipeline flow, whereas an unconnected lookup is a standalone
transformation that is not linked to the pipeline flow. A connected lookup
retrieves data from the lookup table and passes it on to the next
transformation in the pipeline, while an unconnected lookup can be called
within another transformation to return a value.
29. An unconnected lookup can have how many input
parameters?
An unconnected lookup can have one or more input parameters.
30. Name the output files that are created by the Informatica
server at runtime.
The Informatica server creates output files at runtime, such as the
workflow log, session log, reject (bad) files, and cache files.
31. Can we store previous session logs in Informatica? If yes, how?
Yes, we can store previous session logs in Informatica. This can be done
by configuring the session to log all events and then saving the log files
to a designated location for future reference.
32. Explain data driven sessions.
Data-driven sessions are Informatica sessions in which the server follows
the instructions coded in the mapping’s Update Strategy transformation to
decide whether each row is inserted, updated, deleted, or rejected.
Setting the session property “Treat source rows as” to Data Driven
enables this behaviour.
33. What is the target load order?
Target load order is the order in which the Informatica server inserts data
into the target tables. The target load order can be specified in the session
properties to ensure that data is inserted into the target tables in a
specific sequence.
34. What is the role of a repository manager?
The role of a repository manager in Informatica is to manage the
PowerCenter Repository, including creating and maintaining the
repository, managing user access, and backing up and restoring the
repository.
35. What are the different ways of parallel processing?
Different ways of parallel processing include:
Multi-threading: breaking down a task into multiple smaller tasks
that can be executed simultaneously by different threads.
Multi-processing: executing multiple tasks simultaneously by
dividing the work between various processors.
Distributed processing: breaking down a task into smaller tasks and
distributing them across multiple systems for processing.
36. What is OLAP, and write its type?
OLAP (Online Analytical Processing) is a data analysis technology that
provides multidimensional business data analysis. There are three types of
OLAP:
Relational OLAP (ROLAP)
Multidimensional OLAP (MOLAP)
Hybrid OLAP (HOLAP)
37. What is the scenario in which the Informatica server rejects
files?
The Informatica server may reject files for a few reasons, including:
The file format is not supported
The file size exceeds the limit set by the Informatica administrator
The file contains incorrect or corrupted data
38. What do you mean by surrogate key?
A surrogate key is a unique identifier generated by the system to replace
the natural primary key of a table. The main purpose of a surrogate key is
to provide a stable, unique identifier for a record, even if the natural
primary key changes over time.
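A minimal SQL sketch of the idea, using an assumed customer_dim table:
the generated customer_sk stays stable even if the natural key
customer_code changes in the source system.
CREATE TABLE customer_dim (
  customer_sk   INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY, -- surrogate key
  customer_code VARCHAR(20),   -- natural key from the source system
  customer_name VARCHAR(100)
);
In an Informatica mapping, a Sequence Generator transformation is
typically used to produce these surrogate key values.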
39. Give a few mapping design tips for Informatica.
A few mapping design tips for Informatica include:
Use source-based or incremental loading whenever possible to
reduce the amount of data processed.
Use the appropriate transformation for the task, as some
transformations are more efficient than others.
Use caching where applicable, as this can reduce the number of
database hits and improve performance.
Minimize the number of transformations in a mapping, as each
transformation adds overhead to the processing time.
40. How can we improve the performance of Informatica
Aggregator Transformation?
To improve the performance of Informatica Aggregator Transformation, the
following tips can be used:
Filter data before the Aggregator Transformation to reduce the
amount of data processed.
Use indexing to improve the performance of database lookups.
Use memory or disk-based aggregation, depending on the amount
of data being processed.
Use partitioning to split the data into smaller chunks for processing.
https://www.shiksha.com/online-courses/articles/informatica-scenario-
based-interview-questions-and-answers/
Q1. Differentiate between a database, a data warehouse, and a
data mart?
Ans. A database contains a set of logically related data and is usually
small in size compared to a data warehouse. In contrast, a data
warehouse contains assortments of all sorts of data, from which data is
extracted only according to the customer’s needs. A data mart is also
a set of data, but it is designed to cater to the needs of a particular
domain.
Q2. Explain Informatica PowerCenter.
Ans. This is one of the commonly asked Informatica interview questions.
Informatica PowerCenter is a GUI based ETL (Extract, Transform, Load)
tool. This data integration tool extracts data from different OLTP source
systems, transforms it into a homogeneous format and loads the data
throughout the enterprise at any speed. It is known for its wide range of
applications.
Q3. Explain the difference between Informatica 7.0 and 8.0?
Ans. The main difference between Informatica 8.0 and Informatica 7.0 is
that in the 8.x series, Informatica introduced the PowerExchange
concept.
Q4. How will you filter rows in Informatica?
Ans. In Informatica, rows can be filtered in two ways:
Source Qualifier Transformation: Rows are filtered while reading data
from a relational data source.
Filter Transformation: Rows are filtered within the mapping, for data
coming from any source (a short sketch of both options follows).
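A sketch of both options, using a hypothetical customers source with an
assumed ACTIVE_FLAG column: the Source Qualifier pushes the filter into
the SQL sent to the database, while the Filter transformation evaluates a
condition inside the mapping.
-- Source Qualifier: source filter / SQL override
SELECT * FROM customers WHERE ACTIVE_FLAG = 'Y';
-- Filter transformation: filter condition property
ACTIVE_FLAG = 'Y'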
Q5. What is a Sorter Transformation?
Ans. Sorter transformation is used to sort the data in an ascending or
descending order based on single or multiple keys. It sorts collections of
data by port or ports.
Q6. What is Expression Transformation?
Ans. An expression transformation is a commonly used PowerCenter
mapping transformation. It is a connected, passive transformation that
calculates values on a single row and can also be used to test conditional
statements before passing the data to other transformations.
Q7. What is Joiner Transformation?
Ans. The joiner transformation is an active and connected transformation
that helps to create joins in Informatica. It is used to join two
heterogeneous sources.
Q8. What is a Decode in Informatica?
Ans. A decode in Informatica provides the functionality of a traditional
CASE or IF statement. It is a function used within an Expression
Transformation.
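For instance, a DECODE used inside an Expression Transformation might
look like the sketch below (STATE_CODE and the labels are assumed
values; the last argument acts as the default):
DECODE(STATE_CODE,
       'CA', 'California',
       'NY', 'New York',
       'Unknown')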
Q9. What is a Router Transformation?
Ans. The Router Transformation allows users to split a single pipeline of
data into multiple. It is an active and connected transformation that is
similar to filter transformation.
Q10. What is a Rank Transformation?
Ans. The Rank Transformation is an active and connected transformation
used to sort and rank the top or bottom set of records based on a specific
port. It filters data based on groups and ranks. The rank transformation
has an output port that assigns a rank to the rows.
Q11. What is Filter Transformation?
Ans. Filter transformation is used to filter records based on the filter
condition. It is an active transformation as it changes the number of
records.
Q12. What is a Sequence Generator Transformation?
Ans. Sequence Generator Transformation generates primary key values or
a range of sequence numbers for calculations or processing. It is passive
and connected.
Q13. What is a Master Outer Join?
Ans. A master outer join is a specific join type setting within a joiner
transformation. In a master outer join, all records from the detail source
are returned by the join, and only the matching rows from the master
source are returned.
Q14. What are some examples of Informatica ETL programs?
Ans. Some examples of Informatica ETL programs are:
Mappings
Workflows
Tasks
Q15. What is a dimensional table? What are the different
dimensions?
Ans. This is one of the most important Informatica interview questions. A
Dimension table is a table in a star schema of a data warehouse.
Dimension tables are used to describe dimensions. They contain attributes
that describe fact records in the table.
For example, a product dimension could contain the name of the products,
their description, unit price, weight, and other attributes as applicable.
The different types of dimension tables are:
SCD (Slowly Changing Dimension):
The dimension attributes tend to change slowly with time rather than
changing at regular intervals.
Conformed Dimension:
Conformed dimensions are exactly the same with every possible fact table
to which they are joined. It is used to maintain consistency.
This dimension is shared among multiple subject areas or data marts. The
same can be used in different projects without any modifications.
Junk Dimension:
A junk dimension is a collection of attributes of low cardinality. It contains
different transactional code flags or text attributes unrelated to any other
attribute. A junk dimension is a structure that provides a convenient place
to store the junk attributes.
Degenerated Dimension:
It is derived from the fact table and does not have its own dimension
table. The attributes are stored in the fact table, not as a separate
dimension table.
Role-playing dimension:
Role-playing dimensions are the dimensions used for multiple purposes
within the same database
Q16. What is star schema?
Ans. It is the simplest form of data warehouse schema that consists of
one or more dimensions and fact tables. It is used to develop data
warehouses and dimensional data marts.
Q17. Describe snowflake schema.
Ans. A snowflake schema is a fact table connected to several dimensional
tables such that the entity-relationship diagram resembles a snowflake
shape. It is an extension of a Star Schema and adds additional
dimensions. The dimension tables are normalized, which splits data into
additional tables.
Q18. What is a Mapplet?
Ans. A Mapplet is a reusable object containing a set of transformations
that can be used to create reusable mappings in Informatica.
Q19. What is a natural primary key?
Ans. A natural primary key uniquely identifies each record within a table
and relates records to additional data stored in other tables.
Q20. What is a surrogate key?
Ans. A surrogate key is a sequentially generated unique number attached
with each record in a Dimension table. It is used in substitution for the
natural primary key.
Q21. What is the difference between a repository server and a
powerhouse?
Ans. A repository server controls the complete repository, which includes
tables, charts, various procedures, etc.
A powerhouse server governs the execution of various processes among
the components of the server’s database repository.
Q22. How many repositories can be created in Informatica?
Ans. We can create as many repositories in Informatica as required.
Q23. Describe Data Concatenation.
Ans. Data concatenation is the process of bringing different pieces of the
record together.
Q24. How can one identify whether the mapping is correct
without connecting the session?
Ans. With the help of debugging options.
Q25. Name the designer tools for creating transformations.
Ans. Mapping designer, transformation developer, and mapplet designer
are used for creating transformations.
Q26. Differentiate between sessions and batches?
Ans. A session is a set of commands for the server to move data to the
target, while a batch is a set of tasks that can include one or more tasks.
Q27. What is Enterprise Data Warehousing?
Ans. Enterprise data warehousing is the process of creating a centralized
repository of operational data that can be used as per the reporting and
analytics requirements. It has a single access point, and the data is
provided to end users from a single source store.
Q28. What are the different names of the Data Warehouse
System?
Ans. The Data Warehouse System has the following names –
Analytic Application
Business Intelligence Solution
Data Warehouse
Decision Support System (DSS)
Executive Information System
Management Information System
Q29. Name different available editions of INFORMATICA
PowerCenter.
Ans. Different editions of INFORMATICA PowerCenter are –
Standard Edition
Advanced Edition
Premium Edition
Q30. How to delete duplicate rows from flat files?
Ans. We can use the sorter transformation to delete duplicate rows from
flat files and select the distinct option.
Q31. What is the difference between Joiner and Lookup
transformations?
Ans. The differences between Joiner and Lookup transformations are:
Joiner | Lookup
Joiner is an Active transformation. | It is a Passive transformation.
It is used to join data from different sources. | Lookup transformation is used to get related values from another table. It also helps in checking for updates in the target table.
It is not possible to do SQL query override. | It is possible to override the query by writing a customized SQL query.
Only the ‘=’ operator is used. | All operators, such as =, <, >, <=, >=, are available.
It supports Normal, Master, Detail, and Full Outer join. | By default, it supports left outer join.
https://www.hirist.tech/blog/top-65-informatica-interview-questions-and-
answers/#Informatica_Scenario_Based_Questions
9. How do you optimize performance in Informatica mappings?
Use pushdown optimization to process transformations in the
database.
Minimize the use of lookup transformations and enable caching.
Reduce the number of staging tables to limit I/O operations.
Use partitioning to process large datasets in parallel.
Optimize SQL queries in the Source Qualifier transformation.
10. Explain the concept of pushdown optimization.
Pushdown optimization shifts transformation logic to the database instead
of processing it in Informatica. It improves performance by reducing
network traffic and utilizing database indexing. Full pushdown, partial
pushdown, and source-side pushdown are the three levels used depending
on the scenario.
11. How do you handle error logging in Informatica
workflows?
Enable session logs and set logging levels.
Use reject files to capture error records.
Implement an error-handling framework with reprocessing logic.
Use Debugger in the Designer to troubleshoot mapping errors.
12. Describe the process of implementing Slowly Changing
Dimensions (SCD) in Informatica.
SCD handles historical data changes in a dimension table:
Type 1: Overwrites old data with new data (no history).
Type 2: Adds a new record with an effective date to track changes.
Type 3: Maintains a separate column for historical values.
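As an illustration of Type 2 handling, the SQL below sketches what the
load ultimately does for one changed record; customer_dim, eff_end_date,
eff_start_date, and current_flag are assumed names, and in PowerCenter
this logic would normally be built with lookup and update strategy
transformations.
-- Expire the current version of the changed record
UPDATE customer_dim
SET eff_end_date = CURRENT_DATE, current_flag = 'N'
WHERE customer_code = 'C101' AND current_flag = 'Y';
-- Insert the new version with a fresh effective date
INSERT INTO customer_dim (customer_code, customer_name, eff_start_date, current_flag)
VALUES ('C101', 'New Customer Name', CURRENT_DATE, 'Y');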
13. How do you use Expression Transformation to perform
data manipulation in a mapping?
You might also come across expression transformation in Informatica
interview questions like this one.
Expression Transformation is used to modify data at the row level. For
example, to concatenate first and last names:
FULL_NAME = FIRST_NAME || ' ' || LAST_NAME
It also performs conditional logic using IIF statements, date conversions,
and mathematical operations before passing data to the next
transformation.
https://www.crsinfosolutions.com/informatica-interview-questions-2025/
#Beginner-Level-Informatica
1. What is Informatica, and how is it used in data integration?
Informatica is a powerful data integration tool widely used
for extracting, transforming, and loading (ETL) data from various
sources to a target system. It acts as a bridge between disparate data
sources, helping organizations unify and process their data efficiently. I
often describe Informatica as the backbone of modern data warehousing
because of its ability to handle large-scale data transformations and
ensure data quality. Its user-friendly interface and drag-and-drop
functionality make it accessible even to those who are new to ETL tools.
I use Informatica in my projects to create robust workflows that automate
the process of data movement and transformation. It supports a variety of
data formats, including relational databases, flat files,
XML, and cloud systems. One of the standout features of Informatica is
its error-handling capabilities, which allow me to identify and fix issues
during the ETL process. This makes it an essential tool for building reliable
and scalable data pipelines in complex systems.
2. Explain the difference between a repository server and a
repository database in Informatica.
The repository server and the repository database are two critical
components of Informatica that work together to manage metadata. The
repository server acts as the communication layer between Informatica
clients and the repository database. In my experience, it ensures that all
requests from users, such as retrieving or storing metadata, are processed
correctly. The server also provides version control and multi-user access to
enable collaborative development.
On the other hand, the repository database is where all the metadata is
physically stored. This includes information about mappings, workflows,
sessions, and other configurations I define in Informatica. Think of it as a
library where every book (metadata) is catalogued. The repository server
fetches these books when needed and ensures data consistency.
Understanding this distinction is key when troubleshooting connection
issues or optimizing the Informatica environment.
3. What are the key components of Informatica PowerCenter?
Informatica PowerCenter consists of several key components, each
playing a unique role in the ETL process. The first component is
the Repository Manager, which helps me manage the repository
database. It allows me to organize, back up, and retrieve the metadata
needed for building mappings and workflows. Without this, managing
large-scale projects would be chaotic.
Next is the Designer, a tool I use to create mappings that define how
data flows from source to target. It provides a visual interface where I can
use transformations like aggregator, filter, or lookup to manipulate the
data as required. The Workflow Manager is another critical component,
allowing me to define the execution flow of mappings. I use it to schedule
jobs, assign parameters, and link various tasks for seamless execution.
The final component I often interact with is the Monitor, which helps me
track the execution status of workflows. It gives me insights into
performance and error details, which are invaluable for debugging.
Together, these components make Informatica PowerCenter a
comprehensive solution for handling complex data integration projects.
4. How does Informatica handle ETL (Extract, Transform, Load)
processes?
Informatica handles the ETL process by dividing it into three distinct
stages: Extraction, Transformation, and Loading. During the
extraction phase, it connects to various sources, such as relational
databases, flat files, or cloud storage, and pulls raw data. What I
appreciate most is its ability to connect to multiple heterogeneous data
sources simultaneously. This capability helps me consolidate diverse
datasets into a unified pipeline.
The transformation stage is where Informatica truly shines. I can use a
variety of transformations, like filter, aggregator, and expression, to
clean, standardize, and enrich the extracted data. For example, I might
use an expression transformation to calculate derived fields like profit
margins. Here’s a simple example:
IIF(SALES > 1000, 'HIGH', 'LOW')
This logic categorizes sales data into “HIGH” or “LOW” based on a
threshold, ensuring my data is consistent and meaningful. The final stage,
loading, is where Informatica writes the processed data to the target
system, such as a data warehouse or a flat file. It provides flexibility in
terms of load strategies, supporting both incremental and bulk loading.
What I find especially useful is the error-handling mechanism in all three
stages. If an error occurs during any phase, Informatica generates detailed
logs that make troubleshooting easier. These features make it an
indispensable tool for managing end-to-end data pipelines.
5. What is a mapping in Informatica?
A mapping in Informatica defines the flow of data from source to target,
including any transformations applied in between. Think of it as a
blueprint for the ETL process. When I create a mapping, I start by
connecting the source definition, such as a table or file, to the target
definition, like a database table. Between these, I add transformations to
shape the data according to business requirements.
Mappings are highly flexible and allow me to incorporate complex
business logic. For example, I can include joiner transformations to
combine data from multiple sources or use filter transformations to
exclude unnecessary records. In one of my projects, I used a router
transformation to separate records based on region, ensuring that each
dataset was sent to the appropriate team. Here’s a simple example of a
mapping:
Source: Customer Data (CSV file)
Transformation: Filter Transformation (Excludes records with invalid
email addresses)
Target: Validated Customer Data Table
The mapping ensures data integrity by removing invalid entries before
loading. I also rely on mappings to automate repetitive tasks, saving time
and reducing errors. By leveraging these capabilities, I can build efficient,
reusable ETL pipelines tailored to specific business needs.
6. Define a session in Informatica and its purpose.
A session in Informatica is a task used to execute a mapping. It bridges
the gap between the design phase and execution, allowing me to run the
ETL logic defined in the mapping. I use sessions to configure source and
target connections, set parameters, and define error-handling rules. For
instance, if I’m loading data from a flat file to a database, the session
specifies how the file is read, processed, and written to the database.
In my experience, sessions are highly customizable. I can enable logging
to monitor the ETL process, define session-level transformations, or even
apply filters to fine-tune the data. The session also handles error recovery,
making it possible to restart workflows from the point of failure, ensuring
data consistency.
7. What is a workflow in Informatica, and how is it used?
A workflow in Informatica is a collection of tasks organized in a sequence
to automate the ETL process. I use workflows to manage the execution
flow, ensuring that dependencies between tasks are respected. For
example, I often create workflows that first validate the source data, then
transform it using a session, and finally load it into the target.
Workflows are incredibly flexible. Using the Workflow Manager, I can
add decision tasks to control the flow based on conditions, or event wait
tasks to pause execution until a specific trigger occurs. One of my
workflows used a decision task to rerun a session if the source data size
exceeded a threshold, ensuring efficient handling of large datasets.
8. Explain the concept of a source qualifier transformation in
Informatica.
The source qualifier transformation is automatically generated when I
add a relational source to a mapping. It acts as a link between the source
definition and Informatica, converting the raw source data into a format
Informatica can process. I use the source qualifier to filter rows, join
tables, or define custom SQL queries.
Here’s an example of a custom SQL query I used in a source qualifier:
SELECT customer_id, order_date, order_total FROM orders WHERE
order_total > 500;
This query extracts only high-value orders, reducing unnecessary
processing in the mapping. By filtering data early, I improve the overall
performance of my ETL pipelines.
9. What is the purpose of the aggregator transformation?
The aggregator transformation is used to perform calculations like
sum, average, or count on grouped data. It’s particularly useful when I
need to generate summaries or roll-up data for reporting. For example, in
one project, I used the aggregator to calculate the total sales per region.
Here’s a small example of an aggregator transformation:
Input: Transaction data with columns for region, product, and sales.
Aggregation Logic: Group by region and calculate the sum of sales.
Output: Total sales for each region.
Using the aggregator transformation simplifies complex calculations,
ensuring accuracy and scalability in large datasets.
10. What are active and passive transformations in Informatica?
Informatica classifies transformations as either active or passive based
on their impact on the row count. Active transformations can change
the number of rows passing through them, such as a filter transformation
that excludes unwanted data. In contrast, passive transformations do
not alter the row count, like an expression transformation used for
calculating derived values.
For example, a filter transformation that removes rows where sales <
100 is active because it reduces the dataset. On the other hand,
an expression transformation to add a new calculated column is
passive because the number of rows remains the same.
Understanding these distinctions helps me design efficient mappings
tailored to the needs of each project.
11. How does a lookup transformation work in Informatica?
A lookup transformation retrieves data from a reference table or file to
enrich the source data during processing. I use it frequently for validation
or to fetch additional details. For example, when processing transaction
data, I might use a lookup transformation to retrieve customer names
based on customer IDs.
Here’s an example of a lookup transformation logic:
Source: Transaction data with a customer_id column.
Lookup Table: Customer table with customer_id and customer_name.
Output: Enriched transaction data with customer names included.
By configuring the cache settings, I ensure that large lookups perform
efficiently. This feature makes lookup transformations an essential part of
most mappings.
12. What are the different types of caches used in lookup
transformations?
Lookup transformations use two main types of caches: static
cache and dynamic cache. A static cache is created when the session
starts and remains unchanged throughout. I use this for scenarios where
the lookup data doesn’t change during execution. In contrast, a dynamic
cache updates itself as the session runs. This is particularly useful when
processing data incrementally.
For instance, if I’m building a mapping to deduplicate customer records, I
rely on a dynamic cache. This ensures that any new customer added
during processing is immediately available for subsequent lookups.
Configuring the right cache type improves performance and ensures data
accuracy.
13. Explain the role of the router transformation.
The router transformation splits data into multiple groups based on
conditions I define. Unlike the filter transformation, which allows only one
condition, the router can create multiple output groups, making it more
versatile. I often use it to separate data into categories, like regional sales
or product types.
Here’s an example of a router transformation:
Input: Sales data with columns for region and sales amount.
Groups: North Region (region = 'North'), High Sales (sales > 1000).
Output: Two separate datasets—one for the North region and
another for high-value sales.
Using the router simplifies complex filtering logic, making mappings
cleaner and easier to maintain.
14. What is the difference between connected and unconnected
lookups?
Connected lookups are part of the mapping data flow and return values
directly to the pipeline. I use them when the lookup logic is integral to the
mapping. Unconnected lookups, on the other hand, are called as
functions and return a single value. These are useful for conditional
lookups or when lookup logic needs to be reused.
Here’s an example of an unconnected lookup function:
:LKP.CUSTOMER_NAME (CUSTOMER_ID)
This function retrieves a customer name based on a provided ID. I use
connected lookups for richer datasets and unconnected lookups for
simpler, reusable tasks.
15. Define the term “parameter file” in Informatica.
A parameter file in Informatica stores runtime variables like source or
target paths, database connections, and session parameters. I use it to
make mappings and workflows more dynamic, eliminating the need to
hard-code values. This flexibility makes managing different environments,
like development and production, much easier.
For example, in a parameter file, I might define the database connection
details:
[DEV_CONNECTION]
DBUSER=admin
DBPASS=password
DBNAME=SalesDB
When running a session, Informatica reads these values, ensuring that the
mapping connects to the correct database without altering the workflow
configuration. This feature enhances portability and scalability.
16. How do you handle performance tuning in Informatica
mappings?
In my experience, performance tuning in Informatica mappings involves
multiple strategies to improve the efficiency of data processing. First, I
focus on optimizing the source and target connections by ensuring
that the database queries are efficient and that the connection settings
are appropriate for the volume of data. For example, when querying large
tables, I use pushdown optimization to push transformation logic to the
database, which significantly reduces the amount of data transferred.
Additionally, I carefully analyze transformations. I often choose the most
efficient transformations based on the specific scenario. For instance,
using a filter transformation early in the mapping to eliminate
unnecessary rows reduces the workload for subsequent
transformations. Partitioning and concurrent sessions are also great
ways to improve performance by distributing the processing across
multiple nodes. Lastly, I make sure to monitor session logs to identify and
resolve bottlenecks, such as slow queries or insufficient memory usage.
17. What are the various types of repositories in Informatica?
Informatica uses several types of repositories to store metadata and
session information. The two main types of repositories are
the PowerCenter Repository and the Repository Service.
The PowerCenter Repository stores all the metadata related to
mappings, workflows, sessions, and other design objects. It is the core
repository that holds all the configuration and transformation logic.
The Repository Service manages repository objects and provides a way
for multiple users to work concurrently on a project. This service is
responsible for managing tasks such as version control and metadata
access. Local repositories are smaller, user-specific repositories,
and global repositories are shared repositories used across the entire
organization. I typically use global repositories when working in teams, as
they allow seamless sharing of objects like mappings, sessions, and
workflows.
18. What is the difference between normal load and bulk load in
Informatica?
Normal load and bulk load are two types of data loading methods used
in Informatica. In a normal load, Informatica processes each row
individually, performing all transformations and validations. While this
method offers more flexibility, it can be slower, especially when working
with large datasets. I often use this method when data quality checks,
validations, or complex transformations are necessary.
On the other hand, bulk load is designed for faster data loading. It
bypasses certain transformation steps to directly load large volumes of
data. For example, in a bulk load, transformations like expression or
aggregation might be skipped. I typically choose bulk load when
performance is a priority, and the data does not require complex
processing. However, it’s important to ensure that the target database
can handle the bulk load efficiently.
19. Explain what reusable transformations are in Informatica.
A reusable transformation in Informatica allows me to define a
transformation once and reuse it in multiple mappings. This is especially
useful when the same logic is needed across different projects or
processes, saving me time and effort. For example, if I frequently need to
clean data by removing leading and trailing spaces, I can create a
reusable expression transformation that performs this task.
Reusable transformations can be stored in a shared folder and included
in any mapping that needs the transformation logic. By reusing
transformations, I ensure consistency across projects and reduce the risk
of errors. This also makes maintenance easier because any changes made
to the reusable transformation automatically reflect in all mappings that
use it.
20. What is the difference between joiner and lookup
transformations?
Both joiner and lookup transformations are used to combine data from
different sources, but they serve different purposes and are used in
different scenarios. A joiner transformation is used when I need to join
two sources based on a common key, like a SQL join. It supports different
types of joins, such as inner, left, right, and full outer joins. I typically use
a joiner when the sources are not related in the source qualifier but need
to be merged during processing.
In contrast, a lookup transformation is used to retrieve a single value
from a reference table or file based on a lookup condition. I often use
lookups to enrich source data with additional attributes or validate records
against a reference dataset. The primary difference is that a joiner works
with multiple rows of data, while a lookup typically returns only a single
matching value for each record.
21. What are session parameters, and how are they used?
Session parameters in Informatica are variables defined at the session
level, allowing me to pass values into the session during execution. These
parameters provide flexibility in managing mappings, as I can use them to
change configurations like source and target paths, database connections,
or transformation logic without altering the mapping itself. For example, I
might use a session parameter to specify the file location for my input
data.
Here’s an example of defining a session parameter:
$Source_File = /data/source_file.csv
$Target_Table = target_table_name
In the session, I can reference $Source_File and $Target_Table to make
the mapping dynamic and easier to maintain. Using session parameters is
crucial for handling different environments (development, testing,
production) without hardcoding values into the mappings.
22. Define the term “target load order” in Informatica.
The target load order refers to the sequence in which data is loaded
into multiple target tables in a mapping. I can define the target load order
when there are multiple targets to ensure that data is loaded in the
correct order. For example, if I’m loading data into both a staging table
and a final table, I might want to load the staging table first to ensure the
data is validated and processed before being moved to the final
destination.
By defining the target load order, I avoid data integrity issues such as
loading dependent tables in the wrong sequence. In Informatica, I can
specify the target load order at the session level using the Target Load
Order property, allowing me to manage the flow of data more effectively.
23. What is the purpose of the rank transformation?
The rank transformation is used to assign a rank to each record in a
dataset based on a specified sort order. This transformation is useful when
I need to identify top N records or the lowest N records. For example, in a
sales report, I can use the rank transformation to retrieve the top 10
highest-selling products.
Here’s a small example of using the rank transformation:
Input: A list of products with sales data.
Rank Logic: Rank products based on sales in descending order.
Output: Top 10 highest-selling products.
The rank transformation makes it easy to generate ranked datasets, which
is useful for reporting and analytical purposes.
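A SQL analogue of that top-10 output (products and sales_amount are
hypothetical names) looks roughly like this:
-- Equivalent of a Rank transformation configured as Top with 10 ranks
SELECT product_name, sales_amount,
       RANK() OVER (ORDER BY sales_amount DESC) AS sales_rank
FROM products
ORDER BY sales_rank
FETCH FIRST 10 ROWS ONLY;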
24. Explain the concept of incremental loading in Informatica.
Incremental loading refers to the process of loading only the data that
has changed since the last load, rather than reloading the entire dataset.
This approach is often used to improve performance and reduce the load
on both source and target systems. In my experience, I often implement
incremental loading by using a timestamp or high-water mark to track the
last processed record.
Here’s an example of how incremental loading works:
Initial Load: Load all records from the source to the target.
Subsequent Loads: Load only records that have a modified
timestamp greater than the last load date.
Incremental loading minimizes the volume of data processed and ensures
that the ETL process remains efficient over time, particularly in large
datasets.
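A minimal SQL sketch of the subsequent-load extract is shown below;
orders and last_modified are assumed names, and in practice the literal
date would come from a stored high-water mark such as a mapping
variable.
-- Pull only the rows changed since the previous successful load
SELECT order_id, customer_id, order_total, last_modified
FROM orders
WHERE last_modified > TIMESTAMP '2024-11-01 00:00:00';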
25. What is the difference between static and dynamic cache?
The difference between static and dynamic cache in lookup
transformations lies in how the cache is updated during the session.
A static cache is built once at the beginning of the session and remains
unchanged throughout. I typically use static cache when the lookup data
does not change during processing. This is more efficient when the
reference data is small and doesn’t require frequent updates.
On the other hand, a dynamic cache is updated as records are
processed. This is useful in situations where new records may need to be
added to the cache during the session. For example, if I’m processing a
list of customer orders and need to add new customers to the cache
dynamically, I use a dynamic cache. It ensures that new data is available
for lookups without having to restart the session.
26. How does Informatica handle error logging?
Informatica handles error logging using session logs, workflow logs, and
error tables. For every ETL session, Informatica generates a session
log that contains information about errors such as data type mismatches,
constraint violations, or transformation failures. I find session logs helpful
for pinpointing the exact stage where an error occurs. Additionally,
rejected data can be directed to error tables for further analysis.
For example, if I configure a session to write rejected rows into an error
file, it will capture information like this:
Error Code: 36401
Error Message: Data truncation for column 'Product_Name'
Rejected Row: ID:101, Product_Name:
'UltraLongProductNameExceedingLimit', Quantity: 5
This setup allows me to systematically review and address data issues
while minimizing manual intervention during debugging.
27. What is the use of the expression transformation in
Informatica?
The expression transformation in Informatica is used for performing
row-level operations such as calculations, string manipulations, and
conditional logic. I often use this transformation for tasks like normalizing
data formats or computing new fields.
For example, suppose I want to create a full name by concatenating first
and last names:
Full_Name = First_Name || ' ' || Last_Name
I define this formula in the expression transformation properties. Here’s a
step-by-step snippet for applying a discount calculation:
Discounted_Price = IIF(Total_Price > 100, Total_Price * 0.90, Total_Price)
This logic applies a 10% discount if the price exceeds $100. Such
calculations help me manipulate data directly in Informatica without
additional SQL queries.
28. Explain what metadata is in Informatica.
In Informatica, metadata describes the structure, configuration, and
properties of ETL objects such as mappings, transformations, and
sessions. The Informatica repository stores metadata, which helps the ETL
engine understand and execute workflows.
For instance, metadata for a table transformation might look like this:
Source Table: Orders
Columns: Order_ID, Customer_Name, Order_Date, Total_Amount
Target Table: Processed_Orders
Mapping: Total_Amount -> Total_Sales
Transformation: Order_Date -> To_Date(Order_Date, 'YYYY-MM-DD')
This metadata ensures that the ETL process knows how to read,
transform, and write data. In my experience, accurate metadata
documentation is key to debugging and enhancing workflows.
29. What is a workflow monitor in Informatica, and why is it
important?
The workflow monitor in Informatica is a real-time monitoring tool that
shows the status of workflows and sessions. I use it to check whether
tasks succeed, fail, or remain in progress. It also allows me to
analyze session logs directly for quick troubleshooting.
For example, if a session fails, I can retrieve detailed error information like
this:
Session Status: FAILED
Start Time: 2024-11-25 10:00 AM
Error: SQL Transformation - ORA-00942: Table or view does not exist
The monitor also provides options to restart failed workflows or recover
from a specific checkpoint. This helps maintain workflow continuity
without rerunning the entire process, saving both time and resources.
30. How do you define and use variables in Informatica mappings?
I use mapping variables in Informatica to store values that can
dynamically change during the execution of ETL processes. These
variables help with tasks like incremental data loading or handling
conditional logic. I define variables in the Mapping Designer under the
variable tab and reference them in expressions or transformations.
For instance, if I need to filter records based on a timestamp, I define a
variable $$Last_Load_Date and use it in the filter transformation:
Filter Condition: Transaction_Date > $$Last_Load_Date
Here’s how I update the variable during each session:
Post-Session Variable Assignment: $$Last_Load_Date =
MAX(Transaction_Date)
This ensures only new data is processed in subsequent runs, making the
process efficient and adaptable to changing data scenarios.
ACID Properties in DBMS
In this article, we will learn about the ACID (Atomicity, Consistency,
Isolation, Durability) properties in DBMS.
A transaction is a single logical unit of work that accesses and modifies
the contents of the database through read and write operations.
To maintain the consistency of the database before and after a
transaction, specific properties called the ACID properties are followed.
Atomicity (A)
An atomic transaction simply means that the transaction happens
only if it can be completed and achieve its purpose; if not, it
doesn’t happen at all.
Atomicity means that no transaction occurs partially; hence atomicity is
also known as the “all or nothing rule”.
It is associated with two operations
Abort: If a transaction is aborted, i.e. it is incomplete, the changes made
to the database are not visible.
Commit: If a transaction is committed, i.e. it is complete, the changes
made to the database are visible.
Example
Consider a transaction T which consists of T1 and T2: Task is to transfer
100 from account X to account Y
T1: Deduct the amount from account X
T2: Credit the amount to account Y
If the transaction fails after the completion of T1 and before completion of
T2 then the amount will be deducted from account X but it will not be
added to the account Y which ultimately results in an inconsistent
database state.
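A hedged SQL sketch of the same transfer, with accounts as an assumed
table: either both updates are committed together or, if anything fails
before the commit, the whole transaction is rolled back.
START TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE account_id = 'X';  -- T1: deduct from X
UPDATE accounts SET balance = balance + 100 WHERE account_id = 'Y';  -- T2: credit Y
COMMIT;  -- both changes become visible together, or neither does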
Consistency(C)
Integrity constraints (rules that keep the data in the database valid)
must be maintained so that the database is consistent before and
after the transaction.
Consistency refers to the correctness of the database.
Example
The total amount before and after the transaction must be maintained.
Total amount before T occurs = 500 + 200 = 700.
Total amount after T occurs = 400 + 300 = 700.
The database is said to be inconsistent if T1 completes but T2 fails,
because the total would then be 400 + 200 = 600 and transaction T is
left incomplete.
Isolation (I)
Isolation ensures that multiple transactions can occur at the same
time, provided each transaction is independent and does not
interfere with another transaction.
Changes occurring in one particular transaction are not visible to any
other transaction until that change is committed or written to
memory.
Durability (D)
Durability ensures that once the execution of a transaction is
complete, the updates and modifications to the database are written
to disk and persist even if a system failure occurs.
All the changes become permanent and are stored in non-volatile
memory, so they can be referred to later and are never lost.
What is the purpose of these ACID properties?
Provides a mechanism for the correctness and consistency of a
database system.
As a result, each transaction is independent and consistent with the
others, and all actions are stored properly and permanently, which
supports failure recovery.