Source Qualifier transformation
Purpose
The Source Qualifier (SQ) transformation selects records from various sources, such as flat files, relational tables or Informatica PowerExchange services. It provides an efficient way to filter input columns (or fields) and to perform joins.
Source Qualifier converts source datatypes to native (most compatible) PowerCenter datatypes. It is a mandatory component for relational table and flat file sources. Source Qualifier is an Active transformation (which means that the number of output rows may differ from the number of input rows). It generates a SQL query for relational sources and can perform joins between relational tables on the same database.
Converting datatypes
Data can be converted from one type to another by passing it between ports with different datatypes (port-to-port conversion) or by using transformation functions and arithmetic operators.
More detailed information on converting datatypes can be found in the Informatica Client Help, in the 'Port-to-Port Data Conversion' topic.
Example
A sample use of Source Qualifier for a Flat File source (top) and ODBC database source (bottom). Note the datatype change for some columns.
Useful tips
Expression transformation
Purpose
Expression is a passive transformation (the number of input and output rows is the same) which lets you modify individual ports of a single row, or add or suppress ports. It helps implement complicated data transformations, apply business logic and perform checks and validations.
Data can be modified using logical and numeric operators or built-in functions. Typical transformations handled by the Expression transformer include string manipulation, datatype conversion and derivation of conditional flags (see the sample expressions below).
Examples / useful tips
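A few illustrative expressions (the port names FIRST_NAME, LAST_NAME, BIRTH_DATE and AMOUNT are assumptions used only for this sketch, not taken from any particular source definition):
  FULL_NAME = FIRST_NAME || ' ' || LAST_NAME  -- string concatenation with the || operator
  AGE_YEARS = TRUNC(DATE_DIFF(SYSDATE, BIRTH_DATE, 'YY'))  -- date arithmetic with built-in functions
  HIGH_VALUE_FLAG = IIF(AMOUNT > 1000, 'Y', 'N')  -- conditional flag using IIF
Each expression is entered in the corresponding output port of the Expression transformation.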
Looking up data in Informatica
Purpose and overview
The Lookup transformation returns values from a database table or a flat file associated with a given input value.
In SQL terminology a lookup may be considered a sub-query.
There are the following Lookup transformation types:
- Connected lookup - part of the mapping pipeline; it receives input directly from other transformations and returns the looked-up values downstream for every row.
- Dynamic Lookup (connected) - when the lookup table is also the target, the data may go out of sync with the target table loaded in memory. The Dynamic Lookup transformation allows for the synchronization of the target lookup in-memory table with its physical table in a database.
Dynamic Cache properties worth looking at: Dynamic Lookup Cache (boolean, use only with the lookup cache enabled), 'Insert Else Update' or 'Update Else Insert'.
- Unconnected lookup - used when a lookup is not needed for each record; the lookup data is fetched at the point the mapping requires it. It doesn't contain links from and to other transformations. Data lookup is performed only for those rows which require it, which can significantly improve the performance of the ETL process.
The unconnected lookup function can be invoked within any transformation that supports expressions.
Example - dynamic lookup
Let's consider a daily updated master table of products. Within a day a product may change its status or an error may be corrected. A new product record may be added first and a change to that record may follow later. The change (an insert followed by an update) needs to be detected automatically.
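As a sketch of how this can be handled (the routing below is an assumption about the mapping, while the flag values are the standard ones for a dynamic lookup cache): the dynamic Lookup exposes a NewLookupRow output port, where 0 means the cache was unchanged, 1 means the row was inserted into the cache and 2 means the row was updated. That port can then drive an Update Strategy expression such as:
  IIF(NewLookupRow = 1, DD_INSERT, IIF(NewLookupRow = 2, DD_UPDATE, DD_REJECT))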
Example - unconnected lookup
An unconnected lookup is usually invoked within a conditional statement in an Expression transformation. For instance, to look up a product color only for rows which don't have it filled, use the following:
IIF(ISNULL(color), :LKP.COLORLOOKUP(product_id), color)
Lookup cache
Basically there are two types of cache memory: the index cache and the data cache.
The index cache contains all port values from the lookup table where the port is specified in the lookup condition.
The data cache contains all port values from the lookup table that are not in the lookup condition and that are specified as "output" ports.
After the cache is loaded, values from the Lookup input port(s) that are part of the lookup condition are compared to the index cache.
Upon a match the rows from the cache are included in the stream.
Key cache-related properties in the Lookup transformation:
Useful tips / best practices
Sorter transformation
Purpose
Sorter is an active transformation which sorts the incoming data based on one or more key values (in ascending, descending or mixed order). The Sorter transformation attribute 'Distinct' provides a facility to remove duplicates from the input rows.
Key properties worth looking at:
Aggregator transformation
Purpose
The Aggregator transformation calculates aggregates such as sums, minimum or maximum values across multiple groups of rows.
The Aggregator transformation can apply expressions to its ports; however, those expressions are applied to a group of rows, unlike the Expression transformation which applies calculations on a row-by-row basis only.
Aggregate functions are created in output ports only. Function grouping requirements are set using the Aggregator Group By port.
Available aggregate functions: AVG, COUNT, FIRST, LAST, MAX, MEDIAN, MIN, PERCENTILE, STDDEV, SUM
Key properties:
Examples / useful tips
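A small illustrative sketch (the ports AMOUNT, ORDER_ID and CUSTOMER_ID are assumed names, not from the original text): with CUSTOMER_ID marked as the Group By port, output ports of the Aggregator could hold expressions such as:
  TOTAL_AMOUNT = SUM(AMOUNT)
  ORDER_COUNT = COUNT(ORDER_ID)
  LARGE_ORDERS_AMOUNT = SUM(AMOUNT, AMOUNT > 1000)  -- aggregate functions accept an optional filter condition
Using sorted input (data pre-sorted on CUSTOMER_ID, for example by a Sorter transformation) lets the Aggregator release each group as soon as it is complete instead of caching all groups.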
Update Strategy transformation
Purpose
Specifies how each row updates target tables (insert, update, delete, or reject) based on an expression.
The Update Strategy transformation "tags" a row with the appropriate DML (data manipulation language) operation for the PowerCenter writer to apply to a relational target. Each row can be "tagged" with one of the following flags (the DD label stands for Data Driven), and the appropriate SQL is submitted to the target database:
- DD_INSERT (numeric value 0) - the row is flagged for insert
- DD_UPDATE (1) - the row is flagged for update
- DD_DELETE (2) - the row is flagged for delete
- DD_REJECT (3) - the row is rejected and not written to the target
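A minimal sketch of an Update Strategy expression (the EXISTING_KEY port, for example returned by a lookup on the target, is an assumed name):
  IIF(ISNULL(EXISTING_KEY), DD_INSERT, DD_UPDATE)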
Transaction Control
Purpose
The Transaction Control transformation is active and connected; it lets you control commit and rollback of transactions based on the data passing through it. A commit or rollback can be defined based on a varying number of input rows, for example a common product group key (to ensure that the entire group gets loaded) or an invoice date (to avoid loading an incomplete set of invoices for a given date into a data warehouse).
Use
A transaction is the row or set of rows bound by commit or roll back rows. The number of rows may vary for each transaction.
Transaction control can be defined at two levels:
- Within a mapping - a transaction is defined using an expression in a Transaction Control transformation. Based on the return value of the expression, the decision can be made whether to commit, roll back, or continue without any transaction changes.
- Within a session - when a session is configured for user-defined commit, the administrator can choose to commit or roll back a transaction if the Integration Service fails to transform or write any row to the target.
The Transaction Control transformation can be used to define conditions to commit and roll back transactions from transactional targets, such as: relational databases, XML, and dynamic MQSeries targets.
The following built-in variables can be used in the Expression Editor to create a transaction control expression:
- TC_CONTINUE_TRANSACTION (the default) - no transaction change for the current row
- TC_COMMIT_BEFORE - commit the current transaction and begin a new one; the current row goes into the new transaction
- TC_COMMIT_AFTER - write the current row to the target, commit the transaction, then begin a new one
- TC_ROLLBACK_BEFORE - roll back the current transaction and begin a new one; the current row goes into the new transaction
- TC_ROLLBACK_AFTER - write the current row to the target, roll back the transaction, then begin a new one
Examples / useful tips
To catch a date change in the invoices flow, the NEW_DATE indicator is calculated within an Expression transformation with this expression: IIF(INV_DATE_CURRENT = INV_DATE_PREV, 0, 1).
To commit all invoices for a given day, the following transaction control expression might be used: IIF(NEW_DATE = 1, TC_COMMIT_BEFORE, TC_CONTINUE_TRANSACTION)
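A minimal sketch of how INV_DATE_PREV can be carried from row to row (the variable port names below are assumptions): in the Expression transformation the ports can be arranged so that the comparison happens before the previous date is overwritten:
  v_NEW_DATE (variable port) = IIF(INV_DATE_CURRENT = v_INV_DATE_PREV, 0, 1)
  v_INV_DATE_PREV (variable port) = INV_DATE_CURRENT
  NEW_DATE (output port) = v_NEW_DATE
Variable ports are evaluated top to bottom and keep their value from the previous row, so v_INV_DATE_PREV still holds the previous invoice date when v_NEW_DATE is calculated.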
Java transformation
Purpose
The Java transformation in Informatica PowerCenter uses the Java programming language to transform data.
It is not necessary to write an entire Java class or use an external Java development environment, as the Java code can be entered as snippets in the PowerCenter Designer client.
The code in a Java transformation can invoke Informatica custom expressions, user-defined functions, unconnected transformations and mapping variables. Java methods, variables, third-party APIs, built-in Java packages and static code can be used as well.
A Java transformation can be reusable and can be defined as either an active or a passive Informatica object.
The Java transformation has four self-explanatory tabs: Transformation (general options), Ports (inputs and outputs in separate groups), Properties (active/passive, deterministic) and Java Code. Once the ports and properties are set, the Java code can be entered and compiled from within the Designer window. The code window is divided into the following tabs:
- Import Packages - import 3rd party java packages, built-in or custom Java packages
- Helper code - declare user-defined variables and methods for the Java transformation class.
- On Input Row - the Java code is executed once for each input row. The input row can be accessed only on this tab.
- On End of Data - defines the behavior after processing all the input data
- On Receiving transaction - code which is executed when a transaction is received by the transformation
- Java expressions - used for defining and calling Java expressions
The code snippets are then compiled into byte code behind the scenes, and the Integration Service starts a JVM that executes it to process the data.
Examples / useful tips
Sample Java transformation window:
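As a hedged sketch (the port names IN_AMOUNT and OUT_AMOUNT_USD and the business rule are assumptions, not from the original example), the On Input Row tab of an active Java transformation might contain something like this; the surrounding class, imports and port declarations are generated by PowerCenter:
  // On Input Row: executed once per input row; ports defined on the Ports tab are visible as Java variables
  if (isNull("IN_AMOUNT")) {
      setNull("OUT_AMOUNT_USD");        // pass NULL through instead of computing
  } else {
      OUT_AMOUNT_USD = IN_AMOUNT * 1.1; // assumed sample rule: add a 10 percent margin
  }
  generateRow();                        // required in an active Java transformation to emit the output row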
Informatica ETL process implementation
Parameters and variables in Informatica PowerCenter
Purpose
A mapping can utilize parameters and variables to store information during the execution. Each parameter and variable is defined with a specific data type and their main purpose is to provide increased development flexibility.
Parameters are different from variables in that a parameter holds a constant value for the whole session run (it is set before the session starts, typically in a parameter file), while a variable can change its value during the session and its final value can be saved to the repository when the session completes successfully.
Both parameters and variables can be accessed from any component in the mapping which supports it.
To create a parameter or variable, go to Mapping -> Parameters and Variables from within the Mapping Designer in the Designer client.
The format is $$VariableName or $$ParameterName
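For example (the variable name $$LastLoadDate and the column LAST_UPDATE_DT are assumptions used only for illustration), a mapping parameter or variable can be referenced in a Source Qualifier source filter to implement incremental extraction:
  LAST_UPDATE_DT > TO_DATE('$$LastLoadDate', 'YYYY-MM-DD HH24:MI:SS')
The value is substituted as text before the query is sent to the database.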
Changing values of Variables
To change the value of a variable, one of the following functions can be used within an expression: SETMAXVARIABLE($$Variable, value), SETMINVARIABLE($$Variable, value), SETVARIABLE($$Variable, value), SETCOUNTVARIABLE($$Variable), where $$Variable is the name of the mapping variable and value is an expression whose result is used to set it (SETMAXVARIABLE keeps the higher of the current and the new value, SETMINVARIABLE the lower, SETVARIABLE assigns the value directly, and SETCOUNTVARIABLE increments the counter for each inserted row and decrements it for each deleted row).
At the end of a successful session, the values of variables are saved to the repository. The SetVariable function writes the final value of a variable to the repository based on the Aggregation Type selected when the variable was defined.
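A minimal sketch (reusing the assumed $$LastLoadDate variable and LAST_UPDATE_DT port from above): an output port of an Expression transformation can push the variable forward with every processed row:
  SETMAXVARIABLE($$LastLoadDate, LAST_UPDATE_DT)
After a successful session the repository then holds the newest timestamp processed, ready for the next incremental run.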
Parameter files
A parameter file is an ASCII file which is used to set values of mapping parameters and variables. Parameters can be set on the workflow, worklet or session level. The physical location of a parameter file is set in Workflow Manager in Workflows -> Edit. It can also be specified using the pmcmd command when starting a session task.
Parameter file structure
Parameters can be grouped into the following sections:
- [Global]
- [Service: service name]
- [folder name.WF:workflow name]
- [folder name.WF:workflow name.WT:worklet name]
- [folder name.WF:workflow name.WT:worklet name.WT:worklet name...]
- [folder name.WF:workflow name.ST:session name]
- [folder name.session name]
- [session name]
Examples / useful tips
- The value is initialized by the specification that defines it; however, it can be set to a different value in a parameter file specified for the session task
- Initialization priority of Parameters: Parameter file, Declared initial value, Default value
- Initialization priority of Variables: Parameter file, Repository value, Declared initial value, Default value
- Parameters and variables can only be utilized inside of the object that they are created in.
- Parameters and variables can be used in pre and post-SQL
- Sample parameter file:
[Service:IntegrationSvc_01]
$$SuccessEmail=dwhadmin@etl-tools.info
$$FailureEmail=helpdesk@etl-tools.info
[DWH_PROJECT.WF:wkf_daily_loading]
$$platform=hpux
$$DBC_ORA=oracle_dwh
[DWH_PROJECT.WF:wkf_daily_loading.ST:s_src_sa_sapbw]
$$DBC_SAP=sapbw.etl-tools.info
$$DBC_ORA=oracle_sap_staging
Workflow tasks
Purpose
Tasks are designed in Workflow Manager and represent an executable set of actions, functions or commands which define the runtime behavior of the whole ETL process.
Some of the most often used tasks are Session (runs a mapping), Command (runs shell or DOS commands), Email, Decision, Timer, Event-Wait and Event-Raise.
UNIX and DOS commands can also be run pre- or post-session in a Session task.
Examples / useful tips
Workflow
Purpose
A Workflow, designed in Workflow Manager, is a collection of tasks that describe runtime ETL processes. In IBM InfoSphere DataStage terminology a Workflow corresponds to a Job Sequence, in Ab Initio to a Flow, and in Pentaho Data Integration to a Job.
Examples / useful tips
Design and development best practices
Mapping design tips
- Standards - sticking to consistent standards is beneficial in the long run. This includes naming conventions, descriptions, environment settings, parameter files, documentation, etc.
- Reusability - in order to be able to react quickly to potential changes, use where possible such Informatica components as mapplets, worklets, reusable transformations.
- Scalability - when designing and developing mappings, it is a good practice to keep volumes in mind. This covers caching, queries, partitioning, and initial vs. incremental loads.
- Simplicity - it is recommended to create multiple simple mappings instead of a few complex ones. Use a Staging Area and try to keep the processing logic as clear and simple as possible.
- Modularity - use modular design technique (common error handling, reprocessing).
Mapping development best practices
- Source Qualifier - use shortcuts, extract only the necessary data, limit the columns and rows read from the source. Try to use the default query options (User Defined Join, Filter) instead of a SQL Query override, which may impact database resources and prevent the use of partitioning and pushdown optimization.
- Expressions - use local variables to limit the amount of redundant calculations, avoid datatype conversions, reduce invoking external scripts (coding outside of Informatica), provide comments, use operators (||, +, /) instead of functions. Keep in mind that numeric operations are generally faster than string operations.
- Filter - use the Filter transformation as close to the source as possible. If multiple filters need to be applied, usually it's more efficient to replace them with Router.
- Aggregator - use sorted input, also use as early (close to the source) as possible and filter the data before aggregating.
- Joiner - try to join the data in Source Qualifier if possible, avoid outer joins. It is a good practice to use a source with fewer rows as a Master source.
- Lookup - a relational lookup should only return the ports that are needed. Call an unconnected Lookup in an expression (IIF). Replace large lookup tables with joins when possible. Review the database objects and add indexes to database columns where possible. Use the Cache Calculator in the session to eliminate paging in the lookup cache.