Tuesday, November 11, 2014

Informatica - 1

Informatica, founded in 1993 is a ldeader in providing enterprise data integration solutions. It provides applications for the following Data Integration areas:



  • Data Migration - ERP and legacy systems consolidation, new application implementation
  • Data Synchronization - business-to-business data transfer
  • Data Warehousing
  • Data Hubs, Data Marts
  • Master Data Management (MDM)
  • Business Intelligence Reporting
  • Business Activity Monitoring (BAM)

    Informatica PowerCenter components

    The following services form the Informatica PowerCenter processing engine:
  • Integration Service - conducts and implements the ETL logic
  • Repository Service - manages connectivity to the metadata repositories, where mapping and workflow definitions are saved
  • Repository Service Process - retrieves, inserts and updates repository metadata. It is a multi-threaded process.
  • Repository - a database which contains the ETL metadata

    Client Applications

    Informatica client applications are desktop tools used to create transformations, manage metadata, execute ETL processes and monitor them.
  • Designer - a developer tool used for the development of ETL mappings.
  • Workflow Manager - create and start workflows (sequences of mappings)
  • Workflow Monitor - monitors workflows
  • Repository Manager - tool used to manage source/targets connections, folders, objects, users, etc.
  • Administrator console - a web-based tool used to perform domain and repository service tasks (configure services, nodes, perform backups, etc.) 

  • Informatica ETL programs
     Informatica ETL programs designed in one of the client applications (Designer or Workflow Manager) and are responsible for the whole ETL process execution.
    In Informatica there are a few important objects definitions:



  • Mapping - developed in Designer, it logically defines the ETL process. Mappings read data from the sources, apply transformation logic to the data and write transformed data to targets.
  • Transformations are elements of a mapping which generate, modify or pass the data. Transformations are linked together with links through ports (input and output) within a mapping. 
    There are two groups of transformations: Passive - where the number of rows entering and exiting the transformation are the same (f.ex. Expression, Source Qualifier); Active - number of output rows may not be the same as input rows (f.ex. Aggregator). 
    Examples of transformations: Source qualifier, expression, filter, sorter, aggregator, joiner, lookup, update strategy, router, transaction control, union.
  • Task - designed in Workflow Manager, is an executable set of actions, functions or commands. A sequence of tasks defines the runtime behavior of an ETL process. 
    Example: Session task runs a mapping, Command tasks executes a shell script, Email task sends an email.
  • Workflow - designed in Workflow Manager, it is a collection of tasks that descibe runtime ETL processes. Tasks can be linked sequentially or concurrently and can also depend on the completion status of previous tasks. 
    Each linked icon represents a task.

  • Informatica mapping development

    The article provides a checklist of topics to consider during the Informatica ETL development development project and covers a variety of tips, guidelines and things to consider before proceeding with the development.
    General ETL development tips


  • Before designing a mapping, it is important to have a clear picture of the end-to-end processes that the data will flow through.



  • It is a good practice to create a high-level view of the mapping first and document a picture of the process with the mapping, using a textual description to explain exactly what the mapping is supposed to accomplish and the methods or steps it will follow to accomplish its goal.
  • Next, document the details at the field level, listing each of the target fields and the source fields that are used to create the target field, along with the transformations used to create this field (for example: a sum calculation, a concatenation of two fields, a comparison of two fields, etc.). At this point the designer may have to do some investigation for some business rules with business guys.
  • Create an inventory of Mappings and Reusable objects (mapplets, worklets). This will be a 'work in progress' list and will have to be continually updated (this is particularly valuable for the lead developer).
  • The administrator or lead developer should gather all of the potential Sources, Targets and Reusable objects and place these in a shared folder accessible to all who may need access to them.
  • As for Reusable objects, they need to be properly documented to make it easier for other developers to determine if they can be re-used.
  • As a developer the specifications for a mapping should include required Sources, Targets and additional information regarding derived ports and finally how the ports relate from the source to the target.
  • Document any other information about the mapping that is likely to be helpful in developing the mapping. This may, for example, include source and target database connection information (database schema owners, passwords and connect strings), lookups and how to match data in the lookup tables, data cleansing needed at a field level, potential data issues at a field level, any known issues with particular fields, pre or post mapping processing requirements, and any information about specific error handling for the mapping.
  • The completed mapping design should then be reviewed with one or more team members for completeness and adherence to the business requirements. In addition, the design document should be updated if the business rules change or if more information is gathered during the build process.

    Informatica-specific mapping development guidelines

  • One of the first things to do is to bring in all required source and target objects into the mapping.



  • Only connect fields that are needed or will be used.
  • Only connect from the Source Qualifier those fields needed subsequently.
  • Filter early and often. Only manipulate data that needs to be moved and transformed. Reduce the number non-essential records that are passed through the mapping.
  • Decide if a Source Qualifier join will net the result needed versus creating a Lookup to retrieve desired results.
  • Reduce the number of transformations. Excessive number of transformations will increase overhead.
  • Consider increasing the shared memory when using a large number of transformations.
  • Make use of variables, local or global, to reduce the number of times functions will have to be used.
  • Watch the data types. The Informatica engine converts compatible data types automatically. Excessive number of conversions is inefficient.
  • Make use of variables, reusable transformations and mapplets for reusable code. These will leverage the work done by others.
  • Use active transformations early in the process to reduce the number of records as early in the mapping as possible.
  • When joining sources, select appropriate driving and master table.
  • Utilize single pass reads. Design mappings to utilize one Source Qualifier to populate multiple targets.
  • Remove or reduce field-level stored procedures. These will be executed for each record and slow performance.

  • Informatica Powercenter weaknesses
    Things that make an Informatica developer's life harder and sometimes makes them laugh. 

    Looking in the internet, we can find a lot of articles on why Informatica is the best etl tool but there's very little information on its weaknesses. Still Informatica Powercenter is a leading ETL tool on the market, however it's far from being perfect. This article covers things that are never revealed on a sales powerpoint presentation level.
    This is a compilation of weaknesses of Informatica from the developer's perspective. Hopefully after reading someone will avoid getting into trouble. 

    Recently on a project I'm taking part in, we migrated and switched from Informatica 8.6.1 to 9.1.0. I was hoping most of the issues described below would be resolved but apparently it's not the case.

    Sorting values

    Sorting of dates and numbers based on string values. This generally looks like an amateur work. Few examples below:
    Is really 2 a bigger number than 14? :) 

    And here's how session are sorted in the ascending order by date:

    Lack of sorting in Workflow Monitor

    For instance you can't order sessions by Name, Start Time, Completion Time. In fact reading the execution logs is a pain, especially when a workflow has many sessions. By default the sessions are sorted by execution order (start time), however when workflow monitor is loaded when a workflow is still running, the sessions get sorted in a random order.

    Workflow monitor

    • In Workflow monitor, why there's no option to filter out folders that are not of one's interest? I mean in big Powercenter installations there are tens of available folders. Typically, a developer has access to maybe three, maximum five. But in the folder tree on the left-hand side there are all of them. It takes time for WF monitor to load and this consumes network traffic.
    • It's also a pity that informatica doesn't provide aweb-based workflow execution monitoring tool

    Repository Manager

    Moving object from one folder to another.

    Let's say we want to copy workflows, sessions, mappings and sources + targets from one folder to another. 
    If we do it in repository manager using copy & paste method or dragging and dropping the main workflow, all the objects will be copied. But... the problem is that the source filter set on a session level gets reverted to the value from the mapping! So if there's a source filter which limits the data in some way (a where SQL filter statement) it gets wiped out. So be careful when copying the sessions. 
    It doesn't happen when doing 'export to XML' and 'import from XML' in the repository manager though. 
    Wouldn't it be handy to be able to choose if the values should be reverted?

    Importing XML export files

    Importing XML repository export files - the folder selection window when you map folders from the XML file to the available folders. You can't type in the folder name. So you need to click ... and it seems that Informatica reads all the available folders and user's permissions. This is ok, however for me it sometimestakes like 15 minutes to load this window! Way too long.

    Mappings and workflows development

    Lack of functionality compared to other tools or things that might be improved:
    • Lack of a possibility to do loops within informatica workflows.
    • Parameter files handling. The concept of using a parameter file is ok, however it would be nice to be able to dynamically set parameters during runtime. Editing the parameter file for each run is rather painful. This would be very helpful especially for development/testing phase.
    • It's not possible to run a single session within a worklet. Sometimes, again especially for testing/development, it's handy to just run sessions one by one. It's ok in workflows but why it's not possible within worklets ?
    • When a workflow is running it's not possible to run another session (the error message says:Could not start execution of this workflow because the current run on this Informatica Server has not completed yet.). Enabling concurrent execution option on a workflow level sometimes helps, not always though. Why it's not possible to put such a session in a queue?
    • Lack of refresh option in designer and wf manager. If for instance a source definition changes, a developer needs to disconnect and connect again. In repository manager the refresh option is available though.
    • In workflow manager (can happen also in designer), when a mapping is modifed, after coming back to the session, it can't be validated, saved or refreshed due to the following error: REP_12355 The object ... has been modified since the time it was read. 
      The only option then is disconnect the repository and connect again or restart the tool. 
      It would be so much easier to be able to do a refresh or if this causes a conflict, to be able to choose what to do. Unfortunately the only option available is a restart, not nice.

     

    1 comment :

    1. Thank you so much for exploring the best and right information about how useful is Informatica's set of tools and how one grasp the best out of it can.

      informatica read rest api

      ReplyDelete