Data Flow Transformation categories in SSIS

10
5088

Aim In this post we will learn abot Data Flow Transformation Categories in SSIS. Transformations are defined as a core component in the Data flow of a package in SSIS. It is that part of the data flow to which we apply our business logic to manipulate and modify the input data into the required format before loading it to the destination. All the Data Flow Transformations are broadly classified into 2 types:-

Type 1 Synchronous Transformations.

Type 2 Asynchronous Transformations.

What is the difference between Synchronous and Asynchronous transformations?

Synchronous Transformations Asynchronous Transformations
Processes each incoming row, modifies   according to the required format and forward it. Stores all the rows into the memory   before it begins the process of modifying input data to the required output   format.
No. of input rows = No. of output   rows. No. of input rows != No. of output   rows
Output rows are in sync with Input   rows i.e. 1:1 relationship. Output rows are not in sync with Input   rows
Less memory is required as they work   on row by row basis. More memory is required to store the   whole data set as input and output buffers do not use the same memory.
Does not block the data flow in the   pipeline. Are also known as “Blocking Transformations”   as they block the data flow in the pipeline until all the input rows are read   into the memory.
Runs quite faster due to less memory   required. Runs generally slow as memory requirement   is very high.
E.g. – Data Conversion Transformation-   Input rows flow into the memory buffers and the same buffers come out in the required   data format as Output. E.g. – Sort Transformation- where the   component has to process the complete set of rows in a single operation.

Further Asynchronous Transformations are divided into 2 categories:-

  1. Partially blocking Transformations creates new memory buffers for the output of the transformation such as the Union All Transformation.
  2. Fully blocking Transformations performs the same operation but cause a full block of the data such as the Sort and Aggregate Transformations.

Data Flow Transformation Categories are as follows:-

1. Row Transformations –Row Transformations

  • This transformation is used to update column values or create new columns.
  • It transforms each row present in the pipeline (Input).
Transformation Name Description
Character Map Modifies strings, typically for changes involving code pages.
Copy Column Copies columns to new output columns.
Data Conversion Performs data casting.
Derived Column Allows the definition of new columns, or the overriding of values in existing columns, based on expressions.
OLE DB Command Executes a command against a connection manager for each row. This transformation can behave as a destination.

2. Rowset Transformations –

  • Rowset transformationsThese transformations are also called Asynchronous as they “dam the flow” of data i.e. Stores all the rows into the memory before it begins the process of modifying input data to the required output format.
  • As a result, a block is caused in the pipeline of data until the operation is completed.
Transformation   Name Description
Aggregate Aggregates   (summarizes) numeric columns
Percentage   Sampling Outputs   a configured percentage of rows
Row   Sampling Outputs   a configured number of rows
Sort Sorts   the data, and can be configured to remove duplicates.
Pivot Pivots   the data
Unpivot Unpivots   the data

3. Split and join Transformations –

Split and Join Transformations

  • Distribute rows to different outputs.
  • Create copies of the transformation inputs.
  • Join multiple inputs into one output.
Transformation   Name Description
Condition Split Uses conditions to allocate rows to multiple outputs.
Look up Performs a look up against a reference   set of data. Typically used in the fact table for loading packages.
Merge Unions two sorted inputs and retains   sort order in the output.
Merge Join Joins two sorted inputs, and can be   configured as Inner, Left Outer or Full Outer.
Multicast Broadcasts (duplicates) the rows to   multiple outputs.
Union All Unions two or more inputs to produce a   single output.

4. Business Intelligence Transformations –

  • Business Intelligence TransformationsThese are used to introduce data mining capabilities and data cleansing.
  • Cleaning data includes identification and removal of duplicate rows based on approximate matches.
  • These are only available with the Enterprise Edition

 

 

 

5. Script Transformations –

  • Script TransformationsExtends the capabilities of the data flow.
  • Delivers optimized performance because it is precompiled.
  • Similar to the Script Task, it delivers the ability to introduce custom logic into the data flow using VB.NET or C#.NET.
  • It can be configured to behave as a Source, a Destination, or any type of Transformation.

6. Other Transformations –

  • Other TransformationsAdd Audit Information such as when the package was run and by whom.
  • Export and Import Data.
  • Stores the row count from the Data Flow into a variable
Transformation   Name Description
Audit Adds audit information as columns to   the output.
Cache Transform Prepares caches for use by the Lookup   transformation.
Export and Import Column Extracts or load data from/to the file   system.
Row Count Stores the number of rows that have   passed through the transformation into a variable.
Slowly Changing Dimensions Produces SCD logic for type 1 and 2   changes.

This completes the basic of  Data Flow Transformation Categories in SSIS. You can have as many transformations as you need according to your business requirement with the use of Script Transformation. This feature of creating custom transformations via VB.NET or C# programming language, makes it a valuable tool in this competeting market of Business Intelligence. Further we will do the implementation of each Data Flow Transformation. We will see how it is configured to work according to our requirements. I hope you like my article on Data Flow Transformation Categories in SSIS. Get some time to post your queries and comments below.

VN:F [1.9.22_1171]
Your Feedback hels to Improve
Rating: 9.1/10 (59 votes cast)
VN:F [1.9.22_1171]
Rating: +44 (from 50 votes)
Data Flow Transformation categories in SSIS, 9.1 out of 10 based on 59 ratings
  • Allen

    Excellent explaination. Really appreciate

  • Thanks a lot Allen !!!

  • Anupama

    Very nice explanation of all the transformation with crisp information

  • Pingback: SSIS Aggregate Dataflow transformation | SSIS Tutorials()

  • Bindu

    Hiiii Chander Sharma,
    Can yu du a favour , i need clear explanation on Look up and Cache transformations ….
    Could yu plz update it asap …

    Thank yu …

  • Jerome

    Hi Chander Sharma,
    Good work – one place to see all the transforms.
    I have three questions – 1. Do we have a similar blog that contains all the tasks that can be part of the control flow
    2.In a multi user environment , how does SSIS work.
    To give more details about my question – If I create a project – would that be visible for others – or vise versa .
    3.can I use a package that is there in another project folder ?

  • Pingback: SSIS Non-Blocking, Semi and Fully-Blocking Transformations()

  • gopal

    Very Nice and SImlpe, Great Work.

  • PRADEEP

    Thank you for sharing very useful stuff.