Aim – In this post we will learn abot Data Flow Transformation Categories in SSIS. Transformations are defined as a core component in the Data flow of a package in SSIS. It is that part of the data flow to which we apply our business logic to manipulate and modify the input data into the required format before loading it to the destination. All the Data Flow Transformations are broadly classified into 2 types:-
Type 1 – Synchronous Transformations.
Type 2 – Asynchronous Transformations.
What is the difference between Synchronous and Asynchronous transformations?
|Synchronous Transformations||Asynchronous Transformations|
|Processes each incoming row, modifies according to the required format and forward it.||Stores all the rows into the memory before it begins the process of modifying input data to the required output format.|
|No. of input rows = No. of output rows.||No. of input rows != No. of output rows|
|Output rows are in sync with Input rows i.e. 1:1 relationship.||Output rows are not in sync with Input rows|
|Less memory is required as they work on row by row basis.||More memory is required to store the whole data set as input and output buffers do not use the same memory.|
|Does not block the data flow in the pipeline.||Are also known as “Blocking Transformations” as they block the data flow in the pipeline until all the input rows are read into the memory.|
|Runs quite faster due to less memory required.||Runs generally slow as memory requirement is very high.|
|E.g. – Data Conversion Transformation- Input rows flow into the memory buffers and the same buffers come out in the required data format as Output.||E.g. – Sort Transformation- where the component has to process the complete set of rows in a single operation.|
Further Asynchronous Transformations are divided into 2 categories:-
- Partially blocking Transformations creates new memory buffers for the output of the transformation such as the Union All Transformation.
- Fully blocking Transformations performs the same operation but cause a full block of the data such as the Sort and Aggregate Transformations.
Data Flow Transformation Categories are as follows:-
- This transformation is used to update column values or create new columns.
- It transforms each row present in the pipeline (Input).
|Character Map||Modifies strings, typically for changes involving code pages.|
|Copy Column||Copies columns to new output columns.|
|Data Conversion||Performs data casting.|
|Derived Column||Allows the definition of new columns, or the overriding of values in existing columns, based on expressions.|
|OLE DB Command||Executes a command against a connection manager for each row. This transformation can behave as a destination.|
2. Rowset Transformations –
- These transformations are also called Asynchronous as they “dam the flow” of data i.e. Stores all the rows into the memory before it begins the process of modifying input data to the required output format.
- As a result, a block is caused in the pipeline of data until the operation is completed.
|Aggregate||Aggregates (summarizes) numeric columns|
|Percentage Sampling||Outputs a configured percentage of rows|
|Row Sampling||Outputs a configured number of rows|
|Sort||Sorts the data, and can be configured to remove duplicates.|
|Pivot||Pivots the data|
|Unpivot||Unpivots the data|
3. Split and join Transformations –
- Distribute rows to different outputs.
- Create copies of the transformation inputs.
- Join multiple inputs into one output.
|Condition Split||Uses conditions to allocate rows to multiple outputs.|
|Look up||Performs a look up against a reference set of data. Typically used in the fact table for loading packages.|
|Merge||Unions two sorted inputs and retains sort order in the output.|
|Merge Join||Joins two sorted inputs, and can be configured as Inner, Left Outer or Full Outer.|
|Multicast||Broadcasts (duplicates) the rows to multiple outputs.|
|Union All||Unions two or more inputs to produce a single output.|
4. Business Intelligence Transformations –
- These are used to introduce data mining capabilities and data cleansing.
- Cleaning data includes identification and removal of duplicate rows based on approximate matches.
- These are only available with the Enterprise Edition
5. Script Transformations –
- Extends the capabilities of the data flow.
- Delivers optimized performance because it is precompiled.
- Similar to the Script Task, it delivers the ability to introduce custom logic into the data flow using VB.NET or C#.NET.
- It can be configured to behave as a Source, a Destination, or any type of Transformation.
6. Other Transformations –
- Add Audit Information such as when the package was run and by whom.
- Export and Import Data.
- Stores the row count from the Data Flow into a variable
|Audit||Adds audit information as columns to the output.|
|Cache Transform||Prepares caches for use by the Lookup transformation.|
|Export and Import Column||Extracts or load data from/to the file system.|
|Row Count||Stores the number of rows that have passed through the transformation into a variable.|
|Slowly Changing Dimensions||Produces SCD logic for type 1 and 2 changes.|
This completes the basic of Data Flow Transformation Categories in SSIS. You can have as many transformations as you need according to your business requirement with the use of Script Transformation. This feature of creating custom transformations via VB.NET or C# programming language, makes it a valuable tool in this competeting market of Business Intelligence. Further we will do the implementation of each Data Flow Transformation. We will see how it is configured to work according to our requirements. I hope you like my article on Data Flow Transformation Categories in SSIS. Get some time to post your queries and comments below.