Data Profiling Task in Sql Server Integration Services (SSIS)

13
2252

What do we mean by Data Profiling task in SSIS?

Data Profiling task is a control flow component and is used to analyze data of a table or view in a SQL Server database so as to identify the potential problems,data issues,patterns and constraints with the quality of data. This is a new component in MS SQL Server 2008. We can generate a XML (Xtensible Markup Language) report and save it to a file or to a SSIS variable. After saving the computed data profiles in a file, we can use Data profile viewer to review the profile output. It helps on analyzing the source data and resolves the data issues to build a mature centralized data ware house. Whether it’s an additional cost or not, it is a necessity these days that an ETL tool should have a data profiling facility. The Data Profiling task works only with data that is stored in SQL Server. This task does not work with third-party or file-based data sources.

NOTE :-  How to Launch Data Profile Viewer – There are 3 ways to launch Data profile viewer to see the regular XML file as output.

1.  Double click on the Data Profiling task. Go to the Control Flow tab and it will open up the General tab. Under this we can click “Open Profile Viewer“.

2.  Go to Start Menu -> Microsoft SQL Server 2012 -> Integration Services -> Data Profile Viewer. Open the output file that was created by the Data profiling task.

3.  Go to run and enter DataProfileViewer.exe

Why do we need Data Profiling?

 1.  Data Analysis Instead of writing a bunch of TSQL to do some data analysis on the table data in your database, SQL Server Integration services provides a feature called Data Profiling Task that helps to perform analysis much easier and faster.

2To fix data issues at source It is important to trap the problems in the beginning and rectify them before sending the data for further analysis to minimize the data quality issues that might occur from source data.

3.  To have a Valid and Quality Data For better analysis and reporting we need a valid and quality data i.e. data should be Accurate, Consistent, Complete and uniform as dirty data will lead to wrong analysis and reporting.

Features of Data Profiling Task :-

The Data Profiling Task can compute 8 different data profiles. 5 of these profiles analyze individual columns, and the remaining 3 analyze multiple columns or relationships between columns and tables.

For more information, you can Visit – http://msdn.microsoft.com/en-us/library/bb895263.aspx

Implementing Data Profiling Task with an Example –  

STEP 1.  Open Business Intelligence Development Studio (BIDS) and create a package with a desired name.

STEP 2.  Once the package is created, Go to the Toolbox and select Data Profiling Task under the Control flow items. Drag and drop this task onto the control flow pane.

Drag_Drop

STEP 3.  Now we need to configure the properties used to profile data sources. Double click on the Data profiling task to launch the editor. Under the General tab, go to the destination field and choose New Connection from the drop down list.

Destination_New Connecti

STEP 4.  A file connection manager editor will pop up. You can choose either create file or existing file from the usage type drop down list. Next step is to browse the file location where we need to save our file. The extension of the file will be .xml

Destination_location

STEP 5.  Under General tab, go to Quick profile button. It will launch single table quick profile form. Then go to the New button to configure connection for the table whose data profile need to be analyzed. Choose the appropriate Server name, Authentication type and the database name and click Ok.

Connection

STEP 6. This will establish an ADO.NET connection with your SQL server database. Now, you can choose the table or view from the drop down list whose data profiles need to be analyzed. Choose the various profiles for the table from the check boxes and click OK.

Quick profile form

STEP 7.  Clicking OK button will lead to Profile requests tab. You can set the data profiles for the table from here also. This can be used as an alternate method.

Configuring profiles

STEP 8.  The configuration of our data profiling task is now complete and we can run it by selecting “Start Debugging” from the Debug menu or by using the F5 shortcut.

Execution of Data Profiling Task

STEP 9. To view the output in Data Profile Viewer

Go to Start Menu -> Microsoft SQL Server 2012 -> Integration Services -> Data Profile Viewer. Open the output file that was created by the Data profiling task OR Go to run and enter DataProfileViewer.exe

Output of Data Profiling task

This completes our post on Data Profiling Task in SSIS. Hope you unserstood the features and functioning of Data profiling task. Your views and comments are valuable.

VN:F [1.9.22_1171]
Your Feedback hels to Improve
Rating: 9.7/10 (34 votes cast)
VN:F [1.9.22_1171]
Rating: +32 (from 32 votes)
Data Profiling Task in Sql Server Integration Services (SSIS), 9.7 out of 10 based on 34 ratings

13 COMMENTS

  1. Easy to understand and simple to implement. Got to know the features of Dat profilng task. Nice post !!!

    • Hi ,

      This is fine for one table and all columns or single column but
      How to do data profiling for selected tables and selected columns . please reply if you know

      I Need this output , if you know this ,please send step by step process for this , either you can post same page.

  2. gampang banget dimengerti, langsung bisa diaplikasikan.
    pertanyaannya bagaimana caranya kalau mau publish ke sharepoint supaya bisa diakses oleh user tanpa harus install sql server/bi dev studio/visual studio?

  3. Hi Phpring n hello to chander n to avinash for such a easy understanding articles.
    Really it s awesome,,,fantastic…
    I have gone through all u r ssis n ssrs articles.
    Helped me a lot to learn quickly n easily
    excepting more ssas practicals from u.. Thank you so much………

  4. This was an excellent piece of work. This example was based on one table, if you could provide better direction on multiple tables or views same way with screen shots that will be really appriciated. Thanks again 🙂

Comments are closed.