Why I love SSIS’ xml task

You too love it, right?? Yes I know, SSIS is such a great tool that’s pretty easy and fast to learn. And, XML’s been a revolution in the way data is being stored, transmitted and consumed. Ultimately, SSIS and XML make a cool combo, don’t they??

In this post, I would like to explore how SSIS’ XML task component could be used to compute Null Ratio Percentage as part of data source profiling and cleansing. I chose this topic because I had seen a lot of blogs out there that were using script task instead to do this job, because of some constraints with namespaces, I guess. I do not say that this is wrong, but just that SSIS being such a popular tool, any developer who is not into VB/ VC++ should be able to use it. I myself being an Object Oriented Guy (Object Oriented Programming + Programming Guy = Object Oriented Guy), find it a bit difficult to straight  away write VC++ code. So just to give those ‘other developers’ a seamless experience, here is my attempt.  So, Ready, let’s dive in…

Here is what my demonstration package looks like

Address Line 2 Profiling and XML Task container

Package

Yes, I set up a Null Ratio Data profiling task for Addressline2 column in Adventure Work’s Person table. This as you know creates a .xml ouput, which looks like

xml version=”1.0″?>
<DataProfile xmlns:xsd=”http://www.w3.org/2001/XMLSchema&#8221; xmlns:xsi=”http://www.w3.org/2001/XMLSchema-instance&#8221; xmlns=”http://schemas.microsoft.com/sqlserver/2008/DataDebugger/”&gt;
<ProfileVersion>1.0</ProfileVersion>
<DataSources>  …  </DataSources>
<DataProfileInput> ..  </DataProfileInput>
<DataProfileOutput>
<Profiles>
<ColumnNullRatioProfile ProfileRequestID=”NullRatioReq” IsExact=”true”>
<DataSourceID>{9A3233CB-55D2-4DD8-B089-FB5CD99A8973}</DataSourceID>
RowCount=”19614″ />
<Column Name=”AddressLine2″ SqlDbType=”NVarChar” MaxLength=”60″ Precision=”255″ Scale=”255″ LCID=”1033″ CodePage=”0″ IsNullable=”true” StringCompareOptions=”0″ />
<NullCount>19252
<!–ColumnNullRatioProfile>–>
</Profiles>
</DataProfileOutput>
</DataProfile>

Our interest is the total RowCount; 19614 and the NullCount which is 19252. Based on these results, I would like to compute the Null Ratio as 19252/19614 which is 98%+, yup, pretty good.

Next comes the XML Task, and shown below is a detailed task screen grab

XML Task Editor SSISHere is what the XML task does

  • Read the output .xml of the data profiling task
  • Execute the following XPath query against the profile output to get Null Count value

//*[local-name()=’ColumnNullRatioProfile’]/*[local-name()=’Column’][@Name=’AddressLine2′]/parent::node()/*[local-name()=’NullCount’]

Note: If you had noticed, the *[local-name()=..] is the trick to read nodes using default namespaces.

  • Populate the Null Count value into a package variable for later use

Yup, the third is a script task, and I know, I know, I myself am using a script task here, but just to show you guys what the variable’s value is after the execution, and here it is

XPath Result : 19252

XPath Result : 19252

Cool, so we have what we need. Similarly we could even read the Row Count value and compute the Null Ratio Percentage. However, I would leave to it to you further, to compute the ratio. By the way, here is the XPath expression to read Row Count though,

//*[local-name()=’ColumnNullRatioProfile’]/*[local-name()=’Column’][@Name=’AddressLine2′]/parent::node()/*[local-name()=’Table’]/@RowCount

So, how do you plan to do the Null Ratio Percentage further?? Write to me.. Till I catch you again, Adios..

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

w

Connecting to %s