Monday, November 27, 2017

Data Profiling Task for Multiple Tables

I am a big fan of BIML, so when I was faced with the task of creating DataProfile Tasks for several dozens of tables in the database, I naturally turned to a quick and easy BIML solution:


<#@ import namespace="System.Data" #>
<#@ import namespace="System.Data.SqlClient" #>
<#@ template language="C#" tier="1" #>
<#
string _con = @"Data Source=SERVER;Persist Security Info=true;Integrated Security=SSPI;Initial Catalog=DATABASE";
DataTable _extract_table_info = new DataTable();
SqlDataAdapter _con_da1 = new SqlDataAdapter("select TABLE_NAME from INFORMATION_SCHEMA.TABLES a where a.TABLE_SCHEMA = 'XXX'", _con);
_con_da1.Fill(_extract_table_info);
#>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Projects>
<PackageProject Name="DataProfiling">
<Packages>
<#foreach (DataRow _extraction in _extract_table_info.Rows) { #>
<Package IsEntryPoint="false" PackageName="DP_<#=_extraction["TABLE_NAME"].ToString()#>"/>
<# } #>
<Package PackageName="DataProfiling.Master"/>
</Packages>
</PackageProject>
</Projects>
<Connections>
<#foreach (DataRow _extraction in _extract_table_info.Rows) { #>
<FileConnection Name="Profiling_<#=_extraction["TABLE_NAME"].ToString()#>.xml" FileUsageType="CreateFile" FilePath="C:\...\DataProfiling\Profiling_<#=_extraction["TABLE_NAME"].ToString()#>.xml" />
<# } #>
<AdoNetConnection Name="DBCON" ConnectionString="Data Source=SERVER;Initial Catalog=DATABASE;Integrated Security=True;" Provider="System.Data.SqlClient.SqlConnection, System.Data, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" />
</Connections>
<Packages>
<#foreach (DataRow _extraction in _extract_table_info.Rows) { #>
<Package Name="DP_<#=_extraction["TABLE_NAME"].ToString()#>" ConstraintMode="Linear">
<Tasks>
<DataProfiling Name="Data Profiling Task" OverwriteDestination="true" >
<FileOutput ConnectionName="Profiling_<#=_extraction["TABLE_NAME"].ToString()#>.xml"></FileOutput>
<ProfileRequests>
<ColumnStatisticsProfileRequest Name="StatisticsReq" ConnectionName="DBCON" SchemaId="XXX" TableId="<#=_extraction["TABLE_NAME"].ToString()#>" />
<ColumnNullRatioProfileRequest Name="NullRatioReq" ConnectionName="DBCON" SchemaId="XXX" TableId="<#=_extraction["TABLE_NAME"].ToString()#>" />
<ColumnPatternProfileRequest Name="PatternReq" ConnectionName="DBCON" SchemaId="XXX" TableId="<#=_extraction["TABLE_NAME"].ToString()#>" />
<CandidateKeyProfileRequest Name="KeyReq" ConnectionName="DBCON" SchemaId="XXX" TableId="<#=_extraction["TABLE_NAME"].ToString()#>">
<KeyColumns>
<Column></Column>
</KeyColumns>
</CandidateKeyProfileRequest>
<ColumnValueDistributionProfileRequest Name="ValueDistReq" ConnectionName="DBCON" SchemaId="XXX" TableId="<#=_extraction["TABLE_NAME"].ToString()#>" />
</ProfileRequests>
</DataProfiling>
</Tasks>
</Package>
<# } #>
<Package Name="DataProfiling.Master" ConstraintMode="LinearOnCompletion">
<Tasks>
<#foreach (DataRow _extraction in _extract_table_info.Rows) { #>
<ExecutePackage Name="EPT_<#=_extraction["TABLE_NAME"].ToString()#>">
<ExternalProjectPackage Package="DP_<#=_extraction["TABLE_NAME"].ToString()#>.dtsx"/>
</ExecutePackage>
<#} #>
</Tasks>
</Package>
</Packages>
</Biml>
view raw gistfile1.xml hosted with ❤ by GitHub


There are however some other alternative solutions, like here: Using the Data Profiling Task To Profile All the Tables in a Database

Wednesday, November 22, 2017

Remember That The Regex Engine Is Eager

Just a reminder:

Regex is eager and will return the first match in the piped list expressions: (<= |<>|<)

that was important for me for trying to match equality signs, for example: "<>" or "<=" vs "<".

https://www.regular-expressions.info/alternation.html


Tuesday, November 21, 2017

Developing Custom SSIS Data Flow Component: helpful links

Great resource to start with. A sample source code is also available for download.
http://geekswithblogs.net/Compudicted/archive/2012/03/01/creating-a-custom-ssis-data-flow-component---an-example.aspx

How to work with Connections
https://docs.microsoft.com/en-us/sql/integration-services/extending-packages-custom-objects-data-flow-types/developing-a-custom-source-component

Debugging Custom component
https://docs.microsoft.com/en-us/sql/integration-services/extending-packages-custom-objects/building-deploying-and-debugging-custom-objects

1. Component Project in DEBUG mode
2. register .dll in GAC and copy it in the DTS\PipelineComponents
3. in SSIS project: breakpoint on Pre-Execute of the Data Flow Task. Wait till we hit breakpoint
4. in Component project - Debug - > Attach to process. Select DtsDebugHost.exe (Managed, x64)
5. in Component project - put a breakpoint in code in Validate or PreExecute or ProcessInput
6. in SSIS project: click Continue
7. you should hit the breakpoint in the Component project

Automate with Post-build event: registering .dll in GAC and copying it in the DTS\PipelineComponents
http://toddmcdermid.blogspot.ca/2009/01/custom-component-developing-post-build.html

this worked:
copy "$(TargetDir)$(TargetName).DLL" "C:\Program Files (x86)\Microsoft SQL Server\110\DTS\PipelineComponents\"
gacutil.exe -i "$(TargetDir)$(TargetName).DLL"

Supporting different SQL Server platforms:

Support multi-targeting in your custom components

Getting your SSIS custom extensions to be supported by the multi-version support of SSDT 2015 for SQL Server 2016

Using PerformUpgrade

https://www.red-gate.com/simple-talk/sql/ssis/developing-a-custom-ssis-source-component/

https://blogs.perficient.com/microsoft/2016/04/developing-custom-ssis-component-9-lessons-learned/

old but good - MSIL Disassembler tool

Thanks to a pesky issue with a custom icon in the Custom SSIS Data Flow Component, I found out how useful MSIL Disassembler tool could be!
http://www.sqlis.com/sqlis/post/How-to-add-an-icon-to-your-component.aspx