Monday, November 27, 2017

Data Profiling Task for Multiple Tables

I am a big fan of BIML, so when I was faced with the task of creating DataProfile Tasks for several dozens of tables in the database, I naturally turned to a quick and easy BIML solution:


<#@ import namespace="System.Data" #>
<#@ import namespace="System.Data.SqlClient" #>
<#@ template language="C#" tier="1" #>
<#
string _con = @"Data Source=SERVER;Persist Security Info=true;Integrated Security=SSPI;Initial Catalog=DATABASE";
DataTable _extract_table_info = new DataTable();
SqlDataAdapter _con_da1 = new SqlDataAdapter("select TABLE_NAME from INFORMATION_SCHEMA.TABLES a where a.TABLE_SCHEMA = 'XXX'", _con);
_con_da1.Fill(_extract_table_info);
#>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Projects>
<PackageProject Name="DataProfiling">
<Packages>
<#foreach (DataRow _extraction in _extract_table_info.Rows) { #>
<Package IsEntryPoint="false" PackageName="DP_<#=_extraction["TABLE_NAME"].ToString()#>"/>
<# } #>
<Package PackageName="DataProfiling.Master"/>
</Packages>
</PackageProject>
</Projects>
<Connections>
<#foreach (DataRow _extraction in _extract_table_info.Rows) { #>
<FileConnection Name="Profiling_<#=_extraction["TABLE_NAME"].ToString()#>.xml" FileUsageType="CreateFile" FilePath="C:\...\DataProfiling\Profiling_<#=_extraction["TABLE_NAME"].ToString()#>.xml" />
<# } #>
<AdoNetConnection Name="DBCON" ConnectionString="Data Source=SERVER;Initial Catalog=DATABASE;Integrated Security=True;" Provider="System.Data.SqlClient.SqlConnection, System.Data, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" />
</Connections>
<Packages>
<#foreach (DataRow _extraction in _extract_table_info.Rows) { #>
<Package Name="DP_<#=_extraction["TABLE_NAME"].ToString()#>" ConstraintMode="Linear">
<Tasks>
<DataProfiling Name="Data Profiling Task" OverwriteDestination="true" >
<FileOutput ConnectionName="Profiling_<#=_extraction["TABLE_NAME"].ToString()#>.xml"></FileOutput>
<ProfileRequests>
<ColumnStatisticsProfileRequest Name="StatisticsReq" ConnectionName="DBCON" SchemaId="XXX" TableId="<#=_extraction["TABLE_NAME"].ToString()#>" />
<ColumnNullRatioProfileRequest Name="NullRatioReq" ConnectionName="DBCON" SchemaId="XXX" TableId="<#=_extraction["TABLE_NAME"].ToString()#>" />
<ColumnPatternProfileRequest Name="PatternReq" ConnectionName="DBCON" SchemaId="XXX" TableId="<#=_extraction["TABLE_NAME"].ToString()#>" />
<CandidateKeyProfileRequest Name="KeyReq" ConnectionName="DBCON" SchemaId="XXX" TableId="<#=_extraction["TABLE_NAME"].ToString()#>">
<KeyColumns>
<Column></Column>
</KeyColumns>
</CandidateKeyProfileRequest>
<ColumnValueDistributionProfileRequest Name="ValueDistReq" ConnectionName="DBCON" SchemaId="XXX" TableId="<#=_extraction["TABLE_NAME"].ToString()#>" />
</ProfileRequests>
</DataProfiling>
</Tasks>
</Package>
<# } #>
<Package Name="DataProfiling.Master" ConstraintMode="LinearOnCompletion">
<Tasks>
<#foreach (DataRow _extraction in _extract_table_info.Rows) { #>
<ExecutePackage Name="EPT_<#=_extraction["TABLE_NAME"].ToString()#>">
<ExternalProjectPackage Package="DP_<#=_extraction["TABLE_NAME"].ToString()#>.dtsx"/>
</ExecutePackage>
<#} #>
</Tasks>
</Package>
</Packages>
</Biml>
view raw gistfile1.xml hosted with ❤ by GitHub


There are however some other alternative solutions, like here: Using the Data Profiling Task To Profile All the Tables in a Database

No comments:

Post a Comment