Merge SQL databases

Imagine you have 2 databases with identical schema. These databases were working for different application instances for some time. And now you need to merge them together for some reason.

You can use different tools to achieve this goal. For example dbForge Data Compare or SQL Data Compare. But these tools cost money and if you don’t merge databases every day this is probably not an option for you. Also these tools does not know full specific of your database structure including unique indexes, check constraints and triggers.

Another big deal is identity columns that are using as primary keys. For two databases these keys can be same but represent different entities. 

In my practice I face database merge task second time and here is how I handle it.

Step 1. Identify tables that must be merged.

If you need to merge only several tables of hundred you probably don’t want to have a deal with full database schema. You also definitely want to skip lookup tables such as “AddressType” it it holds only records like “1 | Home” and “2 | Business”.

 use TargetDB;
 go

 declare @TablesToImport table 
 (
   Name nvarchar(128)
 )

 insert into @TablesToImport (Name) values 
   ('Users'), ('Orders'), ('Products'), ('Categories')

Step 2. Get information about identity columns.

Identity columns are important. Usually they are used as Primary Keys for table rows. But if two databases were working for different application instances you can think that even if identity primary keys are the same for User table for example they represent different entities and there should be no duplicate records. In this case you need to modify all identity primary keys in source database to make data graphs non-overlapped. But first, you need to identify these identity columns.

 declare @TableInformation table 
 (
   [Schema] nvarchar(128),
   Name nvarchar(128),
   IdentityColumn nvarchar(128),
   MaxIdentity bigint
 )

 -- SCAN SYSTEM TABLES
 insert into @TableInformation 
 (
   [Schema], 
   [Name], 
   [IdentityColumn]
 ) select 
     TABLE_SCHEMA, 
     TABLE_NAME, 
     COLUMN_NAME
   from INFORMATION_SCHEMA.COLUMNS
   where COLUMNPROPERTY(
     OBJECT_ID(TABLE_NAME), 
     COLUMN_NAME, 
     'IsIdentity') = 1
   and OBJECTPROPERTY(
     OBJECT_ID(TABLE_SCHEMA + '.' + TABLE_NAME), 
     'IsTable') = 1
   and TABLE_NAME in
   (
     select [Name] 
     from @TablesToImport
   )

Step 3. Calculate max identity value.

Now you have Table that contain all identity column information. Next you beed to find the greatedt identity value for all tables and calculate max value among all of your tables

declare @Schema nvarchar(128),
   @Name NVARCHAR(128),
   @IdentityColumn NVARCHAR(128);

 declare TablesCursor cursor for 
   select [Schema], [Name], [IdentityColumn] 
   from @TableInformation

 open TablesCursor

 fetch next from TablesCursor 
   into @Schema, @Name, @IdentityColumn;

 while @@FETCH_STATUS = 0
 begin
   declare @Select nvarchar (max),
     @ParmDefinition nvarchar(500),
     @Result bigint;

   -- GENERATE DYNAMIC SQL STATEMENT	
   set @Select = N'select @Result = 
     max(' + @IdentityColumn + ') from ' + @Schema + '.' + @Name
   set @ParmDefinition = N'@Result bigint output';

   -- EXECUTE DYNAMIC SQL STATEMENT
   exec sp_executesql @Select, @ParmDefinition, @Result = @Result output;

   -- UPDATE MaxIdentity for Identity Column
     update @TableInformation 
       set MaxIdentity = @Result 
       where [Schema] = @Schema 
       and [Name] = @Name 
       and [IdentityColumn] = @IdentityColumn

   fetch next from TablesCursor 
     into @Schema, @Name, @IdentityColumn;
 end

 close TablesCursor
 deallocate TablesCursor

 -- DISPLAY TableInformation
 select * from @TableInformation

 -- DISPLAY MaxIdentity
 select MAX(MaxIdentity) from @TableInformation

Step 4. Identify Foreign keys that needs to be incremented along with Primary keys.

In step 3 you calculated the biggest identity value among all tables that must be merged. Now we are ready to increase all identities in source database. But you have to keep in mind that all references must not be broken. Because of that we have to identify all Foreign Keys for identity columns and increase them too.

use SourceDB;
 go

 declare @Increment bigint
 set @Increment = 6000 -- value from step 3

 declare @TablesToImport table 
 (
   [Name] nvarchar(128)
 )

 insert into @TablesToImport (Name) values 
   ('Users'), ('Orders'), ('Products'), ('Categories')

 declare @FkToIncrement table 
 (
   [Schema] nvarchar(128),
   [Table] nvarchar(128),
   [Column] nvarchar(128)
 )

 -- SCAN SYSTEM TABLES
 insert into @FkToIncrement ([Schema], [Table], [Column])
 select
   sc.TABLE_SCHEMA,
   t.name [table], 
   c.name ForeignKeyColumn 
 from sys.foreign_key_columns as fk
   inner join sys.tables as t 
     on fk.parent_object_id = t.object_id
   inner join sys.columns as c 
     on fk.parent_object_id = c.object_id 
     and fk.parent_column_id = c.column_id
   inner join INFORMATION_SCHEMA.COLUMNS sc
     on sc.TABLE_NAME = t.name
     and sc.COLUMN_NAME = c.name
 where fk.referenced_object_id in 
   (
     select object_id 
     from sys.tables 
     where name in 
     (
       select [Name] 
       from @TablesToImport
     )
   )
   and t.name in 
   (
     select [Name] 
       from @TablesToImport
   )

Step 5. Remove all FullText indexed, Foreign Keys and Primary keys.

Now we need to remove everything that can stop us to change PK and FK column values. I recommend to delete FullText Indexes, Foreign Keys and Primary Keys. And that’s why we identify FK columns step before (because after this step we will not be able to recognize them).

 -- REMOVE ALL FULLTEXT INDEXES
 declare @dropFtSql nvarchar(max)
 set @dropFtSql = ''

 select @dropFtSql = @dropFtSql 
   + 'drop fulltext index on ' + t.name + ';'
   from sys.tables t 
   inner join sys.fulltext_indexes fi 
     on t.[object_id] = fi.[object_id] 

 if (@dropFtSql <> '')
 begin
   execute (@dropFtSql)
 end   
 print 'all fulltext indexes are removed'

 -- REMOVE ALL FOREIGN KEYS
 declare @removeFkSql nvarchar(MAX) 
 set @removeFkSql = ''

 select @removeFkSql = @removeFkSql + 'alter table ' 
   + QUOTENAME(CONSTRAINT_SCHEMA) + '.' 
   + QUOTENAME(TABLE_NAME) + ' drop constraint ' 
   + QUOTENAME(CONSTRAINT_NAME) + '; ' 
 from INFORMATION_SCHEMA.TABLE_CONSTRAINTS
   where CONSTRAINT_TYPE = 'FOREIGN KEY'

 if (@removeFkSql <> '')
 begin
   execute(@removeFkSql)
 end
 print 'all foreign keys are removed'

 -- REMOVE ALL PRIMARY KEYS
 declare @removePkSql nvarchar(MAX) 
 set @removePkSql = '' 

 select @removePkSql = @removePkSql + 'alter table ' 
   + QUOTENAME(CONSTRAINT_SCHEMA) + '.' 
   + QUOTENAME(TABLE_NAME) + ' drop constraint ' 
   + QUOTENAME(CONSTRAINT_NAME) + '; ' 
 from INFORMATION_SCHEMA.TABLE_CONSTRAINTS
   where CONSTRAINT_TYPE = 'PRIMARY KEY'

 if (@removePkSql <> '')
 begin
   execute(@removePkSql)
 end
 print 'all primary keys are removed'

Step 6. Increment all FK column values.

declare @Schema nvarchar(128),
   @Table NVARCHAR(128),
   @Column NVARCHAR(128),
   @DataType NVARCHAR(128);

 declare FkCursor cursor for 
   select * from @FkToIncrement

 open FkCursor

 fetch next from FkCursor 
   into @Schema, @Table, @Column
 while @@FETCH_STATUS = 0
 begin
   execute('update ['+@Schema+'].['+@Table+'] 
     set ['+@Column+'] = ['+@Column+'] + ' + @Increment)
   fetch next from FkCursor into @Schema, @Table, @Column
 end

 close FkCursor
 deallocate FkCursor

 print 'all foreign keys are incremented'

Step 7. Increment all identity columns.

Identity columns are tricky. You cannot just remove “Identity” property from them and you also cannot update values in these columns. The easiest way to do this is to create a copy of identity column, drop the original one and rename the copy.

declare @IdentityToIncrement table 
 (
   [Schema] nvarchar(128),
   [Table] nvarchar(128),
   [Column] nvarchar(128),
   [DataType] nvarchar(128)
 )

 -- SCAN SYSTEM TABLES
 insert into @IdentityToIncrement 
   (
     [Schema], 
     [Table], 
     [Column], 
     [DataType]
   )
   select 
     TABLE_SCHEMA, 
     TABLE_NAME, 
     COLUMN_NAME, 
     DATA_TYPE
   from INFORMATION_SCHEMA.COLUMNS
     where COLUMNPROPERTY(
       OBJECT_ID(TABLE_NAME), 
       COLUMN_NAME, 
       'IsIdentity') = 1
     and OBJECTPROPERTY(
       OBJECT_ID(TABLE_SCHEMA + '.' + TABLE_NAME), 
       'IsTable') = 1
     and TABLE_NAME IN (
       select Name 
       from @TablesToImport
     )

 -- INCREMENT IDENTITY COLUMNS		
 declare IdentityCursor cursor for 
   select * from @IdentityToIncrement

 open IdentityCursor

 fetch next from IdentityCursor 
   into @Schema, @Table, @Column, @DataType

 while @@FETCH_STATUS = 0
 begin
   execute('alter table ['+@Schema+'].['+@Table+'] add '+@Column+'_Copy ' + @DataType + '')
   execute('update ['+@Schema+'].['+@Table+'] set '+@Column+'_Copy = '+@Column+'')
   execute('alter table ['+@Schema+'].['+@Table+'] drop column '+@Column+'')
   execute('exec sp_rename ''['+@Schema+'].['+@Table+'].'+@Column+'_Copy'', '''+@Column+''', ''COLUMN''')
   execute('update ['+@Schema+'].['+@Table+'] set ['+@Column+'] = ['+@Column+'] + ' + @Increment)
   execute('alter table ['+@Schema+'].['+@Table+'] alter column ['+@Column+'] ' + @DataType + ' not null')
   fetch next from IdentityCursor into @Schema, @Table, @Column, @DataType
 end

 close IdentityCursor
 deallocate IdentityCursor

 print 'all identity columns are incremented'

Step 8. Handle constraint violations.

At this point of time you have 2 data graphs that potentially can be merged. But you probably have UNIQUE INDEXes or other CHECK CONSTRAINTs that help you to have data consistent. For example you cannot have 2 users with same login. But 2 databases can have same login for different user and here you must make a decision about how to handle this. It’s up to you if you want to remove this user or change it’s login. 

Step 9. Merge.

Finally we can move data from source database to target one. You can use any tool to automate this process, but you also can use pure SQL that looks like this:

 -- MERGE Categories
 set identity_insert TargetDB.dbo.Categories on
 insert into TargetDB.dbo.Categories ([Id], [Name])
 select [Id],[Name]
   from SourceDB.dbo.Categories
 set identity_insert TargetDB.dbo.Categories off

 -- MERGE Products
 set identity_insert TargetDB.dbo.Products on
 insert into TargetDB.dbo.Products ([Id], [Name], [CategoryId])
 select [Id],[Name],[CategoryId]
   from SourceDB.dbo.Products
 set identity_insert TargetDB.dbo.Products off

That’s it. I hope this was not too complicated. And Here is the Gist with all steps together.

Comments

comments powered by Disqus