本篇文章小編為大家介紹,,基于SQL Server中如何比較兩個表的各組數(shù)據(jù) 圖解說明,。需要的朋友參考下 開始 前一陣子,在項目中碰到這樣一個SQL查詢需求,有兩個相同結構的表(table_left & table_right),,如下: 圖1. 檢查表table_left的各組(groupId),,是否在表table_right中存在有一組(groupId)數(shù)據(jù)(data)與它的數(shù)據(jù)(data)完全相等. 如圖1. 可以看出表table_left和table_right存在兩組數(shù)據(jù)完整相等: 圖2. 分析 從上面的兩個表,可以知道它們存放的是一組一組的數(shù)據(jù),;那么,,接下來我借助數(shù)學集合的列舉法和運算進行分析。 先通過集合的列舉法描述兩個表的各組數(shù)據(jù): 圖3. 這里只有兩種情況,,相等和不相等,。對于不相等,可再分為部分相等,、包含,、和完全不相等。使用集合描述,,可使用交集,,子集,并集,。如下面圖4.,,我列舉出這幾種常見的情況: 圖4. 實現(xiàn) 在數(shù)據(jù)庫中,要找出表table_left和表table_right存在相同數(shù)據(jù)的組,,方法很多,,這里我列出兩種常用的方法。 (下面的SQL腳本,,是以圖4.的數(shù)據(jù)為基礎參考) 方法1: 通過"Select … From …Order by … xml for path('') "把各組的data列數(shù)據(jù)連串起來(如,,圖4.把table_left的組#11的列data連串起來成"data1-data2-data3"),其他分組(包含表table_right)以此方法實現(xiàn)data列數(shù)據(jù)連串起來,;然后通過比較兩表的連串后字段是否存在相等,,若是相等就說明這比較多兩組數(shù)據(jù)相等,由此可以判斷出表table_left的哪組數(shù)據(jù)在表table_right存在與它數(shù)據(jù)完全相等的組,。 針對方法1,,需要對原表增加一個字段dataPath,用于存儲data列數(shù)據(jù)連串的結果,如: 復制代碼代碼如下: alter table table_left add dataPath nvarchar(200) alter table table_right add dataPath nvarchar(200) 分組連串data列數(shù)據(jù)并update至剛新增的列dataPath,,如: 復制代碼代碼如下: update a set dataPath=b.dataPath from table_left a cross apply(select (select '-'+x.data from table_left x where x.groupId=a.groupId order by x.data for xml path('')) as dataPath)b update a set dataPath=b.dataPath from table_right a cross apply(select (select '-'+x.data from table_right x where x.groupId=a.groupId order by x.data for xml path('')) as dataPath)b 接下來就是查詢了,,如: 復制代碼代碼如下: select distinct a.groupId from table_left a where exists(select 1 from table_right x where x.dataPath=a.dataPath) 完整代碼: 復制代碼代碼如下: View Code use tempdb go if object_id('table_left') is not null drop table table_left if object_id('table_right') is not null drop table table_right go create table table_left(groupId nvarchar(5),data nvarchar(10)) create table table_right(groupId nvarchar(5),data nvarchar(10)) go alter table table_left add dataPath nvarchar(200) alter table table_right add dataPath nvarchar(200) go create nonclustered index ix_left on table_left(dataPath) create nonclustered index ix_right on table_right(dataPath) go set nocount on go insert into table_right(groupId,data) select '#1','data1' union all select '#1','data2' union all select '#1','data3' union all select '#2','data55' union all select '#2','data55' union all select '#3','data91' union all select '#3','data92' union all select '#4','data65' union all select '#4','data66' union all select '#4','data67' union all select '#4','data68' union all select '#4','data69' union all select '#5','data77' union all select '#5','data79' insert into table_left(groupId,data) select '#11','data1' union all select '#11','data2' union all select '#11','data3' union all select '#22','data55' union all select '#22','data57' union all select '#33','data99' union all select '#33','data99' union all select '#44','data66' union all select '#44','data68' union all select '#55','data77' union all select '#55','data78' union all select '#55','data79' go update a set dataPath=b.dataPath from table_left a cross apply(select (select '-'+x.data from table_left x where x.groupId=a.groupId order by x.data for xml path('')) as dataPath)b update a set dataPath=b.dataPath from table_right a cross apply(select (select '-'+x.data from table_right x where x.groupId=a.groupId order by x.data for xml path('')) as dataPath)b -- select distinct a.groupId from table_left a where exists(select 1 from table_right x where x.dataPath=a.dataPath) 方法2: 通過SQL Sever提供的集運算符"Except",判斷兩組非重復的數(shù)據(jù),。如果兩組針對對方都不存在非重復的數(shù)據(jù),,就說明這兩組數(shù)據(jù)完全相等。如,,表table_left中的組#11和表 table_right中的組#1,,對列data進行"Except"集運算,無任是(#11 à #1)進行Except集運算,還是(#1 à #11 )進行Except集合運算,,都返回空結果,,這就說明組#1 和#11的data數(shù)據(jù)完全相等,如: 復制代碼代碼如下: select data from table_left where groupId='#11' except select data from table_right where groupId='#1' select data from table_right where groupId='#1' except select data from table_left where groupId='#11' 同樣道理,,我們把表table_left中的組#11和表 table_right中的組#2,,對列data進行"Except"集運算,如: 復制代碼代碼如下: select data from table_left where groupId='#11' except select data from table_right where groupId='#2' select data from table_right where groupId='#2' except select data from table_left where groupId='#11' 只要(#11 à #2 )或 (#2 à #11 )的"Except"集運算結果有記錄,,就說明兩組的數(shù)據(jù)不相等,。 兩張表的所有組都進行比較,我們需要通過以下SQL腳本實現(xiàn),,如: 復制代碼代碼如下: select distinct a.groupId from table_left a inner join table_right b on b.data=a.data where not exists(select x.data from table_left x where x.groupId=a.groupId except select y.data from table_right y where y.groupId=b.groupId ) and not exists(select x.data from table_right x where x.groupId=b.groupId except select y.data from table_left y where y.groupId=a.groupId ) 完整代碼: 復制代碼代碼如下: View Code use tempdb go if object_id('table_left') is not null drop table table_left if object_id('table_right') is not null drop table table_right go create table table_left(groupId nvarchar(5),data nvarchar(10)) create table table_right(groupId nvarchar(5),data nvarchar(10)) go create nonclustered index ix_left on table_left(data) create nonclustered index ix_right on table_right(data) go set nocount on go insert into table_right(groupId,data) select '#1','data1' union all select '#1','data2' union all select '#1','data3' union all select '#2','data55' union all select '#2','data55' union all select '#3','data91' union all select '#3','data92' union all select '#4','data65' union all select '#4','data66' union all select '#4','data67' union all select '#4','data68' union all select '#4','data69' union all select '#5','data77' union all select '#5','data79' insert into table_left(groupId,data) select '#11','data1' union all select '#11','data2' union all select '#11','data3' union all select '#22','data55' union all select '#22','data57' union all select '#33','data99' union all select '#33','data99' union all select '#44','data66' union all select '#44','data68' union all select '#55','data77' union all select '#55','data78' union all select '#55','data79' go --select select distinct a.groupId from table_left a inner join table_right b on b.data=a.data where not exists(select x.data from table_left x where x.groupId=a.groupId except select y.data from table_right y where y.groupId=b.groupId ) and not exists(select x.data from table_right x where x.groupId=b.groupId except select y.data from table_left y where y.groupId=a.groupId ) 方法1 Vs. 方法2 : 方法1和方法2都能找出表table_left在table_right存在數(shù)據(jù)完全相等的組#11,。但性能角度上,方法2比方法1略勝一籌,,可以看它們執(zhí)行過程的統(tǒng)計信息: 方法1: 圖5. 方法2: 圖6. 如果,,數(shù)據(jù)量大情況下,那么方法2比方法1更具有明顯的優(yōu)點,。因為方法1,多兩個更新dataPath的部分,,數(shù)據(jù)量隨著增加,,這里位置的更新就耗很多的資源;如果dataPath列數(shù)據(jù)大小超過900字節(jié),,會導致無法在dataPath創(chuàng)建索引,,影響后面的Select查詢性能。 擴展 這里說擴展,,主要是針對上面的方法2來說,。在當列data的數(shù)據(jù)大小超過900字節(jié),或者含有多個數(shù)據(jù)列要進行比較,,看是否存在兩組(groupId)的各對應列數(shù)據(jù)一一相等,。 圖7. 這樣的情況,可對字段dataSub1 & dataSub2 創(chuàng)建一個哈希索引,,如: 復制代碼代碼如下: alter table table_left add dataChecksum as checksum(dataSub1,dataSub2) alter table table_right add dataChecksum as checksum(dataSub1,dataSub2) go create nonclustered index ix_table_left_cs on table_right(dataChecksum) create nonclustered index table_right_cs on table_right(dataChecksum) 后面的select查詢語句,,在Inner Join 部分稍改動下即可,如: 復制代碼代碼如下: select distinct a.groupId from table_left a inner join table_right b on b.dataChecksum=a.dataChecksum and b.dataSub1=a.dataSub1 and b.dataSub2=a.dataSub2 where not exists(select x.dataSub1,x.dataSub2 from table_left x where x.groupId=a.groupId except select y.dataSub1,y.dataSub2 from table_right y where y.groupId=b.groupId ) and not exists(select x.dataSub1,x.dataSub2 from table_right x where x.groupId=b.groupId except select y.dataSub1,y.dataSub2 from table_left y where y.groupId=a.groupId ) 完整代碼: 復制代碼代碼如下: View Code use tempdb go if object_id('table_left') is not null drop table table_left if object_id('table_right') is not null drop table table_right go create table table_left(groupId nvarchar(5),dataSub1 nvarchar(10),dataSub2 nvarchar(10)) create table table_right(groupId nvarchar(5),dataSub1 nvarchar(10),dataSub2 nvarchar(10)) go alter table table_left add dataChecksum as checksum(dataSub1,dataSub2) alter table table_right add dataChecksum as checksum(dataSub1,dataSub2) go create nonclustered index ix_table_left_cs on table_left(dataChecksum) create nonclustered index table_right_cs on table_right(dataChecksum) go set nocount on go insert into table_right(groupId,dataSub1,dataSub2) select '#1','data1','data7' union all select '#1','data2','data8' union all select '#1','data3','data9' union all select '#2','data55','data4' union all select '#2','data55','data5' insert into table_left(groupId,dataSub1,dataSub2) select '#11','data1','data7' union all select '#11','data2','data8' union all select '#11','data3','data9' union all select '#22','data55','data0' union all select '#22','data57','data2' union all select '#33','data99','data4' union all select '#33','data99','data6' go --select select distinct a.groupId from table_left a inner join table_right b on b.dataChecksum=a.dataChecksum and b.dataSub1=a.dataSub1 and b.dataSub2=a.dataSub2 where not exists(select x.dataSub1,x.dataSub2 from table_left x where x.groupId=a.groupId except select y.dataSub1,y.dataSub2 from table_right y where y.groupId=b.groupId ) and not exists(select x.dataSub1,x.dataSub2 from table_right x where x.groupId=b.groupId except select y.dataSub1,y.dataSub2 from table_left y where y.groupId=a.groupId ) 小結 對于這個問題,,可能還有其他的或更優(yōu)的解決方法.而且在實際的生產(chǎn)環(huán)境中,,可能碰到的情況會有所不同,無論如何,,需要多分析,,多動手多實驗,找到最優(yōu)的解決方法。 |
|
來自: icecity1306 > 《數(shù)據(jù)庫資料》