【原】MySQL窗口函數(shù)的妙用

貪挽懶月 2022-11-12 發(fā)布于廣東

展開全文

問題引入

有一張成績表,，里面有若干個學生，他們來自三個班級,，每個學生學習了兩門課程,，現(xiàn)在要求查詢出各個班級每門課程的前兩名學生id。

分析

要各個班級每門課程的前兩名,，第一反應肯定要根據(jù)班級和課程去分組,，但實際上你要是用group by會發(fā)現(xiàn)不好處理，因為我們要取每個班的前兩名,，如果你用group by再用limit,，那返回的數(shù)據(jù)是在總數(shù)中的取 n 條記錄，而不是每個班取 n 條記錄,。所以我們得用其他方式實現(xiàn),，比如窗口函數(shù)。

窗口函數(shù)

窗口函數(shù)是可以對數(shù)據(jù)庫進行實時分析處理的函數(shù),，可以理解為它是對where或group by處理后的結(jié)果再進行操作,，基本語法如下：
<窗口函數(shù)> over (partition by <用于分組的列名> order by <用于排序的列名>)
窗口函數(shù)又分為兩類：
專用窗口函數(shù)，比如 rank,、dense_rank,、row_number等
聚合函數(shù)，比如sum,、avg,、count、max,、min等

用法

回到剛才那個問題,，用窗口函數(shù)怎么處理呢？

首先準備一張表,，再插入一些數(shù)據(jù)

create table grade
(
    id        int(10) primary key auto_increment comment '主鍵',
    stu_id    int(10) comment '學生id',
    class_id  int(10) comment '班級id',
    course_id int(3)  comment '課程id',
    score     int(3)  comment '學生分數(shù)',
    unique key (stu_id, course_id)
) engine = innodb
  charset = utf8;

insert into grade (stu_id, class_id, course_id, score) values ('1', 1, 1, 90), ('4', 1, 1, 90),
       ('7', 1, 1, 84), ('10', 1, 1, 84), ('13', 1, 1, 88), ('1', 1, 2, 67), ('4', 1, 2, 85),
       ('7', 1, 2, 90), ('10', 1, 2, 88), ('13', 1, 2, 86);

insert into grade (stu_id, class_id, course_id, score) values ('2', 2, 1, 83), ('5', 2, 1, 94), 
       ('8', 2, 1, 81), ('11', 2, 1, 91), ('14', 2, 1, 79), ('2', 2, 2, 99), ('5', 2, 2, 80),
       ('8', 2, 2, 82), ('11', 2, 2, 76), ('14', 2, 2, 66);

insert into grade (stu_id, class_id, course_id, score) values ('3', 3, 1, 98), ('6', 3, 1, 92),
       ('9', 3, 1, 76), ('12', 3, 1, 73), ('15', 3, 1, 83), ('3', 3, 2, 95), ('6', 3, 2, 91),
       ('9', 3, 2, 86), ('12', 3, 2, 87), ('15', 3, 2, 68);

這里就是建立了一個成績表,，然后往表中插入了15個學生，他們來自三個班級,，每個學生學習了兩門課程,。

rank函數(shù)的用法

按照上面窗口函數(shù)的語法，寫出如下SQL：

select *
from (select *, rank() over (partition by class_id, course_id order by score desc ) as ranking
      from grade) t
where t.ranking <= 2;

先看里面窗口函數(shù)那一層,，首先是用了rank(),，partition by class_id, course_id就表示根據(jù)class_id和course_id來分組，order by score desc就表示按照分數(shù)降序，然后把分組且分數(shù)降序的結(jié)果作為ranking字段,，就是排名。外層ranking <= 2就表示每個分組取兩條數(shù)據(jù),。

這樣查詢出來的就滿足要求了,，為了讓結(jié)果更加清晰，外層稍微改寫一下,，不要select *,，改成如下所示：

select stu_id,
       case when class_id = 1 then '六(1)班' when class_id = 2 then '六(2)班' else '六(3)班' end as class,
       IF(course_id = 1, '語文', '數(shù)學')                                                       as course,
       score,
       ranking
from (select *, rank() over (partition by class_id, course_id order by score desc ) as ranking
      from grade) t
where t.ranking <= 2;

執(zhí)行結(jié)果如下：可以看到，六(1)班語文有兩個90分的,，他們并列第一,，但是用rank的時候，第二個90分的也占了一個名額,，并不會把分數(shù)第二大的學生查詢出來,。3. dense_rank()的用法直接將rank換成dense_rank就可以看出區(qū)別了。dense_rank兩個人并列第一名只會占用一個名額,，會把分數(shù)第二大的也查出來,，所以總共查出了13個學生。4. row_number()的用法換成row_number再看執(zhí)行結(jié)果：

我用的MySQL5.x,，沒有這些窗口函數(shù)怎么辦,？

上面說的窗口函數(shù)要MySQL8.0+才支持，5.x的話可以自己去實現(xiàn),。比如要實現(xiàn)一個row_number(),，格式如下：

select <要查詢的字段>,
      ranking
from (select @ranking := if(@<分組字段1> = <分組字段1> and @<分組字段2> = <分組字段2>, @ranking + >1, 1) as ranking,
            @<分組字段1> := <分組字段1>                                                              as <分組字段1>,
            @<分組字段2> := <分組字段2>                                                              as <分組字段2>,
            <排序字段>
            <其他需要查詢出來的字段>
     from (select * from <表名> order by <分組字段1>, <分組字段2>, <排序字段> desc) a,
          (select @ranking = 0, @<分組字段一> = 0) b
    ) c
where ranking <= <要取的條數(shù)>;

根據(jù)題目要求，將分組字段和排序字段代入上面的公式,，可得：

select stu_id,
       case when class_id = 1 then '六(1)班' when class_id = 2 then '六(2)班' else '六(3)班' end as class,
       IF(course_id = 1, '語文', '數(shù)學')                                                       as course,
       score,
       ranking
from (select @ranking := if(@class_id = class_id and @course_id = course_id, @ranking + 1, 1) as ranking,
             @class_id := class_id                                                            as class_id,
             @course_id := course_id                                                          as course_id,
             score,
             stu_id
      from (select * from grade order by class_id, course_id, score desc) a,
           (select @ranking = 0, @class_id = 0) b
     ) c
where ranking <= 2;