MySQL性能调优 – 使用UNION ALL代替UNION
最近遇到了一个MySQL性能调优的问题,在数据库中有一张主要用了2个UNION来联合3张表的视图,而所有对于这个视图的查询反应都很慢,经过一步步的调试(说句题外话,用SQLYog来调试MySQL真的很方便),最终确定问题出在这个视图里所用的UNION上。
查了查MySQL的文档,这么说到
The default behavior for UNION is that duplicate rows are removed from the result. The optional DISTINCT keyword has no effect other than the default because it also specifies duplicate-row removal. With the optional ALL keyword, duplicate-row removal does not occur and the result includes all matching rows from all the SELECT statements.
You can mix UNION ALL and UNION DISTINCT in the same query. Mixed UNION types are treated such that a DISTINCT union overrides any ALL union to its left. A DISTINCT union can be produced explicitly by using UNION DISTINCT or implicitly by using UNION with no following DISTINCT or ALL keyword.
举个例子:
如果union两个不同的select,最终的结果则是2个select的集合:
select 1 union select 2; +---+ | 1 | +---+ | 1 | | 2 | +---+ 2 rows in set (0.00 sec)
而如果union的对象包含重复的数据,例如下条语句包含了两个select 2,默认则会去除重复部分:
select 1 union select 2 union select 2; +---+ | 1 | +---+ | 1 | | 2 | +---+ 2 rows in set (0.00 sec)
而如果将上条语句改为union all的话,则可以显示所有的结果,包含重复部分:
select 1 union all select 2 union all select 2; </strong>+---+ | 1 | +---+ | 1 | | 2 | | 2 | +---+ 3 rows in set (0.00 sec)
可以看到,union和union all的差别就在于union会对数据做一个distanct的动作,而这个distanct动作的速度则取决于现有数据的数量,数量越大则时间也越慢。而对于几个数据集,要确保数据集之间的数据互相不重复,基本是O(n)的算法复杂度。
有了理论依据后,便动手更改view的结构,在确保数据逻辑上不会有重复情况出现后,将2个union都改成了union all,query的反应速度从1.7秒变成了300毫秒左右,耗费时间只有以前的17%。