How to Create the Right Index - 白红宇

Published: 2019-05-20

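When you take over a system, you usually start by checking system load and CPU and memory utilization, then look at the Statspack top 5 wait events and the SQL statements with the most logical and physical reads, and make a first round of optimizations. As you come to understand the business better, you start to look at the database design from the system's point of view: whether the application is implemented sensibly, whether a better approach exists, and so on. Suppose a Statspack report has pointed you to a resource-hungry SQL statement, the table has been analyzed, and the execution plan already uses an index. How do you judge whether the statement is actually well optimized? Consider the following real case: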

1. Extract the SQL with the highest logical reads

Buffer Gets  Executions  Gets per Exec  %Total  CPU Time (s)  Elapsd Time (s)  Hash Value
-----------  ----------  -------------  ------  ------------  ---------------  ----------
  6,813,699         336       20,278.9    10.1         66.72            80.45  3039661161

module: java@admin1 (tns v1-v3)

select b.biz_source, count(*) as counts
  from tb_hanzgs_detail a, tb_business_info b
 where a.id = b.hanzgs_id
   and a.status = :1
   and a.deal_id = :2
   and a.create_date >= to_date(:3,'yyyy-mm-dd hh24:mi:ss')
   and a.create_date < to_date(:4,'yyyy-mm-dd hh24:mi:ss')
 group by b.biz_source
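The 20,278.9 figure in the report line above is the total buffer gets divided by the number of executions. A quick check of that arithmetic, with the numbers copied from the report (the variable names are ours, chosen for illustration):

```python
# Figures copied from the Statspack line for hash value 3039661161.
buffer_gets = 6_813_699   # total logical reads in the snapshot interval
executions = 336          # number of executions in the same interval

# Logical reads per execution, the number Statspack prints as "Gets per Exec".
gets_per_exec = buffer_gets / executions

print(f"gets per exec: {gets_per_exec:,.1f}")
```

Roughly twenty thousand logical reads for every execution of a query that returns only a few aggregated rows is what makes this statement worth a closer look.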

2. Check the execution plan

SQL> explain plan for
     select b.biz_source, count(*) as counts
       from tb_hanzgs_detail a, tb_business_info b
      where a.id = b.hanzgs_id
        and a.status = :1
        and a.deal_id = :2
        and a.create_date >= to_date(:3, 'yyyy-mm-dd hh24:mi:ss')
        and a.create_date < to_date(:4, 'yyyy-mm-dd hh24:mi:ss')
      group by b.biz_source;

Explained.

SQL> @?/rdbms/admin/utlxpls

Plan hash value: 1387434542

-------------------------------------------------------------------------------------------
| Id | Operation                         | Name                     | Rows | Bytes | Cost |
-------------------------------------------------------------------------------------------
|  0 | select statement                  |                          |    1 |    31 |  215 |
|  1 |  sort group by                    |                          |    1 |    31 |  215 |
|  2 |   filter                          |                          |      |       |      |
|  3 |    nested loops                   |                          |    1 |    31 |  199 |
|  4 |     table access by index rowid   | tb_hanzgs_detail         |    1 |    21 |  198 |
|  5 |      index range scan             | ind_tb_hanzgs_create     |  231 |       |  397 |
|  6 |     index range scan              | ind_tb_business_info_biz |    1 |    10 |    1 |
-------------------------------------------------------------------------------------------

Index definition:

create index ind_tb_hanzgs_create on tb_hanzgs_detail (create_date, deal_id, status, id)
  tablespace tbs_tb_ind online compute statistics;

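The plan shows that the SQL uses the index that leads with create_date. In an OLTP system, a frequently executed query is a poor fit for an index led by a time column: the range predicate on the leading column makes the scan visit every index entry in the time window before deal_id and status can filter anything out.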

3. Check the statement's execution time

SQL> select b.biz_source, count(*) as counts
       from tb_hanzgs_detail a, tb_business_info b
      where a.id = b.hanzgs_id
        and a.status = 1
        and a.deal_id = 0
        and a.create_date >= sysdate - 10
        and a.create_date < sysdate - 5
      group by b.biz_source;

biz_source     counts
---------- ----------
       102        712
       501       7881
       701       1465

3 rows selected.

elapsed: 00:00:17.03

SQL> /

biz_source     counts
---------- ----------
       102        713
       501       7882
       701       1465

3 rows selected.

elapsed: 00:00:05.32

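On a repeated run the statement takes about 5.3 seconds. In an OLTP system where the query runs frequently, a full table scan is without doubt the most expensive option, but scanning an index by time range is also inefficient, since a single time window still holds a lot of rows. Looking closely at the SQL, if the status and deal_id columns are highly selective, would an index built on them improve performance substantially?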

4. Check the data distribution of the table

SQL> select status, count(*) as counts
       from tb_hanzgs_detail
      where create_date >= sysdate - 50
        and create_date < sysdate - 49 + 1
      group by status;

   status     counts
--------- ----------
        0          2
        1        286
        2       3567
        3     123477

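Judging from the distribution over a few randomly sampled days, about 97% of the rows in the table have status = 3 and very few rows have any other status, so an index that leads with status should improve performance considerably for those rare values. Now consider status = 3. It accounts for most of the table, so a query on it consumes a lot of resources and may even end up as a full table scan. Such a query clearly does not belong in a frequently queried OLTP system, and a DBA would not allow this kind of SQL to be deployed to production because the load on the database would be too high. Strictly speaking, the distribution of deal_id should be weighed as well. I have arbitrarily skipped it here, since my aim is only to offer a way of thinking about the problem.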
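The shares behind that observation can be recomputed from the one-day sample shown in step 4. The sketch below only restates the counts from that query output; the dictionary and variable names are ours.

```python
# Row counts per status for the sampled day, copied from the query output.
status_counts = {0: 2, 1: 286, 2: 3567, 3: 123477}

total = sum(status_counts.values())

# Fraction of the sampled rows carrying each status value.
share = {status: count / total for status, count in status_counts.items()}

for status, fraction in share.items():
    print(f"status={status}: {status_counts[status]:>7} rows, {fraction:.2%} of the sample")
```

A predicate such as status = 1 touches well under 1% of the sampled rows, while status = 3 touches almost all of them, which is why the same index can be very good for one bind value and nearly useless for another.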

5. Try a new index that leads with status

create index ind_tb_hanzgs_de_sta on tb_hanzgs_detail (status, deal_id, create_date, id)
  tablespace tbs_tb_ind online compute statistics;
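Why the column order matters can be illustrated with a toy model. The sketch below is not how Oracle implements a B-tree; it assumes a sorted Python list of key tuples as a stand-in for an index, uses made-up data (30 days of 1,000 rows each, with status skewed toward 3), and counts how many index entries a range scan has to visit under each column order. The function names, the data, and the query values are all hypothetical.

```python
from bisect import bisect_left

def build_rows():
    """Synthetic rows: 97% have status 3; statuses 0, 1 and 2 get 1% each."""
    rows = []
    for day in range(30):
        for n in range(1000):
            status = n % 100 if n % 100 < 3 else 3
            deal_id = (n // 100) % 2
            rows.append({"id": day * 1000 + n, "day": day,
                         "deal_id": deal_id, "status": status})
    return rows

def range_scan(index, low, high):
    """Return the entries with low <= key < high from a sorted key list."""
    return index[bisect_left(index, low):bisect_left(index, high)]

rows = build_rows()

# Stand-ins for the two indexes: same columns, different leading column.
date_led = sorted((r["day"], r["deal_id"], r["status"], r["id"]) for r in rows)
status_led = sorted((r["status"], r["deal_id"], r["day"], r["id"]) for r in rows)

# Query modeled: status = 1 and deal_id = 0 and 10 <= day < 15.
# Date-led order: only the day range bounds the scan, the rest is filtering.
scanned_date = range_scan(date_led, (10,), (15,))
hits_date = [e for e in scanned_date if e[1] == 0 and e[2] == 1]

# Status-led order: both equality predicates and the day range bound the scan.
scanned_status = range_scan(status_led, (1, 0, 10), (1, 0, 15))

print(f"date-led:   visited {len(scanned_date)} entries, kept {len(hits_date)}")
print(f"status-led: visited {len(scanned_status)} entries, kept {len(scanned_status)}")
```

In this model both orders return the same rows, but the date-led order visits every entry in the five-day window, while the status-led order visits only the matching entries. Running the status-led scan with status = 3 instead visits far more entries, which is consistent with the caution about the dominant value.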

New execution plan

SQL> explain plan for
     select b.biz_source, count(*) as counts
       from tb_hanzgs_detail a, tb_business_info b
      where a.id = b.hanzgs_id
        and a.status = :1
        and a.deal_id = :2
        and a.create_date >= to_date(:3, 'yyyy-mm-dd hh24:mi:ss')
        and a.create_date < to_date(:4, 'yyyy-mm-dd hh24:mi:ss') + 1
      group by b.biz_source;