Ticket #72 (closed 优化: fixed)

Opened 14 years ago

Last modified 14 years ago

增量同步任务耗时优化

Reported by: chenchongqi Owned by:
Priority: major Milestone:
Component: 产品库 Version: 产品库3.0
Keywords: 数据库优化,P_PDLPRODUCTFRONT_ADD Cc:
Due Date: 06/12/2011

Description (last modified by chenchongqi) (diff)

现象

同步任务的存储过程P_PDLPRODUCTFRONT_ADD中,删除图片语句非常慢,大约20秒左右,拖慢整个存储过程。删除语句本身非常简单,但是表数据有450万,没有筛选条件。

DELETE FROM PDL_PRODUCT_PIC A WHERE A.TYPE IN(3,13,5,11,12) AND ID NOT IN(
SELECT MAX(ID) FROM PDL_PRODUCT_PIC B 
WHERE A.PRODUCT_ID = B.PRODUCT_ID AND A.TYPE = B.TYPE)

方案

  • 方案一:增量同步里去掉删除语句,只在全同步中保留,但是了解到如果白天编辑误操作,会导致增量同步后面部分执行失败,因为有可能返回多条记录:
  • 方案二:不删除增量同步里的删除语句,但是缩小范围,因为每15分钟执行一次,那么只要检查15-20分钟内新上传图片对应的产品就行了:
    DELETE FROM PDL_PRODUCT_PIC A WHERE A.TYPE IN(3,13,5,11,12) AND ID NOT IN(
    SELECT MAX(ID) FROM PDL_PRODUCT_PIC B WHERE A.PRODUCT_ID = B.PRODUCT_ID AND A.TYPE = B.TYPE 
    and b.upload_date>sysdate - interval '20' MINUTE)
    

这里有两个地方要特别注意:

  1. A.PRODUCT_ID = B.PRODUCT_ID AND A.TYPE = B.TYPE 这两个条件不能漏否则要删除一大片。
  2. 外面的语句不能加时间限制,否则会漏删,假设原来已经有一条记录在1天前的情况,大家感兴趣的可以自己试试。

结果

优化后的删除语句线上执行显示时间在1秒内,因为原表的upload_date有索引所以能用上,而原语句没条件是全表扫描,这是主要区别。
大家可以注意到,这不是一个纯技术上的优化,而是结合业务特点(有效时间范围,排重)来优化。

Change History

comment:1 Changed 14 years ago by chenchongqi

  • Status changed from new to closed
  • Resolution set to fixed
  • Description modified (diff)

comment:2 Changed 14 years ago by chenchongqi

顺便bs一下达荣:

20分钟前这样写:

SELECT sysdate-(20/(60*24)) into after15Date FROM dual;

其实这样就可以了

SELECT sysdate - interval '20' MINUTE into after15Date FROM dual;

comment:3 follow-up: ↓ 5 Changed 14 years ago by chenyinle

的确是了解业务和熟悉数据才能做的优化,还有,最后一句有个字写错了

comment:4 Changed 14 years ago by chenchongqi

  • Description modified (diff)

comment:5 in reply to: ↑ 3 Changed 14 years ago by chenchongqi

Replying to chenyinle:

的确是了解业务和熟悉数据才能做的优化,还有,最后一句有个字写错了

改鸟

Note: See TracTickets for help on using tickets.