注册 登录  
 加关注
查看详情
   显示下一条  |  关闭
温馨提示!由于新浪微博认证机制调整,您的新浪微博帐号绑定已过期,请重新绑定!立即重新绑定新浪微博》  |  关闭

Oracle、MySQL资料及经验

.

 
 
 

日志

 
 

分析函数案例2——数据去重  

2014-04-07 18:51:48|  分类: sql与pl/sql |  标签: |举报 |字号 订阅

  下载LOFTER 我的照片书  |

--内容源于梁敬彬的sql优化课

 


DROP TABLE t purge ;
CREATE TABLE t AS SELECT * FROM dba_objects WHERE rownum<=10;
UPDATE t SET object_id=rownum;
UPDATE t SET object_id=3 WHERE object_id<=3;
UPDATE t SET object_id=4 WHERE object_id>=4 AND object_id<=6;
COMMIT;

select object_id,rowid from t;
 OBJECT_ID ROWID
---------- ------------------
         3 AAAR/0AAEAAAB57AAA
         3 AAAR/0AAEAAAB57AAB
         3 AAAR/0AAEAAAB57AAC
         4 AAAR/0AAEAAAB57AAD
         4 AAAR/0AAEAAAB57AAE
         4 AAAR/0AAEAAAB57AAF
         7 AAAR/0AAEAAAB57AAG
         8 AAAR/0AAEAAAB57AAH
         9 AAAR/0AAEAAAB57AAI
        10 AAAR/0AAEAAAB57AAJ

10 rows selected.


--以下例子顺便 开启set autotrace on 来看看分析函数和非分析函数写法的性能差异
方法1:

SET autotrace ON
SET linesize 2000
SET pagesize 2000

非分析函数写法:
delete from t where rowid < (select max(rowid) from t t2 where t.object_id = t2.object_id);

已删除4行。

执行计划
----------------------------------------------------------
Plan hash value: 3804998351

---------------------------------------------------------------------------------
| Id  | Operation             | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | DELETE STATEMENT      |         |     1 |    50 |     8  (25)| 00:00:01 |
|   1 |  DELETE               | T       |       |       |            |          |
|*  2 |   HASH JOIN           |         |     1 |    50 |     8  (25)| 00:00:01 |
|   3 |    VIEW               | VW_SQ_1 |     6 |   150 |     4  (25)| 00:00:01 |
|   4 |     SORT GROUP BY     |         |     6 |   150 |     4  (25)| 00:00:01 |
|   5 |      TABLE ACCESS FULL| T       |     6 |   150 |     3   (0)| 00:00:01 |
|   6 |    TABLE ACCESS FULL  | T       |     6 |   150 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("T"."OBJECT_ID"="ITEM_1")
       filter(ROWID<"MAX(ROWID)")

Note
-----
   - dynamic sampling used for this statement (level=2)


统计信息
----------------------------------------------------------
         50  recursive calls
          6  db block gets
         40  consistent gets
          0  physical reads
          0  redo size
        684  bytes sent via SQL*Net to client
        659  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
          4  rows processed
       
ROLLBACK;

--方法2(注意,这里的写法和上面的并不完全等价,上面是随便删除,保留rowid最大一条,下面是保留时间最新的,上面既然可以随便删了,下面的写法肯定也是符合要求的。其

中row_number必须要有order BY 关键字):
select object_id, rowid rid, row_number() over(partition by object_id ORDER by created desc) rn from t;

 OBJECT_ID RID                        RN
---------- ------------------ ----------
         3 AAAR/0AAEAAAB57AAA          1
         3 AAAR/0AAEAAAB57AAB          2
         3 AAAR/0AAEAAAB57AAC          3
         4 AAAR/0AAEAAAB57AAD          1
         4 AAAR/0AAEAAAB57AAE          2
         4 AAAR/0AAEAAAB57AAF          3
         7 AAAR/0AAEAAAB57AAG          1
         8 AAAR/0AAEAAAB57AAH          1
         9 AAAR/0AAEAAAB57AAI          1
        10 AAAR/0AAEAAAB57AAJ          1

select * from (select object_id, rowid rid, row_number() over(partition by object_id order by object_id) rn from t) where rn > 1;

 OBJECT_ID RID                        RN
---------- ------------------ ----------
         3 AAAR/0AAEAAAB57AAB          2
         3 AAAR/0AAEAAAB57AAC          3
         4 AAAR/0AAEAAAB57AAE          2
         4 AAAR/0AAEAAAB57AAF          3


delete t where rowid in
    (select rid from (select rowid rid, row_number() over(partition by object_id ORDER by created desc) rn from t) where rn > 1);
已删除4行。                 
                 

执行计划
----------------------------------------------------------
Plan hash value: 2005107446

-----------------------------------------------------------------------------------------
| Id  | Operation                    | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | DELETE STATEMENT             |          |     1 |    24 |     6  (34)| 00:00:01 |
|   1 |  DELETE                      | T        |       |       |            |          |
|   2 |   NESTED LOOPS               |          |     1 |    24 |     6  (34)| 00:00:01 |
|   3 |    VIEW                      | VW_NSO_1 |     6 |    72 |     4  (25)| 00:00:01 |
|   4 |     SORT UNIQUE              |          |     1 |   150 |            |          |
|*  5 |      VIEW                    |          |     6 |   150 |     4  (25)| 00:00:01 |
|   6 |       WINDOW SORT            |          |     6 |   204 |     4  (25)| 00:00:01 |
|   7 |        TABLE ACCESS FULL     | T        |     6 |   204 |     3   (0)| 00:00:01 |
|   8 |    TABLE ACCESS BY USER ROWID| T        |     1 |    12 |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - filter("RN">1)

Note
-----
   - dynamic sampling used for this statement (level=2)


统计信息
----------------------------------------------------------
         48  recursive calls
          6  db block gets
         38  consistent gets
          0  physical reads
          0  redo size
        684  bytes sent via SQL*Net to client
        813  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          3  sorts (memory)
          0  sorts (disk)
          4  rows processed

 


虽然目前来看,第一种方法效率也还可以,不过如果是多个字段重复,第一种方法就有些麻烦了,而且如果需求真的是要根据日期保留重复记录中最新的一个,也非常的繁琐了。

不过这类要记住,要是表记录非常大,比如有几千万,而重复记录的数理又特别巨大,这时候可以考虑直接把不重复的数据单独建出来,也是一个灵活的好办法。

 


分解过程:

 SELECT object_id,rowid rid,
          row_number() over(partition by object_id ORDER by created desc) rn
  from t;

 OBJECT_ID RID                        RN
---------- ------------------ ----------
         3 AAAbW0AALAABRADAAA          1
         3 AAAbW0AALAABRADAAB          2
         3 AAAbW0AALAABRADAAC          3
         4 AAAbW0AALAABRADAAD          1
         4 AAAbW0AALAABRADAAE          2
         4 AAAbW0AALAABRADAAF          3
         7 AAAbW0AALAABRADAAG          1
         8 AAAbW0AALAABRADAAH          1
         9 AAAbW0AALAABRADAAI          1
        10 AAAbW0AALAABRADAAJ          1

 

SELECT object_id, rid,rn
            FROM (select object_id,rowid rid,
                       row_number() over(partition by object_id ORDER by created desc) rn
                   from t)
      where rn > 1;

 OBJECT_ID RID                        RN
---------- ------------------ ----------
         3 AAAbW0AALAABRADAAB          2
         3 AAAbW0AALAABRADAAC          3
         4 AAAbW0AALAABRADAAE          2
         4 AAAbW0AALAABRADAAF          3
        
        

--然后就可以根据这些RID,删除了,就留下一条了。
        
delete t
 where rowid in (select rid
                   from (select rowid rid,
                                row_number() over(partition by object_id ORDER by created desc) rn
                           from t)
                  where rn > 1);        

  评论这张
 
阅读(232)| 评论(0)
推荐 转载

历史上的今天

在LOFTER的更多文章

评论

<#--最新日志,群博日志--> <#--推荐日志--> <#--引用记录--> <#--博主推荐--> <#--随机阅读--> <#--首页推荐--> <#--历史上的今天--> <#--被推荐日志--> <#--上一篇,下一篇--> <#-- 热度 --> <#-- 网易新闻广告 --> <#--右边模块结构--> <#--评论模块结构--> <#--引用模块结构--> <#--博主发起的投票-->
 
 
 
 
 
 
 
 
 
 
 
 
 
 

页脚

网易公司版权所有 ©1997-2018