alert日志報錯信息: Wed Jun 18 09:03:51 2008 原因分析: DBA_2PC_PENDING Oracle會自動處理分布事務(wù),保證分布事務(wù)的一致性,所有站點全部提交或全部回滾。一般情況下,處理過程在很短的時間內(nèi)完成,根本無法察覺到。但是,如果在commit或rollback的時候,出現(xiàn)了連接中斷或某個數(shù)據(jù)庫站點CRASH的情況,則提交操作可能會無法繼續(xù),此時DBA_2PC_PENDING和DBA_2PC_NEIGHBORS中會包含尚未解決的分布事務(wù)。 對于絕大多數(shù)情況,當(dāng)恢復(fù)連接或CRASH的數(shù)據(jù)庫重新啟動后,會自動解決分布式事務(wù),不需要人工干預(yù)。只有分布事務(wù)鎖住的對象急需被訪問,鎖住的回滾段阻止了其他事務(wù)的使用,網(wǎng)絡(luò)故障或CRASH的數(shù)據(jù)庫的恢復(fù)需要很長的時間等情況出現(xiàn)時,才使用人工操作的方式來維護分布式事務(wù)。
SQL> conn /as sysdba LOCAL_TRAN_ID STATE MIX A FAIL_TIME RETRY_TIME SQL> col LOCAL_TRAN_ID for a15 LOCAL_TRAN_ID IN_OUT DATABASE INTERFACE SQL> COMMIT FORCE '21.3.5270155';
SQL> select count(*) from pending_trans$; COUNT(*) SQL> select LOCAL_TRAN_ID,STATUS,STATE,TOP_DB_USER from pending_trans$; LOCAL_TRAN_ID S STATE TOP_DB_USER SQL> COMMIT FORCE '21.3.5270155'; SQL> COMMIT FORCE '21.3.5270155';
*
LOCAL_TRAN_ID
*
提交完成。 SQL> alter session set "_smu_debug_mode" = 4; 會話已更改。 SQL> EXECUTE DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('21.3.5270155'); PL/SQL 過程已成功完成。 SQL> SELECT LOCAL_TRAN_ID, GLOBAL_TRAN_ID FROM DBA_2PC_PENDING; 未選定行 SQL> SELECT LOCAL_TRAN_ID, IN_OUT, DATABASE, INTERFACE FROM DBA_2PC_NEIGHBORS; 未選定行 SQL> select LOCAL_TRAN_ID,STATE from dba_2pc_pending; 未選定行 SQL> commit; 提交完成。 SQL> exit
In this Document
--------------------------------------------------------------------------------
Applies to: Symptoms SQL> EXECUTE DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('5.23.2386'); BEGIN DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('5.23.2386'); END; SQL> execute DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('37.16.108');
If the remote database no longer exists then the transaction will have to be Follow the instructions on how to purge a 1. Identify the id of the transaction: SQL> SELECT LOCAL_TRAN_ID, GLOBAL_TRAN_ID FROM DBA_2PC_PENDING; 2. Purge the transaction: SQL> EXECUTE DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY(''); 3. Confirm that the transaction has been purged: SQL> SELECT LOCAL_TRAN_ID, GLOBAL_TRAN_ID FROM DBA_2PC_PENDING;
BEGIN DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('5.23.2386'); END;
Use the following workaround: You have to use local_tran_id..... Issue commit before alter system set "_smu_debug_mode" = 4; Follow the steps, SQL>commit; SQL> alter session set "_smu_debug_mode" = 4; SQL>execute DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('local_tran_id');
When executing the following procedure(dbms_transaction.purge_lost_db_entry) SQL> execute DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('37.16.108'); ==>For example.. Fix: The transaction to be deleted is in the prepared state and has to be either SQL> select LOCAL_TRAN_ID,STATE from dba_2pc_pending; LOCAL_TRAN_ID STATE SQL> rollback force '37.16.108'; ==>For example Rollback complete. SQL> select LOCAL_TRAN_ID,STATE from dba_2pc_pending; LOCAL_TRAN_ID STATE SQL>Commit; SQL>alter system set "_smu_debug_mode" = 4; SQL> exec dbms_transaction.purge_lost_db_entry('37.16.108'); ==>For example References ============ 其它參考: http://blog./u/12960/showart_457785.html 最近數(shù)據(jù)庫突然出現(xiàn)RECO進程不停的報ORA-02068和ORA-03113的錯誤:
Errors in file /oracle/admin/UBISP/bdump/ubisp_reco_23401.trc:
ORA-02068: following severe error from DBNAME
ORA-03113: end-of-file on communication channel 檢查trace文件發(fā)行tran號總是那幾個,執(zhí)行語句:
select * from dba_2pc_pending;
可以看到記錄的所有LOCAL_TRAN_ID與trace中的一樣,這太奇怪了,從現(xiàn)象上看只能問題RECO進程無法清除這些失敗后的事物,這個問題在RAC環(huán)境有可能出現(xiàn),但單節(jié)點上為什么出現(xiàn)這個問題還不清楚。解決方法如下:
1. select * from dba_2pc_pending;
2. alter system disable distributed recovery ; 3. alter session set "_smu_debug_mode" = 4 ;
4. execute DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('<transaction id>');
5. select * from dba_2pc_pending; 6. alter system enable distributed recovery; 如果有多個事物,需要在第四步后面執(zhí)行commit;
_smu_debug_mode缺省是0,可以在完成之后改回來,查看該隱藏參數(shù)的命令是:
set linesize 132 column name format a30 column value format a25 select x.ksppinm name, y.ksppstvl value, y.ksppstdf isdefault, decode(bitand(y.ksppstvf,7),1,'MODIFIED',4,'SYSTEM_MOD','FALSE') ismod, decode(bitand(y.ksppstvf,2),2,'TRUE','FALSE') isadj from sys.x$ksppi x, sys.x$ksppcv y where x.inst_id = userenv('Instance') and y.inst_id = userenv('Instance') and x.indx = y.indx and x.ksppinm like '%_smu_debug_mode%' order by translate(x.ksppinm, ' _', ' '); 另外: http://www./dodd/2008/02/ora03113.html 大家知道,ORA-03113錯誤是Oracle數(shù)據(jù)庫常見的錯誤,導(dǎo)致這個錯誤的原因比較復(fù)雜,各種各樣的原因??赡苁蔷W(wǎng)絡(luò)中斷引起的、也可能是數(shù)據(jù)庫本身出現(xiàn)了問題。
下面就一個案例,分析一下ORA-03113錯誤。
故障現(xiàn)象:
開始alert文件提示錯誤: Error 1013 trapped in 2PC on transaction 1.60.1257421. Cleaning up. Error stack returned to user: ORA-02050: transaction 1.60.1257421 rolled back, some remote DBs may be in-doubt ORA-01013: user requested cancel of current operation ORA-06553: PLS-103: Encountered the symbol "EXCEPTION" when expecting one of the following: begin case declare exit for function goto if loop mod null package pragma procedure raise return select separate type update while with <an identifier> <a double-quoted delimited-identifier> <a bind variable> << form table call close current define delete fetch lock Mon Feb 18 09:07:19 2008 DISTRIB TRAN SMSBOSS.09aad41c.1.60.1257421 is local tran 1.60.1257421 (hex=01.3c.132fcd) insert pending collecting tran, scn=8914343855672 (hex=81b.884c8638) 然后時不時的會提示下面錯誤: ERROR, tran=1.60.1257421, session#=1, ose=0: ORA-03113: end-of-file on communication channel *** 2008-02-18 09:45:25.919 ERROR, tran=1.60.1257421, session#=1, ose=0: ORA-03113: end-of-file on communication channel *** 2008-02-18 10:19:42.891
Oracle數(shù)據(jù)庫只有這些錯誤提示,其余狀態(tài)均正常。
從錯誤提示看,應(yīng)該是由于分布事務(wù)由于人為cancel中止,引起的事務(wù)失敗,下面查看相關(guān)信息:
SQL> select LOCAL_TRAN_ID,GLOBAL_TRAN_ID,STATE,MIXED,HOST,COMMIT# from dba_2pc_pending;
LOCAL_TRAN_ID GLOBAL_TRAN_ID STATE MIX HOST COMMIT# ---------------------- -------------------- ---------------- --- -------------------- ---------------- 1.60.1257421 SMSBOSS.09aad41c.1.60.1257421 collecting no WORKGROUP\LIUQING 8914343855672
SQL> select * from DBA_2PC_NEIGHBORS;
LOCAL_TRAN_ID IN_ DATABASE DBUSER_OWNER I DBID SESS# BRANCH ---------------------- --- --------------- ------------------------------ ----------------- ---------- -------------------- 1.60.1257421 in BOSSMGR N 1 0000
1.60.1257421 out SMSDBN BOSSMGR N cc3ddb9b 1 4
select * from DBA_2PC_PENDING@smsdbn;
no rows selected
select * from DBA_2PC_NEIGHBORS@smsdbn;
no rows selected
dba_2pc_pending視圖記錄等待恢復(fù)的分布式事務(wù)的信息 dba_2pc_neighbors視圖記錄未決的分布式事務(wù)的輸入輸出連接信息
有上述信息分析原因,1.60.1257421事務(wù)的狀態(tài)為collecting,本機數(shù)據(jù)庫數(shù)據(jù)流向為in,遠(yuǎn)端smsdbn數(shù)據(jù)庫流向為out。
詢問業(yè)務(wù)人員,確實運行過這么個一存儲,中途手工中止了。并且是從smsdbn數(shù)據(jù)庫里select數(shù)據(jù)然后update本地數(shù)據(jù)庫。 這基本證實了我們的猜測。
下面嘗試force commit或者 force rollback此事務(wù),
SQL> commit force '1.60.1257421'; commit force '1.60.1257421' * ERROR at line 1: ORA-02058: no prepared transaction found with ID 1.60.1257421
上述錯誤的原因是由于collecting狀態(tài)的事務(wù)不需要commit/rollback force。 我們現(xiàn)在需要做的就是:
(1) Disable分布式恢復(fù) SQL> ALTER SYSTEM DISABLE DISTRIBUTED RECOVERY; System altered.
(2)Puege(清空)in-doubt transaction entry:
SQL> exec DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('1.60.1257421'); PL/SQL procedure successfully completed.
(3)然后enable 分布式恢復(fù): SQL> ALTER SYSTEM ENABLE DISTRIBUTED RECOVERY;
參考信息/更多閱讀:
Note:1012842.102 ORA-2019 ORA-2058 ORA-2068 ORA-2050: Failed Distributed Transactions
Note:100664.1 How to Troubleshoot Distributed Transactions
Note:274321.1
Note:126069.1 Manually Resolving In-Doubt Transactions: Different Scenarios |
|