Ticket #141 (closed summary: fixed)

Opened 13 years ago

Last modified 13 years ago

MongoDB corruption caused by a power outage

Reported by: chenchongqi Owned by:
Priority: major Milestone:
Component: 报价库 (price database) Version: 报价库5.0
Keywords: mongodb Cc:
Due Date: 16/09/2013

Description (last modified by chenchongqi)

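The IDC data center lost power suddenly. When we checked the applications after power was restored, we found that the MongoDB instances backing the R system's hard cache were not working correctly. On inspection, the data files on two of the three servers in the cluster were corrupted: after connecting to them manually, running `show tables` returned an error and the collections could not be listed.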

The remaining server was also partially corrupted: updates to it threw the following exception:

[09-16 21:48:22.893] 2013-09-16 21:48:22,881 [pool-1-thread-10] ERROR cn.com.pconline.core.r.MongoRClientHelper - [MongoRClientHelper] update cache:http://m.pconline.com.cn/yp/company_price_js4.jsp?productID=364301&areaID=100&notjs=true&time=14400 error
[09-16 21:48:22.893] com.mongodb.MongoException: getFile(): bad file number value (corrupt db?): run repair
[09-16 21:48:22.893]    at com.mongodb.CommandResult.getException(CommandResult.java:82)
[09-16 21:48:22.893]    at com.mongodb.CommandResult.throwOnError(CommandResult.java:116)
[09-16 21:48:22.893]    at com.mongodb.DBTCPConnector._checkWriteError(DBTCPConnector.java:131)
[09-16 21:48:22.893]    at com.mongodb.DBTCPConnector.say(DBTCPConnector.java:153)
[09-16 21:48:22.893]    at com.mongodb.DBTCPConnector.say(DBTCPConnector.java:137)
[09-16 21:48:22.893]    at com.mongodb.DBApiLayer$MyCollection.update(DBApiLayer.java:336)
[09-16 21:48:22.893]    at com.mongodb.DBCollection.save(DBCollection.java:630)
[09-16 21:48:22.893]    at com.mongodb.DBCollection.save(DBCollection.java:597)
[09-16 21:48:22.893]    at cn.com.pconline.core.r.RRepository.save(RRepository.java:52)
[09-16 21:48:22.893]    at cn.com.pconline.core.r.MongoRClientHelper.update(MongoRClientHelper.java:60)
[09-16 21:48:22.893]    at cn.pconline.r.client.RClient.doGet(RClient.java:448)
[09-16 21:48:22.893]    at cn.pconline.r.client.RClient$3.call(RClient.java:353)
[09-16 21:48:22.893]    at cn.pconline.r.client.RClient$3.call(RClient.java:350)
[09-16 21:48:22.893]    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
[09-16 21:48:22.893]    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
[09-16 21:48:22.893]    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
[09-16 21:48:22.893]    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
[09-16 21:48:22.893]    at java.lang.Thread.run(Thread.java:619)
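The key signal in this trace is the driver message `bad file number value (corrupt db?): run repair`, which points to on-disk corruption rather than an ordinary, retryable write failure. Below is a minimal sketch in Java, assuming plain string matching on exception messages is good enough to flag this case for alerting. `CorruptionDetector` and `looksCorrupt` are hypothetical names; they are not part of `MongoRClientHelper` or the MongoDB Java driver.

```java
import java.util.Locale;

/**
 * Hypothetical helper (not part of MongoRClientHelper or the MongoDB driver):
 * decides whether an error message looks like on-disk corruption that needs
 * a repair, as opposed to an ordinary, retryable failure.
 */
public final class CorruptionDetector {

    // Markers taken from the error seen in this incident:
    // "getFile(): bad file number value (corrupt db?): run repair"
    private static final String[] MARKERS = {"corrupt db?", "run repair"};

    private CorruptionDetector() {}

    /** Returns true if the message contains any corruption marker (case-insensitive). */
    public static boolean looksCorrupt(String message) {
        if (message == null) {
            return false;
        }
        String lower = message.toLowerCase(Locale.ROOT);
        for (String marker : MARKERS) {
            if (lower.contains(marker)) {
                return true;
            }
        }
        return false;
    }

    /** Walks the cause chain so that wrapped exceptions are detected too. */
    public static boolean looksCorrupt(Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (looksCorrupt(cur.getMessage())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String seen = "getFile(): bad file number value (corrupt db?): run repair";
        System.out.println(looksCorrupt(seen));                   // true
        System.out.println(looksCorrupt("connection timed out")); // false
        RuntimeException wrapped =
                new RuntimeException("update failed", new IllegalStateException(seen));
        System.out.println(looksCorrupt(wrapped));                // true
    }
}
```

String matching is fragile across driver versions, so a check like this is better suited to raising an alert than to driving automatic recovery.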

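Because the hard cache uses the price_front_cache database, we repaired that database first. It is 55 GB, so the repair took a long time. After the repair, however, the application still threw the same exception. Further research showed that the admin and local databases also needed to be repaired. After repairing each of them in turn, the exception no longer appeared.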
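The lesson from this step is that repairing only the application database was not enough: admin and local also had to be repaired. A minimal Java sketch of that checklist follows. `RepairPlan` is a hypothetical helper that connects to nothing. It only prints mongo shell lines that use `db.getSiblingDB(...).repairDatabase()`, a shell helper available in the MongoDB 2.x era but removed in MongoDB 4.2.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical helper: builds mongo shell commands for repairing one node
 * after an unclean shutdown. Application databases come first, followed by
 * the admin and local databases, which also had to be repaired in this incident.
 */
public final class RepairPlan {

    // System databases that were repaired after price_front_cache in this incident.
    private static final List<String> SYSTEM_DBS = Arrays.asList("admin", "local");

    private RepairPlan() {}

    /** Application databases first, then admin and local, with duplicates removed. */
    public static List<String> databasesToRepair(List<String> appDbs) {
        Set<String> ordered = new LinkedHashSet<String>(appDbs); // keeps insertion order
        ordered.addAll(SYSTEM_DBS);
        return new ArrayList<String>(ordered);
    }

    /** One shell line per database; repairDatabase() no longer exists in MongoDB 4.2+. */
    public static List<String> shellCommands(List<String> appDbs) {
        List<String> lines = new ArrayList<String>();
        for (String db : databasesToRepair(appDbs)) {
            lines.add("db.getSiblingDB(\"" + db + "\").repairDatabase()");
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : shellCommands(Arrays.asList("price_front_cache"))) {
            System.out.println(line);
        }
    }
}
```

Running `main` prints three lines, for price_front_cache, admin and local in that order, matching the order used in this ticket.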

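From the repaired server we cloned two more virtual machines, built a new cluster from the three, and put it back into service on the original IP addresses. It is currently running normally.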

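Tip: cloning the virtual machine directly is faster than copying the data files to a new server and rebuilding, because copying the files also involves some low-level data verification.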

Change History

comment:1 Changed 13 years ago by chenchongqi

  • Status changed from new to closed
  • Resolution set to fixed

comment:2 Changed 13 years ago by chenchongqi

  • Version set to 报价库5.0

comment:3 Changed 13 years ago by chenchongqi

  • Description modified