Ticket #118 (new defect) - at Version 6
Insufficient memory during JSP compilation
| Reported by: | chenchongqi | Owned by: | |
|---|---|---|---|
| Priority: | major | Milestone: | |
| Component: | Price Library (报价库) | Version: | Price Library 5.0 |
| Keywords: | JavaCompileException,Cannot allocate memory | Cc: | |
| Due Date: | | | |
Description (last modified by chenchongqi)
Symptoms
On 08-27, while JSPs were being updated on the Price Library front end, the virtual machine 237.45 reported a compilation error:
[08-27 15:27:29.812] com.caucho.java.JavaCompileException: Resin can't execute the compiler `/bin/sh'. This usually means that the compiler is not in the operating system's PATH or the compiler is incorrectly specified in the configuration. You may need to add the full path to <java compiler='/bin/sh'/>.
[08-27 15:27:29.812]
[08-27 15:27:29.812] java.io.IOException: Cannot run program "/bin/sh": java.io.IOException: error=12, Cannot allocate memory
[08-27 15:27:29.812] at com.caucho.java.ExternalCompiler.executeCompiler(ExternalCompiler.java:435)
[08-27 15:27:29.812] at com.caucho.java.ExternalCompiler.compileInt(ExternalCompiler.java:151)
[08-27 15:27:29.812] at com.caucho.java.AbstractJavaCompiler.run(AbstractJavaCompiler.java:102)
[08-27 15:27:29.812] at java.lang.Thread.run(Thread.java:619)
System state at the time the monitoring script restarted the service:
system stat:
top - 11:52:19 up 38 days, 5 min, 2 users, load average: 1.23, 1.51, 2.00
Tasks: 84 total, 1 running, 82 sleeping, 1 stopped, 0 zombie
Cpu(s): 14.9%us, 1.2%sy, 0.0%ni, 76.9%id, 1.2%wa, 2.1%hi, 3.6%si, 0.0%st
Mem: 5072544k total, 5025172k used, 47372k free, 5468k buffers
Swap: 2096472k total, 10868k used, 2085604k free, 653936k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
19153 root 21 0 4515m 4.0g 9.8m S 99.2 82.9 212:19.36 java
1 root 15 0 10352 580 544 S 0.0 0.0 0:06.58 init
2 root RT -5 0 0 0 S 0.0 0.0 0:40.73 migration/0
3 root 34 19 0 0 0 S 0.0 0.0 0:00.54 ksoftirqd/0
4 root RT -5 0 0 0 S 0.0 0.0 1:05.93 migration/1
5 root 34 19 0 0 0 S 0.0 0.0 0:00.50 ksoftirqd/1
6 root RT -5 0 0 0 S 0.0 0.0 1:13.14 migration/2
7 root 34 19 0 0 0 S 0.0 0.0 0:00.48 ksoftirqd/2
8 root RT -5 0 0 0 S 0.0 0.0 1:11.65 migration/3
9 root 34 19 0 0 0 S 0.0 0.0 0:00.52 ksoftirqd/3
10 root 10 -5 0 0 0 S 0.0 0.0 0:22.16 events/0
11 root 10 -5 0 0 0 S 0.0 0.0 0:05.40 events/1
12 root 10 -5 0 0 0 S 0.0 0.0 0:05.29 events/2
13 root 10 -5 0 0 0 S 0.0 0.0 0:05.33 events/3
14 root 10 -5 0 0 0 S 0.0 0.0 0:00.28 khelper
63 root 10 -5 0 0 0 S 0.0 0.0 0:00.00 kthread
70 root 10 -5 0 0 0 S 0.0 0.0 0:56.25 kblockd/0
71 root 10 -5 0 0 0 S 0.0 0.0 0:13.40 kblockd/1
72 root 10 -5 0 0 0 S 0.0 0.0 0:13.51 kblockd/2
73 root 10 -5 0 0 0 S 0.0 0.0 0:13.39 kblockd/3
74 root 14 -5 0 0 0 S 0.0 0.0 0:00.00 kacpid
307 root 11 -5 0 0 0 S 0.0 0.0 0:00.00 cqueue/0
308 root 11 -5 0 0 0 S 0.0 0.0 0:00.00 cqueue/1
309 root 11 -5 0 0 0 S 0.0 0.0 0:00.00 cqueue/2
310 root 11 -5 0 0 0 S 0.0 0.0 0:00.00 cqueue/3
313 root 11 -5 0 0 0 S 0.0 0.0 0:00.00 khubd
315 root 10 -5 0 0 0 S 0.0 0.0 0:00.00 kseriod
407 root 10 -5 0 0 0 S 0.0 0.0 30:25.70 kswapd0
408 root 11 -5 0 0 0 S 0.0 0.0 0:00.00 aio/0
409 root 11 -5 0 0 0 S 0.0 0.0 0:00.00 aio/1
410 root 11 -5 0 0 0 S 0.0 0.0 0:00.00 aio/2
411 root 11 -5 0 0 0 S 0.0 0.0 0:00.00 aio/3
595 root 15 0 132m 3740 1560 S 0.0 0.1 0:43.84 python
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 0 10868 47364 5468 653936 0 0 130 273 2 2 15 7 77 1 0
1 0 10868 47460 5480 653908 0 0 0 177 987 188 25 1 74 0 0
1 0 10868 46912 5480 653624 0 0 0 0 975 230 24 4 72 0 0
S0 S1 E O P YGC YGCT FGC FGCT GCT
0.00 100.00 77.18 53.45 98.76 3863 549.840 30 66.716 616.556
0.00 100.00 95.12 53.45 99.71 3863 549.840 30 66.716 616.556
0.00 64.81 0.00 54.63 100.00 3865 551.103 31 66.716 617.819
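The last table above has the column layout of `jstat -gcutil`. As a minimal sketch (the helper names `parse_gcutil` and `avg_pause` are made up for illustration and are not part of any tool), the rows can be parsed to get average pause times and to spot pools that are close to full. In JDKs of that era the `P` column is the permanent generation.

```python
# Sketch: parse `jstat -gcutil` text into rows keyed by the header names,
# then derive average GC pause times and list pools that are nearly full.

JSTAT_SAMPLE = """\
S0 S1 E O P YGC YGCT FGC FGCT GCT
0.00 100.00 77.18 53.45 98.76 3863 549.840 30 66.716 616.556
0.00 100.00 95.12 53.45 99.71 3863 549.840 30 66.716 616.556
0.00 64.81 0.00 54.63 100.00 3865 551.103 31 66.716 617.819
"""


def parse_gcutil(text):
    """Return one dict per sample row, mapping column name to a float."""
    lines = [line.split() for line in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, map(float, row))) for row in rows]


def avg_pause(row, count_key, time_key):
    """Average seconds per collection; 0.0 if no collection has run yet."""
    count = row[count_key]
    return row[time_key] / count if count else 0.0


rows = parse_gcutil(JSTAT_SAMPLE)
young_avg = avg_pause(rows[0], "YGC", "YGCT")
full_avg = avg_pause(rows[0], "FGC", "FGCT")

# Pools at or above 95% utilisation in the most recent sample.
nearly_full = [name for name in ("S0", "S1", "E", "O", "P")
               if rows[-1][name] >= 95.0]

print(f"avg young GC: {young_avg:.3f} s, avg full GC: {full_avg:.3f} s")
print("nearly full pools:", nearly_full)
```

On the first sample this works out to roughly 0.14 s per young collection and roughly 2.2 s per full collection, and only `P` is above the 95% threshold in the last sample. The threshold itself is an arbitrary choice for the sketch.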
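Analysis
- Lao Xie wrote up a summary of this earlier: http://bbs.pconline.cn/topic-175.html
- Judging from the symptoms, the operating system could not allocate memory when the JSP was compiled
- The JSPs are complex, so compiling them needs a lot of resources
- The application is under heavy load, so the JVM itself uses a lot of memory, which cuts into the operating system's free memory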
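A common explanation for `error=12` in this situation, which the ticket itself does not state, is that launching `/bin/sh` makes the JVM fork, and the kernel then has to account for a copy of the parent's address space before the shell replaces it. The sketch below is only a rough estimate under that assumption: it compares the JVM's virtual size with free memory plus page cache plus free swap, using the numbers from the `top` snapshot in this ticket. The real check depends on the kernel version and the `vm.overcommit_memory` setting, and the function names are hypothetical.

```python
# Rough estimate only: does the machine have room to account for a second
# copy of the JVM's address space at fork time? Assumes the kernel counts
# free memory + page cache + free swap, which is a simplification.


def kb_from_top(value):
    """Convert a top-style size ('4515m', '4.0g', '10352') to kB."""
    units = {"k": 1, "m": 1024, "g": 1024 * 1024}
    suffix = value[-1].lower()
    if suffix in units:
        return int(float(value[:-1]) * units[suffix])
    return int(value)  # sizes without a suffix in top's process list are kB


def fork_headroom_kb(free_kb, cached_kb, swap_free_kb, parent_virt_kb):
    """Positive: a fork of the parent looks affordable. Negative: it does not."""
    available = free_kb + cached_kb + swap_free_kb
    return available - parent_virt_kb


# Values copied from the 237.45 snapshot (Mem/Swap lines and the java row).
headroom = fork_headroom_kb(
    free_kb=47372,
    cached_kb=653936,
    swap_free_kb=2085604,
    parent_virt_kb=kb_from_top("4515m"),
)
print(f"estimated fork headroom: {headroom / 1024:.0f} MB")
```

With these inputs the headroom is negative by well over 1 GB, which is consistent with the allocation failure, though it does not prove that this is the mechanism.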
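Initial actions
- Added 1 GB of memory to 237.45
- Plan to take a memory dump to see which part of the application accounts for most of the memory, and whether Resin's memory configuration can be reduced
- Run a simulated stress test in the test environment to see whether the problem can be reproduced, with live monitoring and analysis
- Test environment with squid and Resin's built-in cache removed: the system state below was captured after a 16-hour stress test simulating SSI load, with 50 requests/s on index pages and 100 requests/s on readintf. Modifying JSPs has not yet triggered the compilation memory problem. The thread count stays at around 100, and young GC is frequent, roughly once per second. The system probably holds up this well because the SSI interfaces used in the test behave much better than in production, with few timeouts or failures.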
top - 08:55:00 up 112 days, 19:28, 1 user, load average: 4.42, 5.13, 5.35
Tasks: 76 total, 3 running, 73 sleeping, 0 stopped, 0 zombie
Cpu(s): 68.0%us, 5.6%sy, 0.0%ni, 19.3%id, 0.0%wa, 1.0%hi, 6.2%si, 0.0%st
Mem: 4044452k total, 4019004k used, 25448k free, 11736k buffers
Swap: 3140696k total, 204052k used, 2936644k free, 698764k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
25617 root 20 0 3469m 2.8g 11m S 288.1 73.2 2486:52 java
24687 nobody 15 0 18948 3240 872 R 12.0 0.1 55:54.72 nginx
7561 root 15 0 268m 202m 548 S 10.6 5.1 90:53.67 memcached
24690 nobody 15 0 18948 3304 872 S 5.0 0.1 56:29.73 nginx
24688 nobody 15 0 18920 3220 872 R 2.3 0.1 56:56.46 nginx
24686 nobody 15 0 18796 3160 884 S 0.7 0.1 56:21.20 nginx
25582 root 23 0 348m 56m 9.8m S 0.3 1.4 0:22.67 java
30407 root 15 0 12724 1060 820 R 0.3 0.0 0:00.02 top
1 root 15 0 10348 244 212 S 0.0 0.0 0:01.34 init
2 root RT -5 0 0 0 S 0.0 0.0 0:05.76 migration/0
3 root 34 19 0 0 0 S 0.0 0.0 0:02.92 ksoftirqd/0
4 root RT -5 0 0 0 S 0.0 0.0 0:04.89 migration/1
5 root 34 19 0 0 0 S 0.0 0.0 0:00.02 ksoftirqd/1
6 root RT -5 0 0 0 S 0.0 0.0 0:05.21 migration/2
7 root 34 19 0 0 0 S 0.0 0.0 0:00.02 ksoftirqd/2
8 root RT -5 0 0 0 S 0.0 0.0 0:05.14 migration/3
9 root 34 19 0 0 0 S 0.0 0.0 0:00.01 ksoftirqd/3
10 root 10 -5 0 0 0 S 0.0 0.0 0:54.78 events/0
11 root 10 -5 0 0 0 S 0.0 0.0 0:00.83 events/1
12 root 10 -5 0 0 0 S 0.0 0.0 0:00.69 events/2
13 root 10 -5 0 0 0 S 0.0 0.0 0:00.79 events/3
14 root 10 -5 0 0 0 S 0.0 0.0 0:00.12 khelper
55 root 11 -5 0 0 0 S 0.0 0.0 0:00.01 kthread
62 root 10 -5 0 0 0 S 0.0 0.0 0:20.23 kblockd/0
63 root 10 -5 0 0 0 S 0.0 0.0 0:09.16 kblockd/1
64 root 10 -5 0 0 0 S 0.0 0.0 0:09.61 kblockd/2
65 root 10 -5 0 0 0 S 0.0 0.0 0:08.80 kblockd/3
66 root 16 -5 0 0 0 S 0.0 0.0 0:00.00 kacpid
294 root 13 -5 0 0 0 S 0.0 0.0 0:00.00 cqueue/0
295 root 13 -5 0 0 0 S 0.0 0.0 0:00.00 cqueue/1
296 root 13 -5 0 0 0 S 0.0 0.0 0:00.00 cqueue/2
297 root 13 -5 0 0 0 S 0.0 0.0 0:00.00 cqueue/3
300 root 13 -5 0 0 0 S 0.0 0.0 0:00.00 khubd
302 root 10 -5 0 0 0 S 0.0 0.0 0:00.00 kseriod
394 root 10 -5 0 0 0 S 0.0 0.0 3:18.66 kswapd0
395 root 16 -5 0 0 0 S 0.0 0.0 0:00.00 aio/0
396 root 17 -5 0 0 0 S 0.0 0.0 0:00.00 aio/1
397 root 18 -5 0 0 0 S 0.0 0.0 0:00.00 aio/2
- Analysis of the memory dump taken on the test server:
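- Overall memory looks normal. The dump file is 1.5 GB, but the live heap is only 180 MB and the rest is reclaimable. After more than ten hours of running, this suggests there is no memory leak, and the main cause is still excessive request load. The dump did reveal a package that is no longer used, the jeasy word-segmentation tool, which takes up tens of MB of memory, so it needs to be removed before testing again.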
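The stress test above drives two request streams at fixed rates (50/s for index pages, 100/s for readintf) for 16 hours. The ticket does not say which tool was used, so the following is only a sketch of how such fixed-rate pacing can be done. `run_paced` and the `send` callback are hypothetical names; the callback stands in for whatever actually issues the HTTP request.

```python
import time


def run_paced(rate_per_s, duration_s, send, clock=time.monotonic, sleep=time.sleep):
    """Call send() at a fixed rate for duration_s seconds; return the call count.

    Each call is scheduled against the start time rather than the previous
    call, so a slow send() does not push the whole schedule back.
    """
    total = int(rate_per_s * duration_s)
    interval = 1.0 / rate_per_s
    start = clock()
    for i in range(total):
        delay = start + i * interval - clock()
        if delay > 0:
            sleep(delay)
        send()
    return total


# Planned request totals for the 16-hour run described in the ticket.
SIXTEEN_HOURS_S = 16 * 3600
planned = {
    "index": 50 * SIXTEEN_HOURS_S,
    "readintf": 100 * SIXTEEN_HOURS_S,
}
print(planned)

# Tiny dry run with a counter in place of a real request.
counter = []
sent = run_paced(50, 0.1, lambda: counter.append(1))
print("dry run sent", sent, "calls")
```

The `clock` and `sleep` parameters exist so the pacing logic can be exercised with a fake clock instead of waiting in real time. A real run would need one such loop per stream, running concurrently.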
Long-term goals
- Split the Price Library front-end application: index service + SSI service
- Split the Price Library servers: static files + mysql + squid