Ticket #118 (closed defect: fixed)
JSP compilation fails with insufficient memory
| Reported by: | chenchongqi | Owned by: | |
|---|---|---|---|
| Priority: | major | Milestone: | |
| Component: | Price Library | Version: | Price Library 5.0 |
| Keywords: | JavaCompileException,Cannot allocate memory | Cc: | |
| Due Date: |
Description (last modified by chenchongqi)
Symptoms
On 08-27, while JSPs were being updated on the Price Library front end, the virtual machine 237.45 reported a compile error:
[08-27 15:27:29.812] com.caucho.java.JavaCompileException: Resin can't execute the compiler `/bin/sh'. This usually means that the compiler is not in the operating system's PATH or the compiler is incorrectly specified in the configuration. You may need to add the full path to <java compiler='/bin/sh'/>.
[08-27 15:27:29.812]
[08-27 15:27:29.812] java.io.IOException: Cannot run program "/bin/sh": java.io.IOException: error=12, Cannot allocate memory
[08-27 15:27:29.812] at com.caucho.java.ExternalCompiler.executeCompiler(ExternalCompiler.java:435)
[08-27 15:27:29.812] at com.caucho.java.ExternalCompiler.compileInt(ExternalCompiler.java:151)
[08-27 15:27:29.812] at com.caucho.java.AbstractJavaCompiler.run(AbstractJavaCompiler.java:102)
[08-27 15:27:29.812] at java.lang.Thread.run(Thread.java:619)
System state at the time the monitoring script restarted the service:
system stat:
top - 11:52:19 up 38 days, 5 min, 2 users, load average: 1.23, 1.51, 2.00
Tasks: 84 total, 1 running, 82 sleeping, 1 stopped, 0 zombie
Cpu(s): 14.9%us, 1.2%sy, 0.0%ni, 76.9%id, 1.2%wa, 2.1%hi, 3.6%si, 0.0%st
Mem: 5072544k total, 5025172k used, 47372k free, 5468k buffers
Swap: 2096472k total, 10868k used, 2085604k free, 653936k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
19153 root 21 0 4515m 4.0g 9.8m S 99.2 82.9 212:19.36 java
1 root 15 0 10352 580 544 S 0.0 0.0 0:06.58 init
2 root RT -5 0 0 0 S 0.0 0.0 0:40.73 migration/0
3 root 34 19 0 0 0 S 0.0 0.0 0:00.54 ksoftirqd/0
4 root RT -5 0 0 0 S 0.0 0.0 1:05.93 migration/1
5 root 34 19 0 0 0 S 0.0 0.0 0:00.50 ksoftirqd/1
6 root RT -5 0 0 0 S 0.0 0.0 1:13.14 migration/2
7 root 34 19 0 0 0 S 0.0 0.0 0:00.48 ksoftirqd/2
8 root RT -5 0 0 0 S 0.0 0.0 1:11.65 migration/3
9 root 34 19 0 0 0 S 0.0 0.0 0:00.52 ksoftirqd/3
...
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 0 10868 47364 5468 653936 0 0 130 273 2 2 15 7 77 1 0
1 0 10868 47460 5480 653908 0 0 0 177 987 188 25 1 74 0 0
1 0 10868 46912 5480 653624 0 0 0 0 975 230 24 4 72 0 0
S0 S1 E O P YGC YGCT FGC FGCT GCT
0.00 100.00 77.18 53.45 98.76 3863 549.840 30 66.716 616.556
0.00 100.00 95.12 53.45 99.71 3863 549.840 30 66.716 616.556
0.00 64.81 0.00 54.63 100.00 3865 551.103 31 66.716 617.819
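Available memory (free + buff + cache) is about 700M, which should be plenty. However, the system memory needed to compile a JSP probably cannot draw on the cache portion, which leaves only a few tens of MB. Alternatively, because the virtual machine shares resources with other guests, this visible memory may not be physically allocatable.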
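The 700M figure can be checked against the `Mem:` and `Swap:` lines of the top output above. The following is a minimal Python sketch; the function name `parse_top_kb` is hypothetical, and the sum treats buffers and page cache as fully reclaimable, which is exactly the point the paragraph above doubts.

```python
import re

# Mem/Swap summary lines copied from the top output in this ticket.
TOP_SUMMARY = """\
Mem: 5072544k total, 5025172k used, 47372k free, 5468k buffers
Swap: 2096472k total, 10868k used, 2085604k free, 653936k cached
"""

def parse_top_kb(text):
    """Return {'<prefix>_<label>': kB} for every '<number>k <label>' pair."""
    values = {}
    for line in text.splitlines():
        prefix = line.split(":", 1)[0].lower()   # 'mem' or 'swap'
        for amount, label in re.findall(r"(\d+)k (\w+)", line):
            values[f"{prefix}_{label}"] = int(amount)
    return values

stats = parse_top_kb(TOP_SUMMARY)
# This top version prints 'cached' on the Swap line, but it is the page cache size.
available_kb = stats["mem_free"] + stats["mem_buffers"] + stats["swap_cached"]
print(f"free + buff + cache = {available_kb} kB (~{available_kb / 1024:.0f} MiB)")
print(f"free only           = {stats['mem_free']} kB (~{stats['mem_free'] / 1024:.0f} MiB)")
```

The gap between the two printed figures is the difference between the optimistic and the pessimistic reading of the same snapshot.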
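Analysis
- Lao Xie wrote up an earlier summary: http://bbs.pconline.cn/topic-175.html
- Judging from the symptoms, the operating system cannot allocate memory when a JSP is compiled
- Complex JSPs need more resources to compile
- Heavy application load makes the JVM itself use more memory, which reduces the memory headroom left to the operating system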
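The stack trace earlier in this ticket shows Resin's `ExternalCompiler` failing to start `/bin/sh`, that is, a child process. One possible reading, which is an assumption and not something the ticket confirms, is that starting a child from a very large parent can fail with error=12 when the kernel cannot account for a worst-case copy of the parent's address space. The Python sketch below only compares the numbers from the top output; the rule in `could_duplicate_parent` is a hypothetical rule of thumb, and the kernel's real decision depends on its overcommit settings.

```python
# Figures copied from the top output in this ticket (host 237.45).
JVM_VIRT_MIB = 4515          # VIRT column of the java process
FREE_KB = 47372
BUFFERS_KB = 5468
CACHED_KB = 653936
SWAP_FREE_KB = 2085604

def headroom_mib(free_kb, buffers_kb, cached_kb, swap_free_kb):
    """Optimistic headroom: free RAM + reclaimable buffers/cache + free swap."""
    return (free_kb + buffers_kb + cached_kb + swap_free_kb) / 1024

def could_duplicate_parent(parent_virt_mib, headroom):
    """Hypothetical rule of thumb: the worst case for a fork-style child
    is a full copy of the parent's address space."""
    return parent_virt_mib <= headroom

headroom = headroom_mib(FREE_KB, BUFFERS_KB, CACHED_KB, SWAP_FREE_KB)
print(f"headroom ~{headroom:.0f} MiB, JVM VIRT {JVM_VIRT_MIB} MiB")
print("worst-case copy fits:", could_duplicate_parent(JVM_VIRT_MIB, headroom))
```

If this reading applies, the size of the JVM matters more than the few tens of MB the compiler itself needs, which is consistent with the last bullet above.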
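Initial handling
- Added 1 GB of memory to 237.45
- Plan to take a heap dump to see which part of the application uses the most memory and whether Resin's memory configuration can be reduced. Heap snapshot analysis for 237.54: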
// Resin's built-in cache
One instance of "com.caucho.server.cluster.Server" loaded by "sun.misc.Launcher$AppClassLoader @ 0x2aaabe200800" occupies 95,697,912 (12.75%) bytes. The memory is accumulated in one instance of "com.caucho.util.LruCache" loaded by "sun.misc.Launcher$AppClassLoader @ 0x2aaabe200800".
Keywords
com.caucho.server.cluster.Server
sun.misc.Launcher$AppClassLoader @ 0x2aaabe200800
com.caucho.util.LruCache

// This part will be removed in next week's release
One instance of "jeasy.analysis.llIlllIIIlIlllll" loaded by "com.caucho.loader.EnvironmentClassLoader @ 0x2aaadac9f080" occupies 83,716,264 (11.16%) bytes. The memory is accumulated in one instance of "java.util.TreeMap$Entry" loaded by "<system class loader>".
Keywords
java.util.TreeMap$Entry
com.caucho.loader.EnvironmentClassLoader @ 0x2aaadac9f080
jeasy.analysis.llIlllIIIlIlllll

// The mc client takes half of the memory; consider lowering the configured number of connections
523 instances of "com.schooner.MemCached.SchoonerSockIOPool$TCPSockIO", loaded by "com.caucho.loader.EnvironmentClassLoader @ 0x2aaadac9f080" occupy 377,951,744 (50.36%) bytes. These instances are referenced from one instance of "java.util.concurrent.ConcurrentHashMap$Segment[]", loaded by "<system class loader>"
Keywords
java.util.concurrent.ConcurrentHashMap$Segment[]
com.caucho.loader.EnvironmentClassLoader @ 0x2aaadac9f080
com.schooner.MemCached.SchoonerSockIOPool$TCPSockIO
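- Simulate load in the test environment to see whether the problem can be reproduced, with live monitoring and analysis
- With squid and Resin's built-in cache removed in the test environment, the system state after 16 hours of simulated load (50 requests/s on index pages and 100 requests/s on readintf to simulate SSI) is shown below. Modifying JSPs has not triggered the compile memory problem so far. The thread count stays around 100 and young GC is frequent, roughly once per second. The system is probably still in fairly good shape because the SSI interfaces under test behave much better than in production and rarely time out or fail.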
top - 08:55:00 up 112 days, 19:28, 1 user, load average: 4.42, 5.13, 5.35
Tasks: 76 total, 3 running, 73 sleeping, 0 stopped, 0 zombie
Cpu(s): 68.0%us, 5.6%sy, 0.0%ni, 19.3%id, 0.0%wa, 1.0%hi, 6.2%si, 0.0%st
Mem: 4044452k total, 4019004k used, 25448k free, 11736k buffers
Swap: 3140696k total, 204052k used, 2936644k free, 698764k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
25617 root 20 0 3469m 2.8g 11m S 288.1 73.2 2486:52 java
24687 nobody 15 0 18948 3240 872 R 12.0 0.1 55:54.72 nginx
7561 root 15 0 268m 202m 548 S 10.6 5.1 90:53.67 memcached
24690 nobody 15 0 18948 3304 872 S 5.0 0.1 56:29.73 nginx
24688 nobody 15 0 18920 3220 872 R 2.3 0.1 56:56.46 nginx
24686 nobody 15 0 18796 3160 884 S 0.7 0.1 56:21.20 nginx
25582 root 23 0 348m 56m 9.8m S 0.3 1.4 0:22.67 java
30407 root 15 0 12724 1060 820 R 0.3 0.0 0:00.02 top
...
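- Analysis of the heap dump taken on the test server: overall memory looks normal. The dump file is 1.5 GB, but the live heap is only 180M and the rest is reclaimable. After more than ten hours of running, this suggests there is no memory leak and that the problem is mainly caused by excessive request load. The dump did reveal a package that is no longer used, the jeasy word segmentation tool, which holds tens of MB of memory, so it needs to be removed and the test rerun.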
One instance of "jeasy.analysis.llIlllIIIlIlllll" loaded by "com.caucho.loader.EnvironmentClassLoader @ 0x2aaac06d7fb8" occupies 83,716,264 (43.67%) bytes. The memory is accumulated in one instance of "java.util.TreeMap$Entry" loaded by "<system class loader>".
Keywords
java.util.TreeMap$Entry
jeasy.analysis.llIlllIIIlIlllll
com.caucho.loader.EnvironmentClassLoader @ 0x2aaac06d7fb8
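- With the jeasy package removed and after another run of more than ten hours, the analyzed heap is only 90M, made up mostly of class loaders and the like. There is no memory usage problem.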
Problem Suspect 1
The classloader/component "com.caucho.loader.SystemClassLoader @ 0x2aaabe227fa8" occupies 17,834,288 (18.20%) bytes. The memory is accumulated in classloader/component "com.caucho.loader.SystemClassLoader @ 0x2aaabe227fa8".
Keywords
com.caucho.loader.SystemClassLoader @ 0x2aaabe227fa8

Problem Suspect 2
The classloader/component "com.caucho.loader.EnvironmentClassLoader @ 0x2aaabfbb28e8" occupies 11,315,808 (11.55%) bytes. The memory is accumulated in classloader/component "com.caucho.loader.EnvironmentClassLoader @ 0x2aaabfbb28e8".
Keywords
com.caucho.loader.EnvironmentClassLoader @ 0x2aaabfbb28e8

Problem Suspect 3
3,628 instances of "java.lang.Class", loaded by "<system class loader>" occupy 19,203,272 (19.60%) bytes.
Biggest instances:
class java.lang.ref.Finalizer @ 0x2aaaae242170 - 7,359,072 (7.51%) bytes.
class cn.com.pconline.core.pricefront.service.ProductTypeService @ 0x2aaaafb037d0 - 4,825,184 (4.92%) bytes.
class com.caucho.vfs.Path @ 0x2aaaae5daa10 - 3,675,544 (3.75%) bytes.
Keywords
java.lang.Class
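The mc client finding in the 237.54 heap snapshot can be turned into a per-connection figure to judge how much a smaller pool might save. This is a rough Python sketch: the target of 200 sockets is a made-up value for illustration, and the estimate assumes retained memory scales linearly with the number of sockets, which the ticket does not establish.

```python
# Figures copied from the 237.54 heap snapshot analysis in this ticket.
SOCKET_INSTANCES = 523
SOCKET_BYTES = 377_951_744      # retained by SchoonerSockIOPool$TCPSockIO
SOCKET_SHARE = 0.5036           # 50.36% of the analyzed heap

def per_socket_bytes(total_bytes, instances):
    """Average retained bytes per pooled socket."""
    return total_bytes / instances

def estimated_saving_bytes(total_bytes, instances, target_sockets):
    """Assumes memory scales linearly with the socket count (an assumption)."""
    return per_socket_bytes(total_bytes, instances) * (instances - target_sockets)

avg = per_socket_bytes(SOCKET_BYTES, SOCKET_INSTANCES)
heap_total = SOCKET_BYTES / SOCKET_SHARE
saving = estimated_saving_bytes(SOCKET_BYTES, SOCKET_INSTANCES, 200)  # 200 is hypothetical

print(f"average per socket: {avg / 1024:.0f} KiB")
print(f"heap size implied by the percentage: {heap_total / 1024**2:.0f} MiB")
print(f"estimated saving at 200 sockets: {saving / 1024**2:.0f} MiB")
```

Several hundred KiB per socket suggests that buffer sizes, and not only the connection count, may be worth checking in the client configuration.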
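Optimization
- Adjust the virtual machine parameters to use a fixed memory allocation
- The content blocks on the index pages do not go through the squid cache, so consider caching them in mc to reduce load on the r system. These blocks are not tied to product models and are few in number, so they put no pressure on mc memory
- Consider converting most of the SSI on the terminal pages that is not time-sensitive to static publishing, to reduce load
- Split the pricing application front end: index service + SSI service
- Split the pricing server: static files + mysql + squid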
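The mc caching idea for the index-page content blocks follows the usual cache-aside pattern. The sketch below illustrates it in Python with an in-process stand-in for memcached; `FakeMc`, `cached_block`, `render_block`, the key format, and the 300-second TTL are all hypothetical, and the real application is Java/JSP with its own memcached client, which is not shown here.

```python
import time

class FakeMc:
    """In-process stand-in for a memcached client (hypothetical, illustration only)."""
    def __init__(self):
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[key]      # expired: behave like a cache miss
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, time.monotonic() + ttl_seconds)

def cached_block(mc, block_id, render, ttl_seconds=300):
    """Cache-aside: return the cached block, rendering and storing it on a miss."""
    key = f"index-block:{block_id}"
    html = mc.get(key)
    if html is None:
        html = render(block_id)
        mc.set(key, html, ttl_seconds)
    return html

render_calls = []

def render_block(block_id):
    render_calls.append(block_id)    # stands in for the expensive backend query
    return f"<div>block {block_id}</div>"

mc = FakeMc()
first = cached_block(mc, "hot-products", render_block)
second = cached_block(mc, "hot-products", render_block)
print(first == second, len(render_calls))
```

Because the blocks are few and not tied to product models, the key space stays small, which matches the ticket's expectation that mc memory is not a concern.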
Change History
comment:17 Changed 13 years ago by chenchongqi
- Status changed from new to closed
- Resolution set to fixed