Ticket #118 (closed defect: fixed)

Opened 14 years ago

Last modified 13 years ago

Insufficient memory during JSP compilation

Reported by: chenchongqi
Owned by:
Priority: major
Milestone:
Component: Price Library
Version: Price Library 5.0
Keywords: JavaCompileException, Cannot allocate memory
Cc:
Due Date:

Description (last modified by chenchongqi)

Symptoms

On 8/27, while JSPs were being updated on the Price Library front end, the virtual machine 237.45 reported a compilation error:

[08-27 15:27:29.812] com.caucho.java.JavaCompileException: Resin can't execute the compiler `/bin/sh'.  This usually means that the compiler is not in the operating system's PATH or the compiler is incorrectly specified in the configuration.  You may need to add the full path to <java compiler='/bin/sh'/>.
[08-27 15:27:29.812]
[08-27 15:27:29.812] java.io.IOException: Cannot run program "/bin/sh": java.io.IOException: error=12, Cannot allocate memory
[08-27 15:27:29.812]    at com.caucho.java.ExternalCompiler.executeCompiler(ExternalCompiler.java:435)
[08-27 15:27:29.812]    at com.caucho.java.ExternalCompiler.compileInt(ExternalCompiler.java:151)
[08-27 15:27:29.812]    at com.caucho.java.AbstractJavaCompiler.run(AbstractJavaCompiler.java:102)
[08-27 15:27:29.812]    at java.lang.Thread.run(Thread.java:619)

System state at the time the monitoring script restarted the process:

system stat:
top - 11:52:19 up 38 days, 5 min,  2 users,  load average: 1.23, 1.51, 2.00
Tasks:  84 total,   1 running,  82 sleeping,   1 stopped,   0 zombie
Cpu(s): 14.9%us,  1.2%sy,  0.0%ni, 76.9%id,  1.2%wa,  2.1%hi,  3.6%si,  0.0%st
Mem:   5072544k total,  5025172k used,    47372k free,     5468k buffers
Swap:  2096472k total,    10868k used,  2085604k free,   653936k cached
        
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
19153 root      21   0 4515m 4.0g 9.8m S 99.2 82.9 212:19.36 java
    1 root      15   0 10352  580  544 S  0.0  0.0   0:06.58 init
    2 root      RT  -5     0    0    0 S  0.0  0.0   0:40.73 migration/0
    3 root      34  19     0    0    0 S  0.0  0.0   0:00.54 ksoftirqd/0
    4 root      RT  -5     0    0    0 S  0.0  0.0   1:05.93 migration/1
    5 root      34  19     0    0    0 S  0.0  0.0   0:00.50 ksoftirqd/1
    6 root      RT  -5     0    0    0 S  0.0  0.0   1:13.14 migration/2
    7 root      34  19     0    0    0 S  0.0  0.0   0:00.48 ksoftirqd/2
    8 root      RT  -5     0    0    0 S  0.0  0.0   1:11.65 migration/3
    9 root      34  19     0    0    0 S  0.0  0.0   0:00.52 ksoftirqd/3
...
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------ 
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0  10868  47364   5468 653936    0    0   130   273    2    2 15  7 77  1  0
 1  0  10868  47460   5480 653908    0    0     0   177  987  188 25  1 74  0  0
 1  0  10868  46912   5480 653624    0    0     0     0  975  230 24  4 72  0  0
  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT   
  0.00 100.00  77.18  53.45  98.76   3863  549.840    30   66.716  616.556
  0.00 100.00  95.12  53.45  99.71   3863  549.840    30   66.716  616.556
  0.00  64.81   0.00  54.63 100.00   3865  551.103    31   66.716  617.819
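
The last three rows above look like `jstat -gcutil` output, where YGCT and FGCT are cumulative collection times in seconds. As a rough sketch (the helper names are illustrative and not part of the ticket), the average pause per collection can be derived from the first sample:

```python
# Rows copied from the jstat output above.
# Columns: S0 S1 E O P YGC YGCT FGC FGCT GCT
JSTAT_HEADER = "S0 S1 E O P YGC YGCT FGC FGCT GCT"
JSTAT_ROWS = [
    "0.00 100.00  77.18  53.45  98.76   3863  549.840    30   66.716  616.556",
    "0.00 100.00  95.12  53.45  99.71   3863  549.840    30   66.716  616.556",
    "0.00  64.81   0.00  54.63 100.00   3865  551.103    31   66.716  617.819",
]


def parse_jstat(header, rows):
    """Turn whitespace-separated jstat rows into dicts keyed by column name."""
    names = header.split()
    return [dict(zip(names, map(float, row.split()))) for row in rows]


def average_pause(sample, count_col, time_col):
    """Average seconds per collection; 0.0 when no collection has happened."""
    count = sample[count_col]
    return sample[time_col] / count if count else 0.0


samples = parse_jstat(JSTAT_HEADER, JSTAT_ROWS)
first = samples[0]
avg_young = average_pause(first, "YGC", "YGCT")
avg_full = average_pause(first, "FGC", "FGCT")

print(f"avg young GC: {avg_young:.3f}s, avg full GC: {avg_full:.3f}s")
```

On the first sample this works out to roughly 0.14 s per young collection and about 2.2 s per full collection. The first sample is used because in the third row the FGC counter has advanced while FGCT has not, which may mean a full collection was still in progress.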

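Available memory (free + buff + cache) is about 700 MB, which should be plenty. However, the system memory needed to compile a JSP probably cannot come from the cache portion, which would leave only a few tens of MB. Alternatively, because the virtual machine shares resources with other guests, this visible memory may not actually be physically allocatable.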
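
The arithmetic behind the 700 MB figure can be reproduced from the `top` summary lines. The sketch below is illustrative only. The final comparison against the `java` process's 4515m virtual size is an assumption, not a finding of this ticket: a commonly cited explanation for `error=12` is that starting `/bin/sh` forks the large JVM, and the kernel may refuse to account for a copy of that address space. The check is a rough heuristic and does not model the kernel's actual overcommit rules.

```python
import re

# Summary lines copied from the top output above. In this top format the
# "cached" figure printed on the Swap line is the page cache, not swap.
TOP_MEM = "Mem:   5072544k total,  5025172k used,    47372k free,     5468k buffers"
TOP_SWAP = "Swap:  2096472k total,    10868k used,  2085604k free,   653936k cached"
JAVA_VIRT_MB = 4515  # VIRT column of the java process (PID 19153)


def parse_kb_fields(line):
    """Return {label: kB} for every '<number>k <label>' pair on the line."""
    return {label: int(value) for value, label in re.findall(r"(\d+)k (\w+)", line)}


mem = parse_kb_fields(TOP_MEM)
swap = parse_kb_fields(TOP_SWAP)

strictly_free_kb = mem["free"]
reclaimable_kb = mem["free"] + mem["buffers"] + swap["cached"]

# Assumption: treat free RAM, buffers, page cache and free swap as the most
# the system could commit to a forked copy of the JVM.
headroom_kb = reclaimable_kb + swap["free"]
fork_copy_fits = headroom_kb >= JAVA_VIRT_MB * 1024

print(f"free only:            {strictly_free_kb / 1024:.1f} MiB")
print(f"free + buff + cache:  {reclaimable_kb / 1024:.1f} MiB")
print(f"plus free swap:       {headroom_kb / 1024:.1f} MiB")
print(f"room for a {JAVA_VIRT_MB} MiB copy: {fork_copy_fits}")
```

With these inputs, free + buff + cache comes to about 690 MiB while strictly free memory is about 46 MiB, matching the "700 MB" and "a few tens of MB" figures in the paragraph above.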

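Analysis

  • Lao Xie wrote up a summary of this earlier: http://bbs.pconline.cn/topic-175.html
  • Judging from the symptoms, the operating system cannot allocate memory when a JSP is compiled.
  • The JSPs are complex, so compiling them requires more resources.
  • The application is under heavy load, so the JVM itself uses a lot of memory, which cuts into the operating system's memory headroom.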

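Initial Actions

  • Added 1 GB of memory to 237.45.
  • Plan to take a heap dump to see which part of the application uses the most memory and whether Resin's memory configuration can be reduced. Analysis of the memory snapshot from 237.54:
    // Resin's built-in cache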
    One instance of "com.caucho.server.cluster.Server" loaded by "sun.misc.Launcher$AppClassLoader @ 0x2aaabe200800" occupies 95,697,912 (12.75%) bytes. The memory is accumulated in one instance of "com.caucho.util.LruCache" loaded by "sun.misc.Launcher$AppClassLoader @ 0x2aaabe200800".
    
    Keywords
    com.caucho.server.cluster.Server
    sun.misc.Launcher$AppClassLoader @ 0x2aaabe200800
    com.caucho.util.LruCache
    
    // This part will be removed in next week's update
    One instance of "jeasy.analysis.llIlllIIIlIlllll" loaded by "com.caucho.loader.EnvironmentClassLoader @ 0x2aaadac9f080" occupies 83,716,264 (11.16%) bytes. The memory is accumulated in one instance of "java.util.TreeMap$Entry" loaded by "<system class loader>".
    
    Keywords
    java.util.TreeMap$Entry
    com.caucho.loader.EnvironmentClassLoader @ 0x2aaadac9f080
    jeasy.analysis.llIlllIIIlIlllll
    
    // The mc (memcached) client accounts for half of the memory; consider lowering the configured number of connections
    523 instances of "com.schooner.MemCached.SchoonerSockIOPool$TCPSockIO", loaded by "com.caucho.loader.EnvironmentClassLoader @ 0x2aaadac9f080" occupy 377,951,744 (50.36%) bytes. These instances are referenced from one instance of "java.util.concurrent.ConcurrentHashMap$Segment[]", loaded by "<system class loader>"
    
    Keywords
    java.util.concurrent.ConcurrentHashMap$Segment[]
    com.caucho.loader.EnvironmentClassLoader @ 0x2aaadac9f080
    com.schooner.MemCached.SchoonerSockIOPool$TCPSockIO
    
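  • Run a simulated load test in the test environment to see whether the problem can be reproduced, with live monitoring and analysis.
    • With squid and Resin's built-in cache removed in the test environment, the system state below was captured after 16 hours of load testing at 50 requests/s against index pages and 100 requests/s against readintf to simulate SSI traffic. Modifying JSPs has not triggered the compile-time memory problem so far. The thread count stays around 100, and young GCs are frequent, roughly one per second. The system probably stays healthy because the SSI interfaces used in the test behave much better than in production, with few timeouts or failures.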
      top - 08:55:00 up 112 days, 19:28,  1 user,  load average: 4.42, 5.13, 5.35
      Tasks:  76 total,   3 running,  73 sleeping,   0 stopped,   0 zombie
      Cpu(s): 68.0%us,  5.6%sy,  0.0%ni, 19.3%id,  0.0%wa,  1.0%hi,  6.2%si,  0.0%st
      Mem:   4044452k total,  4019004k used,    25448k free,    11736k buffers
      Swap:  3140696k total,   204052k used,  2936644k free,   698764k cached
      
        PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                       
      25617 root      20   0 3469m 2.8g  11m S 288.1 73.2   2486:52 java                                                                                         
      24687 nobody    15   0 18948 3240  872 R 12.0  0.1  55:54.72 nginx                                                                                         
       7561 root      15   0  268m 202m  548 S 10.6  5.1  90:53.67 memcached                                                                                     
      24690 nobody    15   0 18948 3304  872 S  5.0  0.1  56:29.73 nginx                                                                                         
      24688 nobody    15   0 18920 3220  872 R  2.3  0.1  56:56.46 nginx                                                                                         
      24686 nobody    15   0 18796 3160  884 S  0.7  0.1  56:21.20 nginx                                                                                         
      25582 root      23   0  348m  56m 9.8m S  0.3  1.4   0:22.67 java                                                                                          
      30407 root      15   0 12724 1060  820 R  0.3  0.0   0:00.02 top                                                                   
      ...
      
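    • Analysis of the heap dump taken on the test server: overall memory looks normal. The dump file is 1.5 GB, but the live heap is only 180 MB, and the rest is reclaimable. After more than ten hours of running, this suggests there is no memory leak and that the problem is mainly caused by excessive request load. The dump did reveal a package that is no longer used, the jeasy word segmentation tool, occupying several tens of MB, so it needs to be removed before testing again.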
      One instance of "jeasy.analysis.llIlllIIIlIlllll" loaded by "com.caucho.loader.EnvironmentClassLoader @ 0x2aaac06d7fb8" occupies 83,716,264 (43.67%) bytes. The memory is accumulated in one instance of "java.util.TreeMap$Entry" loaded by "<system class loader>".
      
      Keywords
      java.util.TreeMap$Entry
      jeasy.analysis.llIlllIIIlIlllll
      com.caucho.loader.EnvironmentClassLoader @ 0x2aaac06d7fb8
      
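    • Memory analysis after removing the jeasy package and running for another ten-plus hours: only 90 MB remains, mostly class loaders and the like, with no memory usage problems.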
        Problem Suspect 1
      The classloader/component "com.caucho.loader.SystemClassLoader @ 0x2aaabe227fa8" occupies 17,834,288 (18.20%) bytes. The memory is accumulated in classloader/component "com.caucho.loader.SystemClassLoader @ 0x2aaabe227fa8".
      
      Keywords
      com.caucho.loader.SystemClassLoader @ 0x2aaabe227fa8
      
        Problem Suspect 2
      The classloader/component "com.caucho.loader.EnvironmentClassLoader @ 0x2aaabfbb28e8" occupies 11,315,808 (11.55%) bytes. The memory is accumulated in classloader/component "com.caucho.loader.EnvironmentClassLoader @ 0x2aaabfbb28e8".
      
      Keywords
      com.caucho.loader.EnvironmentClassLoader @ 0x2aaabfbb28e8
      
      
        Problem Suspect 3
      3,628 instances of "java.lang.Class", loaded by "<system class loader>" occupy 19,203,272 (19.60%) bytes. 
      
      Biggest instances:
      
      class java.lang.ref.Finalizer @ 0x2aaaae242170 - 7,359,072 (7.51%) bytes. 
      class cn.com.pconline.core.pricefront.service.ProductTypeService @ 0x2aaaafb037d0 - 4,825,184 (4.92%) bytes. 
      class com.caucho.vfs.Path @ 0x2aaaae5daa10 - 3,675,544 (3.75%) bytes. 
      
      Keywords
      java.lang.Class
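
The three suspects in the 237.54 snapshot earlier in this section each report a retained size and a percentage of the heap, so the implied heap total and the average cost of one memcached socket can be worked out directly. The sketch below is illustrative: the dictionary labels are shorthand, and the target pool size of 200 connections is a hypothetical value, not a recommendation from the ticket.

```python
# (retained bytes, percent of heap) as reported in the 237.54 snapshot.
SUSPECTS_237_54 = {
    "Resin LruCache": (95_697_912, 12.75),
    "jeasy TreeMap": (83_716_264, 11.16),
    "memcached TCPSockIO": (377_951_744, 50.36),
}
TCP_SOCK_IO_INSTANCES = 523
HYPOTHETICAL_TARGET_CONNECTIONS = 200  # assumption, for illustration only


def implied_total_bytes(retained, percent):
    """Heap size implied by one suspect's retained size and its percentage."""
    return retained / (percent / 100.0)


def bytes_per_instance(retained, instances):
    """Average retained bytes per instance."""
    return retained / instances


totals = {
    name: implied_total_bytes(retained, percent)
    for name, (retained, percent) in SUSPECTS_237_54.items()
}
per_socket = bytes_per_instance(
    SUSPECTS_237_54["memcached TCPSockIO"][0], TCP_SOCK_IO_INSTANCES
)
# Linear estimate; assumes every connection retains about the same amount.
estimated_saving = (
    TCP_SOCK_IO_INSTANCES - HYPOTHETICAL_TARGET_CONNECTIONS
) * per_socket

for name, total in totals.items():
    print(f"{name}: implied heap {total / 1e6:.1f} MB")
print(f"per TCPSockIO: {per_socket / 1024:.0f} KiB")
print(f"saving at {HYPOTHETICAL_TARGET_CONNECTIONS} connections: "
      f"{estimated_saving / 1e6:.0f} MB")
```

All three suspects imply a heap of about 750 MB, so the percentages are mutually consistent, and each pooled socket retains roughly 0.7 MB on average. That per-connection cost is why trimming the connection count is a plausible lever.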
      

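Optimization

  • Adjust the virtual machine parameters to use a fixed memory allocation.
  • The content blocks ("tofu blocks") on index pages do not go through the squid cache, so consider adding mc caching for them to reduce load on the r system. These blocks are not tied to product models and are few in number, so they put no pressure on mc memory.
  • Consider converting most of the SSI on the detail pages, which is not time-sensitive, to static publishing to reduce load.
  • Split the Price Library front-end application: index service + SSI service.
  • Split the Price Library servers: static files + mysql + squid.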
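
The mc caching idea for index-page blocks in the list above is a cache-aside pattern. The following is a minimal sketch under stated assumptions: `TtlCache` is an in-process stand-in for a memcached client, and `get_block`, the `index_block:` key prefix and the 300-second TTL are all hypothetical names and values, not taken from the Price Library code.

```python
import time


class TtlCache:
    """In-process stand-in for a memcached client (illustration only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # key -> (value, expiry timestamp)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry and report a miss.
            del self._entries[key]
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._entries[key] = (value, self._clock() + ttl_seconds)


def get_block(cache, block_id, render, ttl_seconds=300):
    """Cache-aside lookup: return the cached block, rendering it on a miss."""
    key = "index_block:" + block_id
    html = cache.get(key)
    if html is None:
        html = render(block_id)  # the expensive call that hits the backend
        cache.set(key, html, ttl_seconds)
    return html


if __name__ == "__main__":
    cache = TtlCache()
    print(get_block(cache, "hot-products", lambda b: "<div>" + b + "</div>"))
```

Because the blocks are few and independent of product model, the key space stays small, which fits the ticket's point that they would put little pressure on mc memory.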

Change History

comment:1 through comment:16 Changed 14 years ago by chenchongqi

  • Description modified

comment:17 Changed 13 years ago by chenchongqi

  • Status changed from new to closed
  • Resolution set to fixed