We have a WebLogic domain with three managed servers configured as a horizontal cluster. For the last few months, core dumps have been generated (with the error below) at different times on the three machines that host the managed servers. After the core dump is written the process is killed, and Node Manager restarts it. The whole sequence completes within 5 minutes.
======================================
# An unexpected error has been detected by Java Runtime Environment:
# SIGSEGV (0xb) at pc=0xfee15cf0, pid=*****, tid=40
# Java VM: Java HotSpot(TM) Server VM (10.0-b19 mixed mode solaris-sparc)
# Problematic frame:
# V [libjvm.so+0x615cf0]
=====================================
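The header above already allows one small calculation: the faulting pc and the offset of the problematic frame inside libjvm.so together give the address at which libjvm.so was loaded. A minimal Python sketch, using only the two values from the header (the variable names are mine):

```python
# Values copied from the error header above.
pc = 0xfee15cf0            # faulting program counter
frame_offset = 0x615cf0    # offset reported as libjvm.so+0x615cf0

# Load address of libjvm.so = pc minus the offset within the library.
libjvm_base = pc - frame_offset

print(hex(libjvm_base))    # prints 0xfe800000
```

If the header and the core file come from the same crash, this base would be expected to match the start of the libjvm.so mapping reported by pmap, which is a quick check that the frame really lies inside the JVM library.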
On our Solaris hosts, core dump files were being generated in the directory "/var/core/". We have customized this so that they are generated in the domain home directory.
I then started analyzing the core dump with pstack and got the output below for thread 40.
ff29e8cc _lwp_kill (6, 0, ff317080, ff27de54, ffffffff, 6) + 8
ff212950 abort (2cf80, 1, fee76768, ffba0, ff315518, 0) + 110
fee6d5c4 __1cCosFabort6Fb_v_ (1, ff1014f4, 1, ff0ea000, 174f4, 17400) + 5c
fef66c18 __1cHVMErrorOreport_and_die6M_v_ (ff121ff0, 0, 1, ff0b261f, fef73bd2, 17400) + d1c
fe9a5dc8 JVM_handle_solaris_signal (b, a8b3d8f0, a8b3d638, 8b000, 306400, fee15cf0) + a64
fe9a7f50 __1cNCompileBrokerZinvoke_compiler_on_method6FpnLCompileTask__v_ (1b93ef0, e8400, 50, 0, fe9a7140, 306400
fea24e94 __1cNCompileBrokerUcompiler_thread_loop6F_v_ (0, ff120fe0, 306400, 2ff210, 2c800, 2c800) + 65c
fef16e74 __1cKJavaThreadRthread_main_inner6M_v_ (306400, 3068c0, 28, f, ff0ea000, 0) + 48
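The function names in the pstack frames are C++ mangled, which makes the trace hard to read. The sketch below decodes only the class and method part of each name. It relies on a pattern inferred from the frames above, not on a full specification: after the "__1c" prefix, each name component is preceded by one uppercase letter giving its length (A=0, B=1, ... Z=25), and the digit 6 ends the qualified name. The helper name decode_qualified_name is hypothetical.

```python
def decode_qualified_name(symbol):
    """Decode the qualified-name part of a mangled symbol (heuristic).

    Assumption: single uppercase letter = component length (A=0 ... Z=25).
    Longer names and argument types are not handled.
    """
    if not symbol.startswith("__1c"):
        return symbol                      # not mangled, e.g. abort
    pos = 4
    parts = []
    while pos < len(symbol) and symbol[pos].isupper():
        length = ord(symbol[pos]) - ord("A")
        parts.append(symbol[pos + 1 : pos + 1 + length])
        pos += 1 + length
    return "::".join(parts)

# Mangled names copied from the thread 40 frames above.
frames = [
    "__1cCosFabort6Fb_v_",
    "__1cHVMErrorOreport_and_die6M_v_",
    "__1cNCompileBrokerZinvoke_compiler_on_method6FpnLCompileTask__v_",
    "__1cNCompileBrokerUcompiler_thread_loop6F_v_",
    "__1cKJavaThreadRthread_main_inner6M_v_",
]
for name in frames:
    print(decode_qualified_name(name))
# os::abort
# VMError::report_and_die
# CompileBroker::invoke_compiler_on_method
# CompileBroker::compiler_thread_loop
# JavaThread::thread_main_inner
```

Read from the bottom up, the decoded frames suggest that thread 40 was running CompileBroker::compiler_thread_loop and was inside invoke_compiler_on_method when JVM_handle_solaris_signal ran, after which VMError::report_and_die and os::abort were called. A real demangler, if one is installed on the host, is the more reliable option; this sketch is only meant to make the trace readable.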
--------------------------
I could not fully interpret the pstack output, so I tried the "pflags" command and got the output below.
core 'core' of 29912: /app/rsf/bea10/jdk160_05/bin/java -Dweblogic.Name=WLSVRname -Dbea.home
data model = _ILP32 flags = MSACCT|MSFORK
/40: flags = DETACH
sigmask = 0xfffffeff,0x0000ffff cursig = SIGABRT
/42: flags = DETACH|STOPPED pollsys(0x4,0x0,0xa8a1f948,0x0)
why = PR_SUSPENDED sigmask = 0x00000004,0x00000000
From this I learned that thread 40 received the signal SIGABRT. (SIGABRT tells a process to abort, i.e. to terminate. It is usually raised by the process itself when it calls the abort function, but like any other signal it can also be sent to the process from outside.) I am not sure why SIGABRT was raised in thread 40. It may be a consequence of the call chain shown in the pstack output for thread 40 above, where abort is called after JVM_handle_solaris_signal and report_and_die, but I have not confirmed this.
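When a process has many threads, picking out the ones with a pending signal by eye is tedious. As a rough sketch, the Python below scans pflags text for "cursig" entries and reports the LWP each one belongs to. The sample text is copied from the output above; the function name threads_with_cursig and the assumption that each LWP section starts with "/<id>:" are mine.

```python
import re

# Sample text copied from the pflags output above.
PFLAGS_OUTPUT = """\
/40: flags = DETACH
sigmask = 0xfffffeff,0x0000ffff cursig = SIGABRT
/42: flags = DETACH|STOPPED pollsys(0x4,0x0,0xa8a1f948,0x0)
why = PR_SUSPENDED sigmask = 0x00000004,0x00000000
"""

def threads_with_cursig(text):
    """Return {lwp_id: signal_name} for every LWP that reports a cursig."""
    result = {}
    current_lwp = None
    for line in text.splitlines():
        line = line.strip()
        header = re.match(r"/(\d+):", line)      # start of an LWP section
        if header:
            current_lwp = int(header.group(1))
        found = re.search(r"cursig = (\w+)", line)
        if found and current_lwp is not None:
            result[current_lwp] = found.group(1)
    return result

print(threads_with_cursig(PFLAGS_OUTPUT))   # {40: 'SIGABRT'}
```

Only LWP 40 carries a current signal here, which agrees with the thread id (tid=40) in the error header.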
----------------------------------------------------------------
Because pflags did not reveal the exact cause, I used another command, "pmap". The pmap command displays information about the address space of a process, in this case the process that produced the core dump.
pmap produced the output below, but I am still not able to find the root cause.
A76C0000 16K r-x-- /app/rsf/bea10/wlserver_10.3/server/native/solaris/sparc/libnodemanager.so
A76D2000 8K rwx-- /app/rsf/bea10/wlserver_10.3/server/native/solaris/sparc/libnodemanager.so
A7780000 8K r-x-- /app/rsf/bea10/wlserver_10.3/server/native/solaris/sparc/libwlfileio2.so
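One way to put the pmap listing to use is to look up which mapping contains a given address, for example the pc from the error header. The sketch below parses the three lines shown above and does that lookup. It assumes each line has the form "address sizeK permissions path"; parse_pmap and find_mapping are hypothetical helper names, and the addresses in the usage lines are made up for illustration.

```python
# Lines copied from the pmap output above.
PMAP_OUTPUT = """\
A76C0000 16K r-x-- /app/rsf/bea10/wlserver_10.3/server/native/solaris/sparc/libnodemanager.so
A76D2000 8K rwx-- /app/rsf/bea10/wlserver_10.3/server/native/solaris/sparc/libnodemanager.so
A7780000 8K r-x-- /app/rsf/bea10/wlserver_10.3/server/native/solaris/sparc/libwlfileio2.so
"""

def parse_pmap(text):
    """Return a list of (start, end, permissions, name) tuples."""
    mappings = []
    for line in text.splitlines():
        fields = line.split(None, 3)
        if len(fields) < 4 or not fields[1].endswith("K"):
            continue                          # skip lines in another format
        start = int(fields[0], 16)
        size = int(fields[1][:-1]) * 1024     # size is given in kilobytes
        mappings.append((start, start + size, fields[2], fields[3]))
    return mappings

def find_mapping(mappings, address):
    """Return (name, permissions, offset) for the mapping holding address."""
    for start, end, perms, name in mappings:
        if start <= address < end:
            return name, perms, address - start
    return None

mappings = parse_pmap(PMAP_OUTPUT)
print(find_mapping(mappings, 0xA76C1000))   # illustrative address
print(find_mapping(mappings, 0xFEE15CF0))   # pc from the header
```

With only these three lines as input, the pc lookup returns None, because the libjvm.so mapping is not part of the excerpt. Run against the full pmap listing, the same lookup should show which library and which segment the faulting address belongs to.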
I am still working on this and will post the root cause once I have found it.