Overview
This article explains the information required by the Skyvera support team in order to analyze a core dump generated by a process restarting spontaneously.
Solution
You might see a process restarting spontaneously in the syslog file (/var/log/messages) as in the example below.
Aug 21 10:59:02 rtr01 systemd-logind: New session 346158 of user textpass.
Aug 21 10:59:02 rtr02 systemd: Started Session 346158 of user textpass.
The restart may generate a core file. To check whether there is a core file related to the restart, look for files created around the date and time shown in the syslog entry (e.g. Aug 21 10:59:02). The core directory differs between RHEL and Solaris operating systems:
- RHEL
ls -ltrh /var/TKLC/core
- Solaris
ls -ltrh /var/core
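On a directory with many core files, the search by date and time can be narrowed with find. The following is a minimal sketch, not part of the official procedure: cores_between is a hypothetical helper name, it assumes GNU find (for the -newermt test, available on RHEL but not in the stock Solaris find), and the year in the usage comment is an assumption because the syslog example does not show one.

```shell
#!/bin/sh
# Hypothetical helper: list files in a core directory that were last
# modified inside a given time window. Assumes GNU find (-newermt).
cores_between() {
    dir=$1      # core directory, e.g. /var/TKLC/core
    start=$2    # window start, e.g. "2023-08-21 10:30:00" (year assumed)
    end=$3      # window end
    find "$dir" -type f -newermt "$start" ! -newermt "$end" | sort
}

# Usage (window chosen around the restart seen in syslog):
#   cores_between /var/TKLC/core "2023-08-21 10:30:00" "2023-08-21 11:30:00"
```

A window of about half an hour on either side of the syslog timestamp is only a suggestion; widen it if no file is found.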
Then you will need to provide the following information to the Skyvera support team so that the R&D team can analyze the issue.
Core files are named using the following standard syntax:
core.<process_name>.<process_id>.gz
For example:
core.textpass.13714.gz
IMPORTANT: Before raising a ticket with us, make sure that the process name shown in the core file name belongs to a Lithium process and not to other software running on the machine, such as an antivirus or a server monitoring tool.
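The process name can be read from the core file name with a few lines of shell. This is a minimal sketch: core_process_name and is_lithium_core are hypothetical helper names, and treating an executable of the same name under /usr/TextPass/bin as evidence of a Lithium process is an assumption based on the path used in the gdb step of this article.

```shell
#!/bin/sh
# Hypothetical helper: print <process_name> from a file named
# core.<process_name>.<process_id>.gz (the .gz suffix is optional).
core_process_name() {
    name=$(basename "$1")
    name=${name#core.}          # drop the leading "core."
    name=${name%.gz}            # drop a trailing ".gz", if present
    printf '%s\n' "${name%.*}"  # drop the trailing ".<process_id>"
}

# Hypothetical check: succeed if an executable with that name exists in
# the install directory (assumed default: /usr/TextPass/bin).
is_lithium_core() {
    bin_dir=${2:-/usr/TextPass/bin}
    [ -x "$bin_dir/$(core_process_name "$1")" ]
}

# Usage:
#   core_process_name core.textpass.13714.gz
#   is_lithium_core core.textpass.13714.gz && echo "Lithium process"
```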
<supportagent>
Because customers' operating system configurations and versions of NewNet Lithium vary widely, debugging core dumps in our test environment is often unreliable or produces far fewer details.
The best method, if possible, is to debug the core dump files in the customer's environment (via VPN or remote session) on the machine where the process crashed. This shows the true errors and backtrace, and can save many hours of troubleshooting.
If it is impossible to debug the file in the customer's environment, it is best to get the same OS and same executable version from NewNet Releases. This can take some time, but on difficult tickets with little or no information in the logs, this will save time.
</supportagent>
Core files
For example, in an RHEL server:
$ ls -ltrh /var/TKLC/core/
total 1.3G
-rw-rw-rw- 1 root root 278M Aug 16 18:51 core.tp_ams.11388.gz
-rw-rw-rw- 1 root root 464M Aug 21 10:42 core.textpass.13714.gz
-rw-rw-rw- 1 root root 516M Sep 10 01:33 core.tp_hub.11345.gz
You can see a core file created around the time when the textpass process restarted in the syslog example.
gdb or pstack/pflags output
You will need to provide the backtrace for all the relevant core files. The backtrace must be collected on the system where the core file was generated, or on a system with exactly the same component version. The procedure differs between RHEL and Solaris operating systems.
RHEL
- Change to root user.
- Debug the component, which is the process the core file belongs to. You will be taken to the gdb prompt.
gdb /usr/TextPass/bin/<component>
In the example above, the component is textpass:
gdb /usr/TextPass/bin/textpass
- Redirect the gdb output to a file.
(gdb) set logging file <log_file_name>.log
(gdb) set logging on
- To get the backtrace, run the following commands at the gdb prompt. Make sure the core file has been uncompressed first (it must not be a .gz file).
(gdb) core /var/TKLC/core/<core file name>
(gdb) bt
(gdb) bt full
(gdb) quit
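The RHEL steps can also be run without the interactive prompt by using gdb in batch mode. The following is a minimal sketch under stated assumptions, not a replacement for the documented procedure: collect_backtrace is a hypothetical helper name, the GDB variable is introduced here only so that the debugger path can be overridden, and the uncompressed core is written next to the .gz file, so the directory needs enough free space.

```shell
#!/bin/sh
# Hypothetical helper: uncompress a core file (keeping the .gz) and write
# the output of "bt" and "bt full" to a log file using gdb batch mode.
collect_backtrace() {
    core_gz=$1      # e.g. /var/TKLC/core/core.textpass.13714.gz
    component=$2    # e.g. textpass
    log=$3          # e.g. /tmp/textpass_bt.log
    core=${core_gz%.gz}
    # Uncompress only if an uncompressed copy does not exist yet
    [ -f "$core" ] || gunzip -c "$core_gz" > "$core"
    # -batch exits after running the -ex commands; stderr is kept in the log
    "${GDB:-gdb}" -batch -ex "bt" -ex "bt full" \
        "/usr/TextPass/bin/$component" "$core" > "$log" 2>&1
}

# Usage (as root):
#   collect_backtrace /var/TKLC/core/core.textpass.13714.gz textpass /tmp/textpass_bt.log
```

Because the output is redirected by the shell, the set logging commands are not needed in this variant.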
Solaris
- Change to root user.
- Print the stack trace, redirecting the output to a file.
pstack -F <core filename> > <pstack_file_name>.log
- Print the tracing flags, signals, and other status information, redirecting the output to a file.
pflags -r <core filename> > <pflags_file_name>.log
syslog
Provide the full syslog file and specify the date and time when you saw the textpass process restarting.
tp_walkall
Provide the tp_walkall output of the node where the core files were generated.
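Once the backtrace logs, the syslog file and the tp_walkall output have been collected, it can be convenient to attach them to the ticket as a single archive. This is an optional sketch: bundle_for_support is a hypothetical helper name, and the file names in the usage comment are placeholders rather than names required by the support team.

```shell
#!/bin/sh
# Hypothetical helper: pack the collected files into one gzip-compressed
# tar archive, refusing to continue if any input file is missing.
bundle_for_support() {
    archive=$1
    shift
    for f in "$@"; do
        [ -r "$f" ] || { echo "missing or unreadable: $f" >&2; return 1; }
    done
    tar -czf "$archive" "$@"
}

# Usage (placeholder file names):
#   bundle_for_support /tmp/core_ticket.tar.gz \
#       /tmp/textpass_bt.log /var/log/messages /tmp/tp_walkall_output.txt
```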
Priyanka Bhotika