
Tips for capturing kernel messages when troubleshooting hangup/slowdown problems.


Index:

(Tips 1) Configure "serial console" or "netconsole" if possible.
(Tips 2) Specify "sysrq_always_enabled" and "ignore_loglevel" kernel command line parameters if possible.
(Tips 3) Specify larger buffer size using "log_buf_len=" kernel command line parameter if possible.
(Tips 4) Configure /proc/sys/kernel/hung_task_* sysctl parameters if possible.
(Tips 5) Try to capture SysRq-m and SysRq-t for multiple times upon hangup or slowdown.
Appendix: How to capture kernel messages via netconsole (for RHEL7/CentOS7 users).

(Tips 1) Configure "serial console" or "netconsole" if possible.

If a problem occurs before network interfaces become ready, you will need to use a "serial console". You can add "console=ttyS$number,$speed" kernel command line parameters (e.g. console=ttyS0,115200n8 console=tty0) in order to enable serial consoles. In most cases the $number part is 0. The $speed part depends on the quality of the console. Watch out for slow or poor consoles, where the $speed part has to be a small value.
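After rebooting with the new parameters, it can help to confirm that they actually reached the kernel. The following is a minimal sketch, not part of the original tips: the function name list_consoles is made up for illustration, and it only assumes that /proc/cmdline holds the command line of the running kernel.

```shell
# List the console= parameters found on a kernel command line.
# Reads /proc/cmdline unless a command line string is passed as $1.
# (list_consoles is an illustrative name, not an existing tool.)
list_consoles() {
    local cmdline
    cmdline="${1:-$(cat /proc/cmdline)}"
    # Put each parameter on its own line, then keep only console=... entries.
    printf '%s\n' "$cmdline" | tr -s '[:space:]' '\n' | grep '^console='
}
```

Calling list_consoles with no argument inspects the running kernel. If it prints nothing, no console= parameter was passed.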

If you use "netconsole", you can check the links shown below. Details for RHEL7/CentOS7 users are explained in the Appendix of this page.

(Tips 2) Specify "sysrq_always_enabled" and "ignore_loglevel" kernel command line parameters if possible.

The "sysrq_always_enabled" option tells the kernel to unconditionally allow console users to use SysRq shortcut keys. For example, issuing SysRq-c from the keyboard (hold down the Alt key, press the PrintScreen/SysRq key while Alt is still down, then press the c key while Alt is still down, then release the Alt key) is equivalent to executing the command line shown below as the root user.

---------- command line start ----------
# echo c > /proc/sysrq-trigger
---------- command line end ----------

Since SysRq-c can crash the system by triggering a kernel panic, you should not specify the "sysrq_always_enabled" option if there is a possibility that some untrusted user has physical access to consoles. (But physical access to consoles is almost identical to physical access to the machine. That is, such an untrusted user could do worse things, such as shutting down the machine and installing a different OS, if physical access to the console is permitted. Therefore, in most environments, it should be acceptable to specify the "sysrq_always_enabled" option. ☺ )

The "ignore_loglevel" option tells the kernel to send all kernel messages to consoles. Without this option, only kernel messages whose loglevel is smaller than the first value in /proc/sys/kernel/printk will be sent to consoles (i.e. the majority of kernel messages won't be sent to consoles by default).

If specifying the "ignore_loglevel" option has the side effect of making the bootup procedure intolerably slow because of the volume of kernel messages, you can instead raise the loglevel from the command line after the bootup procedure has completed. For example, you can execute the command line shown below as the root user.

---------- command line start ----------
# echo 9 > /proc/sys/kernel/printk
---------- command line end ----------
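To see how the loglevel rule described above plays out, the sketch below reads the first value of /proc/sys/kernel/printk and checks whether a message of a given loglevel would be sent to consoles. The function names console_loglevel and would_reach_console are made up for illustration, and the check ignores the "ignore_loglevel" option.

```shell
# Print the console loglevel, i.e. the first value in the printk sysctl file.
# (console_loglevel and would_reach_console are illustrative names.)
console_loglevel() {
    awk '{ print $1; exit }' "${1:-/proc/sys/kernel/printk}"
}

# Succeed if a message with loglevel $1 would be sent to consoles,
# i.e. if $1 is smaller than the console loglevel.
would_reach_console() {
    [ "$1" -lt "$(console_loglevel "${2:-/proc/sys/kernel/printk}")" ]
}
```

For example, "would_reach_console 6 && echo yes" reports whether informational messages (loglevel 6) currently reach the consoles.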

(Tips 3) Specify larger buffer size using "log_buf_len=" kernel command line parameter if possible.

When debugging a hangup or slowdown problem, issuing SysRq-w (hold down the Alt key, press the PrintScreen/SysRq key while Alt is still down, then press the w key while Alt is still down, then release the Alt key) is not always sufficient for figuring out what is happening. SysRq-t (which shows the state of all threads) and SysRq-m (which shows memory information) are often needed as well.

Depending on the amount of output generated by SysRq, some kernel messages might be dropped because the kernel message buffer is too small. You can increase the size of the buffer by specifying the "log_buf_len=" option (e.g. "log_buf_len=67108864" to allocate 64MB for the kernel message buffer).
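The byte value in the example above is simply 64 multiplied by 1024 twice. A small sketch of that arithmetic follows. The function name log_buf_len_param is made up for illustration.

```shell
# Print a log_buf_len= parameter for a buffer of $1 megabytes (1MB = 1024 * 1024 bytes).
# (log_buf_len_param is an illustrative name.)
log_buf_len_param() {
    echo "log_buf_len=$(( $1 * 1024 * 1024 ))"
}
```

Running "log_buf_len_param 64" prints log_buf_len=67108864, which matches the example above.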

(Tips 4) Configure /proc/sys/kernel/hung_task_* sysctl parameters if possible.

A common pitfall is that /proc/sys/kernel/hung_task_warnings has already reached 0 by the time the actual hangup or slowdown problem occurs, because its default value is 10 (i.e. only up to 10 warnings are reported by default). In order to observe whether and how the situation changes over time, you can write -1 (which removes the limit) to /proc/sys/kernel/hung_task_warnings . Also, you might want to set a smaller value in /proc/sys/kernel/hung_task_timeout_secs, because the default value (120 seconds) might be too coarse for observing whether and how the situation changes over time.

---------- command line start ----------
# echo -1 > /proc/sys/kernel/hung_task_warnings
# echo 10 > /proc/sys/kernel/hung_task_timeout_secs
---------- command line end ----------
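Values written under /proc/sys are lost on reboot. One possible way to keep them is a sysctl configuration fragment, sketched below. The function name write_hung_task_conf and the file name 90-hung-task.conf are assumptions made for illustration, and the sketch assumes that the distribution loads /etc/sysctl.d/*.conf at boot.

```shell
# Write a sysctl fragment that keeps the hung task settings across reboots.
# $1 is the destination file; the default file name is a hypothetical choice.
# (write_hung_task_conf is an illustrative name.)
write_hung_task_conf() {
    local conf="${1:-/etc/sysctl.d/90-hung-task.conf}"
    cat > "$conf" <<'EOF'
# Do not limit the number of hung task warnings.
kernel.hung_task_warnings = -1
# Treat tasks blocked for more than 10 seconds as hung.
kernel.hung_task_timeout_secs = 10
EOF
}
```

After writing the file as the root user, "sysctl -p" followed by the file path should apply the settings without a reboot.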

(Tips 5) Try to capture SysRq-m and SysRq-t for multiple times upon hangup or slowdown.

When debugging a hangup or slowdown problem, it is important to know whether and how the situation changes over time. A common error is to capture SysRq output only once, which makes it impossible to determine whether and how the situation changes over time.

In many cases, a slowdown problem occurs under memory pressure. Therefore, it is important to have memory information reported by pressing SysRq-m. Also, since memory reclaim activities involve many threads, it is important to have all threads reported by pressing SysRq-t (rather than SysRq-w). Therefore, try to capture SysRq-m and SysRq-t multiple times, with some interval (e.g. wait for 10 seconds after confirming that the SysRq output has completed) between each trial. An example command line is shown below, but please note that you will likely need to use keyboard shortcuts instead, because you are unlikely to be able to operate a shell prompt when a slowdown problem occurs.

---------- command line start ----------
# echo m > /proc/sysrq-trigger
# echo t > /proc/sysrq-trigger
# sleep 10
# echo m > /proc/sysrq-trigger
# echo t > /proc/sysrq-trigger
# sleep 10
# echo m > /proc/sysrq-trigger
# echo t > /proc/sysrq-trigger
---------- command line end ----------
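If a shell is still usable, the repeated sequence above can be wrapped in a loop. This is only a sketch: the function name capture_sysrq and its argument order are made up for illustration, and it sleeps for a fixed interval rather than confirming that the SysRq output has completed.

```shell
# Trigger SysRq-m and SysRq-t repeatedly, sleeping between rounds.
#   $1: number of rounds (default 3)
#   $2: seconds to sleep between rounds (default 10)
#   $3: trigger file (default /proc/sysrq-trigger; writing to it requires root)
# (capture_sysrq is an illustrative name.)
capture_sysrq() {
    local rounds="${1:-3}"
    local interval="${2:-10}"
    local trigger="${3:-/proc/sysrq-trigger}"
    local i=1
    while [ "$i" -le "$rounds" ]; do
        echo m > "$trigger"
        echo t > "$trigger"
        # No need to wait after the final round.
        if [ "$i" -lt "$rounds" ]; then
            sleep "$interval"
        fi
        i=$((i + 1))
    done
}
```

Running "capture_sysrq 3 10" as the root user reproduces the three rounds shown above.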

Appendix: How to capture kernel messages via netconsole (for RHEL7/CentOS7 users).

(Step 1)

Prepare a Linux machine which serves as the sender side, and a Linux machine or a Windows machine which serves as the receiver side.

(Step 2)

Prepare a program which runs on the receiver side. Since the kernel messages are not in syslog format, some subtle configuration changes are needed if you want the syslog daemon to receive them. While there are several programs which can be used for receiving kernel messages (e.g. "nc"), this page introduces "udplogger". The udplogger is a program which saves text messages (not limited to kernel messages) into log files named in YYYY-MM-DD.log format, with a timestamp and source prefixed to each line.

The source code of "udplogger" is available at https://osdn.net/projects/akari/scm/svn/tree/head/branches/udplogger/. You can easily compile it from source because it has few dependencies.

---------- command line start ----------
$ wget -O udplogger.c 'https://osdn.net/projects/akari/scm/svn/blobs/head/branches/udplogger/udplogger.c?export=raw'
$ gcc -Wall -o udplogger udplogger.c
---------- command line end ----------

If you want to use CentOS7 as the receiver side, binary RPM packages are available at https://osdn.net/projects/akari/scm/svn/tree/head/branches/udplogger/CentOS7/.

File Name                    File Size (bytes)  SHA256
udplogger-1.2-1.i686.rpm     17,356             1ffa8d848a3a96802b707a2de8db9a4cf8bcda199a6178ba256a885038f8c746
udplogger-1.2-1.x86_64.rpm   17,572             901cf131af97aa9fd96672a75600570edf51e07bbe2035cdaee355342e42f6ee
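A downloaded package can be compared against the SHA256 values listed on this page before it is installed. The sketch below assumes that the "sha256sum" command from coreutils is available. The function name verify_sha256 is made up for illustration.

```shell
# Succeed only if the SHA256 of file $1 equals the expected hex string $2.
# (verify_sha256 is an illustrative name.)
verify_sha256() {
    local file="$1"
    local expected="$2"
    local actual
    # sha256sum prints "<hash>  <file name>"; keep only the hash.
    actual="$(sha256sum "$file" | awk '{ print $1 }')"
    [ "$actual" = "$expected" ]
}
```

For example, "verify_sha256 udplogger-1.2-1.x86_64.rpm 901cf131af97aa9fd96672a75600570edf51e07bbe2035cdaee355342e42f6ee && echo OK" prints OK only when the file matches the listed value.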

You can build binary RPM packages using the steps shown below. You need "wget" for downloading, and "rpm-build" and "gcc" for compilation. You can install them using "yum" ( yum install wget rpm-build gcc ) if they are not yet installed.

---------- command line start ----------
$ mkdir udplogger
$ for file in COPYING Makefile README udplogger.c udplogger.spec; do wget -O udplogger/$file 'https://osdn.net/projects/akari/scm/svn/blobs/head/branches/udplogger/'$file'?export=raw'; done
$ tar -zcf udplogger.tar.gz --owner 0 --group 0 udplogger
$ rpmbuild -tb udplogger.tar.gz
---------- command line end ----------

If you built an RPM package, you can install it using the command line shown below.

---------- command line start ----------
# rpm -ivh ~/rpmbuild/RPMS/x86_64/udplogger-1.2-1.x86_64.rpm
---------- command line end ----------

If you don't want to install it as an RPM package (e.g. because you are not the root user), you only need to extract /usr/bin/udplogger from the RPM file as shown below. How to run /usr/bin/udplogger without root user's privileges is explained later.

---------- command line start ----------
$ rpm2cpio ~/rpmbuild/RPMS/x86_64/udplogger-1.2-1.x86_64.rpm | cpio -ivd ./usr/bin/udplogger
---------- command line end ----------

If you want to use Ubuntu 16.04 as the receiver side, binary DEB packages are available at https://osdn.net/projects/akari/scm/svn/tree/head/branches/udplogger/Ubuntu16.04/.

File Name                    File Size (bytes)  SHA256                                                            PGP signature
udplogger_1.2-1_i386.deb     8,776              92dac4fae63720727320178f228efc83045a78a5f73a62ef6ee1b65476a6034e  udplogger_1.2-1_i386.deb.asc
udplogger_1.2-1_amd64.deb    8,764              2ee6e205fb49fbe4fc2a94bfc299b429e12061eaa967187f54fdf8d9c24e4709  udplogger_1.2-1_amd64.deb.asc

You can build binary DEB packages using the steps shown below. You need "wget" for downloading, and "devscripts", "debhelper" and "gcc" for compilation. You can install them using "apt-get" ( apt-get install wget devscripts debhelper gcc ) if they are not yet installed.

---------- command line start ----------
$ mkdir udplogger udplogger/debian
$ for file in COPYING Makefile README udplogger.c; do wget -O udplogger/$file 'https://osdn.net/projects/akari/scm/svn/blobs/head/branches/udplogger/'$file'?export=raw'; done
$ for file in changelog compat control copyright rules udplogger.install; do wget -O udplogger/debian/$file 'https://osdn.net/projects/akari/scm/svn/blobs/head/branches/udplogger/debian/'$file'?export=raw'; done
$ cd udplogger
$ chmod +x debian/rules
$ debuild --no-tgz-check -us -uc
$ cd ..
---------- command line end ----------

If you built a DEB package, you can install it using the command line shown below.

---------- command line start ----------
# dpkg -i udplogger_1.2-1_amd64.deb
---------- command line end ----------

If you don't want to install it as a DEB package (e.g. because you are not the root user), you only need to extract /usr/bin/udplogger from the DEB file as shown below. How to run /usr/bin/udplogger without root user's privileges is explained later.

---------- command line start ----------
$ dpkg-deb --fsys-tarfile udplogger_1.2-1_amd64.deb | tar -xvf - ./usr/bin/udplogger
---------- command line end ----------

If you want to use Windows as the receiver side, the source code and prebuilt binaries of "udplogger" for Windows (x86 and x64) are available at https://osdn.net/projects/akari/scm/svn/tree/head/branches/udplogger/WIN32/. You might need to download and install "Microsoft Visual C++ Redistributable for Visual Studio 2017" ( vc_redist.x86.exe or vc_redist.x64.exe ) from https://support.microsoft.com/help/2977003/the-latest-supported-visual-c-downloads.

File Name           File Size (bytes)  SHA256                                                            PGP signature
udplogger-1.1.zip   25,993             052282535849c21fc28942fe8dbaa8c428316c24aa2e5935d720dbe930ad74b8  udplogger-1.1.zip.asc

(Step 3)

Start the program (i.e. /usr/bin/udplogger) on the receiver side. Since the program runs in the foreground, you can append & in order to run the program in the background.

---------- command line start ----------
# /usr/bin/udplogger dir=/var/log &
Started at 2018-03-02 12:50:19 from /var/log/2018-03-02.log using options ip=:: port=6666 dir=/var/log timeout=10 clients=1024 wbuf=65536 rbuf=16777216 uid=0 gid=0 perm=0600
---------- command line end ----------

If you want to start the udplogger automatically, you can start it from /etc/rc.d/rc.local using a command line shown below.

---------- command line start ----------
# echo '/usr/bin/udplogger dir=/var/log &' >> /etc/rc.d/rc.local
# chmod +x /etc/rc.d/rc.local
---------- command line end ----------

By default, the udplogger listens on UDP port 6666. You will need to update the firewall configuration on the receiver side if the firewall does not accept incoming UDP messages on port 6666.

---------- command line start ----------
# firewall-cmd --add-port=6666/udp
---------- command line end ----------

This update will be lost when the firewall configuration is reloaded. If you want to make this update permanent, you can also execute a command line shown below.

---------- command line start ----------
# firewall-cmd --permanent --add-port=6666/udp
---------- command line end ----------

If you are the root user but you don't want to run udplogger with root user's privileges, you can specify "uid" and "gid" options when starting the udplogger. But please be sure to specify a directory where the user specified by "uid" and "gid" options can create files and write to the files.

---------- command line start ----------
# /usr/bin/udplogger dir=/var/tmp uid=1000 gid=1000 &
Started at 2018-03-02 12:53:01 from /var/tmp/2018-03-02.log using options ip=:: port=6666 dir=/var/tmp timeout=10 clients=1024 wbuf=65536 rbuf=16777216 uid=1000 gid=1000 perm=0600
---------- command line end ----------

If you are not the root user, you won't be able to specify "uid" or "gid" option, but that should not become a problem.

---------- command line start ----------
$ /home/user1/usr/bin/udplogger dir=/home/user1/logs &
Started at 2018-03-02 12:53:49 from /home/user1/logs/2018-03-02.log using options ip=:: port=6666 dir=/home/user1/logs timeout=10 clients=1024 wbuf=65536 rbuf=425984 uid=1000 gid=1000 perm=0600
---------- command line end ----------

(Step 4)

Send a test message from the sender side. If you are using bash, you can send UDP messages using the syntax shown below.

Replace the $IP_address part with the IP address of the receiver side.

Replace the $port_number part with the port number of the receiver side (the default for udplogger is 6666).

---------- command line start ----------
$ echo test > /dev/udp/$IP_address/$port_number
---------- command line end ----------
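The same redirection can be wrapped in a small helper that refuses to run without an address and a port. This is a sketch that depends on the /dev/udp feature of bash. The function name send_udp_test and the default message are made up for illustration.

```shell
# Send one UDP test message using the /dev/udp redirection of bash.
#   $1: IP address of the receiver side
#   $2: port number of the receiver side
#   $3: optional message text
# (send_udp_test is an illustrative name.)
send_udp_test() {
    if [ "$#" -lt 2 ]; then
        echo "usage: send_udp_test IP_address port_number [message]" >&2
        return 2
    fi
    # The default message includes this machine's name so the sender is identifiable.
    echo "${3:-test from $(uname -n)}" > "/dev/udp/$1/$2"
}
```

Because UDP is connectionless, a successful return only means that the message was sent. Whether it arrived has to be checked on the receiver side.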

Confirm that the messages were delivered by checking YYYY-MM-DD.log on the receiver side.

---------- command line start ----------
# tail /var/log/2018-03-02.log
---------- command line end ----------
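Since the log file name changes every day, the path can be computed instead of typed. The sketch below assumes that udplogger names the file after the current local date of the receiver side. The function name todays_log is made up for illustration.

```shell
# Print the path of today's log file in directory $1 (default /var/log),
# assuming the YYYY-MM-DD.log naming described on this page.
# (todays_log is an illustrative name.)
todays_log() {
    printf '%s/%s.log\n' "${1:-/var/log}" "$(date +%Y-%m-%d)"
}
```

For example, running tail on "$(todays_log /var/log)" shows the most recent lines received today.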

(Step 5)

Open /etc/sysconfig/netconsole on the sender side using a text editor.

Uncomment the SYSLOGADDR=, SYSLOGPORT= and SYSLOGMACADDR= lines.

Replace the $IP_address part with the IP address of the receiver side.

Replace the $port_number part with the port number of the receiver side.

If the sender side and the receiver side are on different subnets, replace the $MAC_address part with the MAC address of the gateway; otherwise, replace it with the MAC address of the receiver side.

---------- example contents of the configuration file start ----------
# This is the configuration file for the netconsole service.  By starting
# this service you allow a remote syslog daemon to record console output
# from this system.

# The local port number that the netconsole module will use
# LOCALPORT=6666

# The ethernet device to send console messages out of (only set this if it
# can't be automatically determined)
# DEV=

# The IP address of the remote syslog server to send messages to
SYSLOGADDR=$IP_address

# The listening port of the remote syslog daemon
SYSLOGPORT=$port_number

# The MAC address of the remote syslog server (only set this if it can't
# be automatically determined)
SYSLOGMACADDR=$MAC_address
---------- example contents of the configuration file end ----------

Save the changes and then quit the text editor.
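If you are unsure which MAC address to use for the $MAC_address part, the "ip" command on the sender side can help. The sketch below parses the usual one-line output of "ip route get" and "ip neigh show". The function names next_hop_from_route and mac_from_neigh are made up for illustration, and the parsing assumes the common output layout shown in the comments.

```shell
# Read one line of "ip route get $IP_address" output on stdin, e.g.
#   10.0.0.5 via 192.168.0.1 dev eth0 src 192.168.0.10   (different subnet)
#   192.168.0.2 dev eth0 src 192.168.0.10                (same subnet)
# and print the gateway if there is one, otherwise the destination itself.
# (next_hop_from_route and mac_from_neigh are illustrative names.)
next_hop_from_route() {
    awk '{
        for (i = 1; i < NF; i++) if ($i == "via") { print $(i + 1); exit }
        print $1; exit
    }'
}

# Read "ip neigh show" output on stdin, e.g.
#   192.168.0.1 dev eth0 lladdr 00:11:22:33:44:55 REACHABLE
# and print the MAC address that follows the "lladdr" keyword.
mac_from_neigh() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "lladdr") { print $(i + 1); exit } }'
}
```

A possible usage is to pipe "ip route get $IP_address" into next_hop_from_route, ping the printed address once so that the neighbor table gets populated, and then pipe "ip neigh show" for that address into mac_from_neigh.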

(Step 6)

Start the netconsole service on the sender side.

---------- command line start ----------
# systemctl start netconsole
---------- command line end ----------

This service will not be started automatically when the system is rebooted. If you want to make this service permanent (i.e. start automatically upon boot), you can also execute the command line shown below.

---------- command line start ----------
# chkconfig netconsole on
---------- command line end ----------

(Step 7)

Confirm that messages are delivered via netconsole by checking YYYY-MM-DD.log on the receiver side. You can execute the command line shown below as the root user on the sender side in order to generate a test kernel message.

---------- command line start ----------
# echo h > /proc/sysrq-trigger
---------- command line end ----------

If messages are not delivered correctly, check the contents of /etc/sysconfig/netconsole, restart the netconsole service (i.e. systemctl restart netconsole), and try again.


Last modified: $Date: 2018-03-14 19:36:15 +0900 (Wed, 14 Mar 2018) $