Jan 11, 2019

Backtrace from a core dump with GDB: pitfalls

At work I mostly program in Ruby, but our system (YaST) has a C/C++ part
too. So I get to debug C crashes infrequently enough to forget some
tricks. Here I write them down for the next time.

  1. Contents and tl;dr
  • $ ulimit -c unlimited so that a core dump is saved
  • $ sudo coredumpctl dump -o core so that it is saved to a file
  • $ gdb -c core is wrong, gdb program core is right
  • (gdb) info sh mylib
  • (gdb) select a stack frame before querying variables or macros
  • $ CFLAGS="-g3 ..." for debugging macros
  1. (Un)helpful crash dump

When Ruby crashes, it produces a very verbose dump of information. It is not
much helpful in our case:

$ rake run[fail]
/sbin/yast2 src/clients/fail.rb
/usr/lib64/ruby/vendor_ruby/2.5.0/yast/yast.rb:204: [BUG] Segmentation fault at 0x0000000000000008
ruby 2.5.0p0 (2017-12-25 revision 61468) [x86_64-linux-gnu]

-- Control frame information -----------------------------------------------
c:0011 p:---- s:0066 e:000065 CFUNC  :call_yast_function
c:0010 p:0061 s:0058 e:000057 BLOCK  /usr/lib64/ruby/vendor_ruby/2.5.0/yast/yast.rb:204 [FINISH]
c:0009 p:0023 s:0054 e:000053 METHOD /local/home-mvidner/svn/yast/samba-server/src/clients/fail.rb:5
[...]
c:0002 p:0489 s:0011 E:000c18 EVAL   /usr/lib/YaST2/bin/y2start:58 [FINISH]
c:0001 p:0000 s:0003 E:001300 (none) [FINISH]

-- Ruby level backtrace information ----------------------------------------
/usr/lib/YaST2/bin/y2start:58:in `<main>'
/usr/lib64/ruby/vendor_ruby/2.5.0/yast/wfm.rb:195:in `CallFunction'
[...]
/usr/lib64/ruby/vendor_ruby/2.5.0/yast/yast.rb:204:in `call_yast_function'

-- Machine register context ------------------------------------------------
 RIP: 0x00007fa423bf9677 RBP: 0x00007fff30f5ff20 RSP: 0x00007fff30f5fe10
[...]
 R14: 0x00007fff30f5ff48 R15: 0x00007fff30f5fe10 EFL: 0x0000000000010202

-- C level backtrace information -------------------------------------------
/usr/lib64/libruby2.5.so.2.5(rb_print_backtrace+0x15) [0x7fa428543f15]
/usr/lib64/libruby2.5.so.2.5(0x1c814c) [0x7fa42854414c]
/usr/lib64/libruby2.5.so.2.5(0x93e44) [0x7fa42840fe44]
/usr/lib64/libruby2.5.so.2.5(0x158eb2) [0x7fa4284d4eb2]
/lib64/libc.so.6(0x7fa427ff8160) [0x7fa427ff8160]
/usr/lib64/YaST2/plugin/libpy2lang0_ruby.so(_ZN5YRuby9callInnerENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES5_7YCPList8constPtrI4TypeS8_E+0x4c7) [0x7fa423bf9677]
[...]
/usr/bin/ruby.ruby2.5(_start+0x2a) [0x40086a]

-- Other runtime information -----------------------------------------------

* Loaded script: /usr/lib/YaST2/bin/y2start

* Loaded features:

    0 enumerator.so
    1 thread.rb
    2 rational.so
    3 complex.so
    4 /usr/lib64/ruby/2.5.0/x86_64-linux-gnu/enc/encdb.so
[...]
   87 /usr/lib64/ruby/vendor_ruby/2.5.0/yast/y2start_helpers.rb
   88 /local/home-mvidner/svn/yast/samba-server/src/modules/Fail.rb

* Process memory map:

00400000-00401000 r-xp 00000000 08:01 7766                               /usr/bin/ruby.ruby2.5
[...]
7fa42363b000-7fa42365c000 r--s 00000000 08:01 6998                       /usr/lib64/YaST2/plugin/libpy2lang_ruby.so
[...]
7fa428a5f000-7fa428a60000 rw-p 00026000 08:01 1310795                    /lib64/ld-2.26.so
7fa428a60000-7fa428a61000 rw-p 00000000 00:00 0 
7fff30767000-7fff30f66000 rw-p 00000000 00:00 0                          [stack]
7fff30ff1000-7fff30ff4000 r--p 00000000 00:00 0                          [vvar]
7fff30ff4000-7fff30ff6000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

[NOTE]
You may have encountered a bug in the Ruby interpreter or extension libraries.
Bug reports are welcome.
For details: http://www.ruby-lang.org/bugreport.html

/sbin/yast2: line 455: 25102 Aborted                 (core dumped) $ybindir/y2start $module "$@" "$SELECTED_GUI" $Y2_GEOMETRY $Y2UI_ARGS
  1. "(core dumped)" is lying

It means to say: Per /usr/lib/sysctl.d/50-coredump.conf the core has been sent
to /usr/lib/systemd/systemd-coredump which obeyed "ulimit -c 0" and did not
save it.
So let's

  • use the ulimit -c unlimited command (bash shell builtin, actually)
  • Let it crash again
  • sudo coredumpctl dump -o core

Also note that without the dump subcommand, the core dump will be deleted
after a day. (Why?)

(To avoid systemd-coredump, you can use
sudo /usr/sbin/sysctl kernel.core_pattern="core.%p"; thanks lslezak)

  1. gdb -c core is not useful without the executable

When we do that, the backtrace is very unhelpful:

(gdb) bt
#0  0x00007eff7d0140e0 in ?? ()
#1  0x0000000000000400 in ?? ()
#2  0xffffffffffffffff in ?? ()
#3  0xffffffffffffffff in ?? ()
#4  0xffffffffffffffff in ?? ()
#5  0xffffffffffffffff in ?? ()
#6  0xffffffffffffffff in ?? ()
#7  0xffffffffffffffff in ?? ()
#8  0xffffffffffffffff in ?? ()
#9  0xffffffffffffffff in ?? ()
#10 0xffffffffffffffff in ?? ()
#11 0xffffffffffffffff in ?? ()
#12 0xffffffffffffffff in ?? ()
#13 0xffffffffffffffff in ?? ()
#14 0x0000000000000000 in ?? ()

For some reason, gdb will not load the executable file even though it is named
in the core file. Do it by hand:

  1. gdb program core
$ gdb /usr/bin/ruby.ruby2.5 core

[... lots of text that we'll get back to in the next section ...]
(gdb) bt
#0  0x00007eff7d0140e0 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007eff7d01579f in __GI_abort () at abort.c:100
#2  0x00007eff7d42be51 in die () at error.c:552
#3  0x00007eff7d42be51 in rb_bug_context (ctx=ctx@entry=0xcebac0, fmt=fmt@entry=0x7eff7d5925c4 "Segmentation fault at %p") at error.c:582
#4  0x00007eff7d4f0eb2 in sigsegv (sig=11, info=0xcebbf0, ctx=0xcebac0) at signal.c:928
#5  0x00007eff7d014160 in <signal handler called> () at /lib64/libc.so.6
#6  0x00007eff78c15677 in YRuby::callInner(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, YCPList, constPtr<Type, Type>) () at /usr/lib64/YaST2/plugin/libpy2lang_ruby.so
#7  0x00007eff78c19d80 in Y2RubyFunction::evaluateCall() () at /usr/lib64/YaST2/plugin/libpy2lang_ruby.so

Note that the frames 0 to 4 show the source code locations (file name and line
number) as well as the argument names and values. It's because gdb has found
the debugging information for glibc and the Ruby library:

(gdb) info sh libc
From                To                  Syms Read   Shared Object Library
0x00007eff7cffe770  0x00007eff7d14365c  Yes         /lib64/libc.so.6
0x00007eff7c981cd0  0x00007eff7c9895e8  Yes         /lib64/libcrypt.so.1
0x00007eff73c256f0  0x00007eff73c27007  Yes (*)     /usr/lib64/libcap.so.2
0x00007eff6972ca10  0x00007eff69731b65  Yes         /usr/lib64/qt5/plugins/platforminputcontexts/libcomposeplatforminputcontextplugin.so
(*): Shared library is missing debugging information.
(gdb) info sh ruby
From                To                  Syms Read   Shared Object Library
0x00007eff7d3c73c0  0x00007eff7d566ea8  Yes         /usr/lib64/libruby2.5.so.2.5
0x00007eff7c433850  0x00007eff7c4344c0  Yes (*)     /usr/lib64/ruby/2.5.0/x86_64-linux-gnu/enc/encdb.so
[...]
0x00007eff7b9ec2c0  0x00007eff7b9f4cb0  Yes (*)     /usr/lib64/ruby/vendor_ruby/2.5.0/x86_64-linux-gnu/yastx.so
0x00007eff78c0fc00  0x00007eff78c1e740  Yes (*)     /usr/lib64/YaST2/plugin/libpy2lang_ruby.so
0x00007eff789f82a0  0x00007eff78a001a5  Yes (*)     /usr/lib64/ruby/vendor_ruby/2.5.0/x86_64-linux-gnu/yast/builtinx.so
(*): Shared library is missing debugging information.

info sh is short for info sharedlibrary. If you omit the library name filter, the
length of the list will easily obscure the "(*)" footnote.

  1. Add debuginfo

On openSUSE (and SUSE Linux Enterprise), the debugging symbols are in separate
RPM packages:

zypper in yast2-ruby-bindings-debug{info,source}

Or take the hint from the gdb message that we omitted in the previous section:

Missing separate debuginfo for /usr/lib64/YaST2/plugin/libpy2lang_ruby.so
Try: zypper install -C "debuginfo(build-id)=8c6a1d943255c6f935e029140e75b49b1b7c22fb"

After restarting gdb:

(gdb) bt
[...]
#5  0x00007eff7d014160 in <signal handler called> () at /lib64/libc.so.6
#6  0x00007eff78c15677 in rb_array_len (a=8) at /usr/include/ruby-2.5.0/ruby/ruby.h:2057
#7  0x00007eff78c15677 in YRuby::callInner(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, YCPList, constPtr<Type, Type>) (this=this@entry=0xfefcc0, module_name=..., function=..., argList=..., wanted_result_type=...) at /usr/src/debug/yast2-ruby-bindings-4.0.6-lp150.1.1.x86_64/src/binary/YRuby.cc:220
[...]

In frame #6, rb_array_len (a=8), is the a argument equal to nil?
Let's check:

(gdb) p Qnil
No symbol "Qnil" in current context.

That's misleading! Although Qnil is defined in a header, and you've been
taught to prefer static const int to macros because of macros being
"global", they are in fact not as much global as for gdb to see them now. We
have to select a frame to have a context of a compilation unit.

(gdb) f 6
#6  rb_array_len (a=8) at /usr/include/ruby-2.5.0/ruby/ruby.h:2057
2057        return (RBASIC(a)->flags & RARRAY_EMBED_FLAG) ?
(gdb) p Qnil
No symbol "Qnil" in current context.

WTF?! Well, having debugging information loaded is not enough, the debuginfo
must have an elevated level that includes the macros.

  1. Debuginfo with macros

Instead of the -g flag to gcc, I had to specify -g3 in CFLAGS and
rebuild and reinstall. After that, finally:

(gdb) f 6
#6  rb_array_len (a=8) at /usr/include/ruby-2.5.0/ruby/ruby.h:2057
2057        return (RBASIC(a)->flags & RARRAY_EMBED_FLAG) ?
(gdb) p Qnil
$1 = 8
(gdb) f 7
#7  YRuby::callInner (this=this@entry=0xfefcc0, module_name=..., function=..., argList=..., wanted_result_type=...) at /usr/src/debug/yast2-ruby-bindings-4.0.6-lp150.1.1.x86_64/src/binary/YRuby.cc:220
220         VALUE backtrace = RARRAY_LEN(trace)>0 ? rb_ary_entry(trace, 0) : rb_str_new2("Unknown");

So a = 8 and Qnil is 8, so we have confirmed that the trace array being
nil is causing the crash.

(For the record, this is bsc#1119690.)

No comments: