Roly's Blog

Whatever will be, will be, the future's not ours to see.

Linux Core Dump Analysis

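Sometimes it is not possible to debug an application directly. In those cases, you can collect information about the application when it terminates and analyze it afterwards. One of the most effective ways to track down a crash in an application is to analyze its core dump file.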

What is a core dump file?

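A core dump is a disk file that the operating system writes when a process is terminated by certain signals. It holds the contents of the process's address space at that moment, along with other information about the process state, and this information can be used for debugging. It is a copy of part of the application's memory at the point where the application stopped working, and on Linux it is stored in ELF format. It contains the application's internal variables and stack, which makes it possible to inspect the application's final state. Combined with the executable and its debug information, a core dump file can be analyzed in a debugger in much the same way as a running program.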

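Besides the memory image itself (all of system memory for a full system dump, or the part belonging to the aborted program), a core dump file includes additional information, such as: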

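  • The processor state
  • The contents of the processor registers
  • Memory management information
  • The program counter and stack pointer
  • Operating system and processor information and flags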

A core dump is also called a memory dump, a storage dump, or simply a dump.

Generating a core dump file

1. Enable core dumps: edit /etc/systemd/system.conf and change the "DefaultLimitCORE" line to:

DefaultLimitCORE=infinity

2. Reboot the system:

# shutdown -r now

3. Remove the size limit on core dump files:

$ ulimit -c unlimited

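Use ulimit -a to check whether the change took effect; to revert, set the value to 0 instead of unlimited. A ulimit change applies only to the current shell and the processes it starts. In the sample output below the -c line reads 0, which means core dumps are disabled; once the limit is lifted it reads unlimited.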

$ ulimit -a
-t: cpu time (seconds) unlimited
-f: file size (blocks) unlimited
-d: data seg size (kbytes) unlimited
-s: stack size (kbytes) 8192
-c: core file size (blocks) 0
-m: resident set size (kbytes) unlimited
-u: processes 3789
-n: file descriptors 1024
-l: locked-in-memory size (kbytes) 65536
-v: address space (kbytes) unlimited
-x: file locks unlimited
-i: pending signals 3789
-q: bytes in POSIX msg queues 819200
-e: max nice 0
-r: max rt priority 0
-N 15: unlimited

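4. When the application crashes, a core dump file is generated. By default, a file named core is created in the application's working directory. Since Linux 2.6, this behavior can be changed by editing /proc/sys/kernel/core_pattern, which defines a template for naming core dump files.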

$ cat /proc/sys/kernel/core_pattern
|/usr/share/apport/apport %p %s %c %d %P %E

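On Ubuntu 20.04, however, the pattern begins with a pipe character (|), so the kernel pipes the dump to apport, Ubuntu's crash reporting service, instead of writing the file itself. While apport is enabled, core dump files are stored under the following path: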

/var/lib/apport/coredump 

Analyzing a core dump file

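A core dump is a disk file containing the memory image of a process at the time it terminated. The Linux kernel generates it when handling signals such as SIGQUIT, SIGILL, SIGABRT, SIGFPE, and SIGSEGV.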

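For example, when an application crashes because of a segmentation fault (SIGSEGV), a core file is generated. The code below triggers one. Despite the function name, this is not really a stack overflow: buffer points to a string literal in read-only memory, so the program faults when the loop writes through it, far past the end of the literal.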

$ cat core_dump_test.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void stack_buffer_overflow(){
    char *buffer = "buffer";
    for(int i = 100;i < 1000;i++)
        buffer[i] = 'O';
    printf("%s\n", buffer);
}
int main(){
    stack_buffer_overflow();
    return 0;
}

Compile the code:

$ gcc -g -o core_dump_test core_dump_test.c

Running it produces a core file; then copy it into the current directory under the name core:

$ ./core_dump_test
[1] 1334 segmentation fault (core dumped) ./core_dump_test

$ cd /var/lib/apport/coredump
$ ls
core._home_ubuntu_Desktop_core_test.1000.a0870c1b-c0c8-4a05-b906-e8d6c19038b6.21071.13828388
$ cp core._home_ubuntu_Desktop_core_test.1000.a0870c1b-c0c8-4a05-b906-e8d6c19038b6.21071.13828388 core

To analyze the core file in detail, load the executable and the core file into GDB:

$ gdb -c core core_dump_test
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04.1) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from core_dump_test...
[New LWP 1617]
Core was generated by `./core_dump_test'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00005589d3919176 in stack_buffer_overflow () at core_dump_test.c:8
8 buffer[i] = 'O';

In GDB, show the backtrace:

(gdb) bt
#0 0x00005589d3919176 in stack_buffer_overflow () at core_dump_test.c:8
#1 0x00005589d39191a7 in main () at core_dump_test.c:12
(gdb)

To get a better view of the source code, we can open GDB in TUI mode:

$ gdb -c core core_dump_test -tui
┌──core_dump_test.c─────────────────────────────────────────────────────────┐
│ 1 #include <stdio.h> │
│ 2 #include <stdlib.h> │
│ 3 #include <string.h> │
│ 4 │
│ 5 void stack_buffer_overflow(){ │
│ 6 char *buffer = "buffer"; │
│ 7 for(int i = 100;i < 1000;i++) │
│ >8 buffer[i] = 'O'; │
│ 9 printf("%s\n", buffer); │
│ 10 } │
│ 11 int main(){ │
│ 12 stack_buffer_overflow(); │
│ 13 return 0; │
│ 14 } │
└───────────────────────────────────────────────────────────────────────────┘
core LWP 1617 In: stack_buffer_overflow L8 PC: 0x5589d3919176
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
--Type <RET> for more, q to quit, c to continue without paging--

Reading symbols from core_dump_test...
[New LWP 1617]
Core was generated by `./core_dump_test'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00005589d3919176 in stack_buffer_overflow () at core_dump_test.c:8
(gdb)

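If the program is compiled without -g, source files and line numbers cannot be resolved. If the binary has also lost its symbol table (for example, because it was stripped), even the function names are gone and the backtrace shows only addresses: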

(gdb) bt
#0 0x00005589d3919176 in ?? ()
#1 0x00005589d39191a7 in ?? ()
(gdb)
