[Linux-ha-dev] corosync crash (1.4.3) in memcpy
Ulrich Windl
2013-05-31 07:32:13 UTC
Hi!

I just discovered a 134 MB core dump of corosync on x86_64 (SLES11 SP2). The backtrace looks like this:
Core was generated by `/usr/sbin/corosync'.
Program terminated with signal 11, Segmentation fault.
#0 0x00007f1cb4a515a9 in memcpy () from /lib64/libc.so.6
(gdb) bt
#0 0x00007f1cb4a515a9 in memcpy () from /lib64/libc.so.6
#1 0x00007f1cb5370620 in coroipcs_response_send () from /usr/lib64/libcoroipcs.so.4
#2 0x00007f1cad3458e5 in ?? () from /usr/lib64/lcrso/service_ckpt.lcrso
#3 0x0000000000406980 in ?? ()
#4 0x00007f1cb5799777 in ?? () from /usr/lib64/libtotem_pg.so.4
#5 0x00007f1cb5791c4d in ?? () from /usr/lib64/libtotem_pg.so.4
#6 0x00007f1cb57961b7 in ?? () from /usr/lib64/libtotem_pg.so.4
#7 0x00007f1cb578db8c in ?? () from /usr/lib64/libtotem_pg.so.4
#8 0x00007f1cb578ea6f in rrp_deliver_fn () from /usr/lib64/libtotem_pg.so.4
#9 0x00007f1cb5788170 in ?? () from /usr/lib64/libtotem_pg.so.4
#10 0x00007f1cb57848d8 in poll_run () from /usr/lib64/libtotem_pg.so.4
#11 0x0000000000407dae in main ()

At first glance this looks like a NULL pointer being passed to memcpy()!

[...]
At the time of the crash the node was the DC, and the syslog messages around that event were:
May 10 10:54:34 so2 crmd: [12494]: info: crm_update_peer: Node so3: id=553850028 state=lost (new) addr=r(0) ip(172.20.3.33) r(1) ip(192.168.0.63) votes=1 born=4428 seen=4440 proc=00000000000000000000000000151312
May 10 10:54:34 so2 cib: [12489]: info: ais_dispatch_message: Membership 4444: quorum retained
May 10 10:54:34 so2 cib: [12489]: info: crm_update_peer: Node so3: id=553850028 state=lost (new) addr=r(0) ip(172.20.3.33) r(1) ip(192.168.0.63) votes=1 born=4428 seen=4440 proc=00000000000000000000000000151312
May 10 10:54:35 so2 cib: [12489]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/3064, version=0.493.1309): ok (rc=0)
May 10 10:54:35 so2 sbd: [12429]: ERROR: AIS connection terminated - corosync down?
May 10 10:54:35 so2 attrd: [12492]: ERROR: ais_dispatch: Receiving message body failed: (2) Library error: Resource temporarily unavailable (11)
May 10 10:54:35 so2 stonith-ng: [12490]: ERROR: ais_dispatch: Receiving message body failed: (2) Library error: Resource temporarily unavailable (11)
May 10 10:54:35 so2 attrd: [12492]: ERROR: ais_dispatch: AIS connection failed
May 10 10:54:35 so2 stonith-ng: [12490]: ERROR: ais_dispatch: AIS connection failed
May 10 10:54:35 so2 attrd: [12492]: CRIT: attrd_ais_destroy: Lost connection to OpenAIS service!
May 10 10:54:35 so2 stonith-ng: [12490]: ERROR: stonith_peer_ais_destroy: AIS connection terminated
May 10 10:54:35 so2 attrd: [12492]: notice: main: Exiting...
May 10 10:54:35 so2 attrd: [12492]: ERROR: attrd_cib_connection_destroy: Connection to the CIB terminated...
May 10 10:54:35 so2 sbd: [12425]: WARN: Servant for pcmk (pid: 12429) has terminated
May 10 10:54:35 so2 sbd: [15305]: info: Monitoring Pacemaker health
May 10 10:54:36 so2 sbd: [15305]: info: Waiting to sign in with AIS ...
May 10 10:54:37 so2 cib: [12489]: ERROR: send_ais_text: Sending message 1929 via pcmk: FAILED (rc=2): Library error: Connection timed out (110)
May 10 10:54:37 so2 cib: [12489]: WARN: send_ipc_message: IPC Channel to 12490 is not connected
May 10 10:54:37 so2 cib: [12489]: WARN: cib_notify_client: Notification of client 12490/276eb58a-da63-45c6-bb50-ac445b9be077 failed
May 10 10:54:37 so2 cib: [12489]: WARN: send_ipc_message: IPC Channel to 12429 is not connected
May 10 10:54:37 so2 cib: [12489]: WARN: cib_notify_client: Notification of client 12429/03c7daee-597a-4295-a767-daf10218abfc failed
May 10 10:54:37 so2 sbd: [15341]: info: Watchdog enabled.
May 10 10:54:37 so2 sbd: [15341]: info: Setting latency warning to 15
May 10 10:54:37 so2 sbd: [15342]: info: Delivery process handling /dev/disk/by-id/dm-name-Shared-E1_part1
May 10 10:54:37 so2 sbd: [15343]: info: Delivery process handling /dev/disk/by-id/dm-name-Shared-E2_part1
May 10 10:54:37 so2 sbd: [15342]: info: Writing exit to node slot so2
May 10 10:54:37 so2 sbd: [15343]: info: Writing exit to node slot so2
May 10 10:54:37 so2 sbd: [15342]: info: exit successfully delivered to so2
May 10 10:54:37 so2 sbd: [15343]: info: exit successfully delivered to so2
May 10 10:54:37 so2 sbd: [15341]: info: Message successfully delivered.
May 10 10:54:38 so2 sbd: [15305]: info: Waiting to sign in with AIS ...
May 10 10:54:38 so2 sbd: [6643]: info: Received command exit from so2 on disk /dev/disk/by-id/dm-name-Shared-E1_part1
May 10 10:54:38 so2 sbd: [12425]: WARN: Servant for /dev/disk/by-id/dm-name-Shared-E2_part1 (pid: 6549) has terminated
May 10 10:54:38 so2 sbd: [12425]: WARN: Servant for /dev/disk/by-id/dm-name-Shared-E1_part1 (pid: 6643) has terminated
May 10 10:54:38 so2 sbd: [12425]: WARN: Servant for pcmk (pid: 15305) has terminated
May 10 10:54:38 so2 cib: [12489]: ERROR: send_ais_text: Sending message 1930 via pcmk: FAILED (rc=2): Library error: Connection timed out (110)
May 10 10:54:38 so2 crmd: [12494]: info: crmd_ais_dispatch: Setting expected votes to 5
[...]

Maybe this rings a bell for some developer...

Regards,
Ulrich
