Ulrich Windl
2013-02-25 14:26:36 UTC
Hello,
I'm wondering about these messages:
Feb 25 14:53:31 so4 corosync[12457]: [TOTEM ] Retransmit List: 2a5 28b 28d 28e 295 296 297 298 299 29a 29b 29c 29d 29e 29f 2a0 2a1 2a2 2a3 2a4 2a6
Feb 25 14:53:31 so4 corosync[12457]: [TOTEM ] Retransmit List: 2a6 28b 28d 28e 295 296 297 298 299 29a 29b 29c 29d 29e 29f 2a0 2a1 2a2 2a3 2a4 2a5
Feb 25 14:53:31 so4 corosync[12457]: [TOTEM ] Retransmit List: 2a5 28b 28d 28e 295 296 297 298 299 29a 29b 29c 29d 29e 29f 2a0 2a1 2a2 2a3 2a4 2a6
Feb 25 14:53:32 so4 corosync[12457]: [TOTEM ] Retransmit List: 2a6 28b 28d 28e 295 296 297 298 299 29a 29b 29c 29d 29e 29f 2a0 2a1 2a2 2a3 2a4 2a5
Feb 25 14:53:32 so4 corosync[12457]: [TOTEM ] Retransmit List: 2a5 28b 28d 28e 295 296 297 298 299 29a 29b 29c 29d 29e 29f 2a0 2a1 2a2 2a3 2a4 2a6
Feb 25 14:53:32 so4 corosync[12457]: [TOTEM ] Retransmit List: 2a6 28b 28d 28e 295 296 297 298 299 29a 29b 29c 29d 29e 29f 2a0 2a1 2a2 2a3 2a4 2a5
Feb 25 14:53:32 so4 corosync[12457]: [TOTEM ] Retransmit List: 2a5 28b 28d 28e 295 296 297 298 299 29a 29b 29c 29d 29e 29f 2a0 2a1 2a2 2a3 2a4 2a6
Feb 25 14:53:32 so4 corosync[12457]: [TOTEM ] Retransmit List: 2a6 28b 28d 28e 295 296 297 298 299 29a 29b 29c 29d 29e 29f 2a0 2a1 2a2 2a3 2a4 2a5
Feb 25 14:53:32 so4 corosync[12457]: [TOTEM ] Retransmit List: 2a5 28b 28d 28e 295 296 297 298 299 29a 29b 29c 29d 29e 29f 2a0 2a1 2a2 2a3 2a4 2a6
Feb 25 14:53:32 so4 corosync[12457]: [TOTEM ] Retransmit List: 2a6 28b 28d 28e 295 296 297 298 299 29a 29b 29c 29d 29e 29f 2a0 2a1 2a2 2a3 2a4 2a5
Feb 25 14:53:32 so4 corosync[12457]: [TOTEM ] Retransmit List: 2a5 28b 28d 28e 295 296 297 298 299 29a 29b 29c 29d 29e 29f 2a0 2a1 2a2 2a3 2a4 2a6
Feb 25 14:53:32 so4 corosync[12457]: [TOTEM ] Retransmit List: 2a6 28b 28d 28e 295 296 297 298 299 29a 29b 29c 29d 29e 29f 2a0 2a1 2a2 2a3 2a4 2a5
Feb 25 14:53:32 so4 corosync[12457]: [TOTEM ] Retransmit List: 2a5 28b 28d 28e 295 296 297 298 299 29a 29b 29c 29d 29e 29f 2a0 2a1 2a2 2a3 2a4 2a6
Feb 25 14:53:32 so4 corosync[12457]: [TOTEM ] Retransmit List: 2a6 28b 28d 28e 295 296 297 298 299 29a 29b 29c 29d 29e 29f 2a0 2a1 2a2 2a3 2a4 2a5
Feb 25 14:53:32 so4 corosync[12457]: [TOTEM ] Retransmit List: 2a5 28b 28d 28e 295 296 297 298 299 29a 29b 29c 29d 29e 29f 2a0 2a1 2a2 2a3 2a4 2a6
Feb 25 14:53:32 so4 corosync[12457]: [TOTEM ] Retransmit List: 2a6 28b 28d 28e 295 296 297 298 299 29a 29b 29c 29d 29e 29f 2a0 2a1 2a2 2a3 2a4 2a5
Feb 25 14:53:32 so4 corosync[12457]: [TOTEM ] Retransmit List: 2a5 28b 28d 28e 295 296 297 298 299 29a 29b 29c 29d 29e 29f 2a0 2a1 2a2 2a3 2a4 2a6
Feb 25 14:53:32 so4 corosync[12457]: [TOTEM ] Retransmit List: 2a6 28b 28d 28e 295 296 297 298 299 29a 29b 29c 29d 29e 29f 2a0 2a1 2a2 2a3 2a4 2a5
Feb 25 14:53:32 so4 corosync[12457]: [TOTEM ] Retransmit List: 2a5 28b 28d 28e 295 296 297 298 299 29a 29b 29c 29d 29e 29f 2a0 2a1 2a2 2a3 2a4 2a6
Feb 25 14:53:32 so4 corosync[12457]: [TOTEM ] Retransmit List: 2a6 28b 28d 28e 295 296 297 298 299 29a 29b 29c 29d 29e 29f 2a0 2a1 2a2 2a3 2a4 2a5
Feb 25 14:53:32 so4 corosync[12457]: [TOTEM ] Retransmit List: 2a5 28b 28d 28e 295 296 297 298 299 29a 29b 29c 29d 29e 29f 2a0 2a1 2a2 2a3 2a4 2a6
If you compare the first and the last item of consecutive retransmit lists, they simply swap places (2a5 and 2a6) while everything in between stays sorted. That cannot be the output of a ring buffer, which is what I was expecting. To me it looks like an implementation error.
These messages appear and disappear for no apparent reason. Maybe the cause is having two independent rings combined with poor logging. Here is how the pattern switches:
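To make the ordering observation easy to check, here is a minimal Python sketch. It is not part of corosync; the helper names `parse_retransmit_list` and `out_of_order_positions` are made up for illustration, and it assumes the sequence numbers in the log are hexadecimal, which the tokens suggest but the log itself does not state.

```python
MARKER = "Retransmit List:"

def parse_retransmit_list(line):
    """Return the sequence numbers on a 'Retransmit List:' line as ints.

    Returns None if the line is not a retransmit-list message.
    """
    _, sep, tail = line.partition(MARKER)
    if not sep:
        return None
    # Assumption: the numbers are hexadecimal without a 0x prefix.
    return [int(tok, 16) for tok in tail.split()]

def out_of_order_positions(seqs):
    """Indices i where seqs[i] > seqs[i + 1] (the order goes backwards)."""
    return [i for i in range(len(seqs) - 1) if seqs[i] > seqs[i + 1]]

# Two consecutive lines copied from the log above.
line_a = ("Feb 25 14:53:31 so4 corosync[12457]: [TOTEM ] Retransmit List: "
          "2a5 28b 28d 28e 295 296 297 298 299 29a 29b 29c 29d 29e 29f "
          "2a0 2a1 2a2 2a3 2a4 2a6")
line_b = ("Feb 25 14:53:31 so4 corosync[12457]: [TOTEM ] Retransmit List: "
          "2a6 28b 28d 28e 295 296 297 298 299 29a 29b 29c 29d 29e 29f "
          "2a0 2a1 2a2 2a3 2a4 2a5")

a = parse_retransmit_list(line_a)
b = parse_retransmit_list(line_b)
print(out_of_order_positions(a))  # where list a goes backwards
print(out_of_order_positions(b))  # where list b goes backwards
print(sorted(a) == sorted(b))     # do both lines name the same numbers?
```

For the two lines shown, only the first entry is out of place in each list, and both lines contain the same set of sequence numbers.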
Feb 25 14:54:18 so4 corosync[12457]: [TOTEM ] Retransmit List: 2a5 28b 28d 28e 295 296 297 298 299 29a 29b 29c 29d 29e 29f 2a0 2a1 2a2 2a3 2a4 2a6
Feb 25 14:54:18 so4 corosync[12457]: [TOTEM ] Retransmit List: 2a6 28b 28d 28e 295 296 297 298 299 29a 29b 29c 29d 29e 29f 2a0 2a1 2a2 2a3 2a4 2a5
Feb 25 14:54:18 so4 corosync[12457]: [TOTEM ] Retransmit List: 2a5 28b 28d 28e 295 296 297 298 299 29a 29b 29c 29d 29e 29f 2a0 2a1 2a2 2a3 2a4 2a6
Feb 25 14:54:18 so4 corosync[12457]: [TOTEM ] Retransmit List: 3d 3e 3f 40 41 42 43 44 45 46
Feb 25 14:54:18 so4 corosync[12457]: [TOTEM ] Retransmit List: 1f 20 21 22 23 24 25 26 27 28
Feb 25 14:54:18 so4 corosync[12457]: [TOTEM ] Retransmit List: 3d 3e 3f 40 41 42 43 44 45 46
Feb 25 14:54:18 so4 corosync[12457]: [TOTEM ] Retransmit List: 1f 20 21 22 23 24 25 26 27 28
Feb 25 14:54:18 so4 corosync[12457]: [TOTEM ] Retransmit List: 3d 3e 3f 40 41 42 43 44 45 46
Feb 25 14:54:18 so4 corosync[12457]: [TOTEM ] Retransmit List: 1f 20 21 22 23 24 25 26 27 28
Feb 25 14:54:18 so4 corosync[12457]: [TOTEM ] Retransmit List: 3d 3e 3f 40 41 42 43 44 45 46
Feb 25 14:54:18 so4 corosync[12457]: [TOTEM ] Retransmit List: 1f 20 21 22 23 24 25 26 27 28
Feb 25 14:54:18 so4 corosync[12457]: [TOTEM ] Retransmit List: 3d 3e 3f 40 41 42 43 44 45 46
Feb 25 14:54:18 so4 corosync[12457]: [TOTEM ] Retransmit List: 1f 20 21 22 23 24 25 26 27 28
I doubt the network can really have as many problems as TOTEM reports:
[...]
Feb 25 14:54:32 so4 corosync[12457]: [TOTEM ] Retransmit List: 780
Feb 25 14:54:32 so4 corosync[12457]: [TOTEM ] Retransmit List: 780
Feb 25 14:54:32 so4 corosync[12457]: [TOTEM ] Retransmit List: 782
Feb 25 14:54:32 so4 corosync[12457]: [TOTEM ] Retransmit List: 784
Feb 25 14:54:32 so4 corosync[12457]: [TOTEM ] Retransmit List: 784
Feb 25 14:54:32 so4 corosync[12457]: [TOTEM ] Retransmit List: 786
Feb 25 14:54:32 so4 corosync[12457]: [TOTEM ] Retransmit List: 786
Feb 25 14:54:32 so4 corosync[12457]: [TOTEM ] Marking ringid 1 interface 192.168.0.64 FAULTY
Feb 25 14:54:32 so4 corosync[12457]: [TOTEM ] Retransmit List: 788
Feb 25 14:54:32 so4 corosync[12457]: [TOTEM ] Retransmit List: 789
Feb 25 14:54:32 so4 corosync[12457]: [TOTEM ] Retransmit List: 78c
Feb 25 14:54:32 so4 corosync[12457]: [TOTEM ] Automatically recovered ring 1
Feb 25 14:54:32 so4 corosync[12457]: [TOTEM ] Automatically recovered ring 1
Feb 25 14:54:32 so4 corosync[12457]: [TOTEM ] Automatically recovered ring 1
Feb 25 14:54:32 so4 corosync[12457]: [TOTEM ] Retransmit List: 79a
Feb 25 14:54:32 so4 corosync[12457]: [TOTEM ] Retransmit List: 79c
Feb 25 14:54:32 so4 corosync[12457]: [TOTEM ] Retransmit List: 79c
Feb 25 14:54:32 so4 corosync[12457]: [TOTEM ] Retransmit List: 79e
Feb 25 14:54:32 so4 corosync[12457]: [TOTEM ] Retransmit List: 79e
Feb 25 14:54:32 so4 corosync[12457]: [TOTEM ] Retransmit List: 7a0
Feb 25 14:54:32 so4 corosync[12457]: [TOTEM ] Retransmit List: 7a2
[...]
# grep "Retransmit List" /var/log/messages | wc -l
5504
(All in less than an hour, while some nodes were booting.)
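The total from grep says nothing about how the messages are spread over time. A rough Python sketch that buckets them per minute could look like the following; the function name `retransmits_per_minute` is hypothetical, and it assumes the traditional syslog timestamp format ("Mmm dd HH:MM:SS") seen in the excerpts above.

```python
from collections import Counter

def retransmits_per_minute(lines):
    """Count 'Retransmit List' lines per minute.

    Assumes each line starts with a 'Mmm dd HH:MM:SS' timestamp, so the
    first 12 characters ('Feb 25 14:53') identify the minute.
    """
    counts = Counter()
    for line in lines:
        if "Retransmit List" in line:
            counts[line[:12]] += 1
    return counts

# A few lines copied from the excerpts above; the FAULTY line is not counted.
sample = [
    "Feb 25 14:53:31 so4 corosync[12457]: [TOTEM ] Retransmit List: 2a5 28b 28d",
    "Feb 25 14:54:32 so4 corosync[12457]: [TOTEM ] Retransmit List: 786",
    "Feb 25 14:54:32 so4 corosync[12457]: [TOTEM ] Marking ringid 1 interface 192.168.0.64 FAULTY",
    "Feb 25 14:54:32 so4 corosync[12457]: [TOTEM ] Retransmit List: 788",
]

for minute, n in sorted(retransmits_per_minute(sample).items()):
    print(minute, n)
```

To run it on the real log, pass the open file instead of the sample list, for example `with open("/var/log/messages") as f: retransmits_per_minute(f)`.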
Regards,
Ulrich