[Linux-ha-dev] [PATCH] cluster-glue memory leak
Yuichi SEINO
2013-05-07 10:10:15 UTC
Hi All,

I am using pacemaker-1.1.9 (commit 138556cb0b375a490a96f35e7fbeccc576a22011).

crmd leaks memory, and the leak occurs in 3 places.
I was able to fix 1 of them, so I have attached a patch.

However, the other two are not easy to solve. The issue is that the
stonith API cannot call the DelPILPluginUniv function in pils.c. I think
that we need to call DelPILPluginUniv to completely release the
memory that the stonith_new function allocated.
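
To illustrate the kind of problem described above, here is a minimal sketch of a library that creates a shared object on first use and releases it when the last user goes away. It is only an illustration under stated assumptions: `universe_t`, `handle_t`, `handle_new`, `handle_delete`, and `universe_is_allocated` are hypothetical names, not cluster-glue or PILS functions, and reference counting is just one possible approach, not what the attached patch does.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are NOT the cluster-glue types or functions. */
typedef struct universe { int placeholder; } universe_t;
typedef struct handle { universe_t *univ; } handle_t;

static universe_t *shared_univ = NULL; /* created by the first handle_new() */
static unsigned    univ_users  = 0;    /* number of handles currently alive */

/* Create a handle; the shared universe is allocated lazily on first use. */
handle_t *handle_new(void)
{
    handle_t *h = malloc(sizeof(*h));
    if (h == NULL) {
        return NULL;
    }
    if (shared_univ == NULL) {
        shared_univ = calloc(1, sizeof(*shared_univ));
        if (shared_univ == NULL) {
            free(h);
            return NULL;
        }
    }
    univ_users++;
    h->univ = shared_univ;
    return h;
}

/* Destroy a handle; the last handle to go also frees the shared universe. */
void handle_delete(handle_t *h)
{
    if (h == NULL) {
        return;
    }
    assert(univ_users > 0);
    free(h);
    if (--univ_users == 0) {
        free(shared_univ);
        shared_univ = NULL;
    }
}

/* Report whether the shared universe currently exists. */
int universe_is_allocated(void)
{
    return shared_univ != NULL;
}
```

With this arrangement the caller never needs direct access to the destructor of the shared object: deleting the last handle releases it.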

Here is the Valgrind output. This is the memory leak that I was able to fix.

==3484== 76 bytes in 4 blocks are definitely lost in loss record 94 of 161
==3484== at 0x4A07A49: malloc (vg_replace_malloc.c:270)
==3484== by 0x373FA417D2: g_malloc (gmem.c:132)
==3484== by 0xA2C2365: external_run_cmd (external.c:767)
==3484== by 0xA2C1AC8: external_getinfo (external.c:598)
==3484== by 0x9EB9B7E: stonith_get_info (stonith.c:327)
==3484== by 0x3F5100744D: stonith_api_device_metadata (st_client.c:1177)
==3484== by 0x3F52407E22: stonith_get_metadata (lrmd_client.c:1478)
==3484== by 0x3F52408DB6: lrmd_api_get_metadata (lrmd_client.c:1736)
==3484== by 0x427FB2: lrm_state_get_metadata (lrm_state.c:555)
==3484== by 0x41F991: get_rsc_metadata (lrm.c:436)
==3484== by 0x41FCD4: get_rsc_restart_list (lrm.c:521)
==3484== by 0x4201B0: append_restart_list (lrm.c:607)
==3484== by 0x420670: build_operation_update (lrm.c:672)
==3484== by 0x425AE1: do_update_resource (lrm.c:1906)
==3484== by 0x42622E: process_lrm_event (lrm.c:2016)
==3484== by 0x41EE10: lrm_op_callback (lrm.c:242)
==3484== by 0x3F52404339: lrmd_dispatch_internal (lrmd_client.c:289)
==3484== by 0x3F524043DF: lrmd_ipc_dispatch (lrmd_client.c:311)
==3484== by 0x3F504308A9: mainloop_gio_callback (mainloop.c:587)
==3484== by 0x373FA38F0D: g_main_context_dispatch (gmain.c:1960)
==3484== by 0x373FA3C937: g_main_context_iterate (gmain.c:2591)
==3484== by 0x373FA3CD54: g_main_loop_run (gmain.c:2799)
==3484== by 0x4055E7: crmd_init (main.c:154)
==3484== by 0x405419: main (main.c:120)
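
A trace like the one above typically means that a helper returned heap memory and the caller dropped the pointer without freeing it. The patch itself is not reproduced in this thread, so the following is not its content; it is a generic sketch of the ownership pattern, and `run_cmd` and `get_info_length` are hypothetical names rather than functions from external.c or stonith.c.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: on success returns 0 and stores a heap-allocated
 * string in *output. The caller owns that string and must free() it.
 */
int run_cmd(const char *cmd, char **output)
{
    assert(cmd != NULL && output != NULL);

    size_t n = strlen(cmd) + 1;
    *output = malloc(n);
    if (*output == NULL) {
        return -1;
    }
    memcpy(*output, cmd, n); /* stands in for real command output */
    return 0;
}

/* Returns the length of the helper's output, or -1 on failure. */
long get_info_length(const char *cmd)
{
    char *output = NULL;
    long len = -1;

    if (run_cmd(cmd, &output) == 0) {
        len = (long)strlen(output);
    }
    /* free(NULL) is a no-op, so this one call covers both paths. */
    free(output);
    return len;
}
```

Funnelling every return path through a single `free` keeps the function from losing one block per call, which is the pattern a "definitely lost in N blocks" record usually points to.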

Here are the remaining two leaks.

==3484== 13 bytes in 1 blocks are definitely lost in loss record 29 of 161
==3484== at 0x4A07A49: malloc (vg_replace_malloc.c:270)
==3484== by 0x373FA417D2: g_malloc (gmem.c:132)
==3484== by 0x373FA58F7D: g_strdup (gstrfuncs.c:102)
==3484== by 0x4E67713: InterfaceManager_plugin_init (pils.c:611)
==3484== by 0x4E69C64: NewPILInterfaceUniv (pils.c:1723)
==3484== by 0x4E672DC: NewPILPluginUniv (pils.c:487)
==3484== by 0x9EB8FE3: init_pluginsys (stonith.c:75)
==3484== by 0x9EB90EC: stonith_new (stonith.c:105)
==3484== by 0x3F51008137: get_stonith_provider (st_client.c:1434)
==3484== by 0x3F51006E28: stonith_api_device_metadata (st_client.c:1059)
==3484== by 0x3F52407E22: stonith_get_metadata (lrmd_client.c:1478)
==3484== by 0x3F52408DB6: lrmd_api_get_metadata (lrmd_client.c:1736)
==3484== by 0x427FB2: lrm_state_get_metadata (lrm_state.c:555)
==3484== by 0x41F991: get_rsc_metadata (lrm.c:436)
==3484== by 0x41FCD4: get_rsc_restart_list (lrm.c:521)
==3484== by 0x4201B0: append_restart_list (lrm.c:607)
==3484== by 0x420670: build_operation_update (lrm.c:672)
==3484== by 0x425AE1: do_update_resource (lrm.c:1906)
==3484== by 0x42622E: process_lrm_event (lrm.c:2016)
==3484== by 0x41EE10: lrm_op_callback (lrm.c:242)
==3484== by 0x3F52404339: lrmd_dispatch_internal (lrmd_client.c:289)
==3484== by 0x3F524043DF: lrmd_ipc_dispatch (lrmd_client.c:311)
==3484== by 0x3F504308A9: mainloop_gio_callback (mainloop.c:587)
==3484== by 0x373FA38F0D: g_main_context_dispatch (gmain.c:1960)
==3484== by 0x373FA3C937: g_main_context_iterate (gmain.c:2591)

==3484== 13 bytes in 1 blocks are definitely lost in loss record 28 of 161
==3484== at 0x4A07A49: malloc (vg_replace_malloc.c:270)
==3484== by 0x373FA417D2: g_malloc (gmem.c:132)
==3484== by 0x373FA58F7D: g_strdup (gstrfuncs.c:102)
==3484== by 0x4E676D2: InterfaceManager_plugin_init (pils.c:606)
==3484== by 0x4E69C64: NewPILInterfaceUniv (pils.c:1723)
==3484== by 0x4E672DC: NewPILPluginUniv (pils.c:487)
==3484== by 0x9EB8FE3: init_pluginsys (stonith.c:75)
==3484== by 0x9EB90EC: stonith_new (stonith.c:105)
==3484== by 0x3F51008137: get_stonith_provider (st_client.c:1434)
==3484== by 0x3F51006E28: stonith_api_device_metadata (st_client.c:1059)
==3484== by 0x3F52407E22: stonith_get_metadata (lrmd_client.c:1478)
==3484== by 0x3F52408DB6: lrmd_api_get_metadata (lrmd_client.c:1736)
==3484== by 0x427FB2: lrm_state_get_metadata (lrm_state.c:555)
==3484== by 0x41F991: get_rsc_metadata (lrm.c:436)
==3484== by 0x41FCD4: get_rsc_restart_list (lrm.c:521)
==3484== by 0x4201B0: append_restart_list (lrm.c:607)
==3484== by 0x420670: build_operation_update (lrm.c:672)
==3484== by 0x425AE1: do_update_resource (lrm.c:1906)
==3484== by 0x42622E: process_lrm_event (lrm.c:2016)
==3484== by 0x41EE10: lrm_op_callback (lrm.c:242)
==3484== by 0x3F52404339: lrmd_dispatch_internal (lrmd_client.c:289)
==3484== by 0x3F524043DF: lrmd_ipc_dispatch (lrmd_client.c:311)
==3484== by 0x3F504308A9: mainloop_gio_callback (mainloop.c:587)
==3484== by 0x373FA38F0D: g_main_context_dispatch (gmain.c:1960)
==3484== by 0x373FA3C937: g_main_context_iterate (gmain.c:2591)



--
Yuichi SEINO
METROSYSTEMS CORPORATION
E-mail:***@gmail.com
Lars Ellenberg
2013-05-07 15:22:24 UTC
Post by Yuichi SEINO
Hi All,
I am using pacemaker-1.1.9 (commit 138556cb0b375a490a96f35e7fbeccc576a22011).
crmd leaks memory, and the leak occurs in 3 places.
I was able to fix 1 of them, so I have attached a patch.
However, the other two are not easy to solve. The issue is that the
stonith API cannot call the DelPILPluginUniv function in pils.c. I think
that we need to call DelPILPluginUniv to completely release the
memory that the stonith_new function allocated.
Is it just that these "few bytes" are allocated once
and never freed, or is this a "real" memleak
that accumulates more and more bytes during the process lifetime?

I suspect the former.
In which case I doubt it is even worthwhile to try and fix it.

Why?
Because in that case we basically have:
main()
{
    global_variable = malloc(something);
    endless_loop_that_is_not_expected_to_ever_return();
    /* so, ok, we could free(global_variable) here.
     * but why bother? */
    exit(1);
}

In that pseudo code above, it is easy to fix.
In the (over-abstracted) case of PILs, I'm afraid, it's not that easy.
And apart from academic correctness,
there is no gain from fixing this for the real world.

-=-

If however we have a *real* memleak, that has to be fixed, of course.
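
The distinction drawn above can be made concrete with a small sketch. All names here (`get_global_name`, `leaky_lookup`, `live_block_count`, `counted_strdup`) are hypothetical and are not taken from pacemaker or cluster-glue; the counter only stands in for what a tool such as Valgrind would report.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout; not pacemaker or cluster-glue code. */

static char  *global_name = NULL; /* allocated at most once per process */
static size_t live_blocks = 0;    /* blocks allocated and not yet freed */

/* Duplicate a string and count the new block as live. */
static char *counted_strdup(const char *s)
{
    assert(s != NULL);

    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL) {
        memcpy(copy, s, n);
        live_blocks++;
    }
    return copy;
}

/* One-time allocation: repeated calls reuse the same block. */
const char *get_global_name(void)
{
    if (global_name == NULL) {
        global_name = counted_strdup("plugin-type");
    }
    return global_name;
}

/* Accumulating leak: every call allocates a block that nobody frees. */
size_t leaky_lookup(const char *key)
{
    char *tmp = counted_strdup(key);
    size_t len = (tmp != NULL) ? strlen(tmp) : 0;
    /* Missing free(tmp): one more lost block per call. */
    return len;
}

/* Number of blocks that are still allocated. */
size_t live_block_count(void)
{
    return live_blocks;
}
```

Assuming no allocation fails, calling each function 100 times leaves 101 live blocks: one from the one-time allocation, no matter how often it is requested, and 100 from the leaking function. Only the second kind grows with the process lifetime.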

Lars
--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
Dejan Muhamedagic
2013-05-07 16:45:08 UTC
Hi,
Post by Lars Ellenberg
Post by Yuichi SEINO
Hi All,
I am using pacemaker-1.1.9 (commit 138556cb0b375a490a96f35e7fbeccc576a22011).
crmd leaks memory, and the leak occurs in 3 places.
I was able to fix 1 of them, so I have attached a patch.
However, the other two are not easy to solve. The issue is that the
stonith API cannot call the DelPILPluginUniv function in pils.c. I think
that we need to call DelPILPluginUniv to completely release the
memory that the stonith_new function allocated.
Is it just that these "few bytes" are allocated once
and never freed, or is this a "real" memleak
that accumulates more and more bytes during the process lifetime?
I suspect the former.
In which case I doubt it is even worthwhile to try and fix it.
Agreed. Though the first leak is not related to PILS.
Post by Lars Ellenberg
Why?
main()
{
    global_variable = malloc(something);
    endless_loop_that_is_not_expected_to_ever_return();
    /* so, ok, we could free(global_variable) here.
     * but why bother? */
    exit(1);
}
In that pseudo code above, it is easy to fix.
In the (over-abstracted) case of PILs, I'm afraid, it's not that easy.
And apart from academic correctness,
there is no gain from fixing this for the real world.
-=-
If however we have a *real* memleak, that has to be fixed, of course.
The first one, for which the patch is provided, could be a real
memory leak. I'll apply the patch. Many thanks!

Cheers,

Dejan
_______________________________________________________
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/