Attila Megyeri
2012-12-20 19:03:32 UTC
Hi,
I have a cluster configuration with two IPsrcaddr resources (e.g. IP addresses "A" and "B").
They are configured with two different addresses and are never supposed to run on the same node: "A" can run on nodes N1 and N2, "B" can run on N3 and N4.
My problem is that in some cases crm_mon shows an IPsrcaddr resource as running on a node where it shouldn't be, and of course it is in an unmanaged state and cannot be stopped.
For instance:
IP address "A" is shown as started (unmanaged) on node N3.
I am using Pacemaker 1.1.6 on a Debian system, with the latest RA from GitHub.
I checked the RA, and here are my findings.
- When status is called, it calls the srca_read() function.
- srca_read() returns 2 if a source address is set on the given node, but with a different IP address.
- srca_status(), when it gets 2 from srca_read(), returns $OCF_ERR_GENERIC.
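The mapping described in the list above can be sketched roughly as follows. This is not the RA's actual code: the function names, their arguments, and the 0 and 1 results of the read step are assumptions made for illustration. Only the path "2 leads to $OCF_ERR_GENERIC" is taken from the findings above. The numeric values are the standard OCF return codes.

```shell
#!/bin/sh
# Standard OCF return codes.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

# Hypothetical stand-in for srca_read().
# $1: source address currently found on the node ("" if none)
# $2: address this resource is configured for
srca_read_sketch() {
    if [ -z "$1" ]; then
        return 1    # assumption: no source address is set
    elif [ "$1" = "$2" ]; then
        return 0    # assumption: our own address is set
    fi
    return 2        # a different source address is set
}

# Hypothetical stand-in for srca_status(), mapping the result to an OCF code.
status_sketch() {
    rc=0
    srca_read_sketch "$1" "$2" || rc=$?
    case $rc in
        0) return $OCF_SUCCESS ;;
        1) return $OCF_NOT_RUNNING ;;
        *) return $OCF_ERR_GENERIC ;;   # the case described above
    esac
}
```

With these stand-ins, `status_sketch B A` (address "B" found while checking resource "A") returns 1, i.e. $OCF_ERR_GENERIC, rather than 7.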
As a result, in my case IP "B" is running on N3, which is OK, but crm_mon reports that IP "A" is also running on N3 (unmanaged). [For some reason this is how $OCF_ERR_GENERIC is interpreted.]
This is definitely a bug; the question is whether it is in Pacemaker or in the RA.
If I change the script to return $OCF_NOT_RUNNING instead of $OCF_ERR_GENERIC, it works properly.
What is the proper behavior in this case?
My recommendation is to fix the RA so that srca_read() returns 1 if there is a source address on the node, but it is not the queried one.
In that case the RA would return $OCF_NOT_RUNNING.
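A minimal sketch of that recommendation, under the same caveat that this is an illustration and not a patch against the real RA. The function names and arguments are hypothetical; the idea is only that a foreign source address is reported the same way as no source address at all.

```shell
#!/bin/sh
# Standard OCF return codes.
OCF_SUCCESS=0
OCF_NOT_RUNNING=7

# Hypothetical variant of srca_read().
# $1: source address currently found on the node ("" if none)
# $2: address this resource is configured for
srca_read_proposed() {
    if [ -n "$1" ] && [ "$1" = "$2" ]; then
        return 0    # our own address is set
    fi
    return 1        # nothing set, or a different address is set
}

# Hypothetical status function built on the variant above.
status_proposed() {
    if srca_read_proposed "$1" "$2"; then
        return $OCF_SUCCESS
    fi
    return $OCF_NOT_RUNNING
}
```

Here `status_proposed B A` returns 7 ($OCF_NOT_RUNNING), so a status check for "A" on a node that carries "B" would report "A" as simply not running there.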
Cheers,
Attila