[Linux-ha-dev] ManageVE prints bogus errors to the syslog
Roman Haefeli
2013-03-22 07:41:30 UTC
Hi,

When stopping a node of our cluster managing a bunch of OpenVZ CTs, I
get a lot of such messages in the syslog:

Mar 20 17:20:44 localhost ManageVE[2586]: ERROR: vzctl status 10002 returned: 10002 does not exist.
Mar 20 17:20:44 localhost lrmd: [2547]: info: operation monitor[6] on opensim for client 2550: pid 2586 exited with return code 7

It looks to me as if lrmd is making sure the CT is not running anymore.
However, this triggers ManageVE to print an error.

Since the result in this case is expected, shouldn't ManageVE avoid
printing an error? It looks as if something went wrong, and our log
monitor catches it every time, although nothing is actually wrong.

Roman
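A minimal sketch of the behaviour asked for above, not the patch that was later committed: a status check that logs at `err` level only when `vzctl` itself fails, and reports a stopped or missing container quietly with return code 7 (`OCF_NOT_RUNNING`, the code seen in the lrmd line). The `fake_vzctl` stub, the `ocf_log` stub, the function name `status_ve_sketch`, and the status line format `CTID <id> <exist|deleted> <mounted|unmounted> <running|down>` are all assumptions made so the example runs without OpenVZ or a cluster.

```shell
#!/bin/bash
# Sketch only. Stand-ins so this runs without a cluster or OpenVZ:
OCF_SUCCESS=0; OCF_ERR_GENERIC=1; OCF_NOT_RUNNING=7
ocf_log() { echo "$1: $2"; }   # stub for the real ocf_log from ocf-shellfuncs

# Hypothetical stub that prints the assumed vzctl status line.
fake_vzctl()
{
	case $2 in
	10001) echo "CTID $2 exist mounted running" ;;
	*)     echo "CTID $2 deleted unmounted down" ;;
	esac
}
VZCTL=fake_vzctl

status_ve_sketch()
{
	local veid=$1 vzstatus retcode
	vzstatus=$($VZCTL status "$veid" 2>/dev/null)
	retcode=$?
	if [ $retcode -ne 0 ]; then
		# Only a failing vzctl call is treated as a real error.
		ocf_log err "vzctl status $veid failed with exit code $retcode"
		return $OCF_ERR_GENERIC
	fi
	# Split the status line into positional parameters.
	set -- $vzstatus
	if [ "$3" = "exist" ] && [ "$5" = "running" ]; then
		return $OCF_SUCCESS
	fi
	# Expected after a stop: report it, but not as an error.
	ocf_log debug "CT $veid is not running ($3, $5)"
	return $OCF_NOT_RUNNING
}

status_ve_sketch 10002
echo "return code: $?"
```

With the stub above, the container 10002 case produces a debug message and return code 7, so a log monitor that matches on ERROR would stay silent.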
Dejan Muhamedagic
2013-04-03 16:25:58 UTC
Hi,
Post by Roman Haefeli
Hi,
When stopping a node of our cluster managing a bunch of OpenVZ CTs, I
get a lot of such messages in the syslog:
Mar 20 17:20:44 localhost ManageVE[2586]: ERROR: vzctl status 10002 returned: 10002 does not exist.
Mar 20 17:20:44 localhost lrmd: [2547]: info: operation monitor[6] on opensim for client 2550: pid 2586 exited with return code 7
It looks to me as if lrmd is making sure the CT is not running anymore.
However, this triggers ManageVE to print an error.
Could be. Looking at the RA, there's a bunch of places where the
status is invoked and where this message could get logged. It
could be improved. The following patch should help:

https://github.com/ClusterLabs/resource-agents/commit/ca987afd35226145f48fb31bef911aa3ed3b6015

Cheers,

Dejan
_______________________________________________________
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
Lars Ellenberg
2013-04-04 19:28:00 UTC
Post by Dejan Muhamedagic
Could be. Looking at the RA, there's a bunch of places where the
status is invoked and where this message could get logged. It
could be improved. The following patch should help:
https://github.com/ClusterLabs/resource-agents/commit/ca987afd35226145f48fb31bef911aa3ed3b6015
BTW, why call `vzctl | awk` *twice*,
just to get two items out of the vzctl output?

How about losing the awk and the second invocation?
Something like this:
(Should veexists and vestatus be local as well?)

diff --git a/heartbeat/ManageVE b/heartbeat/ManageVE
index 56a3d03..53f9bab 100755
--- a/heartbeat/ManageVE
+++ b/heartbeat/ManageVE
@@ -182,10 +182,12 @@ migrate_from_ve()
 status_ve()
 {
 	declare -i retcode
-
-	veexists=`$VZCTL status $VEID 2>/dev/null | $AWK '{print $3}'`
-	vestatus=`$VZCTL status $VEID 2>/dev/null | $AWK '{print $5}'`
+	local vzstatus
+	vzstatus=`$VZCTL status $VEID 2>/dev/null`
 	retcode=$?
+	set -- $vzstatus
+	veexists=$3
+	vestatus=$5
 
 	if [[ $retcode != 0 ]]; then
 		ocf_log err "vzctl status $VEID returned: $retcode"



[ BTW, what's all the "declare -i" doing in there?
"local" would have done nicely. ]
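The `set --` approach in the patch above can be tried without OpenVZ. The following is a small sketch: `parse_vzstatus` is a hypothetical helper, not part of ManageVE, and the sample line assumes that `vzctl status` prints five whitespace-separated fields of the form `CTID <id> <exist|deleted> <mounted|unmounted> <running|down>`.

```shell
#!/bin/bash
# Sketch only: parse_vzstatus is a hypothetical helper for illustration.
parse_vzstatus()
{
	local vzstatus=$1
	# Left unquoted on purpose: word splitting turns the status line
	# into the positional parameters $1..$5 of this function.
	set -- $vzstatus
	veexists=$3
	vestatus=$5
}

# Sample line in the assumed vzctl status format.
parse_vzstatus "CTID 10002 exist mounted running"
echo "veexists=$veexists vestatus=$vestatus"
# prints "veexists=exist vestatus=running"
```

Inside a function, `set --` replaces only that function's positional parameters, so the caller's arguments are untouched. One command substitution replaces two `vzctl` calls and two `awk` processes.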
Dejan Muhamedagic
2013-04-05 10:39:46 UTC
Hi Lars,
Post by Lars Ellenberg
BTW, why call `vzctl | awk` *twice*,
just to get two items out of the vzctl output?
How about losing the awk and the second invocation?
(Should veexists and vestatus be local as well?)
Well, you do have commit rights, don't you? :)
Post by Lars Ellenberg
[ BTW, what's all the "declare -i" doing in there?
"local" would have done nicely. ]
No idea. But since the RA is /bin/bash I guess that it doesn't
matter.

Cheers,

Dejan
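For readers wondering about the `declare -i` versus `local` question, here is a short bash sketch of the difference. The function name `demo` and the variable `plain` are made up for illustration. In bash, `declare` inside a function also makes the variable local; the `-i` flag additionally gives it the integer attribute, so assigned values are evaluated arithmetically.

```shell
#!/bin/bash
# Sketch only: demo and plain are hypothetical names for illustration.
demo()
{
	declare -i retcode   # local to demo, with the integer attribute
	local plain          # local to demo, plain string variable
	retcode="2+3"        # evaluated as arithmetic because of -i
	plain="2+3"          # stored unchanged
	echo "retcode=$retcode plain=$plain"
}

demo
# prints "retcode=5 plain=2+3"

# Neither variable leaks into the caller's scope.
echo "after: retcode=${retcode-unset} plain=${plain-unset}"
```

For a variable that only ever receives `$?`, both forms behave the same, which fits the remark that it does not matter in a bash-only RA.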
Lars Ellenberg
2013-04-09 22:23:38 UTC
Post by Dejan Muhamedagic
Well, you do have commit rights, don't you? :)
Sure, but I don't have a vz handy to test even "obviously correct"
patches with, before I commit...


Lars
Dejan Muhamedagic
2013-04-11 06:26:21 UTC
Post by Lars Ellenberg
Sure, but I don't have a vz handy to test even "obviously correct"
patches with, before I commit...
Looked correct to me too, but then it wouldn't have been the
first time I got something wrong :D

Maybe the reporter can help with testing. Roman?

Cheers,

Dejan