Dejan Muhamedagic
2013-03-13 16:18:38 UTC
The attached patch changes the behaviour of the OpenVZ virtual machine
1. The default resource stop timeout is greater than the hardcoded
Just for the record: where is this hardcoded actually? Is it
also documented?
timeout in "vzctl stop" (after this time, vzctl forcibly stops the
virtual machine) (since failure to stop a resource can lead to the
cluster node being evicted from the cluster entirely - and this is
generally a BAD thing).
Agreed.
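The relationship described in point 1 can be sketched as a small check. This is an illustration only: the function name stop_timeout_is_safe and both of its arguments are hypothetical, and the length of vzctl's forced-stop interval has to be supplied by the caller, since the thread does not establish where that value is defined.

```shell
#!/bin/bash
# Hypothetical helper (not part of the patch): succeed only if the
# cluster's stop timeout is longer than the interval after which
# "vzctl stop" forcibly stops the VE.
#   $1 = stop timeout configured for the resource, in seconds
#   $2 = vzctl's forced-stop interval, in seconds (assumed known to the caller)
# Returns 0 if safe, 1 if not, 2 if an argument is not a whole number.
stop_timeout_is_safe() {
    local agent_timeout=$1 vzctl_interval=$2
    [[ $agent_timeout =~ ^[0-9]+$ && $vzctl_interval =~ ^[0-9]+$ ]] || return 2
    [ "$agent_timeout" -gt "$vzctl_interval" ]
}
```

With the new default of 150 seconds, the check passes for any forced-stop interval shorter than 150 seconds; the old default of 75 seconds fails for any interval of 75 seconds or more.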
2. The start operation now waits for resource startup to complete i.e.
for the VE to "boot up" (so that the cluster manager can detect VEs
which are hanging on startup, and also throttle simultaneous startups,
so as not to overburden the node in question). Since the start
operation now does a lot more, the default start operation timeout has
been increased.
I'm not sure if we can introduce this just like that. It changes
the agent's behaviour significantly.
BTW, how does vzctl know when the VE is started?
3. Backs off the default timeouts and intervals for various operations
to less aggressive values.
Please make patches which are self-contained and can be
described succinctly. If the description above matches
the code modifications, then there should be three patches
instead of one.
Please continue the discussion at linux-ha-dev, that's where RA
development discussions take place.
Cheers,
Dejan
Cheers,
Tim.
n.b. There is a bug in the Debian 6.0 (Squeeze) OpenVZ kernel such that
"vzctl start <VEID> --wait" hangs. The bug doesn't impact the
OpenVZ.org kernels (and hence won't impact Debian 7.0 Wheezy either).
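One way an agent could guard against such a hang is to bound the wait itself rather than rely only on the cluster's operation timeout. A minimal sketch, assuming GNU coreutils timeout is available on the node; the function name start_ve_bounded and the default VZCTL path are illustrative and not part of the attached patch:

```shell
#!/bin/bash
VZCTL=${VZCTL:-/usr/sbin/vzctl}   # assumed path; adjust for the node

# Hypothetical wrapper: run "vzctl start <VEID> --wait" but give up
# after a deadline. Returns vzctl's own exit status, or 124 if the
# deadline passed (124 is the status GNU timeout reports when it has
# to terminate the command).
#   $1 = VEID
#   $2 = deadline in seconds
start_ve_bounded() {
    local veid=$1 deadline=$2
    timeout "$deadline" "$VZCTL" start "$veid" --wait >/dev/null 2>&1
}
```

A status of 124 is neither 0 nor 32, so it would satisfy the "$retcode != 0 && $retcode != 32" condition that follows the vzctl start call in the patch context.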
--
South East Open Source Solutions Limited
Registered in England and Wales with company number 06134732.
Registered Office: 2 Powell Gardens, Redhill, Surrey, RH1 1TQ
VAT number: 900 6633 53 http://seoss.co.uk/ +44-(0)1273-808309
--- ManageVE.old 2010-10-22 05:54:50.000000000 +0000
+++ ManageVE 2013-03-12 11:39:47.895102380 +0000
@@ -26,12 +26,15 @@
#
#
# Created 07. Sep 2006
-# Updated 18. Sep 2006
+# Updated 12. Mar 2013
#
-# rev. 1.00.3
+# rev. 1.00.4
#
# Changelog
#
+# 12/Mar/13 1.00.4 Wait for VE startup to finish, lengthen default start timeout.
+# Default stop timeout to longer than the vzctl stop 'polite'
+# interval.
# 12/Sep/06 1.00.3 more cleanup
# 12/Sep/06 1.00.2 fixed some logic in start_ve
# general cleanup all over the place
@@ -67,7 +70,7 @@
<?xml version="1.0"?>
<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="ManageVE">
- <version>1.00.3</version>
+ <version>1.00.4</version>
<longdesc lang="en">
This OCF complaint resource agent manages OpenVZ VEs and thus requires
@@ -87,12 +90,12 @@
</parameters>
<actions>
- <action name="start" timeout="75" />
- <action name="stop" timeout="75" />
- <action name="status" depth="0" timeout="10" interval="10" />
- <action name="monitor" depth="0" timeout="10" interval="10" />
- <action name="validate-all" timeout="5" />
- <action name="meta-data" timeout="5" />
+ <action name="start" timeout="240" />
+ <action name="stop" timeout="150" />
+ <action name="status" depth="0" timeout="20" interval="60" />
+ <action name="monitor" depth="0" timeout="20" interval="60" />
+ <action name="validate-all" timeout="10" />
+ <action name="meta-data" timeout="10" />
</actions>
</resource-agent>
END
@@ -127,7 +130,7 @@
return $retcode
fi
- $VZCTL start $VEID >& /dev/null
+ $VZCTL start $VEID --wait >& /dev/null
retcode=$?
if [[ $retcode != 0 && $retcode != 32 ]]; then
_______________________________________________
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org