Discussion:
[Linux-ha-dev] [Pacemaker] [PATCH] change timeouts, startup behaviour ocf:heartbeat:ManageVE (OpenVZ VE cluster resource)
Dejan Muhamedagic
2013-03-13 16:18:38 UTC
Permalink
Post by Tim Small
The attached patch changes the behaviour of the OpenVZ virtual machine
resource agent as follows:
1. The default resource stop timeout is greater than the hardcoded
Just for the record: where is this hardcoded actually? Is it
also documented?
timeout in "vzctl stop" (after this time, vzctl forcibly stops the
virtual machine) (since failure to stop a resource can lead to the
cluster node being evicted from the cluster entirely - and this is
generally a BAD thing).
Agreed.
Post by Tim Small
2. The start operation now waits for resource startup to complete i.e.
for the VE to "boot up" (so that the cluster manager can detect VEs
which are hanging on startup, and also throttle simultaneous startups,
so as not to overburden the node in question). Since the start
operation now does a lot more, the default start operation timeout has
been increased.
I'm not sure if we can introduce this just like that. It significantly
changes the agent's behaviour.

BTW, how does vzctl know when the VE is started?
Post by Tim Small
3. Backs off the default timeouts and intervals for various operations
to less aggressive values.
Please make patches which are self-contained and can each be
described succinctly. If the description above matches
the code modifications, then there should be three instead of
one patch.
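
One way to turn the single working-copy change into three self-contained
patches (a sketch, assuming the work sits in a git checkout with the agent
at heartbeat/ManageVE - adjust the path to your tree):

  # Stage and commit each logical change on its own, picking hunks
  # interactively, then export one mailable patch per commit.
  git add -p heartbeat/ManageVE   # select only the stop-timeout hunks
  git commit -m "ManageVE: raise default stop timeout above vzctl's 120s kill deadline"
  # ...repeat for the --wait change and for the relaxed monitor intervals...
  git format-patch -3             # writes three patch files, one per commit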

Please continue the discussion at linux-ha-dev, that's where RA
development discussions take place.

Cheers,

Dejan
Post by Tim Small
Cheers,
Tim.
n.b. There is a bug in the Debian 6.0 (Squeeze) OpenVZ kernel such that
"vzctl start <VEID> --wait" hangs. The bug doesn't impact the
OpenVZ.org kernels (and hence won't impact Debian 7.0 Wheezy either).
--
South East Open Source Solutions Limited
Registered in England and Wales with company number 06134732.
Registered Office: 2 Powell Gardens, Redhill, Surrey, RH1 1TQ
VAT number: 900 6633 53 http://seoss.co.uk/ +44-(0)1273-808309
--- ManageVE.old 2010-10-22 05:54:50.000000000 +0000
+++ ManageVE 2013-03-12 11:39:47.895102380 +0000
@@ -26,12 +26,15 @@
#
#
# Created 07. Sep 2006
-# Updated 18. Sep 2006
+# Updated 12. Mar 2013
#
-# rev. 1.00.3
+# rev. 1.00.4
#
# Changelog
#
+# 12/Mar/13 1.00.4 Wait for VE startup to finish, lengthen default start timeout.
+# Default stop timeout to longer than the vzctl stop 'polite'
+# interval.
# 12/Sep/06 1.00.3 more cleanup
# 12/Sep/06 1.00.2 fixed some logic in start_ve
# general cleanup all over the place
@@ -67,7 +70,7 @@
<?xml version="1.0"?>
<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="ManageVE">
- <version>1.00.3</version>
+ <version>1.00.4</version>
<longdesc lang="en">
This OCF complaint resource agent manages OpenVZ VEs and thus requires
@@ -87,12 +90,12 @@
</parameters>
<actions>
- <action name="start" timeout="75" />
- <action name="stop" timeout="75" />
- <action name="status" depth="0" timeout="10" interval="10" />
- <action name="monitor" depth="0" timeout="10" interval="10" />
- <action name="validate-all" timeout="5" />
- <action name="meta-data" timeout="5" />
+ <action name="start" timeout="240" />
+ <action name="stop" timeout="150" />
+ <action name="status" depth="0" timeout="20" interval="60" />
+ <action name="monitor" depth="0" timeout="20" interval="60" />
+ <action name="validate-all" timeout="10" />
+ <action name="meta-data" timeout="10" />
</actions>
</resource-agent>
END
@@ -127,7 +130,7 @@
return $retcode
fi
- $VZCTL start $VEID >& /dev/null
+ $VZCTL start $VEID --wait >& /dev/null
retcode=$?
if [[ $retcode != 0 && $retcode != 32 ]]; then
_______________________________________________
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
Tim Small
2013-03-21 14:59:17 UTC
Permalink
Post by Dejan Muhamedagic
The attached patch changes the behaviour of the OpenVZ virtual machine
1. The default resource stop timeout is greater than the hardcoded
Just for the record: where is this hardcoded actually? Is it
also documented?
Defined here:

http://git.openvz.org/?p=vzctl;a=blob;f=include/env.h#l26

/** Shutdown timeout.
*/
#define MAX_SHTD_TM 120



Used by env_stop() here:

http://git.openvz.org/?p=vzctl;a=blob;f=src/lib/env.c#l821
<http://git.openvz.org/?p=vzctl;a=blob;f=src/lib/env.c;h=2da848d87904d9e572b7da5c0e7dc5d93217ae5b;hb=HEAD#l818>



    for (i = 0; i < MAX_SHTD_TM; i++) {
        sleep(1);
        if (!vps_is_run(h, veid)) {
            ret = 0;
            goto out;
        }
    }

kill_vps:
    logger(0, 0, "Killing container ...");



Perhaps something based on wall time would be more consistent, and I can
think of cases where users might want it to be a bit higher, or a bit
lower, but currently it's just fixed at 120s.


I can't find the timeout documented anywhere.
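
For anyone who wants to confirm the compiled-in value against their own
vzctl source tree (same two files as the links above):

  # Show where the constant is defined and where the stop path uses it.
  grep -n 'MAX_SHTD_TM' include/env.h src/lib/env.c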
Post by Dejan Muhamedagic
2. The start operation now waits for resource startup to complete i.e.
for the VE to "boot up" (so that the cluster manager can detect VEs
which are hanging on startup, and also throttle simultaneous startups,
so as not to overburden the node in question). Since the start
operation now does a lot more, the default start operation timeout has
been increased.
I'm not sure if we can introduce this just like that. It significantly
changes the agent's behaviour.
Yes. I think it probably makes the agent's behaviour a bit more correct,
but that depends on what your definition of a VE resource having "started"
is, I suppose. Currently the agent says that the VE has started
as soon as it has begun the boot process, whereas with the proposed
change, it would mean that it has started once it has booted up (which
should imply "is operational").

Although my personal reason for the change was so that I had a
reasonable way to avoid booting tens of VEs on the host machine at the
same time, I can think of other benefits - such as making other
resources depend on the fully-booted VE, or detecting the case where a
faulty VE host node causes the VE to hang during start-up.


I suppose other options are:

1. Make start --wait the default, but make starting without waiting
selectable using a RA parameter.

2. Make start without waiting the default, but make --wait selectable
using a RA parameter.


I suppose that the change will break configurations where the
administrator has hard coded a short timeout, and this change is
introduced as part of an upgrade, which I suppose is a bad thing...
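
For what it's worth, option 2 could look roughly like the sketch below.
The parameter name "wait_for_boot" is purely hypothetical (the current
agent has no such parameter), and the surrounding error handling just
follows the existing start_ve() shown in the diff above:

  # Sketch only: opt-in --wait via a hypothetical wait_for_boot parameter.
  start_opts=""
  if ocf_is_true "${OCF_RESKEY_wait_for_boot:-false}"; then
    # block until the VE reaches its default runlevel
    start_opts="--wait"
  fi
  $VZCTL start $VEID $start_opts >& /dev/null
  retcode=$?
  # as in the existing agent, only codes other than 0 and 32 count as failure
  if [[ $retcode != 0 && $retcode != 32 ]]; then
    return $OCF_ERR_GENERIC
  fi

(ocf_is_true is the helper from ocf-shellfuncs; if the agent doesn't
already source that, a plain string comparison does the same job.)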
Post by Dejan Muhamedagic
BTW, how does vzctl know when the VE is started?
The vzctl manual page says that 'vzctl start --wait' will "attempt to
wait till the default runlevel is reached" within the container.
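
For manual testing outside the cluster the difference is easy to see
(the VE ID 101 below is just a placeholder):

  # Without --wait, vzctl returns almost as soon as the VE starts booting;
  # with --wait it blocks until the default runlevel is reported, per vzctl(8).
  time vzctl start 101 --wait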
Post by Dejan Muhamedagic
If the description above matches
the code modifications, then there should be three instead of
one patch.
Fair enough - I was being lazy!


Tim.
--
South East Open Source Solutions Limited
Registered in England and Wales with company number 06134732.
Registered Office: 2 Powell Gardens, Redhill, Surrey, RH1 1TQ
VAT number: 900 6633 53 http://seoss.co.uk/ +44-(0)1273-808309
Dejan Muhamedagic
2013-04-03 15:52:11 UTC
Permalink
Hi,
Post by Tim Small
Post by Dejan Muhamedagic
The attached patch changes the behaviour of the OpenVZ virtual machine
1. The default resource stop timeout is greater than the hardcoded
Just for the record: where is this hardcoded actually? Is it
also documented?
http://git.openvz.org/?p=vzctl;a=blob;f=include/env.h#l26
/** Shutdown timeout.
*/
#define MAX_SHTD_TM 120
http://git.openvz.org/?p=vzctl;a=blob;f=src/lib/env.c#l821
<http://git.openvz.org/?p=vzctl;a=blob;f=src/lib/env.c;h=2da848d87904d9e572b7da5c0e7dc5d93217ae5b;hb=HEAD#l818>
for (i = 0; i < MAX_SHTD_TM; i++) {
sleep(1);
if (!vps_is_run(h, veid)) {
ret = 0;
goto out;
}
}
logger(0, 0, "Killing container ...");
Perhaps something based on wall time would be more consistent, and I can
think of cases where users might want it to be a bit higher, or a bit
lower, but currently it's just fixed at 120s.
I can't find the timeout documented anywhere.
That makes it hard to reference in other software products. But
we can increase the advised timeout in the metadata anyway.
Post by Tim Small
Post by Dejan Muhamedagic
2. The start operation now waits for resource startup to complete i.e.
for the VE to "boot up" (so that the cluster manager can detect VEs
which are hanging on startup, and also throttle simultaneous startups,
so as not to overburden the node in question). Since the start
operation now does a lot more, the default start operation timeout has
been increased.
I'm not sure if we can introduce this just like that. It significantly
changes the agent's behaviour.
Yes. I think it probably makes the agent's behaviour a bit more correct,
but that depends on what your definition of a VE resource having "started"
is, I suppose. Currently the agent says that the VE has started
as soon as it has begun the boot process, whereas with the proposed
change, it would mean that it has started once it has booted up (which
should imply "is operational").
Although my personal reason for the change was so that I had a
reasonable way to avoid booting tens of VEs on the host machine at the
same time, I can think of other benefits - such as making other
resources depend on the fully-booted VE, or detecting the case where a
faulty VE host node causes the VE to hang during start-up.
1. Make start --wait the default, but make starting without waiting
selectable using a RA parameter.
2. Make start without waiting the default, but make --wait selectable
using a RA parameter.
I suppose that the change will break configurations where the
administrator has hard coded a short timeout, and this change is
introduced as part of an upgrade, which I suppose is a bad thing...
Yes, it could be so. I think that we should go for option 2.
Post by Tim Small
Post by Dejan Muhamedagic
BTW, how does vzctl know when the VE is started?
The vzctl manual page says that 'vzctl start --wait' will "attempt to
wait till the default runlevel is reached" within the container.
OK. Though that may mean different things depending on which
init system is running.
Post by Tim Small
Post by Dejan Muhamedagic
If the description above matches
the code modifications, then there should be three instead of
one patch.
Fair enough - I was being lazy!
:)

Cheers,

Dejan
Post by Tim Small
Tim.
--
South East Open Source Solutions Limited
Registered in England and Wales with company number 06134732.
Registered Office: 2 Powell Gardens, Redhill, Surrey, RH1 1TQ
VAT number: 900 6633 53 http://seoss.co.uk/ +44-(0)1273-808309
_______________________________________________________
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
Tim Small
2013-03-22 13:14:19 UTC
Permalink
Actually, your post brings
up two options I've been wondering about for quite a while and which I
* the --wait flag for vzctl start
* batch-limit
It appears to me that those would solve the issues we've been
experiencing on our clusters. 'crm node online <node>' causes so many
containers to start simultaneously that the IO for the shared NFS where
our CTs are hosted is saturated for quite a while, which in some cases
has even led to some nodes being fenced.
What you are suggesting would drastically mitigate the problems we're
experiencing and that you describe in your post.
OK, that's good.
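
For reference, batch-limit is a cluster-wide Pacemaker property that caps
how many actions the cluster runs in parallel, so it throttles exactly the
mass start-up described above; a minimal example from the crm shell (the
value 5 is arbitrary and would need tuning to the node's I/O capacity):

  # Limit the number of operations Pacemaker executes concurrently
  # across the whole cluster (affects all resources, not just VEs).
  crm configure property batch-limit=5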

I suppose the only issue with changing the default to "vzctl start
--wait" is that it might cause problems to people with existing setups,
who then upgrade to a newer version of ManageVE. The possible problem
scenarios would be:

1. A pre-existing start operation timeout is too short, and causes an
error to be logged (I believe that in this case a second attempt to
start the container will be a no-op, as the monitor operation will
already report the resource as running if a previous invocation of
the start operation timed out and was killed by the cluster daemons).
It would be good to test this, though (I can test with Pacemaker -
what else would need testing?).

2. Related to the above, the Debian 6.0 OpenVZ kernels have faulty
--wait support (although OpenVZ upstream doesn't recommend you use these
kernels, and the Debian OpenVZ maintainer has suggested trying the
Debian 7.0 OpenVZ kernels from http://download.openvz.org/debian/ as an
alternative workaround). Still, this situation will probably be
relatively benign, and will only cause the same one-off timeout
behaviour as in 1. above.
Also what you say about the default time for stopping the CT sounds
reasonable. Luckily, we already set a higher timeout in our clusters
(without having known about vzctl's behavior).
OK, good to know.
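
For anyone keeping an existing configuration, explicit per-operation
timeouts along the lines of the patch's proposed values look like this in
the crm shell (resource name, VE ID and the veid parameter spelling are
illustrative):

  crm configure primitive ve101 ocf:heartbeat:ManageVE \
      params veid=101 \
      op start timeout=240s \
      op stop timeout=150s \
      op monitor interval=60s timeout=20s
  # the stop timeout (150s) deliberately exceeds vzctl's 120s kill deadline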
In my opinion, the --wait option would improve the current situation
significantly enough to justify the change in the agent's behavior. Of
course, this needs to be tested first. However, the current RA has flaws
(at least in certain setups like ours) and I'd like to help improve it.
OK, good to know there's someone else at least using it...
Are you also running your CTs on an NFS share? I could imagine some
problems we experience might be related to that.
Nope, we're using DRBD as the backing store. Each hardware node has a
single Intel SSD, and pairs of cluster nodes co-host a particular DRBD
resource on top of the SSD - approx 10 VEs per DRBD, and multiple DRBDs
per SSD.

Cheers,


Tim.
--
South East Open Source Solutions Limited
Registered in England and Wales with company number 06134732.
Registered Office: 2 Powell Gardens, Redhill, Surrey, RH1 1TQ
VAT number: 900 6633 53 http://seoss.co.uk/ +44-(0)1273-808309