[Linux-ha-dev] "monitor_scripts" parameter for the VirtualDomain RA (was Re: [Linux-HA] ocf resource agent for KVM virtual machines)

[moving this discussion over to -dev]

Hi,
thanks. That is what I was looking for. Much better than my script.
Is there a chance to add an option to that RA that calls a separate script
to additionally check the state of the resource? Something like
monitor_script in the Xen RA? That should be quite easy to copy and paste
from that RA.
Thanks.

Having external monitor scripts available sounds like a good idea, however
I'm not quite happy with the current implementation in the Xen RA. I'd
propose the following logic:

- Any script listed in OCF_RESKEY_monitor_scripts must provide
OCF-compliant exit codes by itself (unlike the present implementation in
the Xen RA which just maps any nonzero exit code to $OCF_ERR_GENERIC).
- If OCF_RESKEY_monitor_scripts contains multiple entries, they are
iterated over (just like in the Xen RA).
- The first nonzero exit code encountered from a monitor script stops the
iteration (just like in the Xen RA), and its exit code propagates as the
return value, and hence exit code, of the monitor operation (unlike the
Xen RA).
- The external monitor operation must never time out by itself, it must
keep trying indefinitely until killed by the LRM.
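Here is a minimal sketch of the iteration rule proposed above. It assumes OCF_RESKEY_monitor_scripts holds a whitespace-separated list of executable paths (paths containing spaces are not supported), and the function name run_monitor_scripts is hypothetical. It is not the Xen RA's actual code. The difference from the Xen RA is that the first failing script's exit code is returned unchanged instead of being mapped to $OCF_ERR_GENERIC.

```shell
# Hypothetical helper illustrating the proposed monitor_scripts semantics.
# Each listed script is expected to return an OCF exit code by itself.
run_monitor_scripts() {
    local script rc
    # Unquoted on purpose: the parameter is split on whitespace into paths.
    for script in $OCF_RESKEY_monitor_scripts; do
        "$script"
        rc=$?
        if [ "$rc" -ne 0 ]; then
            # Stop at the first failure and pass its exit code through as-is.
            return "$rc"
        fi
    done
    # All scripts succeeded (or none were configured).
    return 0
}
```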

The last one is due to an additional pitfall with respect to implementing
migrate_from/migrate_to (which eventually should work, of course). We can
set start_delay on the monitor op so we make sure we start monitoring only
after the domain has booted completely. So that is fine. But suppose we
have a brief interruption in machine availability during migration. We
can't temporarily disable the external monitor operation then, so we at
least need to make sure that it doesn't time out before the LRM says it
does. I realize no such interruption is supposed to happen during Xen live
migration, but I don't know about KVM, OpenVZ, lxc etc.

WDOT?

Cheers,
Florian

Florian Haas

2008-12-02 16:09:04 UTC

Post by Florian Haas
- The external monitor operation must never time out by itself, it must
keep trying indefinitely until killed by the LRM.

Sorry, that one looks misleading on second read. Should of course be
"...it must keep trying indefinitely until either succeeding, or being
killed by the LRM."
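Taken literally, the corrected rule describes a retry loop that has no timeout of its own. The sketch below shows that loop. retry_until_success is a hypothetical name and not part of any existing RA.

```shell
# Hypothetical helper: rerun the given command until it succeeds.
# There is intentionally no retry limit or timeout; a loop that never
# succeeds only ends when the LRM kills the operation.
retry_until_success() {
    until "$@"; do
        sleep 1
    done
}
```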

Cheers,
Florian

Lars Marowsky-Bree

2008-12-04 15:06:15 UTC

Post by Florian Haas
- Any script listed in OCF_RESKEY_monitor_scripts must provide
OCF-compliant exit codes by itself (unlike the present implementation in
the Xen RA which just maps any nonzero exit code to $OCF_ERR_GENERIC).

Nope.

The "master" RA is responsible for determining whether or not the
instance is active. The external monitor scripts just get a chance to
fail it, but they can't suddenly claim it's not running or anything.

So the external scripts only need "true" or "false".
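The following sketch illustrates these semantics and rests on several assumptions. domain_is_running and vm_monitor are hypothetical placeholders. The OCF exit codes are defined inline rather than sourced from the OCF shell function library. OCF_RESKEY_monitor_scripts is assumed to be a whitespace-separated list of executables. With this design, only the RA decides between "running" and "not running", and an external script can only turn a running instance into a failed one.

```shell
# OCF exit codes; a real RA gets these from the OCF shell function library.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

# Hypothetical placeholder for the RA's own status check
# (e.g. asking the hypervisor whether the domain is active).
domain_is_running() {
    false
}

vm_monitor() {
    local script
    # The "master" RA alone determines whether the instance is active.
    if ! domain_is_running; then
        return $OCF_NOT_RUNNING
    fi
    # External scripts only answer true/false; any nonzero exit code,
    # whatever its value, fails the running instance.
    for script in $OCF_RESKEY_monitor_scripts; do
        "$script" || return $OCF_ERR_GENERIC
    done
    return $OCF_SUCCESS
}
```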

Post by Florian Haas
- If OCF_RESKEY_monitor_scripts contains multiple entries, they are
iterated over (just like in the Xen RA).
- The first nonzero exit code encountered from a monitor script stops the
iteration (just like in the Xen RA), and its exit code propagates as the
return value, and hence exit code, of the monitor operation (unlike the
Xen RA).
- The external monitor operation must never time out by itself, it must
keep trying indefinitely until killed by the LRM.

The last one is an implementation detail which is left for the external
script to handle.

Post by Florian Haas
The last one is due to an additional pitfall with respect to implementing
migrate_from/migrate_to (which eventually should work, of course). We can
set start_delay on the monitor op so we make sure we start monitoring only
after the domain has booted completely. So that is fine.

start_delay should never be needed. It was one of the biggest mistakes
to add it. I keep thinking about just making it a no-op; anything which
requires it points to a broken RA.

The resource must be fully operational after start (or migrate_from)
has completed. Monitor must immediately be OK.

Post by Florian Haas
But suppose we have a brief interruption in machine availability
during migration. We can't temporarily disable the external monitor
operation then, so we at least need to make sure that it doesn't time
out before the LRM says it does. I realize no such interruption is
supposed to happen during Xen live migration, but I don't know about
KVM, OpenVZ, lxc etc.

If that's what you think you need, have the migrate_from/start op loop
until "monitor" succeeds.

while ! monitor_function ; do sleep 1 ; done
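Spelled out a little further, a start operation that follows this advice might look like the sketch below. domain_boot and monitor_function are hypothetical placeholders for the RA's own functions. The loop deliberately has no timeout. If the guest never becomes healthy, the LRM kills the start operation when its configured timeout expires.

```shell
# Hypothetical placeholders: a real RA would ask the hypervisor to boot
# the domain, and run its full monitor (including any external scripts).
domain_boot() { :; }
monitor_function() { :; }

vm_start() {
    # Trigger the boot; report a generic error (1) if that already fails.
    domain_boot || return 1
    # Return only once monitor succeeds. No timeout here on purpose: if
    # the guest never comes up, the LRM kills this operation when the
    # configured start timeout expires.
    while ! monitor_function; do
        sleep 1
    done
    return 0
}
```

A migrate_from handler could follow the same pattern, looping on monitor after the hypervisor has accepted the incoming domain.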

Regards,
Lars

--
Teamlead Kernel, SuSE Labs, Research and Development
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde

Florian Haas

2008-12-04 16:20:43 UTC

Nope.
The "master" RA is responsible for determining whether or not the
instance is active. The external monitor scripts just get a chance to
fail it, but they can't suddenly claim it's not running or anything.
So the external scripts only need "true" or "false".

Fair enough.

The last one is an implementation detail which is left for the external
script to handle.

As is the one with the exit codes which you already rejected.

start_delay should never be needed. It was one of the biggest mistakes
to add it. I keep thinking about just making it a no-op; anything which
requires it points to a broken RA.
The resource must be fully operational after start (or migrate_from)
has completed. Monitor must immediately be OK.

What?

If I'm not mistaken, the purpose of an external monitor_script in
conjunction with a virtual domain would be to do something like ping it,
try to connect to its TCP port 22, connect to its TCP port 445 (for a
virtual Windows box), etc. Any such monitor script only has a chance to
succeed when the virtual domain is fully booted. The start operation
from the VirtualDomain RA (just like that from the Xen RA) returns
immediately after the virtualization management API has determined that
the virtual domain has successfully _started_ its boot process, not
completed it.
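As an illustration, an external check of the "connect to TCP port 22" kind mentioned above could be as small as the following sketch. check_tcp_port is a hypothetical name. The sketch relies on bash's /dev/tcp redirection, which is bash-specific and may be compiled out in some builds. It makes a single attempt, answers only true or false, and leaves timeout handling to the LRM.

```shell
# Hypothetical external monitor check: succeed only if the guest accepts a
# TCP connection on the given port. Requires bash (/dev/tcp redirection).
# One attempt, no timeout of its own: connecting to an unresponsive host
# may block, and the LRM's operation timeout is what bounds it.
check_tcp_port() {
    local host=$1 port=$2
    # Open the connection on fd 3 inside a subshell, so the descriptor is
    # closed again as soon as the subshell exits. Errors are discarded.
    (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null
}
```

For example, check_tcp_port 192.0.2.10 22 checks SSH on a guest at a placeholder address, and port 445 would be the equivalent check for a virtual Windows box.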

What would be your suggestion to determine, from Pacemaker's
perspective, that a virtual domain is fully booted?

If that's what you think you need, have the migrate_from/start op loop
until "monitor" succeeds.
while ! monitor_function ; do sleep 1 ; done

So what is your suggestion?

1. Augment the monitor operation with any external monitor_script and
block any start or migrate_from until monitor succeeds? In that case,
please educate me as to the purpose of monitor timeouts. Or are you
saying one would have to adjust start and migrate_from timeouts accordingly?

2. Ditch any external monitor_script functionality in the VirtualDomain
RA, as it's useless anyway? In that case, please let me know what it's
for in the Xen RA.

Cheers,
Florian

Dejan Muhamedagic

2008-12-05 11:18:59 UTC

Hi,

Nope.
The "master" RA is responsible for determining whether or not the
instance is active. The external monitor scripts just get a chance to
fail it, but they can't suddenly claim it's not running or anything.
So the external scripts only need "true" or "false".

Fair enough.

The last one is an implementation detail which is left for the external
script to handle.

As is the one with the exit codes which you already rejected.

It is up to the RA, of course, but the best practice is to let
the upper layers (i.e. lrmd) deal with timeouts. Simply because
that's the place where the user can control the timeouts. In
most cases, it is very hard for an RA to take into account all
possible configurations and, in particular, all possible loads.

start_delay should never be needed. It was one of the biggest mistakes
to add it. I keep thinking about just making it a no-op; anything which
requires it points to a broken RA.
The resource must be fully operational after start (or migrate_from)
has completed. Monitor must immediately be OK.

What?
If I'm not mistaken, the purpose of an external monitor_script in
conjunction with a virtual domain would be to do something like ping it,
try to connect to its TCP port 22, connect to its TCP port 445 (for a
virtual Windows box), etc. Any such monitor script only has a chance to
succeed when the virtual domain is fully booted. The start operation
from the VirtualDomain RA (just like that from the Xen RA) returns
immediately after the virtualization management API has determined that
the virtual domain has successfully _started_ its boot process, not
completed it.
What would be your suggestion to determine, from Pacemaker's
perspective, that a virtual domain is fully booted?

This is not easy to answer in general. It depends on what the
VM should do, i.e. what kind of service it has to provide. I
agree with Lars that the start action should, once it has
finished, really mean that the resource is fully operational.
After all, there could be another resource waiting to start (e.g.
via an order dependency) and this other resource may fail if the
previous one hasn't started. The simplest way for the start
operation to ensure this is to invoke monitor itself.

If that's what you think you need, have the migrate_from/start op loop
until "monitor" succeeds.
while ! monitor_function ; do sleep 1 ; done

The start/migrate_from timeout should be generous for Xen and
such. The monitor timeout may/should be shorter, depending on
your service quality policy. If in doubt, use longer timeouts.

Post by Florian Haas
2. Ditch any external monitor_script functionality in the VirtualDomain
RA, as it's useless anyway? In that case, please let me know what it's
for in the Xen RA.

I guess that this has been implicitly answered above :)

Cheers,

Dejan


Lars Marowsky-Bree

2008-12-05 11:52:32 UTC

Post by Lars Marowsky-Bree
The resource must be fully operational after start (or migrate_from)
has completed. Monitor must immediately be OK.

What?
If I'm not mistaken, the purpose of an external monitor_script in
conjunction with a virtual domain would be to do something like ping it,
try to connect to its TCP port 22, connect to its TCP port 445 (for a
virtual Windows box), etc. Any such monitor script only has a chance to
succeed when the virtual domain is fully booted. The start operation
from the VirtualDomain RA (just like that from the Xen RA) returns
immediately after the virtualization management API has determined that
the virtual domain has successfully _started_ its boot process, not
completed it.
What would be your suggestion to determine, from Pacemaker's
perspective, that a virtual domain is fully booted?

It's not Pacemaker's job to determine that. The RA must wait and not
return until this state has been reached.

The ordering dependencies are one extremely good reason, but monitor ops
could also occur "out of the blue" if the user invokes a reprobe
manually or something.

The easiest way is to loop in start until monitor succeeds, if there is
any doubt that the start action has actually been achieved.

Post by Lars Marowsky-Bree
while ! monitor_function ; do sleep 1 ; done

So what is your suggestion?

Uhm, I think the above line is actually valid shell code ;-)

Post by Florian Haas
1. Augment the monitor operation with any external monitor_script and
block any start or migrate_from until monitor succeeds? In that case,
please educate me as to the purpose of monitor timeouts. Or are you
saying one would have to adjust start and migrate_from timeouts accordingly?

Well, sure. start/migrate_from must cover the full time until the
resource has reached the requested state. Returning earlier is not
allowed, or rather, may cause subtle errors somewhere.

Also consider the UI impact. The GUI would show the resource as "green"
and no longer in transition, yet the admin would still get "connection
refused". Not good.

start means "start the resource and return when it is started, or some
error has occurred." It does not mean "trigger the start and return".

Sorry for being pedantic, but it's quite important to get the semantics
right - they are not very complicated, but observing them really makes
the cluster more dependable.

Regards,
Lars


Florian Haas

2008-12-05 12:58:50 UTC

Lars, Dejan,