Florian Haas
2008-12-02 16:06:46 UTC
[moving this discussion over to -dev]
> Hi,
> thanks. That is what I was looking for. Much better than my script.
> Is there a chance to add an option to that RA to link to a separate script
> additionally checking the state of the resource? Something like
> monitor_script in the Xen RA? Should be quite easy with copy and paste
> from that RA.
> Thanks.
Having external monitor scripts available sounds like a good idea; however,
I'm not quite happy with the current implementation in the Xen RA. I'd
propose the following logic:
- Any script listed in OCF_RESKEY_monitor_scripts must provide
OCF-compliant exit codes by itself (unlike the present implementation in
the Xen RA which just maps any nonzero exit code to $OCF_ERR_GENERIC).
- If OCF_RESKEY_monitor_scripts contains multiple entries, they are
iterated over (just like in the Xen RA).
- The first nonzero exit code encountered from a monitor script stops the
iteration (just like in the Xen RA), and its exit code propagates as the
return value, and hence exit code, of the monitor operation (unlike the
Xen RA).
- The external monitor operation must never time out by itself; it must
keep trying indefinitely until it is killed by the LRM.
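The iteration and exit-code rules above could look roughly like this in POSIX sh. This is a minimal sketch, not code from the Xen RA: the function name run_monitor_scripts is hypothetical, and it assumes OCF_RESKEY_monitor_scripts holds whitespace-separated paths to executable scripts.

```sh
# Default only if ocf-shellfuncs has not already defined it.
: "${OCF_SUCCESS:=0}"

# Hypothetical helper: run each whitespace-separated entry of
# OCF_RESKEY_monitor_scripts in order. The first nonzero exit code stops
# the loop and is returned unchanged, so every script must itself exit
# with an OCF-compliant code (e.g. 7 = OCF_NOT_RUNNING, 1 = OCF_ERR_GENERIC).
# No timeout is imposed here; only the LRM's operation timeout applies.
run_monitor_scripts() {
    # Unquoted on purpose: word splitting yields one script per entry,
    # which means script paths cannot contain whitespace.
    for script in $OCF_RESKEY_monitor_scripts; do
        "$script"
        rc=$?
        if [ "$rc" -ne 0 ]; then
            return "$rc"
        fi
    done
    return "$OCF_SUCCESS"
}
```

Unlike mapping every nonzero status to $OCF_ERR_GENERIC, this lets a script that exits 7 report "not running" to the LRM as-is.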
The last point (no self-imposed timeout) is due to an additional pitfall
with respect to implementing migrate_from/migrate_to (which should
eventually work, of course). We can set start_delay on the monitor op to
make sure monitoring starts only after the domain has completely booted,
so that part is fine. But suppose there is a brief interruption in machine
availability during migration. We can't temporarily disable the external
monitor operation then, so we at least need to make sure that it doesn't
time out before the LRM says it does. I realize no such interruption is
supposed to happen during Xen live migration, but I don't know about KVM,
OpenVZ, lxc, etc.
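One way a monitor script might follow the no-self-timeout rule is to keep retrying while the guest is unreachable and report a definite result otherwise. This is a hypothetical sketch: probe_until_answered and RETRY_INTERVAL are made-up names, and it assumes the probe command exits 255 when the guest cannot be reached (as ssh does when the connection fails) and an OCF exit code otherwise.

```sh
# Seconds to wait between attempts (hypothetical tunable).
: "${RETRY_INTERVAL:=1}"

# Run the probe command given as arguments until it produces an answer.
# Exit code 255 is treated as "guest unreachable" (e.g. briefly during a
# migration) and retried; any other code, including 0, is returned as-is.
# There is deliberately no deadline here: if the guest never comes back,
# the LRM's monitor timeout is what ends the operation.
probe_until_answered() {
    while :; do
        "$@"
        rc=$?
        if [ "$rc" -ne 255 ]; then
            return "$rc"
        fi
        sleep "$RETRY_INTERVAL"
    done
}
```

A monitor script could then call, for example, `probe_until_answered ssh root@guest /usr/local/bin/check-app` (host and command are placeholders). The trade-off is that a guest which never becomes reachable again is reported only through the LRM timeout, not through a specific exit code.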
What do you think?
Cheers,
Florian