Discussion:
[Linux-ha-dev] RA trace facility
Dejan Muhamedagic
2012-11-21 15:33:18 UTC
Hi,

This is a little something which could help while debugging
resource agents. Setting the environment variable __OCF_TRACE_RA
would cause the resource agent run to be traced (as in set -x).
PS4 is set accordingly (that's a bash feature, don't know if
other shells support it). ocf-tester got an option (-X) to turn
the feature on. The agent itself can also turn on/off tracing
via ocf_start_trace/ocf_stop_trace.
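
For readers following along, here is a minimal sketch of what such a facility could look like. The function names and __OCF_TRACE_RA come from the message above, but the bodies are an assumption and not the attached patch. The PS4 format is also just one possible choice. PS4 itself is specified by POSIX for set -x, while BASH_SOURCE and FUNCNAME are bash-specific and simply expand to nothing in other shells.

```shell
#!/bin/bash
# Illustrative sketch only; these bodies are assumptions, not the actual patch.

# Turn on xtrace with a prefix showing script name, line and function.
ocf_start_trace() {
    # Single quotes defer expansion until each traced command is printed.
    PS4='+ ${BASH_SOURCE##*/}:${LINENO}:${FUNCNAME:+$FUNCNAME:} '
    set -x
}

# Turn tracing off; the redirect on the group hides the "set +x" line itself.
ocf_stop_trace() {
    { set +x; } 2>/dev/null
}

# Trace the whole run when the environment variable is set.
if [ -n "${__OCF_TRACE_RA:-}" ]; then
    ocf_start_trace
fi
```

An agent under development could then bracket a suspicious section with ocf_start_trace and ocf_stop_trace, or be run with __OCF_TRACE_RA=1 in its environment.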

Do you find anything amiss?

Thanks,

Dejan
Lars Marowsky-Bree
2012-11-21 15:43:08 UTC
Post by Dejan Muhamedagic
Hi,
This is little something which could help while debugging
resource agents. Setting the environment variable __OCF_TRACE_RA
would cause the resource agent run to be traced (as in set -x).
PS4 is set accordingly (that's a bash feature, don't know if
other shells support it). ocf-tester got an option (-X) to turn
the feature on. The agent itself can also turn on/off tracing
via ocf_start_trace/ocf_stop_trace.
Do you find anything amiss?
I *really* like this.

But I'd like a different way to turn it on - a standard one that is
available via the CIB configuration, without modifying the script.

What would you think of OCF_RESKEY_RA_TRACE ? Our include script could
enable that; it's unlikely that the problem occurs prior to that.

- never (default): Does nothing
- always: Always trace, write to $(which path?)/raname.rscid.$timestamp
- on-error: always trace, but delete on successful exit
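
The three modes above could be sketched roughly as follows. This is only an illustration: run_with_trace and TRACE_DIR are hypothetical names, the directory is an assumed placeholder because the proposal leaves the path open, and the timestamp format is arbitrary.

```shell
#!/bin/bash
# Sketch of the proposed never/always/on-error modes.
# TRACE_DIR and run_with_trace are hypothetical; the thread leaves the path open.
TRACE_DIR=${TRACE_DIR:-/tmp/ra-trace}

run_with_trace() {
    # usage: run_with_trace <mode> <ra_name> <rsc_id> <command> [args...]
    local mode=$1 ra=$2 rsc=$3 rc file
    shift 3
    case "$mode" in
        always|on-error) ;;
        *) "$@"; return ;;      # "never" and unknown values: run untraced
    esac
    mkdir -p "$TRACE_DIR" || return
    file="$TRACE_DIR/$ra.$rsc.$(date +%Y%m%d-%H%M%S)"
    # Subshell keeps set -x and the stderr redirect from leaking to the caller.
    ( exec 2>"$file"; set -x; "$@" )
    rc=$?
    # on-error keeps the trace only when the action failed.
    if [ "$mode" = on-error ] && [ "$rc" -eq 0 ]; then
        rm -f "$file"
    fi
    return "$rc"
}
```

The traced command would normally be a shell function implementing the action. For an external program only the invocation itself shows up in the trace.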

hb_report/history explorer could gather this too.

(And yes I know this introduces a fake parameter that doesn't really
exist. But it'd be so helpful.)

Sorry. Maybe I'm getting carried away ;-)


Regards,
Lars
--
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde
Dejan Muhamedagic
2012-11-21 17:02:49 UTC
Hi Lars,
Post by Lars Marowsky-Bree
Post by Dejan Muhamedagic
Hi,
This is little something which could help while debugging
resource agents. Setting the environment variable __OCF_TRACE_RA
would cause the resource agent run to be traced (as in set -x).
PS4 is set accordingly (that's a bash feature, don't know if
other shells support it). ocf-tester got an option (-X) to turn
the feature on. The agent itself can also turn on/off tracing
via ocf_start_trace/ocf_stop_trace.
Do you find anything amiss?
I *really* like this.
But I'd like a different way to turn it on - a standard one that is
available via the CIB configuration, without modifying the script.
I don't really want the script to be modified either.
The above instructions are for people developing a new RA.
Post by Lars Marowsky-Bree
What would you think of OCF_RESKEY_RA_TRACE ?
A meta attribute perhaps? That wouldn't cause a resource
restart.
Post by Lars Marowsky-Bree
Our include script could
enable that; it's unlikely that the problem occurs prior to that.
- never (default): Does nothing
- always: Always trace, write to $(which path?)/raname.rscid.$timestamp
bash has a way to send trace to a separate FD, but that feature
is available with version >=4.x. Otherwise, it could be messy to
separate the trace from the other stderr output. Of course, one
could just redirect stderr in this case. I suppose that that
would work too.
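
As an illustration of the two options mentioned here: the separate trace descriptor is bash's BASH_XTRACEFD variable, which as far as I know appeared in bash 4.1. The helper name trace_to_file and the use of fd 9 are assumptions made for this sketch.

```shell
#!/bin/bash
# Sketch: send xtrace output to its own file when the shell supports it,
# otherwise fall back to redirecting stderr as a whole.
# trace_to_file is a hypothetical helper name.
trace_to_file() {
    local file=$1
    if [ -n "${BASH_VERSINFO:-}" ] &&
       { [ "${BASH_VERSINFO[0]}" -gt 4 ] ||
         { [ "${BASH_VERSINFO[0]}" -eq 4 ] && [ "${BASH_VERSINFO[1]}" -ge 1 ]; }; }; then
        exec 9>>"$file"         # fd 9 is an arbitrary choice
        BASH_XTRACEFD=9         # trace goes to fd 9, stderr stays untouched
    else
        exec 2>>"$file"         # fallback: trace and other stderr share the file
    fi
    set -x
}
```

In the fallback branch everything the agent writes to stderr ends up in the trace file as well, which is the "messy" case described above.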
Post by Lars Marowsky-Bree
- on-error: always trace, but delete on successful exit
Good idea.
Post by Lars Marowsky-Bree
hb_report/history explorer could gather this too.
Right.
Post by Lars Marowsky-Bree
(And yes I know this introduces a fake parameter that doesn't really
exist. But it'd be so helpful.)
Sorry. Maybe I'm getting carried away ;-)
Good points. I didn't really think much (yet) about how to
further facilitate the feature, just had a vague idea that
somehow lrmd should set the environment variable. Perhaps we
could do something like this:

# crm resource trace <rsc_id> [<action>] [<when-to-trace>]

This would set the appropriate meta attribute for the resource
which would trickle down to the RA. ocf-shellfuncs would then
do whatever's necessary to set up the trace. The file management
could get tricky though, as we don't have a single point of exit
(and trap is already used elsewhere).

Cheers,

Dejan
_______________________________________________________
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
Lars Marowsky-Bree
2012-11-21 18:06:35 UTC
Post by Dejan Muhamedagic
Post by Lars Marowsky-Bree
What would you think of OCF_RESKEY_RA_TRACE ?
A meta attribute perhaps? That wouldn't cause a resource
restart.
Point, but - meta attributes so far were mostly for the PE/pacemaker,
this would be for the RA.

Would a changed definition for a resource we're trying to trace be an
actual problem? I mean, tracing clearly means you want to trace a
resource action, so one would put the attribute on the resource before
triggering that.

(It can also be put on in maintenance mode, avoiding the restart.)
Post by Dejan Muhamedagic
Post by Lars Marowsky-Bree
Our include script could
enable that; it's unlikely that the problem occurs prior to that.
- never (default): Does nothing
- always: Always trace, write to $(which path?)/raname.rscid.$timestamp
bash has a way to send trace to a separate FD, but that feature
is available with version >=4.x. Otherwise, it could be messy to
separate the trace from the other stderr output. Of course, one
could just redirect stderr in this case. I suppose that that
would work too.
I assume that'd be easiest.

(And people not using bash can write their own implementation for this.
;-)
Post by Dejan Muhamedagic
Post by Lars Marowsky-Bree
- on-error: always trace, but delete on successful exit
Good idea.
Post by Lars Marowsky-Bree
hb_report/history explorer could gather this too.
Right.
Post by Lars Marowsky-Bree
(And yes I know this introduces a fake parameter that doesn't really
exist. But it'd be so helpful.)
Sorry. Maybe I'm getting carried away ;-)
Good points. I didn't really think much (yet) about how to
further facilitate the feature, just had a vague idea that
somehow lrmd should set the environment variable.
Sure. LRM is another obvious entry point for increased
tracing/logging. That could also work.
Post by Dejan Muhamedagic
# crm resource trace <rsc_id> [<action>] [<when-to-trace>]
This would set the appropriate meta attribute for the resource which
would trickle down to the RA. ocf-shellfuncs would then do whatever's
necessary to setup the trace. The file management could get tricky
though, as we don't have a single point of exit (and trap is already
used elsewhere).
The file/log management would be easier to do in the LRM - and also
handle the timeout situation; that could also make use of the "redirect
trace elsewhere" if the shell is new enough.


Regards,
Lars
--
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde
Dejan Muhamedagic
2012-11-27 07:28:04 UTC
Post by Lars Marowsky-Bree
Post by Dejan Muhamedagic
Post by Lars Marowsky-Bree
What would you think of OCF_RESKEY_RA_TRACE ?
A meta attribute perhaps? That wouldn't cause a resource
restart.
Point, but - meta attributes so far were mostly for the PE/pacemaker,
this would be for the RA.
Not exactly for the RA itself. The RA execution would just be
observed. The attribute is consumed by others. Whether it is PE
or lrmd or something else makes less of a difference. It is up to
these subsystems to sort the meta attributes out.
Post by Lars Marowsky-Bree
Would a changed definition for a resource we're trying to trace be an
actual problem? I mean, tracing clearly means you want to trace an
resource action, so one would put the attribute on the resource before
triggering that.
(It can also be put on in maintenance mode, avoiding the restart.)
Post by Dejan Muhamedagic
Post by Lars Marowsky-Bree
Our include script could
enable that; it's unlikely that the problem occurs prior to that.
- never (default): Does nothing
- always: Always trace, write to $(which path?)/raname.rscid.$timestamp
bash has a way to send trace to a separate FD, but that feature
is available with version >=4.x. Otherwise, it could be messy to
separate the trace from the other stderr output. Of course, one
could just redirect stderr in this case. I suppose that that
would work too.
I assume that'd be easiest.
(And people not using bash can write their own implementation for this.
;-)
Post by Dejan Muhamedagic
Post by Lars Marowsky-Bree
- on-error: always trace, but delete on successful exit
Good idea.
Post by Lars Marowsky-Bree
hb_report/history explorer could gather this too.
Right.
Post by Lars Marowsky-Bree
(And yes I know this introduces a fake parameter that doesn't really
exist. But it'd be so helpful.)
Sorry. Maybe I'm getting carried away ;-)
Good points. I didn't really think much (yet) about how to
further facilitate the feature, just had a vague idea that
somehow lrmd should set the environment variable.
Sure. LRM is an other obvious entry point for increased
tracing/logging. That could also work.
Post by Dejan Muhamedagic
# crm resource trace <rsc_id> [<action>] [<when-to-trace>]
This would set the appropriate meta attribute for the resource which
would trickle down to the RA. ocf-shellfuncs would then do whatever's
necessary to setup the trace. The file management could get tricky
though, as we don't have a single point of exit (and trap is already
used elsewhere).
The file/log management would be easier to do in the LRM - and also
handle the timeout situation; that could also make use of the "redirect
trace elsewhere" if the shell is new enough.
Indeed. Until then, ocf-shellfuncs can fall back to some
well-known location.

Thanks,

Dejan
Dejan Muhamedagic
2013-01-24 20:12:29 UTC
Hi,
Post by Dejan Muhamedagic
Post by Lars Marowsky-Bree
Post by Dejan Muhamedagic
Post by Lars Marowsky-Bree
What would you think of OCF_RESKEY_RA_TRACE ?
A meta attribute perhaps? That wouldn't cause a resource
restart.
Point, but - meta attributes so far were mostly for the PE/pacemaker,
this would be for the RA.
Not exactly for the RA itself. The RA execution would just be
observed. The attribute is consumed by others. Whether it is PE
or lrmd or something else makes less of a difference. It is up to
these subsystems to sort the meta attributes out.
It turns out that pacemaker won't export meta attributes which
were not recognized. At any rate, we can go with
OCF_RESKEY_trace_ra. The good thing is that it can be specified
per operation (op start trace_ra=1).
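
A rough sketch of how a shell include file might react to that parameter is shown below. It is not the code from the attached patch: is_true, maybe_trace, TRACE_DIR and the file naming are all assumptions made for illustration. Only OCF_RESKEY_trace_ra comes from this message, and OCF_RESOURCE_INSTANCE is the usual OCF variable holding the resource id.

```shell
#!/bin/bash
# Sketch only: is_true, maybe_trace and TRACE_DIR are hypothetical names.
is_true() {
    case "$1" in
        1|yes|true|on|YES|TRUE|ON|Yes|True|On) return 0 ;;
        *) return 1 ;;
    esac
}

maybe_trace() {
    # usage: maybe_trace <action>
    is_true "${OCF_RESKEY_trace_ra:-}" || return 0
    local dir=${TRACE_DIR:-/tmp/trace_ra}   # assumed location
    mkdir -p "$dir" || return 0
    # One file per resource, action and invocation time.
    exec 2>>"$dir/${OCF_RESOURCE_INSTANCE:-unknown}.$1.$(date +%Y-%m-%d.%H:%M:%S)"
    set -x
}
```

Because the value arrives as an ordinary OCF_RESKEY_ variable, setting trace_ra=1 on a single operation would enable the trace for just that action.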

The interface is simple and it's described in ocf-shellfuncs. It
would get support in the UI.
Post by Dejan Muhamedagic
Post by Lars Marowsky-Bree
Would a changed definition for a resource we're trying to trace be an
actual problem? I mean, tracing clearly means you want to trace an
resource action, so one would put the attribute on the resource before
triggering that.
(It can also be put on in maintenance mode, avoiding the restart.)
Post by Dejan Muhamedagic
Post by Lars Marowsky-Bree
Our include script could
enable that; it's unlikely that the problem occurs prior to that.
- never (default): Does nothing
- always: Always trace, write to $(which path?)/raname.rscid.$timestamp
bash has a way to send trace to a separate FD, but that feature
is available with version >=4.x. Otherwise, it could be messy to
separate the trace from the other stderr output. Of course, one
could just redirect stderr in this case. I suppose that that
would work too.
I assume that'd be easiest.
(And people not using bash can write their own implementation for this.
;-)
Post by Dejan Muhamedagic
Post by Lars Marowsky-Bree
- on-error: always trace, but delete on successful exit
Good idea.
This is not implemented right now.

The patch is attached. It's planned for the release 3.9.5.

Thanks,

Dejan
Keisuke MORI
2012-11-22 09:27:59 UTC
Hi,
Post by Dejan Muhamedagic
Hi Lars,
Post by Lars Marowsky-Bree
Post by Dejan Muhamedagic
Hi,
This is little something which could help while debugging
resource agents. Setting the environment variable __OCF_TRACE_RA
would cause the resource agent run to be traced (as in set -x).
PS4 is set accordingly (that's a bash feature, don't know if
other shells support it). ocf-tester got an option (-X) to turn
the feature on. The agent itself can also turn on/off tracing
via ocf_start_trace/ocf_stop_trace.
Do you find anything amiss?
I *really* like this.
But I'd like a different way to turn it on - a standard one that is
available via the CIB configuration, without modifying the script.
I don't really want that the script gets modified either.
The above instructions are for people developing a new RA.
I like this, too.
It would be useful when you need to diagnose a problem in a production
environment, provided you can enable and disable it without any
modifications to the RAs.

It might also be helpful if it had a kind of 'hook' functionality that
allows you to execute an arbitrary script for collecting the runtime
information such as CPU usage, memory status, I/O status or the list
of running processes etc. for diagnosis.
--
Keisuke MORI
Dejan Muhamedagic
2012-11-27 07:31:24 UTC
Hi Keisuke-san,
Post by Keisuke MORI
Hi,
Post by Dejan Muhamedagic
Hi Lars,
Post by Lars Marowsky-Bree
Post by Dejan Muhamedagic
Hi,
This is little something which could help while debugging
resource agents. Setting the environment variable __OCF_TRACE_RA
would cause the resource agent run to be traced (as in set -x).
PS4 is set accordingly (that's a bash feature, don't know if
other shells support it). ocf-tester got an option (-X) to turn
the feature on. The agent itself can also turn on/off tracing
via ocf_start_trace/ocf_stop_trace.
Do you find anything amiss?
I *really* like this.
But I'd like a different way to turn it on - a standard one that is
available via the CIB configuration, without modifying the script.
I don't really want that the script gets modified either.
The above instructions are for people developing a new RA.
I like this, too.
I would be useful when you need to diagnose in the production
environment if you can enable / disable it without any modifications
to RAs.
Of course.
Post by Keisuke MORI
It might be also helpful if it has a kind of 'hook' functionality that
allows you to execute an arbitrary script for collecting the runtime
information such as CPU usage, memory status, I/O status or the list
of running processes etc. for diagnosis.
Yes. I guess that one could run such a hook in background. Did
you mean that? Or once the RA instance exited? This is a bit
different feature though.

Thanks,

Dejan
Keisuke MORI
2012-11-28 10:24:15 UTC
Hi,

2012/11/27 Dejan Muhamedagic <***@suse.de>:
(...)
Post by Dejan Muhamedagic
Post by Keisuke MORI
It might be also helpful if it has a kind of 'hook' functionality that
allows you to execute an arbitrary script for collecting the runtime
information such as CPU usage, memory status, I/O status or the list
of running processes etc. for diagnosis.
Yes. I guess that one could run such a hook in background. Did
you mean that?
I first thought that it would simply run a one-shot hook at the invocation
of the RA instance, but it would be great if it could run in the
background while an RA operation is running.
Post by Dejan Muhamedagic
Or once the RA instance exited? This is a bit
different feature though.
It could also run a hook when the RA times out or when a command in
the RA gets stuck for some reason.
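
To make the background variant concrete, here is a small sketch. Nothing like this exists in the thread; run_with_hook is a hypothetical name, and the hook is assumed to be any command or shell function that keeps collecting data until it is killed.

```shell
#!/bin/bash
# Sketch: run a diagnostic hook in the background for the duration of an
# RA operation. run_with_hook is a hypothetical helper name.
run_with_hook() {
    # usage: run_with_hook <hook_cmd> <command> [args...]
    local hook=$1 hook_pid rc
    shift
    "$hook" &                   # start the collector in the background
    hook_pid=$!
    "$@"                        # run the actual operation
    rc=$?
    kill "$hook_pid" 2>/dev/null
    wait "$hook_pid" 2>/dev/null
    return "$rc"                # the hook never changes the operation's result
}
```

A hook could, for example, loop over ps or vmstat output and append it to a file. The timeout and stuck-command cases would need support from whatever supervises the RA, which this sketch does not cover.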

Thanks,
--
Keisuke MORI