From: Parviz Fariborz (parviz_fariborz_at_mentor_dot_com)
Date: Tue May 27 2008 - 04:03:10 PDT
Hi Paul,
Thanks for taking the time to explain this. I will try your suggestion
and will be glad to put together a how-to guide once I verify that it works.
-Parviz
Paul H. Hargrove wrote:
> Parviz,
>
> BLCR is not able to save/restore the association between the debugger
> and the executable, making what you are trying slightly difficult (but
> hopefully not impossible). For that reason, in the 0.7.0 release (due
> out soon) the default behavior will be to refuse to checkpoint while a
> debugger is attached (an additional option will need to be specified
> to allow the checkpoint in such a case). In neither the 0.6.x nor the
> 0.7.0 release will checkpointing gdb and the debugged process together
> (as a process group, process tree, etc.) work. If it did, your task
> would have been much easier (just "cr_checkpoint <pid-of-gdb>").
>
> The Trace/BPT trap you see is the restarted executable executing a
> breakpoint (bpt) trap instruction that the debugger inserted. Since
> at restart time no debugger is attached, the trap is a fatal error.
> The problem is that any breakpoint trap instruction written by the
> first gdb is still present in the checkpointed process, having
> replaced instruction(s) in the process. When gdb wrote that
> instruction into process memory, it would have saved the original
> instruction byte in its own memory (to restore when executing past the
> breakpoint, or when removing it). However that information was lost
> when the first gdb exited. This doesn't appear to have a good
> solution other than deleting all breakpoints before you take the
> checkpoint. If you consult a gdb expert (I am not one) you may be
> able to get gdb to print all the breakpoint data in a form that can be
> fed back into the new gdb (or perhaps you only have one at this
> stage). So, I recommend the following steps:
> 1) Run under control of gdb until it stops at your "safe" breakpoint
> 2) delete all breakpoints/watchpoints
> 3) checkpoint the process (may require you to "c" in response to the
> BLCR-generated signal)
>
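The three steps above could be scripted roughly as follows. This is only a sketch under assumptions: the file names are placeholders invented for this example, and "save breakpoints" is a gdb command found in newer gdb releases (7.2 and later, as far as I know), so an older gdb would need the breakpoints noted down by hand from "info breakpoints". With BLCR 0.7.0 the checkpoint would also need the additional option mentioned above, which is not shown here.

```shell
#!/bin/sh
# Write a small gdb command file covering step 2. The file names are
# placeholders chosen for this example.
cat > pre_checkpoint.gdb <<'EOF'
# Dump all breakpoints/watchpoints as gdb commands for later reuse.
save breakpoints saved_breakpoints.gdb
# Remove every breakpoint and watchpoint so that no trap instructions
# remain in the process image.
delete
EOF

# Step 2, typed at the gdb prompt once stopped at the "safe" breakpoint:
#   (gdb) source pre_checkpoint.gdb
# Step 3, from a second terminal (then "c" in gdb if it reports a signal):
#   cr_checkpoint <pid>
# Later, in the new gdb attached to the restarted process:
#   (gdb) source saved_breakpoints.gdb
cat pre_checkpoint.gdb
```

Keeping the saved breakpoints in a separate file means the second gdb session can re-create them only after it has attached, which is the point at which a debugger is again present to handle the traps.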
> At restart time there is the question of attaching gdb "soon enough"
> to regain control before the buggy code runs. Since we had to remove
> all the breakpoints, there seems to be nothing preventing the code
> from executing normally, bugs and all. If you are restarting from a
> point early enough (say 1 minute or more) before your suspected bug
> then you can probably just restart and then attach gdb "fast enough".
> If you are too slow it costs you little to try again. However, it
> might not be possible to do that in general. To deal with that one can
> try passing "--stop" to the cr_restart command, which will freeze the
> executable (with a SIGSTOP) immediately on restart (before returning
> control to the point where BLCR interrupted execution). That should
> allow you to attach a debugger, which then may need to send SIGCONT to
> the process to resume execution. However, I am not sure that gdb will
> correctly attach to a STOPed process. In my experiments there were
> some cases where "gdb <executable> <pid>" appeared to hang when the
> process was STOPed in this manner. If so, try sending a SIGCONT from
> another window/terminal ("kill -CONT <pid>"); hopefully that will
> resolve it, but it didn't always do so for me. I think this depends
> on the gdb and/or kernel release. In short, my recommendation if
> "attach gdb fast enough" isn't possible is:
> 1) Restart with the "--stop" command line option to freeze the process
> 2) Attach gdb to the restarted-but-stopped process
> 3) Send SIGCONT, either from gdb (if it attached OK) or from a command
> line (if gdb looks "stuck").
>
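The stop/continue handshake in steps 1 to 3 can be tried out without BLCR. The sketch below is an illustration only: a plain "sleep" stands in for the restarted process, "kill -STOP" stands in for what "--stop" is described as doing, and the proc_state helper is a name made up for this example (it reads Linux /proc). It shows the state to expect before and after the SIGCONT.

```shell
#!/bin/sh
# Stand-in for the restarted process; with BLCR this would instead be
# the process created by cr_restart with the "--stop" option.
sleep 30 &
pid=$!

# Print the one-letter kernel state of a process (Linux /proc only).
proc_state() { awk '/^State:/ { print $2 }' "/proc/$1/status"; }

kill -STOP "$pid"                    # freeze it, as "--stop" would
sleep 1
stopped_state=$(proc_state "$pid")   # "T" means stopped

# This is the point at which one would attach: gdb <executable> <pid>

kill -CONT "$pid"                    # step 3, e.g. from a second terminal
sleep 1
resumed_state=$(proc_state "$pid")   # anything other than "T" means resumed

echo "after STOP: $stopped_state, after CONT: $resumed_state"

# Clean up the stand-in process.
kill "$pid"
wait "$pid" 2>/dev/null || true
```

Checking the state this way from a second terminal is also a quick test of whether a "stuck" gdb attach is waiting on a process that is still stopped.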
> Hope this helps. Let us know if the instructions above do or do not
> work for you. Perhaps you'd be interested in helping to write up a
> "mini howto" based on your experiences?
>
> -Paul
>
> Parviz Fariborz wrote:
>>
>> Hi,
>>
>> I am trying to use blcr to shorten the debug time for a large
>> executable. I have described the approach that I have taken and the
>> issues that I ran into below. Perhaps someone in this mailing list
>> has done the same and can give me some guidance.
>>
>> When debugging a long running executable in gdb (multiple hours), I
>> want to use blcr to checkpoint the running executable at a breakpoint
>> close to the problem area where I can safely assume things are in
>> a good state. In the next round of debugging, instead of running the
>> executable in gdb, I want to restart from the checkpoint and attach
>> gdb to the running process. This gets me to the point of interest a lot
>> faster.
>>
>> My questions are: Is it possible to stop a running process in gdb at
>> a breakpoint and create a checkpoint? I tried it and was able to
>> create the checkpoint file, but the restart always failed with the
>> following message:
>>
>> .Trace/BPT trap
>>
>> Also, is there a better approach? If so, please describe it.
>>
>> Thanks in advance for your help
>>
>> -Parviz
>
>