From: Yuan Wan (ywan_at_ed.ac.uk)
Date: Mon Mar 17 2008 - 07:09:01 PST
Paul,
--------------------------------------------------------------------------------------
$ ls -l /usr/lib64/gconv/gconv-modules.cache
-rw-r--r-- 1 root root 21546 Oct 2 14:51 /usr/lib64/gconv/gconv-modules.cache
$ tcsh -c 'cat /proc/$$/maps' | grep gconv
2a9892f000-2a98935000 r--s 00000000 08:01 522135 /usr/lib64/gconv/gconv-modules.cache
---------------------------------------------------------------------------------------
I cannot see any difference in the permissions.
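Since the check above only looks at one file, here is a rough sketch of how the same test could be widened to every file-backed mapping of a process. The function name unreadable_mappings is made up for illustration and is not part of BLCR. It assumes the Linux /proc/PID/maps layout shown above, with the pathname in the sixth field, and it does not handle pathnames containing spaces.

```shell
#!/bin/sh
# Hypothetical helper (not part of BLCR): print every file-backed mapping
# listed in a maps file that the current user cannot read.
# Usage: unreadable_mappings [MAPS_FILE]
# MAPS_FILE defaults to the maps of the shell running this function.
unreadable_mappings() {
    maps_file=${1:-/proc/$$/maps}
    # Field 6 is the pathname. Keep only absolute paths (this skips
    # anonymous mappings and entries such as [stack]) and drop duplicates.
    awk '$6 ~ /^\// { print $6 }' "$maps_file" | sort -u |
    while IFS= read -r path; do
        # Report the path if it is missing or not readable by this user.
        [ -r "$path" ] || printf '%s\n' "$path"
    done
}

# Example (PID is a placeholder for the job's process ID):
#   unreadable_mappings /proc/PID/maps
```

Run as the regular user against the job's maps before the checkpoint is taken, an empty result would suggest that plain file permissions are not what makes the restart fail.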
Can you restart my test script from a checkpoint on your machine?
-------------------------------------------
#!/bin/sh
PATHTOR=/usr/bin
# Below, the phrase "EOF" marks the beginning and end of the HERE document.
$PATHTOR/R --no-save <<EOF
mod <- function(x, y)
{
    x1 <- trunc(trunc(x/y) * y)
    z <- trunc(x) - x1
    z
}
z0 <- unclass(Sys.time())
repeat {
    z1 <- unclass(Sys.time())
    secs <- floor(z1 - z0)
    if (mod(secs, 10) == 0) print(secs)
    if (secs > 180) break
}
EOF
-------------------------------------------
--Yuan
On Fri, 14 Mar 2008, Paul H. Hargrove wrote:
> Yuan,
>
> What do you get if you run the following two commands?
> $ ls -l /usr/lib64/gconv/gconv-modules.cache
> $ tcsh -c 'cat /proc/$$/maps' | grep gconv
>
> What I see is a world readable file and a shared read-only mmap in tcsh:
> $ ls -l /usr/lib64/gconv/gconv-modules.cache
> -rw-r--r-- 1 root root 21514 Jun 3 2005 /usr/lib64/gconv/gconv-modules.cache
> $ tcsh -c 'cat /proc/$$/maps' | grep gconv
> 2b8e36967000-2b8e3696d000 r--s 00000000 00:0f 9486631 /usr/lib64/gconv/gconv-modules.cache
>
> So, there shouldn't be a problem unless there is something different about
> your system.
>
> -Paul
>
> Paul H. Hargrove wrote:
>> Yuan,
>>
>> I've not seen that particular failure before, but some quick research
>> indicates that gconv-modules.cache is a part of glibc and I suspect that
>> it is getting mapped in much the same way as the NSCD file is. I will
>> continue to look into the problem to see what BLCR might be able to do
>> differently.
>>
>> -Paul
>>
>> Yuan Wan wrote:
>>
>>> Hi Paul,
>>>
>>> Thanks for replying.
>>> The error message I got from /var/log/messages is as follows:
>>>
>>> vmadump: mmap failed: /usr/lib64/gconv/gconv-modules.cache
>>> thaw_threads returned error, aborting. -13
>>>
>>> The failure does not seem to be caused by NSCD. What do you think?
>>>
>>> --Yuan
>>>
>>>
>>> On Mon, 10 Mar 2008, Paul H. Hargrove wrote:
>>>
>>>
>>>> Yuan,
>>>>
>>>> The most likely cause is that the restart failed to open one of the
>>>> files that were open()ed or mmap()ed at the time the checkpoint was taken.
>>>> Based on the fact that you see this w/ a shell script, but not C code,
>>>> my best guess is that you are encountering a problem with the file that
>>>> the Name Service Cache Daemon (NSCD) uses. Please see the following FAQ
>>>> entry for more detail (including what to look for in the system logs)
>>>> http://upc-bugs.lbl.gov/blcr/doc/html/FAQ.html#nscd
>>>> The only known work-around is to remove NSCD from your system.
>>>>
>>>> -Paul
>>>>
>>>> Yuan Wan wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I'm trying to restart my shell script jobs (bash and R) with BLCR, but
>>>>> the restart fails with the following error:
>>>>>
>>>>> "Restart failed: Permission denied"
>>>>>
>>>>> I can checkpoint the job and get a context file. The restart succeeds
>>>>> if executed by root but fails if run by a normal user. The context
>>>>> file does belong to me, so I'm wondering which permission is
>>>>> required. I can also restart a C program as a regular user without
>>>>> problems.
>>>>>
>>>>> Does anyone know the possible reason? Thanks.
>>>>>
>>>>> --Yuan
>>>>>
>>>>> Yuan Wan
>>>>>
>>>>
>>>>
>>
>>
>>
>
>
>
--
Unix Section
Information Services Infrastructure Division
University of Edinburgh
tel: 0131 650 4985
email: ywan@ed.ac.uk
2012 Computing Services, JCMB
The King's Buildings,
Edinburgh, EH9 3JZ