From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Wed Jul 16 2008 - 14:46:40 PDT
Ladislav Subr wrote:
> Hello,
>
> I've experienced a problem when restarting a process which has an open file
> larger than 2GB. The cr_restart utility returns retcode 22 with the
> message 'Restart failed: Invalid argument'. In system logs, I see:
>
> kernel: blcr: Couldn't restore file pointer.
> kernel: blcr: cr_restore_all_files [27114]: Unable to restore fd 3
> (type=1,err=-2009075712)
> kernel: blcr: cr_rstrt_child [27114]: Unable to restore files!
> (err=-2009075712)
>
> The system is CentOS 5.2 with vanilla kernel 2.6.22.19 x86_64 on Opteron CPU
> and BLCR 0.7.1. I think that I have noticed this problem already some time
> ago with another version of BLCR.
>
> Thank you in advance for any help to overcome this problem.
>
> Ladislav
>
Ladislav,
Thanks for the bug report.
I've looked at the code path that generates the message "Couldn't restore file pointer" and I am fairly confident the problem is simply that we are not testing the return value of sys_lseek() correctly. On success, sys_lseek() returns the new file offset, and for a large offset (>2GB) BLCR misinterprets that value as a negative number indicating an error.
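To illustrate the failure mode, here is a minimal user-space sketch in C. It is not BLCR code: the helper names are hypothetical, and it assumes the result of the seek ends up in a 32-bit signed int, which the logged err value suggests but which I have not confirmed here. The offset 2285891584 is only one value that would wrap to the logged -2009075712.

```c
#include <stdint.h>

/* Hypothetical helper: the value a 64-bit offset takes when it is stored
 * in a 32-bit signed int (two's-complement wrap). Computed by hand so the
 * sketch does not rely on implementation-defined conversions. */
int64_t truncate_to_int32(int64_t offset)
{
    uint32_t low = (uint32_t)((uint64_t)offset & 0xFFFFFFFFu);
    return (low & 0x80000000u) ? (int64_t)low - 4294967296LL
                               : (int64_t)low;
}

/* Old-style check (assumed 32-bit retval): a successful seek to an offset
 * whose low 32 bits have the sign bit set looks like an error (< 0). */
int old_check_fails(int64_t lseek_result)
{
    return truncate_to_int32(lseek_result) < 0;
}

/* Patched-style check: compare the full-width result with the offset that
 * was requested, so success is recognized regardless of magnitude. */
int new_check_fails(int64_t lseek_result, int64_t requested)
{
    return lseek_result != requested;
}
```

With these definitions, old_check_fails(2285891584LL) reports a failure even though the seek succeeded, while new_check_fails(2285891584LL, 2285891584LL) does not. A genuine error return such as -22 is still caught by the patched-style comparison, because it differs from the requested offset.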
If you could, please apply the attached patch to 0.7.1 and let me know if this resolves the problem.
-Paul
--
Paul H. Hargrove PHHargrove_at_lbl_dot_gov
Future Technologies Group
HPC Research Department Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
Index: cr_module/cr_rstrt_req.c
===================================================================
RCS file: /var/local/cvs/lbnl_cr/cr_module/cr_rstrt_req.c,v
retrieving revision 1.292.2.3
diff -u -r1.292.2.3 cr_rstrt_req.c
--- cr_module/cr_rstrt_req.c 24 Jun 2008 20:51:35 -0000 1.292.2.3
+++ cr_module/cr_rstrt_req.c 16 Jul 2008 21:27:27 -0000
@@ -1636,9 +1636,9 @@
}
/* restore position in file */
- retval = sys_lseek(file_info->fd, open_file.f_pos, 0);
- if (retval < 0) {
+ if (sys_lseek(file_info->fd, open_file.f_pos, 0) != open_file.f_pos) {
CR_ERR("Couldn't restore file pointer.");
+ retval = -EINVAL;
goto out_free;
}
@@ -1706,9 +1706,9 @@
}
/* restore position */
- retval = sys_lseek(file_info->fd, open_dir.f_pos, 0);
- if (retval < 0) {
+ if (sys_lseek(file_info->fd, open_dir.f_pos, 0) != open_dir.f_pos) {
CR_ERR("Couldn't restore file pointer.");
+ retval = -EINVAL;
goto out_free;
}