NERSC: Powering Scientific Discovery Since 1974

scriptEnv - loading modules before starting a script

In some cases a script needs modules loaded before it can be executed, but providing a wrapper script that loads them is often inconvenient or impossible.  CGI scripts on the gpweb resources or in the NERSC portal environment that require the genepool-specific python/perl/R or databases configuration modules are a prime example of this.  NERSC provides scriptEnv as a custom drop-in replacement for /usr/bin/env.

scriptEnv loads your selected modules so that your scripts run easily and reproducibly.  After constructing your scriptEnv, you only need to replace the shebang line of your script so that it uses your custom scriptEnv instead of /usr/bin/env.  scriptEnv can even be configured to work in a non-genepool environment (e.g., portal.nersc.gov, dtn).

Custom scriptEnvs

A scriptEnv is a compiled program which modifies the environment as if modules had been loaded, then starts a specified interpreter (e.g., perl, python) with any specified flags.  A script can therefore bootstrap itself into the needed environment without a separate wrapper script, for example:

#!/global/projectb/projectdirs/..../my_scriptEnv python3
import os
...

The scriptEnv can then, for example, be compiled with a specific python/3* module, and you can be sure that your script will execute in the proper environment, even if the script isn't executed on a genepool system.
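To make the mechanism concrete, here is a minimal Python sketch of the same idea: apply a fixed set of environment changes, then hand control to the requested interpreter. It is only an illustration. A real scriptEnv is a compiled program, and the names EMBEDDED_ENV, prepare_launch and launch, along with the example path and variable, are hypothetical and not part of scriptEnv.

```python
import os

# Hypothetical stand-in for the changes a scriptEnv has compiled into it.
# A real scriptEnv embeds whatever its modules actually set.
EMBEDDED_ENV = {
    "PATH_PREPEND": ["/hypothetical/python/3/bin"],
    "SET": {"EXAMPLE_MODULE_LOADED": "1"},
}


def prepare_launch(argv, base_env, embedded=EMBEDDED_ENV):
    """Return (interpreter, args, env) for a call like: my_scriptEnv python3 script.py"""
    if len(argv) < 2:
        raise ValueError("usage: scriptEnv <interpreter> [args...]")
    env = dict(base_env)
    env.update(embedded["SET"])
    # Prepend the module directories so their executables win the PATH lookup.
    old_path = env.get("PATH", "")
    parts = list(embedded["PATH_PREPEND"]) + ([old_path] if old_path else [])
    env["PATH"] = os.pathsep.join(parts)
    interpreter = argv[1]
    # args[0] is the interpreter name itself, followed by its flags and the script.
    return interpreter, argv[1:], env


def launch(argv):
    """Replace the current process with the requested interpreter."""
    interpreter, args, env = prepare_launch(argv, os.environ)
    # execvpe looks the interpreter up on the PATH of the new environment.
    os.execvpe(interpreter, args, env)
```

Because launch replaces the current process instead of spawning a child, no wrapper process is left running between the web server (or shell) and the interpreter.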

Another feature of the scriptEnv system is that it can provide arguments to the intended interpreter.  This allows users to specify flags like "-w" or "-t" to perl:

#!/global/projectb/projectdirs/..../my_scriptEnv perl -w
use strict;
...

Building a scriptEnv

The python script "generate_scriptEnv.py" is used to create a new scriptEnv.  It calculates the effects of your desired modules on the environment, and then embeds those changes directly into the output scriptEnv executable.  The scriptEnv then only needs to be placed in an appropriate location in the filesystem (a projectb projectdir is recommended).
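One way to picture "calculating the effects on the environment" is as a comparison of the environment before and after the modules are loaded. The sketch below shows that comparison in Python. It is an assumption about the general approach, not a description of how generate_scriptEnv.py is actually implemented, and the function names diff_environment and apply_changes are hypothetical.

```python
def diff_environment(before, after):
    """Describe how `after` differs from `before` (both are dicts of str -> str)."""
    changes = {"set": {}, "unset": []}
    for name, value in after.items():
        # New variables and variables whose value changed are both recorded.
        if before.get(name) != value:
            changes["set"][name] = value
    for name in before:
        if name not in after:
            changes["unset"].append(name)
    changes["unset"].sort()
    return changes


def apply_changes(env, changes):
    """Replay a recorded diff on top of another environment."""
    result = dict(env)
    result.update(changes["set"])
    for name in changes["unset"]:
        result.pop(name, None)
    return result
```

A recorded diff of this kind is a fixed snapshot: replaying it later gives the same result no matter what the modulefiles say at that point.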

Creating a scriptEnv

dmj@genepool02:~$ generate_scriptEnv.py oracleEnv python oracle_client
Compiling scriptEnv
g++ -O3 oracleEnv.cc -static -o oracleEnv

Use /path/to/oracleEnv instead of /usr/bin/env in your shebang line.
For example:
#!/path/to/oracleEnv python
... script ...
dmj@genepool02:~$ ls -l oracleEnv*
-rwxr-xr-x 1 dmj dmj 4642568 Jun  9 11:25 oracleEnv
-rw------- 1 dmj dmj    3106 Jun  9 11:25 oracleEnv.cc
dmj@genepool02:~$

The above example creates a scriptEnv called oracleEnv which embeds the python and oracle_client modules.  The two outputs are the binary oracleEnv and the source code file oracleEnv.cc.  Manually editing the generated source code is not recommended; instead, make all updates by running "generate_scriptEnv.py" again.

Using a scriptEnv in a CGI script:

dmj@genepool02:~$ cat /global/project/.../www/test_scriptEnv.cgi
#!/global/project/.../scripts/oracleEnv python
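import sys
import traceback

# Single-argument print(...) runs under both Python 2 and Python 3.
# CGI header, then the blank line that ends the headers.
print("Content-type: text/html")
print("")
print("<html>")

try:
    import cx_Oracle
    print("Executing with:<strong>%s</strong>" % sys.executable)
except Exception:
    # format_exception takes (type, value, traceback) as three arguments,
    # so unpack the tuple; the returned lines already end in newlines.
    exception_strs = traceback.format_exception(*sys.exc_info())
    print("<pre>%s</pre>" % "".join(exception_strs))
print("</html>")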
dmj@genepool02:~$

If your script will not be executed in a genepool environment (e.g., in the portal system rather than on a gpweb), pass the "--realpaths" argument to generate_scriptEnv.py.  This argument rewrites all the paths that the modules place in the environment to use absolute paths (i.e., /global/common/genepool instead of /usr/common).  It also prevents your script from manipulating the embedded modules (i.e., they cannot be unloaded or purged).
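The effect of the path rewriting can be illustrated with a small Python sketch. The two prefixes are the ones named above. Treating the rewrite as a plain prefix substitution is an assumption made for illustration (generate_scriptEnv.py may resolve paths differently), and the function names are hypothetical.

```python
# Prefix pair taken from the text above; the simple string mapping is an assumption.
ALIAS_PREFIX = "/usr/common"
REAL_PREFIX = "/global/common/genepool"


def rewrite_path(path, alias=ALIAS_PREFIX, real=REAL_PREFIX):
    """Rewrite one path if it lies at or below the alias prefix."""
    if path == alias or path.startswith(alias + "/"):
        return real + path[len(alias):]
    # Unrelated paths, including look-alikes such as /usr/commonplace, are untouched.
    return path


def rewrite_path_list(value, sep=":"):
    """Rewrite every entry of a PATH-style, colon-separated variable."""
    return sep.join(rewrite_path(entry) for entry in value.split(sep))
```

For instance, a PATH entry under /usr/common would come out under /global/common/genepool, while an entry such as /usr/bin is left alone.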

Presumed Frequently Asked Questions

Q: Why are scriptEnvs compiled?
A:  Some Linux distributions will not execute a non-binary interpreter in the shebang line.  This prevents use of an arbitrary wrapper script which could flexibly load modules dynamically.  Furthermore, embedding the modules saves the cost of modulefile interpretation at process start-up time.  Finally, since the scriptEnv executes your intended interpreter directly, instead of asking yet another program (e.g., /usr/bin/env or /bin/sh) to work out the final interpreter, further start-up time is saved.  This may be important for scripts which are called frequently (e.g., CGI scripts).


Q: Can I include arguments to my favorite interpreter?
A:  Yes.  scriptEnv tries to determine the intended arguments and passes them explicitly to the final interpreter.  It does this by splitting on whitespace, and knows nothing of escape sequences.  Therefore "#!/path/to/scriptEnv perl -w" is acceptable, but "#!/path/to/scriptEnv perl -fake\ flag" is not.
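The consequence of whitespace-only splitting is easy to see in a short Python sketch. Here str.split() stands in for the splitting that scriptEnv is described as doing. This models the behaviour and is not scriptEnv's own code, and the function name is hypothetical. shlex.split is shown only for contrast, as an example of escape-aware parsing.

```python
import shlex


def split_interpreter_args(shebang_tail):
    """Split the text that follows the scriptEnv path on whitespace only."""
    return shebang_tail.split()


# Plain flags survive intact.
print(split_interpreter_args("perl -w"))

# The backslash is not treated as an escape, so the flag is broken in two.
print(split_interpreter_args("perl -fake\\ flag"))

# An escape-aware parser would have kept "-fake flag" together.
print(shlex.split("perl -fake\\ flag"))
```

In the second call the interpreter would receive "-fake\" and "flag" as two separate arguments, which is why escaped spaces are not supported in the shebang line.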


Q: What if I want to add modules to my scriptEnv?
A:  Since the modules are directly embedded into the scriptEnv, you'll need to re-run generate_scriptEnv.py with the new set of modules you want.


Q: If the modules are embedded, what happens if the module changes?
A: Nothing - your scriptEnv will not be aware of any changes to the modules system.  It is recommended to rebuild the scriptEnvs frequently (e.g. with a cron script).  If generate_scriptEnv.py detects an issue creating the scriptEnv, it will return a non-zero exit status.  A reasonable cron-job for updating a scriptEnv would look like:

5 23 * * * cd <scratchPath> && generate_scriptEnv.py myScriptEnv python oracle_client && mv myScriptEnv /real/path/for/myScriptEnv
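The same "build, then install only on success" logic can be written as a small Python helper, which may be easier to extend with logging or notifications than a one-line cron entry. This is a sketch under stated assumptions: the function name rebuild_and_install and its parameters are hypothetical. The only thing it relies on from generate_scriptEnv.py is the non-zero exit status on failure described above.

```python
import shutil
import subprocess


def rebuild_and_install(build_cmd, built_path, install_path, run=subprocess.call):
    """Run the build; move the result into place only if the build succeeded."""
    status = run(build_cmd)
    if status != 0:
        # Leave the previously installed scriptEnv untouched.
        return False
    # shutil.move also works when the two paths are on different filesystems.
    shutil.move(built_path, install_path)
    return True
```

With the cron example's values, build_cmd would be ["generate_scriptEnv.py", "myScriptEnv", "python", "oracle_client"], run from the scratch directory, with built_path "myScriptEnv" and install_path "/real/path/for/myScriptEnv".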