AlphaServer SCs (Sierra Clusters) can be thought of as a group of TruCluster systems tied together. Here, each TruCluster is termed a "domain." As with Globus on TruClusters, each node may be accessed directly or through the domain-specific cluster alias. This means that either a node-specific or cluster-wide installation is possible. In practice, however, it is rare for every node to have both an internal and an external network interface (by default, only two nodes per domain are configured this way). Moreover, given that SCs can consist of hundreds of systems, it's unlikely that anyone would choose the node-specific installation method here.
So, on AlphaServer SCs, it's recommended that the cluster-wide installation method be used. While each domain on the SC can be contacted directly using the cluster alias, there is no SC-wide alias mechanism that corresponds to the TruCluster "/etc/clua_services" file. This means that a client must choose a target domain on the SC in order to submit a job. That is, unlike a TruCluster, there can be no shared GSI certificates across the entire SC.
The administrator of an AlphaServer SC may choose to install Globus on either a single domain or on each domain. Installing Globus on a single domain resolves the problem of which target domain to select for job submission (i.e., there is only one). However, when installed on only one domain, Globus must utilize an SC-wide job manager to harness the resources available from other domains. From an administrative standpoint, it's also easier to manage Globus on a single cluster rather than across the entire SC. The main drawback of a single-domain installation is availability: if the domain that serves Globus becomes unavailable, Globus jobs cannot be submitted to the SC.
The alternative, then, is to install Globus once on each domain. In this case, any node throughout the SC can be configured to accept Globus jobs. When selecting this method, each cluster should be installed and treated separately. After installation, the administrator may choose to relocate common commands and libraries to the SC File System (SCFS), as it spans all domains (note that there is no SCFS equivalent to the Cluster File System (CFS) CDSL).
Depending on the configuration, some SC nodes may not have direct connectivity to the external network. This poses a problem for the Globus job manager, as it will be unable to communicate with nodes using their internal IP addresses. This problem is characterized by a job submission to an SC system returning an error message of the form:
GRAM Job submission failed because the job manager failed to open stderr (error code 74)
The solution is to reserve a range of TCP ports using the $GLOBUS_TCP_PORT_RANGE environment variable. The corresponding range of ports must also be added to "/etc/clua_services" along with the "out_alias" option. As an example, to reserve ports 40000-40100, one would add " GLOBUS_TCP_PORT_RANGE=40000,40100" next to the $GLOBUS_HOSTNAME definition(s) in "/etc/inetd.conf". Then, add these ports to "/etc/services":
globusio000 40000/tcp
globusio001 40001/tcp
[...]
globusio100 40100/tcp
Finally, add these named ports to "/etc/clua_services":
globusio000 40000/tcp out_alias,static
globusio001 40001/tcp out_alias,static
[...]
globusio100 40100/tcp out_alias,static
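Since the range above covers 101 ports, typing the entries by hand is error-prone. The following is a minimal sketch, not part of the Globus distribution, that generates both sets of lines from a "MIN,MAX" string in the same form as the GLOBUS_TCP_PORT_RANGE value. The function names are hypothetical, and the three-digit "globusioNNN" naming is assumed from the example entries shown above.

```python
def parse_port_range(value):
    """Parse a 'MIN,MAX' string such as '40000,40100' into two integers."""
    low_text, high_text = value.strip().split(",")
    low, high = int(low_text), int(high_text)
    if not (0 < low <= high <= 65535):
        raise ValueError(f"invalid port range: {value!r}")
    return low, high


def services_entries(low, high, prefix="globusio"):
    """Build /etc/services lines, one named TCP port per line.

    The name suffix is the zero-padded offset from the first port,
    matching the globusio000 ... globusio100 pattern (an assumption).
    """
    return [f"{prefix}{port - low:03d} {port}/tcp" for port in range(low, high + 1)]


def clua_entries(low, high, prefix="globusio", options="out_alias,static"):
    """Build /etc/clua_services lines by appending the alias options."""
    return [f"{line} {options}" for line in services_entries(low, high, prefix)]


if __name__ == "__main__":
    low, high = parse_port_range("40000,40100")
    print("# entries for /etc/services")
    print("\n".join(services_entries(low, high)))
    print("# entries for /etc/clua_services")
    print("\n".join(clua_entries(low, high)))
```

The script only prints the lines; reviewing the output before appending it to the system files by hand is likely the safer choice, since both files are shared across the domain.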
With regard to job submission, the SC Resource Management System (RMS) can be used. The "scrun" command can be invoked to launch a job in parallel on all CFS domains or on all nodes in the system. In addition, an augmented version of the LSF workload management software comes standard with SC systems; Globus has been configured to use LSF on the AlphaServer SC.