From a user perspective, RUSS is all about accessing services that are available on a local machine or across a network. Services are provided by RUSS servers; each machine may host one or more RUSS servers.
The simplest situation is when a server runs as the calling user (the client). A server may also run as a different user and provide services that the client would not otherwise have access to or permission to perform. Extending access/permission to other users is one of the main features of RUSS.
Dialing for Service
To access RUSS services, one must dial them.
When dialing a service, there are 3 basic operations possible:
help - get help about services
list - get a list of services
execute - execute a service
To identify the service being dialed, we provide a service path (spath). A service path is a /-separated list of names that describes which service to dial and how to get to it. In the spath, the "+" name is used as a shortcut to the system services area.
In addition to an operation and a service path, services may also be passed zero or more arguments and zero or more key=value settings.
When working from the command line, the main tool for accessing these services is rudial:
usage: rudial [<option>] <op> <spath> [<arg> ...]

Dial service at <spath> to perform <op>. A service may support one
or more operations (e.g., execute, help, info, list).

A successful dial will effectively connect the stdin, stdout, and
stderr of the service. Once connected, rudial forwards the stdin,
stdout, and stderr I/O data between the caller and the service.

An exit value of < 0 indicates a failure to connect. Otherwise a 0
exit value is returned.

Options:
-a|--attr <name=value>
    Pass a 'name=value' string to the service.
-b <bufsize>
    Set buffer size for reading/writing.
-i <path>
    Read from file instead of stdin.
--stats
--statsfd <fd>
    Output statistics for each read and write operation. The
    default is to output to stderr (fd=2). For 'execute' operation
    only.
-t|--timeout <seconds>
    Allow a given amount of time to connect before aborting.
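When rudial is driven from a script, the pieces of a dial (operation, service path, arguments, and name=value settings) map directly onto the usage line above. The following is a minimal sketch in Python; the function name build_rudial_argv is hypothetical and not part of RUSS, and only the -a and -t options listed in the usage text are used.

```python
from typing import Mapping, Optional, Sequence


def build_rudial_argv(
    op: str,
    spath: str,
    args: Sequence[str] = (),
    attrs: Optional[Mapping[str, str]] = None,
    timeout: Optional[int] = None,
) -> list:
    """Assemble an argument vector following the documented usage:
    rudial [<option>] <op> <spath> [<arg> ...]

    Hypothetical helper for illustration; it only builds the list and
    does not run anything.
    """
    argv = ["rudial"]
    # Each name=value setting becomes its own -a option.
    for name, value in (attrs or {}).items():
        argv += ["-a", f"{name}={value}"]
    # -t sets the connect timeout in seconds.
    if timeout is not None:
        argv += ["-t", str(timeout)]
    # Options come first, then the operation, the service path, and the args.
    argv += [op, spath, *args]
    return argv
```

The resulting list can be handed to subprocess.run() on a machine where rudial is installed.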
Example - Where to Start?
Most setups have a collection of core services available. These are provided by the servers in the system area accessed by "+".
To list these:
$ rudial list +
debug
exec
proc
ssh
exec - execute a command/program
proc - process status
ssh - provides access to remote host using ssh
Example - The debug Services
Most setups will have the debug services available in the system area at +/debug.
To get the list of services:
$ rudial list +/debug
chargen
conn
daytime
discard
echo
env
exit
request
If we are not familiar with the services provided, help can be obtained with:
$ rudial help +/debug
Provides services useful for debugging. Unless otherwise stated,
stdin, stdout, and stderr all refer to the file descriptor triple
that is returned from a russ_dial call.

/chargen[/...]
    Character generator outputting to stdout; follows the RFC 864
    protocol sequence.
/conn[/...]
    Outputs russ connection information.
/daytime
    Outputs the date and time to the stdout.
/discard[/...] [--perf]
    Discards all data received from stdin; if --perf is specified,
    performance feedback is provided to stderr, otherwise there is
    none.
/echo[/...]
    Simple echo service; receives from stdin and outputs to stdout.
/env
    Outputs environ entries to stdout.
/exit <value>
    Return with given exit value (between 0 and 255).
/request[/...]
    Outputs the request information at the server stdout.
We can execute the daytime service:
$ rudial execute +/debug/daytime
Saturday, February 03, 2018 16:55:41-EST
The service path is composed of three parts: + identifies the system area; debug identifies the server, which is the starting point for the services it provides; and daytime is the actual service.
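The decomposition of a system-area service path can be sketched in a few lines of Python. The function name split_system_spath is hypothetical, and the sketch assumes the simple three-part layout described here (area, server, remainder); it keeps the leading "/" on the remainder because that is how the debug request service reports the spath it receives.

```python
def split_system_spath(spath: str) -> tuple:
    """Split a path such as '+/debug/daytime' into (area, server, service).

    Illustrative only: this is not how RUSS itself resolves paths, just a
    way to see which part of the path plays which role.
    """
    parts = spath.split("/", 2)
    area = parts[0]
    server = parts[1] if len(parts) > 1 else ""
    # Everything after the server name is left for that server to interpret.
    service = "/" + parts[2] if len(parts) > 2 else ""
    return area, server, service


print(split_system_spath("+/debug/daytime"))  # ('+', 'debug', '/daytime')
```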
We can use the request service to see the dialing information passed to and received by the service:
$ rudial execute +/debug/request
protocol string (0010)
spath (/request)
op (execute)
opnum (2)
attrv (NULL)
argv (NULL)

$ rudial -a name=john -a color=blue execute +/debug/request/a/b/c hello there
protocol string (0010)
spath (/request/a/b/c)
op (execute)
opnum (2)
attrv (name=john)
attrv (color=blue)
argv (hello)
argv (there)
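When checking what a service actually received, it can help to collect the request output into a structure. The sketch below is in Python and assumes that each field is printed on its own line in the form "name (value)", as in the examples above; the function name parse_request_output is hypothetical, and values are kept as raw strings (including the literal NULL).

```python
import re

# One field per line: a name, a space, then the value in parentheses.
_FIELD = re.compile(r"^(.+?) \((.*)\)$")


def parse_request_output(text: str) -> dict:
    """Group 'name (value)' lines by name; repeated names keep their order."""
    fields = {}
    for line in text.splitlines():
        match = _FIELD.match(line.strip())
        if match is None:
            continue  # skip blank or unrecognized lines
        name, value = match.groups()
        fields.setdefault(name, []).append(value)
    return fields
```

Applied to the second example, the attrv entry would hold both name=john and color=blue, and argv would hold hello and there.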
To simplify dialing, instead of using rudial, some convenience tools are provided:
ruhelp - instead of rudial help
ruls - instead of rudial list
ruexec - instead of rudial execute
$ ruls +
debug
exec
proc
ssh

$ ruexec +/debug/daytime
Saturday, February 03, 2018 17:06:19-EST
Working with Networks
RUSS is not limited to working on a single machine; it can access machines across a network. Naturally, this is done using a RUSS server.
The ssh server provides a single service. From the ssh server help:
Provides access to remote host using ssh.

/[<user>@]<host>[:<port>][<options>]/... <args>
    Connect to service ... at <user>@<host>:<port> using ssh.

    Options:
    ?controlpersist=<seconds>
        Set ControlPersist time in seconds. Default is 1.
    ?controltag=<tag>
        Used to generate a ControlPath. Required to set up control
        master functionality (if available).
What is noteworthy is that the service name is not fixed but made up of an optional user/account name, a mandatory hostname, an optional port, and optional "options". The ... indicates that the service path continues and specifies how to get to the service on the remote host.
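To make the structure of that service name concrete, here is a minimal Python sketch that splits a name of the form [<user>@]<host>[:<port>][<options>] into its parts. The function name parse_ssh_target is hypothetical, the user name and port in the usage note are made up, and the sketch assumes that each option is introduced by its own "?" as shown in the help text. It does not handle IPv6 literals.

```python
def parse_ssh_target(name: str) -> dict:
    """Split '[<user>@]<host>[:<port>][?opt=value...]' into its components.

    Illustrative only; the real ssh server may parse names differently.
    """
    # Everything before the first '?' is the user/host/port part.
    base, *raw_options = name.split("?")
    options = {}
    for raw in raw_options:
        key, _, value = raw.partition("=")
        options[key] = value
    # The user is optional and precedes '@'.
    user, _, hostport = base.rpartition("@")
    # The port is optional and follows ':'.
    host, _, port = hostport.partition(":")
    if not host:
        raise ValueError(f"missing host in {name!r}")
    return {
        "user": user or None,
        "host": host,
        "port": int(port) if port else None,
        "options": options,
    }
```

For instance, parse_ssh_target("john@machb:2222?controlpersist=10") yields the user john, the host machb, the port 2222, and a single controlpersist option.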
For example, to connect to the daytime service on machine
This assumes that the ssh configuration (under ~/.ssh/config and keys) has been set up to not require user interaction.
If the desired service is a few hops away:
When it is necessary to work with a collection of targets (not just hosts), the rurun tool is available. In some respects, rurun is similar to the dsh (distributed shell) tool but uses RUSS.
The targets file format:
[<user>@]<host>[:<port>] [<cgroup>] ...
Ignoring the optional <cgroup> part, we can define a basic targets file (with three hosts):
macha
machb
machc
then "run" programs at the targets:
$ dsh --targetsfile machs3 0:3 hostname
macha
machb
machc
- the program hostname is run on each machine sequentially
- each target in the targets file is identified by its position index (using 0-indexing)
- 0:3 is equivalent to the range [0,3): 0,1,2; this corresponds to Python ranges
To run concurrently (up to 5 at a time):
$ dsh --targetsfile machs3 -n 5 0:,2,2:-1:-1 hostname
macha
machb
machc
machc
machc
machb
macha
- the range 0: is equivalent to 0:<count> which, in this case, is 0:3
- the range 2:-1:-1 is equivalent to 2,1,0
- because things are done concurrently, the order of the results is unspecified
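Since the index specification follows Python ranges, its expansion can be sketched directly with range(). The function name expand_indices is hypothetical, and the defaults used for omitted fields (start 0, stop equal to the number of targets, step 1) are assumptions based on the examples above, not a statement of how the tool parses its arguments.

```python
def expand_indices(spec: str, count: int) -> list:
    """Expand a spec such as '0:,2,2:-1:-1' into target indices.

    Each comma-separated part is either a single index or a
    start:stop[:step] range with Python range() semantics.
    """
    indices = []
    for part in spec.split(","):
        fields = part.split(":")
        if len(fields) == 1:
            indices.append(int(fields[0]))  # single index
            continue
        if len(fields) > 3:
            raise ValueError(f"bad range: {part!r}")
        start = int(fields[0]) if fields[0] else 0
        stop = int(fields[1]) if fields[1] else count
        step = int(fields[2]) if len(fields) == 3 and fields[2] else 1
        indices.extend(range(start, stop, step))
    return indices


print(expand_indices("0:3", 3))           # [0, 1, 2]
print(expand_indices("0:,2,2:-1:-1", 3))  # [0, 1, 2, 2, 2, 1, 0]
```

Mapping the second result onto the three-host targets file gives the seven hostnames listed in the concurrent example, although a concurrent run may report them in a different order.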
Often targets are only hostnames, but targets may also include a cgroup (for Linux only). E.g.,
macha jobs/123-0
machb jobs/123-1
machc jobs/123-2
This is typical when used in conjunction with a queueing system in which cgroups are used to isolate jobs (or parts of a job). Only the targets file changes; the calls with rurun do not.
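The targets file format is simple enough to read with a few lines of Python. The sketch below assumes one target per line with whitespace-separated fields, takes the first field as the [<user>@]<host>[:<port>] part and the second (if present) as the cgroup, and ignores anything after that; the function name parse_targets is hypothetical.

```python
def parse_targets(text: str) -> list:
    """Return a list of (hostspec, cgroup) pairs, one per non-empty line.

    cgroup is None when the line has only a host part. Illustrative only;
    any further fields on a line are ignored by this sketch.
    """
    targets = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        hostspec = fields[0]
        cgroup = fields[1] if len(fields) > 1 else None
        targets.append((hostspec, cgroup))
    return targets
```

Because each target keeps its position in the returned list, the same index values work whether or not a cgroup is present.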
The rumpirun tool is used to run MPI jobs (works with openmpi and mpich implementations) with rurun as a launcher. To meet the needs of the underlying mpirun tool, a hosts file is required. However, instead of specifying hostnames in the hosts file, index values are provided. These index values correspond to the targets in the targets file.
rumpirun works the same whether cgroups are used or not.
Given a targets file of:
macha
machb
machc
and a hosts file of:
0
1
2
rumpirun is called in the same way as mpirun:
$ rumpirun -np 3 mpihello
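The relationship between the two files can be illustrated with a short Python sketch that resolves the index values in a hosts file against the list of targets. The function name resolve_hosts is hypothetical, and the sketch assumes the hosts file holds only whitespace-separated index values; it says nothing about how rumpirun performs this lookup internally.

```python
def resolve_hosts(hosts_text: str, targets: list) -> list:
    """Map each index value in the hosts file to its entry in targets."""
    resolved = []
    for token in hosts_text.split():
        index = int(token)
        # Reject out-of-range values instead of letting negative
        # indices wrap around as they would with plain list indexing.
        if not 0 <= index < len(targets):
            raise IndexError(f"no target at index {index}")
        resolved.append(targets[index])
    return resolved


print(resolve_hosts("0\n1\n2\n", ["macha", "machb", "machc"]))
# ['macha', 'machb', 'machc']
```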