Introduction

From a user perspective, RUSS is all about accessing services that are available on a local machine or across a network. Services are provided by RUSS servers; each machine may host one or more RUSS servers.

The simplest situation is when a server runs as the calling user (the client). A server may also run as a different user and provide services that the client would not otherwise have access to or permission to perform. Extending access and permissions to other users in this way is one of the main features of RUSS.

Dialing for Service

To access RUSS services, one must dial them.

When dialing a service, three basic operations are possible:

  • help - get help about services
  • list - get a list of services
  • execute - execute a service

To identify the service being dialed, we provide a service path (spath). A service path is a /-separated list of names that describes which service to dial and how to get to it. In the spath, the "+" name is used as a shortcut to the system services area.

In addition to an operation and a service path, services may also be passed zero or more arguments and zero or more key=value settings.

rudial

When working from the command line, the main tool for accessing these services is rudial.

usage: rudial [<option>] <op> <spath> [<arg> ...]

Dial service at <spath> to perform <op>. A service may support one
or more operations (e.g., execute, help, info, list).

A successful dial will effectively connect the stdin, stdout, and
stderr of the service. Once connected, rudial forwards the stdin,
stdout, and stderr I/O data between the caller and the service.

An exit value of < 0 indicates a failure to connect. Otherwise a 0
exit value is returned.

Options:
-a|--attr <name=value>
    Pass a 'name=value' string to the service.
-b <bufsize>
    Set buffer size for reading/writing.
-i <path>
    Read from file instead of stdin.
--stats
--statsfd <fd>
    Output statistics for each read and write operation. The
    default is to output to stderr (fd=2). For 'execute' operation
    only.
-t|--timeout <seconds>
    Allow a given amount of time to connect before aborting.
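
As a rough illustration of how the pieces of a dial fit together (options, operation, service path, arguments), the following Python sketch assembles a rudial command line from the usage text above. The helper name build_rudial_argv is hypothetical and not part of RUSS; it only builds the argument list and does not run anything.

```python
import shlex

# Operations named in the rudial usage text.
VALID_OPS = {"execute", "help", "info", "list"}


def build_rudial_argv(op, spath, args=(), attrs=None, timeout=None):
    """Assemble an argument vector for rudial (hypothetical helper)."""
    if op not in VALID_OPS:
        raise ValueError(f"unsupported operation: {op!r}")
    argv = ["rudial"]
    # Each attribute is passed as its own "-a name=value" option.
    for name, value in (attrs or {}).items():
        argv += ["-a", f"{name}={value}"]
    # -t sets the connect timeout in seconds.
    if timeout is not None:
        argv += ["-t", str(timeout)]
    # Options come first, then the operation, the service path, and the arguments.
    argv += [op, spath, *args]
    return argv


argv = build_rudial_argv(
    "execute",
    "+/debug/request/a/b/c",
    args=["hello", "there"],
    attrs={"name": "john", "color": "blue"},
)
print(shlex.join(argv))
# rudial -a name=john -a color=blue execute +/debug/request/a/b/c hello there
```

The resulting list could be handed to subprocess.run on a machine where rudial is installed.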

Example - Where to Start?

Most setups have a collection of core services available. These are provided by the servers in the system area accessed by "+".

To list these:

$ rudial list +
debug
exec
proc
ssh

These are:

  • debug - debugging
  • exec - execute a command/program
  • proc - process status
  • ssh - provides access to a remote host using ssh

Example - The debug Services

Most setups will have the debug services available in the system area at +/debug.

To get the list of services:

$ rudial list +/debug
chargen
conn
daytime
discard
echo
env
exit
request

If we are not familiar with the services provided, we can ask for help:

$ rudial help +/debug
Provides services useful for debugging. Unless otherwise stated,
stdin, stdout, and stderr all refer to the file descriptor triple
that is returned from a russ_dial call.

/chargen[/...]
    Character generator outputting to stdout; follows the RFC 864
    protocol sequence.

/conn[/...]
    Outputs russ connection information.

/daytime
    Outputs the date and time to the stdout.

/discard[/...] [--perf]
    Discards all data received from stdin; if --perf is specified,
    performance feedback is provided to stderr, otherwise there is
    none.

/echo[/...]
    Simple echo service; receives from stdin and outputs to stdout.

/env
    Outputs environ entries to stdout.

/exit <value>
    Return with given exit value (between 0 and 255).

/request[/...]
    Outputs the request information at the server stdout.

We can execute the daytime service:

$ rudial execute +/debug/daytime
Saturday, February 03, 2018 16:55:41-EST

The service path is composed of +, debug, and daytime. The + identifies the system area; debug identifies the server, which is the starting point for the services it provides; daytime is the actual service.

We can use the request service to see the dialing information passed to and received by the service:

$ rudial execute +/debug/request
protocol string (0010)
spath (/request)
op (execute)
opnum (2)
attrv (NULL)
argv (NULL)
$ rudial -a name=john -a color=blue execute +/debug/request/a/b/c hello there
protocol string (0010)
spath (/request/a/b/c)
op (execute)
opnum (2)
attrv[0] (name=john)
attrv[1] (color=blue)
argv[0] (hello)
argv[1] (there)
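
When the request service is used in scripts or tests, its output can be turned into a lookup table. The sketch below assumes every line has the "name (value)" shape seen in the sample output above; the function name parse_request_output is hypothetical.

```python
import re

# One field per line: a name, a space, and the value in parentheses.
LINE_RE = re.compile(r"^(?P<key>.+?) \((?P<value>.*)\)$")


def parse_request_output(text):
    """Map each field name in the request output to its value (as a string)."""
    fields = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            fields[match.group("key")] = match.group("value")
    return fields


sample = """\
protocol string (0010)
spath (/request/a/b/c)
op (execute)
opnum (2)
attrv[0] (name=john)
attrv[1] (color=blue)
argv[0] (hello)
argv[1] (there)
"""

info = parse_request_output(sample)
print(info["spath"], info["attrv[0]"])
# /request/a/b/c name=john
```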

Convenience Tools

To simplify dialing, some convenience tools are provided as alternatives to rudial:

  • ruhelp - instead of rudial help
  • ruls - instead of rudial list
  • ruexec - instead of rudial execute

Some examples:

$ ruls +
debug
exec
proc
ssh
$ ruexec +/debug/daytime
Saturday, February 03, 2018 17:06:19-EST

Working with Networks

RUSS is not limited to working on a single machine. RUSS can access machines across a network. Naturally, this is done using a RUSS server.

ssh Server

The ssh server provides a single service. From the ssh server help:

Provides access to remote host using ssh.

/[<user>@]<host>[:<port>][<options>]/... <args>
    Connect to service ... at <user>@<host>:<port> using ssh.

    Options:
    ?controlpersist=<seconds>
        Set ControlPersist time in seconds. Default is 1.
    ?controltag=<tag>
        Used to generate a ControlPath. Required to set up control
        master functionality (if available).

What is noteworthy is that the service name is not fixed but made up of an optional user/account name, a mandatory hostname, an optional port, and optional "options". The ... indicates that the service path continues and specifies how to get to the service on the remote host.
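
To make the structure of the service name concrete, here is a minimal Python sketch that splits a name of the form [<user>@]<host>[:<port>][<options>] into its parts. It is not the ssh server's own parser. The function name parse_ssh_name and the sample name are hypothetical, the options part is kept as a raw string because the help text does not say how several options are combined, and bracketed IPv6 addresses are not handled.

```python
import re

NAME_RE = re.compile(
    r"^(?:(?P<user>[^@/]+)@)?"  # optional user/account name
    r"(?P<host>[^:?/@]+)"       # mandatory hostname
    r"(?::(?P<port>\d+))?"      # optional port
    r"(?P<options>\?.*)?$"      # optional options, kept unparsed
)


def parse_ssh_name(name):
    """Split one ssh service name into user, host, port, and raw options."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a valid ssh service name: {name!r}")
    port = match.group("port")
    return {
        "user": match.group("user"),
        "host": match.group("host"),
        "port": int(port) if port else None,
        "options": match.group("options"),
    }


print(parse_ssh_name("abc.xyz"))
# {'user': None, 'host': 'abc.xyz', 'port': None, 'options': None}
print(parse_ssh_name("john@abc.xyz:2222?controlpersist=10"))
# {'user': 'john', 'host': 'abc.xyz', 'port': 2222, 'options': '?controlpersist=10'}
```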

For example, to connect to the daytime service on machine abc.xyz:

ruexec +/ssh/abc.xyz/+/debug/daytime

This assumes that the ssh configuration (~/.ssh/config and keys) has been set up so that no user interaction is required.

If the desired service is a few hops away:

ruexec +/ssh/A/+/ssh/B/+/ssh/C/+/debug/daytime
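
Multi-hop paths like the one above follow a regular pattern: one +/ssh/<host>/ prefix per hop, followed by the service path on the final host. A small Python sketch of that pattern (the helper name via_ssh is hypothetical):

```python
def via_ssh(hops, spath):
    """Prefix spath with one +/ssh/<hop>/ segment per intermediate host."""
    prefix = "".join(f"+/ssh/{hop}/" for hop in hops)
    return prefix + spath


print(via_ssh(["abc.xyz"], "+/debug/daytime"))
# +/ssh/abc.xyz/+/debug/daytime
print(via_ssh(["A", "B", "C"], "+/debug/daytime"))
# +/ssh/A/+/ssh/B/+/ssh/C/+/debug/daytime
```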

rurun

When it is necessary to work with a collection of targets (not just hosts), the rurun tool is available. In some respects, rurun is similar to the dsh (distributed shell) tool but uses RUSS.

The targets file format:

[<user>@]<host>[:<port>] [<cgroup>]
...

Ignoring the optional <cgroup> part, we can define a basic targets file with three hosts, here named machs3:

machs3
macha
machb
machc

then "run" programs at the targets:

$ rurun --targetsfile machs3 0:3 hostname
macha
machb
machc

Notes:

  • the program hostname is run on each machine sequentially
  • each target in the targets file is identified by its position index (using 0-indexing)
  • 0:3 is equivalent to the half-open range [0,3), i.e., 0,1,2; this corresponds to Python ranges

To run concurrently (up to 5 at a time):

$ rurun --targetsfile machs3 -n 5 0:,2,2:-1:-1 hostname
macha
machb
machc
machc
machc
machb
macha

Notes:

  • the range 0: is equivalent to 0:<count> which, in this case, is 0:3
  • the range 2:-1:-1 is equivalent to 2,1,0
  • because things are done concurrently, the order of the results is unspecified
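
The index and range notation described in the notes can be sketched in Python as follows. This is an illustration, not rurun's own parser. The function name expand_spec is hypothetical, and it assumes that a missing start means 0, a missing stop means the number of targets, and a missing step means 1; the text does not describe defaults for ranges with a negative step, so those should be written out in full.

```python
def expand_spec(spec, count):
    """Expand a spec such as "0:,2,2:-1:-1" into a list of target indexes."""
    indexes = []
    for part in spec.split(","):
        if ":" not in part:
            # A bare number selects a single target.
            indexes.append(int(part))
            continue
        fields = part.split(":")
        if len(fields) > 3:
            raise ValueError(f"bad range: {part!r}")
        fields += [""] * (3 - len(fields))  # pad to start, stop, step
        start = int(fields[0]) if fields[0] else 0
        stop = int(fields[1]) if fields[1] else count
        step = int(fields[2]) if fields[2] else 1
        # Same semantics as Python's range(start, stop, step).
        indexes.extend(range(start, stop, step))
    return indexes


targets = ["macha", "machb", "machc"]
print(expand_spec("0:3", len(targets)))
# [0, 1, 2]
print([targets[i] for i in expand_spec("0:,2,2:-1:-1", len(targets))])
# ['macha', 'machb', 'machc', 'machc', 'machc', 'machb', 'macha']
```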

Often targets are only hostnames, but targets may also include a cgroup (for Linux only). E.g.,

macha jobs/123-0
machb jobs/123-1
machc jobs/123-2

This is typical when used in conjunction with a queueing system in which cgroups are used to isolate jobs (or parts of a job). Only the targets file changes; the calls with rurun do not.
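
For scripts that generate or check targets files, the format can be read with a few lines of Python. The sketch below is not RUSS code. The function name parse_targets is hypothetical, and skipping blank lines is an assumption; the target is kept as a single string rather than being split into user, host, and port.

```python
def parse_targets(text):
    """Return a list of (target, cgroup) pairs; cgroup is None when absent."""
    targets = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # assumption: blank lines are ignored
        if len(fields) > 2:
            raise ValueError(f"too many fields in targets line: {line!r}")
        cgroup = fields[1] if len(fields) == 2 else None
        targets.append((fields[0], cgroup))
    return targets


plain = "macha\nmachb\nmachc\n"
with_cgroups = "macha jobs/123-0\nmachb jobs/123-1\nmachc jobs/123-2\n"

print(parse_targets(plain))
# [('macha', None), ('machb', None), ('machc', None)]
print(parse_targets(with_cgroups)[0])
# ('macha', 'jobs/123-0')
```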

rumpirun

The rumpirun tool is used to run MPI jobs (it works with the openmpi and mpich implementations) with rurun as the launcher. To meet the needs of the underlying mpirun tool, a hosts file is required. However, instead of hostnames, the hosts file contains index values. These index values correspond to the targets in the targets file. rumpirun works the same whether cgroups are used or not.

Given a targets file of:

macha
machb
machc

and a hosts file of:

0
1
2

rumpirun is called in the same way as mpirun:

rumpirun -np 3 mpihello
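
The relationship between the two files can be illustrated with a short Python sketch that looks up each hosts-file index in the list of targets. This only shows the mapping and says nothing about how rumpirun is implemented; the function name resolve_hosts is hypothetical.

```python
def resolve_hosts(hosts_text, targets):
    """Map each index in the hosts file to the target at that position."""
    resolved = []
    for token in hosts_text.split():
        index = int(token)
        if not 0 <= index < len(targets):
            raise IndexError(f"hosts file index {index} has no matching target")
        resolved.append(targets[index])
    return resolved


targets = ["macha", "machb", "machc"]
print(resolve_hosts("0\n1\n2\n", targets))
# ['macha', 'machb', 'machc']
```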