A Network Services Interface for Telepresence Applications

Gerald M. Karam, Bruce McLeod, Gerald Boersma

The Ontario Telepresence Project[1]

Department of Systems and Computer Engineering

Carleton University

1125 Colonel By Drive

Ottawa, Ontario, CANADA K1S 5B6

ph: 613-788-5749, fax: 613-788-5727

karam@sce.carleton.ca

ABSTRACT

Telepresence is the application of computing and telecommunications to support a sense of social proximity among collaborators, in spite of separation by distance and time. This class of applications uses a variety of multimedia information (audio, video, and computer data) communications and presentation techniques to achieve these goals. Paramount in these applications is the need to provide distributed control, and to some extent distributed information exchange. We describe a Network Services Interface (NSI) that has been developed to provide the control and data exchange backplane for telepresence applications. We describe two experimental systems that illustrate its use. The first is a re-implementation of the Xerox Portholes concept that shows the ability of NSI to support existing applications by simple means. The second is an implementation of a generic conference control system (for multipoint video conferences) and its instantiation to manage automated multipoint conference rooms; the conference control application is based in part on the ITU GCC draft standard. Through these applications NSI's strengths and flexibility are illustrated.

Introduction

Telepresence is the art of enabling social proximity despite geographical or temporal distances. It is a set of computer, audio-visual, and telecommunications technologies, which are carefully integrated to enable people to work together using technology as an intermediary [,]. In the activities of the Ontario Telepresence Project [] we have constructed a variety of software and hardware systems to support telepresence applications, and it is the mandate of the project to deploy these applications in the field to evaluate their acceptance and use by people in an office environment (in this paper, the term "Telepresence", when capitalized, refers to the project rather than the concept). The applications we have developed include: (1) computer-mediated intra-site and inter-site synchronous audio/video communications employing strategically placed cameras and monitors, (2) background awareness using low frame rate snapshots of users (based on Xerox's Portholes []), (3) video and voice mail, (4) receptionist and video automated attendant functions for video call management, and (5) room level A/V switching management.

Towards providing a greater range of applications (including shared task-space applications such as shared drawing packages) and simplifying their creation and deployment, there is a need to support all telepresence services by a common, network-wide software infrastructure. We have developed such a framework, the Network Services Interface (NSI), that employs: (1) an underlying reliable multipoint communications capability (a variant of the Multipoint Communications Services (MCS) [,]), (2) models for localized and distributed services, (3) a template for application construction, and (4) primitives for communications, and access and control of services.

We report on the utility of the NSI model in two experiments: (1) the re-engineering of the Portholes application (NSI-Portholes), and (2) the development of a conference control application, Telepresence Conference Control (TCC), and its use in the construction of a multipoint conference room device manager. Each of these uses the NSI as its basis, and they represent two quite different forms of application (Portholes illustrates multipoint control, data sharing, and user services with periodic transmission of video images, whereas TCC illustrates conference control and its use to coordinate A/V devices, video codecs, etc. in a multiple conference room environment).

Related Work. There are a variety of research and development efforts in the area of NSI and TCC. Most propose alternatives to the functions of MCS, most notably the multicast transmission function and session management. Multicast has been explored widely in the Internet forum and proposals exist for various models. VMTP [] is a transaction-based transport protocol with multicast capabilities. Specifically, it provides RPC-like client-server communications, where a client can communicate with multiple servers. VMTP uses the unreliable multicast network service, IP multicast, which as yet is not universally provided by Internet nodes. Furthermore, the multicast services of VMTP are not themselves reliable with respect to guaranteed delivery or notification of failure. XTP [,] provides a super-protocol framework for both point-to-point protocols like TCP/IP and multicast proposals like VMTP; it is intended to support distributed multimedia and is a commercial product. Other Internet-based multicast software includes: Vartalaap [], which provides a single centralized global server that supports a multicast function similar to the MCS totally ordered send option (which is not always necessary); the Multicast Transport Protocol [], which tries to provide a reliable transport layer over the unreliable IP multicast network layer, with the result that achieving totally ordered message delivery (like the MCS totally ordered send option) is very complex; the Cornell CBCAST and ABCAST [], which provide reliable ordered or non-ordered multicast and may be a viable option for supplanting MCS with a true reliable multicast transmission layer (the current MCS definition uses a tree of point-to-point connections); and the LLNL MCAST software [], which constructs a reliable multicast network using TCP/IP-based sockets but is intended to eventually use IP multicast with reliability mechanisms added. NSI uses MCAST as a reliable network layer for MCS in Unix.

Related to the infrastructure features of NSI and the specific conference control application of TCC, several projects have been done. The most closely related is the standardization work on Generic Conference Control (GCC) []. This defines a model of general services for conference control applications. It provides an NSI-like capability, but in a rather narrowly defined way. Also, it defines some of the functions found in TCC, but again constrained primarily to multipoint control unit management. The ISI-sponsored MMConf system and its conference control subsystem, MMCC [,], also provide features similar to GCC and desirable for NSI and TCC. Of particular interest is the manner in which remote device control is managed. A common network infrastructure is provided in MMConf, but is targeted largely at packet-based transmission of video and audio over the Internet (although they actually used satellite links to avoid bandwidth and delay problems). MMConf does not provide co-ordination of applications.

In the remainder of this paper, we describe (1) the network goals for telepresence applications that NSI is intended to support, (2) the MCS model and our extensions, (3) the NSI, (4) the Portholes experiment, and (5) the TCC experiment. Finally, conclusions and continuing work are presented.

Network Goals

In developing telepresence applications and in forming the services for NSI (and to some degree the TCC application), the following networking goals have been used: Bandwidth/Channel Transparency, Collaboration (Group) Transparency, Site-end Heterogeneity, Security, Robustness, and Incremental Extensibility. As a set, these goals are typical of many networking systems; however, for telepresence the most significant points are the need for collaboration transparency, and the range of capability that may be required all within one general structure. Each of these is discussed briefly below.

Bandwidth/Channel Transparency. Applications must be able to select communications channels based on bandwidth (including quality of service) requirements, and ideally, be able to alter bandwidth requirements during a session. The mapping of channels to physical communications paths should be hidden by abstractions.

Collaboration (Group) Transparency. In telepresence applications there is a need for members of a work group (be it temporary for the purpose of a short term multiparty interaction, or long term for extended collaboration), to learn about each other, exchange information, maintain group awareness, and maintain group sessions. All of this should be achievable without regard to the physical locations -- the network must provide for the abstractions necessary to satisfy the logical groups required by collaboration.

Site-end Heterogeneity. The network model for telepresence applications must be tolerant of different equipment and capabilities all attempting to achieve the same functionality. For example, one site may represent a large local A/V network with many users, a second site may use a simpler mechanism to support several users, and a third site may consist of a lone user. Assuming that users from all three sites wish to maintain a sense of group, it is incumbent on the network services to permit it, up to the limitations of the site-end equipment.

Security. Computer networks regularly have to deal with security, as nodes on a network often belong to different administrations with needs to protect their integrity. Telepresence applications represent a dichotomy: by their nature they represent "free wheeling collaborations" among group members, but at the same time they must respect administrative needs for security. For example, an external user to a site should be able to collaborate with members of that site in an authorized work group, and interactions among that group should be free flowing within the limits of the site. However, it should be possible to prevent operations with members outside of the group and/or application.

Robustness. Any serious network or network application holds robustness as a significant goal. Telepresence applications complicate this problem in several dimensions. (1) Most applications, even distributed ones, are a collection of point-to-point interactions. While this will likely be true of telepresence applications as well, the notion of a transparent group session will make the appearance of robustness more of a challenge. (2) Site-end heterogeneity will complicate the construction of redundancy. For telepresence applications the type of equipment and functionality at a site may vary considerably and furthermore the number and type of equipment used in a particular group session may also vary quite considerably.

Incremental Extensibility. Most networks have as a goal the ability to add new sites without significantly (or at all) disrupting the activities of the existing sites. (Note: a new site is not treated the same as a site that has failed and is returning on-line; the latter case is usually handled differently.) For a new site to join the network, two activities occur: (1) the new site must become physically connected (whether lines are private or public) and (2) knowledge of the new site and its capabilities must be disseminated to the existing sites. The second point is the challenge, as this directly involves communicating (somehow) with all of the other sites and providing information to them. Again, telepresence applications complicate this situation because the site-end capabilities and equipment inventories may be very complex.

The challenge for the NSI is the definition of a set of services that enables or provides directly the components to satisfy these network goals.

Multipoint Communications Service

The MCS protocol specification was initially produced by a research group in Bell-Northern Research to allow PC- and Mac-based conferencing applications to communicate over a collection of point-to-point links. The original versions of the protocol provide a service abstraction based on a group communication paradigm consisting of: connected undirected point-to-point links; attached sessions, which are connected links formed into a directed tree-like hierarchy; and dynamically managed channels and tokens, where each session contains a set of dedicated tokens for resource management, as well as dedicated unicast and multicast channels for data communications. This hides the low-level point-to-point communication details and group membership and administration functions from the invoking application. Applications that currently use MCS include textual and graphical teleconferencing applications such as VIS-a-VIS produced by Worldlinx, but MCS is purported by its creators to support various other kinds of conferencing or "groupware" class applications, such as audio-graphic conferencing, still-image distribution applications, distributed processing algorithms, and even low-grade motion video applications. The only commercially available implementation of MCS is currently produced by Worldlinx, and runs on IBM PC (DOS and Windows) as well as Macintosh-based platforms.

MCS forms an excellent substrate for NSI because it provides a good abstraction for multipoint as well as point-to-point interactions. However, to be truly useful as a basis for NSI, several changes were needed: (1) porting to Unix so that an adequate process model was available and so that it could interconnect with our existing applications; and (2) adapting its point-to-point model of multicast communications to a mixed true multicast/point-to-point model.

These issues are addressed in the construction of the Unix port of the original DOS MCS implementation, and then "layering" the higher level network services in an independent NSI implementation which opaquely uses the primitives of the MCS Unix port. Thus we have produced a true MCS Unix port which conforms to the original intra-node protocol, so that in theory it would be possible to connect a DOS/Windows or Mac-based MCS agent through a TCP socket over an IP network to a Unix version of an MCS agent (although this has not been implemented to date). Beyond this, the MCS Unix port also provides the ability to use underlying multicast transport facilities, albeit not yet based on true IP-multicast in our case. This entailed a significant effort to extend the existing point-to-point intra-node MCS protocol functionality to function over a reliable multipoint transport. In actual fact, the intra-node protocol itself was preserved intact; extra mechanisms were added to allow the protocol to run correctly over links that contained possibly many down-link members instead of just one (as is the case with a point-to-point connection). This allows the arbitrary combination of point-to-point and multicast transport into the same MCS network, as proposed in [].
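To make the mixed arrangement concrete, the following Python sketch models an MCS-like tree in which each down-link is either a point-to-point link (one member) or a multicast link (several members), and counts one transmission per link. The data layout and the names `deliver` and `tree` are illustrative assumptions and are not taken from the MCS Unix port.

```python
def deliver(tree, node, payload, transmissions, received):
    """Forward payload from node down every link of a tree of links.

    tree maps a node name to a list of down-links. Each down-link is a tuple
    of member nodes: one member models a point-to-point connection, several
    members model a multicast transport shared by all of them.
    """
    for link in tree.get(node, []):
        # One send per link, however many members are attached to it.
        transmissions.append((node, link))
        for member in link:
            received[member] = payload
            deliver(tree, member, payload, transmissions, received)


# Hypothetical topology: "top" has one multicast link to a, b, and c and one
# point-to-point link to d; d has a point-to-point link to e.
tree = {"top": [("a", "b", "c"), ("d",)], "d": [("e",)]}
transmissions, received = [], {}
deliver(tree, "top", "hello", transmissions, received)
```

In this toy topology five nodes receive the data through three link-level sends; a tree built purely from point-to-point links would need five.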

While maintaining the "polling" nature of the existing MCS application interface, our MCS Unix port also provides the capability to asynchronously block on the MCS network sockets (e.g. using the BSD-style "select" system call) which are "exported" through an external array of socket file descriptors. The invoking application may then invoke MCS-specific processing on MCS network input when available, and then subsequently act on any resulting feedback from MCS via the conventional mechanisms, as well as handling other inputs as necessary in an asynchronous event-driven manner. Finally, all MCS primitives have been made available in an optional Tcl [] format as well as a standard C-library format, to permit faster prototype-style development of applications using the embedded interpreted language facilities provided by Tcl (the core functionality of the NSI is presently implemented in Tcl).
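The event-driven usage described above can be sketched in Python as follows. This is only an illustration under assumptions: `mcs_fds` stands in for the exported array of socket descriptors, `mcs_process_input` is a hypothetical name for the MCS-specific processing step, and a local socket pair plays the role of an MCS network connection.

```python
import select
import socket


def mcs_process_input(sock):
    """Hypothetical stand-in for MCS-specific processing of pending input."""
    return sock.recv(4096)


def poll_once(mcs_fds, other_fds, timeout=1.0):
    """Block until any descriptor is readable, then dispatch by kind."""
    watched = list(mcs_fds) + list(other_fds)
    readable, _, _ = select.select(watched, [], [], timeout)
    events = []
    for sock in readable:
        if sock in mcs_fds:
            # Input on an exported MCS socket: hand it to MCS processing.
            events.append(("mcs", mcs_process_input(sock)))
        else:
            # Any other input source is handled by the application itself.
            events.append(("other", sock.recv(4096)))
    return events


# A local socket pair imitates one MCS connection for demonstration.
peer, mcs_sock = socket.socketpair()
peer.sendall(b"channel-data")
events = poll_once([mcs_sock], [])
peer.close()
mcs_sock.close()
```

A real application would call such a function in a loop and add its own descriptors (for example a user-interface connection) to `other_fds`.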

Although the NSI uses MCS functionality to implement its own group communications primitives and thus can be considered in truth just an MCS application, it seeks to pass on a simplified yet still flexible and powerful model of group communication as one of its primary services. First, the NSI hides the details of setting up reliable transport-level point-to-point or multipoint connections from applications, so that an application need only indicate which "domain" (similar to an MCS session) it wishes to attach to and in what mode it wishes to relate to other applications joined to the same domain (i.e. in a master/slave arrangement, or truly as a peer). Second, it provides a more "asynchronous" model of network interaction by allowing the user to write their own "call-back" functions (similar to the standard X-window development style), thus eliminating the need for any sort of polling-style interface for the application. Finally, the NSI provides other network services which, at least from our experience writing Telepresence applications, seem to go hand-in-hand with the development of network-capable systems.
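The domain and call-back model can be pictured with a small in-process Python sketch. The class `Domain`, its methods `attach` and `send`, and the mode names are hypothetical and chosen only for illustration; they are not the NSI API, and no network transport is involved.

```python
class Domain:
    """Toy in-process stand-in for an NSI domain; not the real NSI API."""

    MODES = ("master", "slave", "peer")

    def __init__(self, name):
        self.name = name
        self.members = {}  # application name -> (mode, call-back)

    def attach(self, app, mode, on_message):
        """Join the domain in the given mode and register a call-back."""
        if mode not in self.MODES:
            raise ValueError(f"unknown mode: {mode}")
        self.members[app] = (mode, on_message)

    def send(self, sender, payload):
        """Deliver payload to every other member by invoking its call-back."""
        for app, (_mode, on_message) in self.members.items():
            if app != sender:
                on_message(sender, payload)


inbox = []
domain = Domain("portholes")
domain.attach("viewer-1", "peer", lambda src, msg: inbox.append(("viewer-1", src, msg)))
domain.attach("viewer-2", "peer", lambda src, msg: inbox.append(("viewer-2", src, msg)))
domain.send("viewer-1", "image-ready")
```

The point of the sketch is that neither application polls: each supplies a function once and is invoked when data arrives.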

NSI

The primary motivation for NSI is to provide a platform that encapsulates the capabilities which are common between services provided by the network. This does not necessarily imply that all network services are required to use all the capabilities provided by the interface; it simply identifies a common layer which provides lower level services needed to support many of the available network interface functions. For example, it would be useful if the interface substrate provided a basic robustness and security service, as well as reliable point-to-point and multipoint communications. It is also desirable for the interface to provide appropriate and useful API constructs for managing the complexities inherent in Telepresence applications.

Figure 1 illustrates the overall architecture with which the NSI provides necessary network services for Telepresence applications. It has the following characteristics:

Use of MCS. It makes use of the Multipoint Communication Service (described previously) as the core intra-NSI communication technology, while making use of well-known TCP-socket based client-server mechanisms for communicating between the NSI service entities and NSI client applications.

Network Entities. Control of network entities (e.g. physical devices such as cameras or VCRs, or abstract resources such as real-time audio-graphic conferences) is managed by three types of components: Physical Device Controllers (PDCs) that manage real physical devices at their actual sites; Logical Device Controllers (LDCs) that exist at all sites interested in accessing resources managed by PDCs; and Distributed Resource Controllers (DRCs) that manage abstract resources that are distributed at a number of sites.

LDCs. Every physical network entity accessed through the Network Services Interface deals directly with a Logical Device Controller (LDC) at every node, including the node at which the real "physical" device driver is located. An LDC is spawned at a node when requested if it is determined that the requestor has access rights to the resource. Once incarnated, all LDCs participate in a distributed manner in conjunction with a PDC to manage concurrent access to a network entity.

PDCs. A Physical Device Controller is co-located with each "physical" network resource interface and handles all interaction between the actual resource and the network interface substrate. It functions as a network resource server, interacting as necessary with LDC "clients" to manage access to the resource as appropriate. If the network entity and its user are located at the same node, the particular LDC and PDC will communicate directly instead of through MCS; however, this is just for efficiency, as a local LDC is treated in an identical manner to the remote LDCs in all other respects (unless explicitly instructed otherwise).

DRCs. Network entities that are truly distributed and have no actual physical realization to be controlled at any point in the network are handled in the NSI using Distributed Resource Controllers. These software entities allow applications to define "virtual" resources and manage them using distributed algorithms, and so do not force the application to conceptually coerce a distributed service into a centralized or hierarchical framework. Examples include distributed databases, informal video-conference calls for which there is no explicit "owner", and shared whiteboard applications.

Figure 1: Network Services Interface Architecture

Directory Services. The NSI is expected to provide a global name service via a simplified interface to X.500 DNS entities which will be instantiated and populated as necessary at NSI sites. Note that the underlying global name service for NSI need not be directly X.500 based; in fact, the current prototype of the NSI system employs a direct/centralized key-value dictionary style lookup facility implemented using MCS services. Directory Services is a special case of a DRC.

Telepresence applications (NSI clients) access network entities through an NSI server process; thus they do not deal directly with MCS and do not have to deal directly with device/resource controllers. The NSI server is defined through an API that hides the client-server messages. The server itself may be a monolithic single image combination of its subsystems (MCS, DRCs, PDCs, LDCs), or a combination of subsystems that exist as independent processes; in either case, the complexity of the underlying systems is hidden from the NSI clients.
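The relationship between LDCs and a PDC can be illustrated with a small Python sketch. The text does not specify how concurrent access is arbitrated, so the exclusive-access policy below is purely an assumption, as are all class, method, and device names (`PDC`, `LDC`, `spawn_ldc`, `camera-1`); direct method calls stand in for the MCS messages exchanged in the real system.

```python
class PDC:
    """Physical Device Controller: the single point of contact with a device."""

    def __init__(self, device, authorized_users):
        self.device = device
        self.authorized_users = set(authorized_users)
        self.holder = None  # LDC currently granted the device, if any
        self.log = []       # commands actually issued to the device

    def request(self, ldc):
        """Grant the device if it is free (assumed exclusive-access policy)."""
        if self.holder is None:
            self.holder = ldc
        return self.holder is ldc

    def release(self, ldc):
        if self.holder is ldc:
            self.holder = None

    def command(self, ldc, operation):
        """Issue an operation to the device on behalf of the holding LDC."""
        if self.holder is not ldc:
            return False
        self.log.append((ldc.node, operation))
        return True


class LDC:
    """Logical Device Controller: a per-node proxy for a PDC-managed device."""

    def __init__(self, node, pdc):
        self.node = node
        self.pdc = pdc

    def use(self, operation):
        """Acquire the device, issue one operation, and release it."""
        if not self.pdc.request(self):
            return False
        try:
            return self.pdc.command(self, operation)
        finally:
            self.pdc.release(self)


def spawn_ldc(node, user, pdc):
    """Create an LDC only if the requestor has access rights to the resource."""
    if user not in pdc.authorized_users:
        return None
    return LDC(node, pdc)


camera = PDC("camera-1", authorized_users={"alice"})
local = spawn_ldc("ottawa", "alice", camera)
denied = spawn_ldc("toronto", "mallory", camera)
ok = local.use("pan-left")
```

Here the local and remote cases use the same `LDC` class, mirroring the statement that a local LDC is treated like a remote one apart from the transport used.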

Portholes over NSI

The Xerox Portholes system [] supports background distributed group awareness by providing a palette of small "postage stamp" images of persons subscribed to as being part of the user's community of interest. The current Portholes system implemented at Telepresence (see Figure 2(a)) consists of three major components: the clients (one per user), the servers (usually one per "site", or local area of potential clients, but possibly serving only an intermediary function), and the frame grabbers (allocated to one server in a site). Servers communicate in a spanning tree in order to forward images to clients that are at different sites (since a user's community may be spread over a number of sites). As an experiment to evaluate NSI's capability to easily support a typical telepresence application, we decided to replace the server-network component of the existing Portholes design with an appropriately engineered NSI application interface; the NSI layer takes care of most of the server's functions implicitly. Thus the NSI-based Portholes design (see Figure 2(b)) has just two major components: the clients (again one per user), and the frame grabbers (one or more per "site"). The existing client process code and the existing frame grabber code are reused as much as possible, with the necessary network modifications so that they communicate directly over NSI service access points instead of using TCP sockets to talk to the original servers.

Features that are supported by the Telepresence Portholes include image registration and image access control (image viewing permissions). Each of these is handled by exchanging information among the servers in the spanning tree, where each server maintains a database of which images are available for subscription, which images are being requested for subscription, and which users are permitted (by the image owner) to subscribe to an image. Every change to the database information about a user at one site is propagated to all other sites through the servers; in effect, it is a distributed database with a full copy maintained at each site.

The design of the registration and access control for the NSI-based Portholes system is very simple and mirrors the principles by which it is done in the Telepresence Portholes: (1) allocate a multicast channel for the user, (2) register the image source (along with the appropriate frame grabber name) with directory services under the user's name, and (3) register with directory services any access control list(s). De-registration is a matter of: (1) removing the appropriate directory services entries, (2) sending a data message over a predefined controller channel that the image source is no longer available, (3) waiting for an indication that no one remains joined to the user's channel (i.e., it has become empty), and (4) freeing the channel.
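The registration and de-registration steps above can be sketched in Python against an in-memory stand-in for the NSI services. Everything named here (`ToyNSI`, `register_image`, `deregister_image`, the directory key layout, the message tuple) is a hypothetical illustration, and the wait for the "channel empty" indication is simulated rather than performed over a network.

```python
import itertools


class ToyNSI:
    """In-memory stand-in for the NSI services used by registration."""

    def __init__(self):
        self._channel_ids = itertools.count(100)
        self.directory = {}        # key-value directory services
        self.channel_members = {}  # channel id -> set of joined clients
        self.control_log = []      # messages sent on the controller channel

    def allocate_channel(self):
        channel = next(self._channel_ids)
        self.channel_members[channel] = set()
        return channel

    def free_channel(self, channel):
        if self.channel_members[channel]:
            raise RuntimeError("channel still has members")
        del self.channel_members[channel]


def register_image(nsi, user, grabber, acl):
    channel = nsi.allocate_channel()                    # step 1
    nsi.directory[("image", user)] = {                  # step 2
        "channel": channel,
        "grabber": grabber,
    }
    nsi.directory[("acl", user)] = set(acl)             # step 3
    return channel


def deregister_image(nsi, user):
    entry = nsi.directory.pop(("image", user))          # step 1
    nsi.directory.pop(("acl", user), None)
    nsi.control_log.append(("image-unavailable", user))  # step 2
    # Step 3: a real system would block until the channel reports empty;
    # here the members are assumed to leave as soon as they are notified.
    nsi.channel_members[entry["channel"]].clear()
    nsi.free_channel(entry["channel"])                  # step 4


nsi = ToyNSI()
channel = register_image(nsi, "alice", "grabber-A", ["bob"])
nsi.channel_members[channel].add("bob")  # a viewer joins the image channel
deregister_image(nsi, "alice")
```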

Current Portholes frame-grabber processes operate by contacting the appropriate, statically-defined server process, and waiting for image subscription messages (or update-image messages) from it. When image subscriptions are received, a frame-grabber process adds (or deletes) the appropriate subscription information into its schedule of frame-grabs and subsequent distribution for subscribed users. An update-image message indicates to the frame-grabber process that an image for the indicated user should be grabbed and distributed as soon as possible.

The frame-grabber process would be similar for the NSI-based Portholes system, except that there would be no server for a newly invoked frame-grabber process to contact; rather, the process would simply obtain a unique "user-id" channel from NSI upon invocation, and then register it under its unique name with NSI directory services. Also, the frame-grabber only ever receives one type of message on its own channel, a "new-client-subscription" message that contains the name of the user whose image is to be accessed, and the multicast channel on which the image is to be broadcast. If the frame-grabber process is already generating images on the indicated multicast channel then this is effectively an "update-image" message as previously defined, and is treated as such. Otherwise, the frame-grabber process knows that this is the first subscription request for the user's image, and thus treats it as an image subscription message, scheduling frame-grabs for the image source and distributing them (on the indicated multicast channel, in this case) as appropriate. The frame-grabber process must also join a predefined controller channel, so that it knows to "unsubscribe" to a user's image whenever the channel on which it is broadcasting images becomes empty.
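The frame-grabber's handling of the single "new-client-subscription" message can be sketched as follows. The class and method names are hypothetical, image capture is replaced by appending to a list, and the periodic schedule is reduced to an explicit `tick` call; only the decision logic follows the description above.

```python
class FrameGrabber:
    """Toy frame-grabber that reacts to new-client-subscription messages."""

    def __init__(self):
        self.schedule = {}  # multicast channel -> user whose image is sent
        self.sent = []      # (channel, user) for every image distributed

    def _grab_and_send(self, channel):
        # Stand-in for grabbing a frame and multicasting it on the channel.
        self.sent.append((channel, self.schedule[channel]))

    def on_new_client_subscription(self, user, channel):
        if channel in self.schedule:
            # Already generating images on this channel: behave as for an
            # update-image message and distribute a fresh frame right away.
            self._grab_and_send(channel)
            return "update-image"
        # First request for this user's image: start scheduling frame-grabs.
        self.schedule[channel] = user
        return "image-subscription"

    def tick(self):
        """One pass of the periodic schedule of frame-grabs."""
        for channel in list(self.schedule):
            self._grab_and_send(channel)

    def on_channel_empty(self, channel):
        """Controller-channel notice that nobody is joined any longer."""
        self.schedule.pop(channel, None)


grabber = FrameGrabber()
first = grabber.on_new_client_subscription("alice", 7)
second = grabber.on_new_client_subscription("alice", 7)
grabber.tick()
grabber.on_channel_empty(7)
```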

The NSI-based Portholes client operates as follows. For each user whose image is subscribed to by the newly invoked client, the client accesses the NSI directory services to determine the multicast channel on which the image is being broadcast, and the name of the frame grabber which is providing the source for that image. The client then checks to see if an access control list has been registered in NSI directory services, and if access is not allowed the image is replaced by the appropriate message. Otherwise, the multicast channel is joined, and a message is sent to the image source's frame grabber (whose unicast "user-id" channel is registered with NSI directory services) requesting an update of the image (consequences described above). Once all desired multicast channels have been joined, the client simply receives periodic image updates on these channels and displays them.
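The client start-up sequence can be summarized in a short Python sketch. The function name, the directory key layout, and the `join_channel` and `send_to_grabber` call-backs are hypothetical placeholders for NSI primitives; the sketch also assumes that a missing access control list means access is allowed, which the text does not state.

```python
def start_client(viewer, subscriptions, directory, join_channel, send_to_grabber):
    """Resolve, check, join, and request an update for each subscribed image."""
    status = {}
    for user in subscriptions:
        entry = directory[("image", user)]   # channel and frame grabber name
        acl = directory.get(("acl", user))   # None if no list is registered
        if acl is not None and viewer not in acl:
            status[user] = "access denied"   # shown in place of the image
            continue
        join_channel(entry["channel"])
        send_to_grabber(entry["grabber"], {
            "type": "new-client-subscription",
            "user": user,
            "channel": entry["channel"],
        })
        status[user] = "subscribed"
    return status


# Hypothetical directory contents and recording stand-ins for NSI calls.
directory = {
    ("image", "alice"): {"channel": 100, "grabber": "grabber-A"},
    ("acl", "alice"): {"bob"},
    ("image", "carol"): {"channel": 101, "grabber": "grabber-B"},
    ("acl", "carol"): {"dave"},
}
joined, requests = [], []
status = start_client(
    "bob",
    ["alice", "carol"],
    directory,
    join_channel=joined.append,
    send_to_grabber=lambda grabber, message: requests.append((grabber, message)),
)
```

After this set-up phase the client only has to display whatever arrives on the joined channels.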