Building a Media-Space across Wide-Area General Purpose Networks

Tracy Narine, Marilyn Mantei, Tom Milligan and Garry Beirne

Computer Systems Research Institute

University of Toronto

6 Kings College Road

Toronto, Ont. M5S 1A4

ABSTRACT

Desktop videoconferencing environments, often called media spaces, are usually installed in local environments that support television cabling and a LAN that handles the computer switching requests. Control of the network is centralized around an audio-video server, and the system is fast and robust. Transferring such a system to a general purpose network such as the Internet increases the response time to unbearable levels and creates race conditions that cause all manner of user problems. We describe the transfer of our local video server (the IIIF server) to the Internet and switched telephone lines and present the set of problems that this transfer caused in the user interface. We then describe interface fixes that surmount the problems and list the functionality in our local environment that could not be transferred. Our description of future plans suggests some possible solutions to these difficulties.

KEYWORDS

Media spaces, networks, wide area networks, Internet

INTRODUCTION

Quite often, a working and effective software system, when transferred to another computing environment or used in a more general fashion, shows its weaknesses in scalability and adaptability. We adapted the video switching software that ran the University of Toronto's desktop videoconferencing environment (Buxton & Moran, 1990) to run over the Internet and connect two cities in Canada that were approximately 100 miles apart. Connecting these cities was required for a field trial, referred to as Indigo, that the Telepresence Project was conducting. We did not have ISDN lines available for shipping our data and video and had to resort to a mixture of switched 56 Kbd service from the telephone company and the Internet. This combination of scaling up the software package and using the two public transmission services led to serious problems with the user interface. Because we were trying to support real-time communication, the delays, arrival time errors and unusual communication states that arose created a large set of problems and embarrassing situations for our users.

The problems occurred because of the awkward arrangement of networks we set up. We discuss these problems for two reasons. First, these transmission services are the lowest common denominator that users will have available for the next five years. Anyone setting up a desktop videoconferencing environment will undergo the same hardships we did. We therefore suggest a number of interface design strategies that will avoid these problems. Second, although our design strategies removed some of the problems, others proved to be intractable and are not likely to go away even with better network services. We discuss the functionality that we cannot provide in the scaled-up version and suggest future interface management strategies for coping with these problems.

Before we begin our discussion of the steps we took to transfer our video server to commercial transmission networks, we give the reader a brief overview of desktop videoconferencing environments and the issues involved in their transfer to a general purpose network such as Internet. We then describe the software and hardware environment we created in transferring to Internet. Once we have given a basic understanding of our system, we discuss the different user problems that arose with the system in place because of network delay. Then we discuss how we generated "apparent" fixes to the problems in the user interface utilizing design strategies. We pick out those problems that can't be solved and explain those aspects of media space functionality that currently cannot be used with general purpose networks. Finally, we discuss future work that we are planning that will address these problems.

MEDIA SPACES AND GENERAL PURPOSE NETWORKS

According to Gaver [5], media spaces are "computer-controllable networks of audio and video equipment used to support synchronous collaboration". To control the media space, computers and computer networks are required: the networks carry control information between machines within the media space. Computer networks come in many varieties. One type is the general purpose network, which is widely available in many organizations. In addition, many general purpose networks within organizations have pathways to networks in other organizations, and in most cases these pathways are shared by many organizations. One example of such a pathway is the Internet, which is used to transfer information from one network to another. By combining the use of general purpose networks over a wide area, we can support the network information transfer required in a media space.

PREVIOUS WORK

The work of the Telepresence Project and the Indigo field trial builds on the research of the CAVECAT Project [7]. Part of Telepresence's research involves studying organizations that use our technology from "arm's length". Other projects researching media spaces include Mediaspace [3], Cruiser [1], and RAVE [6].

WHY USE GENERAL PURPOSE NETWORKS?

For the Indigo field trial, the Internet was chosen as the general purpose network to support the computer communication required within the media space. There are both positive and negative aspects to this choice. The Internet is extremely reliable and has been in operation for many years. In addition, our existing media space software operated on Ethernet networks and it took no additional effort to make it work across the Internet. The use of the Internet is also free of charge. When compared to the cost of installing and using dedicated lines for information transfer between two cities that are in different telephone area codes, the use of the Internet seems appealing. The Internet is also available to a wide array of organizations, which makes it possible to build media spaces at many different locations. Finally, the Internet can be extended to locations that physical network wiring does not reach: "Slip" (Serial Line Internet Protocol) links can be built using telephone lines to add sites where network wires are unavailable.

The negative aspect of using the Internet relates to the performance of the system. The media space we installed requires communication between processes on different computers at the two sites. Network communication between sites can be thought of as information passing through a pipe. Communication is limited by the size of the pipe between sites and by how busy the pipe is. For the Indigo field trial, we chose to trade lower performance for a lower price.

[Figure 1]

FIELD TRIAL - TECHNICAL DESCRIPTION

Each site of the Indigo field trial had its own audio-visual (a/v) network, as illustrated in Figure 1. The two a/v networks were connected using devices called codecs, which code and decode a/v signals. Codecs transmit a/v signals across digital telephone lines (switched 56 Kbd), and a codec is required at each end. At each site, all offices were equipped with a camera, monitor, microphone and speaker.

The other components of the media space are the computers and the network. The computers used in this system formed a heterogeneous set. The media space servers require a UNIX computer at each site. Macintoshes served as workstations for the users. Workstations are used for regular work and for running the Telepresence Application. The Telepresence Application provides a user interface to the media space. It works by sending commands such as "connect two offices together" to the media space servers. The media space servers run at Site One. If hardware requests are made for Site Two, the media space servers send commands to the hardware controllers at that site, as illustrated in Figure 1. The Internet is thus used to connect the users' machines and the media space server machines together across both sites.

Media Space Servers

The iiif server is used to manage the media space. It contains the database that represents the current state and provides all the functionality available in the media space. Although iiif contains the state of the database and a rich query facility, it is difficult to extract information from it. A client wanting to find out the state of an object would have to continuously query, or poll, the iiif server to determine whether some condition has changed. This is inefficient from the client's point of view since a great deal of information must be processed regularly. A direct consequence is that other users trying to use the server would be competing with the polling operations. Another application, called the Telepresence Application Server (TAS), was built to solve these problems. TAS receives every transaction that has been passed to iiif and has completed successfully. From these transactions, it builds the same state database of the media space that iiif has. It then processes this information and sends a concise line of text for each office to each workstation in the media space. This line of text contains data such as the state of a person's door and whether they are logged in. By performing processing at the server level within TAS, clients do less work and become simpler. In addition, bandwidth is saved since less information is sent to workstations.
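
The idea behind TAS can be illustrated with a minimal Python sketch. This is not the Telepresence code: the transaction shape `(office, field, value)`, the `OfficeState` fields and the format of the status line are all assumptions made for illustration. The sketch only shows a mirror of the state being rebuilt from completed transactions and condensed into one line of text per office.

```python
from dataclasses import dataclass


@dataclass
class OfficeState:
    """Mirror of one office's state (fields are illustrative assumptions)."""
    door: str = "closed"
    logged_in: bool = False


class StateMirror:
    """Rebuilds media space state from transactions that completed successfully."""

    def __init__(self):
        self.offices = {}

    def apply(self, transaction):
        # Hypothetical transaction shape: (office, field, value).
        office, field, value = transaction
        state = self.offices.setdefault(office, OfficeState())
        setattr(state, field, value)

    def status_lines(self):
        # One concise line of text per office, as would be sent to each workstation.
        return [
            f"{name} door={s.door} logged_in={'yes' if s.logged_in else 'no'}"
            for name, s in sorted(self.offices.items())
        ]


mirror = StateMirror()
for tx in [("office-a", "door", "open"),
           ("office-a", "logged_in", True),
           ("office-b", "door", "closed")]:
    mirror.apply(tx)
print("\n".join(mirror.status_lines()))
```

Because the mirror is updated as transactions arrive, a workstation never needs to poll; it simply reads the latest line for each office.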

This design is quite different from Bellcore's Touring Machine design [1]. The Touring Machine is divided into many different collaborating processes. These processes or objects have specific functionality such as resource allocation, session management and control of hardware. One name server is used to maintain an up to date representation of the media space and to send out information to clients based on a trigger mechanism. This approach is different from ours since we have one process to control the media space which includes resource and session management and another to maintain the state of the media space.

Software Installation

In setting up the Indigo field trial, the Telepresence Project had to install software at the Indigo sites. This included the Telepresence Application and the media space servers. The server processes were easy to install because the servers ran on UNIX machines that were reachable through the Internet, so installation could be done remotely. The Telepresence Application was more difficult to install than the server software since the file systems of the workstations were not accessible through the Internet. As a direct consequence, an administrator had to go to the sites for installation. A positive side effect of this, however, is that we were able to provide more human contact between the conductors of the trial and the participants.

THE MEDIA SPACE AND NETWORK DELAY

Transferring the media space software from the local area network (LAN) used for the Telepresence Project to a wide area network (WAN) introduced communication delays into the software. At a peak time during the day, it could take five to ten seconds for a batch of transactions to be sent from one site to another. Network delay manifested itself in different forms: timing holes showed up in our software, users over-estimated the time that operations took, and the response time of media space requests changed.

Timing Holes

Timing holes are situations where multiple events occurring at just the right time can produce wrong behaviour in the software. They usually happened around notifications. Notifications are made on workstations in response to a request sent out by the iiif server. They inform a recipient that their office is about to be part of a connection with another office. An example of a timing hole would be two users trying to call the same user and the recipient getting two notifications for the calls. This problem was amplified by the use of a WAN.

[Figure 2]

Network delay is a function of how busy a network is. The WAN used for Indigo was always busy during the day. This caused delay times for notifications to increase when compared to a LAN environment, as illustrated in Figure 2. The increase in network delay also increased the chances of timing holes happening. Our solution to this problem was to use semaphores at the workstation level. If a person was about to perform an operation where a timing hole might occur, all other workstation requests were denied until the operation was completed.
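
A workstation-level guard of this kind might look like the following Python sketch. The class and method names are hypothetical and the Telepresence Application itself ran on Macintosh workstations, so this is only an illustration of the technique: a semaphore that is acquired without blocking, so that a competing request is refused rather than queued.

```python
import threading


class WorkstationGuard:
    """Denies incoming requests while a timing-sensitive operation is running."""

    def __init__(self):
        self._busy = threading.Semaphore(1)

    def try_begin(self):
        # Non-blocking acquire: returns False instead of queueing the request.
        return self._busy.acquire(blocking=False)

    def end(self):
        self._busy.release()


guard = WorkstationGuard()
first = guard.try_begin()    # the user's own call starts
second = guard.try_begin()   # a competing notification arrives mid-operation
guard.end()                  # the first operation completes
third = guard.try_begin()    # later requests are accepted again
guard.end()
print(first, second, third)  # True False True
```

Refusing the second request outright, instead of letting it wait, is what closes the hole: the recipient can never be shown two notifications for overlapping calls.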

Users' Perception of Time

Delays were often perceived to be very long by the participants of the Indigo field trial. We would receive estimates of operations taking five minutes when they could have taken at most two. The Telepresence administrators often could not make sense of the bug reports they were given, since the timing of operations as described by the users was quite different from what the administrators would expect. To clarify this problem, special logging was added to the client that recorded every button click in the Telepresence Application on the users' workstations. This allowed us to verify the actual time taken by operations.
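
The logging idea can be sketched in a few lines of Python. The button names and the `ClickLog` class are hypothetical, and the injected clock exists only so the example is reproducible; a real client would use the system clock.

```python
import time


class ClickLog:
    """Records a timestamp for each button click so durations can be checked."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self.events = []

    def record(self, button):
        self.events.append((self._clock(), button))

    def elapsed(self, start_button, end_button):
        # Seconds between the first start_button click and the next end_button click.
        start = next(t for t, b in self.events if b == start_button)
        end = next(t for t, b in self.events if b == end_button and t >= start)
        return end - start


# Fake clock readings, in seconds, standing in for real click times.
ticks = iter([0.0, 95.0])
log = ClickLog(clock=lambda: next(ticks))
log.record("connect")
log.record("disconnect")
print(log.elapsed("connect", "disconnect"))  # 95.0
```

A log like this lets an administrator compare a user's report of "five minutes" against the interval actually recorded between the two clicks.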

Response Time

Use of the WAN for the media space caused the response time of our software to degrade. It was particularly bad at Site Two since any operation involved sending transactions to Site One, which would in turn send control information back to Site Two. The delay was compounded if notifications were involved. This changed the behaviour of our software and required us to compensate for delay. These issues are tackled in the next section.

INTERFACE DESIGN STRATEGIES

In designing the interface for the Telepresence Application, we had to consider both the problem of network delay and our use of separate channels for a/v and network information.

Network Delay

Network delay manifests itself in the user interface in different forms. Network delay causes the making and breaking of connections between offices to take a long time. In making connections, it is necessary to provide users with feedback. We chose to use the standard Macintosh rotating, circular cursor to inform users that the operation is not yet completed. Because we found that users would start a connection operation and then switch to other work during the wait, it was also necessary to signal the completion of the operation with a tone generated by the interface. One unexpected side effect of using the rotating cursor was that users believed their machines were tied up during the connection phase. They could have switched to other applications, as they normally do with the Macintosh and Multifinder, but this was not apparent from the feedback given by the cursor. This problem was solved with user retraining.

Awkward situations were created when the breaking of connections between offices took a long time. Users would attempt to break a connection and the response would not be immediate. The participants would start speaking again only to be rudely cut off halfway since the disconnect was already in progress. To solve this problem, we needed to make disconnects happen quickly. This was accomplished by changing the way the iiif server played sounds. The server was changed so that it could continue processing its batch of transactions while the notification or sounds were being sent to the workstations.
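
One way to decouple sound delivery from batch processing is sketched below in Python. It does not describe the internals of the iiif server; the function names, the transaction tuples and the use of a background thread are assumptions chosen to illustrate the general technique of handing slow sends to a separate worker so that the rest of the batch, such as a disconnect, is not held up.

```python
import queue
import threading
import time


def run_batch(transactions, apply, send_sound):
    """Apply every transaction in order, handing sounds to a background sender."""
    outbox = queue.Queue()

    def sender():
        while True:
            office = outbox.get()
            if office is None:        # sentinel: the batch has no more sounds
                return
            send_sound(office)        # slow over a WAN; runs off the main path

    worker = threading.Thread(target=sender)
    worker.start()
    for kind, office in transactions:
        if kind == "sound":
            outbox.put(office)        # queue it and move on immediately
        else:
            apply(kind, office)
    outbox.put(None)
    worker.join()                     # wait only after the batch is processed


applied, delivered = [], []


def slow_send(office):
    time.sleep(0.05)                  # stand-in for WAN delay
    delivered.append(office)


run_batch([("sound", "office-b"), ("disconnect", "office-a")],
          lambda kind, office: applied.append((kind, office)),
          slow_send)
print(applied, delivered)
```

In this sketch the disconnect is applied as soon as the sound has been queued, without waiting for the sound to reach the workstation.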

Separate Channels of Information

Separate channels exist for the audio-visual and network information. This requires a two-step process to make connections between offices. Workstation negotiation must happen across the network for notifications, and then the audio-video connections are made. If the connections are within one site, both steps happen very quickly. In a cross-site connection, these steps take longer. In addition, a third stage of dialing the codec is required. For cross-site connections, the codec at each site is connected to the office involved in the connection. When the codecs at the sites have been dialed together over the telephone lines, the a/v information is transmitted into the offices. It was necessary to connect the codecs to the offices before they are dialed so that the user initiating the call could be provided with feedback. The codec displays information as it attempts to dial, and the person initiating the connection sees the messages emitted by the codec. This was problematic from an interface perspective since the messages generated by a codec are often confusing and are not what a user should see. Regardless, the interface was built this way since it allowed the user to tell whether a problem occurred in the final stage of the calling process.
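
The staged process described above can be summarized in a short Python sketch. The function signature and the callables `notify`, `switch_av` and `dial_codec` are hypothetical stand-ins for the real workstation, switch and codec controls; the sketch shows only the ordering of the stages and how a failure in the last stage would surface.

```python
def connect(caller, callee, site_of, notify, switch_av, dial_codec):
    """Run the connection stages in order; report the stage that failed, if any."""
    stages = [("notify", lambda: notify(callee))]
    if site_of[caller] != site_of[callee]:
        # Cross-site: attach a codec to each office first so that the caller
        # sees the codec's dialing messages, then dial the codecs together.
        stages.append(("attach codecs",
                       lambda: switch_av(caller, "codec") and switch_av(callee, "codec")))
        stages.append(("dial codec", dial_codec))
    else:
        stages.append(("switch a/v", lambda: switch_av(caller, callee)))
    for name, step in stages:
        if not step():
            return f"failed at: {name}"
    return "connected"


sites = {"a": "one", "b": "two", "c": "one"}
always_ok = lambda *args: True

local = connect("a", "c", sites, always_ok, always_ok, always_ok)
remote_busy = connect("a", "b", sites, always_ok, always_ok, lambda: False)
print(local)        # connected
print(remote_busy)  # failed at: dial codec
```

The second call models a busy remote codec: the notification and codec attachment succeed, and only the final dialing stage fails.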

HARD INTERFACE PROBLEMS

There are some problems with our media space that would still exist even if better network services were available. These problems exist because there are separate channels of information, and our media space servers do not handle canceling batches of transactions. In addition, the media space simply cannot support certain functionality in its current configuration. Some of these problems have strategies for coping while others do not.

Separate Information Channels

Having separate information channels requires a two step process in building connections. This is problematic since the second step is more prone to failure in a cross site connection. The remote codec could be busy for example. Faster networks would not solve this problem since the two step process would still be required by our software.

iiif server does not allow cancellation of transaction batches

The iiif server does not allow a batch to be canceled once it has been sent by a client. The interface problem that this creates is that a "bail-out" option cannot be supported. Often, when connections take a long time, users would like to stop the call. Stopping the batch is not possible in the current implementation of the iiif server. This problem will escalate when codecs are widely available and dialing wrong codec numbers becomes more frequent.

Functionality that does not map to a wide area network

"Affordances" as described by Donald Norman [8] are "those fundamental properties that determine just how the thing could possibly be used." The media space designed for the Indigo field trial did not afford certain functionality across the WAN. In particular, short connections between offices called "glances" and certain situations of conferencing with three to four people were not afforded. It was necessary to take these problems into account when designing the Telepresence Application.

1. Glancing

Glancing in a media space allows someone to quickly look into another person's office. In the software used in the Indigo field trial, notifications happen before the glance occurs. Locally, glances usually take two seconds. Across a WAN, the time to inform a workstation that it must make a notification will be five to ten seconds. When combined with a 30 second waiting period for the codecs to dial, glances do not make much sense across remote sites. A more passive method of looking into a remote user's office is required. One such method is Portholes as described by Dourish [4]. Portholes sends images of offices across a network, which are then displayed on users' machines. This would provide a good way of viewing a remote user's office in the absence of a glancing facility.
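
The figures quoted above can be put side by side with a small Python calculation. It assumes, as a simplification, that the WAN notification delay and the codec dialing time simply add, and it ignores the duration of the glance itself at the remote site.

```python
LOCAL_GLANCE_S = 2          # typical local glance, in seconds
WAN_NOTIFY_S = (5, 10)      # range of time to notify a remote workstation
CODEC_DIAL_S = 30           # waiting period for the codecs to dial

# Setup time before a remote glance can even begin (low and high estimates).
remote_setup = tuple(n + CODEC_DIAL_S for n in WAN_NOTIFY_S)
# How many local glances would fit into that setup time.
slowdown = tuple(r / LOCAL_GLANCE_S for r in remote_setup)

print(remote_setup)  # (35, 40)
print(slowdown)      # (17.5, 20.0)
```

Under these assumptions, setting up a remote glance takes roughly 35 to 40 seconds, some 17 to 20 times longer than an entire local glance.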

2. Simultaneous conferencing at both sites

Simultaneous conferencing was a media space functionality that did not map to WANs. Conferencing as implemented in the Indigo field trial allows three to four users to appear on a video monitor by the use of a picture-in-picture (PIP) device. The video image is divided into four quadrants that contain the participants of the conference. Conferencing was not feasible simultaneously at both sites since there was only one video line through the codecs in each direction.

[Figure 3]

As shown in Figure 3, local participants appear at the normal conference size, while remote participants are mapped to 1/4 of the local users' size. At that size, the remote participants' gestures would be very difficult to see. To solve this problem, only Site One of Indigo was equipped with conferencing equipment. Only one person could be included remotely, which allowed all participants in a conference to appear at the same size on video.

FUTURE WORK

Moving the Telepresence media space software from a lab environment into a real world environment has taught us a great deal about our software's limitations. Ongoing work has been started to remove these limitations. Work is being conducted to make clients simpler and to remove limitations of the iiif server.

Clients

Clients as currently designed contain a significant amount of logic about the operations of the media space. An example is the rule for when a person can glance at another, given the state of the door of the receiver of the glance. Clients are also the most difficult to maintain since the file systems of the computers used in the field trial are not available through the Internet. There is currently a round of development that will move the required logic from the client to the iiif server. This is being implemented with the macro facility called Tool Command Language (TCL) [9]. TCL allows the extension of existing command languages. TCL will be incorporated into the iiif server to extend its grammar. Actions such as glances will be defined by TCL macros within the iiif server. The client will only need to send the command "glance" and the iiif server will execute the appropriate TCL macro. Changing the functionality of glance will mean modifying a TCL script on a server instead of modifying every Telepresence Application on users' workstations. Making clients simpler will aid in the maintenance and installation of software.
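
The division of labour this implies can be illustrated with a Python sketch that stands in for the TCL macros. It is not TCL and not the iiif grammar; `MacroServer`, the door table and the glance rule are hypothetical. The point is only that the client sends a command name while the rule itself lives on the server, where it can be replaced without touching any workstation.

```python
class MacroServer:
    """Holds named macros so that policy lives on the server, not in clients."""

    def __init__(self):
        self.doors = {}
        self.macros = {}

    def define(self, name, fn):
        self.macros[name] = fn

    def execute(self, command, *args):
        # All a client needs to send is the command name and its arguments.
        return self.macros[command](self, *args)


def glance(server, caller, target):
    # Hypothetical rule: a glance is allowed only when the target's door is open.
    if server.doors.get(target) != "open":
        return f"glance refused: {target}'s door is not open"
    return f"{caller} glances at {target}"


server = MacroServer()
server.doors = {"office-b": "open", "office-c": "closed"}
server.define("glance", glance)

allowed = server.execute("glance", "office-a", "office-b")
refused = server.execute("glance", "office-a", "office-c")
print(allowed)  # office-a glances at office-b
print(refused)  # glance refused: office-c's door is not open
```

Changing the glance policy here means calling `define` again on the server with a new function; the client's request, `execute("glance", ...)`, stays the same.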

iiif to iiif Communication

For the Indigo field trial, one iiif server running at Site One controlled both sites. The fact that only one iiif server was running was a limitation. It would be more efficient if each site had its own iiif server. This would require the two iiif servers to communicate with each other. The Telepresence project currently has this functionality under development. Supporting iiif to iiif communication will allow sites to operate more efficiently. The effects of network delay will be lessened since the amount of network communication required between sites will be reduced.

CONCLUSION

Most desktop videoconferencing environments that are built in research settings use a LAN and switched video or broadband connections. Environments that go over longer distances rely on ISDN connections. We have taken our switched video system that runs over Ethernet and coaxial cable and done something entirely different to upgrade it to long distance transmission. In keeping with our desire to separate the video and audio from the network information, and faced with commercial networks that have not been upgraded to ISDN, we used standard switched lines and the Internet to run the media space. We did this so that it would be cost effective, but ran into user interface problems because of it. We redesigned the interface to address some of the problems. However, we were not able to solve all of them. Nevertheless, we have found out what we can do effectively in this limited network environment.

ACKNOWLEDGMENTS

Barbara Whitmer, Russell Owen, Christine Chang, Michael Palark, Andrea Leganchuk, Gary Hardock, Chris Passier

REFERENCES

1. Arango, M., Bahler, L., Bates, P., Cochinwala, M., Cohrs, D., Fish, R., Gopal, G., Griffeth, N., Herman, G.E., Hickey, T., Lee, K.C., Leland, C., Lowery, V., Mak, V., Patterson, L., Ruston, M., Segal, R.C., Sekar, M.P., Vecchi, A., Weinrib, A., and Wuu, S-Y. (1993) Touring Machine: A software platform for distributed multimedia applications. Communications of the ACM, 36, 1, 68-77.

2. Buxton, B. and Moran, T. (1990) EuroPARC's integrated Interactive Intermedia Facility (IIIF): Early experiences. Proceedings of the IFIP WG8.4 Conference on Multi-user interfaces and Applications. Heraklion, Crete, September 1990. 24pp.

3. Bly, S., Harrison, S.R., and Irwin, S. (1993) Media Spaces: Bringing People Together in Video, Audio and Computing Environment. Communications of the ACM, 36, 1, 28-47.

4. Dourish, P., and Bly, S. (1992) Portholes: Supporting awareness in a distributed work group. Proceedings of CHI'92 (Monterey, California, 3 - 7 May, 1992). ACM, New York.

5. Gaver, W. (1992) The Affordances of Media Spaces for Collaboration. Proceedings of CSCW'92 (Toronto, Canada, 31 October - 4 November, 1992). ACM, New York.

6. Gaver, W., Moran, T., Maclean, A., Lennart, L., Dourish, P., Carter, K., and Buxton, W. Realizing a video environment: Europarc's RAVE system. In Proceedings of the CHI'92 Conference of Human Factors in Computing Systems. (Monterey, CA., May 3-7 1992).

7. Mantei, M., Baecker, R.M., Sellen, A., Buxton, W., Milligan, T., and Wellman, B. (1991) Experiences in the Use of a Media Space. In Proceedings of the CHI'91 Conference of Human Factors in Computing Systems. (New Orleans, LA. 1991).

8. Norman, D. (1990) The Design of Everyday Things. Doubleday, New York.

9. Ousterhout, J.K. (1990) Tcl: An Embeddable Command Language. USENIX Conference Proceedings.