Natural Language as an Interface Style

Byron Long, byron@dgp.utoronto.ca
Dynamic Graphics Project
Department of Computer Science
University of Toronto
May, 1994

Introduction

Natural language is one of many 'interface styles' (or 'interaction modalities') that can be used in the dialog between a human user and a computer. There is significant appeal in being able to address a machine and direct its operations using the same language we use in everyday human-to-human interaction.

Conventional wisdom in the field of human-computer interaction, however, is that natural language is nowhere near as attractive an interface alternative as it initially appears. The literature tends to focus on failures of such interfaces to achieve what was expected of them. As an example, an introductory human-computer interaction text dismisses natural language interfaces on the grounds that language is ambiguous. Successful implementations are thus characterized as being sufficiently restricted in syntax or lexicon as to call into doubt their status as natural language (Dix et al., 1993).

Usually, the degree of ambiguity in natural language is considered too extreme for it ever to be used effectively as an interface style (Hill, 1983). Further, when systems constrain their structures and lexicon to limit ambiguity, it is assumed that the user will be required to learn which structures are acceptable, making natural language no more useful or learnable than a formal command language.

Despite the lack of usefulness predicted by such accounts, some restricted-language systems maintain most of the advantages associated with natural language interfaces. Most problems turn out to be due not to ambiguity but to users' excessive expectations of the system's capability, or to reliance on world knowledge that is not reflected in the computer's knowledge base. As with any other interface style, the mapping between the user's cognitive model of the system and the capabilities of the natural language interface is imperfect. To remedy this, certain guidelines should be considered for the use of the style. Many reports suggest such guidelines, but only for their particular area of experience; a broader listing would help when considering natural language as an interface style for a design.

With this in mind, some further arguments against natural language interfaces are examined below, along with successful implementations that manage to avoid the predicted problems. From these successes, a set of guidelines for the inclusion of natural language in an interface is collected.

Criticisms

An unrestricted natural language interface is generally considered an enticing prospect because, if it could be implemented, it would offer many advantages: it would be easy to learn and easy to remember, because its structure and vocabulary are already familiar to the user; because the same language could be used for many applications, there might be fewer transfer problems between applications; it would be particularly powerful because of the multitude of ways in which an action can be accomplished; and it would allow considerable flexibility in executing the steps of a task (from Mayhew, 1992).

Unfortunately, natural language is often ambiguous and depends on a great deal of world knowledge. To implement a working natural language system, one must usually restrict it to a limited subset of the vocabulary and syntax of the full language. This reduces ambiguity and keeps processing time within reasonable bounds. For such a system still to count as a natural language interface, most of the positive traits of a general natural language interface must be maintained. To retain ease of use and ease of remembering, the limitations of the system must somehow be conveyed to the user without requiring them to learn the rules explicitly.

Additionally, natural language interfaces have in the past led users to anthropomorphize the computer, or at least to attribute more intelligence to it than is warranted. This leads to unrealistic expectations of the capabilities of the system on the part of the user. Such expectations make the restrictions of the system difficult to learn if users attribute too much capability to it, and they lead to disappointment when the system fails to perform as expected (Dix et al., 1993; Mayhew, 1992; and particularly Shneiderman, 1992).

Natural language interfaces, if they are the only form of interaction, do not take advantage of the capabilities of the computer: the strategies that work in human-human communication are probably not best suited to human-computer interaction, where the computer can display information many times faster than people can enter commands (Shneiderman, 1992).

Natural speech understanding is thought to suffer from the same problems as written natural language interfaces, with the added problem that speech recognition itself has not been very successful.

Some natural language interfaces are so restricted that little distinguishes them from command-line languages. If the system restricts the possible structures and vocabulary available for interaction to such a degree that a novice user is unlikely even to be able to begin using the system, then the natural language system has failed in its mandate. For instance, an interface that serves only to provide a few more ways of entering a command, without allowing for common linguistic transformations of the command, should probably not be called a natural language interface (see Manaris et al., 1994 as an example). Though it may be possible for users to guess at the appropriate command syntax, they will be dissatisfied when they are unable to execute compound commands or to use anything but the imperative mood.

Counter-Arguments

Modeling and Shaping

A natural language interface need not be able to parse every utterance that a human conversational partner could; rather, its defining characteristic is that the user need not explicitly learn the lexicon and syntax of the system, and can express what they want in the language they are used to. The question then is how to convey the subset of language understood by the system without imposing significant learning demands on the user or burdening them with a large number of explicit error messages.

A possible solution is to take advantage of the fact that people apparently design utterances with their addressees in mind (Brennan and Ohaeri, 1994). One way to mold the interaction between the user and the computer is to tailor the system's feedback so that it models the style of utterance the system understands best. For example, an early hidden-operator (or 'Wizard-of-Oz') experiment compared users creating graphs with a natural language interface where the abstract command-line syntax corresponding to their utterances either was shown to the user or, in the control group, was not (Slator et al., 1986). Users presented with the command-line feedback produced significantly fewer semantically ambiguous utterances than those without feedback; they quickly switched over to the syntax used in the command-line interface. This adaptation was attributed to the impatience of users, who tend to learn quickly anything that eases and speeds their work, and to the fact that acquiring pidgin dialects is part of users' linguistic competence. Also, by glimpsing the abstract model of the system through the mnemonic feedback, the user's own model of the system, normally built only from experience, might be formed more accurately and quickly.

The method used in this situation takes the approach that it is better to guide users toward the underlying mnemonics of the command-line system as a way of reducing ambiguity; it makes the language the system can parse explicit. A less directed approach is to provide feedback not in system mnemonics but in natural language whose phrase structure and vocabulary, if matched by the user, can consistently be parsed by the system (Zoltan-Ford, 1991). As users interact with the computer, they should begin to narrow their range of syntax and vocabulary to that used by the computer in addressing them: the user will be 'shaped' by the output. In the extreme, users would model system output exactly, but this level of correspondence between the computer's and the user's language is unlikely to be achievable in most applications. To determine whether users would indeed be shaped by the feedback provided, a hidden-operator experiment was conducted, in which an operator performed the appropriate actions in response to whatever a subject typed or said. The experiment examined four separate factors: the type of interaction, either voice or keyboard; the type of vocabulary, either familiar (common in everyday use) or unfamiliar; the length of utterances, either conversational or terse; and the amount of restriction placed on the subjects' language, either restricted (where the system responded only if users matched the vocabulary and phrase length used by the computer) or unrestricted (where any wording was accepted).

Users exposed to conversational (longer) output produced more words than those presented with terse output, and no other factor significantly influenced this variation; subjects thus used the computer's output length as a model for the length of their inputs. When presented with terse utterances, users were also more likely to successfully model the vocabulary of the system in addition to its output length. Further, subjects required to interact using restricted language generated considerably more output-conforming messages than those who used unrestricted language. The only length-and-restriction condition that did not lead to an increase in output-conforming messages as subjects progressed through their tasks was the conversational-unrestricted condition, where no shaping cues were provided to the user. It should be noted that shaping through restriction came at the cost of more messages being generated by the user, because some of their early utterances were rejected by the system.

These results led to several conclusions. First, people model the length of outputs: the length of their inputs depends on whether the feedback is terse or conversational. When considering both length and vocabulary, however, terse output was easier to model. In any case, the modeling was not perfect; users' inputs rarely mirrored the computer's outputs precisely. Rather, shaping served to reduce the variability of the input.

These findings led to the recommendation that designers provide consistently worded output, that they design the program to communicate with tersely phrased output, and that they include non-threatening error messages that use the vocabulary and phrases the system can understand.
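
As a concrete illustration of these recommendations, the sketch below composes all feedback from the same restricted lexicon the parser accepts, so that every confirmation or error message doubles as a model of acceptable input. It is a minimal hypothetical sketch in Python; the lexicon, commands and wording are invented here, not drawn from the studies cited above.

    # Shaping through feedback: everything shown to the user is worded
    # in the same terse sublanguage the parser accepts. All names here
    # are invented for illustration.
    LEXICON = {"delete", "copy", "rename", "file", "report", "budget", "old"}

    def parse(utterance):
        """Return (verb, objects) if every word is in the lexicon, else None."""
        words = utterance.lower().strip(" .?!").split()
        if not words or words[0] not in LEXICON:
            return None
        if any(w not in LEXICON for w in words):
            return None
        return words[0], words[1:]

    def feedback(utterance):
        parsed = parse(utterance)
        if parsed:
            verb, objects = parsed
            # Terse, consistently worded confirmation in the system's own terms.
            return " ".join([verb] + objects) + ": done"
        # A non-threatening error message that models acceptable phrasing
        # instead of naming the violated rule.
        return "Not understood. Example: delete old report"

    print(feedback("Would you kindly remove the old report?"))  # shaped reply
    print(feedback("delete old report"))                        # accepted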

With respect to shaping, the mode of communication (voice or keyboard) made no difference; however, interesting differences in the way users accomplished tasks were evident. For instance, speech users tended to perform file manipulations in a step-by-step fashion, allowing the computer to query them for each step, while keyboard users were more likely to provide all the necessary information for a file manipulation in one request. A similar difference was found between restricted-language users (who would let the computer prompt them) and unrestricted-language users (who would provide most of the relevant information in one sentence). Voice users also tended to recall files for verification purposes regardless of their certainty of the computer's understanding. Keyboard users, on the other hand, recalled files for verification only during the initial learning phase of the interaction.

Users' attitudes towards the natural-language systems were on the whole very positive; however, ratings of user satisfaction increased with use only for the unrestricted systems, and remained level for restricted-language users. This seems to indicate that users' satisfaction with such a system depends on the processor's ability to handle variability.

Follow-up research has shown that output length may not actually be the largest determinant of the degree of shaping a system can elicit (Brennan and Ohaeri, 1994). In an experiment, three message styles were used: telegraphic, which were incomplete, terse sentences; fluent, which were complete, grammatical sentences; and anthropomorphic, which were complete sentences containing first-person pronouns. The experiment was again conducted as a hidden-operator experiment, in which input was typed and unrestricted, save that typos and misspellings were not accepted.

The anthropomorphic messages led subjects to refer to the computer using the second-person pronoun "you" twice as frequently as either of the other two message types. Apparently the anthropomorphic messages led the subjects to treat the computer as more of a social partner than did the other types. With respect to word count, the fluent condition was closer to the telegraphic condition than to the anthropomorphic condition, which generated significantly more words. The anthropomorphic condition also led to longer task completion times and to more indirect requests and conventional politeness terms than did the fluent or telegraphic messages. It seems, then, that the largest determinant of shaping a user's input length is not the degree of fluency of the feedback, but whether that feedback includes anthropomorphic references. The key to an effective interface is thus to avoid the use of personal pronouns and other anthropomorphic references.

Anthropomorphism

A further question this experiment sought to address was whether anthropomorphic messages would indeed lead to a greater attribution of intelligence to the computer. The question of attribution is important because a perception that the machine is intelligent may cause users to form unrealistic expectations of the system's capabilities, and when the system fails to meet these expectations they may become dissatisfied with it.

Attributions of intelligence were similar for all the message types; in addition, in no group did people appear to believe that the computer had general knowledge outside of the task domain. This result is interesting because wariness of anthropomorphization has been central to the criticism of natural language as an interface style; if the ability to use natural language does not cause users to expect too much of the system, then natural language can function as well as any other interface style.

Satisfaction

Napier et al. (1989) compared the performance of subjects using a restricted natural language interface with that of others using a traditional menu-based interface to a business application. The natural language interface showed advantages over the more traditional interface in both task performance, with significantly more users successfully solving the given problems, and user satisfaction. The performance advantage grew as users became more experienced, indicating a significantly higher learning rate than with the menu-based interface. In trying to determine what made the natural language interface so much better, two possibilities were proposed: either the commands, being more like English, were more familiar and better remembered; or the particular interface's use of context in interpreting commands allowed users to give instructions at a relatively high level, avoiding the need for a series of detailed commands. Though these results are impressive, the authors advise caution in generalizing them to more complex problems, more experienced users, or application domains different from the one examined in the experiment (spreadsheets). In any case, this experiment demonstrates an instance of a fully implemented natural language interface that offers real benefit to its users.

Errors

Another problem associated with natural language interfaces, particularly those based on speech, is disfluency on the part of the user. Errors and delays can occur at many points in an utterance, and at any stage of natural language processing. The errors can be difficult to detect, and harder still to correct dependably.

Véronis (1991) proposes a typology of the errors that can occur in man-machine interaction. Errors are distinguished as being produced either by the system or by the human operator, and as either performance or competence errors. Competence errors arise from a lack of knowledge of the linguistic rules, while performance errors are accidents made despite correct knowledge of the rules. All classes of errors can occur at the lexical, syntactic or semantic level of processing. It is suggested that explicit correction of a user error need only be done for competence errors, as these may effectively stop the human-computer dialog. Performance errors, on the other hand, if they cannot be corrected automatically, can be corrected by the user when their input fails to parse. When an error is detected, the difficulty arises that the same error may result from different causes, so it may not be possible to categorize it properly; in this case the system may propose a correction to the user, who has the opportunity to reject it if it is unsuitable.
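
To make the typology concrete, the following sketch encodes its three dimensions and the suggested handling policy as Python data types. The class and function names are my own illustration; Véronis (1991) specifies the taxonomy and policy, not this code.

    # A sketch of the three-dimensional error typology: source (human or
    # system) x kind (competence or performance) x level (lexical,
    # syntactic or semantic), with the suggested handling policy.
    from dataclasses import dataclass
    from enum import Enum

    class Source(Enum):
        HUMAN = "human"
        SYSTEM = "system"

    class Kind(Enum):
        COMPETENCE = "competence"    # wrong beliefs about the rules
        PERFORMANCE = "performance"  # slips despite knowing the rules

    class Level(Enum):
        LEXICAL = "lexical"
        SYNTACTIC = "syntactic"
        SEMANTIC = "semantic"

    @dataclass
    class DialogueError:
        source: Source
        kind: Kind
        level: Level

    def handling_policy(err):
        # Competence errors can stall the dialog, so they warrant explicit
        # correction; performance slips can be corrected automatically or
        # proposed back to the user for confirmation.
        if err.kind is Kind.COMPETENCE:
            return "explain the violated rule explicitly"
        return "propose a correction; let the user reject it"

    print(handling_policy(DialogueError(Source.HUMAN, Kind.PERFORMANCE, Level.LEXICAL)))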

Further, Véronis addresses the question of just what counts as a natural language interface. Acknowledging the limits on the linguistic coverage of systems that are currently implementable, one can build either systems with as large a coverage as possible or limited subsystems with an easily understood level of competence. The former leads to a high failure rate, as utterances that cannot be handled by today's parsing technology are rejected for reasons too arcane for the user to understand and correct for. By limiting the coverage to an understandable subset of the language, a user can quickly learn the limits of the system's competence because of the simplicity and consistency of those limits: "If the system cannot adapt to the user, the user should be able to adapt to the system" (Véronis, 1991). The learning of the system's limits must not, of course, be an explicit process; rather, the user must be able to derive them through experience. A good way to aid this implicit learning is to base the system on an artificial sub-language, with a lexicon and phrasal structure limited in a way that suits the expectations a user would have of a system operating in the application's particular domain. The sub-language must be predictable, which is to say it must be understood under all common linguistic transformations (passivization, for instance), or it will seem inconsistent to the user.
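
The predictability requirement can be pictured with a toy sketch: a sub-language in which one command is understood under its common transformations, with every variant mapped to a single canonical form. The patterns below are hypothetical and far cruder than a real grammar; they serve only to show the principle.

    # A predictable sub-language: the same command is accepted in its
    # imperative, declarative and passive forms, all normalized to one
    # canonical representation. Patterns are invented for illustration.
    import re

    PATTERNS = [
        re.compile(r"^delete (?P<obj>.+)$"),            # imperative
        re.compile(r"^i want to delete (?P<obj>.+)$"),  # declarative
        re.compile(r"^(?P<obj>.+) should be deleted$"), # passive
    ]

    def canonicalize(utterance):
        text = utterance.lower().strip(" .?!")
        for pattern in PATTERNS:
            match = pattern.match(text)
            if match:
                return ("DELETE", match.group("obj"))
        return None  # outside the sub-language

    for form in ["Delete the old report.",
                 "I want to delete the old report",
                 "The old report should be deleted"]:
        print(canonicalize(form))  # each yields ('DELETE', 'the old report')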

It may be possible to limit the occurrence of errors by careful consideration of feedback from the interface, as it was when conveying the limits of the system's understanding. One possible source of disfluencies in natural language interfaces is the planning demand imposed by the task; this hypothesis was tested by comparing disfluency rates across different degrees of task structure and different sentence lengths (Oviatt, 1994). A task was designed with two levels of structure. In the first, form-based, condition, the user was prompted to fill in each field on a form in order; in the second condition, there was no form representation, order was unconstrained, and the user could express their needs in a general fashion. Subjects either wrote or spoke their commands to the system, and in all cases their entries were interpreted by a hidden operator.

Disfluencies were significantly reduced for form-based tasks as compared to unconstrained tasks in both the spoken and written conditions. The difference was considerably more pronounced for spoken input, however: form-based spoken input had a lower disfluency rate than written input, while unconstrained spoken input had a higher rate than the written conditions. Further, for spoken inputs only, a significant portion of the variation in disfluency rates (77%) could be accounted for by the length of the utterance in which the disfluencies were found: the longer the utterance, the higher the disfluency rate. The proportion of disfluency accounted for by task structure remained significant after controlling for utterance length, despite the consistently shorter utterances required by the highly structured task. About seventy percent of all spoken disfluencies could be eliminated simply by switching to a more structured task format.
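
The effect of task structure can be illustrated with a small sketch contrasting the two conditions: a form-based dialogue prompts for one short field at a time, while an unconstrained dialogue invites one long, planning-heavy utterance. The field names and canned replies are invented; the cited experiment used a hidden operator, not code like this.

    # Form-based prompting elicits short, planned replies; an open prompt
    # invites the long utterances where disfluencies cluster.
    FIELDS = ["departure city", "destination city", "travel date"]

    def form_based_dialogue(answer_for):
        """Prompt field by field; each reply can be short."""
        return {field: answer_for("Please say the " + field + ":") for field in FIELDS}

    def unconstrained_dialogue(answer_for):
        """One open prompt; the whole request must be planned at once."""
        return answer_for("How can I help you?")

    # Stand-ins for speech input, so the sketch runs without a recognizer.
    canned = iter(["Toronto", "Montreal", "May 12"])
    print(form_based_dialogue(lambda prompt: next(canned)))
    print(unconstrained_dialogue(
        lambda prompt: "I need to, um, fly from Toronto to, uh, Montreal on May 12"))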

Disfluency rates were further compared with rates previously gathered from human-human speech interaction in four different situations. Disfluency rates were significantly lower for human-computer speech than for human-human speech in all cases. This seems to indicate that humans are well aware of the difference between speaking to a computer and to another human, and compensate for their lower expectations of the computer's capability by producing dramatically more careful speech.

This is contrary to the authors' earlier research, which suggested, based on performance differences in human-human communication where interaction was or was not allowed and where speech or typing was used, that computer-generated interactive speech would lead to utterances as long as those found in interactive human speech (Oviatt and Cohen, 1991). That research showed that, in an instructor-learner task, interactive speech and typed input differ in several ways: speech was wordier, contained more personal pronouns, and used more introductory temporal segments such as "Okay, next...". Efficiency in the speech conditions was also higher than in the non-speech modes, probably due to the smaller overhead of speech interaction versus typed input.

Speech

The difference between typed and spoken natural language is often ignored. Speech, which is more efficient than typing or writing, is the most promising method of conducting natural language interactions.

Most testing of speech recognition in the past used the 'Wizard-of-Oz' technique because effective speech recognition was not available. Recently, however, the robustness of speech recognition has increased dramatically. For example, speech systems are now able to distinguish between words meant to stand alone and words forming part of a two-word command (Danis et al., 1994), and to understand natural speech over phone lines (Yankelovich, 1994), among other things. With many operating systems beginning to ship with some form of speech recognition, reasonable and affordable speech understanding seems imminent.

If speech is used to direct natural-language commands to the computer, it frees the hands for other tasks and allows users to take advantage of their natural voice communication skills.

Multi-modal interfaces

Recently, many researchers (Buxton and Myers, 1986; Chatty, 1994) have noted that it may be useful to take advantage of all the possible channels of communication between a user and the computer. Relatively untapped channels include gestures, off-hand pointing, haptic feedback, non-speech audio, speech audio and, of particular interest here, speech recognition. By combining natural speech understanding with other interaction styles, it is possible to capitalize on the additional cues for disambiguation that the other modalities provide; a greater bandwidth of interaction between the user and the machine also becomes possible. An example of a system that uses speech in concert with other interface modalities is the CUBRICON system (Neal and Shapiro, 1991). CUBRICON can generate and recognize speech, generate natural language text, display graphics, and accept gestures made with a pointing device. The system combines all the inputs in the language parsing process and all the outputs in the language generation process.

The ability to integrate gestures into the parsing process allows the system to use them in parsing deictic terms and in disambiguation; a specific instance of a class can be 'pointed out', for instance. This allows queries such as "What is this <point>?" and commands such as "Send this <point> there <point>." Other multi-modal systems are able to handle a wider range of gestures, including simultaneous gesturing with both hands (Wahlster, 1991).
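
The following sketch suggests one way such deictic parsing might combine the input streams: deictic words in the utterance are paired, in order, with time-stamped pointing events. It is an invented simplification; CUBRICON's actual mechanism is a unified parsing process, not this pairing loop.

    # Pair each deictic term with the next time-stamped pointing gesture.
    # The data structures are invented for illustration.
    DEICTICS = {"this", "that", "there", "here"}

    def resolve(utterance, gestures):
        """Replace each deictic word with the referent of the next gesture."""
        gesture_queue = list(gestures)  # [(timestamp, referent), ...] in time order
        resolved = []
        for word in utterance.lower().split():
            if word.strip(".?!") in DEICTICS and gesture_queue:
                _, referent = gesture_queue.pop(0)
                resolved.append(referent)
            else:
                resolved.append(word)
        return " ".join(resolved)

    print(resolve("Send this there",
                  [(0.4, "file:budget.txt"), (1.1, "folder:/archive")]))
    # -> "send file:budget.txt folder:/archive"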

The system also takes advantage of its multiple output modes to construct a representation of the system's knowledge best suited to the user's task. By combining a visual display with speech and natural language text, the entire context of interaction can be conveyed to the user without the excessive verbosity a language-only interface would require, and with the explanatory power missing from most graphics-only systems. The language understanding component can take advantage of the limiting context of the display during parsing: objects on the display can be treated as having already been expressed within the discourse model maintained by the system.
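
A minimal sketch of this idea, with invented data structures, might enter visible objects into the discourse context so that a definite noun phrase can resolve against what is on screen:

    # Visible objects count as previously mentioned, so definite
    # references like "the airbase" resolve against the display.
    # Identifiers and entity types are hypothetical.
    def resolve_definite_reference(noun, discourse_model, visible_objects):
        context = list(discourse_model) + list(visible_objects)
        candidates = [obj for obj in context if obj["type"] == noun]
        if len(candidates) == 1:
            return candidates[0]["id"]
        return None  # ambiguous or unknown: query the user or await a gesture

    mentioned = [{"id": "obj-03", "type": "runway"}]
    visible = [{"id": "obj-17", "type": "airbase"}]
    print(resolve_definite_reference("airbase", mentioned, visible))  # obj-17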

CUBRICON also maintains a model of the user that includes a representation of the user's current task and of the importance the user attaches to various entity types while performing different tasks. This allows the system to better tailor its output to the user's intentions and, in doing so, to limit the context for parsing. The use of these sources of knowledge leads to effective interaction with the final system (Maybury, 1994).
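
Such a user model might be pictured, very roughly, as a table of weights per entity type per task, used to rank what the output should foreground. The tasks, entity types and weights below are invented for illustration, not taken from CUBRICON:

    # A toy user model: importance of each entity type under each task.
    USER_MODEL = {
        "mission-planning": {"airbase": 0.9, "runway": 0.6, "road": 0.2},
        "damage-assessment": {"runway": 0.9, "airbase": 0.5, "road": 0.4},
    }

    def rank_entities(task, entities):
        """Order entities by their importance for the user's current task."""
        weights = USER_MODEL.get(task, {})
        return sorted(entities, key=lambda e: weights.get(e, 0.0), reverse=True)

    print(rank_entities("mission-planning", ["road", "airbase", "runway"]))
    # -> ['airbase', 'runway', 'road']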

Recommendations

The decision whether or not to use a natural language interface must be based on the expected role of the language component within the application. The scope of the understanding should be readily encompassed by an artificial sub-language; otherwise the range of the system's capability will be difficult for the user to determine. The degree of restriction imposed on the language must somehow be conveyed to the user so that they can learn it from their experience with the system.

It should be noted that interference between different sets of restrictions may occur if the user uses more than one natural language interface. If the user is likely to have already become experienced with another natural language interface, the capabilities of that system should be considered during the design process. Any differences in capability that the user cannot easily attribute to differences in the application domains should be avoided, because of the difficulty of distinguishing the separate restrictions of the applications. One possible solution to this problem is to distinguish the speech interfaces of the two applications. Users consider computers to be social partners, and adapt to them as such; in the case of speech interfaces, each distinct voice is considered a separate conversational partner with different capabilities (Nass et al., 1994). Since the voice is the only determiner of individual actors (different machines with the same voice are considered to be the same social partner), the different interfaces might be given different voices so that users treat them as different social partners.
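
As a sketch of how this recommendation might look in practice, each application could simply be bound to its own synthesized voice. The mapping and the speak function below are hypothetical:

    # One distinct voice per application, so users treat each interface
    # as a separate conversational partner with its own restrictions.
    # Voice names and the rendering are invented for illustration.
    VOICE_FOR_APP = {
        "mail": "voice-a",
        "calendar": "voice-b",
    }

    def speak(app, message):
        voice = VOICE_FOR_APP.get(app, "voice-default")
        return "[" + voice + "] " + message

    print(speak("mail", "Message sent."))
    print(speak("calendar", "Meeting scheduled."))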

Natural language interfaces are best used in combination with other interface styles: making natural language the exclusive form of interaction limits the domains in which the system can be used and fails to take advantage of the multiple channels of human-computer interaction that are available. Speech-based natural language is more suitable for multi-modal interaction than typed or written natural language, because it exploits a channel of communication that is not already in use rather than demanding the hands, which can be put to a wider variety of uses.

If natural speech understanding is to be used in an interface, several guidelines for its use should be considered. Firstly, the form of messages used as feedback needs to be carefully considered:

- Word the output consistently, so that the same phrasing always accompanies the same situation.
- Keep the output terse; terse feedback is more readily modeled by users than conversational feedback.
- Phrase error messages non-threateningly, using only the vocabulary and phrase structures the system can itself understand.
- Avoid personal pronouns and other anthropomorphic references, which lengthen users' inputs and invite treatment of the computer as a social partner.

When a speech-based natural language interface is used, other modes of interaction can be used to clarify utterances and to limit the user's expectations of the system. Such multi-modal interfaces should take into account the following design considerations:

- Feed pointing gestures into the parsing process so that deictic terms ("this", "there") can be resolved directly.
- Use the display as a limiting context for parsing; objects that are visible can be treated as already present in the discourse model.
- Combine graphical, spoken and textual output so that the context of interaction can be conveyed without excessive verbosity.
- Where possible, maintain a model of the user's current task to tailor output and further limit the parsing context.

In a speech-based interface, task structure can greatly influence disfluency rates. As such, the structure of the task should be considered:

- Prefer form-based or otherwise highly structured tasks; structure substantially reduces spoken disfluencies.
- Keep the required utterances short, since disfluency rates rise with utterance length.

These recommendations address only issues of immediate relevance to what has been discussed here; more comprehensive recommendations for written natural language systems exist, though they focus on typed, uni-modal interfaces (see Mayhew, 1992).

Conclusions

Although the natural language parsers that can be implemented in the near future are too limited to match the conversational efficacy of a human partner, restricted natural language interfaces can be used successfully, provided some caution is exercised. The learnability and apparent flexibility of expression offered by a natural language interface are particularly appealing. Even when the natural language dialog is restricted to certain phrasal structures and vocabulary, it is possible for the user to learn the limitations of the system implicitly, through shaping and through a multi-modal representation of just what the system's knowledge contains. If these limitations can be conveyed to the user without drawing too much explicit attention to themselves, the attractive features of a natural language understanding interface can be maintained despite the shortcomings of the technology.

Many of the core references in interface design cite excessive user expectation as a fundamental problem with natural language interfaces. There is a commonly held assumption that people will attribute intelligence to a computer with which they can interact using their own language. If users anthropomorphize the computer, they may expect it to understand utterances outside the task domain, because they expect it to be fully aware of its environment and to have some common world knowledge. They might also attribute reasoning capability to the system (as was seen with some users of ELIZA and similar template-based simulated conversational partners). Recent research, however, has shown that natural discourse does not lead to an attribution of intelligence to a computer system, and that people tend not to expect the computer to have knowledge outside the immediate domain of the application involved.

Natural language interfaces, particularly those that are speech-based, should not be dismissed out of hand during the design process. As speech recognition becomes more robust, as natural language parsers become sufficiently powerful for restricted domains, and as more processing power becomes available to the interface, speech-based natural language is becoming an increasingly attractive method of interaction. By considering natural language a viable interface style, the creativity of the design process need not be narrowed unnecessarily.

Another benefit of natural language interfaces is that users find them enjoyable to use, and are more satisfied with them than with many other interface styles. Subjective measures such as satisfaction are very important to user acceptance of the system and to their perceived quality of working life. These factors alone may be sufficient reasons for adopting a natural language interface, even if it does not improve the interaction in any other way.

Finally, a speech interface can be combined with other modes of interaction to broaden the interaction bandwidth. By taking advantage of as many interaction channels as possible, the efficiency and expressive ability of the interface can be increased.


References

Brennan, S. and Ohaeri, J. (1994). "Effects of Message Style on Users' Attributions toward Agents." In CHI '94: Human Factors in Computing Systems, Conference Companion. ACM. 281-282.

Buxton, W. and Myers, B. (1986). "A Study in Two-Handed Input." In Proceedings of CHI '86: Human Factors in Computing Systems. ACM. 321-326.

Chatty, S. (1994). "Issues and Experience in Designing Two-Handed Interaction." In CHI '94: Human Factors in Computing Systems, Conference Companion. ACM. 253-254.

Danis, C., Comerford, L., Janke, E. and Davies, K. (1994). "Storywriter: A Speech Oriented Editor." In CHI '94: Human Factors in Computing Systems, Conference Companion. ACM. 275-276.

Dix, A., Finlay, J., Abowd, G. and Beale, R. (1993). Human-Computer Interaction. Prentice-Hall.

Hill, I. (1983). "Natural language versus computer language." In M. Sime and M. Coombs (Eds.) Designing for Human-Computer Communication. Academic Press.

Manaris, B., Pritchard, J. and Dominick, W. (1994). "Developing a Natural Language Interface for the Unix Operating System." ACM SIGCHI Bulletin 26, 2. 34-40.

Mayhew, D. (1992). Principles and Guidelines in Software User Interface Design. Prentice-Hall.

Maybury, M. (1994). "Intelligent Multimedia Interfaces." In CHI '94: Human Factors in Computing Systems, Conference Companion. ACM. 423-424.

Napier, H., Lane, D., Batsell, R. and Guadagno, N. (1989). "Impact of a Restricted Natural Language Interface on Ease of Learning and Productivity." Communications of the ACM 32, 10, 1190-1198.

Nass, C., Steuer, J. and Tauber, E. (1994). "Computers are Social Actors." In CHI '94: Human Factors in Computing Systems, Conference Proceedings. ACM. 72-79.

Neal, J. and Shapiro, S. (1991). "Intelligent Multi-Media Interface Technology." In J. Sullivan and S. Tyler (Eds.) Intelligent User Interfaces. Addison-Wesley. 11-43.

Oviatt, S. and Cohen, P. (1991). "The Contributing Influence of Speech and Interaction on Human Discourse Patterns." In J. Sullivan and S. Tyler (Eds.) Intelligent User Interfaces. Addison-Wesley. 69-83.

Oviatt, S. (1994). "Interface Techniques for Minimizing Disfluent Input to Spoken Language Systems." In CHI '94: Human Factors in Computing Systems, Conference Proceedings. ACM. 205-210.

Shneiderman, B. (1992). Designing the User Interface: Strategies for Effective Human-Computer Interaction (Second Edition). Addison-Wesley.

Slator, B., Anderson, M. and Conley, W. (1986). "Pygmalion at the Interface." Communications of the ACM 29, 7, 599-604.

Véronis, J. (1991). "Error in natural language dialogue between man and machine." International Journal of Man-Machine Studies 35, 187-217.

Wahlster, W. (1991). "User and Discourse Models for Multimodal Communication." In J. Sullivan and S. Tyler (Eds.) Intelligent User Interfaces. Addison-Wesley. 45-67.

Yankelovich, N. (1994). "Talking vs. Taking: Speech Access to Remote Computers." In CHI '94: Human Factors in Computing Systems, Conference Companion. ACM. 277-278.

Zoltan-Ford, E. (1991). "How to get people to say and type what computers can understand." International Journal of Man-Machine Studies 34, 527-547.