Abstract
Many real-world open-domain conversation applications have specific goals to achieve during open-ended chats, such as recommendation, psychotherapy, and education. We study the problem of imposing conversational goals on open-domain chat agents. In particular, we want a conversational system to chat naturally with humans and proactively guide the conversation to a designated target subject. The problem is challenging because no public data is available for learning such a target-guided strategy. We propose a structured approach that introduces coarse-grained keywords to control the intended content of system responses. We then attain smooth conversation transitions through turn-level supervised learning, and drive the conversation towards the target with discourse-level constraints. We further derive a keyword-augmented conversation dataset for the study. Quantitative and human evaluations show that our system can produce meaningful and effective conversations, significantly improving over other approaches.
Contributions
• This paper makes a step towards open-domain dialogue agents with conversational goals. In particular, we want the system to chat naturally with humans on open-domain topics and proactively guide the conversation to a designated target subject. For example, in Figure 1, given the target "e-books" and an arbitrary starting topic such as "tired", the agent drives the conversation in a natural way following a high-level logical backbone, and effectively reaches the target in the end. Such a target-guided conversation setup is general-purpose and can entail a large variety of practical applications as above. The problem is difficult in that the agent has to balance chatting naturally against achieving the target; moreover, to the best of our knowledge, there is no public dataset available for learning target-guided dialogue.
• This paper proposes a solution to the task. We decouple the whole system into separate modules and address the challenges at different granularities. Specifically, we explicitly model and control the intended content of each system response by introducing coarse-grained utterance keywords. We then impose a discourse-level rule that encourages the keywords to approach the end target during the course of the conversation, and we attain smooth conversation transitions at each dialogue turn through turn-level supervised learning. To this end, we further derive a keyword-augmented conversation dataset from an existing daily-life chat corpus (Zhang et al., 2018) and use it for learning keyword transitions and utterance production.
• We study different keyword transition approaches, including pairwise PMI-based transition, neural-based prediction, and a hybrid kernel-based method. We conduct quantitative and human evaluations to measure the performance of the sub-modules and of the whole system. Our agent is able to generate meaningful and effective conversations with a decent success rate of reaching the targets, improving over other approaches in different respects. We show that target-guided open-domain conversation is a promising and potentially important direction for future research.
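To make the keyword-level idea concrete, the following is a minimal sketch of a pairwise PMI-based transition combined with a discourse-level rule. It is an illustration under stated assumptions, not the paper's implementation: the function names (pmi_table, next_keyword, plan_path), the toy keyword pairs, and the hand-written closeness scores are all hypothetical, and the specific form of the rule used here (the next keyword must be strictly closer to the target than the current one) is only one plausible reading of "encourages the keywords to approach the end target".

```python
import math
from collections import Counter

def pmi_table(turn_pairs):
    """Estimate PMI(prev -> next) = log(p(next | prev) / p(next)) from
    (previous-turn keywords, next-turn keywords) pairs."""
    pair_counts, prev_counts, next_counts = Counter(), Counter(), Counter()
    for prev_kws, next_kws in turn_pairs:
        for p in prev_kws:
            for n in next_kws:
                pair_counts[(p, n)] += 1
                prev_counts[p] += 1
                next_counts[n] += 1
    total = sum(pair_counts.values())
    table = {}
    for (p, n), count in pair_counts.items():
        p_next_given_prev = count / prev_counts[p]
        p_next = next_counts[n] / total
        table[(p, n)] = math.log(p_next_given_prev / p_next)
    return table

def next_keyword(current, target, table, closeness):
    """Return the highest-PMI successor of `current` that is strictly closer
    to `target` than `current` is (assumed form of the discourse-level rule)."""
    candidates = [
        (score, n)
        for (p, n), score in table.items()
        if p == current and closeness(n, target) > closeness(current, target)
    ]
    return max(candidates)[1] if candidates else None

def plan_path(start, target, table, closeness, max_turns=8):
    """Greedily chain keyword transitions until the target is reached,
    no valid candidate remains, or the turn budget runs out."""
    path = [start]
    while path[-1] != target and len(path) < max_turns:
        nxt = next_keyword(path[-1], target, table, closeness)
        if nxt is None:
            break
        path.append(nxt)
    return path

# Hypothetical toy data: keyword pairs from consecutive turns.
TOY_PAIRS = [
    (["tired"], ["sleep"]),
    (["tired"], ["work"]),
    (["sleep"], ["read"]),
    (["sleep"], ["work"]),
    (["read"], ["book"]),
    (["work"], ["tired"]),
]

# Hypothetical closeness of each keyword to the target "book"; a real system
# would need some learned or embedding-based measure instead.
TOY_SIM = {"tired": 0.1, "work": 0.15, "sleep": 0.3, "read": 0.7, "book": 1.0}

def toy_closeness(word, target):
    # The toy table is only defined for the single target "book".
    return TOY_SIM.get(word, 0.0) if target == "book" else 0.0

if __name__ == "__main__":
    table = pmi_table(TOY_PAIRS)
    print(plan_path("tired", "book", table, toy_closeness))
```

In this sketch the PMI score handles turn-level smoothness (which keyword plausibly follows the current one), while the closeness filter handles the discourse-level goal. Keeping the two concerns separate mirrors the modular decomposition described above, so the scoring function could be swapped for a neural or kernel-based predictor without touching the constraint.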
Architecture
Conclusions
We have studied the problem of target-guided open-domain conversation, where an agent converses naturally with a human and proactively guides the conversation to a designated end target. We propose a modular solution with coarse-grained keywords as a logical backbone, and use partial supervision and heuristic rules to achieve the task. We also derive a dataset for the study. Quantitative and human evaluations show promising results, with our approach improving over other approaches.
This work presents an initial attempt to bridge the gap between open-domain chit-chat and task-oriented dialogue. A target-guided agent can be deployed in practice to converse engagingly with users and eventually guide them to trigger task-oriented systems (e.g., reserving a restaurant). An open-domain agent with control over the conversation strategy and end target can also be useful in education, psychotherapy, and other areas, as discussed in Section 1. Our treatment of utterance actions and conversation targets through simple keywords is preliminary with respect to complex real-world applications. It would be exciting to explore more sophisticated modeling to enable more fine-grained control at both the sentence (Hu et al., 2017) and discourse levels.