Chat Bot Intelligence - Current State
A simple chat bot can answer simple queries such as “How is the weather?”, “How are you feeling today?”, “Where is the stock market today?” and so on. In most cases, the backend software in the Bot does a simple look-up in a database, comparing the stored entries with the user utterance and returning the best-matching reply.
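A minimal sketch of such a look-up Bot follows. The canned replies, the 0.8 similarity cutoff and the use of Python's `difflib` for fuzzy matching are all assumptions made for illustration; a production Bot would read its replies from a real database.

```python
import difflib

# Hypothetical canned replies; a real Bot would load these from a database.
REPLIES = {
    "how is the weather": "It is sunny today.",
    "how are you feeling today": "I am doing well, thank you.",
    "where is the stock market today": "The market is up slightly.",
}
FALLBACK = "Sorry, I am unable to understand it."


def normalize(utterance: str) -> str:
    """Lowercase the utterance and drop punctuation so near-identical inputs match."""
    kept = (ch for ch in utterance.lower() if ch.isalnum() or ch.isspace())
    return "".join(kept).strip()


def reply(utterance: str, cutoff: float = 0.8) -> str:
    """Return the stored reply whose question is closest to the utterance."""
    matches = difflib.get_close_matches(
        normalize(utterance), list(REPLIES), n=1, cutoff=cutoff
    )
    # No sufficiently similar stored question: fall back to a stock apology.
    return REPLIES[matches[0]] if matches else FALLBACK
```

Calling `reply("How is the weather?")` returns the stored weather reply, while an utterance with no close match, such as a request to book a flight, returns the fallback sentence.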
In cases where the Bot is unable to find a match, it simply says “Sorry, I am unable to understand it”, or something similar. If we aim to build a complex Bot that can answer far more questions, perhaps everything in the world, then it becomes an impossible task (for now). To achieve such scale while keeping complexity manageable, we would need to process all the information in the world and compare it in real time with the user utterance to come up with the best answer. This is how it is done in the search engine world: a user search query is compared to all the available documents and the best document is surfaced to the top. Users are used to typing simple keywords into a search engine, and if we build a Bot on such a model, every utterance, even a simple greeting, is answered with a ranked list of search results.
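A toy sketch of a search-based Bot makes the problem concrete. It assumes a hypothetical three-document index and plain word overlap as the ranking score; real search engines use far richer ranking.

```python
# Hypothetical mini index; a real search engine would hold far more documents.
DOCUMENTS = [
    "Weather forecast for the week ahead",
    "How stock market indices closed today",
    "Ten tips for feeling better today",
]


def tokenize(text: str) -> set:
    """Split text into a set of lowercase words with punctuation stripped."""
    return {word.strip("?.,!").lower() for word in text.split()}


def search(query: str, top_k: int = 2) -> list:
    """Rank documents by how many query words they share, best first."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(doc)), doc) for doc in DOCUMENTS]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    # Keep only documents that share at least one word with the query.
    return [doc for score, doc in ranked[:top_k] if score > 0]
```

With this index, `search("How are you feeling today?")` returns the stock market and wellness documents because they share words such as "how", "feeling" and "today" with the greeting. A list of documents is a poor reply to a greeting.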
Even if the Bot reads out the top results, it does not make a good conversation from a human perspective, since even a summary of the results is still a complex answer to a simple human gesture.
Pure search engine based chat mechanisms are not that effective. The other mechanism is to craft standard replies to a large number of questions and put those in a database. Although effective, this method is similar to Interactive Voice Response (IVR) systems, which lack human touch and only remotely resemble normal human interaction.
Human touch is achieved when simple social skills are introduced in a Bot.
Do Smooth Social Skills Provide Human Touch?
Adding simple chit-chat abilities to a Bot might give some semblance of human interaction, but real intelligence is not captured by mere chit-chat. In general, humans value intelligence where a nuanced interaction results in some mutually beneficial action.
In short, the ability to make small talk is good to have in a Bot, but it is not sufficient.
Language Grammar to the Rescue
In order to do better than small talk, it is important to understand language grammar and use it for better dialog management. Humans use language in a very nuanced way to convey relationships between entities. Various Natural Language Processing (NLP) techniques such as constituency parsing, dependency parsing and coreference resolution give a good understanding of the entities the user is interested in and the action the user wants to take. Some simpler entities can also be extracted using regular expressions.
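As a minimal sketch of the regular-expression route, the snippet below pulls a color, a size and a price out of an utterance. The vocabularies and patterns are hypothetical and written for an apparel store; they are not taken from any particular product.

```python
import re

# Hypothetical vocabularies and patterns for an apparel store.
COLOR_PATTERN = re.compile(r"\b(red|blue|black|white|green)\b", re.IGNORECASE)
SIZE_PATTERN = re.compile(r"\bsize\s+(\d{1,2}|XS|S|M|L|XL)\b", re.IGNORECASE)
PRICE_PATTERN = re.compile(r"\$\s?(\d+(?:\.\d{2})?)")

PATTERNS = (("color", COLOR_PATTERN), ("size", SIZE_PATTERN), ("price", PRICE_PATTERN))


def extract_simple_entities(utterance: str) -> dict:
    """Return the first color, size and price mentioned in the utterance, if any."""
    entities = {}
    for name, pattern in PATTERNS:
        match = pattern.search(utterance)
        if match:
            entities[name] = match.group(1).lower()
    return entities
```

Regular expressions work well for closed vocabularies and fixed formats like these; open-ended entities generally need the parsing techniques mentioned above.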
Running spaCy’s dependency parser on a user request about buying a pair of jeans, for example, and extracting only the verbs and the nouns gives a good indication of what the user wants:
Action (Intent) - buying
Entities - a pair, jeans
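The extraction step can be sketched without a parser by starting from tokens that are already tagged. The utterance and its part-of-speech tags below are hand-written for illustration and are not actual parser output; with spaCy, the same pairs could be built from `token.text` and `token.pos_` of a parsed document.

```python
# Hand-tagged tokens for a hypothetical utterance, standing in for parser output.
# Tags follow the Universal POS scheme; a real parser may tag some words differently.
TAGGED_UTTERANCE = [
    ("I", "PRON"), ("am", "AUX"), ("interested", "ADJ"), ("in", "ADP"),
    ("buying", "VERB"), ("a", "DET"), ("pair", "NOUN"), ("of", "ADP"),
    ("jeans", "NOUN"),
]


def intent_and_entities(tagged_tokens):
    """Collect verbs as candidate actions and nouns as candidate entities."""
    actions = [text for text, pos in tagged_tokens if pos == "VERB"]
    entities = [text for text, pos in tagged_tokens if pos in ("NOUN", "PROPN")]
    return {"actions": actions, "entities": entities}
```

For the hand-tagged utterance this yields the action "buying" and the entities "pair" and "jeans". Grouping "a pair" into one phrase would need noun-chunk information from the parser, which this sketch leaves out.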
A simple classifier that takes this information can build a large number of intents and the entities that each intent can operate upon. Is a large collection of intents, along with the entities they operate upon, sufficient to answer user queries?
The answer is still a firm ‘No’. Other problems still need answers, such as how to craft dynamic replies, how to identify complex sentences, how to elicit the required information when the user has not provided enough, and so on.
The real value comes from building a conversational flow model based on initial user intent and using a variety of algorithmic techniques as shown in the next few sections.
Context and Human Brain
According to Christof Koch of the Allen Institute for Brain Science, recent research indicates that higher intelligence (in any species, including humans) is a by-product of greater connectivity between various neural regions of the brain. By the same reasoning, if context can be intertwined more effectively into more complex slot-filling algorithms, the result should be higher Bot intelligence.
As elementary school students, we learn various sentences and grammar. We learn how to create simple sentences and convey our thoughts using templates of commonly used sentences that can be used to respond to a well-known query. For example -
Q: “How much did this <product> cost?”
A: “The price of this product is <price>”
This is a very simple mechanism, which can be extended with more complex information retrieved from a context that “knows” about the entity being queried and the specific relationship of that entity being queried (at Linc, we call this piece of code a ‘Botlet’).
For example, in actual code, this might look like this:
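What follows is a minimal sketch of a template-driven reply, not Linc's actual Botlet code. The template text, the `answer` function and the context keys are hypothetical names chosen for illustration.

```python
# Hypothetical reply templates, keyed by the relationship being queried.
REPLY_TEMPLATES = {
    "price": "The price of the {product} is {price}.",
}
FALLBACK = "Sorry, I am unable to understand it."


def answer(relationship: str, context: dict) -> str:
    """Fill the template for the queried relationship with values from the context."""
    template = REPLY_TEMPLATES.get(relationship)
    if template is None:
        return FALLBACK
    try:
        return template.format(**context)
    except KeyError:
        # The context lacks a value the template needs, so no reply can be built.
        return FALLBACK
```

Given a context of `{"product": "pair of jeans", "price": "$49.99"}`, `answer("price", context)` fills both placeholders. A missing context value or an unknown relationship falls back to the stock apology.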
In a more complex scenario, where a user is engaged in a dialog, more sophisticated slot-filling can be employed to ask the user appropriate questions, as required by each slot and the current user intent, while maintaining a good conversation flow.
Slot-filling is a very common method for asking the correct questions. If slot-filling is combined with some randomness within a given range of sentences, it gives a semblance of intelligence.
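Slot-filling with randomized phrasing can be sketched as below. The slots and question wordings are hypothetical, and the sketch assumes slots are asked in a fixed order.

```python
import random

# Hypothetical slots for a product-search intent, each with several phrasings.
SLOT_PROMPTS = {
    "category": ["What kind of product are you looking for?",
                 "Which product can I help you find?"],
    "color": ["Which color would you like?", "Do you have a color in mind?"],
    "size": ["What size do you need?", "Which size should I look for?"],
}


def next_prompt(filled_slots, rng=random):
    """Return (slot, question) for the first unfilled slot, or None when all are filled."""
    for slot, prompts in SLOT_PROMPTS.items():
        if slot not in filled_slots:
            # Picking one of several phrasings keeps the Bot from sounding scripted.
            return slot, rng.choice(prompts)
    return None
```

Passing a seeded `random.Random` instance as `rng` makes the chosen phrasing reproducible, which is convenient when testing dialog flows.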
While slot-filling is useful in identifying the questions that the Bot can ask, a component in the Bot also needs to track the ‘Conversation flow’, so that it can detect when the conversation topic has changed and establish some ground rules on how the Bot should respond.
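One possible shape for such a tracker is sketched here. The class name, the intent labels and the ground rule of parking the old topic so that it can be resumed later are all assumptions, not a description of any specific product.

```python
class ConversationFlow:
    """Track the active intent and report when the user switches topic."""

    def __init__(self):
        self.active_intent = None
        self.filled_slots = {}
        self.suspended = []  # earlier topics, kept so they can be resumed

    def observe(self, intent, slots):
        """Record one user turn and return True if the topic changed."""
        changed = self.active_intent is not None and intent != self.active_intent
        if changed:
            # Ground rule assumed here: park the old topic rather than discard it.
            self.suspended.append((self.active_intent, self.filled_slots))
            self.filled_slots = {}
        self.active_intent = intent
        self.filled_slots.update(slots)
        return changed
```

When `observe` returns True, the Bot can acknowledge the switch and start filling slots for the new intent instead of continuing to ask about the old one.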
Context and Chat Bot - How to Intertwine them?
Identifying the right intent for the initial user question and then accessing the relevant context for the user to generate a sensible reply is the crux of building an intelligent chatbot.
Context can contain information about the entities extracted, past and current user behavior and sentiment, merchant-level metrics and so on. A user can have hundreds of interesting context keys associated with them, and choosing the right context can become tricky.
For example, if the user has a question about finding the right product for a given category, the problem can then be divided up into these general areas –
- Identify user intent
- Extract any relevant entities from user utterance (e.g. product attributes such as color, size etc.)
- Search and identify the historical and current context of the user
- Search and identify the correct Botlet that can answer the user intent
- Use some kind of slot-filling mechanism that also understands the specific attribute the user is querying about, and then does a product search (as in the example below)
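The five steps above can be sketched end to end as follows. Everything here is hypothetical: the catalog, the stored user preference, the keyword rule standing in for an intent classifier, and all function names.

```python
import re

# All data below is hypothetical and only illustrates how the five steps connect.
CATALOG = [
    {"name": "Slim jeans", "category": "jeans", "color": "blue"},
    {"name": "Relaxed jeans", "category": "jeans", "color": "black"},
    {"name": "Crew T-shirt", "category": "shirt", "color": "blue"},
]
USER_CONTEXT = {"user-1": {"preferred_color": "black"}}
KNOWN_VALUES = (("category", ("jeans", "shirt")), ("color", ("blue", "black")))


def identify_intent(utterance):
    """Step 1: a keyword rule standing in for a trained intent classifier."""
    found = re.search(r"\b(looking for|find|need)\b", utterance.lower())
    return "find_product" if found else "unknown"


def extract_product_entities(utterance):
    """Step 2: pick out known product attributes mentioned by the user."""
    words = set(re.findall(r"[a-z]+", utterance.lower()))
    entities = {}
    for slot, values in KNOWN_VALUES:
        found = [value for value in values if value in words]
        if found:
            entities[slot] = found[0]
    return entities


def find_product_botlet(entities, context):
    """Step 5: fill missing slots from context, then ask a question or search."""
    slots = dict(entities)
    if "color" not in slots and "preferred_color" in context:
        slots["color"] = context["preferred_color"]
    if "category" not in slots:
        return "What kind of product are you looking for?"
    matches = [p["name"] for p in CATALOG
               if all(p.get(key) == value for key, value in slots.items())]
    return "I found: " + ", ".join(matches) if matches else "Sorry, nothing matched."


BOTLETS = {"find_product": find_product_botlet}


def handle(user_id, utterance):
    """Run one user turn through the five steps."""
    intent = identify_intent(utterance)
    entities = extract_product_entities(utterance)
    context = USER_CONTEXT.get(user_id, {})  # Step 3: look up user context
    botlet = BOTLETS.get(intent)             # Step 4: find the matching Botlet
    if botlet is None:
        return "Sorry, I am unable to understand it."
    return botlet(entities, context)
```

In this sketch the stored color preference is used only when the user does not name a color, so what the user says in the current turn takes priority over historical context.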
Can we build this in software?
The real challenge in building a chatbot is taking all the above problems and creating a workflow that resonates well with an end user. Having a ready cache of user context that can answer an intent is half of the solution. The other half is creating a workflow that is specific to a use case (in Linc’s case, eCommerce). Simple workflows per use case are best handled by the experts in the area, and providing simple tools for these experts is more important than creating all the possible use cases yourself. To give an example, in the case of HTML, providing the HTML constructs and a parser is sufficient, as developers can use the provided tools to build their custom web pages.
Similarly, for Bots, individual small pieces of code (Botlets), abstracted out because they are common across various use cases and identifiable by an id, are sufficient for real experts to develop Bot dialog flows.
Intelligence is achieved by providing a high level of interconnectivity between context items and Botlets. Botlets are basically a syntax for creating user dialog flow (see the earlier example of price) and a domain expert can provide them.
To summarize the earlier problem decomposition: in order to build an intelligent, generalized and scalable Bot, the components below are required –
- Intent Classification - use a classification mechanism based on known services for the domain and the various sentences / phrases / words that fall under the various services (a Service Map). This provides an entry point for any dialog.
- A Context Building Mechanism - how will you get user data? Which context values are important for given use cases?
- An Entity Extraction Mechanism
- A Tool for Writing Bot Questions and Answers (Dialog flow or Botlets) - A domain expert can provide the inputs but the tool needs the capability to appropriately mix it with context.
- A Search mechanism that can take user intent, identify the specific Bot Question/Answer, and the context that goes along with it.
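The intent classification component from the list above can be sketched with a small Service Map. The services and phrases are hypothetical, and word overlap is a deliberately crude stand-in for a trained classifier.

```python
# Hypothetical Service Map: each service lists phrases that signal it.
SERVICE_MAP = {
    "order_tracking": ["where is my order", "track my package", "delivery status"],
    "returns": ["return this item", "send it back", "refund"],
    "product_search": ["looking for", "do you have", "find me"],
}


def classify_intent(utterance):
    """Score each service by the words it shares with the utterance; best score wins."""
    words = set(utterance.lower().replace("?", " ").split())
    best_service, best_score = None, 0
    for service, phrases in SERVICE_MAP.items():
        phrase_words = {word for phrase in phrases for word in phrase.split()}
        score = len(words & phrase_words)
        if score > best_score:
            best_service, best_score = service, score
    # None means no service matched, so the dialog has no entry point yet.
    return best_service
```

Because common words such as "my" or "it" count toward the score, a real Service Map would need stop-word handling or learned weights. The sketch only shows how a map of services and phrases gives each dialog an entry point.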
Combining user context into meaningful sentences, which can be either questions or answers to user questions, is an area of research that will provide more meaningful answers to users.
Slot-filling can be combined with Long Short-Term Memory (LSTM) based neural architectures to provide deeper and more thoughtful answers to users, which can also include some personalization using contextual data.
About Sameer Yami
Sameer is the Head of Architecture and AI at Linc Global. He is responsible for crafting the 'smarts' as well as scaling the AI of Linc's Botlet Platform. He previously founded WikiSeer - a text summarization and semantic advertising company, and also worked in various senior engineering leadership roles at Ampush, Toshiba and Sun Microsystems/Netscape. Sameer has received multiple 'Top Inventor' awards and has filed over 30 patents of which 6 have been granted.
Link to Original Post on the Oracle Blog