2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology

Combining the Best of Two Worlds: NLP and IR for Intranet Search
Suma Adindla and Udo Kruschwitz
School of Computer Science and Electronic Engineering, University of Essex, Wivenhoe Park, Colchester, CO4 3SQ, UK
{sadind,udo}
Abstract: Natural language processing (NLP) is becoming much more robust and applicable in realistic applications. One area in which NLP has still not been fully exploited is information retrieval (IR). In particular, we are interested in search over intranets and other local Web sites. We see dialogue-driven search, based on a largely automated knowledge extraction process, as one of the next big steps. Instead of replying to a user query with a set of documents, the system would allow the user to navigate through the extracted knowledge base by means of a simple dialogue manager. Here we support this idea with a first task-based evaluation that we conducted on a university intranet. We automatically extracted entities such as person names, organizations and locations, as well as relations between entities, and added visual graphs to the search results whenever a user query could be mapped into this knowledge base. We found that users are willing to interact with and use those visual interfaces. We also found that users preferred a system that guides them through the result set over a baseline approach. The results represent an important first step towards full NLP-driven intranet search.

Keywords: natural language processing; information retrieval; dialogue; domain knowledge; visualization

I. MOTIVATION

Imagine we could interact with a university intranet search engine just as we would with a person in a natural dialogue. The search engine would automatically extract knowledge from the Web site so that a searcher can be assisted in finding the information required. A student who asks for a particular course can be directed to the most recent lecture notes or the contact details of the lecturer. An external searcher typing in "PhD NLE" could be assisted in exploring the space of experts and projects available in the area of natural language engineering. Obviously, this information can change any day, and the idea is to always have the most up-to-date facts and relations available to assist a searcher. Currently, we do not have systems that support this type of interaction. Our aim, however, is to automatically acquire knowledge (a domain model) from the document collection and employ it in an interactive search system.

One motivation for a system that guides a user through the search space is the problem of too many results. Even queries in document collections of limited size often return a large number of documents, many of them not relevant to the query. Part of the problem is the fact that both on the Web and in intranet search queries tend to be short and
short queries always pose ambiguity and uncertainty issues for information retrieval systems [1]. Some form of dialogue based on feedback from the system could be very useful in helping the user find the right results. We assume that this combination of NLP and IR is particularly promising and scalable in smaller domains such as university intranets or local Web sites. Obviously, most queries can be answered by a standard search engine, but the use of NLP tools to extract knowledge can help address ambiguous queries as well as those for which there might be only a single relevant document (which is common in an intranet setting).

In order to employ a dialogue system we would ideally have access to a domain knowledge base, because dialogue systems work well with structured knowledge bases. Web sites do have some internal structure, but unlike product catalogues and online shopping sites they are not fully structured, and the first question we face is: How can we acquire suitable knowledge from a document collection to support system-guided search? We are not interested in manually extracting such knowledge; rather, we would like to automate that process so that we can apply the same approach to a new document collection without expensive manual customization. A related question is: What kind of knowledge should this knowledge base (the domain model) contain?

We propose a system that guides a user in the search process and relies on a database automatically populated by processing the document collection and extracting pieces of knowledge from these documents. Along with named entities, the relations that exist between those entities are essential in various practical applications [2]. We use NLP techniques to parse all sentences in the input documents, extract relations (such as subject-verb-object triples) and then map user queries against these relations. Such relations often involve named entities (as objects or subjects or both).
Named entities have been found to play a key role in Web and intranet search: they can be used to deal with page ranking problems [3], and they have also been found to play a key role in corporate and university search logs [4], [5]. Although we are using a knowledge base, we do not move away from the standard search paradigm. While displaying the results to a user, we combine the domain model with

the results of a local search engine. Our work is a first step towards a full dialogue search system. Here we use visual graphs to present the relations and corresponding terms for a given query. What we are investigating here is the general validity of our approach.

II. RELATED WORK

Early work on dialogue systems focused on human-computer interaction, e.g. ELIZA [6]. Since then a variety of task-oriented applications have been developed in various domains. Initially, many of these dialogue systems assisted users in travel domains. Examples include ATIS [7] and PHILIPS [8]. Generally speaking, one can distinguish different types of dialogue systems: systems built on a well-defined structured database can be considered Type I, and systems which lack knowledge bases or deal with unstructured data Type II [9]. Dialogue systems based on ontologies would fall under Type I [10]; our work starts from Type II.

Another possible way of guiding the user in the navigation of search results is faceted search. A lot of research has been done in this area, and commercial online sites and even libraries support this feature. The difference, however, is that faceted search systems typically rely on well-structured databases; in other words, they make use of rich structure in the knowledge bases [11].

Question-answering (QA) systems are related to our work as they tend to rely on NLP techniques similar to those we apply, although the main idea of a QA system is to return an answer rather than a list of documents. The first question answering systems were only natural language interfaces to structured databases [12]. Progress in Information Extraction has more recently contributed much to the success of factoid-based question answering systems. To support interaction, an element of dialogue has been added to a number of question answering systems, e.g. [13]. An example of a high-quality interactive question answering system is HITIQA [14].
We extract knowledge entirely from unstructured data available on (e.g.) a university Web site. This makes our work different from the dialogue systems mentioned above.

In recent years, Web search algorithms have matured significantly by adapting to users' information needs. An example is named entity recognition. Named entities are becoming increasingly popular in Web search. A study has shown that 71% of Web queries contain named entities and that identifying entities in a query further improves retrieval performance [15]. Analysing the search logs we have been collecting at the University confirms this observation. The sort of entities people search for might not coincide with the typically identified ones such as dates, organisations and locations. In our logs we found that queries such as person names, room numbers, labs, course titles etc. were routinely searched for. In addition, 10% of our search queries

Figure 1. Overview of the system components

consist of person names. This evidence supports our work and also points to the need for query type identification. As on the Web, we can categorize user queries into some general types: information needs, browsing, transactional etc. [16]. Web search engines have seen considerable progress in that respect; by comparison, intranet users still experience poor search results [17]. It has, however, been shown that understanding the query type (who, where, when) would be quite useful in an intranet domain [4].

Another area worth exploring in information retrieval is visualization. Information visualization is an important aspect of information retrieval systems, and visual interfaces are excellent tools for interacting with and exploring search results. Various studies have been conducted to test the significance of visualization for information retrieval systems [18]; [19] suggest the use of various visualization methods for information organization. For information retrieval systems, the presentation of search results is still a challenging issue [20]. One example of a search system which supports entity-level search is EntityCube [21]. Another example is Google's Wonder Wheel.

III. TOWARDS DIALOGUE-DRIVEN INTRANET SEARCH

Our system consists of two parts: offline knowledge extraction and an online mapping process that maps the query into the extracted knowledge. With the help of NLP tools and information extraction techniques, we process the document collection automatically to build a domain model. In the offline extraction process we extract named entities and predicate-argument structures from all documents of the local Web site at hand. We thus turn the university Web document collection into a usable knowledge base by populating it with named entities and simple facts. To identify entities (person, organization and location names), we use the ANNIE IE system that is part of the GATE NLP
toolkit. We use GATE, but any similar NLP tool could be employed. For extracting simple facts, we use the Stanford parser; our extraction methodology is similar to [22]. The extracted relations are represented as subject-verb-object triples. Along with these triples, we have also extracted dependency/predicate relations from the sentences. This knowledge can then be used to guide the dialogue manager, and our extracted relations are similar to the ones presented by [23]. We will consider different ways of aiding users by suggesting various query options. Figure 1 shows an overview of the system architecture.

In the second part, we map a user query against the knowledge base. The key component of our system is the dialogue manager, which is also responsible for the online mapping process. Whenever a user submits a query, the dialogue manager tries to map the query against the domain model and simultaneously submits it to the search engine. For any user query that can be found in the knowledge base, the dialogue manager allows the user to navigate through the knowledge base by presenting the relations that match the user query in some way (e.g. if the user query is a named entity that has relations with other named entities in the database). When displaying search results to a user, we combine the extracted domain knowledge with the results of a local search engine.

Figure 2 shows a screenshot of our dialogue system. Results from the search engine are presented alongside a graph of extracted knowledge related to the query. In the figure the query is shown in the centre and the edges present the dialogue manager's suggestions for that query. We use colour codes to distinguish different types of terms: green terms are entities and red terms indicate relations. When a user clicks on one of those entities, the search box is automatically updated with the clicked entity.
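The two stages described above (offline triple extraction, online query mapping) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the parser output is already available as typed dependency edges using the Stanford labels `nsubj` and `dobj`, and the names `Dep`, `Triple`, `extract_triples` and `suggest`, the example sentence and the substring matching rule are all hypothetical. The default limit of 10 follows the paper's statement that the first 10 matching entries were shown without any ranking.

```python
from collections import namedtuple

# Hypothetical containers; the paper does not specify its data structures.
Dep = namedtuple("Dep", "rel head dep")             # one typed dependency edge
Triple = namedtuple("Triple", "subject verb obj")   # one extracted simple fact


def extract_triples(deps):
    """Pair each nominal subject of a verb with each direct object of that verb."""
    subjects, objects = {}, {}
    for d in deps:
        if d.rel == "nsubj":
            subjects.setdefault(d.head, []).append(d.dep)
        elif d.rel == "dobj":
            objects.setdefault(d.head, []).append(d.dep)
    return [
        Triple(subj, verb, obj)
        for verb, subjs in subjects.items()
        for subj in subjs
        for obj in objects.get(verb, [])
    ]


def suggest(query, entities, triples, limit=10):
    """Return graph nodes for a query: matching entities (green), then relations (red).

    Matching is plain case-insensitive substring search (an assumption);
    results keep database order, i.e. no frequency or ranking is applied.
    """
    q = query.lower()
    nodes = []
    for name, entity_type in entities:
        if q in name.lower():
            nodes.append({"label": name, "kind": entity_type, "colour": "green"})
    for t in triples:
        if q in t.subject.lower() or q in t.obj.lower():
            label = f"{t.subject} {t.verb} {t.obj}"
            nodes.append({"label": label, "kind": "relation", "colour": "red"})
    return nodes[:limit]


# Made-up example: dependency edges for the sentence "Smith teaches NLP".
deps = [Dep("nsubj", "teaches", "Smith"), Dep("dobj", "teaches", "NLP")]
entities = [("Smith", "person"), ("Essex", "organization")]
triples = extract_triples(deps)

for node in suggest("smith", entities, triples):
    print(node["colour"], node["kind"], node["label"])
```

Keeping entities and relations in separate collections mirrors the two-table layout of the domain model; a real system would replace the substring test with whatever matching the database supports.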
The graphical interface also allows a user to interact with the system while searching for information. We have used JIT for visualization purposes. Semantic graphs are starting to be used in assisted search, e.g. in question answering [24].

A. Domain Model

In the first stage we identified named entities such as person names, organization names and locations. We capture simple facts (relations between entities) from the sentences by using the Stanford parser and populate the database with this knowledge. This gives us a structured database: a network of related terms and the corresponding relations. A user query can then be matched against any part of this knowledge base. If the user query is a person name (very common in our domain), the dialogue manager would come up with various suggestions (department, role, contact details, other people, projects etc.). By using this piece of information, we frame

questions, generate answers and vice versa as shown in the motivating example, similar to [25] but more generic. We have two separate tables, one for the entities and the other for the relations. During query mapping, we extract the terms that match the user query from both tables. With the proposed dialogue system (which will eventually go beyond graphical interaction) a user could also engage in a dialogue, but the user is obviously not required to do so.

IV. EVALUATION

We conducted this first evaluation to explore the potential of the outlined idea. The methodology we employed to evaluate our dialogue system is a task-based evaluation. We followed the TREC interactive track guidelines for comparing two systems. Here we compare our system against a baseline system. The two systems can be characterized as follows:

1) System A is the baseline system, which is the search engine currently installed at the local university Web site.
2) System B is our system, which uses the automatically extracted domain model to guide a user in the navigation of search results. Here the domain entities and relations are represented in visual graphs with various colour codes.

Both systems index the same document collection (they both use Nutch as a backend system). System A is the UKSearch system [26], [27]. This system also suggests query modification terms to refine and relax the query (presented as a flat list of links). We assume this is a fair comparison because previous experiments on the same university Web site have shown that users clearly prefer a search engine that makes suggestions over a Google-style search engine [27]. We therefore consider the current search engine a sensible baseline to compare against.

We will now explain the experimental procedure and then discuss the results. While conducting the evaluation, users were not told anything about the underlying differences between the two systems.
For the task-based evaluation we used the questionnaires introduced by the TREC-9 interactive series. These four questionnaires were employed:

1) Entry Questionnaire
2) Postsearch Questionnaire
3) Postsystem Questionnaire
4) Exit Questionnaire

A. Procedure

According to the TREC interactive track guidelines, at least 16 participants and 8 search tasks are required to conduct an evaluation that compares two systems in a task-based


Figure 2. Dialogue system screenshot

evaluation. We recruited 16 students from the university population (actual target users in this context). Search tasks were designed based on query logs obtained from the university's intranet search engine. The terms in parentheses are the queries found in the logs on which the tasks were based. These terms were not included in the instructions for the subjects. Two sample tasks are:

filled in the exit questionnaire.

V. RESULTS AND DISCUSSION

Of our 16 participants, 13 were male and 3 were female. They were studying in various departments and covered a range of ages (between 19 and 32) and experience (e.g. between 4 and 12 years of online search experience). Most of the users were postgraduate students studying for a Masters degree or a PhD, and the remainder were undergraduates. For the question on searching behavior, the majority of subjects (13) selected 5 and the remaining ones selected 4 (where 5 indicates daily and 4 indicates weekly). Among our participants, 8 users agreed that they enjoy carrying out information searches.

After completion of each task, users filled in the postsearch questionnaire. The following questions with 5-point Likert scale ratings were used for both systems (where 1 indicates "not at all" and 5 indicates "extremely"). To assess significance, t-tests were conducted for comparison wherever necessary:

1) Are you familiar with this topic?
2) Was it easy to get started on this search?
3) Was it easy to do the search on this topic?
4) Are you satisfied with your search results?
5) Did you have enough time to do an effective search?

For the question "Was it easy to get started on this search?" users preferred System B (but without statistical significance). With regard to the question "Was it easy to do the search on this topic?" users also found it easier to search on

Task 1 (course): Imagine you are an undergraduate student and wish to study for a master in economics at Essex. Find a document that provides details of various MSc programs in economics.

Task 5 (car parking): Imagine you are attending a seminar at the university. Please find a document which gives details about visitor car parking areas and applicable charges.

We explained the experimental setup and showed one example on both systems before the evaluation process. Users started with the entry questionnaire. Each subject was then asked to perform four search tasks on System A and the remaining four on System B (or the other way round). Tasks, systems and subjects were permuted based on the Latin square matrix used by [28]. Subjects were given 5 minutes to perform each task. After performing each search task the users had to fill in the postsearch questionnaire. Along with the questionnaire, they were asked to submit their answer and rate their task success. When all four tasks were finished on one system, users were given the postsystem questionnaire to fill in. In the end, users



Table I

          Familiar  Easy to Start  Easy to Search  Satisfied  Enough Time
System A  3.32      4.09           4.09            4.53       4.68
System B  3.09      4.26           4.45            4.57       4.68

Table II

Parameter               System A  System B  No Difference
Easier to learn to use  4         6         6
Easier to use           1         11        4
Best                    4         10        2

This evaluation is our first validation of the outlined idea, which promotes the use of deep NLP to extract facts and relations from document collections that can then be used to guide a user who searches the collection. Users were overall more satisfied with a system that makes use of extracted facts and relations when communicating results to the user. Based on the simple task-based evaluation reported here, NLP-based search in document collections appears to be a promising route. Furthermore, we see a lot of potential in combining NLP techniques with state-of-the-art visualization methods.

VI. CONCLUSIONS AND FUTURE WORK

We presented a task-based evaluation assessing the usefulness of incorporating a dialogue component in a search system. We particularly targeted local Web sites such as university intranets. The sort of dialogue system we applied makes use of small pieces of knowledge extracted from the document collection (and linked in a simple term network) that can then be mapped against the query. We found that the general idea of such guided search offers a lot of potential.

This work can obviously only be a first step. There are a number of limitations in such a study and we will take the findings as a guideline for future work. We will investigate a variety of routes. First of all, the system we investigated used a visual representation. We will continue doing so but will enrich the dialogue by adding more of a real NLP dialogue that paraphrases the knowledge found in the database. We also aim to put a prototype of the search engine online so that we can address a number of limitations that user studies such as the one presented here face. Finally, the knowledge extraction process is still not perfect and we are still working on finding the right balance between the quality and the quantity of relations and entities extracted from the documents.
ACKNOWLEDGMENTS

We would like to thank the anonymous reviewers for very helpful feedback on an earlier version of the paper. This work is partially supported by the AutoAdapt research project. AutoAdapt is funded by EPSRC grants EP/F035357/1 and EP/F035705/1.

REFERENCES


System B. This difference is statistically significant when all 8 search tasks are considered (p < 0.01), which indicates the usefulness of the domain model suggestions. For the next question, "Are you satisfied with your search results?", users were also more satisfied with the results returned by System B. Table I summarizes the results. For most of the above questions System B was slightly better than System A (though not always with statistical significance). The table also illustrates that sufficient time was allocated for the tasks.

Users filled in the postsystem questionnaire after completing four search tasks on one system. When we compared the values for the two systems, the differences between them were marginal.

Finally, users submitted an exit questionnaire. For the question "Which of the two systems did you find easier to use?" 11 users picked System B and only one user opted for System A. Furthermore, 10 users selected System B as the best system overall, 4 users selected System A and 2 users did not find any difference between them. Table II shows that System B scored better overall than the baseline system. The results of the exit questionnaire demonstrate the potential that this type of guided search offers in the context of a university intranet.

We also asked for additional user feedback in the exit questionnaire. Users liked the idea of visualizing search terms in a graph, and most of the users did in fact select the query options suggested by System B despite the varying quality of the extracted knowledge (this noise was also commented on by a user). However, in this evaluation we did not target the quality of the extracted knowledge (which is a separate issue). In our evaluation we displayed the first 10 entities from the database that matched the user query and did not make use of any frequency or ranking parameter. One user commented: "It is useful to know not to always stick to the same search engine as there are others that could be just as useful."
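For readers unfamiliar with the significance tests mentioned for the postsearch ratings, the following sketch shows how a t statistic can be computed for two sets of Likert ratings. The paper does not state which t-test variant was used, so the paired form here is an assumption; the function name `paired_t` is hypothetical and the ratings are made-up illustration values, not data from the study.

```python
import math
from statistics import mean, stdev


def paired_t(ratings_a, ratings_b):
    """Return (t statistic, degrees of freedom) for paired samples.

    t = mean(d) / (sd(d) / sqrt(n)), where d are the pairwise differences
    (b - a) and sd is the sample standard deviation.
    """
    if len(ratings_a) != len(ratings_b) or len(ratings_a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [b - a for a, b in zip(ratings_a, ratings_b)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    n = len(diffs)
    return mean(diffs) / (sd / math.sqrt(n)), n - 1


# Made-up 5-point ratings for illustration only (NOT the study's data).
system_a = [4, 3, 4, 5, 3, 4, 4, 3]
system_b = [5, 4, 4, 5, 4, 5, 4, 4]

t, df = paired_t(system_a, system_b)
print(f"t = {t:.3f}, df = {df}")
```

The resulting t value is compared against a t distribution with the given degrees of freedom to obtain a p-value; the Python standard library has no t-distribution CDF, so that step would need a statistics package or a lookup table.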







