Building Multi-Hop Query Response Systems for Documents Using Knowledge Graphs

Published by Chandrashekar B N, Manjunath Ramachandra, Raghavan Solium, Vinutha B N

Wipro Tech Blogs
Oct 26, 2021

Knowledge graphs provide powerful tools and techniques for the representation of enterprise data. However, they fail to work well when sentences are too long or too short, or when they lack the traditional structure of entities and the relations binding them. This limitation can be circumvented with our proprietary knowledge graphs, which store a collection of words, sentences, or even paragraphs as leaf nodes, bound together through their commonality rather than an explicit relation.

Graphs are not built for all of the data. Instead, they are dynamically generated for a specific query. The graphs are eventually fed to a Graph Attention Network to deduce the proximity of sentences and entities to the query. However, prior-art techniques have two limitations: they are trained over datasets such as WikiHop and HotpotQA, and they require hyperlinking of the entities. Given a domain-specific proprietary document for querying, they do not work well. In the following paragraphs, we provide a domain-agnostic mechanism to link the sentences related to the query and thereby overcome the aforementioned limitations.

Traditional Q & A Systems

Query-response systems, which understand user questions and provide relevant answers, often on par with a human expert, play a significant role in automation. The degree of accuracy and acceptability depends upon the nature of the query and the business use case. Questions come in various forms, such as factoids, comparisons, analyses, complex questions, and common-sense or external-knowledge questions, that require inferencing, context, and computation. They may also involve images, videos, audio, and other media.

If the question is a factoid whose answer is available in one place in a document, it is easy for the model to respond. On the other hand, if the question is complex or the response is scattered across the document(s), it is difficult to produce an accurate response. For example, the query “show me all the issues faced by the user in the screen guard of the new mobile phone model XYZ” needs to be interpreted contextually. In this case, the system would have to run over user reviews from social media or contact-center transcriptions and isolate only the problems associated with the screen guard from among several sentences scattered across the document.

While the tools and techniques existing today address factoid and direct questions, responding to complex and multi-hop questions is still in the research and pilot stage.

Why Knowledge Graphs?

Most of the efforts to address this problem make use of knowledge graphs at some point in the pipeline. Knowledge graphs provide a graphical representation of content spanning text, image, audio, or any other media, although their most common usage is with text. Traditional knowledge graphs are generated by identifying the entities and the relations binding them. However, not all English sentences have entities with well-defined relations. Such sentences, although critical, may get dropped from the graphs. By relaxing the definition of the knowledge graph, it is possible to include entities, sentences, paragraphs, etc. as leaf nodes at different levels of a hierarchy and bind them under a common theme or context when a well-defined relationship does not exist. This approach is followed in the next sections.

How Do We Use the Knowledge Graphs?

Two approaches are followed for the use of knowledge graphs in Q&A. In the first approach, a graph is built for the entire document a priori. This approach is practical and useful if there is a large number of documents that do not change very frequently and the users' queries are fixed, although the query wordings can differ.

For example, consider queries about the procedure for getting a credit card. Typically such queries are direct, much like frequently asked questions: “I have lived abroad for the past three years. Would I be eligible for a credit card?” This is different from questions such as “which business units have performed well in this quarter compared to the last quarter?” Generally, a response to such questions is not mentioned explicitly in the document. Several paragraphs that speak about the different business units, attrition, revenues, customer wins, customer churn, profit, etc. have to be considered. Subsequently, a comparison needs to be made based on these parameters, often involving formula-based computations. This calls for the second approach: the dynamic build-up of the knowledge graph. It builds a hierarchy of entities, relevant sentences, and paragraphs that link to the query, before screening them to generate the precise response. Several variants of multi-hop Q&A systems are available as pilot projects; however, most of them are trained over hyperlinked documents such as WikiHop/HotpotQA. These variants perform poorly when ingested with proprietary documents. The model detailed here addresses this problem.

We ingest the sentences and paragraphs into a graph generator module before feeding them to the Graph Attention Network. We also plan to use multiple models, merge the sentences shortlisted by each of them, build a knowledge graph from the merged set, and run queries over it. Experiments performed over comparative questions have shown better results, as detailed below.

The Multi-Hop Model

To answer questions where the response is scattered across multiple paragraphs [1], the Hierarchical Graph Network (HGN) model is used. A multi-level graph is built using different types of nodes, such as paragraphs, sentences, and entities, to determine the answer to the given question. Contextual encoders are used for node representation. The context, along with the query and the data, is passed through the Graph Attention Network [5] to perform multi-hop reasoning, and then through MLPs to produce the answer via paragraph selection, supporting sentences, and answer prediction.

The idea here is to shortlist the paragraphs relevant to the query using one or more paragraph retrievers, then relate the sentences in each paragraph and, subsequently, select the relevant entities in each sentence. The knowledge graph is built with the superset of response-containing sentences, and paragraphs not contributing to the response are filtered out. First, the document is pre-processed and made query-able. Accordingly, Q&A based on the knowledge graph approach has several stages. The data pipeline is indicated in Figure 1.

Figure 1. Components of the model.
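Before detailing each stage, here is a rough sketch of how the pipeline of Figure 1 might compose end to end. Every function name below is a hypothetical placeholder for a stage described in the following sections, not actual code from our system:

```python
# Illustrative pipeline skeleton; all helper names are hypothetical
# placeholders for the stages described in the sections below.
# Stage 1 (pre-processing and indexing) is assumed to have run offline.
def answer_query(query: str):
    candidates = retrieve_paragraphs(query)           # 2. Solr full-text search
    candidates += retrieve_connected(candidates)      # 3. linked paragraphs
    top = rerank(query, candidates)                   # 4. paragraph selector/re-ranker
    graph = build_hierarchical_graph(query, top)      # 5. knowledge graph
    node_reprs = encode_context(query, top, graph)    # 6. context-based filter
    updated = graph_reasoning(node_reprs, graph)      # 7. attention-based reasoning
    return predict_answer(updated)                    # 8-9. inferencing + response
```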

1. Pre-Processing: Data Extraction and Indexing

In this stage, various parts of the given document, such as section headers, captions, detailed text, images, graphs, and charts, are identified using state-of-the-art AI models. The text is then formatted with bulleted lists removed; otherwise, the bulleted items would be treated as a single long, often meaningless sentence. Hyphens are removed as well.

From each of the identified parts, text data is extracted and inserted into Solr, a fast search platform that pre-indexes the data. The data is exposed in such a way that any system can consume it for various analytical purposes.
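As a minimal sketch of this step, using the standard pysolr client, the extracted paragraphs could be pushed into Solr as follows. The core name and field names are illustrative assumptions; the actual schema is not part of this post:

```python
import pysolr

# Hypothetical core name; the actual Solr setup is not described here.
solr = pysolr.Solr("http://localhost:8983/solr/annual_reports", always_commit=True)

def index_paragraphs(paragraphs):
    """Insert extracted paragraphs into Solr so they become query-able."""
    docs = [
        {"id": f"para-{i}", "section": p["section"], "text": p["text"]}
        for i, p in enumerate(paragraphs)
    ]
    solr.add(docs)  # indexes the documents for full-text search
```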

2. Paragraph Retrieval

The Solr indices are searched for the text content most relevant to the query. Solr's output is based on a full-text search. Solr returns “n” paragraphs, where “n” is configurable; for our experiments, n is set to 10. The returned paragraphs are ranked in descending order of relevance.
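Continuing the pysolr sketch above (the core URL and the “text” field remain assumptions), retrieving the top-n paragraphs looks roughly like this:

```python
def retrieve_paragraphs(query, n=10):
    """Full-text search over the indexed paragraphs; n is configurable."""
    results = solr.search(f"text:({query})", rows=n)  # ranked by Solr relevance
    return [doc["text"] for doc in results]           # descending order of relevance
```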

3. Connected Paragraph Retrieval

As we are working on proprietary documents, the paragraphs are plain text and generally do not contain hyperlinks, as they do in Wikipedia articles. Instead, the entities are linked to paragraphs. The relevant paragraphs need to be selected for each question as part of the pre-processing step.

Here, additional paragraphs that are linked to the Solr-generated paragraphs, and that may be useful for generating the response, are retrieved from the document. These paragraphs are given to the paragraph selector, or re-ranker, in the next stage.

4. Paragraph Selector or Re-Ranker

The answer to the query is assumed to lie in two of the candidate input paragraphs. These two candidate paragraphs are connected in such a way that both contain an answer span, and reasoning has to be performed on these two spans to determine the answer. So, instead of passing along all the paragraphs selected by the paragraph retriever, they are re-ranked, based on the context, to find the relevant answer-containing paragraphs. This can be achieved using paragraph-ranker models. Here, we use a pre-trained Cross Encoder model, which finds the passages that are relevant to the query.
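A minimal sketch of this re-ranking step, using the CrossEncoder class from the sentence-transformers library. The checkpoint name below is one publicly available cross encoder chosen for illustration; the post does not state which pre-trained model was used:

```python
from sentence_transformers import CrossEncoder

# Publicly available checkpoint, chosen for illustration only.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query, paragraphs, top_k=4):
    """Score every (query, paragraph) pair and keep the top_k best."""
    scores = reranker.predict([(query, p) for p in paragraphs])
    ranked = sorted(zip(paragraphs, scores), key=lambda pair: pair[1], reverse=True)
    return [p for p, _ in ranked[:top_k]]
```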

We are currently targeting only comparative questions. These questions require reasoning or comparison to be performed between two or more entities (values), which may be found in multiple paragraphs and used to answer the query. For example,

a) Between Cyber Security revenues and Digital revenues, which grew more?

b) Among Consulting revenues, IT export revenues, BFSI revenues, and Digital revenues, which grew more?

To re-rank the paragraphs for comparative questions like those above, statistical methods for text similarity can be used. We have experimented with the Jaccard and TF-IDF [3] techniques, and they have shown good results. This can be attributed to the fact that text similar or identical to the question text is expected to be present in the answer paragraphs. We plan to use the weighted average of the Jaccard and TF-IDF scores to re-rank the paragraphs, since it gives better results.

For the query “between cyber security revenues and digital revenues, which revenues grew more?”, below are the Jaccard and TF-IDF scores for the individual sentences in two candidate paragraphs.

a) “Our Digital revenues grew by 32% YoY”,

“Our largest deal win to date of $1.5 billion is a testimony to the capabilities we have in enterprise-scale modernization & transformation”,

“Our AI-First strategy and differentiated assets such as Data Discovery Platform are being well received in the market which is reflected in the double-digit growth of DAAI (10% YoY in constant currency)”,

“Our big bet in Cybersecurity is central to our Trust pillar”.

The Jaccard scores for these selected sentences with respect to the query are [0.133, 0.0, 0.071, 0.0], and the TF-IDF scores are [0.480, 0.0, 0.030, 0.0].

b) “We are scaling assets such as our Cyber Defence Assurance Platform and working with security ecosystem partners and governing bodies”,

“Cyber security as a service offering which forms 4% of the revenues grew at 16% YoY (in constant currency) in FY 2019”,

“There were several green shoots in our overall performance as we built the momentum consistently through the year”,

“On a full-year basis, we grew 5.4% in constant currency”.

The Jaccard scores for the above four selected sentences are [0.074, 0.138, 0.0, 0.045], and the TF-IDF scores are [0.090, 0.273, 0.0, 0.060].

As can be seen from the above scores, the Jaccard and TF-IDF techniques correctly give higher scores to the relevant sentences that can answer the query. One approach to paragraph re-ranking is to take, for each technique, the sum of the scores of all sentences in the paragraph, and then the weighted average of the Jaccard and TF-IDF sums.
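A sketch of this scoring scheme, assuming scikit-learn for the TF-IDF part, a naive whitespace tokenization, and equal weights for the two techniques (the post does not state the weights used, so the exact scores above will differ):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def jaccard(a, b):
    """Word-overlap Jaccard coefficient between two strings."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def paragraph_score(query, sentences, w_jac=0.5, w_tfidf=0.5):
    """Sum per-sentence scores, then combine the techniques as a weighted average."""
    vectorizer = TfidfVectorizer().fit([query] + sentences)
    tfidf = cosine_similarity(
        vectorizer.transform([query]), vectorizer.transform(sentences)
    )[0]
    jac = [jaccard(query, s) for s in sentences]
    return w_jac * sum(jac) + w_tfidf * float(tfidf.sum())
```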

These statistical approaches to re-ranking reduce computation, since no AI model is used, and avoid any additional training. This is a significant step, as the method can be applied to any document without restrictions.

5. Data Organization Over Knowledge Graphs

The graph is built using the question, paragraphs, sentences, and entities as nodes. The final paragraphs from the paragraph re-ranker form the paragraph nodes, and their sentences and entities form the remaining nodes. As the graph is formed with different categories of nodes, it is called a hierarchical graph. The following edges exist in the graph; some of them are indicated in Figure 2.

Figure 2. Knowledge graph organization.

a) Edges between question node and paragraph nodes.

b) Edges between paragraph nodes:

  • If a paragraph is linked or connected to another paragraph, then they are connected by an edge.

c) Edges between paragraph nodes and sentence nodes

  • These are connections between a paragraph node and its sentences.

d) Edges between sentence nodes and sentence nodes

  • These are connections between sentence nodes across all the paragraphs. As the answer to the query is found in multiple sentences, all the sentences need to be connected or linked so that the Graph Attention Network can find the answer nodes using their contextual relation. In addition, each sentence node is connected to its previous and next neighboring sentences.

e) Edges between sentence nodes and entity nodes

  • These are connections between sentence nodes and their corresponding entities.

As per the pre-trained Hierarchical Graph Network [1], the numbers of entities, sentences, and paragraphs in one graph can be 60, 40, and 4, respectively. These numbers are configurable, but the pre-trained model has been shown to give optimal results with these parameter settings.
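A minimal construction sketch using networkx. The node-naming scheme and the input structure are illustrative assumptions; only the neighboring-sentence variant of edge type (d) is drawn, and the paragraph-to-paragraph edges of type (b) are omitted for brevity:

```python
import networkx as nx

def build_hierarchical_graph(question, paragraphs):
    """Build question/paragraph/sentence/entity nodes with edges (a), (c), (d), (e)."""
    g = nx.Graph()
    g.add_node("Q", kind="question", text=question)
    for pi, para in enumerate(paragraphs[:4]):           # at most 4 paragraph nodes
        p_id = f"P{pi}"
        g.add_node(p_id, kind="paragraph")
        g.add_edge("Q", p_id)                            # (a) question-paragraph
        prev = None
        for si, sent in enumerate(para["sentences"]):    # capped at 40 sentences overall
            s_id = f"{p_id}-S{si}"
            g.add_node(s_id, kind="sentence", text=sent["text"])
            g.add_edge(p_id, s_id)                       # (c) paragraph-sentence
            if prev is not None:
                g.add_edge(prev, s_id)                   # (d) neighboring sentences
            prev = s_id
            for ei, ent in enumerate(sent["entities"]):  # capped at 60 entities overall
                e_id = f"{s_id}-E{ei}"
                g.add_node(e_id, kind="entity", text=ent)
                g.add_edge(s_id, e_id)                   # (e) sentence-entity
    return g
```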

6. Context-Based Filter

The nodes of the graph are represented using embeddings from the context encoder, which uses the pre-trained Transformer RoBERTa and a bi-attention layer [4]. The question and selected paragraphs are concatenated and given to the context encoder to obtain the corresponding embeddings for the question, paragraph, sentence, and entity nodes. The representations of the nodes from the bi-attention layer are fed into a Multilayer Perceptron (MLP) to get the final representations. The representation of each paragraph is formed by concatenating its start-token and end-token representations before being given to the MLP; sentence and entity representations are formed the same way, from their respective start and end tokens.
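A hedged sketch of this encoding step with the Hugging Face transformers library. The bi-attention layer of [4] is skipped for brevity, and the span pooling and MLP head are simplified illustrations rather than the exact HGN layers:

```python
import torch
from transformers import RobertaModel, RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
encoder = RobertaModel.from_pretrained("roberta-base")

# MLP over the concatenated start/end token representations (2 x 768).
mlp = torch.nn.Sequential(
    torch.nn.Linear(2 * 768, 768),
    torch.nn.ReLU(),
    torch.nn.Linear(768, 768),
)

def node_representation(question, paragraph, span):
    """Encode question+paragraph together, then pool one node's token span."""
    inputs = tokenizer(question, paragraph, return_tensors="pt", truncation=True)
    hidden = encoder(**inputs).last_hidden_state     # shape (1, seq_len, 768)
    start, end = span                                # token offsets of the node
    span_repr = torch.cat([hidden[0, start], hidden[0, end - 1]])
    return mlp(span_repr)                            # final node representation
```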

7. Attention-Based Graph Reasoning

A pre-trained hierarchical network is used to perform reasoning over the hierarchical graph built in the previous section. The network first takes the contextualized representations of all the graph nodes and transforms them, using a Graph Neural Network, into higher-level features. To extract relations via graph propagation and to perform message passing over the edges of the graph, it uses a Graph Attention Network (GAT) [5]. The context representation from the context encoder and the graph representation from the graph neural network are merged via a gated mechanism.
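A minimal sketch of this reasoning step, using GATConv from PyTorch Geometric for the attention-based message passing of [5] and a simple sigmoid gate for the merge. The dimensions and single attention head are illustrative assumptions:

```python
import torch
from torch_geometric.nn import GATConv

class GraphReasoner(torch.nn.Module):
    def __init__(self, dim=768):
        super().__init__()
        self.gat = GATConv(dim, dim, heads=1)       # message passing over graph edges
        self.gate = torch.nn.Linear(2 * dim, dim)   # learns how much graph info to keep

    def forward(self, node_feats, edge_index):
        graph_feats = self.gat(node_feats, edge_index)  # higher-level graph features
        g = torch.sigmoid(self.gate(torch.cat([node_feats, graph_feats], dim=-1)))
        return g * graph_feats + (1 - g) * node_feats   # gated merge of both views
```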

8. Inferencing

The output of the graph attention network is the updated node representations, which have been processed through the network to capture the context and generate weights for answering the given query. These representations are passed through an MLP to infer the relative scores of the paragraph, sentence, and entity nodes. Each score is a value between 0 and 1, indicating the probability that the node answers the query. The scores of these nodes can be used to assemble the sentences of the paragraphs that answer the query. The MLP results for sentence nodes are also used to predict the supporting sentences the network relies on to answer the query.
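A sketch of this scoring head; the layer sizes and the 0.5 threshold are assumptions for illustration:

```python
import torch

# One scoring head per node type; a sigmoid maps the logit into (0, 1).
node_scorer = torch.nn.Sequential(
    torch.nn.Linear(768, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 1),
    torch.nn.Sigmoid(),
)

def supporting_sentences(sentence_reprs, sentences, threshold=0.5):
    """Keep sentences whose predicted probability of answering exceeds the threshold."""
    probs = node_scorer(sentence_reprs).squeeze(-1)   # shape (num_sentences,)
    return [s for s, p in zip(sentences, probs.tolist()) if p > threshold]
```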

9. Response Generator

The system uses a separate two-layer MLP to generate the answer span from the graph node representations produced by the graph attention network. This model predicts the start and the end of the answer span from the given graph representation of the paragraphs. The final result of the query thus includes the answer span, the supporting facts or sentences, and a confidence score.
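A hedged sketch of the span head: a two-layer MLP over token-level representations produces start and end logits, and a greedy argmax decoding picks the span. The decoding below is a simplification of typical span prediction, not the exact HGN procedure:

```python
import torch

span_head = torch.nn.Sequential(
    torch.nn.Linear(768, 768),
    torch.nn.ReLU(),
    torch.nn.Linear(768, 2),   # one logit each for span start and span end
)

def predict_span(token_reprs, tokens):
    """Greedy span decoding: best start, then best end at or after it."""
    logits = span_head(token_reprs)                    # shape (seq_len, 2)
    start = int(logits[:, 0].argmax())
    end = start + int(logits[start:, 1].argmax())      # end never precedes start
    return " ".join(tokens[start:end + 1])
```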

Implementation Details

The main idea behind this architecture is to answer multi-hop questions on proprietary documents with minimal or no domain-specific training. The pipeline should be able to handle documents from any domain, including domains the model has not been trained on. The pre-trained cross encoder is used for re-ranking the paragraphs.

We have used the Wipro annual reports as the domain-specific, user-defined documents for the multi-hop Q&A experiments. As discussed earlier, the current experiments are limited to multi-hop comparative questions, or multi-hop discrete-reasoning questions; the approach can easily be extended to other categories of multi-hop questions. The results of comparative Q&A on the annual reports are given below.

Question: “Between cyber security revenues and digital revenues, which revenues grew more?”

a) Using RoBERTa Model

Answer: Our Digital Revenues

Confidence Score: 0.7681

Supporting Sentences:

[“Our Digital revenues grew by 32% YoY”, “Our largest deal win to date of $1.5 billion is a testimony to the capabilities we have in enterprise-scale modernization & transformation”, “Our AI-First strategy and differentiated assets such as Data Discovery Platform are being well received in the market which is reflected in the double-digit growth of DAAI (10% YoY in constant currency)”, “Our big bet in Cybersecurity is central to our Trust pillar”.]

[“Organizations across the globe are undergoing unprecedented change and transformation in the businesses led by forces such as digital, increasing consumerization of IT”, “Emergence of new platforms such as cloud services and increasing disruptions and competition from new-age companies”, “Technology access and usage has been largely democratized and mainstreamed”, “There has been a profound change in how technology is developed delivered and consumed.”]

b) Using Cross Encoder Model

Answer: Digital revenues

Confidence Score: 0.5462

Supporting Sentences:

[“Cyber security as a service offering which forms 4% of the revenues grew at 16% YoY (in constant currency) in FY 2019”]

[“Our Digital revenues grew by 32% YoY”]

In this case, both answers are correct. Consider another question as a further test.

Question:

“Among IT export revenues, consulting revenues, cyber security revenues, which revenues grew more?”

a) Using RoBERTa Model

Answer: Our Digital revenues

Confidence Score: 0.9967

Supporting Sentences:

[“For the year, Digital grew from 27% of revenue in Q4 ’18 to 35% of revenues as of Q4 ’19”, “Consulting which is 7% of our Revenues grew by 19% YoY”, “Our biggest assets are our customer relationships and there is no better endorsement of our capabilities than the faith that our customers repose in us.”]

b) Using Cross Encoder Model

Answer: Consulting

Confidence Score: 0.8876

Supporting Sentences:

[“Cyber security as a service offering which forms 4% of the revenues grew at 16% YoY (in constant currency) in FY 2019”, “Consulting which is 7% of our Revenues grew by 19% YoY”, “Our Digital revenues grew by 32% YoY”, “According to the NASSCOM Report, IT export revenues from India grew by 8.3% to an estimated $136 billion in the fiscal year 2019.”]

Here, the answer given by the Cross Encoder Model, “Consulting”, is the correct answer. This example reveals that the domain-independent Cross Encoder Model has the potential to replace the RoBERTa model.

Future Directions

The mechanism provided here works for text documents, and for media objects and tables described with text. To handle queries over images, video, audio, etc. effectively, the leaf nodes of the knowledge graph are required to store the patterns or features of the image (similar to the entities of text) as attributes. These attributes can be generated through zero-shot learning, which binds the text description with image features. Such a multimedia knowledge graph can be queried with text, image patterns, or a blend of both. Another emerging trend is support for “logical queries”, especially over tables. To support such queries, neuro-symbolic logic, as well as changes to the word embeddings, is required to understand the relations.

References

1. Yuwei Fang et al., Hierarchical Graph Network for Multi-hop Question Answering.

2. Minjoon Seo et al., Bidirectional Attention Flow for Machine Comprehension.

3. Suthira Plansangket and John Q. Gan, A Query Suggestion Method Combining TF-IDF and Jaccard Coefficient for Interactive Web Search.

4. Minjoon Seo et al., Bidirectional Attention Flow for Machine Comprehension.

5. Petar Veličković et al., Graph Attention Networks.
