
Building an AI Assistant that fetches answers to user questions from a knowledge base

Introduction

Gone are the days when customer support meant being stuck on hold, listening to elevator music, and hoping to eventually speak to a human. AI-driven assistants don't get tired, don't take breaks, and—thankfully—don't play bad music.
In this article, we'll walk through the process of creating a virtual assistant that can instantly answer FAQ questions using the company's knowledge base. Our virtual assistant will not only perform keyword search but will also find thematically related documents, even if they don't contain any matching keywords.
For instance, if a user asks, "I've gained 3 kilos during the holiday season, how do I get rid of them?", the AI assistant queries the FAQ database and suggests the article titled "How to Get Fit".
We'll touch on the following questions:
  • How to organize your product or company's knowledge base.
  • How to teach a virtual assistant to retrieve information from this knowledge base and formulate answers to users' questions.
  • How to manage costs – leveraging the power of LLMs without paying a fortune to OpenAI or other LLM providers if you have a huge customer base.

Ingredients

Technologies we will use:
  • Chatwoot – open-source operator interface and knowledge base
  • Rasa – open-source framework for creating chatbots
  • Qdrant – vector database to store vector representations of the articles from the knowledge base
  • Datapipe – ETL tool that will help in fetching articles from Chatwoot, processing them, and putting them in Qdrant

Recipe

1. Content: Preparing your knowledge base

We love using Chatwoot on our projects. Usually, we use it as the operator interface when a conversation is handed over from the chatbot to a human. But apart from the operator interface, Chatwoot has a convenient knowledge base feature.
We added a feature to Chatwoot's knowledge base: for every FAQ article, we store several examples of real questions that users ask when they are looking for the answer this article provides.
The best practice for creating FAQ articles is to keep each article short and focused on a single topic.

2. Programming: Encoding all articles from knowledge base to vectors

We will determine whether an article is a good answer to a user query by calculating the semantic similarity between texts. To do this, the texts are converted into vectors; a similarity measure between the vectors (e.g., cosine similarity) then indicates how close their content is.
To store these vectors, we use Qdrant, a vector database.
Qdrant is optimized for high-performance vector operations, essential for rapid and accurate retrieval of semantically similar content.
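To make this concrete, here is a minimal sketch of comparing a user question with an article by cosine similarity. It assumes the multilingual-e5 encoder mentioned below; any sentence encoder with the same interface would work, and the example texts are illustrative.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# multilingual-e5 models expect "query: " / "passage: " prefixes on their inputs
model = SentenceTransformer("intfloat/multilingual-e5-base")

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Close to 1.0 = very similar meaning, close to 0 = unrelated
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

question = model.encode("query: I've gained 3 kilos during the holidays, how do I get rid of them?")
article = model.encode("passage: How to Get Fit. A short guide to losing weight after the holidays.")
print(cosine_similarity(question, article))  # the higher the score, the closer the meaning
```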
To encode article text into a vector and put it into Qdrant, we should address two challenges:
  1. Documents must be segmented so that each vector corresponds to one logical theme. This matters because the more text that is encoded, the more averaged and indistinct the resulting vector becomes, making it harder to identify any single theme in it. Therefore, the document should first be segmented into chunks, and there's no silver bullet for this. Usually, segmentation is done with structural heuristics, such as chapters and paragraphs, then refined with models (for example, Next Sentence Prediction, NSP), and finally reviewed by a human. In our FAQ context, this step wasn't necessary, since the answers were already short. However, to enrich the search field, we generated human-like questions for the answers and, for the questions (if any), created a synthetic "answer image." All of this is then converted into vectors and added to the examples for the target article.
  2. We must choose an effective method for vector generation. We used either an encoder from OpenAI or the multilingual-e5 model; both work well thanks to being trained on parallel corpora in several languages (a sketch of the encoding and upload step is shown after this list).
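Below is a hedged sketch of how articles and their example questions could be encoded and loaded into Qdrant. The collection name, payload fields, vector size, and ID scheme are illustrative assumptions, not the exact production setup.

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("intfloat/multilingual-e5-base")  # produces 768-dimensional vectors
client = QdrantClient(url="http://localhost:6333")

# Hypothetical collection name; cosine distance matches how we compare vectors
client.recreate_collection(
    collection_name="faq_articles",
    vectors_config=VectorParams(size=768, distance=Distance.COSINE),
)

# Toy article with the example questions attached in Chatwoot
articles = [
    {
        "id": 1,
        "title": "How to Get Fit",
        "text": "A short guide to losing weight after the holidays.",
        "questions": ["How do I lose the weight I gained over the holidays?"],
    },
]

points = []
for article in articles:
    # Encode the article text plus its example questions so every phrasing
    # of the same topic lands near the article in vector space.
    for i, chunk in enumerate([article["text"], *article["questions"]]):
        vector = model.encode(f"passage: {chunk}")  # e5 models expect a "passage: " prefix
        points.append(
            PointStruct(
                id=article["id"] * 1000 + i,  # simple unique ID scheme for the sketch
                vector=vector.tolist(),
                payload={"article_id": article["id"], "title": article["title"], "text": article["text"]},
            )
        )

client.upsert(collection_name="faq_articles", points=points)
```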

3. Programming: setting up FAQ service

The FAQ service implements a simple API.
It receives a user inquiry, converts it into a vector, and performs a vector search in Qdrant. It returns the top most relevant matches, along with the article headings and texts.
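A minimal sketch of such a service, assuming FastAPI and the collection from the previous step; the endpoint path and response shape are illustrative assumptions.

```python
from fastapi import FastAPI
from pydantic import BaseModel
from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer

app = FastAPI()
model = SentenceTransformer("intfloat/multilingual-e5-base")
qdrant = QdrantClient(url="http://localhost:6333")

class Query(BaseModel):
    query: str
    top_k: int = 3

@app.post("/search")
def search(q: Query):
    vector = model.encode(f"query: {q.query}")  # e5 models expect a "query: " prefix
    hits = qdrant.search(
        collection_name="faq_articles",
        query_vector=vector.tolist(),
        limit=q.top_k,
    )
    # Return the article headings and texts stored in the point payloads
    return {
        "results": [
            {"title": h.payload["title"], "text": h.payload["text"], "score": h.score}
            for h in hits
        ]
    }
```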

4. Programming: setting up chatbot assistant

We need a chatbot to receive questions from users and deliver answers.
To create a basic chatbot, we use Rasa, an open-source chatbot framework.
When the user writes to the Rasa chatbot, Rasa tries to determine the intent of the user query. If the identified intent is FAQ, Rasa forwards the query to the FAQ service using a custom action.
The FAQ service returns the list of related articles.
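Here is a hedged sketch of what that custom action might look like with the Rasa SDK; the FAQ service URL and its response format are assumptions carried over from the sketch above.

```python
# actions.py
from typing import Any, Dict, List, Text

import requests
from rasa_sdk import Action, Tracker
from rasa_sdk.executor import CollectingDispatcher

FAQ_SERVICE_URL = "http://faq-service:8000/search"  # hypothetical service address

class ActionFaqSearch(Action):
    def name(self) -> Text:
        return "action_faq_search"

    def run(self, dispatcher: CollectingDispatcher, tracker: Tracker,
            domain: Dict[Text, Any]) -> List[Dict[Text, Any]]:
        # Send the raw user message to the FAQ service and relay the article titles
        query = tracker.latest_message.get("text", "")
        resp = requests.post(FAQ_SERVICE_URL, json={"query": query, "top_k": 3}, timeout=5)
        articles = resp.json().get("results", [])
        if not articles:
            dispatcher.utter_message(text="Sorry, I couldn't find anything on that topic.")
            return []
        titles = "\n".join(f"- {a['title']}" for a in articles)
        dispatcher.utter_message(text="These articles may help:\n" + titles)
        return []
```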

5. Optional: free-form LLM answers (using RAG, retrieval-augmented generation)

At this point, we've retrieved the most relevant articles from the knowledge base that correspond to the user's inquiry.
We can now prompt an LLM to read the retrieved articles and generate an exact answer to the user's question.
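A minimal sketch of that generation step, assuming the OpenAI chat API and the retrieved-article format returned by the FAQ service above; the model name is only an example.

```python
from openai import OpenAI

client = OpenAI()

def generate_answer(question: str, articles: list[dict]) -> str:
    # Pack the retrieved articles into the prompt as the only allowed context
    context = "\n\n".join(f"## {a['title']}\n{a['text']}" for a in articles)
    response = client.chat.completions.create(
        model="gpt-4o",  # example model name
        messages=[
            {
                "role": "system",
                "content": "Answer the user's question using only the knowledge-base "
                           "articles below. If they don't contain the answer, say so.\n\n" + context,
            },
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content
```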
There's a serious drawback to this approach: the best LLMs, such as GPT-4, are rather expensive, and if you have a large number of support inquiries, LLM usage can cost you a lot.
That was the case for our client, so we turned off the generator component and kept only the answers in the form of a list of relevant knowledge-base articles, which doesn't call an expensive LLM for every request.

6. Programming: Making sure everything is updated

We have a few regular tasks to keep everything up to date.
  1. We must re-encode articles and update them in Qdrant whenever they change, get deleted, or when a new article appears. To do so, we use the ETL framework Datapipe, which tracks updates, deletions, and additions automatically, without the need to write tracking code. We run the ETL process every 15 minutes; if any content in the knowledge base changes, Datapipe captures the change and recalculates the vectors in Qdrant. So, new information is available to the chatbot within 15 minutes of being added to the knowledge base (a simplified sketch of the change-detection idea is shown after this list).
  2. We must make sure that Rasa identifies the FAQ intent correctly. So, when the user retrains the chatbot, Rasa is trained on a maximally dissimilar subset of the example questions, covering the entire search field with a small number of samples.
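Datapipe does this tracking for us out of the box; the sketch below only illustrates the underlying idea (hash-based change detection) and does not use Datapipe's actual API. The helpers `reencode_and_upsert` and `delete_from_qdrant` are hypothetical.

```python
import hashlib

def content_hash(article: dict) -> str:
    # Fingerprint of the article content; changes whenever the title or text change
    return hashlib.sha256((article["title"] + article["text"]).encode("utf-8")).hexdigest()

def sync(articles: list[dict], known_hashes: dict[int, str]) -> dict[int, str]:
    """Run every 15 minutes; known_hashes maps article ID -> hash from the previous run."""
    current = {a["id"]: content_hash(a) for a in articles}
    changed = [a for a in articles if known_hashes.get(a["id"]) != current[a["id"]]]
    deleted = [aid for aid in known_hashes if aid not in current]
    for article in changed:
        reencode_and_upsert(article)    # hypothetical: re-embed and upsert into Qdrant (step 2)
    for article_id in deleted:
        delete_from_qdrant(article_id)  # hypothetical: remove stale points
    return current
```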

Some project takeaways

As a result of the project, we have a forked version of Chatwoot that supports an AI assistant working on top of Chatwoot's knowledge base "out of the box," with no development needed.
If you're using Chatwoot, especially in manual mode with no chatbot automation, consider contacting us and switching to our Chatwoot fork to enable the assistant functionality.

Some statistics

As a result of the implementation, the share of support requests handled by the chatbot grew from 30% to 70%. The content team continues to add articles so the chatbot can handle more and more support requests.

Credits

Big thanks to the project team: Mariia Rodina, Rustam Karimov, Anton Grechkin, and Sergey Serov.
The project was supervised by Epoch8's CEO/CTO, Andrey Tatarinov.