Techno Blender
Digitally Yours.

Bring Your Knowledge Base Into OpenAI’s GPTs



On November 6, 2023, OpenAI announced the release of GPTs. On this no-code platform, you can, as a professional (or hobbyist) developer, build customized GPTs or chatbots with your own tools and prompts, effectively changing how you interact with OpenAI’s GPT. Previously, retrieving responses from GPT required dynamic prompting through frameworks such as LangChain or LlamaIndex. Now, GPTs handle dynamic prompting for you by calling external APIs or tools.

This also changes how we (at MyScale) build RAG systems, from building prompts with server-side contexts to injecting these contexts into the GPTs model.

MyScale simplifies how you inject contexts into your GPTs. OpenAI’s approach, for instance, is to upload files to the GPT platform via a web UI. In contrast, MyScale lets you mix structured data filtering and semantic search using SQL WHERE clauses, process and store a much larger knowledge base at a lower cost, and share one knowledge base across multiple GPTs.

Try out MyScaleGPT now on the GPT Store, or integrate MyScale’s open knowledge base with your app today with our API hosted on Hugging Face.

BYOK: Bring Your Own Knowledge

GPT has evolved considerably over the past year and knows much more in the shared knowledge domain than it did when it was first released. However, there are still topics it knows nothing about or is uncertain about, such as domain-specific knowledge and current events. Therefore, as described in our earlier articles, integrating an external knowledge base stored in MyScale into GPT is essential to boosting its truthfulness and helpfulness.

We brought an LLM into our chain (or stack) when we built RAG with MyScale. This time, we need to bring the MyScale database to the GPTs platform. Unfortunately, GPTs cannot currently connect to MyScale directly, so we tweaked the query interface and exposed it as a REST API.

Building on our earlier success with OpenAI function calls, we can design a similar interface where GPT writes vector search queries with SQL-like filter strings. The parameters are described in OpenAPI as follows:
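A minimal sketch of such a parameter description might look like the fragment below. The field names (subject, where_str, limit) are illustrative assumptions for this sketch, not MyScale’s actual schema:

```yaml
# Illustrative OpenAPI parameter definitions for a vector search endpoint.
# Field names here are assumptions, not the actual hosted API's schema.
parameters:
  - name: subject
    in: query
    required: true
    schema:
      type: string
    description: Natural-language query text; it is embedded server-side
      and used for the vector similarity search.
  - name: where_str
    in: query
    required: false
    schema:
      type: string
    description: SQL-like filter string appended to the WHERE clause,
      for example "has(categories, 'cs.CL')".
  - name: limit
    in: query
    required: false
    schema:
      type: integer
      default: 4
    description: Maximum number of documents to return.
```

Because the description fields are read by the GPT itself, they double as prompts telling the model how each parameter should be filled.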

Providing Query Entries To Different Tables

We may sometimes have to query different tables. This can be implemented using separate API entries. Each API entry holds its own schema and prompts under its documentation. GPTs will read the applicable API documentation and write the correct queries to the corresponding table.

Notably, the methods we introduced before, like self-querying retrievers and vector SQL, require dynamic or semi-dynamic prompting to describe the table structure. GPTs, by contrast, work like conversational agents in LangChain, where the agent uses different tools to query different tables.

For instance, the API entries can be written in OpenAPI 3.0 as follows:
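The fragment below sketches what such per-table entries could look like. The path names, table columns, and operation IDs are assumptions for illustration, not the actual hosted API:

```yaml
# Illustrative OpenAPI 3.0 fragment: one path per table, each carrying
# its own schema description in the documentation the GPT reads.
openapi: 3.0.0
info:
  title: Knowledge base query API (sketch)
  version: "1.0"
paths:
  /get_related_papers:
    get:
      operationId: get_related_papers
      summary: Query the paper table.
      description: >
        Columns include abstract (String), authors (Array(String)),
        pubdate (DateTime32), and categories (Array(String)).
      parameters:
        - name: subject
          in: query
          schema: { type: string }
        - name: where_str
          in: query
          schema: { type: string }
      responses:
        "200":
          description: Matched documents formatted as plain text.
  /get_related_wiki:
    get:
      operationId: get_related_wiki
      summary: Query the encyclopedia table.
      parameters:
        - name: subject
          in: query
          schema: { type: string }
      responses:
        "200":
          description: Matched documents formatted as plain text.
```

Since each entry documents its own columns, the GPT can pick the right endpoint and write filters that match that table’s schema.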

After configuring the GPT Actions for knowledge base retrieval, we simply fill in the Instructions field, telling the GPT how to query the knowledge bases and then answer the user’s question:

Note: Do your best to answer the questions. Feel free to use any available tools to look up relevant information. Keep all details in the query when calling search functions. When querying MyScale knowledge bases for arrays of strings, use has(column, value) to match. For publication dates, use parseDateTime32BestEffort() to convert timestamp values from string format into date-time objects; NEVER apply this function to columns that are already date-time typed. Always add reference links to the documents you used.

Hosting Your Database as OpenAPI

GPTs consume APIs described under the OpenAPI 3.0 standard. Some applications, like databases, do not expose OpenAPI interfaces, so we need middleware to integrate GPTs with MyScale.

We have hosted our database behind OpenAPI-compatible interfaces on Hugging Face. We used flask-restx to simplify and automate the implementation, so the code is small, clean, and easy to read: app.py, funcs.py.

The good thing about this approach is that the prompts and the functions are bound together. You don’t need to overthink the combination of prompting, functionality, and extensibility; write it in a human-readable format, and that’s it. The GPT reads this documentation from the dumped OpenAPI JSON file.
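The principle of binding prompts to functions can be sketched in plain Python: the handler’s docstring doubles as the documentation the GPT reads, so prompt and implementation live in one place. This is only an illustration of the idea (flask-restx generates the real Swagger JSON for you); the decorator, registry, and get_related_docs function are hypothetical names for this sketch:

```python
import inspect
import json

# Registry mapping API paths to the documentation the GPT would read.
REGISTRY = {}

def api_doc(path):
    """Register a handler and expose its docstring as API documentation."""
    def wrap(fn):
        REGISTRY[path] = {
            "summary": inspect.getdoc(fn),
            "parameters": list(inspect.signature(fn).parameters),
        }
        return fn
    return wrap

@api_doc("/get_related_docs")
def get_related_docs(subject, where_str="", limit=4):
    """Query the knowledge base. Write SQL-like filters in where_str;
    use has(column, value) for array columns."""
    return []  # a real version would run the vector search against MyScale

# Dump the documentation in one place, the way a framework would dump
# its generated spec; prompt and function stay bound together.
spec = json.dumps(REGISTRY, indent=2)
```

Because the docstring travels with the function, editing the prompt and editing the endpoint are the same change.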

Note: flask-restx only generates APIs in Swagger 2.0 format. You must first convert them into OpenAPI 3.0 format with Swagger Editor. You can use our JSON API on Hugging Face as a reference.

GPT Running With Contexts From an API

With proper instructions, the GPT will carefully use special functions to handle different data types, for example, ClickHouse SQL functions like has(column, value) for array columns and parseDateTime32BestEffort(value) for timestamp columns.


After the GPT sends the correct query, the API constructs the vector search query using the filters as WHERE clause strings. The returned values are formatted into strings that serve as extra knowledge retrieved from the database. As the following code sample shows, this implementation is quite simple.
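A minimal sketch of this server-side step, assuming hypothetical table and column names (default.papers, vector, abstract) rather than the actual hosted schema, could look like:

```python
# Sketch: turn the GPT's filter string into a vector search query,
# then format the returned rows into one context string.

def build_query(query_vector, where_str="", limit=4):
    """Assemble a vector search query with an optional SQL-like filter."""
    where = f"WHERE {where_str} " if where_str else ""
    return (
        f"SELECT id, title, abstract, "
        f"distance(vector, {query_vector}) AS dist "
        f"FROM default.papers {where}"
        f"ORDER BY dist LIMIT {limit}"
    )

def format_context(rows):
    """Join retrieved rows into the string the GPT receives as knowledge."""
    return "\n".join(
        f"[{r['id']}] {r['title']}: {r['abstract']}" for r in rows
    )

# Example: a filter on an array column, exactly as the instructions
# above tell the GPT to write it.
sql = build_query([0.1, 0.2], where_str="has(categories, 'cs.CL')", limit=2)
```

The GPT only supplies the filter string; embedding the query text and assembling the final SQL stay on the server.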

GPTs are a significant improvement to OpenAI’s developer interface. Engineers no longer have to write much code to build their chatbots, and tools can now be self-contained with their prompts. We think creating an ecosystem for GPTs is a beautiful thing, and it will also encourage the open-source community to rethink existing ways of combining LLMs and tools.

We are very excited to dive into this new challenge, and as always, we are looking for new approaches to integrating vector databases like MyScale with LLMs. We firmly believe that bringing in an external knowledge base stored in a database will improve your LLM’s truthfulness and helpfulness.

