Knowledge Graphs + GPT

Image: a laptop screen showing HTML text in a dark room. From Pexels by Markus Spiske.

By now, anyone who has dabbled in language models like GPT has seen the massive potential of ChatGPT. But it’s currently limited: it doesn’t know about anything after mid-2021, and it lacks specific domain knowledge in some areas.

Today, while thinking about its limitations, I wanted to understand why GPT couldn’t be a knowledge base expert. I believe it can. This post may be outdated in a month, if not already, but I believe we will start to see ChatGPT used to consume knowledge graphs.

What is a knowledge graph?

A knowledge graph is a type of data storage format that captures information about entities, concepts, and their relationships in a structured way.

Figure: Simple knowledge graph example showing the relationship between entities.

Knowledge graphs contain information about how entities are connected and allow you to express additional properties about those entities. Properties can themselves be linked to ideas (concepts), which helps give meaning to the relationships.
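To make the idea concrete, here is a minimal sketch of a knowledge graph stored as subject-predicate-object triples, with a separate dictionary for entity properties. The entities, predicates, and helper names (`build_index`, `neighbors`) are invented for illustration; a real project would more likely use a graph database or an RDF library.

```python
from collections import defaultdict

# Each fact is a (subject, predicate, object) triple.
# All names here are made up for illustration.
triples = [
    ("Alice", "works_at", "Acme"),
    ("Alice", "knows", "Bob"),
    ("Bob", "works_at", "Acme"),
    ("Acme", "located_in", "Berlin"),
]

# Additional properties attached to individual entities.
properties = {
    "Alice": {"type": "Person", "role": "Engineer"},
    "Bob": {"type": "Person", "role": "Designer"},
    "Acme": {"type": "Company"},
}


def build_index(triples):
    """Index outgoing edges by subject: subject -> list of (predicate, object)."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[subject].append((predicate, obj))
    return index


def neighbors(index, entity, predicate=None):
    """Return the objects linked from `entity`, optionally filtered by predicate."""
    return [
        obj
        for pred, obj in index.get(entity, [])
        if predicate is None or pred == predicate
    ]


index = build_index(triples)
print(neighbors(index, "Alice"))            # ['Acme', 'Bob']
print(neighbors(index, "Bob", "works_at"))  # ['Acme']
```

The relationships live in the triples and the descriptive details live in `properties`, which mirrors the split between connections and attributes described above.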

Why is this important?

Plugging a knowledge graph into ChatGPT would give it additional context about data not yet in the model, such as new fields of science and medicine or common-law decisions that didn’t exist prior to mid-2021.

What it means for the future of AI

By coupling the power of GPT with knowledge graphs, we gain new capabilities:

Improved contextual understanding: More accurate responses to questions.
Enhanced reasoning capabilities: Better answers about fields of knowledge it didn’t have before.
Customized learning: ChatGPT could become a relative expert in law, medicine, finance, and engineering. It doesn’t do half bad at this already. According to a recent article, GPT-4 scored higher than the human average on one section of the bar exam.
Integration with external data sources: Cheaper than retraining the entire ChatGPT model (retraining costs upwards of $5M USD).

Now what?

I plan to build a knowledge graph of a book, chapter by chapter.

First, I need to confirm which format to use. ChatGPT understands the following formats:

RDF
OWL
JSON-LD – I think this one will be best, but I’m not yet positive.

ChatGPT also handles more pedestrian formats like tables and bullet points, but these don’t express relationships as explicitly. It also understands the entity-attribute-value format and semantic triples, which might prove useful.
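As a rough illustration of how these formats relate, the sketch below describes one character as JSON-LD and flattens it into semantic triples. The schema.org vocabulary is real, but the character, the example.org identifiers, and the `to_triples` helper are assumptions made for this example. The flattening is deliberately simplified and does not perform real JSON-LD term expansion.

```python
import json

# A character described as JSON-LD. The schema.org vocabulary is real;
# the character and the example.org identifiers are invented for illustration.
character = {
    "@context": "https://schema.org",
    "@id": "https://example.org/characters/alice",
    "@type": "Person",
    "name": "Alice",
    "knows": {"@id": "https://example.org/characters/bob"},
}


def to_triples(node):
    """Flatten one JSON-LD-style node into (subject, predicate, object) triples.

    Simplified: handles only plain values and {"@id": ...} references,
    and does not expand terms against the @context.
    """
    subject = node["@id"]
    triples = []
    for key, value in node.items():
        if key in ("@context", "@id"):
            continue
        predicate = "rdf:type" if key == "@type" else key
        obj = value["@id"] if isinstance(value, dict) else value
        triples.append((subject, predicate, obj))
    return triples


print(json.dumps(character, indent=2))
for triple in to_triples(character):
    print(triple)
```

The JSON-LD form keeps everything about one entity together, while the triple form makes each relationship a separate statement. Either could be pasted into a prompt.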

Once I have chosen a format, I’ll send ChatGPT one chapter at a time and have it identify the entities, ideas, and other attributes of characters and events that I should collect. I plan to use something like GPTBoss, which supports recursively calling ChatGPT (using GPT-4).
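The chapter-by-chapter loop might look something like the sketch below. It is not tied to GPTBoss or any particular API: `ask_model` is a hypothetical callable that takes a prompt string and returns the model's reply as a string, and the prompt wording and the JSON reply shape are assumptions. A fake model stands in here so the example runs offline.

```python
import json
from typing import Callable, Dict, List

# Hypothetical prompt wording; adjust to whatever format is chosen.
EXTRACTION_PROMPT = (
    "Identify the entities and relationships in the chapter below. "
    "Reply with only a JSON list of objects with the keys "
    '"subject", "predicate", "object".\n\n'
    "Chapter:\n{chapter}"
)


def extract_graph(
    chapters: List[str], ask_model: Callable[[str], str]
) -> List[Dict[str, str]]:
    """Send chapters one at a time and merge the returned triples, skipping duplicates."""
    seen = set()
    graph = []
    for chapter in chapters:
        reply = ask_model(EXTRACTION_PROMPT.format(chapter=chapter))
        try:
            items = json.loads(reply)
        except json.JSONDecodeError:
            continue  # reply was not valid JSON; skip this chapter
        if not isinstance(items, list):
            continue  # reply was JSON but not the expected list
        for item in items:
            key = (item["subject"], item["predicate"], item["object"])
            if key not in seen:
                seen.add(key)
                graph.append(item)
    return graph


def fake_model(prompt: str) -> str:
    """Stand-in for a real model call; always returns the same single triple."""
    return json.dumps([{"subject": "Alice", "predicate": "meets", "object": "Bob"}])


graph = extract_graph(["Chapter one text...", "Chapter two text..."], fake_model)
print(graph)
```

Passing the model call in as a function keeps the loop independent of any one service, so the same code could be pointed at a different model or wrapper later.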

After I’m able to create a knowledge graph from a chapter, I’ll send it to ChatGPT and start asking it questions about the characters, relationships, and more complex meanings/ideas from the book.
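For the question-answering step, one simple approach is to turn the graph into plain sentences and place them in the prompt ahead of the question. This is only a sketch under that assumption: the function names, the prompt wording, and the example triples are invented, and the resulting string would still need to be sent to the model by whatever client is in use.

```python
def graph_to_context(triples):
    """Render (subject, predicate, object) triples as one plain sentence per line."""
    lines = [f"{s} {p.replace('_', ' ')} {o}." for s, p, o in triples]
    return "\n".join(lines)


def build_question_prompt(triples, question):
    """Combine the graph facts and a question into a single prompt string."""
    return (
        "Answer using only the facts below. If the facts are not enough, say so.\n\n"
        f"Facts:\n{graph_to_context(triples)}\n\n"
        f"Question: {question}"
    )


# Made-up triples standing in for a graph extracted from a chapter.
chapter_graph = [
    ("Alice", "works_at", "Acme"),
    ("Alice", "knows", "Bob"),
]

prompt = build_question_prompt(chapter_graph, "Where does Alice work?")
print(prompt)
```

For a whole book the graph may not fit in a single prompt, so selecting only the triples that mention entities in the question would be a natural next refinement.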