by Paul Näger, 30.10.2024
While current Large Language Models (LLMs) such as GPT can answer philosophical questions with impressive breadth, there are limits to their detailed knowledge. When pushed on facts they have no information about, they either start hallucinating or become repetitive.
Systems with retrieval-augmented generation (RAG) try to solve this problem by enhancing LLMs with specific and detailed information. In the present case I have made the complete Stanford Encyclopedia of Philosophy (SEP) accessible to an LLM (by crawling the SEP and storing it as a corpus in a vector database). When a user asks a question, the system first searches the SEP corpus for relevant passages (“retrieval”) and passes these passages as background information to the LLM (“augmented”) together with the user question. When generating the answer (“generation”), the LLM can draw on the relevant background information and give a more specific, detailed and informed answer.
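The core query flow of such a system can be sketched in a few lines. The following is a minimal, hypothetical sketch assuming a chromadb vector store and the OpenAI chat API; the actual prototype may use a different stack, prompt wording and parameters:

```python
# Minimal RAG query flow (sketch): retrieve relevant SEP passages from a
# vector database, then pass them to the LLM together with the question.
import chromadb
from openai import OpenAI

chroma = chromadb.PersistentClient(path="sep_db")     # hypothetical path to the indexed corpus
collection = chroma.get_or_create_collection("sep")   # SEP sections, indexed beforehand
llm = OpenAI()

def rag_answer(question: str, n_texts: int = 5, model: str = "gpt-4o-mini") -> str:
    # 1. Retrieval: find the n_texts most similar SEP sections.
    hits = collection.query(query_texts=[question], n_results=n_texts)
    passages = hits["documents"][0]

    # 2. Augmentation: prepend the retrieved passages to the user question.
    context = "\n\n".join(f"Text {i}:\n{p}" for i, p in enumerate(passages))
    prompt = (
        "Answer the question using the background texts below. "
        "Refer to the texts you use (Text 0, Text 1, ...).\n\n"
        f"{context}\n\nQuestion: {question}"
    )

    # 3. Generation: the LLM answers with the background information in its prompt.
    response = llm.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```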
I have written the code for the complete system and a prototype is now running. An impression of the backend (code) can be gained here:
The system’s frontend (user interface) is designed as follows:
In the options section one can choose different parameters: the LLM that operates the system (gpt-4o-mini, gpt-4o, Llama-3.5-70B, mixtral-7B), a prompt token limit (restricting the amount of additional text that can be added to the question, thereby also limiting costs), a persona statement (an effective prompt engineering strategy) and the number of texts to be retrieved from the SEP, where a text here is a main section of an article. Finally, there is a field for the user question.
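These options roughly correspond to a small configuration object. A minimal sketch in Python; the field names, defaults and message layout are my own hypothetical choices, not the prototype’s actual code:

```python
from dataclasses import dataclass

@dataclass
class RagOptions:
    model: str = "gpt-4o-mini"        # which LLM operates the system
    prompt_token_limit: int = 4000    # cap on the retrieved text added to the prompt
    persona: str = "You are an expert in academic philosophy."  # persona statement
    n_texts: int = 5                  # number of SEP sections to retrieve

def build_messages(options: RagOptions, context: str, question: str) -> list[dict]:
    # The persona statement is sent as the system message; the retrieved
    # context and the user question form the user message.
    return [
        {"role": "system", "content": options.persona},
        {"role": "user", "content": f"{context}\n\nQuestion: {question}"},
    ]
```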
There are two output fields: the left field shows the answer that the LLM gives without additional information, while the right field shows the answer of the RAG system. This layout makes it easy to compare the quality of the two answering systems. In the above example the RAG system’s answer on priority monism is considerably more detailed, as the encyclopedia’s article on monism contains a long section on the topic. Usually, the answers of the RAG system are better the more specific and detailed the question is. The answer of the LLM alone tends to be better when one asks for an overview of a broader, less specific field (e.g. “What are the main arguments for/against mental causation?”).
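The side-by-side layout amounts to generating both answers for the same question. A short sketch, reusing the hypothetical llm client and rag_answer function from the sketch above:

```python
# Produce both answers for the same question, so they can be shown side by side.
def compare(question: str) -> dict:
    plain = llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": question}],
    ).choices[0].message.content          # left field: LLM alone
    augmented = rag_answer(question)      # right field: RAG system
    return {"llm_only": plain, "rag": augmented}
```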
In contrast to the answer generated by the LLM alone, the RAG answer also refers to the source texts for the presented information (Text 0, …). The legend for these texts can be found in the table below the answer fields. The table lists the retrieved relevant texts, where the field file_names indicates the short article name and sec_heading the section found in that article. The smaller the distance, the more relevant the text is for the question. The most relevant texts are included up to the text that exceeds the prompt token limit.
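The selection by distance and token budget could look roughly as follows. This is a sketch under assumptions: it uses the chromadb-style result dictionary from the pipeline sketch above, treats file_names and sec_heading as metadata keys (taken from the table’s column names), counts tokens with tiktoken, and stops before the text that would exceed the limit; the actual prototype may handle the boundary text differently:

```python
import tiktoken

def select_texts(hits: dict, prompt_token_limit: int) -> list[dict]:
    # Retrieved texts arrive sorted by ascending distance (most relevant first).
    # Keep adding them until the next text would exceed the prompt token limit.
    enc = tiktoken.get_encoding("cl100k_base")   # assumption: GPT-style tokenizer
    selected, used = [], 0
    for doc, meta, dist in zip(hits["documents"][0],
                               hits["metadatas"][0],
                               hits["distances"][0]):
        n_tokens = len(enc.encode(doc))
        if used + n_tokens > prompt_token_limit:
            break
        selected.append({"text": doc,
                         "file_names": meta.get("file_names"),      # short article name
                         "sec_heading": meta.get("sec_heading"),    # section heading
                         "distance": dist})
        used += n_tokens
    return selected
```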
This short presentation shows that the RAG system has the potential to become an important research tool for professional philosophers (as well as for interested laypersons).
(The prototype was demonstrated at the DGPhil conference, Sep 2024, Münster, Germany.)