How to Train an AI Chatbot With Custom Knowledge Base Using ChatGPT API

In our earlier article, we demonstrated how to build an AI chatbot with the ChatGPT API and assign a role to personalize it. But what if you want to train the AI on your own data? For example, you may have a book, financial data, or a large database that you want to search with ease. In this article, we bring you an easy-to-follow tutorial on how to train an AI chatbot on your custom knowledge base with LangChain and the ChatGPT API. We will use LangChain, GPT Index, and other powerful libraries to train the AI chatbot on OpenAI’s Large Language Model (LLM). So on that note, let’s check out how to train and create an AI chatbot using your own dataset.

Notable Points Before You Train AI with Your Own Data

1. You can train the AI chatbot on any platform, whether Windows, macOS, Linux, or ChromeOS. In this article, I’m using Windows 11, but the steps are nearly identical for other platforms.

2. The guide is meant for general users, and the instructions are explained in simple language. So even if you have a cursory knowledge of computers and don’t know how to code, you can easily train and create a Q&A AI chatbot in a few minutes. If you followed our previous ChatGPT bot article, it would be even easier to understand the process.

3. Since we are going to train an AI Chatbot based on our own data, it’s recommended to use a capable computer with a good CPU and GPU. However, you can use any low-end computer for testing purposes, and it will work without any issues. I used a Chromebook to train the AI model using a book with 100 pages (~100MB). However, if you want to train a large set of data running into thousands of pages, it’s strongly recommended to use a powerful computer.

4. Finally, the data set should be in English to get the best results, but according to OpenAI, it will also work with popular international languages like French, Spanish, German, etc. So go ahead and give it a try in your own language.

Set Up the Software Environment to Train an AI Chatbot

Install Python and Pip

1. First off, you need to install Python along with Pip on your computer by following our linked guide. Make sure to enable the checkbox for “Add Python.exe to PATH” during installation.


2. To check if Python is properly installed, open the Terminal on your computer and run the below commands one by one; each will output its version number. On Linux and macOS, you will have to use python3 instead of python from now onwards.

python --version
pip --version

3. Run the below command to update Pip to the latest version.

python -m pip install -U pip

Install OpenAI, GPT Index, PyPDF2, and Gradio Libraries

1. Open the Terminal and run the below command to install the OpenAI library.

pip install openai

2. Next, let’s install GPT Index.

pip install gpt_index==0.4.24

3. Now, install LangChain by running the below command.

pip install langchain==0.0.148

4. After that, install PyPDF2 and PyCryptodome to parse PDF files.

pip install PyPDF2
pip install PyCryptodome

5. Finally, install the Gradio library. This is meant for creating a simple UI to interact with the trained AI chatbot.

pip install gradio
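
Alternatively, you can combine all of the above installs into a single command with the same pinned versions (purely a convenience; the result is identical):

pip install openai gpt_index==0.4.24 langchain==0.0.148 PyPDF2 PyCryptodome gradio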

Download a Code Editor

Finally, we need a code editor to edit some of the code. On Windows, I would recommend Notepad++ (Download). Simply download and install the program via the attached link. You can also use VS Code on any platform if you are comfortable with powerful IDEs. Other than VS Code, you can install Sublime Text (Download) on macOS and Linux.

For ChromeOS, you can use the excellent Caret app (Download) to edit the code. We are almost done setting up the software environment, and it’s time to get the OpenAI API key.

Get the OpenAI API Key For Free

1. Head to OpenAI’s website (visit) and log in. Next, click on “Create new secret key” and copy the API key. Do note that you can’t copy or view the entire API key later on. So it’s recommended to copy and paste the API key to a Notepad file for later use.

2. Next, go to platform.openai.com/account/usage and check if you have enough credit left. If you have exhausted all your free credit, you need to add a payment method to your OpenAI account.

Train and Create an AI Chatbot With Custom Knowledge Base

Add Your Documents to Train the AI Chatbot

1. First, create a new folder called docs in an accessible location like the Desktop. You can choose another location as well according to your preference. However, keep the folder name docs.

2. Next, move the documents you want to train the AI on into the “docs” folder. You can add multiple text or PDF files (even scanned ones). If you have a large table in Excel, you can import it as a CSV or PDF file and then add it to the “docs” folder (see the short conversion snippet after the note below). You can also add SQL database files, as explained in this Langchain AI tweet. I haven’t tried many file formats besides the ones mentioned, but you can add and check on your own. For this article, I am adding one of my articles on NFTs in PDF format.

Note: If you have a large document, it will take a longer time to process the data, depending on your CPU and GPU. In addition, it will quickly use your free OpenAI tokens. So in the beginning, start with a small document (30-50 pages or < 100MB files) to understand the process.
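
Tip: if your table lives in an Excel workbook, a quick way to convert it to CSV is with pandas. Below is a minimal sketch, not part of the main setup; it assumes you have installed pandas (and openpyxl, which pandas uses for .xlsx files) via pip, and the file names are placeholders:

import pandas as pd

# Hypothetical file names; replace them with your own
pd.read_excel("table.xlsx").to_csv("docs/table.csv", index=False)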

Make the Code Ready

1. Now, open a code editor like Sublime Text or launch Notepad++ and paste the below code. Once again, I have adapted the code shared by armrrs on Google Colab and tweaked it to make it compatible with PDF files and add a Gradio interface on top.

from gpt_index import SimpleDirectoryReader, GPTListIndex, GPTSimpleVectorIndex, LLMPredictor, PromptHelper
from langchain.chat_models import ChatOpenAI
import gradio as gr
import sys
import os

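# Paste the OpenAI API key you generated earlier between the quotes below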
os.environ["OPENAI_API_KEY"] = 'Your API Key'

def construct_index(directory_path):
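    # Prompt helper settings: context window size, response token budget, chunk overlap, and chunk size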
    max_input_size = 4096
    num_outputs = 512
    max_chunk_overlap = 20
    chunk_size_limit = 600

    prompt_helper = PromptHelper(max_input_size, num_outputs, max_chunk_overlap, chunk_size_limit=chunk_size_limit)

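    # Use OpenAI's gpt-3.5-turbo chat model through LangChain's wrapper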
    llm_predictor = LLMPredictor(llm=ChatOpenAI(temperature=0.7, model_name="gpt-3.5-turbo", max_tokens=num_outputs))

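    # Load every file inside the given folder (text, PDF, and more)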
    documents = SimpleDirectoryReader(directory_path).load_data()

    index = GPTSimpleVectorIndex(documents, llm_predictor=llm_predictor, prompt_helper=prompt_helper)

    index.save_to_disk('index.json')

    return index

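# Answer a question by loading the saved index from disk and querying it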
def chatbot(input_text):
    index = GPTSimpleVectorIndex.load_from_disk('index.json')
    response = index.query(input_text, response_mode="compact")
    return response.response

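# A simple Gradio web UI with a text box for questions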
iface = gr.Interface(fn=chatbot,
                     inputs=gr.components.Textbox(lines=7, label="Enter your text"),
                     outputs="text",
                     title="Custom-trained AI Chatbot")

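# Build the index from the "docs" folder, then launch the UI (share=True also creates a public link)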
index = construct_index("docs")
iface.launch(share=True)

2. Next, click on “File” in the top menu and select “Save As…”. After that, set the file name to app.py and change the “Save as type” to “All types”. Then, save the file to the location where you created the “docs” folder (in my case, it’s the Desktop).

3. Make sure the “docs” folder and “app.py” are in the same location, as shown in the screenshot below. The “app.py” file will be outside the “docs” folder and not inside.

4. Come back to the code in Notepad++. Here, replace Your API Key with the key you generated above on OpenAI’s website (an optional way to avoid hardcoding the key is shown after step 5 below).

5. Finally, press “Ctrl + S” to save the code. You are now ready to run the code.
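
Optional: instead of hardcoding the key in app.py (step 4), you can set it as an environment variable in the Terminal before launching the script and delete the os.environ line from the code; both the OpenAI library and LangChain read the OPENAI_API_KEY environment variable automatically. This is just a sketch of the idea. On Windows (Command Prompt):

set OPENAI_API_KEY=Your API Key

On macOS and Linux:

export OPENAI_API_KEY="Your API Key"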

Create ChatGPT AI Bot with Custom Knowledge Base

1. First, open the Terminal and run the below command to move to the Desktop. It’s where I saved the “docs” folder and “app.py” file.

cd Desktop

2. Now, run the below command.

python app.py

3. It will start indexing the document using the OpenAI LLM. Depending on the file size, it will take some time to process the document. Once it’s done, an “index.json” file will be created on the Desktop. If the Terminal is not showing any output, do not worry; it might still be processing the data. For your information, it takes around 10 seconds to process a 30MB document.

4. Once the LLM has processed the data, you will find a local URL. Copy it.

5. Now, paste the copied URL into the web browser, and there you have it. Your custom-trained ChatGPT-powered AI chatbot is ready. To start, you can ask the AI chatbot what the document is about.

6. You can ask further questions, and the ChatGPT bot will answer from the data you provided to the AI. So this is how you can build a custom-trained AI chatbot with your own dataset. You can now train and create an AI chatbot based on any kind of information you want.

Manage the Custom AI Chatbot

1. You can copy the public URL and share it with your friends and family. The link will be live for 72 hours, but you also need to keep your computer turned on since the server instance is running on your computer.

2. To stop the custom-trained AI chatbot, press “Ctrl + C” in the Terminal window. If it does not work, press “Ctrl + C” again.

3. To restart the AI chatbot server, simply move to the Desktop location again and run the below command. Keep in mind, the local URL will be the same, but the public URL will change after every server restart.

python app.py

4. If you want to train the AI chatbot with new data, delete the files inside the “docs” folder and add new ones. You can also add multiple files, but make sure to add clean data to get a coherent response.

5. Now, run the code again in the Terminal, and it will create a new “index.json” file. Here, the old “index.json” file will be replaced automatically.

python app.py

6. To keep track of your tokens, head over to OpenAI’s online dashboard and check how much free credit is left.

7. Lastly, you don’t need to touch the code unless you want to change the API key or the OpenAI model for further customization.
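
For example, to try a different OpenAI chat model, you only need to change the model_name argument in app.py. Here is a one-line sketch (it assumes your API account has access to the model you pick; also note that gpt-4 costs more per token than gpt-3.5-turbo):

llm_predictor = LLMPredictor(llm=ChatOpenAI(temperature=0.7, model_name="gpt-4", max_tokens=num_outputs))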

Comments
  • John says:

    Got it working following your steps and it works like a charm. I have a question though:
    does adding a new PDF (along with existing ones) require re-running the .py app?
    Also, how do I add websites as a source along with PDFs? I could use this code to build a chatbot for Q&A on websites.

  • ipank says:

    Thank you Arjun, it works.
    But what if I want it to write a 3,000-word article?

  • PeB says:

    I had a compatibility problem between gpt_index and langchain, which is why the import did not work (import could not find BaseLanguageModel). I was able to solve it thanks to ChatGPT 🙂 with: pip install langchain==0.0.153

  • Corentin says:

    Hi Arjun
    Super document.

    I get an error message after running “python3 app.py” in the Terminal (no index.json is created):
    ImportError: cannot import name ‘BaseLanguageModel’ from ‘langchain.schema’ (/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/langchain/schema.py)

    When installing Python (Mac) there was no choice to “Add Python.exe to PATH”.

    Thank you

    • Arvin says:

      I encounter the same problem. No index.json is created.
      ImportError: cannot import name ‘BaseLanguageModel’ from ‘langchain.schema’ (C:\Users\arvinpedrosa\AppData\Local\Programs\Python\Python311\Lib\site-packages\langchain\schema.py)

    • Arvin says:

      I was able to resolve the same issue with the help of ChatGPT, which provided me with the correct code.

  • Cameron O'Rourke says:

    Installing the latest versions of the libraries, I had to make the following modifications to the code to get it to work.

    ```
    from llama_index import SimpleDirectoryReader, LLMPredictor, GPTVectorStoreIndex, PromptHelper, ServiceContext, load_index_from_storage, StorageContext
    from langchain.chat_models import ChatOpenAI

    import argparse
    import gradio as gr
    import os

    os.environ["OPENAI_API_KEY"] = '— API KEY —'

    parser = argparse.ArgumentParser(description="Launch chatbot")
    parser.add_argument('-t', '-train', action='store_true', help="Train the model")
    parser.add_argument('-i', '-input', default='docs', help='Set input directory path')
    parser.add_argument('-o', '-output', default='./gpt_store', help="Set output directory path")
    args = parser.parse_args()

    # define prompt helper
    max_input_size = 4096
    num_output = 1000  # number of output tokens
    max_chunk_overlap = 20

    prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)
    llm_predictor = LLMPredictor(llm=ChatOpenAI(temperature=0.7, model_name="gpt-3.5-turbo", max_tokens=num_output))
    service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)

    def construct_index():
        print("Constructing index...")
        # load in the documents
        docs = SimpleDirectoryReader(args.i).load_data()

        index = GPTVectorStoreIndex.from_documents(docs, service_context=service_context)

        # save index to disk
        index.set_index_id('vector_index')
        index.storage_context.persist(persist_dir=args.o)

        return index

    def chatbot(input_text):
        # If not already done, initialize 'index' and 'query_engine'
        if not hasattr(chatbot, "index"):
            # rebuild storage context and load index
            storage_context = StorageContext.from_defaults(persist_dir=args.o)
            chatbot.index = load_index_from_storage(service_context=service_context, storage_context=storage_context, index_id="vector_index")

            # Initialize query engine
            chatbot.query_engine = chatbot.index.as_query_engine()
            print("Context initialized")

        # Submit query
        response = chatbot.query_engine.query(input_text)
        return response.response

    iface = gr.Interface(fn=chatbot,
                         inputs=gr.Textbox(lines=7, label="Enter your text"),
                         outputs="text",
                         title="Custom-trained AI Chatbot")

    if args.t:
        construct_index()

    iface.launch(share=True)
    ```

    • Carlos Mercado says:

      Looks like it works with the new libs, but when querying using the web interface it throws the following error:
      raise ValueError(f"No existing {__name__} found at {persist_path}.")
      ValueError: No existing llama_index.storage.kvstore.simple_kvstore found at ./gpt_store\docstore.json.

      Could it be the way the app is called? I just ran it using "python app.py" in the same dir as the docs dir. Are there any arguments I need to use on the command line?

      Thanks!

      • Benjamin Sanders says:

        Run it the first time with python app.py -t to train the data first…

    • J says:

      Would it be possible for you to send me the code in another way? I tried copy-pasting and I don’t think it copied correctly.
      I am new to Python and don’t know all the syntax and such yet.

    • Arvin says:

      I used this code and it generated the following:

      File “C:\Users\arvinpedrosa\Desktop\pdx.py”, line 1
      “`
      ^
      SyntaxError: invalid character ‘“’ (U+201C)

  • Cameron says:

    I had to refer to the LlamaIndex 0.6.8 docs and alter the code like this to get it to work:

    ```
    from llama_index import SimpleDirectoryReader, LLMPredictor, GPTVectorStoreIndex, PromptHelper, ServiceContext, load_index_from_storage, StorageContext
    from langchain.chat_models import ChatOpenAI

    import gradio as gr
    import os

    os.environ["OPENAI_API_KEY"] = '— API KEY HERE —'
    # define prompt helper
    max_input_size = 4096
    num_output = 512  # number of output tokens
    max_chunk_overlap = 20

    prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)
    llm_predictor = LLMPredictor(llm=ChatOpenAI(temperature=0.7, model_name="gpt-3.5-turbo", max_tokens=num_output))
    service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)

    def construct_index(directory_path):
        # load in the documents
        docs = SimpleDirectoryReader(directory_path).load_data()

        index = GPTVectorStoreIndex.from_documents(docs, service_context=service_context)

        # save index to disk
        index.set_index_id('vector_index')
        index.storage_context.persist(persist_dir="./gpt_store")

        return index

    def chatbot(input_text):
        # If not already done, initialize 'index' and 'query_engine'
        if not hasattr(chatbot, "index"):
            # rebuild storage context and load index
            storage_context = StorageContext.from_defaults(persist_dir="./gpt_store")
            chatbot.index = load_index_from_storage(service_context=service_context, storage_context=storage_context, index_id="vector_index")

            # Initialize query engine
            chatbot.query_engine = chatbot.index.as_query_engine()

        # Submit query
        response = chatbot.query_engine.query(input_text)

        return response.response

    iface = gr.Interface(fn=chatbot,
                         inputs=gr.Textbox(lines=7, label="Enter your text"),
                         outputs="text",
                         title="Custom-trained AI Chatbot")

    index = construct_index("docs")  # comment out after 1st run if training docs aren't changing
    iface.launch(share=True)
    ```

    • James says:

      Hi Cameron! I’m working with your code sample and am getting: ModuleNotFoundError: No module named ‘langchain.base_language’

      I’m using: llama-index 0.6.1 and langchain 0.0.194. Any ideas?

      TIA!

  • Cooliest says:

    Would like to run this off the cloud… any advice?

  • Elena says:

    Great tutorial! How can I do the same, but instead of training with local docs, it’s a website URL?
    Thanks!

    • CuriousMind says:

      Even I am interested in knowing the same.
      If somebody could help, it would be really helpful

  • Ariful Haque says:

    I am getting a timeout error. Can anyone please help me figure out the issue?
    The error message is showing:
    “WARNING:urllib3.connectionpool:Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by ‘ConnectTimeoutError(, ‘Connection to api.openai.com timed out. (connect timeout=600)’)’: /v1/engines/text-embedding-ada-002/embeddings”

  • Fabiano says:

    I’m facing the error
    ImportError: cannot import name ‘RequestsWrapper’ from ‘langchain.utilities’ (C:\Users\famira\AppData\Local\Programs\Python\Python311\Lib\site-packages\langchain\utilities\__init__.py)

  • kenzab says:

    hello everyone , i’ve got this as an error:

    INFO:openai:error_code=None error_message=’You exceeded your current quota, please check your plan and billing details.’ error_param=None error_type=insufficient_quota message=’OpenAI API error received’ stream_error=False

    Does this mean that i have a problem with my api key ? should i pay or something ?
    Thanks in advance!!

  • kenzab says:

    hello everyone ,
    i have this as an error :
    INFO:openai:error_code=None error_message=’You exceeded your current quota, please check your plan and billing details.’ error_param=None error_type=insufficient_quota message=’OpenAI API error received’ stream_error=False

    does this mean that i have a problem with my api key ? should i pay or something ?
    thanks in advance!!

  • Ken says:

    I made the application run, but the chatbot is giving me information that doesn’t exist in the data I provided in the docs folder.
    How can I solve this issue?

    • MHermes says:

      I’m having the same issue

  • Yash says:

    I have done all the steps as mentioned, and it works as well, but I want to use this custom-trained bot in my Flutter project. Can you please tell me how to do that?

  • Dan says:

    I am getting this error:

    Traceback (most recent call last):
    File “/Users/dan/notes/Hackathon/chat_bot_1.py”, line 1, in
    from gpt_index import SimpleDirectoryReader, GPTListIndex, GPTSimpleVectorIndex, LLMPredictor, PromptHelper
    File “/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/gpt_index/__init__.py”, line 18, in
    from gpt_index.indices.common.struct_store.base import SQLDocumentContextBuilder
    File “/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/gpt_index/indices/__init__.py”, line 4, in
    from gpt_index.indices.keyword_table.base import GPTKeywordTableIndex
    File “/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/gpt_index/indices/keyword_table/__init__.py”, line 4, in
    from gpt_index.indices.keyword_table.base import GPTKeywordTableIndex
    File “/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/gpt_index/indices/keyword_table/base.py”, line 16, in
    from gpt_index.indices.base import DOCUMENTS_INPUT, BaseGPTIndex
    File “/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/gpt_index/indices/base.py”, line 23, in
    from gpt_index.indices.prompt_helper import PromptHelper
    File “/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/gpt_index/indices/prompt_helper.py”, line 12, in
    from gpt_index.langchain_helpers.chain_wrapper import LLMPredictor
    File “/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/gpt_index/langchain_helpers/chain_wrapper.py”, line 13, in
    from gpt_index.prompts.base import Prompt
    File “/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/gpt_index/prompts/__init__.py”, line 3, in
    from gpt_index.prompts.base import Prompt
    File “/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/gpt_index/prompts/base.py”, line 9, in
    from langchain.schema import BaseLanguageModel
    ImportError: cannot import name ‘BaseLanguageModel’ from ‘langchain.schema’ (/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/langchain/schema.py)

    • Sergio Cortes Satizabal says:

      Same here. Any idea?

    • Sergio Cortes Satizabal says:

      You can fix this issue with the following command

      pip install langchain==0.0.132

      • Maria says:

        Thank you so much!

  • lee says:

    Hi… Can someone help me with the errors below…

    C:\Users\DELL\Desktop>python app.py
    Traceback (most recent call last):
    File “C:\Users\DELL\Desktop\app.py”, line 3, in
    import gradio as gr
    ModuleNotFoundError: No module named ‘gradio’

    C:\Users\DELL\Desktop>python app.py
    Traceback (most recent call last):
    File “C:\Users\DELL\Desktop\app.py”, line 3, in
    import gradio as gr
    ModuleNotFoundError: No module named ‘gradio’

    C:\Users\DELL\Desktop>pip install gradio
    Collecting gradio
    Using cached gradio-3.28.3-py3-none-any.whl (17.3 MB)
    Collecting aiofiles (from gradio)
    Using cached aiofiles-23.1.0-py3-none-any.whl (14 kB)
    Requirement already satisfied: aiohttp in c:\users\dell\appdata\local\programs\python\python311\lib\site-packages (from gradio) (3.8.4)
    Collecting altair>=4.2.0 (from gradio)
    Using cached altair-4.2.2-py3-none-any.whl (813 kB)
    Collecting fastapi (from gradio)
    Using cached fastapi-0.95.1-py3-none-any.whl (56 kB)
    Collecting ffmpy (from gradio)
    Using cached ffmpy-0.3.0.tar.gz (4.8 kB)
    Installing build dependencies … done
    Getting requirements to build wheel … error
    error: subprocess-exited-with-error

    × Getting requirements to build wheel did not run successfully.
    │ exit code: 1
    ╰─> [18 lines of output]
    running egg_info
    writing ffmpy.egg-info\PKG-INFO
    writing dependency_links to ffmpy.egg-info\dependency_links.txt
    writing top-level names to ffmpy.egg-info\top_level.txt
    reading manifest file ‘ffmpy.egg-info\SOURCES.txt’
    writing manifest file ‘ffmpy.egg-info\SOURCES.txt’
    OSError: [Errno 9] Bad file descriptor

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
    File “C:\Users\DELL\AppData\Local\Programs\Python\Python311\Lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py”, line 353, in
    main()
    File “C:\Users\DELL\AppData\Local\Programs\Python\Python311\Lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py”, line 349, in main
    write_json(json_out, pjoin(control_dir, ‘output.json’), indent=2)
    File “C:\Users\DELL\AppData\Local\Programs\Python\Python311\Lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py”, line 31, in write_json
    with open(path, ‘w’, encoding=’utf-8′) as f:
    OSError: [Errno 9] Bad file descriptor
    [end of output]

    note: This error originates from a subprocess, and is likely not a problem with pip.
    error: subprocess-exited-with-error

    × Getting requirements to build wheel did not run successfully.
    │ exit code: 1
    ╰─> See above for output.

    note: This error originates from a subprocess, and is likely not a problem with pip.

    C:\Users\DELL\Desktop>

  • Ile says:

    Hey people, has anyone tried to limit the responses to the custom knowledge base only?

    • Srini says:

      Maybe, you can change the temperature to 0 (zero) and try.

  • David says:

    This code is working for plain text files, but I am getting an error in PyPDF2 when trying to use PDF files. I have tried a number of version combinations of the various packages in an attempt to resolve the problem. Any advice on how to address the issue?

    Error messages displayed when executing the code:
    Traceback (most recent call last):
    File “D:\anaconda3\lib\site-packages\PyPDF2\_reader.py”, line 1623, in _read_xref_tables_and_trailers
    xrefstream = self._read_pdf15_xref_stream(stream)
    File “D:\anaconda3\lib\site-packages\PyPDF2\_reader.py”, line 1752, in _read_pdf15_xref_stream
    self._read_xref_subsections(idx_pairs, get_entry, used_before)
    File “D:\anaconda3\lib\site-packages\PyPDF2\_reader.py”, line 1808, in _read_xref_subsections
    assert start >= last_end
    AssertionError

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
    File “C:\Users\nnnn\OneDrive\Desktop\DocBot\chatbot.py”, line 37, in
    index = construct_index(“docs”)
    File “C:\Users\nnnn\OneDrive\Desktop\DocBot\chatbot.py”, line 19, in construct_index
    documents = SimpleDirectoryReader(directory_path).load_data()
    File “D:\anaconda3\lib\site-packages\gpt_index\readers\file\base.py”, line 150, in load_data
    data = parser.parse_file(input_file, errors=self.errors)
    File “D:\anaconda3\lib\site-packages\gpt_index\readers\file\docs_parser.py”, line 30, in parse_file
    pdf = PyPDF2.PdfReader(fp)
    File “D:\anaconda3\lib\site-packages\PyPDF2\_reader.py”, line 319, in __init__
    self.read(stream)
    File “D:\anaconda3\lib\site-packages\PyPDF2\_reader.py”, line 1426, in read
    self._read_xref_tables_and_trailers(stream, startxref, xref_issue_nr)
    File “D:\anaconda3\lib\site-packages\PyPDF2\_reader.py”, line 1632, in _read_xref_tables_and_trailers
    raise PdfReadError(f”trailer can not be read {e.args}”)
    PyPDF2.errors.PdfReadError: trailer can not be read ()

    Current version of OS and libraries mentioned in this article:
    Windows 11
    Python – 3.10.11
    OpenAI – 0.27.6
    GPT Index – 0.4.24
    PyPDF2 – 3.0.0
    PyCryptodome – 3.17
    Gradio – 3.28.3

  • adi says:

    How can I extend the length of the output? It always shows a ~300 character maximum.

  • kima says:

    I have this error on macOS:
    from gpt_index import SimpleDirectoryReader, GPTListIndex, GPTSimpleVectorIndex, LLMPredictor, PromptHelper
    ImportError: No module named gpt_index

  • FaFa says:

    Getting the following error:

    ImportError: cannot import name ‘BaseLanguageModel’ from ‘langchain.schema’

    I’m guessing there’s an updated version of this part of the code:
    from langchain.chat_models import ChatOpenAI

    Any ideas?

    • Randall says:

      I got the same thing. Did you figure it out?

    • sandeep says:

      Run: pip install langchain==0.0.132

  • Marc says:

    Hi, thanks for the great work!
    Please allow two questions:
    1) When indexing 600 RTF files, the result says INFO:root:> [build_index_from_documents] Total LLM token usage: 0 tokens, INFO:root:> [build_index_from_documents] Total embedding token usage: 28470527 tokens
    Why is it that the LLM is not being used here?
    2) What’s the downside of using gpt_index==0.4.24 against the most current version… and what can we do to make the code work with the most current version?

    Thanks
    Marc

  • Julio Falcon says:

    Hello, good morning

    I’m having this problem when I try to run app.py:

    File “C:\Users\julio\AppData\Local\Programs\Python\Python311\Lib\site-packages\gpt_index\prompts\base.py”, line 9, in
    from langchain.schema import BaseLanguageModel
    ImportError: cannot import name ‘BaseLanguageModel’ from ‘langchain.schema’ (C:\Users\julio\AppData\Local\Programs\Python\Python311\Lib\site-packages\langchain\schema.py)

    Somebody have any idea how to solve it?

    • Joao says:

      I had the same issue 🙁

    • Sanchita Sarker says:

      I am getting the same error; would appreciate it if someone could help me get unblocked.

      • Peat says:

        Needs a specific version of langchain:
        pip install langchain==0.0.153
        (credit goes to a StackOverflow page I can’t find anymore)

        Also make sure that your gpt-index version is the one mentioned in the article.

    • Ile says:

      In the end, the problem for me was that the new version of llama-index doesn’t have the GPTSimpleVectorIndex class. You can get around that by installing an older version –> pip install llama-index==0.5.27 and it should work.

    • Rob Lind says:

      To fix this (I had the same problem!): the class name has changed, so what you need to do is edit the base.py file on line 9, as the error suggests. Change line 9 to the following:
      from langchain.schema import BaseModel as BaseLanguageModel

      Then it should all work!

    • Max Herrington says:

      figured it out

      go here
      Python\Python311\Lib\site-packages\gpt_index\prompts\base.py

      and change this: from langchain.schema import BaseLanguageModel

      to this: from langchain.base_language import BaseLanguageModel

    • chris says:

      I had a similar issue and resolved it by downgrading the langchain version. Code below.
      pip install langchain==0.0.153

    • Federico says:

      I had the same issue when installing gpt_index==0.4.24.

      I had to go back to the screenshot and install the packages manually.
      pip uninstall llama_index
      pip uninstall langchain

      pip install langchain==0.0.132
      pip install openai==0.27.4
      pip install tiktoken==0.3.3
      pip install langchain==0.0.132

      This worked for me.

    • Ivan says:

      fixed this with

      pip install langchain==0.0.118
      and

      pip install gpt_index==0.4.24

    • Bryan says:

      same issue. Anyone have some ideas?

    • simon says:

      I have the same error. I am 100% sure I have followed all the instructions as written.

    • shakobe says:

      Running this fixed the BaseLanguageModel error for me:

      pip install langchain==0.0.118

  • Jack says:

    Thank you for the very easy-to-understand tutorial!
    Wonderful!

    Is there an easy way (tutorial) to make the chatbot I trained public not just for 72 hours but permanently?
    That would be fantastic 🙂
    Thx!

    • Sebastian says:

      I would be interested in that as well.
      As I understand it, this creates a fine-tuned ChatGPT.
      Is this available on OpenAI, or is it always part of the created index?

  • Ubai says:

    Thank you very much, working perfectly. How do I change the code so that instead of reading a preloaded file, it can upload a new PDF file?

    • sandeep says:

      Place the new PDF file in the docs folder.
