In our earlier article, we demonstrated how to build an AI chatbot with the ChatGPT API and assign a role to personalize it. But what if you want to train the AI on your own data? For example, you may have a book, financial data, or a set of databases that you wish to search with ease. In this article, we bring you an easy-to-follow tutorial on how to train an AI chatbot on a custom knowledge base using LangChain and the ChatGPT API. We will use LangChain, GPT Index, and other powerful libraries to train the chatbot on OpenAI's Large Language Model (LLM). So on that note, let's check out how to train and create an AI chatbot using your own dataset.
Notable Points Before You Train AI with Your Own Data
1. You can train the AI chatbot on any platform, whether Windows, macOS, Linux, or ChromeOS. In this article, I’m using Windows 11, but the steps are nearly identical for other platforms.
2. The guide is meant for general users, and the instructions are explained in simple language. So even if you have a cursory knowledge of computers and don’t know how to code, you can easily train and create a Q&A AI chatbot in a few minutes. If you followed our previous ChatGPT bot article, it would be even easier to understand the process.
3. Since we are going to train the AI chatbot on our own data, it's recommended to use a capable computer with a good CPU and GPU. That said, any low-end computer will work for testing purposes — I used a Chromebook to train the model on a 100-page book (~100MB). However, if you want to train on a large set of data running into thousands of pages, it's strongly recommended to use a powerful computer.
4. Finally, the data set should be in English to get the best results, but according to OpenAI, it will also work with popular international languages like French, Spanish, German, etc. So go ahead and give it a try in your own language.
Set Up the Software Environment to Train an AI Chatbot
Install Python and Pip
1. First off, you need to install Python along with Pip on your computer by following our linked guide. Make sure to enable the checkbox for “Add Python.exe to PATH” during installation.
2. To check if Python is properly installed, open the Terminal on your computer. Once here, run the below commands one by one, and each will output its version number. On Linux and macOS, you will have to use python3 instead of python from here onwards.
python --version
pip --version
3. Run the below command to update Pip to the latest version.
python -m pip install -U pip
Install OpenAI, GPT Index, PyPDF2, and Gradio Libraries
1. Open the Terminal and run the below command to install the OpenAI library.
pip install openai
2. Next, let’s install GPT Index.
pip install gpt_index==0.4.24
3. Now, install Langchain by running the below command.
pip install langchain==0.0.148
4. After that, install PyPDF2 and PyCryptodome to parse PDF files.
pip install PyPDF2
pip install PyCryptodome
5. Finally, install the Gradio library. This is meant for creating a simple UI to interact with the trained AI chatbot.
pip install gradio
Download a Code Editor
Finally, we need a code editor to edit some of the code. On Windows, I would recommend Notepad++ (Download). Simply download and install the program via the attached link. You can also use VS Code on any platform if you are comfortable with powerful IDEs. Other than VS Code, you can install Sublime Text (Download) on macOS and Linux.
For ChromeOS, you can use the excellent Caret app (Download) to edit the code. We are almost done setting up the software environment, and it’s time to get the OpenAI API key.
Get the OpenAI API Key For Free
1. Head to OpenAI’s website (visit) and log in. Next, click on “Create new secret key” and copy the API key. Do note that you can’t copy or view the entire API key later on. So it’s recommended to copy and paste the API key to a Notepad file for later use.
2. Next, go to platform.openai.com/account/usage and check if you have enough credit left. If you have exhausted all your free credit, you need to add a payment method to your OpenAI account.
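Incidentally, pasting the key straight into a script makes it easy to leak. A safer pattern is to export it as an environment variable in your shell and let the script read it from there. A minimal stdlib sketch (the placeholder value below is made up for the demo):

```python
import os

# Pretend the key was exported in the shell beforehand, e.g.
#   export OPENAI_API_KEY="sk-..."
# (the value below is only a stand-in for this demo).
os.environ.setdefault("OPENAI_API_KEY", "sk-placeholder")

# Any script can now read the key without it being hard-coded in the file.
api_key = os.environ["OPENAI_API_KEY"]
print(api_key is not None)
```

This way the key never lands in a file you might share or commit somewhere.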
Train and Create an AI Chatbot With Custom Knowledge Base
Add Your Documents to Train the AI Chatbot
1. First, create a new folder called docs in an accessible location like the Desktop. You can choose another location as well, according to your preference. However, keep the folder name docs.
2. Next, move the documents for training inside the “docs” folder. You can add multiple text or PDF files (even scanned ones). If you have a large table in Excel, you can import it as a CSV or PDF file and then add it to the “docs” folder. You can also add SQL database files, as explained in this Langchain AI tweet. I haven’t tried many file formats besides the mentioned ones, but you can add and check on your own. For this article, I am adding one of my articles on NFT in PDF format.
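For instance, if your table lives in Excel, exporting it to CSV first keeps the parsing simple. A minimal stdlib sketch of preparing such a file inside the "docs" folder (the file and column names are made up for illustration):

```python
import csv
import os

# Make sure the training folder exists, then write the table as CSV into it.
os.makedirs("docs", exist_ok=True)
rows = [
    ["quarter", "revenue"],   # header row
    ["Q1", "120000"],
    ["Q2", "135000"],
]
with open(os.path.join("docs", "financials.csv"), "w", newline="") as f:
    csv.writer(f).writerows(rows)

# Show the header line that was written.
print(open(os.path.join("docs", "financials.csv")).readline().strip())
```

Anything you drop into "docs" this way gets picked up by the indexing step later on.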
Note: If you have a large document, it will take a longer time to process the data, depending on your CPU and GPU. In addition, it will quickly use your free OpenAI tokens. So in the beginning, start with a small document (30-50 pages or < 100MB files) to understand the process.
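To get a feel for the token cost before indexing, here is a rough back-of-the-envelope estimate (the pages, words-per-page, and tokens-per-word figures are assumed rule-of-thumb numbers, not exact values):

```python
# Rough token estimate for embedding a document (illustrative numbers only).
pages = 40
words_per_page = 500          # assumed average for a text-heavy page
tokens_per_word = 1.33        # common rule-of-thumb ratio for English text
estimated_tokens = int(pages * words_per_page * tokens_per_word)
print(estimated_tokens)  # → 26600
```

So even a modest 40-page document can consume tens of thousands of embedding tokens, which is why starting small is a good idea.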
Make the Code Ready
1. Now, open a code editor like Sublime Text or launch Notepad++ and paste the below code. Once again, I have taken great help from armrrs on Google Colab and tweaked the code to make it compatible with PDF files and create a Gradio interface on top.
from gpt_index import SimpleDirectoryReader, GPTListIndex, GPTSimpleVectorIndex, LLMPredictor, PromptHelper
from langchain.chat_models import ChatOpenAI
import gradio as gr
import sys
import os

os.environ["OPENAI_API_KEY"] = 'Your API Key'

def construct_index(directory_path):
    # Parameters controlling how documents are chunked and queried
    max_input_size = 4096
    num_outputs = 512
    max_chunk_overlap = 20
    chunk_size_limit = 600

    prompt_helper = PromptHelper(max_input_size, num_outputs, max_chunk_overlap, chunk_size_limit=chunk_size_limit)
    llm_predictor = LLMPredictor(llm=ChatOpenAI(temperature=0.7, model_name="gpt-3.5-turbo", max_tokens=num_outputs))

    # Read every file in the directory and build the vector index
    documents = SimpleDirectoryReader(directory_path).load_data()
    index = GPTSimpleVectorIndex(documents, llm_predictor=llm_predictor, prompt_helper=prompt_helper)
    index.save_to_disk('index.json')
    return index

def chatbot(input_text):
    # Load the saved index and answer the query from it
    index = GPTSimpleVectorIndex.load_from_disk('index.json')
    response = index.query(input_text, response_mode="compact")
    return response.response

iface = gr.Interface(fn=chatbot,
                     inputs=gr.components.Textbox(lines=7, label="Enter your text"),
                     outputs="text",
                     title="Custom-trained AI Chatbot")

index = construct_index("docs")
iface.launch(share=True)
2. Next, click on "File" in the top menu and select "Save As…". After that, set the file name to app.py and change the "Save as type" to "All types". Then, save the file to the location where you created the "docs" folder (in my case, it's the Desktop).
3. Make sure the “docs” folder and “app.py” are in the same location, as shown in the screenshot below. The “app.py” file will be outside the “docs” folder and not inside.
4. Come back to the code again in Notepad++. Here, replace Your API Key with the key you generated above on OpenAI's website.
5. Finally, press “Ctrl + S” to save the code. You are now ready to run the code.
Create ChatGPT AI Bot with Custom Knowledge Base
1. First, open the Terminal and run the below command to move to the Desktop. It’s where I saved the “docs” folder and “app.py” file.
cd Desktop
2. Now, run the below command.
python app.py
3. It will start indexing the document using the OpenAI LLM model. Depending on the file size, it will take some time to process the document. Once it’s done, an “index.json” file will be created on the Desktop. If the Terminal is not showing any output, do not worry, it might still be processing the data. For your information, it takes around 10 seconds to process a 30MB document.
4. Once the LLM has processed the data, you will find a local URL. Copy it.
5. Now, paste the copied URL into the web browser, and there you have it. Your custom-trained ChatGPT-powered AI chatbot is ready. To start, you can ask the AI chatbot what the document is about.
6. You can ask further questions, and the ChatGPT bot will answer from the data you provided to the AI. So this is how you can build a custom-trained AI chatbot with your own dataset. You can now train and create an AI chatbot based on any kind of information you want.
Manage the Custom AI Chatbot
1. You can copy the public URL and share it with your friends and family. The link will be live for 72 hours, but you also need to keep your computer turned on since the server instance is running on your computer.
2. To stop the custom-trained AI chatbot, press “Ctrl + C” in the Terminal window. If it does not work, press “Ctrl + C” again.
3. To restart the AI chatbot server, simply move to the Desktop location again and run the below command. Keep in mind, the local URL will be the same, but the public URL will change after every server restart.
python app.py
4. If you want to train the AI chatbot with new data, delete the files inside the “docs” folder and add new ones. You can also add multiple files, but make sure to add clean data to get a coherent response.
5. Now, run the code again in the Terminal, and it will create a new “index.json” file. Here, the old “index.json” file will be replaced automatically.
python app.py
6. To keep track of your tokens, head over to OpenAI’s online dashboard and check how much free credit is left.
7. Lastly, you don’t need to touch the code unless you want to change the API key or the OpenAI model for further customization.
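Step 4 above (clearing out the old training files) can also be scripted if you retrain often; a throwaway stdlib sketch, using the same "docs" folder name as in this guide:

```python
import os

# Empty the "docs" folder before adding fresh training files.
os.makedirs("docs", exist_ok=True)
for name in os.listdir("docs"):
    path = os.path.join("docs", name)
    if os.path.isfile(path):
        os.remove(path)

# Confirm that no files are left behind.
remaining = [n for n in os.listdir("docs") if os.path.isfile(os.path.join("docs", n))]
print(len(remaining))  # → 0
```

After running it, drop in the new documents and re-run app.py to rebuild index.json.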
Comments

Got it working following your steps, and it works like a charm. I have a question, though: does adding a new PDF (alongside the existing ones) require re-running the .py app? Also, how can I add websites as a source along with PDFs? This code could be used to build a chatbot for Q&A over websites.
Thank you Arjun, it works.
But what if I want it to write a 3,000-word article?
I had a compatibility problem between gpt_index and langchain, which is why the import did not work (the import could not find BaseLanguageModel). I was able to solve it thanks to ChatGPT 🙂 by running: pip install langchain==0.0.153
Hi Arjun
Super document.
I get error message after pasting “python3 app.py” in Terminal (no index.json is created)
ImportError: cannot import name ‘BaseLanguageModel’ from ‘langchain.schema’ (/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/langchain/schema.py)
When installing Python (Mac) there was no choice to “Add Python.exe to PATH”.
Thank you
I encounter the same problem. No index.json is created.
ImportError: cannot import name ‘BaseLanguageModel’ from ‘langchain.schema’ (C:\Users\arvinpedrosa\AppData\Local\Programs\Python\Python311\Lib\site-packages\langchain\schema.py)
I was now able to resolve the same issue with the help of ChatGPT, which provided me with the correct code.
Installing the latest versions of the libraries, I had to make the following modifications to the code to get it to work.
```
from llama_index import SimpleDirectoryReader, LLMPredictor, GPTVectorStoreIndex, PromptHelper, ServiceContext, load_index_from_storage, StorageContext
from langchain.chat_models import ChatOpenAI
import argparse
import gradio as gr
import os

os.environ["OPENAI_API_KEY"] = '— API KEY —'

parser = argparse.ArgumentParser(description="Launch chatbot")
parser.add_argument('-t', '-train', action='store_true', help="Train the model")
parser.add_argument('-i', '-input', default='docs', help='Set input directory path')
parser.add_argument('-o', '-output', default='./gpt_store', help="Set output directory path")
args = parser.parse_args()

# define prompt helper
max_input_size = 4096
num_output = 1000  # number of output tokens
max_chunk_overlap = 20
prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)

llm_predictor = LLMPredictor(llm=ChatOpenAI(temperature=0.7, model_name="gpt-3.5-turbo", max_tokens=num_output))
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)

def construct_index():
    print("Constructing index...")
    # load in the documents
    docs = SimpleDirectoryReader(args.i).load_data()
    index = GPTVectorStoreIndex.from_documents(docs, service_context=service_context)
    # save index to disk
    index.set_index_id('vector_index')
    index.storage_context.persist(persist_dir=args.o)
    return index

def chatbot(input_text):
    # If not already done, initialize 'index' and 'query_engine'
    if not hasattr(chatbot, "index"):
        # rebuild storage context and load index
        storage_context = StorageContext.from_defaults(persist_dir=args.o)
        chatbot.index = load_index_from_storage(service_context=service_context, storage_context=storage_context, index_id="vector_index")
        # Initialize query engine
        chatbot.query_engine = chatbot.index.as_query_engine()
        print("Context initialized")
    # Submit query
    response = chatbot.query_engine.query(input_text)
    return response.response

iface = gr.Interface(fn=chatbot,
                     inputs=gr.Textbox(lines=7, label="Enter your text"),
                     outputs="text",
                     title="Custom-trained AI Chatbot")

if args.t:
    construct_index()
iface.launch(share=True)
```
Looks like it works with the new libs, but when querying via the web interface it throws the following error:
raise ValueError(f”No existing {__name__} found at {persist_path}.”)
ValueError: No existing llama_index.storage.kvstore.simple_kvstore found at ./gpt_store\docstore.json.
Could it be the way the app is called? I just ran it using "python app.py" in the same directory as the docs dir. Are there any arguments I need to use on the command line?
Thanks!
Run it the first time with python app.py -t to train on the data first…
Would it be possible for you to send me the code another way? I tried copy-pasting, and I don't think it copied correctly.
I am new to Python and don't know all the syntax and such yet.
I used this code and it generated the following:
File “C:\Users\arvinpedrosa\Desktop\pdx.py”, line 1
“`
^
SyntaxError: invalid character ‘“’ (U+201C)
I had to refer to the LlamaIndex 0.6.8 docs and alter the code like this to get it to work:
```
from llama_index import SimpleDirectoryReader, LLMPredictor, GPTVectorStoreIndex, PromptHelper, ServiceContext, load_index_from_storage, StorageContext
from langchain.chat_models import ChatOpenAI
import gradio as gr
import os

os.environ["OPENAI_API_KEY"] = '— API KEY HERE —'

# define prompt helper
max_input_size = 4096
num_output = 512  # number of output tokens
max_chunk_overlap = 20
prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)

llm_predictor = LLMPredictor(llm=ChatOpenAI(temperature=0.7, model_name="gpt-3.5-turbo", max_tokens=num_output))
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)

def construct_index(directory_path):
    # load in the documents
    docs = SimpleDirectoryReader(directory_path).load_data()
    index = GPTVectorStoreIndex.from_documents(docs, service_context=service_context)
    # save index to disk
    index.set_index_id('vector_index')
    index.storage_context.persist(persist_dir="./gpt_store")
    return index

def chatbot(input_text):
    # If not already done, initialize 'index' and 'query_engine'
    if not hasattr(chatbot, "index"):
        # rebuild storage context and load index
        storage_context = StorageContext.from_defaults(persist_dir="./gpt_store")
        chatbot.index = load_index_from_storage(service_context=service_context, storage_context=storage_context, index_id="vector_index")
        # Initialize query engine
        chatbot.query_engine = chatbot.index.as_query_engine()
    # Submit query
    response = chatbot.query_engine.query(input_text)
    return response.response

iface = gr.Interface(fn=chatbot,
                     inputs=gr.Textbox(lines=7, label="Enter your text"),
                     outputs="text",
                     title="Custom-trained AI Chatbot")

index = construct_index("docs")  # comment out after 1st run if training docs aren't changing
iface.launch(share=True)
```
Hi Cameron! I’m working with your code sample and am getting: ModuleNotFoundError: No module named ‘langchain.base_language’
I’m using: llama-index 0.6.1 and langchain 0.0.194. Any ideas?
TIA!
I would like to run this from the cloud… any advice?
Great tutorial, how can I do the same but instead on training with localhost docs it’s a website url?
Thanks!
Even I am interested in knowing the same.
If somebody could help, it would be really helpful
I am getting a timeout error. Can anyone please help me figure out the issue?
The error message shows:
“WARNING:urllib3.connectionpool:Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by ‘ConnectTimeoutError(, ‘Connection to api.openai.com timed out. (connect timeout=600)’)’: /v1/engines/text-embedding-ada-002/embeddings”
I’m facing the error
ImportError: cannot import name ‘RequestsWrapper’ from ‘langchain.utilities’ (C:\Users\famira\AppData\Local\Programs\Python\Python311\Lib\site-packages\langchain\utilities\__init__.py)
Hello everyone, I've got this as an error:
INFO:openai:error_code=None error_message='You exceeded your current quota, please check your plan and billing details.' error_param=None error_type=insufficient_quota message='OpenAI API error received' stream_error=False
Does this mean that I have a problem with my API key? Should I pay or something?
Thanks in advance!!
I made the application run, but the chatbot is giving me information that doesn't exist in the data I provided in the docs folder.
How can I solve this issue?
I’m having the same issue
I have done all the steps as mentioned, and it works as well, but I want to use this custom-trained bot in my Flutter project. Can you please tell me how to do that?
I am getting this error:
Traceback (most recent call last):
File “/Users/dan/notes/Hackathon/chat_bot_1.py”, line 1, in
from gpt_index import SimpleDirectoryReader, GPTListIndex, GPTSimpleVectorIndex, LLMPredictor, PromptHelper
File “/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/gpt_index/__init__.py”, line 18, in
from gpt_index.indices.common.struct_store.base import SQLDocumentContextBuilder
File “/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/gpt_index/indices/__init__.py”, line 4, in
from gpt_index.indices.keyword_table.base import GPTKeywordTableIndex
File “/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/gpt_index/indices/keyword_table/__init__.py”, line 4, in
from gpt_index.indices.keyword_table.base import GPTKeywordTableIndex
File “/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/gpt_index/indices/keyword_table/base.py”, line 16, in
from gpt_index.indices.base import DOCUMENTS_INPUT, BaseGPTIndex
File “/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/gpt_index/indices/base.py”, line 23, in
from gpt_index.indices.prompt_helper import PromptHelper
File “/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/gpt_index/indices/prompt_helper.py”, line 12, in
from gpt_index.langchain_helpers.chain_wrapper import LLMPredictor
File “/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/gpt_index/langchain_helpers/chain_wrapper.py”, line 13, in
from gpt_index.prompts.base import Prompt
File “/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/gpt_index/prompts/__init__.py”, line 3, in
from gpt_index.prompts.base import Prompt
File “/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/gpt_index/prompts/base.py”, line 9, in
from langchain.schema import BaseLanguageModel
ImportError: cannot import name ‘BaseLanguageModel’ from ‘langchain.schema’ (/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/langchain/schema.py)
Same here. Any idea?
You can fix this issue with the following command
pip install langchain==0.0.132
Thank you so much!
Hi… Can someone help me with the below errors?
C:\Users\DELL\Desktop>python app.py
Traceback (most recent call last):
File “C:\Users\DELL\Desktop\app.py”, line 3, in
import gradio as gr
ModuleNotFoundError: No module named ‘gradio’
C:\Users\DELL\Desktop>pip install gradio
Collecting gradio
Using cached gradio-3.28.3-py3-none-any.whl (17.3 MB)
Collecting aiofiles (from gradio)
Using cached aiofiles-23.1.0-py3-none-any.whl (14 kB)
Requirement already satisfied: aiohttp in c:\users\dell\appdata\local\programs\python\python311\lib\site-packages (from gradio) (3.8.4)
Collecting altair>=4.2.0 (from gradio)
Using cached altair-4.2.2-py3-none-any.whl (813 kB)
Collecting fastapi (from gradio)
Using cached fastapi-0.95.1-py3-none-any.whl (56 kB)
Collecting ffmpy (from gradio)
Using cached ffmpy-0.3.0.tar.gz (4.8 kB)
Installing build dependencies … done
Getting requirements to build wheel … error
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> [18 lines of output]
running egg_info
writing ffmpy.egg-info\PKG-INFO
writing dependency_links to ffmpy.egg-info\dependency_links.txt
writing top-level names to ffmpy.egg-info\top_level.txt
reading manifest file ‘ffmpy.egg-info\SOURCES.txt’
writing manifest file ‘ffmpy.egg-info\SOURCES.txt’
OSError: [Errno 9] Bad file descriptor
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File “C:\Users\DELL\AppData\Local\Programs\Python\Python311\Lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py”, line 353, in
main()
File “C:\Users\DELL\AppData\Local\Programs\Python\Python311\Lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py”, line 349, in main
write_json(json_out, pjoin(control_dir, ‘output.json’), indent=2)
File “C:\Users\DELL\AppData\Local\Programs\Python\Python311\Lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py”, line 31, in write_json
with open(path, ‘w’, encoding=’utf-8′) as f:
OSError: [Errno 9] Bad file descriptor
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
C:\Users\DELL\Desktop>
Hey people, has anyone tried to limit the responses to the custom knowledge base only?
Maybe you can change the temperature to 0 (zero) and try.
This code is working for plain text files, but I am getting an error in PyPDF2 when trying to use PDF files. I have tried a number of version combinations of the various packages in an attempt to resolve the problem. Any advice on how to address the issue?
Error messages displayed when executing the code:
Traceback (most recent call last):
File “D:\anaconda3\lib\site-packages\PyPDF2\_reader.py”, line 1623, in _read_xref_tables_and_trailers
xrefstream = self._read_pdf15_xref_stream(stream)
File “D:\anaconda3\lib\site-packages\PyPDF2\_reader.py”, line 1752, in _read_pdf15_xref_stream
self._read_xref_subsections(idx_pairs, get_entry, used_before)
File “D:\anaconda3\lib\site-packages\PyPDF2\_reader.py”, line 1808, in _read_xref_subsections
assert start >= last_end
AssertionError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File “C:\Users\nnnn\OneDrive\Desktop\DocBot\chatbot.py”, line 37, in
index = construct_index(“docs”)
File “C:\Users\nnnn\OneDrive\Desktop\DocBot\chatbot.py”, line 19, in construct_index
documents = SimpleDirectoryReader(directory_path).load_data()
File “D:\anaconda3\lib\site-packages\gpt_index\readers\file\base.py”, line 150, in load_data
data = parser.parse_file(input_file, errors=self.errors)
File “D:\anaconda3\lib\site-packages\gpt_index\readers\file\docs_parser.py”, line 30, in parse_file
pdf = PyPDF2.PdfReader(fp)
File “D:\anaconda3\lib\site-packages\PyPDF2\_reader.py”, line 319, in __init__
self.read(stream)
File “D:\anaconda3\lib\site-packages\PyPDF2\_reader.py”, line 1426, in read
self._read_xref_tables_and_trailers(stream, startxref, xref_issue_nr)
File “D:\anaconda3\lib\site-packages\PyPDF2\_reader.py”, line 1632, in _read_xref_tables_and_trailers
raise PdfReadError(f”trailer can not be read {e.args}”)
PyPDF2.errors.PdfReadError: trailer can not be read ()
Current version of OS and libraries mentioned in this article:
Windows 11
Python – 3.10.11
OpenAI – 0.27.6
GPT Index – 0.4.24
PyPDF2 – 3.0.0
PyCryptodome – 3.17
Gradio – 3.28.3
How can I extend the length of the output? It always shows a ~300-character maximum.
i have this error on macOS
from gpt_index import SimpleDirectoryReader, GPTListIndex, GPTSimpleVectorIndex, LLMPredictor, PromptHelper
ImportError: No module named gpt_index
Getting the following error:
ImportError: cannot import name ‘BaseLanguageModel’ from ‘langchain.schema’
I’m guessing there’s an updated version of this part of the code:
from langchain.chat_models import ChatOpenAI
Any ideas?
I got the same thing. Did you figure it out?
install pip install langchain==0.0.132
Hi, thanks for the great work!
Please allow two questions:
1) When indexing 600 RTF files, the result says INFO:root:> [build_index_from_documents] Total LLM token usage: 0 tokens, INFO:root:> [build_index_from_documents] Total embedding token usage: 28470527 tokens
Why is it that the LLM is not being used here?
2) What's the downside of using gpt_index==0.4.24 versus the most current version… and what can we do to make the code work with the most current version?
Thanks
Marc
Hello, good morning.
I'm having this problem when trying to run app.py:
File “C:\Users\julio\AppData\Local\Programs\Python\Python311\Lib\site-packages\gpt_index\prompts\base.py”, line 9, in
from langchain.schema import BaseLanguageModel
ImportError: cannot import name ‘BaseLanguageModel’ from ‘langchain.schema’ (C:\Users\julio\AppData\Local\Programs\Python\Python311\Lib\site-packages\langchain\schema.py)
Somebody have any idea how to solve it?
Just found it out: https://github.com/hwchase17/langchain/issues/1595
Fixed this by running:
pip install langchain==0.0.107
I had the same issue 🙁
I am getting the same error; I would appreciate it if someone could help me get unblocked.
needs specific version of langchain:
pip install langchain==0.0.153
(credit goes to a Stack Overflow page I can't find anymore)
also make sure that your gpt-index version is the one mentioned in the article
In the end, the problem for me was that the new version of llama-index doesn't have the GPTSimpleVectorIndex class. You can get around that by installing an older version → pip install llama-index==0.5.27, and it should work.
To fix this (I had the same problem!) the class name has changed so what you need to do is edit the base.py file on line 9 as the error suggests. You need to change line 9 to the following:-
from langchain.schema import BaseModel as BaseLanguageModel
Then it should all work!
Figured it out. Go here:
Python\Python311\Lib\site-packages\gpt_index\prompts\base.py
and change this: from langchain.schema import BaseLanguageModel
to this: from langchain.base_language import BaseLanguageModel
I had a similar issue and resolved it by downgrading the langchain version. Code below.
pip install langchain==0.0.153
I had the same issue when installing gpt_index==0.4.24.
I had to go back to the screenshot and install manually the packages.
pip uninstall llama_index
pip uninstall langchain
pip install langchain==0.0.132
pip install openai==0.27.4
pip install tiktoken==0.3.3
pip install langchain==0.0.132
This worked for me.
fixed this with
pip install langchain==0.0.118
and
pip install gpt_index==0.4.24
Same issue. Anyone have some ideas?
I have the same error. I am 100% sure I have followed all the instructions as written.
This stack overflow answer worked for me:
https://stackoverflow.com/questions/76153016/gpt-chatbot-not-working-after-using-open-ai-imports-and-langchain
After installing the new libraries, I was also prompted to add the transformers library.
Then everything worked like a charm.
Good luck!
Running this fixed the BaseLanguageModel error for me:
pip install langchain==0.0.118
Thank you for the very easy-to-understand tutorial!
Wonderful!
Is there an easy way (tutorial) to make the chatbot I trained public permanently, not just for 72 hours?
That would be fantastic 🙂
Thanks!
I would be interested in that as well.
As I understood it, this creates a fine-tuned ChatGPT.
Is this available on OpenAI, or is it always part of the created index?
Thank you very much, it's working perfectly. How do I change the code so that instead of reading a preloaded file, it can upload a new PDF file?
Place the new PDF file in the docs folder.