What is an AI Agent?
AI Agents are one of the hottest topics in the AI field today, with products such as Coze overseas and Douzi and Alibaba's AI assistant domestically; recently these platforms have even launched their own AI Agent marketplaces.
Differences and Advantages Over Large Language Models (LLMs)
While large language models excel at language understanding and interactive tasks, their reasoning and action capabilities have traditionally been developed separately.
Reasoning Ability
This refers to an LLM's capacity to perform logical reasoning to solve problems. For example, using the Chain-of-Thought (CoT) prompting technique, we can instruct the LLM to "think step by step," allowing it to reason through multi-step logical problems.
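As a minimal sketch of CoT prompting in LangChain (the model name and question are only examples):

```python
from langchain_openai import ChatOpenAI

# Appending "Let's think step by step" nudges the model to spell out
# its intermediate reasoning before giving the final answer.
llm = ChatOpenAI(model="gpt-3.5-turbo-0125", temperature=0)
question = (
    "A pen and a notebook cost 11 yuan in total; "
    "the notebook costs 10 yuan more than the pen. How much does the pen cost?"
)
print(llm.invoke(question + " Let's think step by step.").content)
```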
Action Ability
This describes the model's ability to generate actions or decisions in an interactive environment. For instance, in a Retrieval-Augmented Generation (RAG) system, the LLM retrieves information from a knowledge base and generates appropriate responses based on user queries, highlighting its action capability.
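To make this retrieve-then-respond pattern concrete, here is a minimal RAG sketch (illustrative only; the toy retriever stands in for a real vector-store retriever):

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableLambda, RunnablePassthrough
from langchain_openai import ChatOpenAI

# Toy "retriever": a real RAG system would query a vector store here.
retriever = RunnableLambda(
    lambda question: "LangChain is a framework for building LLM applications."
)

prompt = ChatPromptTemplate.from_template(
    "Answer the question using only the context below.\n\n"
    "Context: {context}\n\nQuestion: {question}"
)
rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | ChatOpenAI(model="gpt-3.5-turbo-0125", temperature=0)
    | StrOutputParser()
)
print(rag_chain.invoke("What is LangChain?"))
```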
Traditional Considerations
In traditional research, these two aspects are rarely considered together: reasoning does not depend on changes in the external environment, while action focuses on generating optimal action sequences in a given context rather than on deep reasoning.
Constructing an AI Agent in LangChain
To create your own AI Agent in LangChain, you can integrate these reasoning and action capabilities, enabling a more dynamic and responsive interaction model. This allows you to leverage the strengths of both reasoning and action to create a robust AI solution.
AI Agent Example: Cooking with ReAct
Scenario
Consider a cooking recipe for scrambled eggs and tomatoes with specific steps:
- Take eggs and tomatoes out of the fridge.
- Heat oil in a pan.
- Add eggs and tomatoes.
- Add a pinch of salt.
While a large language model (LLM) might follow these steps, it cannot adapt if the fridge is empty, leading to an incomplete task.
Introduction of AI Agent
An AI Agent uses an LLM as its brain, integrating reasoning and action capabilities. When a user asks a question, the agent can perceive the environment, build memory, plan, and make decisions. It can even collaborate with other agents and interact with external systems via tools, enhancing its capabilities.
ReAct Framework
ReAct, introduced in a paper published in October 2022, combines reasoning and action in language model applications. It alternates between generating reasoning paths and specific task actions, enabling LLMs to tackle diverse language reasoning and decision-making tasks more effectively.
ReAct Process
ReAct includes an observation phase. After each action, it observes the current state before proceeding with the next reasoning step. For our cooking example, ReAct operates as follows:
- Thought: Need to take eggs and tomatoes from the fridge.
- Action: Take eggs and tomatoes from the fridge.
- Observation: The fridge is empty.
- Thought: Need to buy eggs at the market.
- Action: Go to the market to buy eggs.
- Observation: Eggs are bought.
- Thought: Need to heat oil.
- Action: Heat oil in the pan.
- Observation: Oil is heated.
- Thought: Need to add eggs and tomatoes.
- ...and so on, until the dish is finished.
An AI Agent has four essential components: memory, planning, action, and tools. Using LangChain, we can easily integrate memory functionality into the LLM chain.
Tools can be seen as advanced function calls. While native function calling depends on model capabilities (it is available only in major models such as OpenAI's and Gemini), LangChain extends this functionality to all models through prompt engineering and code that parses the model's responses.
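The rough shape of such a prompt (an illustrative format, not LangChain's exact internal template):

```python
# A ReAct-style prompt for models without native function calling.
# The framework parses the "Action:" / "Action Input:" lines from the
# model's text output and runs the matching tool.
REACT_STYLE_PROMPT = """Answer the question using the tools below.

Tools:
search_tool: useful for when you need to search the web

Use this exact format:
Thought: what you should do next
Action: the tool to use (one of: search_tool)
Action Input: the input to the tool
Observation: the result of the tool
... (Thought/Action/Observation can repeat)
Final Answer: the answer to the original question

Question: {input}"""
```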
A flowchart illustrating the complete ReAct implementation for an AI Agent shows how these components work together to achieve tasks efficiently.
In conclusion, ReAct provides a structured way for AI Agents to operate, allowing for dynamic interaction with the environment and efficient task completion.
Constructing an AI Agent with LangChain
In this section, we will detail how to build an AI Agent using LangChain that can search the internet for relevant information and automatically store answers in a local file, while also incorporating memory for multi-turn conversations.
Steps to Construct the Agent
- Define Tools: Enhance the LLM's external interaction capabilities.
- Compose the Agent: Integrate the LLM, prompts, and tools into a functional agent.
- (Optional) Add Memory Components: Enable the agent to remember previous interactions.
Step 1: Define Tools
To analyze our agent's functionality, we identify that both "internet search" and "local file writing" extend beyond the LLM's inherent capabilities. Therefore, we need to define tools for web searching and file handling.
Web Search Tool
LangChain provides interfaces for various external systems, allowing us to easily integrate their capabilities into our agent. For web searching, we have options like Bing Search, Brave Search, DuckDuckGo Search, and Google Search. Today, we will use the SearchApi because it is accessible without requiring a VPN.
To use the SearchApi for searches, you need to register for an account. Once registered, you will receive 100 free searches and an API Key for calling the service.
Here's how to set it up:
```python
import os

# Replace with your API key
os.environ["SEARCHAPI_API_KEY"] = "your_api_key_here"

from langchain_community.utilities import SearchApiAPIWrapper

# Initialize the search wrapper and run a test query
search = SearchApiAPIWrapper()
result = search.run("what is langchain")
print(result)
```
The output should look something like this:
```plaintext
LangChain is a framework designed to simplify the creation
of applications using large language models. As a language
model integration framework, LangChain's use-cases largely
overlap with those of language models in general, including
document analysis and summarization, chatbots, and code analysis.
```
Tool Object
Now, we need to encapsulate this functionality into a Tool object for the agent to utilize:
```python
from langchain.agents import Tool

# Wrap the search function in a Tool object the agent can invoke
search_tool = Tool(
    name="search_tool",
    func=search.run,
    description="Useful for when you need to ask with search",
)
```
With this, we have completed our web search tool.
Local File Writing Tool
LangChain provides various toolkits, including the FileManagementToolkit, which contains pre-packaged tools for managing local files. This toolkit includes tools for file operations such as creating, reading, updating, and deleting files.
Setting Up the File Management Toolkit
To get started with the `FileManagementToolkit`, you can initialize it and specify the root directory for file operations. If you don't specify a directory, it defaults to the current working directory.
Here’s how to set it up:
```python
from langchain_community.agent_toolkits import FileManagementToolkit

# Initialize the toolkit, specifying the root directory
tools = FileManagementToolkit(root_dir="/data/").get_tools()
print(tools)
```
This will output a list of available tools:
```plaintext
[CopyFileTool(root_dir='/data/'),
 DeleteFileTool(root_dir='/data/'),
 FileSearchTool(root_dir='/data/'),
 MoveFileTool(root_dir='/data/'),
 ReadFileTool(root_dir='/data/'),
 WriteFileTool(root_dir='/data/'),
 ListDirectoryTool(root_dir='/data/')]
```
Selecting the Write File Tool
In our agent example, we only need the `WriteFileTool` for writing files. You can specify this when initializing the toolkit:
```python
tools = FileManagementToolkit(selected_tools=["write_file"]).get_tools()
write_file_tool = tools[0]  # Accessing the WriteFileTool
```
Writing to a File
Now, you can use the `write_file_tool` to write text to a file. Here's how to do that:
```python
# Invoke the write file tool to create a file
write_file_tool.invoke({"file_path": "example.txt", "text": "LangChain"})
```
After executing this code, you will see an `example.txt` file created in the current directory, containing the text "LangChain".
At this point, you have successfully defined the two tools needed for your AI Agent: `search_tool` for searching the web and `write_file_tool` for writing to local files. These tools can now be integrated into your AI Agent to enhance its functionality.
Building the AI Agent
In LangChain, creating a fully functional AI Agent involves two main components: the Agent and the AgentExecutor.
Agent
The Agent is essentially the LLM invocation chain that runs during the agent's operation. It consists of a prompt, the LLM, and an output parser. Different types of agents construct the LLM invocation chain in various ways, depending on the type of model used and the tools it requires.
For example, models from OpenAI natively support function calling, so agents can use tools via a bind call. In contrast, models without native function calling need a prompt that guides the LLM to return, as text, the tool name and parameters required for a given situation.
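For such models, LangChain also provides prompt-based constructors; for example, `create_react_agent` together with the community `hwchase17/react` prompt (a sketch of this alternative path, reusing the `llm` and `tools` defined in this chapter; it is not used in the main example below):

```python
from langchain import hub
from langchain.agents import create_react_agent

# Prompt-based ReAct agent: works with models lacking native tool calling,
# because the prompt instructs the model to emit Thought/Action text.
react_prompt = hub.pull("hwchase17/react")
react_agent = create_react_agent(llm, tools, react_prompt)
```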
LangChain provides multiple methods for creating agents, accommodating these different requirements. In this example, we will use the `create_openai_tools_agent` function to build our agent.
Creating the Agent
Here’s a simplified implementation of the `create_openai_tools_agent` function:
```python
def create_openai_tools_agent(
    llm: BaseLanguageModel, tools: Sequence[BaseTool], prompt: ChatPromptTemplate
) -> Runnable:
    # Ensure the prompt reserves a slot for intermediate agent steps
    missing_vars = {"agent_scratchpad"}.difference(prompt.input_variables)
    if missing_vars:
        raise ValueError(f"Prompt missing required variables: {missing_vars}")
    # Bind the tool schemas to the model so it can emit tool calls
    llm_with_tools = llm.bind(tools=[convert_to_openai_tool(tool) for tool in tools])
    agent = (
        RunnablePassthrough.assign(
            agent_scratchpad=lambda x: format_to_openai_tool_messages(
                x["intermediate_steps"]
            )
        )
        | prompt
        | llm_with_tools
        | OpenAIToolsAgentOutputParser()
    )
    return agent
```
The logic here is straightforward: it constructs an LCEL (LangChain Expression Language) chain.
Parameters for the Agent
The `create_openai_tools_agent` function takes three parameters:
- llm: The instance of the language model.
- tools: The list of tools used by the agent.
- prompt: The prompt template to guide the agent's operation.
The function checks that the prompt contains the required `agent_scratchpad` variable, which the `AgentExecutor` uses to pass intermediate results. All AI agents in LangChain need this variable in their prompts.
Next, it converts each tool into an OpenAI tool object using `convert_to_openai_tool` and binds them to the LLM, producing a new object called `llm_with_tools`. Finally, it uses the `OpenAIToolsAgentOutputParser` to process the model's output into a format that LangChain can handle.
Setting Up the LLM and Prompt
Now we need to create instances for the LLM and prompt required by the `create_openai_tools_agent` function.
LLM Instance
You can instantiate a ChatOpenAI model directly:
```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo-0125", temperature=0)
```
Prompt Instance
LangChain provides corresponding prompt templates for each agent type on LangChainHub. For our agent, we can use the `hwchase17/openai-tools-agent` template:
```python
from langchain import hub

prompt = hub.pull("hwchase17/openai-tools-agent")
```
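For reference, the pulled template is roughly equivalent to building the prompt by hand like this (a sketch; check the hub page for the exact wording):

```python
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

# Rough hand-built equivalent of hwchase17/openai-tools-agent
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant"),
    MessagesPlaceholder("chat_history", optional=True),  # used once memory is added
    ("human", "{input}"),
    MessagesPlaceholder("agent_scratchpad"),  # intermediate tool-call steps go here
])
```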
Understanding the Prompt Structure
You may wonder why the concepts of Thought, Action, and Observation from the ReAct framework are not explicitly present in the prompt. In fact, for models that support function calls, the ReAct Agent essentially automates the invocation of tool functions.
When the LLM determines that a tool needs to be called, it returns the required actions in a `tool_calls` format, corresponding to Thought and Action. The output is then parsed, the relevant tool function is called, and the results are returned to the LLM prefixed as an Observation.
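Concretely, a single entry in `tool_calls` has roughly this shape (the field values are invented for illustration, and the argument names depend on the tool's schema):

```python
# Illustrative shape of one OpenAI-style tool call returned by the model
tool_call = {
    "id": "call_abc123",  # made-up call id
    "type": "function",
    "function": {
        "name": "search_tool",  # which tool to run
        "arguments": '{"query": "what is langchain"}',  # JSON-encoded args
    },
}
```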
Creating the Agent
Now that we have the `tools`, `llm`, and `prompt`, we can create our agent:
```python
agent = create_openai_tools_agent(llm, tools, prompt)
```
This completes the construction of our AI Agent, which is now equipped with the necessary tools and prompts to perform its tasks. The next step is to integrate it with the `AgentExecutor` for operational execution.
AgentExecutor
With the Agent defined, we now need the AgentExecutor to manage its execution. The AgentExecutor is responsible for invoking the LLM calling chain, determining whether the process should conclude based on the model's response, and, if not, executing the tools specified by the LLM while returning the results back to it. This cycle continues until a final action is determined.
Pseudocode for AgentExecutor
Here’s a simplified pseudocode representation of how the AgentExecutor operates:
```plaintext
next_action = agent.get_action(...)
while next_action != AgentFinish:
    observation = run(next_action)
    next_action = agent.get_action(..., next_action, observation)
return next_action
```
While this logic appears straightforward, the AgentExecutor also handles tool parameter conversions and manages exceptions that may arise during tool calls.
Initializing the AgentExecutor
To initialize the AgentExecutor, you would use the following code:
```python
from langchain.agents import AgentExecutor

agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
```
The `verbose` parameter, when set to `True`, enables logging of the execution process, allowing you to trace what the agent is doing step by step.
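Beyond `verbose`, `AgentExecutor` exposes other useful switches; an illustrative variant (not used in the rest of this example):

```python
# Variant with extra safeguards around the Thought/Action loop
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,
    handle_parsing_errors=True,  # recover when the model output cannot be parsed
    max_iterations=5,            # cap the number of reasoning/action rounds
)
```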
Running the AI Agent
Now we can invoke the agent using the `agent_executor.invoke` method. For example:
```python
agent_executor.invoke(
    {"input": "Search the web for information about langchain and write the relevant content to langchain_info.txt"}
)
```
Viewing Execution Logs
The execution log will show the step-by-step operations performed by the agent. After running the above command, you should find a new file named `langchain_info.txt` in your project directory, containing the search results:
```plaintext
# cat langchain_info.txt
LangChain is a framework designed to simplify the creation
of applications using large language models. As a language
model integration framework, LangChain's use-cases largely
overlap with those of language models in general, including
document analysis and summarization, chatbots, and code analysis.
```
Testing Additional Cases
To verify the AI Agent's functionality, let’s test another case:
```python
agent_executor.invoke(
    {"input": "Compute the result of 1+1 and write the result to the file math.txt"}
)
```
In this case, the LLM computes the result of `1 + 1` and directly invokes the `write_file` tool to save the result to `math.txt`. You can check the contents of this file using:
```plaintext
# cat math.txt
2
```
Adding Memory Functionality to the AI Agent
To enhance our AI Agent with memory capabilities, allowing it to support multi-turn conversations, we can integrate a memory component using `RunnableWithMessageHistory`. This approach enables the AI Agent to retain conversational context across multiple interactions.
Step-by-Step Process
1. Integrating Memory with AgentExecutor
Since `AgentExecutor` inherits from `Chain`, it can be treated as an LCEL (LangChain Expression Language) runnable. We can follow the method introduced in the section Memory: Extending Your AI Assistant's Memory Beyond 7 Seconds, using `RunnableWithMessageHistory` to add memory functionality.
```python
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_community.chat_message_histories import ChatMessageHistory

message_history = ChatMessageHistory()
agent_with_chat_history = RunnableWithMessageHistory(
    agent_executor,
    lambda session_id: message_history,
    input_messages_key="input",
    history_messages_key="chat_history",
)
```
Here, `message_history` tracks the conversation history, and `RunnableWithMessageHistory` wraps `agent_executor` to incorporate memory.
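Note that the lambda above returns the same `message_history` regardless of `session_id`, so every session shares one history. To keep sessions separate, you could maintain a small per-session store, e.g. (a sketch):

```python
from langchain_community.chat_message_histories import ChatMessageHistory

# One ChatMessageHistory per session_id, created on first use
session_store: dict[str, ChatMessageHistory] = {}

def get_session_history(session_id: str) -> ChatMessageHistory:
    if session_id not in session_store:
        session_store[session_id] = ChatMessageHistory()
    return session_store[session_id]

agent_with_chat_history = RunnableWithMessageHistory(
    agent_executor,
    get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history",
)
```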
2. Testing the Memory Functionality
Now that the memory feature is added, we can test the AI Agent's ability to recall previous interactions.
```python
agent_with_chat_history.invoke(
    {"input": "Search the web for information about langchain"},
    config={"configurable": {"session_id": "test"}},
)
print(agent_with_chat_history.invoke(
    {"input": "What information did you just find?"},
    config={"configurable": {"session_id": "test"}},
))
```
The execution output should include both the input message and the retrieved information from memory:
```plaintext
{
    'input': 'What information did you just find?',
    'chat_history': [
        HumanMessage(content='Search the web for information about langchain'),
        AIMessage(content='I found information about LangChain, which is a framework designed to simplify the creation of applications using large language models. It is used for tasks such as document analysis and summarization, chatbots, and code analysis.')
    ],
    'output': 'I found that LangChain is a framework designed to simplify the creation of applications using large language models. It is used for tasks such as document analysis and summarization, chatbots, and code analysis.'
}
```
Complete Code Example
Here’s a complete example demonstrating how to create an AI Agent with memory using LangChain:
```python
import os

os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"
os.environ["OPENAI_API_BASE"] = "YOUR_OPENAI_API_BASE_URL"

from langchain import hub
from langchain.agents import AgentExecutor, Tool, create_openai_tools_agent
from langchain_community.agent_toolkits import FileManagementToolkit
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_community.utilities import SearchApiAPIWrapper
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_openai import ChatOpenAI

# Define the file system tool (write_file only)
tools = FileManagementToolkit(selected_tools=["write_file"]).get_tools()
write_file_tool = tools[0]

# Define the SearchApi tool
os.environ["SEARCHAPI_API_KEY"] = "YOUR_SEARCH_API_KEY"
search = SearchApiAPIWrapper()
search_tool = Tool(
    name="search_tool",
    func=search.run,
    description="useful for when you need to ask with search",
)

tools = [write_file_tool, search_tool]

# Create the agent
llm = ChatOpenAI(model="gpt-3.5-turbo-0125", temperature=0)
prompt = hub.pull("hwchase17/openai-tools-agent")
agent = create_openai_tools_agent(llm, tools, prompt)

# Create the AgentExecutor
agent_executor = AgentExecutor(agent=agent, tools=tools, stream_runnable=False, verbose=True)

# Add the memory module
message_history = ChatMessageHistory()
agent_with_chat_history = RunnableWithMessageHistory(
    agent_executor,
    lambda session_id: message_history,
    input_messages_key="input",
    history_messages_key="chat_history",
)

# Run the AI Agent with memory
agent_with_chat_history.invoke(
    {"input": "Search the web for information about langchain and write the results to langchain.txt"},
    config={"configurable": {"session_id": "test"}},
)
print(agent_with_chat_history.invoke(
    {"input": "What information did you just find?"},
    config={"configurable": {"session_id": "test"}},
))
```
Summary
- An AI Agent leverages an LLM for both reasoning and action, enabling it to perceive the environment, plan, and make decisions effectively.
- ReAct provides a design pattern for AI Agents by incorporating reasoning and action sequences with observations to guide decision-making iteratively.
- LangChain facilitates creating AI Agents by defining tools, constructing the Agent, and setting up the AgentExecutor for execution.
- Memory Module integration extends the AI Agent’s abilities to support multi-turn conversations, maintaining context over time.