What is an AI Agent?

AI Agents are one of the hottest topics in the AI field today, with examples like ByteDance's Coze overseas, its domestic counterpart Kouzi, and Alibaba's AI assistant. Several of these platforms have recently even launched their own AI Agent marketplaces.

Differences and Advantages Over Large Language Models (LLMs)

While large language models excel in language understanding and interactive tasks, their reasoning and action capabilities are often separate.

Reasoning Ability

This refers to an LLM's capacity to perform logical reasoning to solve problems. For example, using the Chain of Thought (CoT) prompting technique, we can instruct the LLM to "think step by step," allowing it to reason through multi-step logical problems.
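
As a rough illustration (a minimal sketch, assuming the langchain_openai ChatOpenAI model used later in this article), a CoT-style prompt simply appends a step-by-step instruction to the question:

python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo-0125", temperature=0)

# Appending a step-by-step instruction elicits intermediate reasoning
question = "A pen costs 2 yuan and a notebook costs 3 times as much. How much do 2 pens and 1 notebook cost in total?"
print(llm.invoke(question + " Let's think step by step.").content)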

Action Ability

This describes the model's ability to generate actions or decisions in an interactive environment. For instance, in a Retrieval-Augmented Generation (RAG) system, the LLM retrieves information from a knowledge base and generates appropriate responses based on user queries, highlighting its action capability.
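
Conceptually, a RAG pipeline interleaves an action (retrieval) with generation. Here is a minimal sketch, using a hypothetical retrieve() helper rather than any real retriever API:

python
# Minimal RAG sketch: retrieve() is a hypothetical helper that returns
# a list of relevant passages for the query (not a real library call).
def rag_answer(query, llm, retrieve):
    docs = retrieve(query)                    # action: fetch relevant passages
    context = "\n".join(docs)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return llm.invoke(prompt).content         # generation grounded in retrieval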

Traditional Considerations

In traditional research, these two capabilities are typically studied separately: reasoning does not respond to changes in the external environment, while action focuses on generating optimal action sequences in a given context rather than on deep reasoning.

Constructing an AI Agent in LangChain

To create your own AI Agent in LangChain, you can integrate these reasoning and action capabilities, enabling a more dynamic and responsive interaction model. This allows you to leverage the strengths of both reasoning and action to create a robust AI solution.

AI Agent Example: Cooking with ReAct

Scenario

Consider a cooking recipe for scrambled eggs and tomatoes with specific steps:

  1. Take eggs and tomatoes out of the fridge.
  2. Heat oil in a pan.
  3. Add eggs and tomatoes.
  4. Add a pinch of salt.

While a large language model (LLM) might follow these steps, it cannot adapt if the fridge is empty, leading to an incomplete task.

Introduction of AI Agent

An AI Agent uses an LLM as its brain, integrating reasoning and action capabilities. When a user asks a question, the agent can perceive the environment, build memory, plan, and make decisions. It can even collaborate with other agents and interact with external systems via tools, enhancing its capabilities.

ReAct Framework

ReAct, introduced in the October 2022 paper "ReAct: Synergizing Reasoning and Acting in Language Models," combines reasoning and acting in language model applications. It alternates between generating reasoning traces and task-specific actions, enabling LLMs to tackle diverse language reasoning and decision-making tasks more effectively.

ReAct Process

ReAct includes an observation phase. After each action, it observes the current state before proceeding with the next reasoning step. For our cooking example, ReAct operates as follows:

  • Thought: Need to take eggs and tomatoes from the fridge.
  • Action: Take eggs and tomatoes from the fridge.
  • Observation: The fridge is empty.
  • Thought: Need to buy eggs at the market.
  • Action: Go to the market to buy eggs.
  • Observation: Eggs are bought.
  • Thought: Need to heat oil.
  • Action: Heat oil in the pan.
  • Observation: Oil is heated.
  • Thought: Need to add eggs and tomatoes.

An AI Agent has four essential components: memory, planning, action, and tools. Using LangChain, we can easily integrate memory functionality into the LLM chain.

Tools can be seen as advanced function calls. Native function calling depends on model capabilities (it is available only in major models such as OpenAI's and Gemini), but LangChain extends tool use to all models through prompt engineering and code that parses the model's responses.
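
As a rough sketch of that prompt-engineering approach (illustrative only, not LangChain's actual parser), the prompt asks the model to reply in an "Action: <tool> / Action Input: <input>" format, and a small parser extracts the tool call from the raw completion:

python
import re

# Extract a tool call from a completion of the form:
#   Action: search_tool
#   Action Input: what is langchain
def parse_tool_call(llm_output: str):
    match = re.search(r"Action:\s*(.+?)\nAction Input:\s*(.+)", llm_output)
    if match:
        return match.group(1).strip(), match.group(2).strip()
    return None  # no tool call; treat the output as the final answer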

A flowchart illustrating the complete ReAct implementation for an AI Agent shows how these components work together to achieve tasks efficiently.

In conclusion, ReAct provides a structured way for AI Agents to operate, allowing for dynamic interaction with the environment and efficient task completion.

Constructing an AI Agent with LangChain

In this section, we will detail how to build an AI Agent using LangChain that can search the internet for relevant information and automatically store answers in a local file, while also incorporating memory for multi-turn conversations.

Steps to Construct the Agent

  1. Define Tools: Enhance the LLM's external interaction capabilities.
  2. Compose the Agent: Integrate the LLM, prompts, and tools into a functional agent.
  3. (Optional) Add Memory Components: Enable the agent to remember previous interactions.

Step 1: Define Tools

To analyze our agent's functionality, we identify that both "internet search" and "local file writing" extend beyond the LLM's inherent capabilities. Therefore, we need to define tools for web searching and file handling.

Web Search Tool

LangChain provides interfaces for various external systems, allowing us to easily integrate their capabilities into our agent. For web searching, we have options like Bing Search, Brave Search, DuckDuckGo Search, and Google Search. Today, we will use the SearchApi because it is accessible without requiring a VPN.

To use the SearchApi for searches, you need to register for an account. Once registered, you will receive 100 free searches and an API Key for calling the service.

Here's how to set it up:

python
import os

from langchain_community.utilities import SearchApiAPIWrapper

# Replace with your API key
os.environ["SEARCHAPI_API_KEY"] = "your_api_key_here"

# Initialize the search tool and run a test query
search = SearchApiAPIWrapper()
result = search.run("what is langchain")
print(result)

The output should look something like this:

LangChain is a framework designed to simplify the creation 
of applications using large language models. As a language 
model integration framework, LangChain's use-cases largely 
overlap with those of language models in general, including 
document analysis and summarization, chatbots, and code analysis.

Tool Object

Now, we need to encapsulate this functionality into a Tool object for the agent to utilize:

python
from langchain.agents import Tool

search_tool = Tool(
    name="search_tool",
    func=search.run,
    description="Useful for when you need to ask with search",
)

With this, we have completed our web search tool.

Local File Writing Tool

LangChain provides various toolkits, including the FileManagementToolkit, which contains pre-packaged tools for managing local files. This toolkit includes tools for file operations such as creating, reading, updating, and deleting files.

Setting Up the File Management Toolkit

To get started with the FileManagementToolkit, you can initialize it and specify the root directory for file operations. If you don't specify a directory, it defaults to the current working directory.

Here’s how to set it up:

python
from langchain_community.agent_toolkits import FileManagementToolkit

# Initialize the toolkit, specifying the root directory
tools = FileManagementToolkit(root_dir="/data/").get_tools()
print(tools)

This will output a list of available tools:

[CopyFileTool(root_dir='/data/'), 
 DeleteFileTool(root_dir='/data/'), 
 FileSearchTool(root_dir='/data/'), 
 MoveFileTool(root_dir='/data/'), 
 ReadFileTool(root_dir='/data/'), 
 WriteFileTool(root_dir='/data/'), 
 ListDirectoryTool(root_dir='/data/')]

Selecting the Write File Tool

In our agent example, we only need the WriteFileTool for writing files. You can specify this when initializing the toolkit:

python
tools = FileManagementToolkit(selected_tools=["write_file"]).get_tools()
write_file_tool = tools[0]  # Accessing the WriteFileTool

Writing to a File

Now, you can use the write_file_tool to write text to a file. Here's how to do that:

python
# Invoke the write file tool to create a file
write_file_tool.invoke({"file_path": "example.txt", "text": "LangChain"})

After executing this code, you will see an example.txt file created in the current directory, containing the text "LangChain".

At this point, you have successfully defined the two tools needed for your AI Agent: search_tool for searching the web and write_file_tool for writing to local files. These tools can now be integrated into your AI Agent to enhance its functionality.

Building the AI Agent

In LangChain, creating a fully functional AI Agent involves two main components: the Agent and the AgentExecutor.

Agent

The Agent is essentially the LLM invocation chain that runs during the agent's operation. It consists of a prompt, the LLM, and an output parser. Different types of agents construct the LLM invocation chain in various ways, depending on the type of model used and the tools it requires.

For example, models from OpenAI natively support function calls, allowing agents to use tools via a binding method. In contrast, other models that do not support function calls require a prompt to guide the LLM to return the tool name and parameters needed for a specific situation.

LangChain provides multiple methods for creating agents, accommodating these different requirements. In this example, we will use the create_openai_tools_agent function to build our agent.

Creating the Agent

Here’s a simplified implementation of the create_openai_tools_agent function:

python
from typing import Sequence

from langchain.agents.format_scratchpad.openai_tools import (
    format_to_openai_tool_messages,
)
from langchain.agents.output_parsers.openai_tools import OpenAIToolsAgentOutputParser
from langchain_core.language_models import BaseLanguageModel
from langchain_core.prompts.chat import ChatPromptTemplate
from langchain_core.runnables import Runnable, RunnablePassthrough
from langchain_core.tools import BaseTool
from langchain_core.utils.function_calling import convert_to_openai_tool


def create_openai_tools_agent(
    llm: BaseLanguageModel, tools: Sequence[BaseTool], prompt: ChatPromptTemplate
) -> Runnable:
    # The prompt must expose an agent_scratchpad slot for intermediate steps
    missing_vars = {"agent_scratchpad"}.difference(prompt.input_variables)
    if missing_vars:
        raise ValueError(f"Prompt missing required variables: {missing_vars}")

    # Bind the tools to the model in OpenAI's tool-call format
    llm_with_tools = llm.bind(tools=[convert_to_openai_tool(tool) for tool in tools])

    # Compose prompt -> tool-bound LLM -> output parser with LCEL
    agent = (
        RunnablePassthrough.assign(
            agent_scratchpad=lambda x: format_to_openai_tool_messages(
                x["intermediate_steps"]
            )
        )
        | prompt
        | llm_with_tools
        | OpenAIToolsAgentOutputParser()
    )
    return agent

The logic here is straightforward: it composes an LLM invocation chain using the LangChain Expression Language (LCEL).

Parameters for the Agent

The create_openai_tools_agent function takes three parameters:

  1. llm: The instance of the language model.
  2. tools: The list of tools used by the agent.
  3. prompt: The prompt template to guide the agent's operation.

The function checks if the prompt contains the required agent_scratchpad variable, which the AgentExecutor uses to pass intermediate results. All AI agents in LangChain need this variable in their prompts.

Next, it converts each tool into an OpenAI tool object using convert_to_openai_tool and binds them to the LLM, resulting in a new object called llm_with_tools. Finally, it uses the OpenAIToolsAgentOutputParser to process the model's output into a format that LangChain can handle.

Setting Up the LLM and Prompt

Now we need to create instances for the LLM and prompt required by the create_openai_tools_agent function.

LLM Instance

You can instantiate a ChatOpenAI model directly:

python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo-0125", temperature=0)

Prompt Instance

LangChain provides corresponding prompt templates for each agent type on LangChainHub. For our agent, we can use the hwchase17/openai-tools-agent template:

python
from langchain import hub

prompt = hub.pull("hwchase17/openai-tools-agent")
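
To see the template's structure, you can print its messages (the exact contents come from LangChainHub and may change over time):

python
# Expect a system message, an optional chat_history placeholder, the human
# input, and the agent_scratchpad placeholder required by the AgentExecutor.
for message in prompt.messages:
    print(type(message).__name__, getattr(message, "variable_name", ""))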

Understanding the Prompt Structure

You may wonder why the concepts of Thought, Action, and Observation from the ReAct framework are not explicitly present in the prompt. In fact, for models that support function calls, the ReAct Agent essentially automates the invocation of tool functions.

When the LLM determines that a tool needs to be called, it returns the required calls in a tool_calls structure, which corresponds to the Thought and Action steps. The output is then parsed, the relevant tool function is called, and the results, corresponding to the Observation, are passed back to the LLM.
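
You can observe this with a quick probe (assuming the llm, search_tool, and write_file_tool defined in this article; the exact args format depends on each tool's schema):

python
from langchain_core.utils.function_calling import convert_to_openai_tool

# Bind our two tools and ask a question that requires a search;
# the model responds with a structured tool call instead of plain text.
llm_with_tools = llm.bind(
    tools=[convert_to_openai_tool(t) for t in [search_tool, write_file_tool]]
)
message = llm_with_tools.invoke("Search the web for LangChain")
print(message.tool_calls)
# e.g. [{'name': 'search_tool', 'args': {'__arg1': 'LangChain'}, 'id': '...'}]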

Creating the Agent

Now that we have the tools, llm, and prompt, we can create our agent:

python
agent = create_openai_tools_agent(llm, tools, prompt)

This completes the construction of our AI Agent, which is now equipped with the necessary tools and prompts to perform its tasks. The next step is to integrate it with the AgentExecutor for operational execution.

AgentExecutor

With the Agent defined, we now need the AgentExecutor to manage its execution. The AgentExecutor is responsible for invoking the LLM calling chain, determining whether the process should conclude based on the model's response, and, if not, executing the tools specified by the LLM while returning the results back to it. This cycle continues until a final action is determined.

Pseudocode for AgentExecutor

Here’s a simplified pseudocode representation of how the AgentExecutor operates:

plaintext
next_action = agent.get_action(...)  
while next_action != AgentFinish:  
    observation = run(next_action)  
    next_action = agent.get_action(..., next_action, observation)  
return next_action

While this logic appears straightforward, the AgentExecutor also handles tool parameter conversions and manages exceptions that may arise during tool calls.
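
For intuition, here is an illustrative Python rendering of that loop (a sketch only; the real AgentExecutor also handles argument conversion, iteration limits, and error handling):

python
from langchain_core.agents import AgentFinish

def run_agent(agent, tools, user_input: str) -> str:
    tool_map = {t.name: t for t in tools}
    intermediate_steps = []
    while True:
        step = agent.invoke(
            {"input": user_input, "intermediate_steps": intermediate_steps}
        )
        if isinstance(step, AgentFinish):
            return step.return_values["output"]
        # The openai-tools agent may return several tool calls at once
        actions = step if isinstance(step, list) else [step]
        for action in actions:
            observation = tool_map[action.tool].invoke(action.tool_input)
            intermediate_steps.append((action, observation))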

Initializing the AgentExecutor

To initialize the AgentExecutor, you would use the following code:

python
from langchain.agents import AgentExecutor

agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

The verbose parameter, when set to True, enables logging of the execution process, allowing you to trace what the agent is doing step by step.

Running the AI Agent

Now we can invoke the agent using the agent_executor.invoke method. For example:

python
agent_executor.invoke({"input": "网上搜索 langchain 相关资料,并将相关内容写入 langchain_info.txt中"})

Viewing Execution Logs

The execution log will show the step-by-step operations performed by the agent. After running the above command, you should find a new file named langchain_info.txt in your project directory, containing the search results:

plaintext
# cat langchain_info.txt
LangChain is a framework designed to simplify the creation 
of applications using large language models. As a language 
model integration framework, LangChain's use-cases largely 
overlap with those of language models in general, including 
document analysis and summarization, chatbots, and code analysis.

Testing Additional Cases

To verify the AI Agent's functionality, let’s test another case:

python
agent_executor.invoke({"input": "计算 1+1的结果,并将结果写入 math.txt 文件"})

In this case, the LLM computes the result of 1 + 1 and directly invokes the write_file tool to save the result to math.txt. You can check the contents of this file using:

plaintext
# cat math.txt
2

Adding Memory Functionality to the AI Agent

To enhance our AI Agent with memory capabilities, allowing it to support multi-turn conversations, we can integrate a memory component using RunnableWithMessageHistory. This approach enables the AI Agent to retain conversational context across multiple interactions.

Step-by-Step Process

1. Integrating Memory with AgentExecutor

Since AgentExecutor inherits from Chain, it can be treated as an LCEL (LangChain Expression Language) runnable. We can follow the method introduced in the section Memory: Extending Your AI Assistant's Memory Beyond 7 Seconds, using RunnableWithMessageHistory to add memory functionality.

python
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_community.chat_message_histories import ChatMessageHistory

message_history = ChatMessageHistory()
agent_with_chat_history = RunnableWithMessageHistory(
    agent_executor,
    lambda session_id: message_history,
    input_messages_key="input",
    history_messages_key="chat_history",
)

Here, message_history tracks the conversation history. The RunnableWithMessageHistory wraps around agent_executor to incorporate memory.
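
Note that the lambda above returns the same message_history object for every session_id, so all sessions share one history. For true multi-session support you could keep one history per session, as in this minimal sketch (the session_store dict is a hypothetical in-memory store):

python
# One ChatMessageHistory per session_id, created on first use
session_store = {}

def get_history(session_id: str) -> ChatMessageHistory:
    if session_id not in session_store:
        session_store[session_id] = ChatMessageHistory()
    return session_store[session_id]

agent_with_chat_history = RunnableWithMessageHistory(
    agent_executor,
    get_history,
    input_messages_key="input",
    history_messages_key="chat_history",
)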

2. Testing the Memory Functionality

Now that the memory feature is added, we can test the AI Agent's ability to recall previous interactions.

python
agent_with_chat_history.invoke({"input": "网上搜索 langchain 相关资料"}, config={"configurable": {"session_id": "test"}})
print(agent_with_chat_history.invoke({"input": "你刚才搜索到的资料是什么"}, config={"configurable": {"session_id": "test"}}))

The execution output should include both the input message and the retrieved information from memory:

plaintext
{
    'input': 'What was the information you just found?', 
    'chat_history': [
        HumanMessage(content='Search the web for information about langchain'), 
        AIMessage(content='I found information about LangChain, which is a framework designed to simplify the creation of applications using large language models. It is used for tasks such as document analysis and summarization, chatbots, and code analysis.')
    ], 
    'output': 'I found that LangChain is a framework designed to simplify the creation of applications using large language models. It is used for tasks such as document analysis and summarization, chatbots, and code analysis.'
}

Complete Code Example

Here’s a complete example demonstrating how to create an AI Agent with memory using LangChain:

python
import os
os.environ['OPENAI_API_KEY'] = 'YOUR_OPENAI_API_KEY'
os.environ['OPENAI_API_BASE'] = 'YOUR_OPENAI_API_BASE_URL'

from langchain import hub
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_community.utilities import SearchApiAPIWrapper
from langchain.agents import Tool
from langchain_openai import ChatOpenAI
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_community.chat_message_histories import ChatMessageHistory

# Define File System Tool
from langchain_community.agent_toolkits import FileManagementToolkit

tools = FileManagementToolkit(selected_tools=["write_file"]).get_tools()
write_file_tool = tools[0]

# Define SearchApi Tool
os.environ["SEARCHAPI_API_KEY"] = "YOUR_SEARCH_API_KEY"
search = SearchApiAPIWrapper()
search_tool = Tool(
    name="search_tool",
    func=search.run,
    description="useful for when you need to ask with search",
)

tools = [write_file_tool, search_tool]

# Create the Agent
llm = ChatOpenAI(model="gpt-3.5-turbo-0125", temperature=0)
prompt = hub.pull("hwchase17/openai-tools-agent")
agent = create_openai_tools_agent(llm, tools, prompt)

# Create AgentExecutor
agent_executor = AgentExecutor(agent=agent, tools=tools, stream_runnable=False, verbose=True)

# Add Memory Module
message_history = ChatMessageHistory()
agent_with_chat_history = RunnableWithMessageHistory(
    agent_executor,
    lambda session_id: message_history,
    input_messages_key="input",
    history_messages_key="chat_history",
)

# Run the AI Agent with Memory
agent_with_chat_history.invoke({"input": "网上搜索 langchain 相关资料, 并将结果写入langchain.txt中"}, config={"configurable": {"session_id": "test"}})
print(agent_with_chat_history.invoke({"input": "你刚才搜索到的资料是什么"}, config={"configurable": {"session_id": "test"}}))

Summary

  • AI Agent leverages LLMs for reasoning and action capabilities, enabling it to perceive the environment, plan, and make decisions effectively.
  • ReAct provides a design pattern for AI Agents by incorporating reasoning and action sequences with observations to guide decision-making iteratively.
  • LangChain facilitates creating AI Agents by defining tools, constructing the Agent, and setting up the AgentExecutor for execution.
  • Memory Module integration extends the AI Agent’s abilities to support multi-turn conversations, maintaining context over time.