The Implementation Principles of Agents
Compared to directly using LLMs, AI Agents can perceive changes in the external environment in real-time and make corresponding decisions and actions, greatly enhancing the capability and efficiency of LLMs in solving tasks.
Today, I will walk you through the relevant LangChain implementation code from start to finish, helping you better understand the concepts and details involved in constructing and running an AI Agent. This will be very helpful for building complex Agent applications and debugging Agent code.
Before we start, please consider the following two questions:
- After the LLM call returns, how does the AgentExecutor determine whether to call a tool or to terminate the execution process and return the result to the user?
- For models that do not support function calls, how does the AI Agent implement tool calls?
I hope you can keep these two questions in mind as we continue our learning.
A More General ReAct Agent
The `create_openai_tools_agent` used in the last lesson is a relatively simple type of AI Agent: it relies directly on the model's function calling capability to specify which tool function to call, and the AgentExecutor simply checks whether the message returned by the LLM is a tool call to decide between executing the relevant tool function and returning to the user. However, this type of Agent is limited to models that support function calling.
In fact, LangChain provides a more general type of ReAct Agent: `create_react_agent`, which can be used with various models.
In this lesson, we will use this method to construct a ReAct Agent and explore the implementation principles of AI Agents in LangChain.
Defining the Agent Tool Function
```python
# Simulate getting the weather for a specific city
def GetWeather(city):
    return 30
```
```python
from langchain.agents import Tool

# Encapsulate the function as a Tool
weather_tool = Tool(
    name="search_weather",
    func=GetWeather,
    description="useful for when you need to search for weather",
)
tools = [weather_tool]
```
Defining the Agent Call Chain
```python
from langchain_openai import OpenAI
from langchain import hub
from langchain.agents import create_react_agent

llm = OpenAI()
prompt = hub.pull("hwchase17/react")
agent = create_react_agent(llm, tools, prompt)
```
Creating the AgentExecutor
```python
from langchain.agents import AgentExecutor

agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
```
Through these three steps, we have created an AI Agent with a custom tool capable of retrieving weather information. Next, let’s try to have it formulate a travel plan based on the weather:
```python
print(agent_executor.invoke({"input": "Make a travel plan based on the weather in Beijing"}))
```

```plaintext
Entering new AgentExecutor chain...
I need to find out the weather in Beijing
Action: search_weather
Action Input: Beijing
30 degrees Celsius with a UV index of 9 means I need to bring strong sunscreen
Final Answer: Based on the weather in Beijing, I should plan for hot and possibly wet weather and bring strong sunscreen.
Finished chain.
{'input': 'Make a travel plan based on the weather in Beijing', 'output': 'Based on the weather in Beijing, I should plan for hot and possibly wet weather and bring strong sunscreen.'}
```
Due to our overly simple prompt, the resulting travel plan is not perfect. However, the execution log above shows that this AI Agent can indeed call our tool function as needed to assist with the task. Next, we will study the implementation principles of Agents in LangChain, following the order in which we constructed one: defining the Tool, defining the Agent, and defining the AgentExecutor.
Tool
Tools are external capabilities that an AI Agent can invoke during task execution. We can call retrievers for querying specific domain data or custom functions to execute business logic. To standardize the calling methods for various external systems, LangChain abstracts a Tool layer, allowing any function to be encapsulated as a Tool object that can be called by the AI Agent.
```python
class BaseTool(RunnableSerializable[Union[str, Dict], Any]):
    ...
    name: str
    ...
```
```python
class Tool(BaseTool):
    """Tool that takes in function or coroutine directly."""
    description: str = ""
    func: Optional[Callable[..., str]]
    """The function to run when the tool is called."""
```
The Tool class has three key attributes: `name`, `description`, and `func`. The `name` is the tool's identifier; the `description` explains what the tool does and how to invoke it, and is included in the prompt so the LLM can decide when and how to use the tool; `func` is the actual function executed when the tool is called.
For example, for our previously created `weather_tool`, the prompt would include a section like this:
```python
prompt = """
...
Here are the tools available to assist in answering questions:
{"search_weather": "useful for when you need to search for weather"}
...
"""
```
Thus, when the LLM infers the need to retrieve weather information, it will return a response like (in pseudocode):
```json
{"use_tool": "search_weather", "input": "beijing"}
```
The `AgentExecutor` maintains a mapping from `tool.name` to the corresponding Tool instance. After parsing the above response from the LLM, it looks up the Tool instance by name and calls `tool.func(input)` with the extracted parameters. This is the entire tool-usage flow: by inserting each tool's description into the prompt, the LLM can state which function to call and with what parameters, while the `AgentExecutor` performs the actual invocation.
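To make the dispatch step concrete, here is a minimal sketch (not LangChain's actual code) of how a name-to-tool mapping lets an executor route a parsed tool call to the right function; all names here are illustrative:

```python
# Minimal sketch of tool dispatch via a name-to-function mapping.
# All names are illustrative, not LangChain's internals.
def get_weather(city: str) -> str:
    # Hypothetical stand-in for a real weather lookup
    return "30"

# The executor keeps a map from tool name to the function behind it
name_to_tool_map = {"search_weather": get_weather}

def dispatch(tool_name: str, tool_input: str) -> str:
    """Look up the tool by name and invoke its function with the input."""
    if tool_name not in name_to_tool_map:
        return f"Error: unknown tool {tool_name!r}"
    return name_to_tool_map[tool_name](tool_input)

print(dispatch("search_weather", "beijing"))  # -> 30
```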
Agent (LLM Call Chain)
As mentioned earlier, an AI Agent essentially involves multiple calls to an LLM chain. This chain is defined as an Agent in LangChain, responsible for deciding the next action. It typically consists of three parts: the prompt, the LLM model, and the output parser.
Different AI Agents use different LLM chains. Let's analyze the LLM chain of `create_react_agent` as an example.
Prompt Template
First, let's look at the prompt template used:
```plaintext
# hwchase17/react
Answer the following questions as best you can. You have access to the following tools:

{tools}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Question: {input}
Thought:{agent_scratchpad}
```
The prompt defines several prefixes (Thought/Action/Action Input/Observation/Final Answer). A trailing `Thought:` is appended to the input we submit to the LLM, so each round of input ends with "Thought:", prompting the LLM to express its reasoning first.
For instance, if the LLM finds that the current information is insufficient, it may state its reasoning like this:
```plaintext
I should search for the weather in Beijing to help with planning the trip
Action: search_weather
Action Input: beijing
```
The tool's execution result will be prefixed with `Observation:` in the next round's prompt:
```plaintext
# prompt
...
Observation: 30 # Result from search_weather
...
```
If the LLM can derive a final result based on the available information, it will similarly first express its reasoning and then provide the final output:
```plaintext
30 degrees Celsius is quite hot, I should plan accordingly
Final Answer: the final answer is xxxx
```
Interaction Example
Let’s examine the interaction with the LLM during the "travel plan" example:
First Round:
```plaintext
...
Answer the following questions as best you can. You have access to the following tools:
{"search_weather": "useful for when you need to search for weather"}
...
Action: the action to take, should be one of [search_weather]
...
Question: Make a travel plan based on the weather in Beijing
Thought:
```
LLM Response:
```plaintext
I should search for the weather in Beijing to help with planning the trip
Action: search_weather
Action Input: beijing
```
Second Round (after the `AgentExecutor` calls the tool and gets the result):
```plaintext
...
Answer the following questions as best you can. You have access to the following tools:
{"search_weather": "useful for when you need to search for weather"}
...
Action: the action to take, should be one of [search_weather]
...
Question: Make a travel plan based on the weather in Beijing
Thought: I should search for the weather in Beijing to help with planning the trip
Action: search_weather
Action Input: beijing
Observation: 30
Thought:
```
LLM Response:
```plaintext
30 degrees Celsius is quite hot, I should plan accordingly
Final Answer: Based on the weather in Beijing, I should plan for hot and possibly wet weather and bring strong sunscreen.
```
Output Parser
The text output from the LLM is not directly processed by the AgentExecutor; instead, it first goes through an output parser, which formats it based on specific prefixes.
`create_react_agent` uses `ReActSingleInputOutputParser` as its default output parser. Here's how it's implemented:
```python
def parse(self, text: str) -> Union[AgentAction, AgentFinish]:
    # FINAL_ANSWER_ACTION = "Final Answer:"
    includes_answer = FINAL_ANSWER_ACTION in text
    # Regular expression to match the tool name and parameters
    regex = (
        r"Action\s*\d*\s*:[\s]*(.*?)[\s]*Action\s*\d*\s*Input\s*\d*\s*:[\s]*(.*)"
    )
    action_match = re.search(regex, text, re.DOTALL)
    if action_match:
        action = action_match.group(1).strip()
        action_input = action_match.group(2)
        tool_input = action_input.strip(" ")
        tool_input = tool_input.strip('"')
        return AgentAction(action, tool_input, text)
    elif includes_answer:
        return AgentFinish(
            {"output": text.split(FINAL_ANSWER_ACTION)[-1].strip()}, text
        )
```
If the LLM response contains `Action` and `Action Input`, the parser uses a regular expression to extract the content following these prefixes and wraps it in an `AgentAction` object, which contains the tool name (`tool`) and the input parameters (`tool_input`).
```python
class AgentAction(Serializable):
    tool: str
    tool_input: Union[str, dict]
    ...
```
If the LLM response includes `Final Answer:`, the parser extracts the content that follows and wraps it in an `AgentFinish` object, storing the final result in `return_values` as a key-value pair with the key `output`.
```python
class AgentFinish(Serializable):
    return_values: dict
    ...
```
It's important to note that parsing LLM text into `AgentAction` or `AgentFinish` objects is not exclusive to `ReActSingleInputOutputParser`; every Agent's output parser performs a similar conversion. This layer allows the `AgentExecutor` to handle only `AgentAction` and `AgentFinish` objects, without caring which specific type of Agent is in use.
AgentExecutor
We finally arrive at the most complex part of the implementation: the `AgentExecutor`. This is the "runtime" of the AI Agent, responsible for calling the LLM chain, executing the action the LLM chooses (invoking a tool or returning the final result), and passing the result back to the LLM chain, repeating this process.
Constructing the AgentExecutor
First, let's look at how an `AgentExecutor` is instantiated:
```python
agent_executor = AgentExecutor(agent=agent, tools=tools)
```
Two key parameters are passed in: `agent` and `tools`, which are the LLM chain and the list of tools, respectively. Now let's see how these parameters are defined in the `AgentExecutor` class:
```python
class AgentExecutor(Chain):
    agent: Union[BaseSingleActionAgent, BaseMultiActionAgent]
    tools: Sequence[BaseTool]
```
You might notice an issue: the `agent` we pass in is an LCEL chain, but `AgentExecutor.agent` requires either a `BaseSingleActionAgent` or a `BaseMultiActionAgent` object.
- BaseSingleActionAgent: Can call only one tool at a time.
- BaseMultiActionAgent: Can call multiple tools at once, similar to the parallel function calls mentioned in the section on "Function Calling: Activating the LLM's Superpowers."
This conversion is handled by Pydantic's `root_validator` functionality, which performs field validation and transformation before the instance is constructed.
```python
class AgentExecutor(Chain):
    ...
    @root_validator(pre=True)
    def validate_runnable_agent(cls, values: Dict) -> Dict:
        agent = values["agent"]
        if isinstance(agent, Runnable):
            ...
            values["agent"] = RunnableAgent(
                runnable=agent, ...
            )
        return values
```
To keep the code concise, the branch logic for `BaseMultiActionAgent` is omitted.
Inside the `AgentExecutor`, the provided LCEL chain is automatically converted into a `RunnableAgent`, which is a type of `BaseSingleActionAgent`.
```python
class RunnableAgent(BaseSingleActionAgent):
    runnable: Runnable[dict, Union[AgentAction, AgentFinish]]
    ...
    def plan(
        self,
        intermediate_steps: List[Tuple[AgentAction, str]],
        callbacks: Callbacks = None,
        **kwargs: Any,
    ) -> Union[AgentAction, AgentFinish]:
        inputs = {**kwargs, **{"intermediate_steps": intermediate_steps}}
        ...
        final_output = self.runnable.invoke(inputs, ...)
```
Ultimately, the `AgentExecutor` calls the previously constructed LCEL chain through the `plan` method. This adapter lets the `AgentExecutor` manage the interaction between the LLM and the tools through a single interface, regardless of how the underlying chain is built.
Execution of AgentExecutor
Having covered the construction details of the `AgentExecutor`, let's explore how it interacts with other components during execution.
Entry Point for Execution
Execution begins with the `invoke` method. The `AgentExecutor` does not implement `invoke` itself; it inherits it from its parent class, `Chain`. `Chain.invoke` eventually calls `self._call`.
```python
class Chain(RunnableSerializable[Dict[str, Any], Dict[str, Any]], ABC):
    ...
    def invoke(
        self,
        input: Dict[str, Any],
        ...
    ) -> Dict[str, Any]:
        ...
        self._call(inputs)
```
Thus, the entry point for the `AgentExecutor` is the `_call` method. Here's a simplified version of its implementation:
```python
class AgentExecutor(Chain):
    ...
    def _call(
        self,
        inputs: Dict[str, str],
        ...
    ) -> Dict[str, Any]:
        name_to_tool_map = {tool.name: tool for tool in self.tools}
        intermediate_steps: List[Tuple[AgentAction, str]] = []
        iterations = 0
        time_elapsed = 0.0
        start_time = time.time()
        # Check if we should continue
        while self._should_continue(iterations, time_elapsed):
            next_step_output = self._take_next_step(
                name_to_tool_map,
                ...,
                inputs,
                intermediate_steps,
                ...
            )
            if isinstance(next_step_output, AgentFinish):
                return self._return(
                    next_step_output, intermediate_steps, ...
                )
            ...
            intermediate_steps.extend(next_step_output)
            iterations += 1
            time_elapsed = time.time() - start_time
        ...
        return self._return(output, intermediate_steps, ...)
```
The AI Agent is not allowed to run indefinitely on a task; it has a maximum number of iterations (`max_iterations`, default 15) and a maximum execution time (`max_execution_time`, default None, meaning no limit). This is enforced by the `_should_continue` method:
```python
class AgentExecutor(Chain):
    ...
    def _should_continue(self, iterations: int, time_elapsed: float) -> bool:
        if self.max_iterations is not None and iterations >= self.max_iterations:
            return False
        if (
            self.max_execution_time is not None
            and time_elapsed >= self.max_execution_time
        ):
            return False
        return True
```
Taking Next Steps
Each iteration calls `_take_next_step`, which can be summarized as follows (syntax adjusted for clarity):
```python
class AgentExecutor(Chain):
    ...
    def _take_next_step(
        self,
        ...
    ) -> Union[AgentFinish, List[Tuple[AgentAction, str]]]:
        next_steps = []
        for a in self._iter_next_step(
            name_to_tool_map,
            ...,
            inputs,
            intermediate_steps,
            ...
        ):
            next_steps.append(a)
        return self._consume_next_step(next_steps)
```
The `_iter_next_step` method does the core work of processing a single step:
```python
class AgentExecutor(Chain):
    ...
    def _iter_next_step(
        self,
        name_to_tool_map: Dict[str, BaseTool],
        ...,
        inputs: Dict[str, str],
        intermediate_steps: List[Tuple[AgentAction, str]],
        ...
    ) -> Iterator[Union[AgentFinish, AgentAction, AgentStep]]:
        ...
        # Call the LLM chain
        output = self.agent.plan(
            intermediate_steps,
            **inputs,
        )
        ...
        # If it's an AgentFinish, return directly
        if isinstance(output, AgentFinish):
            yield output
            return
        actions: List[AgentAction]
        if isinstance(output, AgentAction):
            actions = [output]
        else:
            actions = output
        ...
        # Call the tool function for every AgentAction
        for agent_action in actions:
            yield self._perform_agent_action(
                name_to_tool_map, ..., agent_action, ...
            )
```
Here, `self.agent.plan` initiates the call to the LLM chain. As described in the earlier section on the LLM chain, the output parser returns either an `AgentFinish` or an `AgentAction` object (or a list of `AgentAction` objects for multi-action agents).
If `self.agent.plan` returns an `AgentFinish`, that object is yielded directly. If it returns one or more `AgentAction` objects, `_iter_next_step` calls `_perform_agent_action` for each action:
```python
class AgentExecutor(Chain):
    ...
    def _perform_agent_action(
        self,
        name_to_tool_map: Dict[str, BaseTool],
        ...,
        agent_action: AgentAction,
        ...
    ) -> AgentStep:
        ...
        if agent_action.tool in name_to_tool_map:
            tool = name_to_tool_map[agent_action.tool]
            ...
            observation = tool.run(
                agent_action.tool_input,
                ...
            )
        ...
        return AgentStep(action=agent_action, observation=observation)
```
This method looks up the `Tool` instance for `agent_action.tool` and calls `Tool.run`, which eventually invokes `Tool.func`, the business function we supplied when wrapping the tool.
Once the tool call completes, `_perform_agent_action` wraps the result in a structure called `AgentStep`, which has two key attributes: `action` and `observation`, storing the executed `AgentAction` and the tool call's result, respectively.
```python
class AgentStep(Serializable):
    action: AgentAction
    """The AgentAction that was executed."""
    observation: Any
    """The result of the AgentAction."""
    ...
```
Returning to `_iter_next_step`: after calling `_perform_agent_action`, we receive an `AgentStep` object containing the result of the tool call.
Summary of _take_next_step
In `_take_next_step`, when `_iter_next_step` yields an `AgentFinish`, that object is returned immediately and the loop ends, so in this case `next_steps = [AgentFinish]`.
If the LLM instead returns one or more `AgentAction` objects, `_iter_next_step` calls `_perform_agent_action` for each one and yields the resulting `AgentStep` objects, so `next_steps = [AgentStep, AgentStep, ...]`.
After obtaining `next_steps`, it is passed to `_consume_next_step`, which mainly reformats the data:
```python
class AgentExecutor(Chain):
    ...
    def _consume_next_step(
        self, values: NextStepOutput
    ) -> Union[AgentFinish, List[Tuple[AgentAction, str]]]:
        if isinstance(values[-1], AgentFinish):
            assert len(values) == 1
            return values[-1]
        else:
            return [
                (a.action, a.observation) for a in values if isinstance(a, AgentStep)
            ]
```
After one `_take_next_step` call, the return value `next_step_output` can be one of two types:
- `AgentFinish`: the LLM returned the final result; in this case, `_return` is called directly to deliver the result to the user.
- `[(AgentAction, <tool function result>), ...]`: the LLM returned actions that required tool calls, and the relevant tool functions have been executed; in this case, `next_step_output` is appended to `intermediate_steps` for the next LLM call.
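Putting the pieces together, the loop described above can be reduced to a toy end-to-end sketch, with a scripted "LLM" in place of a real model. Every name here is illustrative, and the real executor adds callbacks, error handling, and early-stopping behavior:

```python
# Toy agent loop: plan -> (tool call -> observe)* -> finish.
def scripted_plan(intermediate_steps, **inputs):
    """Scripted LLM chain: request the weather tool once, then finish."""
    if not intermediate_steps:
        return ("action", "search_weather", "beijing")
    _, observation = intermediate_steps[-1]
    return ("finish", f"It is {observation} degrees; bring sunscreen.")

def run_agent(plan, tools, query, max_iterations=15):
    intermediate_steps = []
    for _ in range(max_iterations):
        output = plan(intermediate_steps, input=query)
        if output[0] == "finish":
            return output[1]                      # AgentFinish: hand back
        _, tool_name, tool_input = output         # AgentAction: run the tool
        observation = tools[tool_name](tool_input)
        intermediate_steps.append(((tool_name, tool_input), observation))
    return "Agent stopped due to iteration limit."

tools = {"search_weather": lambda city: "30"}
print(run_agent(scripted_plan, tools, "Make a travel plan for Beijing"))
# -> It is 30 degrees; bring sunscreen.
```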
Here's a flowchart of the entire `AgentExecutor` call process to help visualize the interactions:
Today's content comes to an end. I believe you can now answer the two questions raised at the beginning:

Q: After the LLM call returns, how does the AgentExecutor determine whether to call a tool or to terminate execution and return the result to the user?
A: The AgentExecutor decides based on the data type returned by the LCEL chain: an AgentFinish means the final answer has been obtained, while an AgentAction means a tool function needs to be called.

Q: For models that do not support function calling, how does the AI Agent implement tool calls?
A: By adding a description of each tool and its parameter requirements (Tool.description) to the prompt, and by defining specific prefixes so the output parser can extract the tool name and parameters, which the AgentExecutor then invokes.
Summary
Compared to create_openai_tools_agent, create_react_agent is a more general ReAct Agent with no requirements on the LLM used. Tools are external capabilities that the AI Agent can call during task execution; these can be any business functions. To unify the calling methods of various external systems, LangChain abstracts a Tool layer, allowing any function to be encapsulated into a Tool object that can be called by the AI Agent.
In LangChain, the concept of an Agent is abstracted to represent the LLM calling chain used by the AI Agent when processing tasks, responsible for deciding what action to take next. LangChain supports various types of Agents for creating Agent applications in different usage scenarios. AgentExecutor serves as the runtime for Agent applications, responsible for calling the LLM chain, executing the operations chosen by the LLM (calling tools or returning the final result), and passing the results back to the LLM chain, and so on.