
The Implementation Principle of Agent

Compared to directly using LLMs, AI Agents can perceive changes in the external environment in real time and make corresponding decisions and take actions, greatly enhancing the capability and efficiency of LLMs in solving tasks.

Today, I will walk you through LangChain's Agent implementation code from start to finish, helping you better understand the concepts and details involved in constructing and running an AI Agent. This will be very helpful when you build complex Agent applications and debug Agent code.

Before we start, please consider the following two questions:

  1. After the LLM call returns, how does the AgentExecutor determine whether to call a tool or to terminate the execution process and return the result to the user?
  2. For models that do not support function calling, how does the AI Agent implement tool calls?

I hope you can keep these two questions in mind as we continue our learning.

A More General ReAct Agent

The create_openai_tools_agent used in the last lesson is a relatively simple type of AI Agent: it directly utilizes the model's function calling capabilities to specify which tool function to call. The AgentExecutor also directly determines whether to execute the relevant tool function or return to the user based on whether the message returned by the LLM is a tool call. However, this type of Agent is limited to models that support function calling.

In fact, LangChain provides a more general type of ReAct Agent: create_react_agent, which can be used with various models.

In this lesson, we will use this method to construct a ReAct Agent and explore the implementation principles of AI Agents in LangChain.

Defining the Agent Tool Function

python
# Simulate getting the weather for a specific city
def GetWeather(city):
    return 30
python
from langchain.agents import Tool

# Encapsulate as a Tool
weather_tool = Tool(
    name="search_weather",
    func=GetWeather,
    description="useful for when you need to search for weather",
)

tools = [weather_tool]

Defining the Agent Call Chain

python
from langchain_openai import OpenAI
from langchain import hub
from langchain.agents import create_react_agent

llm = OpenAI()
prompt = hub.pull("hwchase17/react")
agent = create_react_agent(llm, tools, prompt)

Creating the AgentExecutor

python
from langchain.agents import AgentExecutor

agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

Through these three steps, we have created an AI Agent with a custom tool capable of retrieving weather information. Next, let’s try to have it formulate a travel plan based on the weather:

python
print(agent_executor.invoke({"input": "Make a travel plan based on the weather in Beijing"}))

Entering new AgentExecutor chain...  
I need to find out the weather in Beijing  
Action: search_weather  
Action Input: Beijing  
30 degrees Celsius with a UV index of 9 means I need to bring strong sunscreen  
Final Answer: Based on the weather in Beijing, I should plan for hot and possibly wet weather and bring strong sunscreen.  
Finished chain.  
{'input': 'Make a travel plan based on the weather in Beijing', 'output': 'Based on the weather in Beijing, I should plan for hot and possibly wet weather and bring strong sunscreen.'}

Due to our overly simple prompt, the resulting travel plan is far from perfect. Still, the execution log above shows that this AI Agent can indeed call our tool function as needed to assist with the task. Next, we will study the implementation principles of Agents in LangChain, following the order in which we constructed the AI Agent: defining the Tool, defining the Agent, and defining the AgentExecutor.

Tool

Tools are external capabilities that an AI Agent can invoke during task execution. We can call retrievers for querying specific domain data or custom functions to execute business logic. To standardize the calling methods for various external systems, LangChain abstracts a Tool layer, allowing any function to be encapsulated as a Tool object that can be called by the AI Agent.

python
class BaseTool(RunnableSerializable[Union[str, Dict], Any]):
    ...
    name: str
    ...
python
class Tool(BaseTool):
    """Tool that takes in function or coroutine directly."""

    description: str = ""
    func: Optional[Callable[..., str]]
    """The function to run when the tool is called."""

The Tool class has three key attributes: name, description, and func. The name is the tool's identifier; the description explains the tool's functionality and how to invoke it, which will be included in the prompt for the LLM to decide when and how to use the tool for further processing; func is the actual function executed by the tool.
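To make this concrete, here is a minimal, self-contained sketch of how a tool's name and description can be rendered into the prompt. The `SimpleTool` class and `render_tools` function are our own illustrative names, not LangChain's actual classes:

```python
# A hand-rolled stand-in for LangChain's Tool, for illustration only
class SimpleTool:
    def __init__(self, name, description, func):
        self.name = name
        self.description = description
        self.func = func

def render_tools(tools):
    # One "name: description" line per tool, roughly what fills {tools}
    return "\n".join(f"{t.name}: {t.description}" for t in tools)

weather = SimpleTool(
    name="search_weather",
    description="useful for when you need to search for weather",
    func=lambda city: 30,
)

print(render_tools([weather]))
# search_weather: useful for when you need to search for weather
print(", ".join(t.name for t in [weather]))  # fills {tool_names} -> "search_weather"
```

The rendered text is what the LLM sees; it never calls the function itself, it only names the tool it wants.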

For example, in our previously created weather_tool, the prompt would include a section like this:

python
prompt = """
...
Here are the tools available to assist in answering questions:
{"search_weather": "useful for when you need to search for weather"}
...
"""

Thus, when the LLM infers the need to retrieve weather information, it will return a response like (in pseudocode):

json
{"use_tool": "search_weather", "input": "beijing"}

The AgentExecutor maintains a mapping structure from tool.name to the Tool instances. Upon parsing the above response from the LLM, it retrieves the tool name and parameters to invoke the corresponding Tool instance and calls tool.func(input). This is the entire usage process of tools. By inserting the tool's description into the prompt, the LLM can return the necessary function and parameters to call when needed, while the AgentExecutor performs the actual invocation.

Agent (LLM Call Chain)

As mentioned earlier, an AI Agent essentially involves multiple calls to an LLM chain. This chain is defined as an Agent in LangChain, responsible for deciding the next action. It typically consists of three parts: the prompt, the LLM model, and the output parser.

Different AI Agents have different LLM chains. Let's analyze the LLM chain of create_react_agent as an example.

Prompt Template

First, let's look at the prompt template used:

plaintext
# hwchase17/react
Answer the following questions as best you can. You have access to the following tools:

{tools}

Use the following format:

Question: the input question you must answer

Thought: you should always think about what to do

Action: the action to take, should be one of [{tool_names}]

Action Input: the input to the action

Observation: the result of the action

... (this Thought/Action/Action Input/Observation can repeat N times)

Thought: I now know the final answer

Final Answer: the final answer to the original input question

Begin!

Question: {input}

Thought:{agent_scratchpad}

The prompt defines several prefixes (Thought/Action/Action Input/Observation/Final Answer). Note that "Thought:" is appended to the end of the input we submit to the LLM, so every round of the prompt ends with "Thought:", prompting the LLM to state its reasoning first.

For instance, if the LLM finds that the current information is insufficient, it may state its reasoning like this:

plaintext
I should search for the weather in Beijing to help with planning the trip
Action: search_weather
Action Input: beijing

The tool's execution result will be prefixed with Observation in the next round of prompts:

plaintext
# prompt
...
Observation: 30 # Result from search_weather
...

If the LLM can derive a final result based on the available information, it will similarly first express its reasoning and then provide the final output:

plaintext
30 degrees Celsius is quite hot, I should plan accordingly
Final Answer: the final answer is xxxx
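The scratchpad mechanics can be sketched in a few lines. `build_scratchpad` below is our own simplified helper, not LangChain's code (LangChain uses a helper along these lines, `format_log_to_str`); it shows how each (LLM text, observation) pair is folded into the `{agent_scratchpad}` variable:

```python
def build_scratchpad(intermediate_steps):
    """intermediate_steps: list of (llm_text, observation) pairs."""
    scratchpad = ""
    for llm_text, observation in intermediate_steps:
        scratchpad += llm_text  # the LLM's Thought/Action/Action Input text
        scratchpad += f"\nObservation: {observation}\nThought: "
    return scratchpad

steps = [(
    "I should search for the weather in Beijing to help with planning the trip\n"
    "Action: search_weather\nAction Input: beijing",
    30,
)]
print(build_scratchpad(steps))
```

Because the scratchpad always ends with "Thought: ", the LLM resumes its reasoning exactly where the previous round left off.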

Interaction Example

Let’s examine the interaction with the LLM during the "travel plan" example:

First Round:

plaintext
...
Answer the following questions as best you can. You have access to the following tools:
{"search_weather": "useful for when you need to search for weather"}
...
Action: the action to take, should be one of [search_weather]
...
Question: Make a travel plan based on the weather in Beijing
Thought:

LLM Response:

plaintext
I should search for the weather in Beijing to help with planning the trip
Action: search_weather
Action Input: beijing

Second Round: (After AgentExecutor calls the tool to get results)

plaintext
...
Answer the following questions as best you can. You have access to the following tools:
{"search_weather": "useful for when you need to search for weather"}
...
Action: the action to take, should be one of [search_weather]
...
Question: Make a travel plan based on the weather in Beijing
Thought: I should search for the weather in Beijing to help with planning the trip
Action: search_weather
Action Input: beijing
Observation: 30
Thought:

LLM Response:

plaintext
30 degrees Celsius is quite hot, I should plan accordingly
Final Answer: Based on the weather in Beijing, I should plan for hot and possibly wet weather and bring strong sunscreen.

Output Parser

The text output from the LLM is not directly processed by the AgentExecutor; instead, it first goes through an output parser, which formats it based on specific prefixes.

The create_react_agent by default uses ReActSingleInputOutputParser as its output parser. Here’s how it’s implemented:

python
def parse(self, text: str) -> Union[AgentAction, AgentFinish]:
    # FINAL_ANSWER_ACTION = "Final Answer:"
    includes_answer = FINAL_ANSWER_ACTION in text
    # Regular expression to match tool name and parameters
    regex = (
        r"Action\s*\d*\s*:[\s]*(.*?)[\s]*Action\s*\d*\s*Input\s*\d*\s*:[\s]*(.*)"
    )
    action_match = re.search(regex, text, re.DOTALL)
    
    if action_match:
        action = action_match.group(1).strip()
        action_input = action_match.group(2)
        tool_input = action_input.strip(" ")
        tool_input = tool_input.strip('"')
        return AgentAction(action, tool_input, text)

    elif includes_answer:
        return AgentFinish(
            {"output": text.split(FINAL_ANSWER_ACTION)[-1].strip()}, text
        )

    # Neither pattern matched: the real implementation raises an
    # OutputParserException here
    raise OutputParserException(f"Could not parse LLM output: `{text}`")

If the LLM response contains Action and Action Input, the parser uses a regular expression to extract the content following these prefixes and wraps it into an AgentAction object. This object contains the tool name (tool) and the input parameters (tool_input).

python
class AgentAction(Serializable):
    tool: str
    tool_input: Union[str, dict]
    ...

If the LLM response includes Final Answer, it extracts the content that follows and wraps it into an AgentFinish object, storing the final result as a key-value pair (with the key being output) in return_values.

python
class AgentFinish(Serializable):
    return_values: dict
    ...

It’s important to note that the process of parsing LLM text into AgentAction or AgentFinish objects is not exclusive to ReActSingleInputOutputParser; all Agents’ output parsers will perform similar operations. This conversion layer allows the AgentExecutor to handle only AgentAction and AgentFinish objects, without needing to concern itself with the specific type of Agent being used.
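As a runnable illustration, here is a stripped-down parser using the same regular expression. `parse_react_output` is our own name, and it returns plain tuples instead of AgentAction/AgentFinish objects:

```python
import re

FINAL_ANSWER_ACTION = "Final Answer:"

def parse_react_output(text):
    """Return ("action", tool, tool_input) or ("finish", output).
    A simplified sketch of what ReActSingleInputOutputParser.parse does."""
    regex = r"Action\s*\d*\s*:[\s]*(.*?)[\s]*Action\s*\d*\s*Input\s*\d*\s*:[\s]*(.*)"
    match = re.search(regex, text, re.DOTALL)
    if match:
        # Extract the tool name and its input from the Action prefixes
        return ("action", match.group(1).strip(), match.group(2).strip().strip('"'))
    if FINAL_ANSWER_ACTION in text:
        # Everything after "Final Answer:" becomes the final output
        return ("finish", text.split(FINAL_ANSWER_ACTION)[-1].strip())
    raise ValueError(f"Could not parse LLM output: {text!r}")

print(parse_react_output(
    "Thought: need weather\nAction: search_weather\nAction Input: beijing"
))
# ('action', 'search_weather', 'beijing')
print(parse_react_output("I now know the final answer\nFinal Answer: stay indoors"))
# ('finish', 'stay indoors')
```

The whole trick, then, is plain text matching on agreed-upon prefixes, which is why this works with models that have no native function-calling support.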

AgentExecutor

We finally arrive at the more complex part of the entire implementation—AgentExecutor. This is the "runtime" of the AI Agent, responsible for calling the LLM chain, executing the actions chosen by the LLM (either invoking tools or returning the final result), and passing the operation results back to the LLM chain, repeating this process.

Constructing the AgentExecutor

First, let’s look at how to instantiate an AgentExecutor:

python
agent_executor = AgentExecutor(agent=agent, tools=tools)

Here, two key parameters are passed in: agent and tools, which represent the LLM chain being used and the list of tools, respectively. Now, let's examine how these parameters are defined in the AgentExecutor class:

python
class AgentExecutor(Chain):
    agent: Union[BaseSingleActionAgent, BaseMultiActionAgent]
    tools: Sequence[BaseTool]

You might notice an issue: the agent we pass in is an LCEL chain, but AgentExecutor.agent requires either a BaseSingleActionAgent or a BaseMultiActionAgent object.

  • BaseSingleActionAgent: Can call only one tool at a time.
  • BaseMultiActionAgent: Can call multiple tools at once, similar to the parallel function calls mentioned in the section on "Function Calling: Activating the LLM's Superpowers."

This conversion relies on the root_validator feature of the Pydantic library, which validates and transforms fields before the instance is constructed.

python
class AgentExecutor(Chain):
    ...
    @root_validator(pre=True)
    def validate_runnable_agent(cls, values: Dict) -> Dict:
        agent = values["agent"]
        if isinstance(agent, Runnable):
            ...
            values["agent"] = RunnableAgent(
                runnable=agent, ...
            )
        return values

To keep the code concise, the branch logic for BaseMultiActionAgent is omitted.

Inside the AgentExecutor, the provided LCEL chain is automatically converted into a RunnableAgent, which is a type of BaseSingleActionAgent.

python
class RunnableAgent(BaseSingleActionAgent):
    runnable: Runnable[dict, Union[AgentAction, AgentFinish]]
    ...
    def plan(
        self,
        intermediate_steps: List[Tuple[AgentAction, str]],
        callbacks: Callbacks = None,
        **kwargs: Any,
    ) -> Union[AgentAction, AgentFinish]:
        inputs = {**kwargs, **{"intermediate_steps": intermediate_steps}}
        ...
        final_output = self.runnable.invoke(inputs, ...)

Ultimately, the AgentExecutor calls the previously constructed LCEL chain through the plan method. This uniform interface lets the AgentExecutor drive the interaction between the LLM and the tools without caring which concrete Agent type it is running.

Execution of AgentExecutor

Having understood some construction details of the AgentExecutor, let’s explore how it interacts with other components during execution.

Entry Point for Execution

The execution begins with the invoke method. The AgentExecutor does not implement invoke itself but uses the one from its parent class, Chain. The Chain.invoke method eventually calls self._call.

python
class Chain(RunnableSerializable[Dict[str, Any], Dict[str, Any]], ABC):
    ...
    def invoke(
        self,
        input: Dict[str, Any],
        ...
    ) -> Dict[str, Any]:
        ...
        self._call(inputs)

Thus, the entry point for AgentExecutor is the _call method. Here’s a simplified version of its implementation:

python
class AgentExecutor(Chain):
    ...
    def _call(
        self,
        inputs: Dict[str, str],
        ...
    ) -> Dict[str, Any]:
        name_to_tool_map = {tool.name: tool for tool in self.tools}
        intermediate_steps: List[Tuple[AgentAction, str]] = []
        iterations = 0
        time_elapsed = 0.0
        start_time = time.time()
        
        # Check if we should continue
        while self._should_continue(iterations, time_elapsed):
            next_step_output = self._take_next_step(
                name_to_tool_map,
                ...,
                inputs,
                intermediate_steps,
                ...
            )
         
            if isinstance(next_step_output, AgentFinish):
                return self._return(
                    next_step_output, intermediate_steps, ...
                )
            ...
            intermediate_steps.extend(next_step_output)
            iterations += 1
            time_elapsed = time.time() - start_time
            ...
        # Loop limit reached: build a "stopped" response and return it
        output = self.agent.return_stopped_response(
            self.early_stopping_method, intermediate_steps, **inputs
        )
        return self._return(output, intermediate_steps, ...)

For task processing, the AI Agent is not allowed to run indefinitely; it has a maximum number of iterations (max_iterations, default 15) and a maximum execution time (max_execution_time, default None, meaning no limit). This is controlled by the _should_continue function:

python
class AgentExecutor(Chain):
    ...
    def _should_continue(self, iterations: int, time_elapsed: float) -> bool:
        if self.max_iterations is not None and iterations >= self.max_iterations:
            return False
        if (
            self.max_execution_time is not None
            and time_elapsed >= self.max_execution_time
        ):
            return False

        return True
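The same guard can be reproduced as a standalone function. This sketch mirrors the logic above with our own `should_continue` name and the documented defaults:

```python
def should_continue(iterations, time_elapsed,
                    max_iterations=15, max_execution_time=None):
    # Stop once the iteration cap is reached
    if max_iterations is not None and iterations >= max_iterations:
        return False
    # Stop once the time budget (in seconds) is spent
    if max_execution_time is not None and time_elapsed >= max_execution_time:
        return False
    return True

print(should_continue(14, 3.0))                         # True
print(should_continue(15, 3.0))                         # False: iteration cap hit
print(should_continue(2, 11.0, max_execution_time=10))  # False: time cap hit
```

In practice, you set these limits when constructing the executor, e.g. `AgentExecutor(agent=agent, tools=tools, max_iterations=5, max_execution_time=10)`.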

Taking Next Steps

Each iteration calls _take_next_step, which we can summarize as follows (with adjusted syntax for clarity):

python
class AgentExecutor(Chain):
    ...
    def _take_next_step(
        self,
        ...
    ) -> Union[AgentFinish, List[Tuple[AgentAction, str]]]:
        next_steps = []
        for a in self._iter_next_step(
            name_to_tool_map,
            ...,
            inputs,
            intermediate_steps,
            ...
        ):
            next_steps.append(a)
        
        return self._consume_next_step(next_steps)

The _iter_next_step function is crucial for processing a single task:

python
class AgentExecutor(Chain):
    ...
    def _iter_next_step(
        self,
        name_to_tool_map: Dict[str, BaseTool],
        ...,
        inputs: Dict[str, str],
        intermediate_steps: List[Tuple[AgentAction, str]],
        ...
    ) -> Iterator[Union[AgentFinish, AgentAction, AgentStep]]:
        ...
        # Call the LLM chain
        output = self.agent.plan(
            intermediate_steps,
            **inputs,
        )
        ...
        # If it's AgentFinish, return directly
        if isinstance(output, AgentFinish):
            yield output
            return
        
        actions: List[AgentAction]
        if isinstance(output, AgentAction):
            actions = [output]
        else:
            actions = output
        ...
        # Call each tool function for every AgentAction
        for agent_action in actions:
            yield self._perform_agent_action(
                name_to_tool_map, ..., agent_action, ...
            )

Here, self.agent.plan initiates the call to the LLM chain. As mentioned in the earlier section on the LLM chain, the output parser returns either an AgentFinish or AgentAction object (or an array of AgentAction for multi-action agents).

If self.agent.plan returns an AgentFinish, that object is returned directly. If it returns one or more AgentAction objects, _iter_next_step calls _perform_agent_action for each action:

python
class AgentExecutor(Chain):
    ...
    def _perform_agent_action(
        self,
        name_to_tool_map: Dict[str, BaseTool],
        ...,
        agent_action: AgentAction,
        ...
    ) -> AgentStep:
        ...
        if agent_action.tool in name_to_tool_map:
            tool = name_to_tool_map[agent_action.tool]
            ...
            observation = tool.run(
                agent_action.tool_input,
                ...
            )
        ...
        return AgentStep(action=agent_action, observation=observation)

This method locates the corresponding Tool instance for AgentAction.tool, then calls Tool.run, which eventually invokes Tool.func, the business function we provided when wrapping the tool.

Once the tool function call completes, _perform_agent_action wraps the result in a new structure called AgentStep, which has two key attributes: action and observation, storing the executed AgentAction and the result of the tool function call, respectively.

python
class AgentStep(Serializable):
    action: AgentAction
    """The AgentAction that was executed."""
    observation: Any
    """The result of the AgentAction."""
    ...

Returning to _iter_next_step, after calling _perform_agent_action, we receive an AgentStep object containing the result of the tool function call.

Summary of _take_next_step

In _take_next_step, when _iter_next_step yields an AgentFinish, the generator stops right after it, so in this case next_steps = [AgentFinish].

If the LLM chain instead returns one or more AgentAction objects, _iter_next_step calls _perform_agent_action for each of them and yields the resulting AgentStep objects, so next_steps = [AgentStep, AgentStep, ...].

After obtaining next_steps, we pass them to _consume_next_step, which mainly formats the data further:

python
class AgentExecutor(Chain):
    ...
    def _consume_next_step(
        self, values: NextStepOutput
    ) -> Union[AgentFinish, List[Tuple[AgentAction, str]]]:
        if isinstance(values[-1], AgentFinish):
            assert len(values) == 1
            return values[-1]
        else:
            return [
                (a.action, a.observation) for a in values if isinstance(a, AgentStep)
            ]

After one _take_next_step call, the return value next_step_output can be one of two types:

  • AgentFinish: Indicates that the LLM returned the final result; in this case, _call invokes _return to hand the result back to the user.
  • [(AgentAction, <tool function result>), ...]: Indicates that the LLM requested tool calls and the corresponding tool functions have been executed. In this case, next_step_output is appended to intermediate_steps for the next LLM call.
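Putting the pieces together, here is a toy end-to-end loop in the spirit of AgentExecutor._call. The `fake_llm` and `run_agent` names are ours, and the "LLM" is scripted so the example runs without a model:

```python
import re

def fake_llm(prompt):
    # Scripted stand-in for a real model: first ask for a tool, then finish
    if "Observation:" not in prompt:
        return "Action: search_weather\nAction Input: beijing"
    return "Final Answer: It is 30 degrees, plan for hot weather."

def run_agent(question, tools, max_iterations=5):
    name_to_tool_map = dict(tools)  # tool name -> tool function
    scratchpad = ""
    for _ in range(max_iterations):
        text = fake_llm(f"Question: {question}\nThought: {scratchpad}")
        if "Final Answer:" in text:  # the AgentFinish case
            return text.split("Final Answer:")[-1].strip()
        # The AgentAction case: parse the tool call and execute it
        match = re.search(r"Action\s*:\s*(.*?)\s*Action\s*Input\s*:\s*(.*)",
                          text, re.DOTALL)
        tool, tool_input = match.group(1), match.group(2).strip()
        observation = name_to_tool_map[tool](tool_input)  # like tool.func(input)
        # Feed the result back as the next round's Observation
        scratchpad += f"{text}\nObservation: {observation}\nThought: "
    return "Agent stopped due to iteration limit."

print(run_agent("travel plan for Beijing", {"search_weather": lambda city: 30}))
# It is 30 degrees, plan for hot weather.
```

Swap `fake_llm` for a real model call and the tool lambdas for real business functions, and this is, in miniature, the loop the AgentExecutor runs.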

Here’s a flowchart of the entire AgentExecutor call process to help visualize the interactions:

[Figure: the AgentExecutor call flow (agent-pric.webp)]


Today’s content comes to an end. I believe everyone can now answer the two questions raised at the beginning:

Q: After the LLM call returns, how does AgentExecutor determine whether to call a tool or to terminate the execution process and return the result to the user?

A: AgentExecutor determines this based on the return data type of the LCEL chain. If it returns AgentFinish, it indicates that the final answer has been obtained; if it returns AgentAction, it indicates that a tool function needs to be called.

Q: How does the AI Agent implement tool calls for models that do not support function calls?

A: By adding a description of the tool and parameter requirements (Tool.description) in the prompt, and defining specific symbols to parse the output to obtain the tool name and parameters, which are then called by AgentExecutor.

Summary

Compared to create_openai_tools_agent, create_react_agent is a more general ReAct Agent with no requirements on the LLM used. Tools are external capabilities that the AI Agent can call during task execution; these can be any business functions. To unify the calling methods of various external systems, LangChain abstracts a Tool layer, allowing any function to be encapsulated into a Tool object that can be called by the AI Agent.

In LangChain, the concept of an Agent is abstracted to represent the LLM calling chain used by the AI Agent when processing tasks, responsible for deciding what action to take next. LangChain supports various types of Agents for creating Agent applications in different usage scenarios. AgentExecutor serves as the runtime for Agent applications, responsible for calling the LLM chain, executing the operations chosen by the LLM (calling tools or returning the final result), and passing the results back to the LLM chain, and so on.
