
Event Callback

Callbacks are an important concept in programming; front-end developers in particular will be familiar with callback functions. A callback function is passed as an argument to another function and invoked at the appropriate time.

Almost every component in LangChain exposes callback hooks at key points of its execution, and these hooks trigger the corresponding callback events. This lets us run custom logic when specific events occur, which is useful in many scenarios such as logging, performance monitoring, and audit records, and helps us monitor and debug the system.

Today, we will detail the principles and usage of callback handlers in LangChain with examples. Finally, we will demonstrate the application of callback functions in a practical project involving token consumption auditing.

Workflow of Callback Handlers

First, let's take a look at what callback events are available in LangChain. All callback handler functions inherit from BaseCallbackHandler, so let's directly examine how this class is defined:

python
class BaseCallbackHandler(
    LLMManagerMixin,
    ChainManagerMixin,
    ToolManagerMixin,
    RetrieverManagerMixin,
    CallbackManagerMixin,
    RunManagerMixin,
):
    ...

LLMManagerMixin

python
class LLMManagerMixin:
    """Mixin for LLM callbacks."""

    def on_llm_new_token(...) -> Any:
        """Run on new LLM token. Only available when streaming is enabled."""
    def on_llm_end(...) -> Any:
        """Run when LLM ends running."""
    def on_llm_error(...) -> Any:
        """Run when LLM errors."""

ChainManagerMixin

python
class ChainManagerMixin:
    """Mixin for chain callbacks."""

    def on_chain_end(...) -> Any:
        """Run when chain ends running."""

    def on_chain_error(...) -> Any:
        """Run when chain errors."""

    def on_agent_action(...) -> Any:
        """Run on agent action."""

    def on_agent_finish(...) -> Any:
        """Run on agent end."""

ToolManagerMixin

python
class ToolManagerMixin:
    """Mixin for tool callbacks."""

    def on_tool_end(...) -> Any:
        """Run when tool ends running."""

    def on_tool_error(...) -> Any:
        """Run when tool errors."""

RetrieverManagerMixin

python
class RetrieverManagerMixin:
    """Mixin for Retriever callbacks."""

    def on_retriever_error(...) -> Any:
        """Run when Retriever errors."""

    def on_retriever_end(...) -> Any:
        """Run when Retriever ends running."""

CallbackManagerMixin

python
class CallbackManagerMixin:
    """Mixin for callback manager."""

    def on_llm_start(...) -> Any:
        """Run when LLM starts running."""

    def on_chat_model_start(...) -> Any:
        """Run when a chat model starts running."""

    def on_retriever_start(...) -> Any:
        """Run when Retriever starts running."""

    def on_chain_start(...) -> Any:
        """Run when chain starts running."""

    def on_tool_start(...) -> Any:
        """Run when tool starts running."""

RunManagerMixin

python
class RunManagerMixin:
    """Mixin for run manager."""

    def on_text(...) -> Any:
        """Run on arbitrary text."""

    def on_retry(...) -> Any:
        """Run on a retry event."""

LangChain categorizes events and assigns them to different abstract classes. These event names are quite clear; for example, on_llm_start is triggered before the LLM executes, and on_llm_end is triggered after the LLM returns.

Each component generally exposes three kinds of events: before execution, after execution, and on error (the LLM group adds a streaming-token event, and the Agent group only has action/finish). A minimal handler that overrides a few of these is sketched after the list:

  • LLM: [on_llm_start, on_chat_model_start, on_llm_end, on_llm_error, on_llm_new_token]
  • Chain: [on_chain_start, on_chain_end, on_chain_error]
  • Tool: [on_tool_start, on_tool_end, on_tool_error]
  • Retriever: [on_retriever_start, on_retriever_end, on_retriever_error]
  • Agent: [on_agent_action, on_agent_finish]
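
The event names map one-to-one onto methods of BaseCallbackHandler, and a custom handler only needs to override the events it cares about; the rest default to doing nothing. Here is a minimal sketch (the particular events chosen are just illustrative):

python
from typing import Any, Dict, List
from langchain_core.callbacks import BaseCallbackHandler

class LoggingHandler(BaseCallbackHandler):
    """A minimal handler that only overrides a few events."""

    def on_llm_start(self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any) -> Any:
        # fired before the LLM runs
        print(f"LLM starting with {len(prompts)} prompt(s)")

    def on_llm_end(self, response: Any, **kwargs: Any) -> Any:
        # fired after the LLM returns
        print("LLM finished")

    def on_chain_error(self, error: BaseException, **kwargs: Any) -> Any:
        # fired when a chain step raises an exception
        print(f"Chain failed: {error}")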

In general, there are two ways to register callback handlers: when constructing the component instance, and when making a request.

Passing Callbacks in Constructor

When creating a component instance, you can specify the callback through the callbacks parameter, for example: LLMChain(callbacks=[handler]). In this case, the callback function will be effective for all calls of that component instance.

python
from langchain_core.callbacks import BaseCallbackHandler
from langchain_openai import OpenAI
from typing import Any, Dict, List, Optional
from uuid import UUID

class ConstructorCallbackHandler(BaseCallbackHandler):
    def on_llm_start(
        self,
        serialized: Dict[str, Any],
        prompts: List[str],
        *,
        run_id: UUID,
        parent_run_id: Optional[UUID] = None,
        tags: Optional[List[str]] = None,
        metadata: Optional[Dict[str, Any]] = None,
        **kwargs: Any,
    ) -> Any:
        print("ConstructorCallbackHandler on_llm_start")

# Create OpenAI instance with specified callback function; each request will trigger the callback
llm = OpenAI(callbacks=[ConstructorCallbackHandler()])
model.invoke("hello")

# output
# > ConstructorCallbackHandler on_llm_start

Specifying Callbacks at Request Time

You can also specify the callback function at the time of the actual call, for example: invoke(config={'callbacks': [handler]}). In this case, the callback function will only apply to that single request.

python
from langchain_core.callbacks import BaseCallbackHandler
from langchain_openai import OpenAI
from typing import Any, Dict, List, Optional
from uuid import UUID

class RequestCallbackHandler(BaseCallbackHandler):
    def on_llm_start(
        self,
        serialized: Dict[str, Any],
        prompts: List[str],
        *,
        run_id: UUID,
        parent_run_id: Optional[UUID] = None,
        tags: Optional[List[str]] = None,
        metadata: Optional[Dict[str, Any]] = None,
        **kwargs: Any,
    ) -> Any:
        print("RequestCallbackHandler on_llm_start")

model = OpenAI()
model.invoke("hello", config={"callbacks": [RequestCallbackHandler()]})

# output
# > RequestCallbackHandler on_llm_start

It is important to note that if callback handler functions are specified in both places for an instance, both functions will be triggered.

python
llm = OpenAI(callbacks=[ConstructorCallbackHandler()])
llm.invoke("hello", config={"callbacks": [RequestCallbackHandler()]})

# output
# > RequestCallbackHandler on_llm_start
# > ConstructorCallbackHandler on_llm_start

In actual system development, we might want to add user metadata such as uid or IP to the logs, which is very helpful for troubleshooting.

LangChain also lets you attach tags when registering callbacks; these tags are passed to the callback method's tags parameter. For example, with RequestCallbackHandler:

python
from langchain_core.callbacks import BaseCallbackHandler
from langchain_openai import OpenAI
from typing import Any, Dict, List, Optional
from uuid import UUID

class RequestCallbackHandler(BaseCallbackHandler):
    def on_llm_start(
        self,
        serialized: Dict[str, Any],
        prompts: List[str],
        *,
        run_id: UUID,
        parent_run_id: Optional[UUID] = None,
        tags: Optional[List[str]] = None,
        metadata: Optional[Dict[str, Any]] = None,
        **kwargs: Any,
    ) -> Any:
        print(tags)
        print("RequestCallbackHandler on_llm_start")

model = OpenAI()
model.invoke("hello", config={"callbacks": [RequestCallbackHandler()], "tags": ["request_tag"]})

# output
# > ['request_tag']
# > RequestCallbackHandler on_llm_start
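
Besides tags, the same config dictionary also accepts a metadata entry, which is delivered to the metadata parameter of the callback methods. This is a natural place for the uid or IP mentioned above. A rough sketch (the uid and ip keys are purely illustrative names):

python
class AuditCallbackHandler(BaseCallbackHandler):
    def on_llm_start(self, serialized, prompts, *, metadata=None, **kwargs):
        # metadata comes from the "metadata" key of the invoke config
        # (the framework may merge in additional keys of its own)
        print(metadata)

model = OpenAI()
model.invoke(
    "hello",
    config={
        "callbacks": [AuditCallbackHandler()],
        "metadata": {"uid": "user-123", "ip": "10.0.0.1"},
    },
)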

Callback Handlers in LCEL

The concept of callbacks and how they integrate into individual components is fairly straightforward. Things get more involved once callbacks are integrated into LCEL chains, which in practice is a very common need.

Before we start, here is the summary: each stage in an LCEL chain can be viewed as a separate execution module. When the module is one of the components introduced earlier (Tool, LLM, Agent, Retriever), it triggers that component's callback events. When the module is a Runnable object that LangChain designed for LCEL, such as RunnableParallel or RunnableLambda, it triggers the on_chain_xxx callback events. I hope you can mentally repeat this three times before continuing to read.

Below, we will illustrate and understand the logic of callback functions in LCEL through several examples, from simple to complex.

Example 1: RunnableSequence

Let's take a look at the simplest LCEL chain and add a callback function when invoking that chain.

python
from langchain_core.callbacks import BaseCallbackHandler
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import OpenAI
from typing import Any, Dict, List, Optional
from uuid import UUID
import json

class CustomCallbackHandler(BaseCallbackHandler):
    def on_chain_start(
        self,
        serialized: Dict[str, Any],
        inputs: Dict[str, Any],
        *,
        run_id: UUID,
        parent_run_id: Optional[UUID] = None,
        tags: Optional[List[str]] = None,
        metadata: Optional[Dict[str, Any]] = None,
        **kwargs: Any,
    ) -> Any:
        print("on_chain_start")
        print("id: " + json.dumps(serialized['id']))
        print("inputs: " + json.dumps(inputs))
        print("------------------")
        
    def on_llm_start(
        self,
        serialized: Dict[str, Any],
        prompts: List[str],
        *,
        run_id: UUID,
        parent_run_id: Optional[UUID] = None,
        tags: Optional[List[str]] = None,
        metadata: Optional[Dict[str, Any]] = None,
        **kwargs: Any,
    ) -> Any:
        print("on_llm_start")
        print("id: " + json.dumps(serialized['id']))
        print("------------------")
    
prompt = ChatPromptTemplate.from_messages(["Tell me a joke about {animal}"])
model = OpenAI()
chain = prompt | model
response = chain.invoke({"animal": "bears"}, config={"callbacks": [CustomCallbackHandler()]})

First, we construct the simplest possible LCEL chain, consisting of a prompt template and an LLM instance.

When calling invoke, we pass in the custom callback handler CustomCallbackHandler. This handler implements two callback events: on_chain_start and on_llm_start. Both receive a serialized parameter containing metadata about the component being called, and here we print serialized['id'] for analysis. on_chain_start additionally receives an inputs parameter, representing the input that the step received, which we print as well.

Here are the execution results:

on_chain_start
id: ["langchain", "schema", "runnable", "RunnableSequence"]
inputs: {"animal": "bears"}
------------------
on_chain_start
id: ["langchain", "prompts", "chat", "ChatPromptTemplate"]
inputs: {"animal": "bears"}
------------------
on_llm_start
id: ["langchain", "llms", "openai", "OpenAI"]
------------------

I wonder if you were confused when you first saw this result. Where does RunnableSequence come from? Why do RunnableSequence and ChatPromptTemplate trigger on_chain_start, while OpenAI triggers on_llm_start?

Examining the Source Code of RunnableSequence.invoke

Let's take a look at the source code for RunnableSequence.invoke:

python
class RunnableSequence(RunnableSerializable[Input, Output]):
    def invoke(self, ...) -> Output:
        ...
        run_manager = callback_manager.on_chain_start(...)
        ...

As you can see, RunnableSequence triggers on_chain_start once.

Using the same analysis method, we can find in the source code that ChatPromptTemplate also triggers on_chain_start, while OpenAI triggers on_llm_start.

Example 2: RunnableParallel

In the following example, we will explore the invocation of callback functions in a complex LCEL chain with parallel execution (RunnableParallel).

python
# The definition of CustomCallbackHandler remains unchanged; RunnablePassthrough needs to be imported
from langchain_core.runnables import RunnablePassthrough
...

def get_num(animal):
    if animal == "bears":
        return 1
    else:
        return 2

prompt = ChatPromptTemplate.from_messages(["Tell me {num} joke about {animal}"])
model = OpenAI()
chain = {"num": get_num, "animal": RunnablePassthrough()} | prompt | model
response = chain.invoke({"animal": "bears"}, config={"callbacks": [CustomCallbackHandler()]})

First, let’s look directly at the output:

on_chain_start
id: ["langchain", "schema", "runnable", "RunnableSequence"]
inputs: {"animal": "bears"}
------------------
on_chain_start
id: ["langchain", "schema", "runnable", "RunnableParallel"]
inputs: {"animal": "bears"}
------------------
on_chain_start
id: ["langchain_core", "runnables", "base", "RunnableLambda"]
inputs: {"animal": "bears"}
------------------
on_chain_start
id: ["langchain", "schema", "runnable", "RunnablePassthrough"]
inputs: {"animal": "bears"}
------------------
on_chain_start
id: ["langchain", "prompts", "chat", "ChatPromptTemplate"]
inputs: {"num": 2, "animal": {"animal": "bears"}}
------------------
on_llm_start
id: ["langchain", "llms", "openai", "OpenAI"]
------------------

In the output above, I believe everyone understands the first (RunnableSequence), the fifth (ChatPromptTemplate), and the sixth (OpenAI). Let's focus on the remaining three.

First, let's carefully analyze our code. In this example, the prompt template includes a num placeholder variable, which is obtained through the get_num function. Note that each parallel branch receives the full chain input, so get_num is called with the dict {"animal": "bears"} rather than the string "bears", which is why num ends up as 2 in the output.

Parallel Execution in the Constructed LCEL Chain

In the constructed LCEL chain, {"num": get_num, "animal": RunnablePassthrough()} indicates parallel execution: LangChain coerces it into RunnableParallel({"num": RunnableLambda(get_num), "animal": RunnablePassthrough()}).
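
To make that coercion visible, here is an equivalent, explicit way to build the same chain; it should behave the same as the dict shorthand above:

python
from langchain_core.runnables import RunnableLambda, RunnableParallel, RunnablePassthrough

mapping = RunnableParallel({"num": RunnableLambda(get_num), "animal": RunnablePassthrough()})
chain = mapping | prompt | model  # same structure as the dict-based chain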

Now that we know where RunnableParallel, RunnableLambda, and RunnablePassthrough in the output come from, let's examine their invoke functions to see which event functions are triggered:

RunnableParallel

python
class RunnableParallel(RunnableSerializable[Input, Dict[str, Any]]):
    ...
    def invoke(...):
        ...
        run_manager = callback_manager.on_chain_start(...)
        ...

RunnableLambda

python
class RunnableLambda(Runnable[Input, Output]):
    ...
    def invoke(...):
        ...
        return self._call_with_config(...)
        ...

class Runnable(Generic[Input, Output], ABC):
    ...
    def _call_with_config(...):
        ...
        run_manager = callback_manager.on_chain_start(...)
        ...

RunnablePassthrough

python
class RunnablePassthrough(RunnableSerializable[Other, Other]):
    ...
    def invoke(...):
        ...
        # _call_with_config is from Runnable._call_with_config
        return self._call_with_config(...)

From the above three code snippets, we can see that these types of Runnable all trigger on_chain_xxx type callback events.

Example 3: Nested Subchains

Now, let's increase the complexity by constructing a more intricate LCEL chain and adding callback functions to see if you can analyze and explain the callback invocation behavior.

python
# CustomCallbackHandler definition remains unchanged; RunnableLambda and RunnablePassthrough need to be imported
from langchain_core.runnables import RunnableLambda, RunnablePassthrough
...

def get_num(animal):
    if animal == "bears":
        return 1
    else:
        return 2

def add_one(num):
    return num + 1

prompt = ChatPromptTemplate.from_messages(["Tell me {num} joke about {animal}"])
model = OpenAI()
chain = {"num": RunnableLambda(get_num) | RunnableLambda(add_one), "animal": RunnablePassthrough()} | prompt | model
response = chain.invoke({"animal": "bears"}, config={"callbacks": [CustomCallbackHandler()]})

In this example, we add one more step to the computation of num: the result of get_num is passed through add_one, which adds 1 to it. Let's look at the output:

on_chain_start
id: ["langchain", "schema", "runnable", "RunnableSequence"]
inputs: {"animal": "bears"}
------------------
on_chain_start
id: ["langchain", "schema", "runnable", "RunnableParallel"]
inputs: {"animal": "bears"}
------------------
on_chain_start
id: ["langchain", "schema", "runnable", "RunnablePassthrough"]
inputs: {"animal": "bears"}
------------------
on_chain_start
id: ["langchain", "schema", "runnable", "RunnableSequence"]
inputs: {"animal": "bears"}
------------------
on_chain_start
id: ["langchain_core", "runnables", "base", "RunnableLambda"]
inputs: {"animal": "bears"}
------------------
on_chain_start
id: ["langchain_core", "runnables", "base", "RunnableLambda"]
inputs: 2
------------------
on_chain_start
id: ["langchain", "prompts", "chat", "ChatPromptTemplate"]
inputs: {"num": 3, "animal": {"animal": "bears"}}
------------------
on_llm_start
id: ["langchain", "llms", "openai", "OpenAI"]
------------------

Take a moment to pause and try to analyze and explain the above content. If you can understand these results, congratulations, you have grasped the mechanism of callback functions in the LCEL chain.

Now, here’s the answer.

Compared to Example 2, the only difference in the LCEL chain is that "num": get_num has been replaced with "num": RunnableLambda(get_num) | RunnableLambda(add_one). The pipe (|) turns the original single RunnableLambda into a RunnableSequence of two RunnableLambdas, which is exactly why an extra RunnableSequence and a second RunnableLambda show up in the output.

With the previous introduction of the types of callback events triggered by these Runnable objects, it’s not difficult to derive the above results.
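
If you want to see this nesting directly, every callback also receives run_id and parent_run_id parameters, which are enough to reconstruct the call tree. Below is a rough sketch that indents each start event by its depth; note that parallel branches run concurrently, so the printed order may interleave:

python
from typing import Any, Dict, Optional
from uuid import UUID

from langchain_core.callbacks import BaseCallbackHandler


class TraceHandler(BaseCallbackHandler):
    """Sketch: print each start event indented by its nesting depth."""

    def __init__(self) -> None:
        self.depth: Dict[UUID, int] = {}

    def _record(self, run_id: UUID, parent_run_id: Optional[UUID]) -> int:
        # a child is one level deeper than its parent; roots are at depth 0
        depth = self.depth.get(parent_run_id, -1) + 1 if parent_run_id else 0
        self.depth[run_id] = depth
        return depth

    def on_chain_start(self, serialized, inputs, *, run_id, parent_run_id=None, **kwargs: Any) -> Any:
        depth = self._record(run_id, parent_run_id)
        name = serialized["id"][-1] if serialized and "id" in serialized else "unknown"
        print("  " * depth + name)

    def on_llm_start(self, serialized, prompts, *, run_id, parent_run_id=None, **kwargs: Any) -> Any:
        depth = self._record(run_id, parent_run_id)
        print("  " * depth + serialized["id"][-1])

Passing TraceHandler() as the callback when invoking the chain from Example 3 should print the same objects as above, indented according to their parent/child relationships.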

Practical Application: Token Consumption Audit

In some LLM applications, tracking each user's token consumption for billing is a common requirement; services that proxy the OpenAI API, for example, typically charge by token usage.

Using the on_llm_new_token callback event provided by LangChain allows us to easily integrate audit statistics into our LCEL call chain.

python
from langchain_core.callbacks import BaseCallbackHandler
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import OpenAI
from typing import Any, Optional


class CustomCallbackHandler(BaseCallbackHandler):
    def __init__(self):
        self.token_used = 0
    
    def on_llm_new_token(
        self,
        token: str,
        **kwargs: Any,
    ) -> Any:
        self.token_used += 1

prompt = ChatPromptTemplate.from_messages(["Tell me a joke about {animal}"])
model = OpenAI(streaming=True)
chain = prompt | model
callback_handler = CustomCallbackHandler()
response = chain.invoke({"animal": "bears"}, config={"callbacks": [callback_handler]})
print(callback_handler.token_used)

The code is straightforward; it increments the count by 1 every time a new token is generated. There are two important points to note when using on_llm_new_token:

  1. The LLM must support streaming output, and streaming=True must be set.
  2. This event fires when a new output token is produced, so it only counts output tokens. For a complete audit you also need to record input tokens, which can be done in on_llm_start or on_chat_model_start; a rough sketch follows this list.
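
As a rough sketch of point 2, the handler below also counts input tokens in on_llm_start. It assumes the tiktoken package is installed and that the cl100k_base encoding is an acceptable approximation of the model's real tokenizer; both are assumptions, not requirements. It also only covers the completion-style OpenAI LLM used above; a chat model would need on_chat_model_start instead.

python
from typing import Any, List

import tiktoken  # assumed to be available; any tokenizer would do
from langchain_core.callbacks import BaseCallbackHandler


class TokenAuditHandler(BaseCallbackHandler):
    def __init__(self) -> None:
        self.input_tokens = 0
        self.output_tokens = 0
        # cl100k_base is only an approximation of the deployed model's tokenizer
        self.encoding = tiktoken.get_encoding("cl100k_base")

    def on_llm_start(self, serialized, prompts: List[str], **kwargs: Any) -> Any:
        # count the tokens of every prompt sent to the LLM
        self.input_tokens += sum(len(self.encoding.encode(p)) for p in prompts)

    def on_llm_new_token(self, token: str, **kwargs: Any) -> Any:
        # requires streaming=True on the model
        self.output_tokens += 1

When the provider reports usage itself, reading it from the response in on_llm_end can be more accurate than counting tokens by hand, although with streaming enabled that information is not always available.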

Summary

The concept of callbacks is simple yet important. By implementing callback functions for specific events, we can gain better insights into the execution of LLM applications, aiding in system monitoring and debugging.

In LangChain, each component is designed with callback hooks at key points. By inheriting BaseCallbackHandler and implementing the relevant event methods, we can have our custom logic run at the designated stages. Once the callback class is defined, we can register it either when constructing the component instance or at request time.

Each stage of the LCEL chain can be viewed as a separate execution module. When the module is one of the components mentioned earlier (Tool, LLM, Agent, Retriever), it will trigger the corresponding callback event. For modules like RunnableParallel or RunnableLambda, designed for LCEL in LangChain, the on_chain_xxx related callback events will be triggered.

LangChain is iterating rapidly, so to keep up with version changes and truly understand the callback mechanism, it's worth getting comfortable reading the relevant source code. In this section we walked through the callback-related source of several objects; you can apply the same approach to find where other event functions are invoked.
