Event Callbacks
Callbacks are an important concept in programming; front-end developers in particular will be familiar with callback functions. A callback function is passed as an argument to another function and is invoked at the appropriate time.
Almost every component in LangChain exposes callback hooks at the various stages of its execution, and these hooks trigger the corresponding callback events. This lets us run custom logic when specific events occur, which is useful in scenarios such as logging, performance monitoring, and audit records, and helps us monitor and debug the system.
Today, we will walk through the principles and usage of callback handlers in LangChain with examples. Finally, we will demonstrate a practical application of callbacks: auditing token consumption.
Workflow of Callback Handlers
First, let's look at which callback events are available in LangChain. All callback handlers inherit from `BaseCallbackHandler`, so let's examine how this class is defined:
```python
class BaseCallbackHandler(
    LLMManagerMixin,
    ChainManagerMixin,
    ToolManagerMixin,
    RetrieverManagerMixin,
    CallbackManagerMixin,
    RunManagerMixin,
):
    ...
```
LLMManagerMixin
```python
class LLMManagerMixin:
    """Mixin for LLM callbacks."""

    def on_llm_new_token(...) -> Any:
        """Run on new LLM token. Only available when streaming is enabled."""

    def on_llm_end(...) -> Any:
        """Run when LLM ends running."""

    def on_llm_error(...) -> Any:
        """Run when LLM errors."""
```
ChainManagerMixin
```python
class ChainManagerMixin:
    """Mixin for chain callbacks."""

    def on_chain_end(...) -> Any:
        """Run when chain ends running."""

    def on_chain_error(...) -> Any:
        """Run when chain errors."""

    def on_agent_action(...) -> Any:
        """Run on agent action."""

    def on_agent_finish(...) -> Any:
        """Run on agent end."""
```
ToolManagerMixin
```python
class ToolManagerMixin:
    """Mixin for tool callbacks."""

    def on_tool_end(...) -> Any:
        """Run when tool ends running."""

    def on_tool_error(...) -> Any:
        """Run when tool errors."""
```
RetrieverManagerMixin
```python
class RetrieverManagerMixin:
    """Mixin for Retriever callbacks."""

    def on_retriever_error(...) -> Any:
        """Run when Retriever errors."""

    def on_retriever_end(...) -> Any:
        """Run when Retriever ends running."""
```
CallbackManagerMixin
```python
class CallbackManagerMixin:
    """Mixin for callback manager."""

    def on_llm_start(...) -> Any:
        """Run when LLM starts running."""

    def on_chat_model_start(...) -> Any:
        """Run when a chat model starts running."""

    def on_retriever_start(...) -> Any:
        """Run when Retriever starts running."""

    def on_chain_start(...) -> Any:
        """Run when chain starts running."""

    def on_tool_start(...) -> Any:
        """Run when tool starts running."""
```
RunManagerMixin
```python
class RunManagerMixin:
    """Mixin for run manager."""

    def on_text(...) -> Any:
        """Run on arbitrary text."""

    def on_retry(...) -> Any:
        """Run on a retry event."""
```
LangChain groups the callback events by category into these mixin classes. The event names are fairly self-explanatory; for example, `on_llm_start` is triggered before the LLM executes, and `on_llm_end` is triggered after the LLM returns.
Each component generally has three kinds of events: before execution, after execution, and on execution error (a small error-logging handler built from these hooks is sketched after the list):

- LLM: `on_llm_start`, `on_chat_model_start`, `on_llm_end`, `on_llm_error`, `on_llm_new_token`
- Chain: `on_chain_start`, `on_chain_end`, `on_chain_error`
- Tool: `on_tool_start`, `on_tool_end`, `on_tool_error`
- Retriever: `on_retriever_start`, `on_retriever_end`, `on_retriever_error`
- Agent: `on_agent_action`, `on_agent_finish`
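To make this concrete, here is a minimal sketch (not from the original lesson) of a handler that wires up the error hooks. The signatures are abbreviated to `**kwargs`, which works because LangChain passes the extra arguments such as `run_id` by keyword:

```python
from typing import Any

from langchain_core.callbacks import BaseCallbackHandler


class ErrorLoggingHandler(BaseCallbackHandler):
    """Log any LLM, chain, or tool error raised during a run."""

    def on_llm_error(self, error: BaseException, **kwargs: Any) -> Any:
        print(f"[LLM error] {error!r}")

    def on_chain_error(self, error: BaseException, **kwargs: Any) -> Any:
        print(f"[Chain error] {error!r}")

    def on_tool_error(self, error: BaseException, **kwargs: Any) -> Any:
        print(f"[Tool error] {error!r}")
```

Such a handler can then be attached in either of the two ways described next.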
In general, there are two ways to register callback handlers: when constructing the component instance, and when making a request.
Passing Callbacks in Constructor
When creating a component instance, you can register callbacks through the `callbacks` parameter, for example `LLMChain(callbacks=[handler])`. In this case, the callbacks apply to every call made on that component instance.
```python
from uuid import UUID
from typing import Any, Dict, List, Optional

from langchain_core.callbacks import BaseCallbackHandler
from langchain_openai import OpenAI


class ConstructorCallbackHandler(BaseCallbackHandler):
    def on_llm_start(
        self,
        serialized: Dict[str, Any],
        prompts: List[str],
        *,
        run_id: UUID,
        parent_run_id: Optional[UUID] = None,
        tags: Optional[List[str]] = None,
        metadata: Optional[Dict[str, Any]] = None,
        **kwargs: Any,
    ) -> Any:
        print("ConstructorCallbackHandler on_llm_start")


# Create an OpenAI instance with the callback attached; every request made on
# this instance will trigger the callback
llm = OpenAI(callbacks=[ConstructorCallbackHandler()])
llm.invoke("hello")
# output
# > ConstructorCallbackHandler on_llm_start
```
Specifying Callbacks at Request Time
You can also register callbacks at call time, for example `invoke(config={'callbacks': [handler]})`. In this case, the callbacks apply only to that single request.
```python
from uuid import UUID
from typing import Any, Dict, List, Optional

from langchain_core.callbacks import BaseCallbackHandler
from langchain_openai import OpenAI


class RequestCallbackHandler(BaseCallbackHandler):
    def on_llm_start(
        self,
        serialized: Dict[str, Any],
        prompts: List[str],
        *,
        run_id: UUID,
        parent_run_id: Optional[UUID] = None,
        tags: Optional[List[str]] = None,
        metadata: Optional[Dict[str, Any]] = None,
        **kwargs: Any,
    ) -> Any:
        print("RequestCallbackHandler on_llm_start")


model = OpenAI()
model.invoke("hello", config={"callbacks": [RequestCallbackHandler()]})
# output
# > RequestCallbackHandler on_llm_start
```
Note that if callbacks are registered in both places for the same instance, both handlers will be triggered.
```python
llm = OpenAI(callbacks=[ConstructorCallbackHandler()])
llm.invoke("hello", config={"callbacks": [RequestCallbackHandler()]})
# output
# > RequestCallbackHandler on_llm_start
# > ConstructorCallbackHandler on_llm_start
```
In real systems we often want to attach user metadata, such as a uid or IP address, to the logs, which is very helpful for troubleshooting. LangChain lets you pass `tags` alongside the callbacks; these tags are forwarded to each callback method's `tags` parameter. For example, with `RequestCallbackHandler`:
```python
from uuid import UUID
from typing import Any, Dict, List, Optional

from langchain_core.callbacks import BaseCallbackHandler
from langchain_openai import OpenAI


class RequestCallbackHandler(BaseCallbackHandler):
    def on_llm_start(
        self,
        serialized: Dict[str, Any],
        prompts: List[str],
        *,
        run_id: UUID,
        parent_run_id: Optional[UUID] = None,
        tags: Optional[List[str]] = None,
        metadata: Optional[Dict[str, Any]] = None,
        **kwargs: Any,
    ) -> Any:
        print(tags)
        print("RequestCallbackHandler on_llm_start")


model = OpenAI()
model.invoke("hello", config={"callbacks": [RequestCallbackHandler()], "tags": ["request_tag"]})
# output
# > ['request_tag']
# > RequestCallbackHandler on_llm_start
```
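Besides `tags`, the run config also accepts a `metadata` dictionary, which is delivered to the callback's `metadata` parameter. The sketch below assumes the same `RequestCallbackHandler` as above (with a `print(metadata)` added inside `on_llm_start`) and uses made-up `uid`/`ip` fields purely for illustration:

```python
model.invoke(
    "hello",
    config={
        "callbacks": [RequestCallbackHandler()],
        "tags": ["request_tag"],
        # hypothetical per-request user fields for audit logs
        "metadata": {"uid": "user-42", "ip": "10.0.0.1"},
    },
)
# inside on_llm_start, the `metadata` parameter would then contain
# {"uid": "user-42", "ip": "10.0.0.1"}
```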
Callback Handlers in LCEL
The concept of callbacks and their integration into individual components is relatively straightforward. Things get more complex when callbacks meet LCEL chains, which in practice is where they are most commonly used.
Before diving in, here is the rule in a nutshell: each stage in an LCEL chain can be viewed as a separate execution module. When that module is one of the components mentioned earlier (Tool, LLM, Agent, Retriever), it triggers that component's callback events. When the module is a Runnable object that LangChain designed for LCEL, such as `RunnableParallel` or `RunnableLambda`, it triggers the `on_chain_xxx` callback events. I hope you can repeat this to yourself three times before reading on.
Below, we will illustrate and understand the logic of callback functions in LCEL through several examples, from simple to complex.
Example 1: RunnableSequence
Let's take a look at the simplest LCEL chain and add a callback function when invoking that chain.
```python
import json
from uuid import UUID
from typing import Any, Dict, List, Optional

from langchain_core.callbacks import BaseCallbackHandler
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import OpenAI


class CustomCallbackHandler(BaseCallbackHandler):
    def on_chain_start(
        self,
        serialized: Dict[str, Any],
        inputs: Dict[str, Any],
        *,
        run_id: UUID,
        parent_run_id: Optional[UUID] = None,
        tags: Optional[List[str]] = None,
        metadata: Optional[Dict[str, Any]] = None,
        **kwargs: Any,
    ) -> Any:
        print("on_chain_start")
        print("id: " + json.dumps(serialized['id']))
        print("inputs: " + json.dumps(inputs))
        print("------------------")

    def on_llm_start(
        self,
        serialized: Dict[str, Any],
        prompts: List[str],
        *,
        run_id: UUID,
        parent_run_id: Optional[UUID] = None,
        tags: Optional[List[str]] = None,
        metadata: Optional[Dict[str, Any]] = None,
        **kwargs: Any,
    ) -> Any:
        print("on_llm_start")
        print("id: " + json.dumps(serialized['id']))
        print("------------------")


prompt = ChatPromptTemplate.from_messages(["Tell me a joke about {animal}"])
model = OpenAI()
chain = prompt | model

response = chain.invoke({"animal": "bears"}, config={"callbacks": [CustomCallbackHandler()]})
```
First, we construct an LCEL chain consisting of a single prompt template and an LLM instance. When calling `invoke`, we pass in the custom callback handler `CustomCallbackHandler`, which implements two callback events: `on_chain_start` and `on_llm_start`. Both callbacks receive a `serialized` parameter containing metadata about the call, and here we print `serialized['id']` for analysis. `on_chain_start` also receives an `inputs` parameter, the input received by that step of the chain, which we print as well.
Here are the execution results:
```
on_chain_start
id: ["langchain", "schema", "runnable", "RunnableSequence"]
inputs: {"animal": "bears"}
------------------
on_chain_start
id: ["langchain", "prompts", "chat", "ChatPromptTemplate"]
inputs: {"animal": "bears"}
------------------
on_llm_start
id: ["langchain", "llms", "openai", "OpenAI"]
------------------
```
Were you confused the first time you saw this result? Where does `RunnableSequence` come from? Why do `RunnableSequence` and `ChatPromptTemplate` trigger `on_chain_start`, while `OpenAI` triggers `on_llm_start`?
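Before we dig into the source, a quick aside: you can ask the chain itself what it is made of. The snippet below assumes a recent `langchain_core`; `print_ascii()` additionally requires the optional `grandalf` package.

```python
print(type(chain).__name__)      # RunnableSequence: `prompt | model` builds a sequence
chain.get_graph().print_ascii()  # rough ASCII diagram of the chain's steps
```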
Examining the Source Code of RunnableSequence.invoke
Let's take a look at the source code of `RunnableSequence.invoke`:
```python
class RunnableSequence(RunnableSerializable[Input, Output]):
    def invoke(self, ...) -> Output:
        ...
        run_manager = callback_manager.on_chain_start(...)
        ...
```
As you can see, `RunnableSequence` triggers `on_chain_start` once.
Using the same analysis method, we can find in the source code that `ChatPromptTemplate` also triggers `on_chain_start`, while `OpenAI` triggers `on_llm_start`.
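For completeness, here is a simplified, paraphrased sketch of that prompt-template path (not verbatim library source; the exact code varies across `langchain_core` versions): prompt formatting is routed through `Runnable._call_with_config`, which is where the chain callbacks fire.

```python
class BasePromptTemplate(...):
    def invoke(self, input, config=None):
        ...
        # formatting is wrapped by Runnable._call_with_config, which calls
        # callback_manager.on_chain_start(...) before doing the actual work
        return self._call_with_config(...)
```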
Example 2: RunnableParallel
In the following example, we will explore how callbacks are invoked in a more complex LCEL chain with parallel execution (`RunnableParallel`).
```python
from langchain_core.runnables import RunnablePassthrough

# imports and the definition of CustomCallbackHandler from Example 1 remain unchanged
...

def get_num(animal):
    if animal == "bears":
        return 1
    else:
        return 2

prompt = ChatPromptTemplate.from_messages(["Tell me {num} joke about {animal}"])
model = OpenAI()
chain = {"num": get_num, "animal": RunnablePassthrough()} | prompt | model

response = chain.invoke({"animal": "bears"}, config={"callbacks": [CustomCallbackHandler()]})
```
First, let’s look directly at the output:
```
on_chain_start
id: ["langchain", "schema", "runnable", "RunnableSequence"]
inputs: {"animal": "bears"}
------------------
on_chain_start
id: ["langchain", "schema", "runnable", "RunnableParallel"]
inputs: {"animal": "bears"}
------------------
on_chain_start
id: ["langchain_core", "runnables", "base", "RunnableLambda"]
inputs: {"animal": "bears"}
------------------
on_chain_start
id: ["langchain", "schema", "runnable", "RunnablePassthrough"]
inputs: {"animal": "bears"}
------------------
on_chain_start
id: ["langchain", "prompts", "chat", "ChatPromptTemplate"]
inputs: {"num": 2, "animal": {"animal": "bears"}}
------------------
on_llm_start
id: ["langchain", "llms", "openai", "OpenAI"]
------------------
```
In the output above, I believe everyone understands the first (RunnableSequence), the fifth (ChatPromptTemplate), and the sixth (OpenAI). Let's focus on the remaining three.
First, let's look carefully at the code. In this example, the prompt template includes a `num` placeholder variable, whose value comes from the `get_num` function.
Parallel Execution in the Constructed LCEL Chain
In the constructed LCEL chain, `{"num": get_num, "animal": RunnablePassthrough()}` denotes parallel execution; it is converted into `RunnableParallel({"num": RunnableLambda(get_num), "animal": RunnablePassthrough()})`. The chain therefore runs the `RunnableParallel` map first (executing `RunnableLambda(get_num)` and `RunnablePassthrough()` on the input), and the resulting dict is then piped into the prompt and finally the model.
Now that we know where `RunnableParallel`, `RunnableLambda`, and `RunnablePassthrough` in the output come from, let's examine their `invoke` methods to see which callback events they trigger:
RunnableParallel
```python
class RunnableParallel(RunnableSerializable[Input, Dict[str, Any]]):
    ...
    def invoke(...):
        ...
        run_manager = callback_manager.on_chain_start(...)
        ...
```
RunnableLambda
```python
class RunnableLambda(Runnable[Input, Output]):
    ...
    def invoke(...):
        ...
        return self._call_with_config(...)


class Runnable(Generic[Input, Output], ABC):
    ...
    def _call_with_config(...):
        ...
        run_manager = callback_manager.on_chain_start(...)
        ...
```
RunnablePassthrough
```python
class RunnablePassthrough(RunnableSerializable[Other, Other]):
    ...
    def invoke(...):
        ...
        # _call_with_config is Runnable._call_with_config
        return self._call_with_config(...)
```
From the above three code snippets, we can see that these kinds of `Runnable` all trigger `on_chain_xxx` callback events.
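A practical aside before the next example: every callback also receives `run_id` and `parent_run_id`, so a single handler can reconstruct the nesting of an LCEL chain. Below is a minimal sketch (not from the original lesson) that prints an indented trace of chain and LLM starts:

```python
from uuid import UUID
from typing import Any, Dict, List, Optional

from langchain_core.callbacks import BaseCallbackHandler


class TraceCallbackHandler(BaseCallbackHandler):
    """Print an indented trace of runs, using parent_run_id to track nesting depth."""

    def __init__(self):
        self.depth: Dict[UUID, int] = {}

    def _enter(self, name: str, run_id: UUID, parent_run_id: Optional[UUID]) -> None:
        # a run's depth is one more than its parent's; top-level runs have depth 0
        depth = self.depth.get(parent_run_id, -1) + 1
        self.depth[run_id] = depth
        print("  " * depth + name)

    def on_chain_start(self, serialized: Dict[str, Any], inputs: Any, *,
                       run_id: UUID, parent_run_id: Optional[UUID] = None,
                       **kwargs: Any) -> Any:
        name = serialized["id"][-1] if serialized else "chain"
        self._enter(name, run_id, parent_run_id)

    def on_llm_start(self, serialized: Dict[str, Any], prompts: List[str], *,
                     run_id: UUID, parent_run_id: Optional[UUID] = None,
                     **kwargs: Any) -> Any:
        self._enter(serialized["id"][-1], run_id, parent_run_id)
```

Attached via `config={"callbacks": [TraceCallbackHandler()]}`, it prints roughly the indented equivalent of the outputs shown above.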
Example 3: Nested Subchains
Now, let's increase the complexity by constructing a more intricate LCEL chain and adding callback functions to see if you can analyze and explain the callback invocation behavior.
```python
from langchain_core.runnables import RunnableLambda, RunnablePassthrough

# imports and the definition of CustomCallbackHandler remain unchanged
...

def get_num(animal):
    if animal == "bears":
        return 1
    else:
        return 2

def add_one(num):
    return num + 1

prompt = ChatPromptTemplate.from_messages(["Tell me {num} joke about {animal}"])
model = OpenAI()
chain = {"num": RunnableLambda(get_num) | RunnableLambda(add_one), "animal": RunnablePassthrough()} | prompt | model

response = chain.invoke({"animal": "bears"}, config={"callbacks": [CustomCallbackHandler()]})
```
In this example, obtaining `num` takes one extra step: after `get_num`, the result is passed through `add_one`, which adds 1 to the original value. Let's look at the output:
```
on_chain_start
id: ["langchain", "schema", "runnable", "RunnableSequence"]
inputs: {"animal": "bears"}
------------------
on_chain_start
id: ["langchain", "schema", "runnable", "RunnableParallel"]
inputs: {"animal": "bears"}
------------------
on_chain_start
id: ["langchain", "schema", "runnable", "RunnablePassthrough"]
inputs: {"animal": "bears"}
------------------
on_chain_start
id: ["langchain", "schema", "runnable", "RunnableSequence"]
inputs: {"animal": "bears"}
------------------
on_chain_start
id: ["langchain_core", "runnables", "base", "RunnableLambda"]
inputs: {"animal": "bears"}
------------------
on_chain_start
id: ["langchain_core", "runnables", "base", "RunnableLambda"]
inputs: 2
------------------
on_chain_start
id: ["langchain", "prompts", "chat", "ChatPromptTemplate"]
inputs: {"num": 3, "animal": {"animal": "bears"}}
------------------
on_llm_start
id: ["langchain", "llms", "openai", "OpenAI"]
------------------
```
Take a moment to pause and try to analyze and explain the above content. If you can understand these results, congratulations, you have grasped the mechanism of callback functions in the LCEL chain.
Now, here’s the answer.
Compared to Example 2, the only difference in the chain is that `"num": get_num` has been replaced with `"num": RunnableLambda(get_num) | RunnableLambda(add_one)`. The pipe (`|`) turns what used to be a single `RunnableLambda` into a nested `RunnableSequence`. So inside the outer `RunnableParallel`, the `num` branch is itself a `RunnableSequence` that runs two `RunnableLambda` steps in order, while the `animal` branch is still a `RunnablePassthrough`; their combined result then flows into the prompt and the model.
Given the earlier introduction to which callback events these `Runnable` objects trigger, it is not hard to derive the results above.
Practical Application: Token Consumption Audit
In some LLM applications, tracking each user's token consumption for billing is a common requirement; for example, services that proxy the OpenAI API typically charge based on token usage.
Using the `on_llm_new_token` callback event provided by LangChain, we can easily integrate audit statistics into an LCEL call chain.
```python
from typing import Any

from langchain_core.callbacks import BaseCallbackHandler
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import OpenAI


class CustomCallbackHandler(BaseCallbackHandler):
    def __init__(self):
        self.token_used = 0

    def on_llm_new_token(
        self,
        token: str,
        **kwargs: Any,
    ) -> Any:
        self.token_used += 1


prompt = ChatPromptTemplate.from_messages(["Tell me a joke about {animal}"])
model = OpenAI(streaming=True)
chain = prompt | model

callbackHandler = CustomCallbackHandler()
response = chain.invoke({"animal": "bears"}, config={"callbacks": [callbackHandler]})
print(callbackHandler.token_used)
```
The code is straightforward: the counter is incremented by 1 every time a new token is generated. There are two important points to note when using `on_llm_new_token`:

- The LLM must support streaming output, and `streaming=True` must be set.
- The event fires when an output token is produced, so it only counts output tokens. For a complete audit, you also need to record input tokens, which can be done in `on_llm_start` or `on_chat_model_start` for comprehensive statistics (a sketch of this follows the list).
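As a rough sketch of that last point (an assumption-laden example, not part of the original lesson: it assumes the third-party `tiktoken` package is installed and that the `cl100k_base` encoding is an acceptable approximation of the model's tokenizer), input tokens can be counted in `on_llm_start` while output tokens are still counted per streamed token:

```python
from typing import Any, Dict, List

import tiktoken
from langchain_core.callbacks import BaseCallbackHandler


class TokenAuditHandler(BaseCallbackHandler):
    def __init__(self):
        self.input_tokens = 0
        self.output_tokens = 0
        # assumption: cl100k_base roughly matches the model's tokenizer
        self._enc = tiktoken.get_encoding("cl100k_base")

    def on_llm_start(self, serialized: Dict[str, Any], prompts: List[str],
                     **kwargs: Any) -> Any:
        # count the tokens of every prompt sent to the LLM
        self.input_tokens += sum(len(self._enc.encode(p)) for p in prompts)

    def on_llm_new_token(self, token: str, **kwargs: Any) -> Any:
        # one callback per streamed output token
        self.output_tokens += 1
```

Attach an instance of this handler at request time, exactly as in the example above, and read `input_tokens` / `output_tokens` after the call.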
Summary
The concept of callbacks is simple yet important. By implementing callback functions for specific events, we can gain better insights into the execution of LLM applications, aiding in system monitoring and debugging.
In LangChain, each component has callback hooks at key points in its execution. By inheriting from `BaseCallbackHandler` and implementing the relevant event methods, we can have our custom logic run at those designated stages. After defining a callback class, we can register it either when constructing the component instance or at request time.
Each stage of an LCEL chain can be viewed as a separate execution module. When the module is one of the components mentioned earlier (Tool, LLM, Agent, Retriever), it triggers that component's callback events. Modules that LangChain designed for LCEL, such as `RunnableParallel` or `RunnableLambda`, trigger the `on_chain_xxx` callback events.
LangChain is iterating rapidly, so to keep up with version changes and really understand the callback mechanism, it is worth learning to read the relevant source code. In this section, we walked through the callback-related source for several objects; you can apply the same approach to find where the other event functions are invoked.