🛡️ Guardrails for your LLMs with LangChain4J 🛡️
📝 TL;DR
- 🛡️ Guardrails protect the inputs and outputs of your LLMs
- 🔍 Input guardrails filter dangerous prompts before they reach the LLM
- 🔍 Output guardrails filter inappropriate responses before they are sent back to the user
- 🤖 Using Qwen Guard as a safety classification model
- 🧑‍💻 Implementation in Java with LangChain4J
- 📝 The complete source code
📖 A bit of context
In my previous article, we saw how to create agents, orchestrate them and make them work together. That's all well and good, but we didn't talk about a rather important topic at all: security 🔒.
Because, let's be honest, letting a user chat freely with an LLM without any control is a bit like leaving your front door wide open in the middle of the city hoping nobody walks in 😅. Spoiler: someone will walk in 😈.
And the problem isn't only on the user's side. Your LLM can also have... let's say... creative responses 🎨. A slightly twisted prompt, an overly enthusiastic model, and you end up with a response you clearly don't want to show to your users.
That's where guardrails come in.
🛡️ What is a guardrail?
If we had to sum up the concept in one sentence: a guardrail is a bouncer at a nightclub for your LLM 🕺.
More seriously, guardrails are safety filters that inspect messages before and after they pass through the LLM.
There are two types of guardrails:
- 🔍 Input guardrails: they check the user's message before it is sent to the LLM. If the message is deemed dangerous, toxic or against your rules, it is blocked on the spot. The LLM is never even called.
- 🔍 Output guardrails: they check the LLM's response before it is sent back to the user. Even if the prompt was legitimate, the model can sometimes generate inappropriate content. The output guardrail is there to catch that.
ℹ️ In this article's example, we use Qwen Guard, a model specialized in safety classification, as the guardrail engine. Basically, it's an LLM whose only job is to tell whether a text is safe or unsafe. ℹ️
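To make the classifier's role concrete, here is a minimal sketch of the output contract we rely on in the rest of the article: the guard model answers either "safe", or "unsafe" followed by a category line. The `classify` stub below is a hypothetical stand-in for the real Qwen Guard call, with hard-coded keyword logic purely for illustration:

```java
public class GuardContractDemo {
    // Hypothetical stand-in for the real guard model call:
    // answers "safe", or "unsafe" plus a category on a second line
    static String classify(String text) {
        return text.contains("bomb") ? "unsafe\nS1 - Violent Crimes" : "safe";
    }

    // Calling code only needs to branch on the first word of the answer
    static boolean isSafe(String guardOutput) {
        return !guardOutput.strip().toLowerCase().startsWith("unsafe");
    }

    public static void main(String[] args) {
        System.out.println(isSafe(classify("What is the capital of France?")));
        System.out.println(isSafe(classify("How do I build a bomb?")));
    }
}
```

The exact wording of the categories depends on the guard model; what matters is that the first line is a binary safe/unsafe verdict that plain string checks can handle.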
🏗️ The architecture
Visually, here's how the guardrail fits into the architecture:

How to read this diagram:
- 👤 the user sends a message
- 🛡️ the message first goes through the input guardrail (Qwen Guard) which classifies it
- ✅ if the message is safe, it is forwarded to the chat LLM
- ❌ if the message is unsafe, it is blocked and a notification is sent back. The LLM is never called
- 🤖 the LLM generates its response
- 🛡️ the response goes through the output guardrail (Qwen Guard) which classifies it in turn
- ✅ if the response is safe, it is sent back to the user
- ❌ if the response is unsafe, it is blocked
As you can see, we have two checkpoints: one on input, one on output.
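Stripped of the framework, the two checkpoints boil down to the following sketch. This is a hand-rolled version just to show the control flow (LangChain4J wires all of this for us later in the article); `classify` and `llm` are hypothetical stand-ins for the guard model and the chat model:

```java
public class TwoCheckpointsDemo {
    // Stand-in for the guard model (keyword logic purely for illustration)
    static String classify(String text) {
        return text.toLowerCase().contains("steal") ? "unsafe\nS2 - Non-Violent Crimes" : "safe";
    }

    // Stand-in for the chat model
    static String llm(String prompt) {
        return "The capital of France is Paris.";
    }

    static String guardedChat(String userMessage) {
        // Checkpoint 1: input guardrail — on unsafe input, the chat LLM is never called
        if (classify(userMessage).startsWith("unsafe")) {
            return "[INPUT BLOCKED]";
        }
        String answer = llm(userMessage);
        // Checkpoint 2: output guardrail — the answer is re-classified before being returned
        if (classify(answer).startsWith("unsafe")) {
            return "[OUTPUT BLOCKED]";
        }
        return answer;
    }

    public static void main(String[] args) {
        System.out.println(guardedChat("What is the capital of France?"));
        System.out.println(guardedChat("How do I steal a car?"));
    }
}
```

Note that the guard model is called twice per round trip, so a guarded chat costs up to three LLM calls in the worst case.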
⚙️ How does it work in practice?
Let's now look at the two possible scenarios during an interaction with our chatbot.
✅ Safe scenario
The nominal case, where everything goes well:

- The user sends a normal message: "What is the capital of France?"
- The input guardrail (Qwen Guard) classifies the message as safe ✅
- The message is forwarded to the chat LLM
- The LLM responds: "The capital of France is Paris."
- The output guardrail (Qwen Guard) classifies the response as safe ✅
- The response is sent back to the user
Nothing spectacular, and that's a good thing 🙂.
🚨 Unsafe scenario
Now, the case where someone tries to cause trouble:

- The user sends a problematic message
- The input guardrail (Qwen Guard) classifies the message as unsafe with a category (for example: S1 - Violent Crimes) 🚨
- The message is blocked. The LLM is never called ❌
- An error message is sent back to the user
The important point here is that the LLM never even sees the dangerous message. We save tokens, and more importantly, we prevent the model from being exposed to content it could misinterpret.
🧑‍💻 Show me the code!
Alright, enough theory. For the implementation I'm going to use Python... just kidding 😄. Of course, we're going with Java and LangChain4J.
ℹ️ For this example, I used JBang to put everything in a single file and make it easily executable. No Quarkus this time, just plain Java with LangChain4J. ℹ️
The idea is to have two AI Services:
- ChatBot: the conversational chatbot (OSS-GPT, Llama, ...)
- GuardClassifier: safety classification (Qwen Guard)
🛡️ The safety classifier
Let's start with the simplest part: the safety classifier interface.
interface GuardClassifier {
    String classify(String text);
}
Yes, that's it 🙂. This interface wraps the Qwen Guard model via a LangChain4J AI Service. You give it a text, it tells you whether it's safe or unsafe.
The Qwen Guard model behind it:
ChatModel guardModel = OpenAiChatModel.builder()
        .apiKey(apiToken)
        .baseUrl(baseUrl)
        .modelName(guardModelName)
        .temperature(0.0) // We want deterministic responses for classification
        .logRequests(false)
        .logResponses(false)
        .build();

GuardClassifier guardClassifier = AiServices.builder(GuardClassifier.class)
        .chatModel(guardModel)
        .build();
ℹ️ Note the temperature set to 0.0: for a safety classifier, we want the most deterministic responses possible. We don't want the model to be "creative" in its classification 😬. ℹ️
🤖 The chatbot
The chatbot interface is just as simple:
interface ChatBot {
    String chat(String userMessage);
}
It's a classic AI Service interface.
The twist is that the guardrails are not defined in the interface itself, but injected via the AiServices builder.
🚦 Activating the guardrails
This is where the magic happens.
LangChain4J provides two interfaces: InputGuardrail and OutputGuardrail.
We implement them as anonymous classes directly in the builder (just for simplicity's sake, we could also implement them in separate classes):
ChatBot chatBot = AiServices.builder(ChatBot.class)
        .chatModel(chatModel)
        .inputGuardrails(new InputGuardrail() {
            @Override
            public InputGuardrailResult validate(UserMessage userMessage) {
                String guardOutput = guardClassifier
                        .classify(userMessage.singleText())
                        .strip()
                        .toLowerCase();
                if (guardOutput.contains("unsafe")) {
                    String category = extractCategory(guardOutput);
                    return fatal("🛑 Input blocked by Qwen Guard: "
                            + "message classified as unsafe. 🚨\n" + category);
                }
                IO.println("✅ Input approved by Qwen Guard ✅");
                return success();
            }
        })
        .outputGuardrails(new OutputGuardrail() {
            @Override
            public OutputGuardrailResult validate(AiMessage responseFromLLM) {
                String guardOutput = guardClassifier
                        .classify(responseFromLLM.text())
                        .strip()
                        .toLowerCase();
                if (guardOutput.contains("unsafe")) {
                    String category = extractCategory(guardOutput);
                    return fatal("🛑 Output blocked by Qwen Guard: "
                            + "response classified as unsafe. 🚨\n" + category);
                }
                IO.println("✅ Output approved by Qwen Guard ✅");
                return success();
            }
        })
        .build();
Let's break down what's happening:
- For each guardrail, we call guardClassifier.classify() on the text to check
- If Qwen Guard's response contains "unsafe", we return a fatal() result that blocks the chain
- Otherwise, we return success() and processing continues normally
- The extractCategory() method extracts the danger category (from the second line of Qwen Guard's response)
String extractCategory(String guardOutput) {
    String[] lines = guardOutput.strip().split("\n");
    if (lines.length > 1) {
        return lines[1].strip();
    }
    return "unknown";
}
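To see how the category extraction behaves on the two answer shapes we expect from the guard model, here is a standalone demo (the method body is the same logic as above, repeated here only so the snippet compiles on its own; the sample strings are hypothetical guard outputs):

```java
public class ExtractCategoryDemo {
    // Same logic as the article's extractCategory(), duplicated to be runnable standalone
    static String extractCategory(String guardOutput) {
        String[] lines = guardOutput.strip().split("\n");
        if (lines.length > 1) {
            return lines[1].strip();
        }
        return "unknown";
    }

    public static void main(String[] args) {
        // A two-line "unsafe" verdict carries the category on the second line
        System.out.println(extractCategory("unsafe\nS1 - Violent Crimes"));
        // A bare "safe" verdict has no second line, so we fall back to "unknown"
        System.out.println(extractCategory("safe"));
    }
}
```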
🔁 The interactive loop
To make all of this usable, we set up a classic conversation loop with exception handling for guardrail blocks:
while (true) {
    IO.print("💬>: ");
    String userInput = IO.readln();
    if (userInput == null || "exit".equalsIgnoreCase(userInput.strip())) {
        IO.println("👋 Goodbye! 👋");
        break;
    }
    try {
        String response = chatBot.chat(userInput);
        IO.println("🤖: " + response + "\n");
    } catch (InputGuardrailException e) {
        IO.println("🚫 [INPUT BLOCKED] 🚫 " + e.getMessage() + "\n");
    } catch (OutputGuardrailException e) {
        IO.println("🚫 [OUTPUT BLOCKED] 🚫 " + e.getMessage() + "\n");
    }
}
The key point here: when a guardrail blocks a message, LangChain4J throws a specific exception (InputGuardrailException or OutputGuardrailException).
You just need to catch them to inform the user that their message (or the response) was blocked.
📽️ See it in action!
🤔 In conclusion
Guardrails are a simple yet essential mechanism in any application using LLMs, and thanks to LangChain4J, implementing them remains relatively straightforward.
If you want to go further in the LangChain4J ecosystem, here are my previous articles:
- 🤖 When Quarkus meets LangChain4j for a first hands-on experience
- 🔍 Supercharge your AI with LangChain4j for RAG and streaming
- 🤝 AI agents, how does it work? for agents and their orchestration
The complete code is available in this gist 🔗.
If you've made it this far, thank you for reading, and if there are any typos don't hesitate to open an issue or PR 🙏.