Building a ReAct Agent from Scratch: MockLLM vs Real LLM
NOTE
This post walks through Assignment 3 from the Engineering GenAI course, where we build a ReAct Agent that acts as a Dungeon Master. We'll compare two implementations: MockLLM (for testing) and Real LLM (Groq, for production).
Introduction: What is an AI Agent?
Traditional chatbots simply respond to inputs. AI Agents are different. They can:
- Reason about what to do
- Act by using external tools
- Observe the results and incorporate them into their reasoning
The ReAct pattern (Reason + Act) formalizes this into a structured loop:
```text
Thought: <reasoning about what to do>
Action: <tool to call>
Observation: <result from tool>
... (repeat as needed)
Final Answer: <response to user>
```

In this tutorial, we'll build a Dungeon Master agent that uses a dice-rolling tool to determine the success or failure of player actions. We'll implement it two ways to understand the difference between testing and production approaches.
Part 1: The Foundation (Same for Both Approaches)
Both MockLLM and RealLLM share the same foundation: tools, system prompt, parser, and agent loop structure.
The Tool: roll_dice
Every agent needs tools: functions it can call to interact with the world.
```python
import random

def roll_dice(sides=20):
    """
    Simulates rolling a die.

    Args:
        sides (int): The number of sides on the die (default 20).

    Returns:
        int: A random number between 1 and `sides`, inclusive.
    """
    return random.randint(1, sides)
```

Why it matters: when roll_dice() returns 17, that's a real result computed outside the model, not something the LLM imagined.
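The assignment doesn't require it, but if you want reproducible rolls for unit tests, one option is to seed Python's RNG:

```python
import random

random.seed(42)       # pin the RNG so test runs are repeatable
print(roll_dice())    # same value on every run with this seed
print(roll_dice(6))   # works for any die size
```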
The System Prompt
The LLM needs explicit instructions on how to behave:
```python
SYSTEM_PROMPT = """
You are a text adventure Dungeon Master. Your goal is to narrate an
exciting, immersive story for the player.

## Your Capabilities
You have access to the following tool:
- **roll_dice**: Rolls a 20-sided die and returns a number between 1 and 20.

## When to Use Tools
When the player attempts an action with an UNCERTAIN outcome
(attacking, jumping, persuading, etc.), you MUST:
1. First, explain your reasoning using "Thought:"
2. Then, call the tool using "Action: roll_dice"

Example:
Thought: The player is trying to attack the goblin. This requires a roll to determine success.
Action: roll_dice

## How to Interpret Results
After receiving the Observation (dice result), narrate the outcome:
- **1-5 (Critical Fail):** Action fails spectacularly
- **6-10 (Fail):** Action fails, but not catastrophically
- **11-15 (Success):** Action succeeds
- **16-20 (Critical Hit):** Action succeeds spectacularly

## When NOT to Use Tools
For simple conversation, exploration with no risk, or describing the
environment, respond directly without using tools.

Stay in character as a dramatic, engaging Dungeon Master!
"""
```
The Parser
Detects if the LLM is requesting a tool:
```python
def parse_response(llm_output):
    """Returns True if the LLM requested roll_dice, False otherwise."""
    return "action: roll_dice" in llm_output.lower()
```

The Agent Loop
The generic structure that works with any LLM:
```python
def run_turn(user_input, history=None):
    """Executes a single turn of the agent."""
    if history is None:  # avoid the mutable-default-argument pitfall
        history = []

    # Build context
    context = "\n".join(history) + f"\nUser: {user_input}\n"

    # First LLM call
    response = llm.generate(context)
    print(f"🤖 Agent: {response}")

    # Check for tool use
    if parse_response(response):
        # Execute tool
        dice_result = roll_dice()
        print(f"🎲 [Tool Executed] roll_dice() returned: {dice_result}")

        # Add observation
        observation = f"Observation: {dice_result}"
        context = context + f"\nAgent: {response}\n{observation}\n"

        # Second LLM call with result
        response = llm.generate(context)
        print(f"🤖 Agent (Final): {response}")

    return response
```

Key point: the loop doesn't care whether it's talking to MockLLM or RealLLM; it just calls llm.generate().
Part 2: The Two Approaches - MockLLM vs RealLLM
Now here's where things diverge. Let's compare how each approach implements the llm object and why they behave differently.
Approach 1: MockLLM (Testing Implementation)
What It Is
A fake LLM that uses pattern matching to simulate responses. No API required, no costs, purely for testing your agent loop logic.
Complete Code
```python
class MockLLM:
    """A fake LLM that responds predictably for testing logic."""

    def generate(self, prompt):
        # If the prompt shows we just rolled a die (Observation), narrate
        if "Observation:" in prompt:
            return "The die rolls... a critical hit! The goblin falls."
        # If the user says "attack", the LLM decides to roll
        if "attack" in prompt.lower():
            return "Thought: The player is attacking. I need to check if they hit. Action: roll_dice"
        # Default conversation
        return "What do you want to do next?"

# Initialize
llm = MockLLM()
```

How It Works
MockLLM uses simple pattern matching:
- Checks if "attack" is in the input → returns action request
- Checks if "Observation:" is in the prompt → returns hardcoded narrative
- Otherwise → returns generic response
No actual reasoning, just if/else logic.
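A quick sanity check exercises all three branches; the outputs are the hardcoded strings from the class above:

```python
mock = MockLLM()

print(mock.generate("User: I attack the goblin!"))
# Thought: The player is attacking. I need to check if they hit. Action: roll_dice

print(mock.generate("Agent: ...\nObservation: 3"))
# The die rolls... a critical hit! The goblin falls.

print(mock.generate("User: Hello there"))
# What do you want to do next?
```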
Execution Flow: "I attack the goblin!"
```text
┌──────────────────────────────────┐
│ User: "I attack the goblin!"     │
└───────────────┬──────────────────┘
                │
                ▼
┌────────────────────────────────────────────┐
│ llm.generate("User: I attack...")          │
│ Checks: "attack" in prompt.lower()?        │
│ → YES                                      │
│ Returns: "Thought:... Action: roll_dice"   │
└───────────────┬────────────────────────────┘
                │
                ▼
┌──────────────────────────────────┐
│ parse_response() → True          │
│ Execute: dice_result = 3         │ ← Let's say we roll a 3
└───────────────┬──────────────────┘
                │
                ▼
┌─────────────────────────────────────────────┐
│ llm.generate("...Observation: 3")           │
│ Checks: "Observation:" in prompt?           │
│ → YES                                       │
│ Returns: "...critical hit! Goblin falls."   │ ← Hardcoded! Ignores the 3!
└─────────────────────────────────────────────┘
```

The Problem
MockLLM doesn't read the dice value. Whether the observation is 3 (critical fail) or 20 (critical hit), it always returns:

```python
"The die rolls... a critical hit! The goblin falls."
```

It just checks for the keyword "Observation:" and returns a fixed string.
Approach 2: Real LLM (Production Implementation with Groq)
What It Is
A real neural network (Llama 3.3, 70 billion parameters) that:
- Actually reads and understands the conversation
- Follows the system prompt rules dynamically
- Generates contextual responses based on dice results
Complete Code
```python
from groq import Groq
import os

class RealLLM:
    """A real LLM using the Groq API for fast inference."""

    def __init__(self, api_key, model="llama-3.3-70b-versatile"):
        self.client = Groq(api_key=api_key)
        self.model = model
        # Initialize conversation with system prompt
        self.messages = [{"role": "system", "content": SYSTEM_PROMPT}]

    def generate(self, prompt):
        """Generate a response from Groq."""
        # Add user message to conversation history
        self.messages.append({"role": "user", "content": prompt})
        # Get response from Groq
        response = self.client.chat.completions.create(
            model=self.model,
            messages=self.messages
        )
        assistant_message = response.choices[0].message.content
        # Add assistant response to conversation history
        self.messages.append({"role": "assistant", "content": assistant_message})
        return assistant_message

# Initialize
GROQ_API_KEY = os.getenv("GROQ_API_KEY")
llm = RealLLM(api_key=GROQ_API_KEY)
```

How It Works
RealLLM uses API calls to a neural network:
- Maintains full conversation history in self.messages
- Sends the entire conversation to Groq servers
- Llama 3.3 reads and understands the context
- Generates a response following the system prompt rules
- Stores the response in conversation history
Actual reasoning, not pattern matching.
Execution Flow: "I attack the goblin!" (with dice = 3)
```text
┌──────────────────────────────────┐
│ User: "I attack the goblin!"     │
└───────────────┬──────────────────┘
                │
                ▼
┌─────────────────────────────────────────────────────┐
│ llm.generate("User: I attack...")                   │
│ Sends to Groq:                                      │
│ [{"role": "system", "content": SYSTEM_PROMPT},      │
│  {"role": "user", "content": "User: I attack..."}]  │
│                                                     │
│ Llama 3.3 reads:                                    │
│ - System: "For uncertain outcomes, roll dice"       │
│ - User: "I attack the goblin"                       │
│ - Reasoning: "Attack = uncertain, need roll"        │
│                                                     │
│ Returns: "Thought:... Action: roll_dice"            │
└───────────────┬─────────────────────────────────────┘
                │
                ▼
┌──────────────────────────────────┐
│ parse_response() → True          │
│ Execute: dice_result = 3         │ ← Let's say we roll a 3
└───────────────┬──────────────────┘
                │
                ▼
┌──────────────────────────────────────────────────────┐
│ llm.generate("...Observation: 3")                    │
│ Sends to Groq:                                       │
│ [...previous messages...,                            │
│  {"role": "user", "content": "Observation: 3"}]      │
│                                                      │
│ Llama 3.3 reads:                                     │
│ - Previous: "I need to roll"                         │
│ - Observation: 3                                     │ ← Actually reads this!
│ - System rules: "1-5 = Critical Fail"                │
│ - Reasoning: "3 is in 1-5 range, spectacular fail"   │
│                                                      │
│ Returns: "You swing wildly and miss! Your            │
│ blade clatters harmlessly against the cave           │
│ wall. The goblin cackles with glee!"                 │ ← Contextual!
└──────────────────────────────────────────────────────┘
```

Why It's Different
RealLLM reads the dice value (3) and applies the system prompt rule (1-5 = Critical Fail), generating a narrative that matches the roll.
Part 3: Side-by-Side Comparison
Code Differences
| Aspect | MockLLM | RealLLM |
|---|---|---|
| Import | None | from groq import Groq |
| Initialization | llm = MockLLM() | llm = RealLLM(api_key=...) |
| Storage | No state (stateless) | self.messages (conversation history) |
| Logic | Pattern matching (if "attack" in prompt) | Neural network inference |
| API Call | None | self.client.chat.completions.create(...) |
| Cost | Free | Pay-per-token |
| Speed | Instant | ~1-2 seconds (Groq is fast!) |
Behavior Differences
Same scenario: "I attack the goblin!" with dice roll = 3
| Step | MockLLM | RealLLM |
|---|---|---|
| First Response | "Thought:... Action: roll_dice" | "Thought:... Action: roll_dice" |
| Dice Roll | 3 | 3 |
| Reads Value? | ❌ No (ignores it) | ✅ Yes (actually reads "3") |
| Applies Rules? | ❌ No | ✅ Yes (sees 3 → Critical Fail) |
| Final Narrative | "...critical hit! Goblin falls." ❌ WRONG | "You swing wildly and miss..." ✅ CORRECT |
When to Use Which
Use MockLLM when:
- 📚 Learning agent architecture
- 🧪 Testing agent loop logic
- 🔧 Debugging prompt parsing
- 💰 Avoiding API costs during development
- ⚡ You need instant responses for unit tests
Use RealLLM when:
- 🚀 Building production applications
- 🎮 Demonstrating dynamic AI behavior
- 📈 Showcasing true agent capabilities
- 🌐 Creating actual user experiences
- 🎯 You need contextual, varied responses
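Because both classes expose the same generate() method, the switch can live in one place. A common wiring pattern (my own sketch, not part of the assignment) is an environment flag:

```python
import os

# Hypothetical toggle: mock for offline tests, Groq for production
if os.getenv("USE_REAL_LLM") == "1":
    llm = RealLLM(api_key=os.environ["GROQ_API_KEY"])
else:
    llm = MockLLM()
```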
Part 4: How RealLLM Actually Works (Deep Dive)
Let's trace through a complete interaction to see what happens under the hood.
Scenario: "I attack the goblin!" (dice rolls 17)
Step 1: Initialization
```python
llm = RealLLM(api_key=GROQ_API_KEY)
```

What happens:
- Creates the Groq client connection
- Initializes self.messages with the system prompt

State:

```python
self.messages = [
    {"role": "system", "content": "You are a Dungeon Master..."}
]
```

Step 2: First LLM Call
```python
response = llm.generate("User: I attack the goblin!")
```

Inside generate():

1. Add user message:

```python
self.messages.append({
    "role": "user",
    "content": "User: I attack the goblin!"
})
```

2. Call Groq API:

```python
response = self.client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=self.messages  # [system, user]
)
```

3. Groq processes:
- Llama 3.3 reads system prompt + user input
- Recognizes "attack" needs a dice roll
- Generates: "Thought: The player is attacking... Action: roll_dice"

4. Store response:

```python
self.messages.append({
    "role": "assistant",
    "content": "Thought: The player is attacking... Action: roll_dice"
})
```
State:

```python
self.messages = [
    {"role": "system", "content": "You are a DM..."},
    {"role": "user", "content": "User: I attack the goblin!"},
    {"role": "assistant", "content": "Thought:... Action: roll_dice"}
]
```

Step 3: Tool Execution

```python
dice_result = roll_dice()  # Returns 17
observation = "Observation: 17"
```

Step 4: Second LLM Call

```python
response = llm.generate("...\nObservation: 17")
```

Inside generate() (second time):
1. Add observation:

```python
self.messages.append({
    "role": "user",
    "content": "Observation: 17"
})
```

2. Call Groq API:

```python
response = self.client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=self.messages  # [system, user, assistant, observation]
)
```

3. Groq processes:
- Llama 3.3 reads the full conversation
- Sees Observation: 17
- Applies the rule: "16-20 = Critical Hit"
- Generates a contextual narrative: "Your blade arcs through the air with devastating precision..."

4. Store response:

```python
self.messages.append({
    "role": "assistant",
    "content": "Your blade arcs through the air..."
})
```
Final State:
```python
self.messages = [
    {"role": "system", "content": "You are a DM..."},
    {"role": "user", "content": "User: I attack the goblin!"},
    {"role": "assistant", "content": "Thought:... Action: roll_dice"},
    {"role": "user", "content": "Observation: 17"},
    {"role": "assistant", "content": "Your blade arcs through the air..."}
]
```

This conversation history persists: if the player makes another action, the entire history is sent to Groq, giving the LLM full context!
Why Conversation History Matters
Without history (MockLLM):
```python
# Each call is independent, no memory
generate("User: I attack")   # Returns action request
generate("Observation: 17")  # Doesn't know about previous attack!
```

With history (RealLLM):

```text
# Each call builds on previous
self.messages:
1. System: "You're a DM"
2. User: "I attack"
3. Assistant: "Action: roll_dice"
4. User: "Observation: 17"   ← LLM knows this relates to the attack!
5. Assistant: "Critical hit! ..."
```

The LLM can reason: "They attacked, rolled 17, that's a crit hit, I should narrate accordingly."
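You can watch the history grow turn by turn. A small check (assuming the classes above and a working API key):

```python
llm = RealLLM(api_key=GROQ_API_KEY)
print(len(llm.messages))  # 1: just the system prompt

run_turn("I attack the goblin!")
print(len(llm.messages))  # 5 if a die was rolled (two LLM calls), 3 otherwise
```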
Part 5: Practical Examples
MockLLM: Always the Same
No matter what we roll, MockLLM gives the same output:
```python
llm = MockLLM()

# Roll 1 (critical fail)
run_turn("I attack the goblin!")
# 🎲 [Tool Executed] roll_dice() returned: 2
# 🤖 Agent (Final): The die rolls... a critical hit! The goblin falls.

# Roll 2 (critical fail again)
run_turn("I attack the goblin!")
# 🎲 [Tool Executed] roll_dice() returned: 4
# 🤖 Agent (Final): The die rolls... a critical hit! The goblin falls.

# Roll 3 (actual critical hit)
run_turn("I attack the goblin!")
# 🎲 [Tool Executed] roll_dice() returned: 20
# 🤖 Agent (Final): The die rolls... a critical hit! The goblin falls.
```

Same narrative regardless of dice value! ❌
RealLLM: Dynamic Responses
With Groq, narratives change based on the actual roll:
```python
llm = RealLLM(api_key=GROQ_API_KEY)

# Critical Fail (dice = 3)
run_turn("I attack the dragon!")
# 🎲 [Tool Executed] roll_dice() returned: 3
# 🤖 Agent (Final): You swing your sword with confidence, but your foot
# catches on a stone! You stumble forward, your blade clattering harmlessly
# against the cave wall. The dragon roars with amusement!

# Fail (dice = 8)
run_turn("I attack the dragon!")
# 🎲 [Tool Executed] roll_dice() returned: 8
# 🤖 Agent (Final): Your strike is solid, but the dragon is quicker than
# expected. It dodges, and your sword whooshes through empty air.

# Success (dice = 13)
run_turn("I attack the dragon!")
# 🎲 [Tool Executed] roll_dice() returned: 13
# 🤖 Agent (Final): Your blade finds its mark! You slice across the dragon's
# scales, drawing green blood. It screeches in pain and stumbles backward.

# Critical Hit (dice = 20)
run_turn("I attack the dragon!")
# 🎲 [Tool Executed] roll_dice() returned: 20
# 🤖 Agent (Final): With perfect precision, your sword pierces through the
# dragon's armor! It crumples to the ground with a final gasp. Victory is yours!
```

Different narratives for each dice value! ✅
Conclusion
What We Learned
1. MockLLM vs RealLLM is about purpose:
- MockLLM: Testing your agent architecture without API costs
- RealLLM: Actual production behavior with dynamic reasoning
2. The agent loop stays the same:
```python
response = llm.generate(context)  # Works with both!
```

Good abstraction means switching from Mock to Real is just changing one line.
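If you want that contract to be explicit and checkable by a static type checker, one option beyond the assignment is a typing.Protocol:

```python
from typing import Protocol

class LLMBackend(Protocol):
    """Anything with generate(prompt) -> str can drive the agent loop."""
    def generate(self, prompt: str) -> str: ...

def make_llm(use_real: bool) -> LLMBackend:
    # MockLLM and RealLLM both satisfy the protocol via duck typing
    return RealLLM(api_key=GROQ_API_KEY) if use_real else MockLLM()
```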
3. Conversation history is crucial:
- MockLLM: Stateless (no memory)
- RealLLM: Stateful (self.messages tracks everything)
4. Real LLMs actually read and apply rules:
- They see Observation: 17
- Apply 16-20 = Critical Hit
- Generate a contextual narrative
Try It Yourself
See the full Groq implementation: EngGenAI_assignment_3_agents_grok.ipynb
The notebook includes:
- ✅ Complete setup instructions
- ✅ API key configuration
- ✅ Interactive demos
- ✅ Forced dice scenarios (test all outcome ranges; see the sketch below)
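The notebook's exact mechanism may differ, but one simple way to force a scenario is to temporarily swap roll_dice for a stub:

```python
def force_roll(value):
    """Build a stand-in die that always lands on `value` (testing only)."""
    def fixed_roll(sides=20):
        return value
    return fixed_roll

original_roll = roll_dice
roll_dice = force_roll(1)         # guarantee a critical fail
run_turn("I attack the goblin!")  # run_turn picks up the stubbed global
roll_dice = original_roll         # restore the real die
```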
Next Steps
To extend this agent:
- Multiple tools - Add check_inventory, cast_spell, search_room (see the sketch after this list)
- Memory persistence - Save conversation history to a database
- Streaming responses - Show text as it generates
- Error handling - What if Groq API fails?
- Multi-turn context - Remember what happened 3 turns ago
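As a starting point for the first item, here is a minimal sketch of a tool registry; check_inventory and cast_spell are hypothetical stubs invented for illustration:

```python
import re

def check_inventory():  # hypothetical stub
    return "You carry a sword, a torch, and 3 gold coins."

def cast_spell(name="firebolt"):  # hypothetical stub
    return f"A {name} streaks from your fingertips!"

TOOLS = {
    "roll_dice": roll_dice,
    "check_inventory": check_inventory,
    "cast_spell": cast_spell,
}

def parse_action(llm_output):
    """Extract the tool name after 'Action:', if it names a known tool."""
    match = re.search(r"Action:\s*(\w+)", llm_output)
    return match.group(1) if match and match.group(1) in TOOLS else None

tool = parse_action("Thought: I should check the roll. Action: roll_dice")
if tool:
    print(f"Observation: {TOOLS[tool]()}")
```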
The architecture you learned here powers real production systems like ChatGPT's Code Interpreter, Claude's tool use, and autonomous research agents.
Happy adventuring! 🎲