# Building a Document Q&A System with n8n: A RAG Tutorial
> [!NOTE]
> This post walks through Assignment 1 from the Engineering GenAI course, where we build a RAG (Retrieval-Augmented Generation) system that can answer questions about the "Attention Is All You Need" paper.
## Introduction: What is RAG?
Large Language Models are powerful, but they have limitations:
- Knowledge cutoff – they don't know about recent events
- Hallucination – they sometimes make things up
- No access to private data – they can't read your documents
RAG (Retrieval-Augmented Generation) solves this by:
- Retrieving relevant information from a knowledge base
- Augmenting the LLM prompt with that context
- Generating answers grounded in actual documents
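In code, the whole pattern fits in a few lines. Here is a minimal sketch of the loop, not the n8n implementation we build below; `embed`, `search_supabase`, and `ask_llm` are hypothetical placeholders for the pieces we wire up visually.

```python
# Minimal RAG loop (sketch). embed(), search_supabase(), and ask_llm() are
# hypothetical placeholders for the nodes this tutorial builds in n8n.
def answer(question: str) -> str:
    query_vector = embed(question)                   # 1. Retrieve: embed the question
    chunks = search_supabase(query_vector, top_k=4)  #    and fetch similar chunks
    context = "\n\n".join(c["content"] for c in chunks)
    prompt = (                                       # 2. Augment: prepend the context
        f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    )
    return ask_llm(prompt)                           # 3. Generate: grounded answer
```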
In this tutorial, we'll build a complete RAG system using n8n (a visual workflow automation tool) to answer questions about the famous Transformer paper.
## System Architecture
Our system has two independent workflows:
```
┌──────────────────────────────────────────────────────────────┐
│  FLOW 1: Document Ingestion (runs once)                      │
│                                                              │
│  PDF → Extract Text → Chunk → Embed → Store in Supabase      │
└──────────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────────┐
│  FLOW 2: Conversational Retrieval (runs per question)        │
│                                                              │
│  Question → Embed → Search Supabase → LLM → Answer           │
└──────────────────────────────────────────────────────────────┘
```
The key insight: these flows communicate through the database, not through n8n connections.
## Prerequisites
### Required Services
| Service | Purpose | Free Tier? |
|---|---|---|
| Supabase | Vector database with pgvector | ✅ Yes |
| HuggingFace | Text embeddings | ✅ Yes |
| Groq | Fast LLM inference | ✅ Yes |
| n8n | Workflow orchestration | ✅ Self-hosted |
### Supabase Setup
Create a vector-enabled table:
```sql
-- Enable the pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Create the documents table
CREATE TABLE documents (
  id BIGSERIAL PRIMARY KEY,
  content TEXT,
  metadata JSONB,
  embedding vector(768)  -- 768 dimensions for our model
);

-- Create index for fast similarity search
CREATE INDEX ON documents
  USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 100);
```

You'll also need a custom function for similarity search:
```sql
CREATE OR REPLACE FUNCTION match_documents(
  query_embedding vector(768),
  match_count int DEFAULT 5,
  filter jsonb DEFAULT '{}'
)
RETURNS TABLE (
  id bigint,
  content text,
  metadata jsonb,
  similarity float
)
LANGUAGE plpgsql
AS $$
BEGIN
  RETURN QUERY
  SELECT
    documents.id,
    documents.content,
    documents.metadata,
    1 - (documents.embedding <=> query_embedding) AS similarity
  FROM documents
  WHERE documents.metadata @> filter
  ORDER BY documents.embedding <=> query_embedding
  LIMIT match_count;
END;
$$;
```

## Flow 1: Document Ingestion
This flow runs once to process and store the PDF.
### Node 1: Manual Trigger
Simply starts the workflow when you click "Execute".
### Node 2: Read Binary File
```yaml
File Path: /data/attention_is_all_you_need.pdf
Property Name: data
```
Tip: Use Docker volume mounts to make files accessible to n8n.
### Node 3: Extract From File
```yaml
Input Binary Field: data
Operation: Extract From PDF
Chunk Size: 1000 characters
Chunk Overlap: 200 characters
Include Metadata: true
```
Why these settings?
- 1000 chars balances context vs. specificity
- 200 char overlap prevents losing info at chunk boundaries
- Metadata helps track source locations
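If you want to reason about these numbers outside n8n, fixed-size chunking with overlap is only a couple of lines. This is a sketch of the idea, not n8n's internal splitter:

```python
# Fixed-size chunking with overlap, mirroring the node settings above
# (a sketch of the technique, not n8n's actual splitter).
def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    step = size - overlap  # advance 800 chars, so 200 chars repeat between chunks
    return [text[start:start + size] for start in range(0, len(text), step)]
```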
### Node 4: Embeddings (HuggingFace)
```yaml
Model: sentence-transformers/all-mpnet-base-v2
Output Dimension: 768
```
This converts text chunks into 768-dimensional vectors that capture semantic meaning.
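To inspect what this step produces, you can run the same model locally with the sentence-transformers library (the n8n node calls the hosted HuggingFace API instead, but the output should be comparable):

```python
# Embed a chunk locally with sentence-transformers to inspect the output.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
vector = model.encode("The Transformer uses multi-head self-attention.")
print(vector.shape)  # (768,) -- must match the vector(768) column in Supabase
```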
### Node 5: Supabase Vector Store (Insert)
```yaml
Operation: Insert Documents
Table Name: documents
Embedding Column: embedding
Content Column: content
Metadata Column: metadata
```
Result: Successfully inserted 71 document chunks into Supabase.
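A quick way to confirm the insert worked is to count rows with the supabase-py client (a sketch; assumes `SUPABASE_URL` and `SUPABASE_KEY` environment variables are set):

```python
# Count the rows written by Flow 1 (supabase-py sketch).
import os
from supabase import create_client

supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])
result = supabase.table("documents").select("id", count="exact").execute()
print(result.count)  # expected: 71 for this paper
```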
## Flow 2: Conversational Retrieval
This flow runs every time a user asks a question.
### Node 1: Chat Trigger
Receives user questions via n8n's built-in chat interface.
### Node 2: AI Agent
The orchestrator that connects everything:
```yaml
System Message: |
  You are a helpful AI assistant that answers questions about
  the research paper "Attention Is All You Need".

  Use the retrieval tool to find relevant information from
  the paper before answering. Base your answers strictly on
  the retrieved content - do not make up information.

  If the answer is not in the retrieved content, say so clearly.
  Provide clear, concise answers with specific details from the paper.
```
The AI Agent connects to:
- Groq Chat Model (the "brain")
- Supabase Vector Store (the "knowledge")
- Window Buffer Memory (the "memory")
### Node 3: Groq Chat Model
```yaml
Model: llama-3.3-70b-versatile
Temperature: 0.3  # Lower = more focused
```
Model Options:
- `llama-3.3-70b-versatile` – Best quality
- `mixtral-8x7b-32768` – Good balance
- `llama-3.1-8b-instant` – Fastest
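To sanity-check the model and temperature outside n8n, note that Groq exposes an OpenAI-compatible endpoint, so the standard openai client works with a base_url swap (a sketch; assumes `GROQ_API_KEY` is set):

```python
# Direct Groq call mirroring the node settings (OpenAI-compatible API).
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["GROQ_API_KEY"],
                base_url="https://api.groq.com/openai/v1")
reply = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    temperature=0.3,  # lower = more focused, matching the node config
    messages=[{"role": "user",
               "content": "Summarize multi-head attention in one sentence."}],
)
print(reply.choices[0].message.content)
```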
### Node 4: Supabase Vector Store (Retrieve)
```yaml
Operation: Retrieve Documents (As Tool)
Table Name: documents
Top K: 4  # Return 4 most relevant chunks
Tool Description: |
  Search the "Attention Is All You Need" paper for relevant
  information. Use this tool whenever you need to find specific
  details from the paper.
```

### Node 5: Window Buffer Memory
```yaml
Session Key: {{ $json.sessionId }}
Window Size: 5  # Remember last 5 exchanges
```
Enables follow-up questions like "Tell me more about that".
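Conceptually, this memory is just a sliding window over past exchanges. A minimal sketch of the idea (not n8n's implementation):

```python
# Sliding-window chat memory: keep only the last N exchanges so follow-up
# questions get context without the prompt growing without bound.
from collections import deque

class WindowMemory:
    def __init__(self, window_size: int = 5):
        self.turns = deque(maxlen=window_size)  # oldest exchange falls off

    def add(self, question: str, answer: str) -> None:
        self.turns.append((question, answer))

    def as_context(self) -> str:
        return "\n".join(f"User: {q}\nAssistant: {a}" for q, a in self.turns)
```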
## Challenges & Solutions
### Challenge 1: Document Sub-node Connection Error
Problem: "A Document sub-node must be connected and enabled"
Solution: n8n v1.0+ requires explicit sub-nodes:
- Add Default Data Loader for documents
- Add Embeddings HuggingFace for vectors
### Challenge 2: Missing match_documents Function
Problem: "Could not find the function public.match_documents"
Solution: Create the custom PostgreSQL function (shown in Prerequisites).
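Once created, you can smoke-test the function over Supabase RPC with a dummy query vector (supabase-py sketch; any 768-length vector works for this check):

```python
# Verify match_documents resolves by calling it via RPC with a dummy vector.
import os
from supabase import create_client

supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])
rows = supabase.rpc("match_documents",
                    {"query_embedding": [0.1] * 768, "match_count": 1}).execute()
print(rows.data)  # any row back means the function exists and runs
```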
### Challenge 3: Column Ambiguity Error
Problem: `column reference "metadata" is ambiguous`
Solution: Prefix table columns with the table name: `documents.metadata`
### Challenge 4: Embedding Model Mismatch
Problem: Vector dimension mismatch errors
Solution: Use the same embedding model for both ingestion and retrieval; mixing models breaks search (see the guard below).
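A cheap safeguard in any custom code path is to assert the dimension before writing or querying (sketch; `embed` is a hypothetical stand-in for your embedding call):

```python
# Fail fast on dimension mismatch instead of getting opaque search errors.
EXPECTED_DIM = 768  # must match vector(768) and the ingestion-time model

vector = embed("some chunk")  # hypothetical embedding call from your pipeline
assert len(vector) == EXPECTED_DIM, (
    f"got {len(vector)} dims; ingestion and retrieval must share one model"
)
```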
### Challenge 5: Understanding Flow Independence
Problem: Confusion about how to "connect" Flow 1 to Flow 2
Solution: They communicate through the database, not n8n connections:
- Flow 1 writes to Supabase
- Flow 2 reads from Supabase
## Testing the System
### Test Questions
| Question Type | Example |
|---|---|
| Basic Factual | "What is the Transformer model?" |
| Technical | "How does multi-head attention work?" |
| Comparative | "How does the Transformer differ from RNNs?" |
| Numerical | "What is the model dimension (d_model)?" |
### Expected Behavior
✅ Good Response:
- Answers grounded in paper content
- Includes specific details and figures
- Acknowledges when info isn't available

❌ Poor Response:
- Generic answer not from the paper
- Fabricated information
- Ignores retrieved context
## Performance Metrics
| Metric | Value |
|---|---|
| Document Chunks | 71 |
| Response Time | ~2-3 seconds |
| Retrieval Top K | 4 chunks |
| Token Usage | ~500-1000 per query |
## Key Takeaways
### 1. RAG = Grounded Answers
By retrieving from actual documents, the LLM is anchored to real content, which sharply reduces hallucination.
### 2. Embeddings Enable Semantic Search
Unlike keyword search, vector embeddings understand meaning (see the similarity check below):
- "How does the model work?" matches "architecture of the Transformer"
- "What's special about this approach?" matches "advantages over RNNs"
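You can observe this directly by comparing embeddings: phrases that share no keywords still land close together under the same model.

```python
# Semantically related phrases score high cosine similarity despite
# sharing no keywords; unrelated pairs score much lower.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
a = model.encode("How does the model work?")
b = model.encode("architecture of the Transformer")
print(util.cos_sim(a, b).item())  # noticeably higher than an unrelated pair
```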
### 3. Chunking Strategy Matters
- Too small (100 chars): Loses context
- Too large (5000 chars): Dilutes relevance
- Sweet spot (1000 chars + overlap): Balanced retrieval
### 4. n8n Makes AI Accessible
Visual workflows demystify complex AI pipelines. Anyone can build a RAG system without writing Python.
### 5. The Database is the Bridge
Ingestion and retrieval are separate processes that share data, not workflow connections.
## Workflow Diagram
```
                     FLOW 1 (One-time)
┌─────────┐   ┌──────────┐   ┌───────────┐   ┌─────────────┐
│ Manual  │──▶│ Read PDF │──▶│ Extract & │──▶│ Embed with  │
│ Trigger │   │ File     │   │ Chunk     │   │ HuggingFace │
└─────────┘   └──────────┘   └───────────┘   └──────┬──────┘
                                                    │
                                                    ▼
                                             ┌────────────┐
                                             │  Supabase  │
                                             │   Vector   │ ◀── Shared Database
                                             │   Store    │
                                             └────────────┘
                                                    ▲
                                                    │
┌─────────┐   ┌──────────┐   ┌───────────┐   ┌─────┴──────┐
│  Chat   │──▶│ AI Agent │──▶│ Groq LLM  │   │ Retrieve   │
│ Trigger │   │          │   │           │   │ from DB    │
└─────────┘   └──────────┘   └───────────┘   └────────────┘
                     FLOW 2 (Per Question)
```
## Complete Node Configuration
### Flow 1 Summary
| Node | Type | Key Settings |
|---|---|---|
| Trigger | Manual Trigger | – |
| Read | Read Binary File | Path: /data/paper.pdf |
| Extract | Extract From File | Chunk: 1000, Overlap: 200 |
| Embed | HuggingFace | Model: all-mpnet-base-v2 |
| Store | Supabase Vector Store | Operation: Insert |
### Flow 2 Summary
| Node | Type | Key Settings |
|---|---|---|
| Trigger | Chat Trigger | Mode: Chat |
| Agent | AI Agent | System message + connections |
| LLM | Groq Chat Model | Model: llama-3.3-70b |
| Retrieve | Supabase Vector Store | Top K: 4 |
| Memory | Window Buffer Memory | Size: 5 |
## Next Steps
To extend this system:
- Multi-Document Support – Add a `document_id` column to handle multiple papers
- Citation Tracking – Include page numbers in responses
- Hybrid Search – Combine vector + keyword search
- Answer Validation – Add a second LLM to verify grounding
- Feedback Loop – Collect thumbs up/down to improve retrieval
## Resources
- n8n AI Nodes Documentation
- Supabase Vector Guide
- Attention Is All You Need (Paper)
- RAG Overview Paper
Happy building!