Building a Document Q&A System with n8n: A RAG Tutorial

[!NOTE] This post walks through Assignment 1 from the Engineering GenAI course, where we build a RAG (Retrieval-Augmented Generation) system that can answer questions about the "Attention Is All You Need" paper.

Introduction: What is RAG?

Large Language Models are powerful, but they have limitations:

  • Knowledge cutoff – They don't know about recent events
  • Hallucination – They sometimes make things up
  • No access to private data – They can't read your documents

RAG (Retrieval-Augmented Generation) solves this by:

  1. Retrieving relevant information from a knowledge base
  2. Augmenting the LLM prompt with that context
  3. Generating answers grounded in actual documents
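
In code terms the loop is short. Here is a minimal TypeScript sketch of the idea; embed, searchVectorStore, and callLLM are hypothetical helpers standing in for the services we wire up below:

// Hypothetical helpers standing in for HuggingFace, Supabase, and Groq:
declare function embed(text: string): Promise<number[]>;
declare function searchVectorStore(queryVector: number[], topK: number): Promise<string[]>;
declare function callLLM(prompt: string): Promise<string>;

// The whole RAG loop: retrieve, augment, generate.
async function answerQuestion(question: string): Promise<string> {
  // 1. Retrieve: find the stored chunks closest to the question
  const queryVector = await embed(question);
  const chunks = await searchVectorStore(queryVector, 4);

  // 2. Augment: put the retrieved text into the prompt
  const prompt =
    `Answer using only this context:\n${chunks.join("\n---\n")}\n\n` +
    `Question: ${question}`;

  // 3. Generate: the LLM answers, grounded in the retrieved context
  return callLLM(prompt);
}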

In this tutorial, we'll build a complete RAG system using n8n (a visual workflow automation tool) to answer questions about the famous Transformer paper.


System Architecture

Our system has two independent workflows:

┌─────────────────────────────────────────────────────────────┐
│ FLOW 1: Document Ingestion (runs once)                      │
│                                                             │
│  PDF → Extract Text → Chunk → Embed → Store in Supabase     │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│ FLOW 2: Conversational Retrieval (runs per question)        │
│                                                             │
│  Question → Embed → Search Supabase → LLM → Answer          │
└─────────────────────────────────────────────────────────────┘

The key insight: these flows communicate through the database, not through n8n connections.


Prerequisites

Required Services

Service     | Purpose                       | Free Tier?
------------|-------------------------------|----------------
Supabase    | Vector database with pgvector | ✅ Yes
HuggingFace | Text embeddings               | ✅ Yes
Groq        | Fast LLM inference            | ✅ Yes
n8n         | Workflow orchestration        | ✅ Self-hosted

Supabase Setup

Create a vector-enabled table:

-- Enable the pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;
 
-- Create the documents table
CREATE TABLE documents (
  id BIGSERIAL PRIMARY KEY,
  content TEXT,
  metadata JSONB,
  embedding vector(768)  -- 768 dimensions for our model
);
 
-- Create index for fast similarity search
CREATE INDEX ON documents 
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

You’ll also need a custom function for similarity search:

CREATE OR REPLACE FUNCTION match_documents(
  query_embedding vector(768),
  match_count int DEFAULT 5,
  filter jsonb DEFAULT '{}'
)
RETURNS TABLE (
  id bigint,
  content text,
  metadata jsonb,
  similarity float
)
LANGUAGE plpgsql
AS $$
BEGIN
  RETURN QUERY
  SELECT
    documents.id,
    documents.content,
    documents.metadata,
    1 - (documents.embedding <=> query_embedding) AS similarity
  FROM documents
  WHERE documents.metadata @> filter
  ORDER BY documents.embedding <=> query_embedding
  LIMIT match_count;
END;
$$;
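
n8n's Supabase node calls this function through Supabase's RPC interface. If you want to sanity-check it outside n8n, here is a minimal sketch using supabase-js; the project URL, key, and embedding values are placeholders:

import { createClient } from "@supabase/supabase-js";

const supabase = createClient("https://YOUR-PROJECT.supabase.co", "YOUR_ANON_KEY");

// The query embedding must come from the SAME 768-dimension model
// used at ingestion time (all-mpnet-base-v2 in this tutorial).
const queryEmbedding: number[] = [/* 768 floats */];

const { data, error } = await supabase.rpc("match_documents", {
  query_embedding: queryEmbedding,
  match_count: 4,
  filter: {},
});
if (error) throw error;
for (const row of data) {
  console.log(row.similarity.toFixed(3), row.content.slice(0, 80));
}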

Flow 1: Document Ingestion

This flow runs once to process and store the PDF.

Node 1: Manual Trigger

Simply starts the workflow when you click "Execute".

Node 2: Read Binary File

File Path: /data/attention_is_all_you_need.pdf
Property Name: data

Tip: Use Docker volume mounts to make files accessible to n8n, e.g. by mounting a local data folder to /data inside the container.

Node 3: Extract From File

Input Binary Field: data
Operation: Extract From PDF
Chunk Size: 1000 characters
Chunk Overlap: 200 characters
Include Metadata: true

Why these settings?

  • 1000 chars balances context vs. specificity
  • 200 char overlap prevents losing info at chunk boundaries
  • Metadata helps track source locations
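
To make the overlap concrete, here is a minimal sliding-window chunker in TypeScript. It is a sketch of the behavior with the same parameters as the node above, not n8n's internal implementation:

// Sliding-window chunker: 1000-char chunks, each overlapping the
// previous one by 200 chars so sentences at boundaries survive.
function chunkText(text: string, size = 1000, overlap = 200): string[] {
  const chunks: string[] = [];
  const step = size - overlap; // advance 800 chars per chunk
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last chunk reached
  }
  return chunks;
}

// Example: a 2200-char text yields chunks covering 0-1000, 800-1800, 1600-2200.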

Node 4: Embeddings (HuggingFace)

Model: sentence-transformers/all-mpnet-base-v2
Output Dimension: 768

This converts text chunks into 768-dimensional vectors that capture semantic meaning.
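
"Semantic meaning" here is geometric: texts about the same thing end up with vectors pointing in similar directions. The <=> operator in the SQL above is cosine distance; a sketch of the matching similarity computation:

// Cosine similarity between two embedding vectors. pgvector's <=>
// operator computes the corresponding distance (1 - similarity).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Semantically close texts score near 1.0; unrelated texts score near 0.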

Node 5: Supabase Vector Store (Insert)

Operation: Insert Documents
Table Name: documents
Embedding Column: embedding
Content Column: content
Metadata Column: metadata

Result: Successfully inserted 71 document chunks into Supabase.


Flow 2: Conversational Retrieval

This flow runs every time a user asks a question.

Node 1: Chat Trigger

Receives user questions via n8n’s built-in chat interface.

Node 2: AI Agent

The orchestrator that connects everything:

System Message: |
  You are a helpful AI assistant that answers questions about 
  the research paper "Attention Is All You Need". 
  
  Use the retrieval tool to find relevant information from 
  the paper before answering. Base your answers strictly on 
  the retrieved content - do not make up information.
  
  If the answer is not in the retrieved content, say so clearly.
  Provide clear, concise answers with specific details from the paper.

The AI Agent connects to:

  • Groq Chat Model (the "brain")
  • Supabase Vector Store (the "knowledge")
  • Window Buffer Memory (the "memory")

Node 3: Groq Chat Model

Model: llama-3.3-70b-versatile
Temperature: 0.3  # Lower = more focused

Model Options:

  • llama-3.3-70b-versatile – Best quality
  • mixtral-8x7b-32768 – Good balance
  • llama-3.1-8b-instant – Fastest
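
The Groq node handles the API call for us, but it is worth knowing what it does: Groq exposes an OpenAI-compatible chat completions endpoint. A minimal sketch of the raw request (the API key is a placeholder read from the environment):

// Direct call to Groq's OpenAI-compatible endpoint, roughly what the
// Groq Chat Model node does on our behalf.
const res = await fetch("https://api.groq.com/openai/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.GROQ_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "llama-3.3-70b-versatile",
    temperature: 0.3, // lower = more focused, less creative
    messages: [
      { role: "system", content: "Answer only from the provided context." },
      { role: "user", content: "What is multi-head attention?" },
    ],
  }),
});
const json = await res.json();
console.log(json.choices[0].message.content);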

Node 4: Supabase Vector Store (Retrieve)

Operation: Retrieve Documents (As Tool)
Table Name: documents
Top K: 4  # Return 4 most relevant chunks
Tool Description: |
  Search the "Attention Is All You Need" paper for relevant 
  information. Use this tool whenever you need to find specific 
  details from the paper.

Node 5: Window Buffer Memory

Session Key: {{ $json.sessionId }}
Window Size: 5  # Remember last 5 exchanges

Enables follow-up questions like "Tell me more about that".
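
Conceptually, a window buffer is just a per-session list capped at the last N exchanges; a minimal sketch:

// A minimal window buffer: keep only the last N exchanges per session,
// so follow-ups carry recent context without unbounded prompt growth.
type Exchange = { user: string; assistant: string };
const sessions = new Map<string, Exchange[]>();

function remember(sessionId: string, exchange: Exchange, windowSize = 5): void {
  const history = sessions.get(sessionId) ?? [];
  history.push(exchange);
  // Drop the oldest exchanges once the window is full
  sessions.set(sessionId, history.slice(-windowSize));
}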


Challenges & Solutions

Challenge 1: Document Sub-node Connection Error

Problem: "A Document sub-node must be connected and enabled"

Solution: n8n v1.0+ requires explicit sub-nodes:

  • Add Default Data Loader for documents
  • Add Embeddings HuggingFace for vectors

Challenge 2: Missing match_documents Function

Problem: "Could not find the function public.match_documents"

Solution: Create the custom PostgreSQL function (shown in Prerequisites).

Challenge 3: Column Ambiguity Error

Problem: "column reference 'metadata' is ambiguous"

Solution: Prefix column references inside the function with the table name: documents.metadata instead of metadata.

Challenge 4: Embedding Model Mismatch

Problem: Vector dimension mismatch errors

Solution: Use the same embedding model for both ingestion and retrieval. Vectors from different models live in different spaces, so mixing them silently breaks similarity search.
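
A cheap safeguard is to check vector dimensions before inserting or querying. A sketch; assertDimension is a hypothetical helper you might call from a Code node or script:

// Guard against silent model mismatches: every vector must match the
// dimension of the vector(768) column created in Supabase.
const EXPECTED_DIM = 768; // all-mpnet-base-v2

function assertDimension(embedding: number[]): void {
  if (embedding.length !== EXPECTED_DIM) {
    throw new Error(
      `Embedding has ${embedding.length} dimensions, expected ${EXPECTED_DIM}. ` +
      `Did the embedding model change between ingestion and retrieval?`
    );
  }
}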

Challenge 5: Understanding Flow Independence

Problem: Confusion about how to "connect" Flow 1 to Flow 2

Solution: They communicate through the database, not n8n connections:

  • Flow 1 writes to Supabase
  • Flow 2 reads from Supabase

Testing the System

Test Questions

Question Type | Example
--------------|----------------------------------------------
Basic Factual | "What is the Transformer model?"
Technical     | "How does multi-head attention work?"
Comparative   | "How does the Transformer differ from RNNs?"
Numerical     | "What is the model dimension (d_model)?"

Expected Behavior

✅ Good Response:

  • Answers grounded in paper content
  • Includes specific details and figures
  • Acknowledges when info isn't available

❌ Poor Response:

  • Generic answer not from the paper
  • Fabricated information
  • Ignores retrieved context

Performance Metrics

Metric          | Value
----------------|---------------------
Document Chunks | 71
Response Time   | ~2-3 seconds
Retrieval Top K | 4 chunks
Token Usage     | ~500-1000 per query

Key Takeaways

1. RAG = Grounded Answers

By retrieving from actual documents, the LLM is far less likely to hallucinate: it is instructed to ground every answer in real content.

2. Vector Search Understands Meaning

Unlike keyword search, vector embeddings understand meaning:

  • "How does the model work?" matches "architecture of the Transformer"
  • "What's special about this approach?" matches "advantages over RNNs"

3. Chunking Strategy Matters

  • Too small (100 chars): Loses context
  • Too large (5000 chars): Dilutes relevance
  • Sweet spot (1000 chars + overlap): Balanced retrieval

4. n8n Makes AI Accessible

Visual workflows demystify complex AI pipelines. Anyone can build a RAG system without writing Python.

5. The Database is the Bridge

Ingestion and retrieval are separate processes that share data, not workflow connections.


Workflow Diagram

                    ┌──────────────────────────────────────────┐
                    │            FLOW 1 (One-time)             │
                    │                                          │
┌─────────┐   ┌─────┴─────┐   ┌────────────┐   ┌────────────┐  │
│ Manual  │──▶│ Read PDF  │──▶│  Extract & │──▶│ Embed with │──┘
│ Trigger │   │   File    │   │   Chunk    │   │ HuggingFace│
└─────────┘   └───────────┘   └────────────┘   └────────────┘
                                                      │
                                                      ▼
                                              ┌────────────┐
                                              │  Supabase  │
                                              │  Vector    │ ◀── Shared Database
                                              │   Store    │
                                              └────────────┘
                                                      ▲
                                                      │
┌─────────┐   ┌───────────┐   ┌────────────┐   ┌────────────┐
│  Chat   │──▶│ AI Agent  │──▶│   Groq     │   │  Retrieve  │──┘
│ Trigger │   │           │   │    LLM     │   │  from DB   │
└─────────┘   └───────────┘   └────────────┘   └────────────┘
                    │
                    │            FLOW 2 (Per Question)
                    └──────────────────────────────────────────

Complete Node Configuration

Flow 1 Summary

Node    | Type                  | Key Settings
--------|-----------------------|---------------------------
Trigger | Manual Trigger        | –
Read    | Read Binary File      | Path: /data/paper.pdf
Extract | Extract From File     | Chunk: 1000, Overlap: 200
Embed   | HuggingFace           | Model: all-mpnet-base-v2
Store   | Supabase Vector Store | Operation: Insert

Flow 2 Summary

Node     | Type                  | Key Settings
---------|-----------------------|------------------------------
Trigger  | Chat Trigger          | Mode: Chat
Agent    | AI Agent              | System message + connections
LLM      | Groq Chat Model       | Model: llama-3.3-70b
Retrieve | Supabase Vector Store | Top K: 4
Memory   | Window Buffer Memory  | Size: 5

Next Steps

To extend this system:

  1. Multi-Document Support – Add a document_id to handle multiple papers (see the sketch after this list)
  2. Citation Tracking – Include page numbers in responses
  3. Hybrid Search – Combine vector + keyword search
  4. Answer Validation – Add a second LLM to verify grounding
  5. Feedback Loop – Collect thumbs up/down to improve retrieval
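
Note that the match_documents function from the Prerequisites already takes a JSONB filter, so item 1 is mostly a matter of tagging each chunk's metadata with a document_id at ingestion and filtering at query time. Continuing the supabase-js sketch from earlier; document_id and its value are assumed metadata, not something the tutorial stores today:

// Restrict retrieval to a single paper via the metadata filter.
// Assumes chunks were stored with metadata like { "document_id": "attention-2017" }.
const filtered = await supabase.rpc("match_documents", {
  query_embedding: queryEmbedding,
  match_count: 4,
  filter: { document_id: "attention-2017" },
});
if (filtered.error) throw filtered.error;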


Happy building! 🚀