Building a Document Q&A System with n8n: A RAG Tutorial

[!NOTE] This post walks through Assignment 1 from the Engineering GenAI course, where we build a RAG (Retrieval-Augmented Generation) system that can answer questions about the "Attention Is All You Need" paper.

Introduction: What is RAG?

Large Language Models are powerful, but they have limitations:

  • Knowledge cutoff – They don't know about recent events
  • Hallucination – They sometimes make things up
  • No access to private data – They can't read your documents

RAG (Retrieval-Augmented Generation) solves this by:

  1. Retrieving relevant information from a knowledge base
  2. Augmenting the LLM prompt with that context
  3. Generating answers grounded in actual documents
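
In code terms the loop is short. Here is a minimal TypeScript sketch of the idea; embed, searchVectorStore, and callLLM are hypothetical helpers standing in for the services we wire up below:

// Hypothetical helpers standing in for HuggingFace, Supabase, and Groq:
declare function embed(text: string): Promise<number[]>;
declare function searchVectorStore(queryVector: number[], topK: number): Promise<string[]>;
declare function callLLM(prompt: string): Promise<string>;

// The whole RAG loop: retrieve, augment, generate.
async function answerQuestion(question: string): Promise<string> {
  // 1. Retrieve: find the stored chunks closest to the question
  const queryVector = await embed(question);
  const chunks = await searchVectorStore(queryVector, 4);

  // 2. Augment: put the retrieved text into the prompt
  const prompt =
    `Answer using only this context:\n${chunks.join("\n---\n")}\n\n` +
    `Question: ${question}`;

  // 3. Generate: the LLM answers, grounded in the retrieved context
  return callLLM(prompt);
}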

In this tutorial, we'll build a complete RAG system using n8n (a visual workflow automation tool) to answer questions about the famous Transformer paper.


System Architecture

Our system has two independent workflows:

┌─────────────────────────────────────────────────────────────┐
│ FLOW 1: Document Ingestion (runs once)                      │
│                                                             │
│  PDF → Extract Text → Chunk → Embed → Store in Supabase     │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│ FLOW 2: Conversational Retrieval (runs per question)        │
│                                                             │
│  Question → Embed → Search Supabase → LLM → Answer          │
└─────────────────────────────────────────────────────────────┘

The key insight: these flows communicate through the database, not through n8n connections.


Prerequisites

Required Services

Service     | Purpose                       | Free Tier?
------------|-------------------------------|----------------
Supabase    | Vector database with pgvector | ✅ Yes
HuggingFace | Text embeddings               | ✅ Yes
Groq        | Fast LLM inference            | ✅ Yes
n8n         | Workflow orchestration        | ✅ Self-hosted

Supabase Setup

Create a vector-enabled table:

-- Enable the pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;
 
-- Create the documents table
CREATE TABLE documents (
  id BIGSERIAL PRIMARY KEY,
  content TEXT,
  metadata JSONB,
  embedding vector(768)  -- 768 dimensions for our model
);
 
-- Create index for fast similarity search
CREATE INDEX ON documents 
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

You’ll also need a custom function for similarity search:

CREATE OR REPLACE FUNCTION match_documents(
  query_embedding vector(768),
  match_count int DEFAULT 5,
  filter jsonb DEFAULT '{}'
)
RETURNS TABLE (
  id bigint,
  content text,
  metadata jsonb,
  similarity float
)
LANGUAGE plpgsql
AS $$
BEGIN
  RETURN QUERY
  SELECT
    documents.id,
    documents.content,
    documents.metadata,
    1 - (documents.embedding <=> query_embedding) AS similarity
  FROM documents
  WHERE documents.metadata @> filter
  ORDER BY documents.embedding <=> query_embedding
  LIMIT match_count;
END;
$$;
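
n8n's Supabase node calls this function through Supabase's RPC interface. If you want to sanity-check it outside n8n, here is a minimal sketch using supabase-js; the project URL, key, and embedding values are placeholders:

import { createClient } from "@supabase/supabase-js";

const supabase = createClient("https://YOUR-PROJECT.supabase.co", "YOUR_ANON_KEY");

// The query embedding must come from the SAME 768-dimension model
// used at ingestion time (all-mpnet-base-v2 in this tutorial).
const queryEmbedding: number[] = [/* 768 floats */];

const { data, error } = await supabase.rpc("match_documents", {
  query_embedding: queryEmbedding,
  match_count: 4,
  filter: {},
});
if (error) throw error;
for (const row of data) {
  console.log(row.similarity.toFixed(3), row.content.slice(0, 80));
}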

Flow 1: Document Ingestion

This flow runs once to process and store the PDF.

Node 1: Manual Trigger

Simply starts the workflow when you click "Execute".

Node 2: Read Binary File

File Path: /data/attention_is_all_you_need.pdf
Property Name: data

Tip: Use Docker volume mounts to make files accessible to n8n, e.g. by mounting a local data folder to /data inside the container.

Node 3: Extract From File

Input Binary Field: data
Operation: Extract From PDF
Chunk Size: 1000 characters
Chunk Overlap: 200 characters
Include Metadata: true

Why these settings?

  • 1000 chars balances context vs. specificity
  • 200 char overlap prevents losing info at chunk boundaries
  • Metadata helps track source locations
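
To make the overlap concrete, here is a minimal sliding-window chunker in TypeScript. It is a sketch of the behavior with the same parameters as the node above, not n8n's internal implementation:

// Sliding-window chunker: 1000-char chunks, each overlapping the
// previous one by 200 chars so sentences at boundaries survive.
function chunkText(text: string, size = 1000, overlap = 200): string[] {
  const chunks: string[] = [];
  const step = size - overlap; // advance 800 chars per chunk
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last chunk reached
  }
  return chunks;
}

// Example: a 2200-char text yields chunks covering 0-1000, 800-1800, 1600-2200.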

Node 4: Embeddings (HuggingFace)

Model: sentence-transformers/all-mpnet-base-v2
Output Dimension: 768

This converts text chunks into 768-dimensional vectors that capture semantic meaning.
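
"Semantic meaning" here is geometric: texts about the same thing end up with vectors pointing in similar directions. The <=> operator in the SQL above is cosine distance; a sketch of the matching similarity computation:

// Cosine similarity between two embedding vectors. pgvector's <=>
// operator computes the corresponding distance (1 - similarity).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Semantically close texts score near 1.0; unrelated texts score near 0.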

Node 5: Supabase Vector Store (Insert)

Operation: Insert Documents
Table Name: documents
Embedding Column: embedding
Content Column: content
Metadata Column: metadata

Result: Successfully inserted 71 document chunks into Supabase.


Flow 2: Conversational Retrieval

This flow runs every time a user asks a question.

Node 1: Chat Trigger

Receives user questions via n8n’s built-in chat interface.

Node 2: AI Agent

The orchestrator that connects everything:

System Message: |
  You are a helpful AI assistant that answers questions about 
  the research paper "Attention Is All You Need". 
  
  Use the retrieval tool to find relevant information from 
  the paper before answering. Base your answers strictly on 
  the retrieved content - do not make up information.
  
  If the answer is not in the retrieved content, say so clearly.
  Provide clear, concise answers with specific details from the paper.

The AI Agent connects to:

  • Groq Chat Model (the "brain")
  • Supabase Vector Store (the "knowledge")
  • Window Buffer Memory (the "memory")

Node 3: Groq Chat Model

Model: llama-3.3-70b-versatile
Temperature: 0.3  # Lower = more focused

Model Options:

  • llama-3.3-70b-versatile – Best quality
  • mixtral-8x7b-32768 – Good balance
  • llama-3.1-8b-instant – Fastest
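
The Groq node handles the API call for us, but it is worth knowing what it does: Groq exposes an OpenAI-compatible chat completions endpoint. A minimal sketch of the raw request (the API key is a placeholder read from the environment):

// Direct call to Groq's OpenAI-compatible endpoint, roughly what the
// Groq Chat Model node does on our behalf.
const res = await fetch("https://api.groq.com/openai/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.GROQ_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "llama-3.3-70b-versatile",
    temperature: 0.3, // lower = more focused, less creative
    messages: [
      { role: "system", content: "Answer only from the provided context." },
      { role: "user", content: "What is multi-head attention?" },
    ],
  }),
});
const json = await res.json();
console.log(json.choices[0].message.content);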

Node 4: Supabase Vector Store (Retrieve)

Operation: Retrieve Documents (As Tool)
Table Name: documents
Top K: 4  # Return 4 most relevant chunks
Tool Description: |
  Search the "Attention Is All You Need" paper for relevant 
  information. Use this tool whenever you need to find specific 
  details from the paper.

Node 5: Window Buffer Memory

Session Key: {{ $json.sessionId }}
Window Size: 5  # Remember last 5 exchanges

Enables follow-up questions like "Tell me more about that".
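
Conceptually, a window buffer is just a per-session list capped at the last N exchanges; a minimal sketch:

// A minimal window buffer: keep only the last N exchanges per session,
// so follow-ups carry recent context without unbounded prompt growth.
type Exchange = { user: string; assistant: string };
const sessions = new Map<string, Exchange[]>();

function remember(sessionId: string, exchange: Exchange, windowSize = 5): void {
  const history = sessions.get(sessionId) ?? [];
  history.push(exchange);
  // Drop the oldest exchanges once the window is full
  sessions.set(sessionId, history.slice(-windowSize));
}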


Challenges & Solutions

Challenge 1: Document Sub-node Connection Error

Problem: "A Document sub-node must be connected and enabled"

Solution: n8n v1.0+ requires explicit sub-nodes:

  • Add Default Data Loader for documents
  • Add Embeddings HuggingFace for vectors

Challenge 2: Missing match_documents Function

Problem: "Could not find the function public.match_documents"

Solution: Create the custom PostgreSQL function (shown in Prerequisites).

Challenge 3: Column Ambiguity Error

Problem: "column reference 'metadata' is ambiguous"

Solution: Prefix column references inside the function with the table name: documents.metadata instead of metadata.

Challenge 4: Embedding Model Mismatch

Problem: Vector dimension mismatch errors

Solution: Use the same embedding model for both ingestion and retrieval. Vectors from different models live in different spaces, so mixing them silently breaks similarity search.
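
A cheap safeguard is to check vector dimensions before inserting or querying. A sketch; assertDimension is a hypothetical helper you might call from a Code node or script:

// Guard against silent model mismatches: every vector must match the
// dimension of the vector(768) column created in Supabase.
const EXPECTED_DIM = 768; // all-mpnet-base-v2

function assertDimension(embedding: number[]): void {
  if (embedding.length !== EXPECTED_DIM) {
    throw new Error(
      `Embedding has ${embedding.length} dimensions, expected ${EXPECTED_DIM}. ` +
      `Did the embedding model change between ingestion and retrieval?`
    );
  }
}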

Challenge 5: Understanding Flow Independence

Problem: Confusion about how to "connect" Flow 1 to Flow 2

Solution: They communicate through the database, not n8n connections:

  • Flow 1 writes to Supabase
  • Flow 2 reads from Supabase

Testing the System

Test Questions

Question Type | Example
--------------|----------------------------------------------
Basic Factual | "What is the Transformer model?"
Technical     | "How does multi-head attention work?"
Comparative   | "How does the Transformer differ from RNNs?"
Numerical     | "What is the model dimension (d_model)?"

Expected Behavior

✅ Good Response:

  • Answers grounded in paper content
  • Includes specific details and figures
  • Acknowledges when info isn't available

❌ Poor Response:

  • Generic answer not from the paper
  • Fabricated information
  • Ignores retrieved context

Performance Metrics

Metric          | Value
----------------|---------------------
Document Chunks | 71
Response Time   | ~2-3 seconds
Retrieval Top K | 4 chunks
Token Usage     | ~500-1000 per query

Key Takeaways

1. RAG = Grounded Answers

By retrieving from actual documents, the LLM is far less likely to hallucinate: it is instructed to ground every answer in real content.

2. Vector Search Understands Meaning

Unlike keyword search, vector embeddings understand meaning:

  • "How does the model work?" matches "architecture of the Transformer"
  • "What's special about this approach?" matches "advantages over RNNs"

3. Chunking Strategy Matters

  • Too small (100 chars): Loses context
  • Too large (5000 chars): Dilutes relevance
  • Sweet spot (1000 chars + overlap): Balanced retrieval

4. n8n Makes AI Accessible

Visual workflows demystify complex AI pipelines. Anyone can build a RAG system without writing Python.

5. The Database is the Bridge

Ingestion and retrieval are separate processes that share data, not workflow connections.


Workflow Diagram

                    ┌──────────────────────────────────────────┐
                    │            FLOW 1 (One-time)             │
                    │                                          │
┌─────────┐   ┌─────┴─────┐   ┌────────────┐   ┌────────────┐  │
│ Manual  │──▶│ Read PDF  │──▶│  Extract & │──▶│ Embed with │──┘
│ Trigger │   │   File    │   │   Chunk    │   │ HuggingFace│
└─────────┘   └───────────┘   └────────────┘   └────────────┘
                                                      │
                                                      ▼
                                              ┌────────────┐
                                              │  Supabase  │
                                              │  Vector    │ ◀── Shared Database
                                              │   Store    │
                                              └────────────┘
                                                      ▲
                                                      │
┌─────────┐   ┌───────────┐   ┌────────────┐   ┌────────────┐
│  Chat   │──▶│ AI Agent  │──▶│   Groq     │   │  Retrieve  │──┘
│ Trigger │   │           │   │    LLM     │   │  from DB   │
└─────────┘   └───────────┘   └────────────┘   └────────────┘
                    │
                    │            FLOW 2 (Per Question)
                    └──────────────────────────────────────────

Complete Node Configuration

Flow 1 Summary

Node    | Type                  | Key Settings
--------|-----------------------|---------------------------
Trigger | Manual Trigger        | –
Read    | Read Binary File      | Path: /data/paper.pdf
Extract | Extract From File     | Chunk: 1000, Overlap: 200
Embed   | HuggingFace           | Model: all-mpnet-base-v2
Store   | Supabase Vector Store | Operation: Insert

Flow 2 Summary

Node     | Type                  | Key Settings
---------|-----------------------|------------------------------
Trigger  | Chat Trigger          | Mode: Chat
Agent    | AI Agent              | System message + connections
LLM      | Groq Chat Model       | Model: llama-3.3-70b
Retrieve | Supabase Vector Store | Top K: 4
Memory   | Window Buffer Memory  | Size: 5

Next Steps

To extend this system:

  1. Multi-Document Support – Add a document_id to handle multiple papers (see the sketch after this list)
  2. Citation Tracking – Include page numbers in responses
  3. Hybrid Search – Combine vector + keyword search
  4. Answer Validation – Add a second LLM to verify grounding
  5. Feedback Loop – Collect thumbs up/down to improve retrieval
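
Note that the match_documents function from the Prerequisites already takes a JSONB filter, so item 1 is mostly a matter of tagging each chunk's metadata with a document_id at ingestion and filtering at query time. Continuing the supabase-js sketch from earlier; document_id and its value are assumed metadata, not something the tutorial stores today:

// Restrict retrieval to a single paper via the metadata filter.
// Assumes chunks were stored with metadata like { "document_id": "attention-2017" }.
const filtered = await supabase.rpc("match_documents", {
  query_embedding: queryEmbedding,
  match_count: 4,
  filter: { document_id: "attention-2017" },
});
if (filtered.error) throw filtered.error;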


Happy building! 🚀