Assignment Solution: n8n Chatmate

This document outlines the complete solution for the “Chatmate” assignment in the GenAI course. The goal is to build a Retrieval-Augmented Generation (RAG) workflow in n8n that allows users to ask questions about the “Attention Is All You Need” paper.

1. Assignment Overview

Objective: Create an interactive “Ask Me Anything” system for a specific PDF document.
Core Technology: n8n (workflow automation), Supabase (vector storage), Hugging Face Inference (embeddings), Groq (LLM inference).
Architecture:

Ingestion Flow: PDF → Text → Embeddings → Vector DB.
Retrieval Flow: User Query → Vector Search → LLM → Answer.

2. Part A: Data Preparation

Before building the workflow, the environment and data must be prepared.

Step 1: Download the Document

The assignment requires the “Attention Is All You Need” paper.

wget https://arxiv.org/pdf/1706.03762.pdf -O Attention_Is_All_You_Need.pdf
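If the download is interrupted, or an HTML page is saved in place of the paper, the PDF step later fails with a less obvious error. A quick pre-check, sketched in Python under the single assumption that valid PDF files start with the `%PDF-` header bytes (`looks_like_pdf` is a hypothetical helper, not part of n8n):

```python
from pathlib import Path


def looks_like_pdf(path: str) -> bool:
    """Return True if the file exists, is non-empty, and starts with the PDF magic bytes."""
    p = Path(path)
    if not p.is_file() or p.stat().st_size == 0:
        return False
    with p.open("rb") as fh:
        # PDF files begin with an ASCII header such as "%PDF-1.5".
        return fh.read(5) == b"%PDF-"
```

Calling `looks_like_pdf("Attention_Is_All_You_Need.pdf")` right after `wget` should return `True` before the file is handed to n8n.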

Step 2: Docker Configuration

To allow n8n (running in Docker) to access the downloaded PDF, the local directory must be mounted to the container.

Docker Volume Mount

Ensure your Docker run command includes a volume mount like -v /path/to/local/files:/files. This allows the “Read/Write Files from Disk” node to access /files/Attention_Is_All_You_Need.pdf.
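The mapping a bind mount creates is a prefix swap: a file under the host directory appears inside the container under the container directory. A small sketch of that rule, assuming POSIX paths and Docker's `HOST:CONTAINER[:OPTIONS]` mount syntax (`to_container_path` is a hypothetical helper):

```python
from pathlib import PurePosixPath


def to_container_path(host_path: str, mount: str) -> str:
    """Translate a host path into the path n8n sees inside the container.

    `mount` uses Docker's -v syntax, e.g. "/path/to/local/files:/files" or
    "/path/to/local/files:/files:ro". Raises ValueError if host_path is not
    inside the mounted host directory.
    """
    parts = mount.split(":")  # [host_dir, container_dir, optional options...]
    host_dir, container_dir = parts[0], parts[1]
    # relative_to raises ValueError when host_path lies outside host_dir.
    rel = PurePosixPath(host_path).relative_to(PurePosixPath(host_dir))
    return str(PurePosixPath(container_dir) / rel)
```

With the example mount, `/path/to/local/files/Attention_Is_All_You_Need.pdf` on the host becomes `/files/Attention_Is_All_You_Need.pdf`, which is the path to enter in the n8n node.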

3. Part B: Workflow Implementation

The solution consists of two separate flows within n8n.

Step 0: Database Setup (Supabase)

Before creating the workflow, you must set up the database schema in Supabase. Run the following SQL in the Supabase SQL Editor:

-- Enable the vector extension
create extension if not exists vector;
 
-- Create the documents table
create table if not exists documents (
  id bigserial primary key,
  content text,
  metadata jsonb,
  embedding vector(768) -- Matches DistilBERT dimension
);
 
-- Create the search function
-- Signature follows the LangChain convention (query_embedding, match_count, filter)
-- that n8n's Supabase Vector Store node calls by default
create or replace function match_documents (
  query_embedding vector(768),
  match_count int default null,
  filter jsonb default '{}'
) returns table (
  id bigint,
  content text,
  metadata jsonb,
  similarity float
) language plpgsql stable as $$
begin
  return query
  select
    documents.id,
    documents.content,
    documents.metadata,
    1 - (documents.embedding <=> query_embedding) as similarity
  from documents
  where documents.metadata @> filter
  order by documents.embedding <=> query_embedding
  limit match_count;
end;
$$;
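pgvector's `<=>` operator returns cosine distance, so `1 - (embedding <=> query_embedding)` is cosine similarity, and ordering by the distance returns the closest chunks first. The same ranking in plain Python, as a sketch for checking results by hand (toy 3-dimensional vectors stand in for the 768-dimensional embeddings; `match_documents_py` is a hypothetical name):

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance (what pgvector's <=> computes); assumes non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def match_documents_py(
    query: list[float],
    rows: list[tuple[int, str, list[float]]],
    match_count: int,
) -> list[tuple[int, str, float]]:
    """Rank (id, content, embedding) rows by similarity to query, best first."""
    scored = [
        (row_id, content, 1.0 - cosine_distance(emb, query))  # similarity = 1 - distance
        for row_id, content, emb in rows
    ]
    # Highest similarity first is the same order as smallest distance first.
    scored.sort(key=lambda r: r[2], reverse=True)
    return scored[:match_count]
```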

Flow 1: Document Ingestion (Setup)

Run this flow once to populate the database.

Trigger: Manual Trigger
- Settings: “When clicking Execute Workflow”.
Read File: Read/Write Files from Disk
- Operation: Read.
- File Path: /files/Attention_Is_All_You_Need.pdf (or your specific mounted path).
- Property Name: data.
Extract Text: Extract Text (PDF Parser)
- Operation: Extract from PDF.
- Binary Property: data.
Generate Embeddings: HuggingFace Inference
- Model: sentence-transformers/distilbert-base-nli-mean-tokens (768-dimensional output).
- Input: Text chunks produced by splitting the extracted PDF text (the text must be chunked before this step).
Store Vectors: Supabase Vector Store
- Operation: Insert.
- Table Name: documents.
- Content Column: content.
- Embedding Column: embedding.
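The extracted paper is far longer than a single embedding should cover, so it is split into overlapping chunks before embedding. A minimal character-based splitter as a sketch; the chunk size and overlap values are illustrative assumptions rather than settings from this workflow, and n8n's own text splitter nodes use their own logic:

```python
def split_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character chunks; consecutive chunks share `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk.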

Flow 2: Conversational Retrieval (Runtime)

This flow handles user interactions.

Trigger: Chat Trigger
- Settings: “When chat message received”.
Orchestrator: AI Agent
- Mode: Tools Agent.
- System Prompt: “You are a helpful assistant. Use the provided tools to answer questions about the ‘Attention Is All You Need’ paper based strictly on the retrieved context.”
LLM: Groq Chat Model
- Model: llama3-8b-8192.
- Temperature: 0.1 (for factual consistency).
Retrieval Tool: Supabase Vector Store
- Operation: Search.
- Table Name: documents.
- Top K: 4 (retrieve top 4 relevant chunks).
Memory: Simple Memory
- Session Key: {{$json.sessionId}}.
- Context Window: 5 turns.
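Simple Memory keeps a sliding window of recent exchanges per session key, so each chat session only sees its own last few turns. The idea, sketched in Python (`WindowMemory` is a hypothetical class illustrating the behavior, not n8n's implementation):

```python
from collections import defaultdict, deque


class WindowMemory:
    """Keep the last `k` (user, assistant) turns for each session id."""

    def __init__(self, k: int = 5):
        # A deque with maxlen silently drops the oldest turn once it is full.
        self._sessions = defaultdict(lambda: deque(maxlen=k))

    def add_turn(self, session_id: str, user_msg: str, ai_msg: str) -> None:
        self._sessions[session_id].append((user_msg, ai_msg))

    def history(self, session_id: str) -> list[tuple[str, str]]:
        return list(self._sessions[session_id])
```

After seven exchanges in one session with `k=5`, only the five most recent are passed back to the agent, and a different session id starts with an empty history.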

4. Deliverables Checklist

Ensure the following files are generated and verified:

n8n_workflow.json: The exported workflow file containing both flows.
report.md: A summary report of the nodes and configuration.
video_instructions.md: A script or link to the demonstration video.
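A quick way to verify the checklist before submitting, sketched in Python. It assumes only that the three files sit in one folder and that an n8n workflow export is a JSON object with a top-level `nodes` list; `check_deliverables` is a hypothetical helper:

```python
import json
from pathlib import Path

REQUIRED_FILES = ["n8n_workflow.json", "report.md", "video_instructions.md"]


def check_deliverables(folder: str) -> list[str]:
    """Return a list of problems found; an empty list means every check passed."""
    base = Path(folder)
    problems = [f"missing: {name}" for name in REQUIRED_FILES if not (base / name).is_file()]
    wf = base / "n8n_workflow.json"
    if wf.is_file():
        try:
            data = json.loads(wf.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"n8n_workflow.json is not valid JSON: {exc}")
        else:
            # An n8n export keeps its nodes in a top-level "nodes" list.
            if not isinstance(data, dict) or not data.get("nodes"):
                problems.append("n8n_workflow.json has no nodes")
    return problems
```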

5. Challenges & Solutions

Challenge: File Access

Issue: The n8n container could not find the PDF file.
Solution: Verified the Docker volume mount and used the absolute path inside the container (/files/...) instead of the host path.

Challenge: Embedding Dimensions

Issue: The embedding model's output dimension did not match the database column.
Solution: Used sentence-transformers/distilbert-base-nli-mean-tokens, which outputs 768-dimensional vectors, matching the vector(768) column created in Step 0.
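A cheap guard against this mismatch is to check every embedding's length before inserting it, since pgvector rejects a vector whose dimension differs from the column's declared size. A sketch, assuming embeddings arrive as plain lists of floats (`validate_embeddings` is a hypothetical helper):

```python
EXPECTED_DIM = 768  # must equal N in the `embedding vector(N)` column definition


def validate_embeddings(embeddings: list[list[float]], expected_dim: int = EXPECTED_DIM) -> None:
    """Raise ValueError naming the first embedding whose length differs from expected_dim."""
    for i, emb in enumerate(embeddings):
        if len(emb) != expected_dim:
            raise ValueError(f"embedding {i} has {len(emb)} dimensions, expected {expected_dim}")
```

Swapping in a different embedding model then fails fast with a clear message instead of a database error mid-ingestion.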
