Assignment Solution: n8n Chatmate

This document outlines the complete solution for the “Chatmate” assignment in the GenAI course. The goal is to build a Retrieval-Augmented Generation (RAG) workflow in n8n that allows users to ask questions about the “Attention Is All You Need” paper.

1. Assignment Overview

Objective: Create an interactive “Ask Me Anything” system for a specific PDF document.

Core Technology: n8n (workflow automation), Supabase (vector storage), Groq (LLM inference).

Architecture:

  1. Ingestion Flow: PDF → Text → Embeddings → Vector DB.
  2. Retrieval Flow: User Query → Vector Search → LLM → Answer.
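
To make the two flows concrete, here is a minimal in-memory sketch in Python. It is not how n8n, Supabase, or Groq work internally: toy_embed is a hypothetical bag-of-words stand-in for the real embedding model, the "vector DB" is a plain list, the sample sentences are illustrative only, and the LLM call is replaced by building the prompt that would be sent.

```python
import math
import re
from collections import Counter

def toy_embed(text):
    """Hypothetical stand-in for an embedding model: bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9-]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def ingest(chunks):
    """Ingestion flow: text chunks -> embeddings -> in-memory store."""
    return [(chunk, toy_embed(chunk)) for chunk in chunks]

def retrieve(store, query, top_k=2):
    """Retrieval flow, first half: user query -> vector search."""
    q = toy_embed(query)
    ranked = sorted(store, key=lambda row: cosine(q, row[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]

def build_prompt(query, context_chunks):
    """Retrieval flow, second half: the text an LLM would receive."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# Illustrative sample chunks, not extracted from the actual PDF.
store = ingest([
    "The Transformer relies entirely on attention mechanisms.",
    "The model was trained on eight GPUs.",
    "Multi-head attention runs several attention layers in parallel.",
])
question = "What is multi-head attention?"
top = retrieve(store, question, top_k=1)
prompt = build_prompt(question, top)
print(prompt)
```

In the real workflow each function corresponds to one or more n8n nodes, described in Part B.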

2. Part A: Data Preparation

Before building the workflow, the environment and data must be prepared.

Step 1: Download the Document

The assignment requires the “Attention Is All You Need” paper.

wget https://arxiv.org/pdf/1706.03762.pdf -O Attention_Is_All_You_Need.pdf

Step 2: Docker Configuration

To allow n8n (running in Docker) to access the downloaded PDF, the local directory must be mounted to the container.

Docker Volume Mount

Ensure your Docker run command includes a volume mount like -v /path/to/local/files:/files. This allows the “Read/Write Files from Disk” node to access /files/Attention_Is_All_You_Need.pdf.
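
Before wiring up the workflow, it can help to confirm that the file is visible at the mounted path and really is a PDF. The following is a small sketch of such a check; the function name check_pdf is hypothetical, and the path in the usage line assumes the /files mount described above. It relies only on the fact that PDF files begin with the bytes %PDF-.

```python
from pathlib import Path

def check_pdf(path):
    """Return 'missing', 'not a pdf', or 'ok' for the given path."""
    p = Path(path)
    if not p.is_file():
        return "missing"
    with p.open("rb") as f:
        header = f.read(5)  # PDF files start with the magic bytes %PDF-
    return "ok" if header == b"%PDF-" else "not a pdf"

if __name__ == "__main__":
    # Assumed container path; adjust to your own mount.
    print(check_pdf("/files/Attention_Is_All_You_Need.pdf"))
```

A result of "missing" from inside the container usually points to the volume mount rather than to the download.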

3. Part B: Workflow Implementation

The solution consists of two separate flows within n8n.

Step 0: Database Setup (Supabase)

Before creating the workflow, you must set up the database schema in Supabase. Run the following SQL in the Supabase SQL Editor:

-- Enable the vector extension
create extension if not exists vector;
 
-- Create the documents table
create table if not exists documents (
  id bigserial primary key,
  content text,
  metadata jsonb,
  embedding vector(768) -- Matches DistilBERT dimension
);
 
-- Create the search function
create or replace function match_documents (
  query_embedding vector(768),
  match_threshold float,
  match_count int
) returns table (
  id bigint,
  content text,
  metadata jsonb,
  similarity float
) language plpgsql stable as $$
begin
  return query
  select
    documents.id,
    documents.content,
    documents.metadata,
    1 - (documents.embedding <=> query_embedding) as similarity
  from documents
  where 1 - (documents.embedding <=> query_embedding) > match_threshold
  order by documents.embedding <=> query_embedding
  limit match_count;
end;
$$;
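
For readers less familiar with pgvector, the <=> operator is cosine distance, so 1 - distance is cosine similarity. The sketch below mirrors the logic of match_documents in plain Python over a list of rows. It is an illustration of the filtering and ordering only, assumes non-zero vectors, and uses 2-dimensional embeddings in the usage example purely for readability.

```python
import math

def cosine_distance(a, b):
    """Equivalent of pgvector's <=> operator (assumes non-zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def match_documents(rows, query_embedding, match_threshold, match_count):
    """Keep rows above the similarity threshold, best first, at most match_count."""
    scored = []
    for row in rows:
        similarity = 1.0 - cosine_distance(row["embedding"], query_embedding)
        if similarity > match_threshold:
            scored.append({
                "id": row["id"],
                "content": row["content"],
                "similarity": similarity,
            })
    # Ordering by descending similarity equals ordering by ascending distance.
    scored.sort(key=lambda r: r["similarity"], reverse=True)
    return scored[:match_count]

rows = [
    {"id": 1, "content": "a", "embedding": [1.0, 0.0]},
    {"id": 2, "content": "b", "embedding": [0.0, 1.0]},
    {"id": 3, "content": "c", "embedding": [1.0, 1.0]},
]
result = match_documents(rows, [1.0, 0.0], match_threshold=0.5, match_count=5)
print([r["id"] for r in result])
```

Row 2 is orthogonal to the query (similarity 0) and is dropped by the threshold; the remaining rows come back in order of similarity.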

Flow 1: Document Ingestion (Setup)

Run this flow once to populate the database.

  1. Trigger: Manual Trigger
    • Settings: “When clicking Execute Workflow”.
  2. Read File: Read/Write Files from Disk
    • Operation: Read.
    • File Path: /files/Attention_Is_All_You_Need.pdf (or your specific mounted path).
    • Property Name: data.
  3. Extract Text: Extract Text (PDF Parser)
    • Operation: Extract from PDF.
    • Binary Property: data.
  4. Generate Embeddings: HuggingFace Inference
    • Model: sentence-transformers/distilbert-base-nli-mean-tokens.
    • Input: Text from the previous node, split into chunks before embedding.
  5. Store Vectors: Supabase Vector Store
    • Operation: Insert.
    • Table Name: documents.
    • Content Column: content.
    • Embedding Column: embedding.
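
The flow above embeds text chunks but does not say how the extracted text is split. As a rough illustration, the sketch below shows fixed-size character chunking with overlap. The function name chunk_text and the default chunk_size and overlap values are assumptions for this example, not settings taken from the assignment, and this is not the splitter n8n uses.

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into character windows that overlap by `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once this window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks

print(chunk_text("abcdefghij", chunk_size=4, overlap=1))
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one neighbouring chunk, which tends to help retrieval.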

Flow 2: Conversational Retrieval (Runtime)

This flow handles user interactions.

  1. Trigger: Chat Trigger
    • Settings: “When chat message received”.
  2. Orchestrator: AI Agent
    • Mode: Tools Agent.
    • System Prompt: “You are a helpful assistant. Use the provided tools to answer questions about the ‘Attention Is All You Need’ paper based strictly on the retrieved context.”
  3. LLM: Groq Chat Model
    • Model: llama3-8b-8192.
    • Temperature: 0.1 (for factual consistency).
  4. Retrieval Tool: Supabase Vector Store
    • Operation: Search.
    • Table Name: documents.
    • Top K: 4 (retrieve top 4 relevant chunks).
  5. Memory: Simple Memory
    • Session Key: {{$json.sessionId}}.
    • Context Window: 5 turns.
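
The memory settings above amount to a per-session sliding window over recent turns. The sketch below illustrates that idea in plain Python; the class WindowMemory is hypothetical and is not the implementation of n8n's memory node.

```python
from collections import defaultdict, deque

class WindowMemory:
    """Keeps only the most recent `max_turns` turns for each session."""

    def __init__(self, max_turns=5):
        # One bounded deque per session key; old turns fall off automatically.
        self._sessions = defaultdict(lambda: deque(maxlen=max_turns))

    def add_turn(self, session_id, user_message, assistant_message):
        self._sessions[session_id].append((user_message, assistant_message))

    def history(self, session_id):
        return list(self._sessions[session_id])

memory = WindowMemory(max_turns=5)
for i in range(7):
    memory.add_turn("session-a", f"question {i}", f"answer {i}")
print(memory.history("session-a")[0])
```

Keying the window by session ID keeps concurrent chats from leaking context into each other, and the fixed window bounds the prompt size sent to the LLM.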

4. Deliverables Checklist

Ensure the following files are generated and verified:

  • n8n_workflow.json: The exported workflow file containing both flows.
  • report.md: A summary report of the nodes and configuration.
  • video_instructions.md: A script or link to the demonstration video.

5. Challenges & Solutions

Challenge: File Access

Issue: The n8n container could not find the PDF file.

Solution: Verified the Docker volume mount and used the absolute path inside the container (/files/...) instead of the host path.

Challenge: Embedding Dimensions

Issue: The embedding model's output dimension did not match the dimension of the database column.

Solution: Used distilbert-base-nli-mean-tokens, which outputs 768-dimensional vectors, matching the vector(768) column defined in the documents table.
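
A dimension mismatch is easiest to diagnose when it is caught before the insert. The sketch below shows one way to guard against it; validate_embedding is a hypothetical helper, and the constant 768 is taken from the vector(768) column in the schema above.

```python
EXPECTED_DIM = 768  # must match the vector(768) column in the documents table

def validate_embedding(embedding, expected_dim=EXPECTED_DIM):
    """Raise ValueError if the vector length differs from the column dimension."""
    if len(embedding) != expected_dim:
        raise ValueError(
            f"embedding has {len(embedding)} dimensions, expected {expected_dim}"
        )
    return embedding

validate_embedding([0.0] * 768)  # passes silently
```

If the embedding model is swapped later, both the constant here and the column definition need to change together.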