n8n Chatmate - RAG Document QA System 🤖📚

Project Status

Status: ✅ Completed & Functional
Type: Academic Assignment + Learning Project
Timeline: November 2025
Course: GenAI (WiSe25)

📋 Project Overview

Chatmate is an AI-powered document question-answering system that uses Retrieval-Augmented Generation (RAG) to answer questions about the research paper “Attention Is All You Need” (the original Transformer paper). Built entirely using n8n workflow automation, it demonstrates the practical application of vector embeddings, semantic search, and LLM-based answer generation.

Core Concept

Instead of relying purely on an LLM’s pre-trained knowledge (which can hallucinate), this system:

  1. Ingests the PDF document into a vector database
  2. Retrieves semantically relevant chunks based on user questions
  3. Generates accurate answers grounded in the actual document content
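
For intuition, the query-time half of this pipeline (steps 2 and 3) can be sketched in a few lines of Python. This is a hedged stand-in, not the n8n workflow itself: the sentence-transformers, supabase-py, and Groq SDK calls play the roles of the corresponding n8n nodes, the credentials are placeholders, and `match_documents` is the custom Supabase similarity-search function described later in this README.

```python
# Minimal RAG query sketch (stand-in libraries, placeholder credentials).
from sentence_transformers import SentenceTransformer
from supabase import create_client
from groq import Groq

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # same model as ingestion
supabase = create_client("https://<project>.supabase.co", "<service-role-key>")
llm = Groq(api_key="<groq-api-key>")

question = "What is multi-head attention?"
query_vec = embedder.encode(question).tolist()

# Retrieve the 4 most similar chunks via the custom Postgres function.
rows = supabase.rpc(
    "match_documents", {"query_embedding": query_vec, "match_count": 4}
).execute()
context = "\n\n".join(row["content"] for row in rows.data)

# Ask the LLM to answer strictly from the retrieved context.
answer = llm.chat.completions.create(
    model="llama-3.1-70b-versatile",
    temperature=0.3,
    messages=[
        {"role": "system", "content": "Answer only from the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ],
)
print(answer.choices[0].message.content)
```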

🎯 Project Goals

  • Build a working RAG pipeline using n8n’s visual workflow editor
  • Implement vector-based semantic search with Supabase
  • Integrate LLM (Groq) for natural language answer generation
  • Maintain conversation context across multiple questions
  • Ensure answers are grounded in source material (no hallucinations)
  • Document the learning process and challenges faced

🛠️ Technology Stack

| Component | Technology | Purpose |
| --- | --- | --- |
| Workflow Orchestration | n8n | Visual workflow automation platform |
| Vector Database | Supabase (PostgreSQL + pgvector) | Stores document embeddings for similarity search |
| Embedding Model | HuggingFace all-MiniLM-L6-v2 | Converts text to 384-dim vectors |
| LLM | Groq (llama-3.1-70b-versatile) | Natural language answer generation |
| Document Processing | n8n Extract From File node | PDF text extraction and chunking |
| Memory | Window Buffer Memory | Maintains conversation context |
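
For a quick sanity check of the embedding row above, the same model can be loaded locally with sentence-transformers (an illustration only; the workflow itself calls the model through n8n's HuggingFace integration):

```python
# Illustrative check (assumes the sentence-transformers package):
# confirm the model's output width matches the vector(384) column
# that the Supabase table must declare.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
print(model.get_sentence_embedding_dimension())         # 384
print(model.encode("Attention Is All You Need").shape)  # (384,)
```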

🏗️ Architecture

Two-Flow Design

```mermaid
graph TB
    subgraph "Flow 1: Document Ingestion (One-Time)"
        A[PDF File] --> B[Read Binary File]
        B --> C[Extract From File]
        C --> D[Chunk Text]
        D --> E[Generate Embeddings]
        E --> F[Supabase Vector Store]
    end

    subgraph "Flow 2: Conversational Retrieval (Per Query)"
        G[User Question] --> H[Chat Trigger]
        H --> I[AI Agent]
        I --> J[Generate Query Embedding]
        J --> K[Supabase Similarity Search]
        K --> L[Retrieve Top 4 Chunks]
        L --> M[Groq LLM]
        M --> N[Grounded Answer]
    end

    F -.->|Shared Database| K
```

Key Design Decisions

  1. Separate Ingestion & Retrieval Flows: One-time setup vs. per-query execution
  2. Consistent Embedding Model: Same model for ingestion and retrieval ensures vector compatibility
  3. Chunk Strategy: 1000 chars with 200-char overlap balances context preservation and token efficiency (sketched in code after this list)
  4. Top K = 4: Retrieves enough context without overwhelming the LLM’s context window
  5. Low Temperature (0.3): Prioritizes factual accuracy over creativity
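
Decision 3 is easy to picture as code. The following is a minimal sketch of a fixed-size character splitter with overlap (my own illustration, not the exact splitter the n8n node uses internally):

```python
# Sketch of the chunking strategy: 1000-char windows where each window
# repeats the last 200 chars of the previous one, so a sentence cut at
# a boundary still appears whole in at least one chunk.
def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

print(len(chunk_text("x" * 5000)))  # 6 chunks for a 5000-char input
```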

📊 Performance Metrics

  • Document Size: 71 text chunks from the Transformer paper
  • Average Response Time: 2-3 seconds per query
  • Retrieval Accuracy: High relevance for technical questions
  • Token Efficiency: 500-1000 tokens per query
  • Embedding Dimension: 384 (lightweight and fast)

🔧 Key Technical Challenges Solved

Critical Learnings

See Challenges and Solutions for detailed problem-solving narratives.

  1. Sub-node Architecture: Modern n8n requires explicit Document and Embedding sub-nodes
  2. Custom PostgreSQL Function: Created match_documents() for vector similarity search (a hedged reconstruction appears after this list)
  3. Model Selection: Switched to well-supported HuggingFace model for reliability
  4. Vector Dimension Alignment: Ensured database schema matches embedding model output
  5. Flow Independence: Understood database-mediated communication pattern
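
For reference, the match_documents() function from challenge 2 most likely follows the common pgvector pattern shown below. This is a reconstruction under stated assumptions (a documents table with content and embedding columns, cosine distance via the <=> operator, pgvector enabled); the project's actual function may differ in names and details.

```python
# Hypothetical re-creation of the similarity-search function, run once
# against the Supabase Postgres database (assumes psycopg2 and pgvector).
import psycopg2

DDL = """
create or replace function match_documents(
  query_embedding vector(384),   -- must match the embedding model's width
  match_count int default 4
) returns table (id bigint, content text, similarity float)
language sql stable as $$
  select d.id,
         d.content,
         1 - (d.embedding <=> query_embedding) as similarity  -- cosine similarity
  from documents d
  order by d.embedding <=> query_embedding                    -- nearest first
  limit match_count;
$$;
"""

with psycopg2.connect("postgresql://<user>:<password>@<host>:5432/postgres") as conn:
    with conn.cursor() as cur:
        cur.execute(DDL)
```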

📁 Project Files

  • Assignment_Report.md - Detailed technical report with node configurations
  • Assignment_Process_Log.md - Development process and iteration log
  • n8n_workflow.json - Exported workflow (importable into n8n)
  • demo_video.mp4 - One-minute demonstration

💡 Key Learnings

Technical Insights

  1. RAG Architecture: Understanding the separation of ingestion and retrieval pipelines
  2. Vector Databases: Practical experience with pgvector and similarity search
  3. Embedding Consistency: Critical importance of using identical models across flows
  4. n8n Sub-nodes: Modular architecture for vector store operations
  5. PostgreSQL Functions: Writing custom functions for advanced vector operations

Broader Concepts

  • Grounded Generation: How RAG prevents hallucinations by anchoring LLM responses
  • Semantic Search: Vector embeddings capture meaning beyond keyword matching (see the toy demo after this list)
  • Workflow Automation: Visual programming for AI/ML pipelines
  • LLM Integration: Practical API usage with Groq for cost-effective inference
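
The semantic-search point is easy to demonstrate with a toy comparison (my own example, not project data): a semantically related sentence should score higher than one that merely shares keywords.

```python
# Toy cosine-similarity demo (assumes numpy and sentence-transformers).
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

q = model.encode("How does the model weigh different input tokens?")
related = model.encode("Attention assigns weights to each position in the sequence.")
keyword = model.encode("The model train departs from the input platform.")

# Expected: the related sentence scores higher despite sharing fewer words.
print(cosine(q, related), cosine(q, keyword))
```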

🚀 Future Enhancements

Potential Improvements

  • Multi-document support (upload multiple papers)
  • Citation extraction (show which chunks were used)
  • Hybrid search (combine vector + keyword search)
  • Advanced chunking strategies (semantic splitting)
  • Query rewriting for better retrieval
  • Web UI deployment (public chat interface)
  • Metadata filtering (search by paper section)
  • Evaluation metrics (retrieval precision/recall)

Reflection

This project transformed a theoretical understanding of RAG into practical implementation. The visual workflow approach of n8n made complex AI pipelines accessible, while the challenges encountered provided deep insights into vector databases, embedding models, and LLM integration. A perfect blend of learning and building. 🌿→🌺