# n8n Chatmate - RAG Document QA System 🤖📚
## Project Status

- **Status**: ✅ Completed & Functional
- **Type**: Academic Assignment + Learning Project
- **Timeline**: November 2025
- **Course**: GenAI (WiSe25)
## 📋 Project Overview
Chatmate is an AI-powered document question-answering system that uses Retrieval-Augmented Generation (RAG) to answer questions about the research paper “Attention Is All You Need” (the original Transformer paper). Built entirely using n8n workflow automation, it demonstrates the practical application of vector embeddings, semantic search, and LLM-based answer generation.
### Core Concept
Instead of relying purely on an LLM’s pre-trained knowledge (which can hallucinate), this system:
- Ingests the PDF document into a vector database
- Retrieves semantically relevant chunks based on user questions (see the query sketch after this list)
- Generates accurate answers grounded in the actual document content
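At the database level, the "retrieve" step is a nearest-neighbour query over stored embeddings. A minimal pgvector sketch, assuming a `documents` table like the one sketched under Key Design Decisions below (table and column names are illustrative; `:query_embedding` is a placeholder for the embedded question):

```sql
-- Rank stored chunks by cosine distance to the question's embedding.
-- :query_embedding must come from the same all-MiniLM-L6-v2 model used
-- at ingestion time; '<=>' is pgvector's cosine-distance operator.
SELECT content,
       1 - (embedding <=> :query_embedding) AS similarity
FROM documents
ORDER BY embedding <=> :query_embedding
LIMIT 4;  -- Top K = 4, matching the retrieval flow
```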
## 🎯 Project Goals
- Build a working RAG pipeline using n8n’s visual workflow editor
- Implement vector-based semantic search with Supabase
- Integrate LLM (Groq) for natural language answer generation
- Maintain conversation context across multiple questions
- Ensure answers are grounded in source material (no hallucinations)
- Document the learning process and challenges faced
## 🛠️ Technology Stack
| Component | Technology | Purpose |
|---|---|---|
| Workflow Orchestration | n8n | Visual workflow automation platform |
| Vector Database | Supabase (PostgreSQL + pgvector) | Stores document embeddings for similarity search |
| Embedding Model | HuggingFace all-MiniLM-L6-v2 | Converts text to 384-dim vectors |
| LLM | Groq (llama-3.1-70b-versatile) | Natural language answer generation |
| Document Processing | n8n Extract From File node | PDF chunking and text extraction |
| Memory | Window Buffer Memory | Maintains conversation context |
## 🏗️ Architecture

### Two-Flow Design
```mermaid
graph TB
    subgraph "Flow 1: Document Ingestion (One-Time)"
        A[PDF File] --> B[Read Binary File]
        B --> C[Extract From File]
        C --> D[Chunk Text]
        D --> E[Generate Embeddings]
        E --> F[Supabase Vector Store]
    end
    subgraph "Flow 2: Conversational Retrieval (Per Query)"
        G[User Question] --> H[Chat Trigger]
        H --> I[AI Agent]
        I --> J[Generate Query Embedding]
        J --> K[Supabase Similarity Search]
        K --> L[Retrieve Top 4 Chunks]
        L --> M[Groq LLM]
        M --> N[Grounded Answer]
    end
    F -.->|Shared Database| K
```
### Key Design Decisions
- Separate Ingestion & Retrieval Flows: One-time setup vs. per-query execution
- Consistent Embedding Model: Same model for ingestion and retrieval ensures vector compatibility (see the schema sketch after this list)
- Chunk Strategy: 1000 chars with 200 char overlap balances context preservation and token efficiency
- Top K = 4: Retrieves enough context without overwhelming the LLM’s context window
- Low Temperature (0.3): Prioritizes factual accuracy over creativity
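The embedding-consistency and dimension decisions are enforced by the table itself: the vector column is fixed at the model's output size, so a mismatched model fails at insert time. A minimal schema sketch, assuming the layout commonly paired with pgvector on Supabase (table and column names are illustrative):

```sql
-- Enable pgvector once per database.
CREATE EXTENSION IF NOT EXISTS vector;

-- One row per text chunk produced by the ingestion flow.
CREATE TABLE documents (
  id        BIGSERIAL PRIMARY KEY,
  content   TEXT,          -- the ~1000-character chunk text
  metadata  JSONB,         -- source info (file name, chunk index, ...)
  embedding VECTOR(384)    -- must match all-MiniLM-L6-v2's output dimension
);
```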
## 📊 Performance Metrics
- Document Size: 71 text chunks from the Transformer paper
- Average Response Time: 2-3 seconds per query
- Retrieval Accuracy: High relevance for technical questions
- Token Efficiency: 500-1000 tokens per query
- Embedding Dimension: 384 (lightweight and fast)
## 🔧 Key Technical Challenges Solved

### Critical Learnings

See Challenges and Solutions for detailed problem-solving narratives.
- Sub-node Architecture: Modern n8n requires explicit Document and Embedding sub-nodes
- Custom PostgreSQL Function: Created `match_documents()` for vector similarity search (sketched after this list)
- Model Selection: Switched to a well-supported HuggingFace model for reliability
- Vector Dimension Alignment: Ensured database schema matches embedding model output
- Flow Independence: Understood database-mediated communication pattern
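The custom function follows the widely used Supabase template for LangChain-style vector stores, with the dimension set to 384 for all-MiniLM-L6-v2; the exact signature in the exported workflow may differ. A sketch:

```sql
-- Return the match_count chunks most similar to query_embedding.
-- pgvector's '<=>' gives cosine distance, so similarity = 1 - distance.
CREATE OR REPLACE FUNCTION match_documents (
  query_embedding VECTOR(384),
  match_count INT DEFAULT 4,
  filter JSONB DEFAULT '{}'
) RETURNS TABLE (
  id BIGINT,
  content TEXT,
  metadata JSONB,
  similarity FLOAT
)
LANGUAGE plpgsql
AS $$
BEGIN
  RETURN QUERY
  SELECT
    documents.id,        -- columns qualified to avoid clashing with the
    documents.content,   -- identically named output columns above
    documents.metadata,
    1 - (documents.embedding <=> query_embedding) AS similarity
  FROM documents
  WHERE documents.metadata @> filter          -- optional metadata pre-filter
  ORDER BY documents.embedding <=> query_embedding
  LIMIT match_count;
END;
$$;
```

Retrieval then invokes this function through Supabase's RPC interface, which is how LangChain-style vector stores (including the one n8n's Supabase node builds on) query pgvector.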
## 📁 Project Files
- `Assignment_Report.md` - Detailed technical report with node configurations
- `Assignment_Process_Log.md` - Development process and iteration log
- `n8n_workflow.json` - Exported workflow (importable into n8n)
- `demo_video.mp4` - One-minute demonstration
## 💡 Key Learnings

### Technical Insights
- RAG Architecture: Understanding the separation of ingestion and retrieval pipelines
- Vector Databases: Practical experience with pgvector and similarity search
- Embedding Consistency: Critical importance of using identical models across flows
- n8n Sub-nodes: Modular architecture for vector store operations
- PostgreSQL Functions: Writing custom functions for advanced vector operations
### Broader Concepts
- Grounded Generation: How RAG prevents hallucinations by anchoring LLM responses
- Semantic Search: Vector embeddings capture meaning beyond keyword matching
- Workflow Automation: Visual programming for AI/ML pipelines
- LLM Integration: Practical API usage with Groq for cost-effective inference
## 🚀 Future Enhancements

### Potential Improvements
- Multi-document support (upload multiple papers)
- Citation extraction (show which chunks were used)
- Hybrid search (combine vector + keyword search)
- Advanced chunking strategies (semantic splitting)
- Query rewriting for better retrieval
- Web UI deployment (public chat interface)
- Metadata filtering (search by paper section)
- Evaluation metrics (retrieval precision/recall)
## 🔗 Related Notes
- Map of Content - Main navigation hub
- See also: n8n automation workflows in n8n
- Related concepts: ai, automation
## 📚 References
- n8n Documentation
- Supabase Vector Documentation
- HuggingFace Inference API
- Groq API
- Original Paper: “Attention Is All You Need” (Vaswani et al., 2017)
## Reflection
This project transformed a theoretical understanding of RAG into practical implementation. The visual workflow approach of n8n made complex AI pipelines accessible, while the challenges encountered provided deep insights into vector databases, embedding models, and LLM integration. A perfect blend of learning and building. 🌿→🌺