technical-blog · retrieval-augmented generation

large language models are great at pattern matching, but they hallucinate when a question calls for facts they never learned. retrieval-augmented generation (rag) fixes this by letting the model look things up in an external knowledge base before answering.

in this post, we'll:

  • build a tiny rag demo that runs in your browser,
  • walk through the core architecture (indexing → retrieval → generation), and
  • sketch a python version you can scale up in a colab notebook.

rag in one picture

conceptually, rag is just:

user question
     │
     ▼
 ┌───────────┐       ┌────────────────────┐
 │ embedder  │──────▶│  vector database   │
 └───────────┘       └────────────────────┘
                                │
                      retrieve top-k chunks
                                │
                                ▼
                          ┌────────────┐
                          │    llm     │
                          └────────────┘
                                │
                                ▼
                        grounded answer ✔
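
in code, the same picture is just a small function. everything in this snippet is a placeholder you can swap out; the python sketch further down fills in real pieces:

    # conceptual shape only: embed, search, and generate are whatever you plug in
    def rag_pipeline(question, embed, search, generate, k=4):
        query_vector = embed(question)          # embedder
        chunks = search(query_vector, k=k)      # vector database → top-k chunks
        context = "\n\n".join(chunks)
        prompt = f"use only this context to answer.\n\n{context}\n\nquestion: {question}"
        return generate(prompt)                 # llm → grounded answer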

browser demo: keyword-based "rag"

a full rag system needs embeddings + a vector database. for a quick interactive demo, we can approximate retrieval with a simple keyword overlap score over a tiny "corpus" of notes.
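
here's a minimal sketch of that scoring idea, in python for readability (the browser widget runs the same logic in javascript; the notes below are just illustrative):

    # toy corpus; a note's score is the number of words it shares with the question
    notes = [
        "rag separates retrieval from generation",
        "a vector database stores document embeddings for similarity search",
        "grounding the llm in retrieved context reduces hallucination",
    ]

    def keyword_score(question: str, note: str) -> int:
        return len(set(question.lower().split()) & set(note.lower().split()))

    def retrieve(question: str, k: int = 2) -> list[str]:
        # rank notes by overlap with the question and keep the top k
        ranked = sorted(notes, key=lambda n: keyword_score(question, n), reverse=True)
        return ranked[:k]

    print(retrieve("how does a vector database help rag?"))

real rag replaces keyword overlap with similarity search over embedding vectors, which is what the python sketch further down does with FAISS.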

try it

ask a question about rag, and i'll show you which notes get "retrieved" and a toy answer.

python sketch: real rag with embeddings

below is a minimal python sketch you can adapt in a colab notebook.

    from langchain_community.vectorstores import FAISS
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain_openai import OpenAIEmbeddings, ChatOpenAI
    
    # 1. load & split documents
    raw_docs = [
        "rag separates retrieval from generation...",
        "use a vector database to store document embeddings...",
        # add your own notes or PDFs here
    ]
    
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=512,
        chunk_overlap=64,
    )
    docs = splitter.create_documents(raw_docs)
    
    # 2. build the vector index
    embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
    db = FAISS.from_documents(docs, embeddings)
    
    # 3. rag chain (retrieve + generate)
    llm = ChatOpenAI(model="gpt-4.1-mini")
    
    def rag_answer(question: str):
        retrieved_docs = db.similarity_search(question, k=4)
        context = "\n\n".join([d.page_content for d in retrieved_docs])
    
        prompt = f"""
        you are a helpful tutor. use ONLY the context below to answer.
        if something isn't in the context, say you don't know.
    
        context:
        {context}
    
        question: {question}
        """
    
        response = llm.invoke(prompt)
        return response.content, retrieved_docs
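
to try it out (this assumes OPENAI_API_KEY is set in your environment; the question is just an example):

    answer, sources = rag_answer("why does rag reduce hallucinations?")
    print(answer)
    for doc in sources:
        print("-", doc.page_content[:80])   # peek at what was retrieved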

where to go next

  • swap the toy browser demo with real embeddings via an api.
  • index your own notes, pdfs, or blog posts.
  • add evaluation: log retrieved chunks + answers and inspect failure modes (see the sketch below).
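
a minimal version of that logging step, assuming the rag_answer function from the sketch above (the jsonl filename is arbitrary):

    import json

    def log_run(question: str, path: str = "rag_runs.jsonl") -> str:
        # run the chain, then append question, retrieved chunks, and answer as one jsonl record
        answer, sources = rag_answer(question)
        record = {
            "question": question,
            "retrieved": [d.page_content for d in sources],
            "answer": answer,
        }
        with open(path, "a") as f:
            f.write(json.dumps(record) + "\n")
        return answer

reading a handful of these records usually makes it clear whether failures come from retrieval (the wrong chunks) or from generation (the model ignoring the context).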