technical-blog · retrieval-augmented generation
large language models are great at pattern matching, but they tend to hallucinate when asked about facts they don't actually know. retrieval-augmented generation (rag) mitigates this by letting the model look things up in an external knowledge base before answering.
in this post, we'll:
- build a tiny rag demo that runs in your browser,
- walk through the core architecture (indexing → retrieval → generation), and
- sketch a python version you can scale up in a colab notebook.
rag in one picture
conceptually, rag is just:
user question
      │
      ▼
┌────────────┐    query embedding    ┌───────────────────┐
│  embedder  │ ─────────────────────▶│  vector database  │
└────────────┘                       └───────────────────┘
      │                                        │
      │          retrieve top-k chunks         │
      │◀───────────────────────────────────────┘
      ▼
┌────────────┐
│    llm     │  (question + retrieved chunks)
└────────────┘
      │
      ▼
grounded answer ✔
browser demo: keyword-based "rag"
a full rag system needs an embedding model, a vector database, and an llm. for a quick interactive demo, we can approximate retrieval with a simple keyword-overlap score over a tiny "corpus" of notes, roughly as sketched below.
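the real demo runs in the browser, but the scoring idea is simple enough to sketch in python. this is a rough approximation of what the demo does, with a toy corpus and whitespace tokenization as stand-ins (not the demo's actual code):

# toy "retrieval": rank notes by keyword overlap (no embeddings, no vector db)
notes = [
    "rag separates retrieval from generation",
    "a vector database stores document embeddings for similarity search",
    "grounding answers in retrieved context reduces hallucinations",
]

def keyword_score(question: str, note: str) -> int:
    # count how many question words also appear in the note
    q_words = set(question.lower().split())
    n_words = set(note.lower().split())
    return len(q_words & n_words)

def retrieve(question: str, k: int = 2) -> list[str]:
    # sort notes by overlap score, highest first, and keep the top-k
    ranked = sorted(notes, key=lambda n: keyword_score(question, n), reverse=True)
    return ranked[:k]

print(retrieve("how does rag reduce hallucinations?"))

this breaks down quickly (synonyms, stemming, phrasing), which is exactly why the python version below switches to embeddings.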
try it
ask a question about rag, and i'll show you which notes get "retrieved" and a toy answer.
python sketch: real rag with embeddings
below is a minimal python sketch you can adapt and scale up in a colab notebook.
from langchain_community.vectorstores import FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
# 1. load & split documents
raw_docs = [
    "rag separates retrieval from generation...",
    "use a vector database to store document embeddings...",
    # add your own notes or PDFs here
]

splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,
    chunk_overlap=64,
)
docs = splitter.create_documents(raw_docs)
# 2. build the vector index
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
db = FAISS.from_documents(docs, embeddings)
# 3. rag chain (retrieve + generate)
llm = ChatOpenAI(model="gpt-4.1-mini")
def rag_answer(question: str):
    # retrieve the k chunks most similar to the question and stuff them into the prompt
    retrieved_docs = db.similarity_search(question, k=4)
    context = "\n\n".join([d.page_content for d in retrieved_docs])
    prompt = f"""
you are a helpful tutor. use ONLY the context below to answer.
if something isn't in the context, say you don't know.

context:
{context}

question: {question}
"""
    response = llm.invoke(prompt)
    return response.content, retrieved_docs
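as a quick sanity check, you can call the function directly (assuming OPENAI_API_KEY is set in your environment; the question here is just an example):

answer, sources = rag_answer("how does rag reduce hallucinations?")
print(answer)
for doc in sources:
    print("-", doc.page_content[:80])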
where to go next
- swap the toy browser demo with real embeddings via an api.
- index your own notes, pdfs, or blog posts.
- add evaluation: log retrieved chunks + answers and inspect failure modes (a minimal sketch follows this list).
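for that last point, here's one way the logging could look as a starting point. it just wraps rag_answer from above and appends one json record per query; the file name and fields are placeholders, not a fixed format:

import json
import time

def rag_answer_logged(question: str, log_path: str = "rag_log.jsonl"):
    # run the rag chain, then append question, retrieved chunks, and answer to a jsonl log
    answer, retrieved_docs = rag_answer(question)
    record = {
        "ts": time.time(),
        "question": question,
        "retrieved": [d.page_content for d in retrieved_docs],
        "answer": answer,
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return answer

skimming the log after a handful of queries usually surfaces the two classic failure modes: the right chunk was never retrieved, or it was retrieved and the model ignored it.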