references:
  https://huggingface.co/learn/cookbook/en/rag_evaluation
  https://python.langchain.com/v0.1/docs/integrations/chat/google_vertex_ai_palm/
  https://cloud.google.com/vertex-ai/docs/tutorials/jupyter-notebooks#vertex-ai-workbench

start with:

  1. what is rag?
    • how is it different from fine-tuning?
    • ways of evaluating rag
      • humans
      • static scores
        • precision
        • recall
        • bleu, rouge: n-gram overlap against a reference answer (see sketch after this list)
      • use another llm as a judge (see sketch after this list)
        • works since llms have broad capabilities
        • and are trained on varied human preferences, so they can approximate human grading
  2. lots of knobs to tweak in the RAG pipeline, so we need a principled way to evaluate the impact of each change
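
A quick taste of the static scores above: a minimal ROUGE sketch using the `evaluate` library (the toy prediction/reference strings are made up):

```python
# pip install evaluate rouge_score
import evaluate

rouge = evaluate.load("rouge")

# Toy example: n-gram overlap between a generated answer and a reference.
scores = rouge.compute(
    predictions=["RAG retrieves documents and feeds them to an LLM."],
    references=["RAG retrieves relevant documents and passes them to an LLM as context."],
)
print(scores)  # rouge1 / rouge2 / rougeL F-measures in [0, 1]
```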
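
And a minimal sketch of the llm-as-judge idea; the `call_llm` stub, the prompt wording, and the 1-5 scale are all assumptions to be swapped for your own client and rubric:

```python
import re

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for any chat-capable LLM call; plug in your client here."""
    raise NotImplementedError

JUDGE_PROMPT = """You are grading the answer of a RAG system.
Question: {question}
Reference answer: {reference}
Candidate answer: {candidate}
Rate the candidate from 1 (wrong) to 5 (equivalent to the reference).
Reply in the form "Rating: <1-5>" followed by a short justification."""

def judge(question: str, reference: str, candidate: str) -> int | None:
    """Return the judge's 1-5 rating, or None if it can't be parsed."""
    reply = call_llm(JUDGE_PROMPT.format(
        question=question, reference=reference, candidate=candidate))
    match = re.search(r"Rating:\s*([1-5])", reply)
    return int(match.group(1)) if match else None
```

Averaging `judge(...)` over a fixed test set gives a single number to track while tweaking the pipeline, which is exactly what point 2 calls for.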

  3. dataset used is huggingface documentation
    • text + source
  4. Build RAG
    • split documents using a recursive character splitter (split / embed / retrieve are sketched after this list)
    • embed documents
      • model used: thenlper/gte-small
    • retrieve relevant chunks - like a search engine
      • a FAISS index stores the embeddings for fast nearest-neighbor lookup of similar chunks
    • use retrieved contexts + query to formulate answer
      • rerank retrieved context (see sketch after this list)
        • retrieve 30 docs
        • rerank and output best 7
        • model used: colbert-ir/colbertv2.0
        • the reranker rescores and reorders the initially retrieved documents by their relevance to the query
      • answer using an llm (see sketch after this list)
        • reader llm: zephyr-7b-beta
        • a fine-tuned version of mistralai/Mistral-7B-v0.1 trained to act as a helpful assistant
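
A minimal end-to-end sketch of the split / embed / retrieve steps, assuming LangChain's FAISS and HuggingFace-embeddings integrations (the toy corpus, chunk sizes, and query are made up):

```python
# pip install langchain langchain-community faiss-cpu sentence-transformers
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document

# Toy stand-in for the documentation dump (text + source pairs).
corpus = [
    ("Use load_dataset() to load a dataset from the Hub.",
     "https://huggingface.co/docs/datasets"),
    ("Trainer handles the training loop for you.",
     "https://huggingface.co/docs/transformers"),
]
docs = [Document(page_content=text, metadata={"source": src}) for text, src in corpus]

# Recursive split: tries separators in order ("\n\n", "\n", " ", "")
# so chunks break at natural boundaries; the sizes here are assumptions.
splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=50)
chunks = splitter.split_documents(docs)

# Embed every chunk with gte-small and index the vectors in FAISS.
index = FAISS.from_documents(
    chunks, HuggingFaceEmbeddings(model_name="thenlper/gte-small")
)

# Retrieval is nearest-neighbor search over the embedded chunks;
# the notes fetch 30 candidates, to be reranked down to 7 next.
query = "How do I load a dataset?"
retrieved = index.similarity_search(query, k=30)
```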
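
The reranking step, continuing from the sketch above and using the RAGatouille wrapper around ColBERTv2; treat the exact call signature and result-dict keys as assumptions to verify against RAGatouille's docs:

```python
# pip install ragatouille
from ragatouille import RAGPretrainedModel

# Load ColBERTv2 as a reranker via the RAGatouille wrapper.
reranker = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")

# Rescore the 30 FAISS candidates against the query and keep the best 7.
candidates = [doc.page_content for doc in retrieved]
reranked = reranker.rerank(query, candidates, k=7)
top_contexts = [hit["content"] for hit in reranked]
```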
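
Finally, a sketch of the answer-generation step with zephyr-7b-beta, assuming a recent transformers version whose text-generation pipeline accepts chat-format messages (the prompt wording and generation settings are assumptions):

```python
# pip install transformers accelerate
from transformers import pipeline

reader = pipeline(
    "text-generation",
    model="HuggingFaceH4/zephyr-7b-beta",
    torch_dtype="auto",
    device_map="auto",
)

# Stuff the 7 reranked chunks into the prompt as context.
context = "\n\n".join(top_contexts)
messages = [
    {"role": "system",
     "content": "Answer the question using only the provided context."},
    {"role": "user",
     "content": f"Context:\n{context}\n\nQuestion: How do I load a dataset?"},
]

out = reader(messages, max_new_tokens=256, do_sample=False)
# The pipeline returns the whole conversation; the last message is the answer.
print(out[0]["generated_text"][-1]["content"])
```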

rag variants to explore:
  • basic rag
  • sentence-window rag
  • auto-retrieval rag