Gemini + NotebookLM — how the model stack works
Reading Guide
This page traces the model lineage from the Project Tailwind prototype through Gemini 2.5, explains the retrieval-augmented generation loop that grounds every answer, and shows how each new Gemini tier translated into visible product improvements for the AI research assistant.
From Project Tailwind to Gemini: the lineage
The history of this AI research tool is inseparable from the history of Google's large-language-model programme. The earliest public prototype, known internally as Project Tailwind, debuted at Google I/O 2023 running on a PaLM 2 foundation. PaLM 2 was already a capable model for text tasks, but its context window — a few tens of thousands of tokens — constrained how many sources a single notebook could hold. At that stage the tool could read a handful of PDFs and answer grounded questions; audio overviews did not yet exist.
The step-change came when Google introduced Gemini 1.5 Pro in February 2024 with a headline one-million-token context window. For the research notebook, that number translated directly into scale: instead of a handful of documents, a notebook could now hold the equivalent of a short book's worth of source material and still keep every word in the model's active attention. The product had already relaunched under the name NotebookLM shortly after I/O 2023; the Gemini 1.5 upgrade marked the moment it moved from interesting prototype to practical research tool.
The transition to Gemini 2.0 Flash and 2.0 Pro during late 2024 and early 2025 brought multimodal improvements — the model's ability to process audio and image data became usable in the notebook context for the first time, unlocking audio-file ingestion for Plus subscribers. Gemini 2.5 Pro, arriving in the first half of 2025, added stronger reasoning over mixed-language corpora and noticeably tighter citation attribution, reducing the rate at which answers would pull from a tangentially related passage rather than the most precisely matching one.
Long-context architecture and what it means in practice
A long-context model does not simply read more text — it reads it all at once without losing track of where it is in the material. Earlier transformer architectures degraded as documents grew longer because positional encodings became less reliable toward the end of a long sequence. Gemini 1.5 and later models use a modified attention mechanism that keeps early and late tokens nearly equally accessible throughout the generation pass.
For the research assistant this matters because a user's notebook is not one document — it may be sixty PDFs of varying length, a dozen YouTube transcripts, and a collection of pasted notes. A model with a short effective window would either truncate the least-recently-added sources or chunk them so aggressively that cross-document connections became invisible. With Gemini's one-million-token window, the indexing stage can embed every source in fine-grained chunks while still reserving enough context budget for the generation pass to see a large slice of the corpus simultaneously.
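As a back-of-envelope illustration of that context budget, the sketch below estimates how many retrieved chunks could sit in a single generation pass. Every figure here is an illustrative assumption, not a NotebookLM parameter.

```python
# Hypothetical budget check: how many retrieved chunks can the
# generation pass see at once? All figures are assumptions.
CONTEXT_WINDOW = 1_000_000   # headline Gemini 1.5 window, in tokens
CHUNK_TOKENS = 400           # assumed size of one indexed chunk
RESERVED_FOR_OUTPUT = 8_000  # assumed budget kept free for the answer
PROMPT_TOKENS = 200          # assumed size of the user's question

budget = CONTEXT_WINDOW - RESERVED_FOR_OUTPUT - PROMPT_TOKENS
print(budget // CHUNK_TOKENS)  # chunks visible simultaneously
```

Even with generous reservations, thousands of chunks fit at once, which is why cross-document connections survive where a short-window model would have to drop sources.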
The NIST AI programme at CSRC notes that long-context retrieval systems require careful evaluation of attribution accuracy, since a model that can see more text also has more opportunities to cite a plausible-but-wrong passage. The research notebook addresses this through explicit citation pinning: the interface highlights the exact sentence the model drew from, not merely the document title.
The RAG loop: index, retrieve, generate
Under the surface, the research assistant runs a retrieval-augmented generation (RAG) pipeline with three distinct passes.
The indexing pass runs once when you add or update a source. Each document is split into overlapping chunks, each chunk is converted to a dense vector embedding using a Gemini embedding model, and those vectors are stored in a per-notebook index inside Google's infrastructure. The index never leaves the account boundary associated with your Google login.
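A minimal sketch of the chunking step, with illustrative chunk and overlap sizes; the real pipeline's chunk boundaries and the Gemini embedding call are not public, so the embedding step is shown only as a hypothetical comment.

```python
def chunk_text(text, chunk_size=500, overlap=100):
    """Split text into overlapping character windows.

    Overlap keeps sentences that straddle a boundary
    retrievable from at least one chunk.
    """
    chunks = []
    start = 0
    step = chunk_size - overlap
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks

document = ("word " * 300).strip()  # stand-in for one uploaded source
chunks = chunk_text(document)
print(len(chunks))

# In the real pipeline each chunk would then be embedded, e.g.:
#   vectors = [gemini_embed(c) for c in chunks]   # hypothetical call
# and stored in the per-notebook index alongside its source id.
```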
The retrieval pass fires each time you submit a prompt. The prompt itself is converted to an embedding, the index is searched for the chunks whose vectors are closest to the query embedding, and the top-scoring passages are assembled into a ranked context window. The ranking takes into account semantic similarity, source recency, and — for multi-turn conversations — the thread of the current dialogue.
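The nearest-neighbour search at the heart of the retrieval pass can be sketched with plain cosine similarity over toy three-dimensional vectors; real embeddings are high-dimensional outputs of a Gemini embedding model, and the production ranker also weighs recency and dialogue context as described above.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy per-notebook index of (embedding, passage) pairs.
index = [
    ([0.9, 0.1, 0.0], "Passage about model context windows"),
    ([0.1, 0.9, 0.0], "Passage about audio overviews"),
    ([0.8, 0.2, 0.1], "Passage about long-context attention"),
]

query_embedding = [1.0, 0.0, 0.0]  # embedding of the user's prompt

# Rank every chunk by similarity to the query and keep the top two.
top_k = sorted(index, key=lambda e: cosine(query_embedding, e[0]), reverse=True)[:2]
for _, passage in top_k:
    print(passage)
```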
The generation pass takes the retrieved context plus your prompt and passes both to Gemini, which produces a grounded response. As the model generates each sentence it resolves the citation — identifying which retrieved chunk the sentence draws from and tagging it with a numbered footnote. Users can click any footnote to jump directly to the highlighted passage in the source pane.
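One common way to implement this style of grounded generation, sketched here with a hypothetical prompt template rather than NotebookLM's actual prompt, is to number the retrieved chunks so the model can emit footnote indices that map straight back to passages.

```python
def build_grounded_prompt(question, passages):
    """Number the retrieved chunks so the model can cite them by index."""
    numbered = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer only from the numbered sources below and tag each "
        "sentence with its source number.\n\n"
        f"{numbered}\n\nQuestion: {question}"
    )

prompt = build_grounded_prompt(
    "How large is the context window?",
    [
        "Gemini 1.5 Pro shipped with a one-million-token window.",
        "Gemini 2.5 Pro extended the window to two million tokens.",
    ],
)
print(prompt)
```

Because each footnote index corresponds to exactly one retrieved chunk, the interface can resolve a click on footnote [2] to the second passage without any further lookup.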
This three-stage architecture is why answers from the research notebook are qualitatively different from answers you would get from a general-purpose chatbot fed the same prompt. The chatbot may have memorised some of your topic during training; the notebook is specifically not allowed to use out-of-notebook knowledge to fill gaps. If the answer cannot be grounded in a retrieved passage, the assistant says so.
How citations resolve
Citation resolution is the step that separates a well-built research assistant from a tool that merely looks like one. When Gemini generates a sentence in the research notebook's generation pass, it simultaneously produces a pointer to the chunk that justified the sentence. The interface maps that pointer back to the original source document and highlights the span of text — often a single sentence or at most a paragraph — that was the evidentiary basis.
For sources with clear structure (PDFs with a table of contents, Google Docs with headings) the citation goes to a named section. For less structured sources such as pasted text or YouTube transcripts, the citation resolves to a character offset displayed as a highlighted region. This behaviour is what makes the tool usable in professional contexts where vague attribution is not acceptable.
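Resolving a citation to a character offset can be sketched as a simple substring search; the production system presumably also handles near-matches and text normalisation, which this toy version does not attempt.

```python
def resolve_citation(source_text, cited_chunk):
    """Map a cited chunk to a (start, end) character span for highlighting."""
    start = source_text.find(cited_chunk)
    if start == -1:
        return None  # no verbatim match; a real system would fall back to fuzzy matching
    return (start, start + len(cited_chunk))

transcript = "Intro remarks. The window grew to one million tokens. Closing."
span = resolve_citation(transcript, "The window grew to one million tokens.")
print(span)  # (15, 53)
```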
Model upgrade table
| Year | Model | Context window | NotebookLM feature unlocked |
|---|---|---|---|
| 2023 | PaLM 2 (Project Tailwind) | ~32 k tokens | Grounded Q&A, early source summaries |
| 2024 Q1 | Gemini 1.0 Pro | ~128 k tokens | Expanded source limits, improved multilingual chat |
| 2024 Q2 | Gemini 1.5 Pro | 1 M tokens | Audio overviews, 50-source free-tier notebooks |
| 2024 Q4 – 2025 Q1 | Gemini 2.0 Flash / Pro | 1 M+ tokens | Audio-file ingestion (Plus), image-in-source support |
| 2025 Q2 | Gemini 2.5 Pro | 2 M tokens | Tighter citation attribution, stronger cross-language reasoning |
Gemini and NotebookLM — questions people ask
Common questions about the model underpinning the research assistant, answered with as much technical detail as is publicly known.
Which Gemini model does the research assistant use today?
The tool runs on whichever Gemini long-context model Google designates as production-ready at a given time. As of mid-2025 that is the Gemini 2.5 Pro tier for the core generation pass. Google updates the underlying model without requiring users to reconfigure their notebooks, and the transition is generally invisible — you may notice improved citation precision or faster generation, but the interface does not announce the change.
What does the one-million-token context window actually mean for my notebook?
A token is roughly three-quarters of a word in English. One million tokens corresponds to somewhere around 750,000 words — the equivalent of six or seven average-length novels. For a research notebook, that budget covers several hundred typical academic papers. The free tier caps notebooks at around 500,000 words of source material; the paid tier raises the ceiling significantly. Within those limits, Gemini can attend to the entire corpus in a single pass rather than having to select which documents to load before answering.
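The arithmetic is easy to check; the figures below assume 0.75 words per token and a 100,000-word average novel, both rough conventions rather than exact values.

```python
TOKENS = 1_000_000
WORDS_PER_TOKEN = 0.75   # rough English average
NOVEL_WORDS = 100_000    # assumed average novel length

words = int(TOKENS * WORDS_PER_TOKEN)
print(words)                 # 750000
print(words // NOVEL_WORDS)  # roughly seven novels
```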
What is retrieval-augmented generation and why does it matter here?
Retrieval-augmented generation (RAG) is a technique where a language model is given a set of retrieved passages alongside the user's prompt, rather than being expected to generate from memorised training knowledge alone. In the research assistant, RAG means every answer is built from text the model retrieved from your uploaded documents in that specific query session. The model is constrained to that retrieved context, which is why answers carry citations and why the tool will acknowledge when a question cannot be answered from the available sources.
Does the Gemini upgrade affect audio overview quality?
Yes, substantially. Audio overviews are generated by Gemini as a structured dialogue script, which is then passed to a text-to-speech layer. Gemini 2.0 and 2.5 improved the narrative flow of these scripts — transitions between topics became smoother, the two-host dynamic felt more natural, and the model became better at identifying which passages to illustrate with a concrete example versus which to summarise at a higher level. The voice synthesis layer itself also improved across 2024 and 2025.
Can I use the research assistant with sources in languages other than English?
Yes. Gemini 1.5 and later are natively multilingual. You can upload French legal documents and ask questions in English; the retrieval pass will match semantically across languages and the generation pass will produce a correctly cited answer in whichever language you queried in. Audio overviews added non-English host voices — Japanese, Spanish, Portuguese, German, French, Italian, Korean, Hindi — during 2025, so the spoken output can also match the source language of your corpus.
See the model in action
The cleanest way to understand what Gemini's long-context window changes is to load a large corpus and watch citations resolve. Start with a notebook and a stack of sources.
Walk through the first-notebook flow.
Further reading on the AI research notebook
The NotebookLM features page catalogues every capability the tool currently offers. If you are new to the product, the how-to-use guide walks through setting up a notebook step by step, while the longer NotebookLM guide covers advanced workflows. The audio overviews deep-dive explains how Gemini's generation pass is adapted for spoken output. Anyone interested in the data handling side should visit the data and privacy page, which covers how source material is stored and whether it enters training pipelines. The pricing breakdown sets out the exact source limits for each tier.
Related context pages include the NotebookLM AI primer, the product history tracing the journey from Project Tailwind, the capabilities list, and the Google product context page. For external background on long-context model evaluation methodology, the NIST AI programme at CSRC publishes frameworks that enterprise teams use when assessing retrieval-augmented tools. The sources and uploads page explains the input formats the indexing pass can handle. Finally, the NotebookLM review benchmarks citation accuracy across several corpus types.