Gemini + NotebookLM — how the model stack works
Reading Guide
This page traces the model lineage from the Project Tailwind prototype through Gemini 2.5, explains the retrieval-augmented generation loop that grounds every answer, and shows how each new Gemini tier translated into visible product improvements for the AI research assistant.
From Project Tailwind to Gemini: the lineage
The history of this AI research tool is inseparable from the history of Google's large-language-model programme. The earliest public prototype, known internally as Project Tailwind, debuted at Google I/O 2023 running on a PaLM 2 foundation. PaLM 2 was already a capable model for text tasks, but its context window — a few tens of thousands of tokens — constrained how many sources a single notebook could hold. At that stage the tool could read a handful of PDFs and answer grounded questions; audio overviews did not yet exist.
The step-change came when Google introduced Gemini 1.5 Pro in February 2024 with a headline one-million-token context window. For the research notebook, that number translated directly into scale: instead of a handful of documents, a notebook could now hold the equivalent of a short book's worth of source material and still keep every word in the model's active attention. The product had already relaunched under the name NotebookLM shortly after I/O 2023; the Gemini 1.5 upgrade marked the moment it moved from interesting prototype to practical research tool.
The transition to Gemini 2.0 Flash and 2.0 Pro during late 2024 and early 2025 brought multimodal improvements — the model's ability to process audio and image data became usable in the notebook context for the first time, unlocking audio-file ingestion for Plus subscribers. Gemini 2.5 Pro, arriving in the first half of 2025, added stronger reasoning over mixed-language corpora and noticeably tighter citation attribution, reducing the rate at which answers would pull from a tangentially related passage rather than the most precisely matching one.
Long-context architecture and what it means in practice
A long-context model does not simply read more text — it reads it all at once without losing track of where it is in the material. Earlier transformer architectures degraded as documents grew longer because positional encodings became less reliable toward the end of a long sequence. Gemini 1.5 and later models use a modified attention mechanism that keeps early and late tokens nearly equally accessible throughout the generation pass.
For the research assistant this matters because a user's notebook is not one document — it may be sixty PDFs of varying length, a dozen YouTube transcripts, and a collection of pasted notes. A model with a short effective window would either truncate the least-recently-added sources or chunk them so aggressively that cross-document connections became invisible. With Gemini's one-million-token window, the indexing stage can embed every source in fine-grained chunks while still reserving enough context budget for the generation pass to see a large slice of the corpus simultaneously.
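As a back-of-envelope illustration of that context budget, the sketch below estimates how many retrieved chunks could sit in a single generation pass. Every figure here is an illustrative assumption, not a NotebookLM parameter.

```python
# Hypothetical budget check: how many retrieved chunks can the
# generation pass see at once? All figures are assumptions.
CONTEXT_WINDOW = 1_000_000   # headline Gemini 1.5 window, in tokens
CHUNK_TOKENS = 400           # assumed size of one indexed chunk
RESERVED_FOR_OUTPUT = 8_000  # assumed budget kept free for the answer
PROMPT_TOKENS = 200          # assumed size of the user's question

budget = CONTEXT_WINDOW - RESERVED_FOR_OUTPUT - PROMPT_TOKENS
print(budget // CHUNK_TOKENS)  # chunks visible simultaneously
```

Even with generous reservations, thousands of chunks fit at once, which is why cross-document connections survive where a short-window model would have to drop sources.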
The NIST AI programme at CSRC notes that long-context retrieval systems require careful evaluation of attribution accuracy, since a model that can see more text also has more opportunities to cite a plausible-but-wrong passage. The research notebook addresses this through explicit citation pinning: the interface highlights the exact sentence the model drew from, not merely the document title.
The RAG loop: index, retrieve, generate
Under the surface, the research assistant runs a retrieval-augmented generation (RAG) pipeline with three distinct passes.
The indexing pass runs once when you add or update a source. Each document is split into overlapping chunks, each chunk is converted to a dense vector embedding using a Gemini embedding model, and those vectors are stored in a per-notebook index inside Google's infrastructure. The index never leaves the account boundary associated with your Google login.
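A minimal sketch of the chunking step, with illustrative chunk and overlap sizes; the real pipeline's chunk boundaries and the Gemini embedding call are not public, so the embedding step is shown only as a hypothetical comment.

```python
def chunk_text(text, chunk_size=500, overlap=100):
    """Split text into overlapping character windows.

    Overlap keeps sentences that straddle a boundary
    retrievable from at least one chunk.
    """
    chunks = []
    start = 0
    step = chunk_size - overlap
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks

document = ("word " * 300).strip()  # stand-in for one uploaded source
chunks = chunk_text(document)
print(len(chunks))

# In the real pipeline each chunk would then be embedded, e.g.:
#   vectors = [gemini_embed(c) for c in chunks]   # hypothetical call
# and stored in the per-notebook index alongside its source id.
```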
The retrieval pass fires each time you submit a prompt. The prompt itself is converted to an embedding, the index is searched for the chunks whose vectors are closest to the query embedding, and the top-scoring passages are assembled into a ranked context window. The ranking takes into account semantic similarity, source recency, and — for multi-turn conversations — the thread of the current dialogue.
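The nearest-neighbour search at the heart of the retrieval pass can be sketched with plain cosine similarity over toy three-dimensional vectors; real embeddings are high-dimensional outputs of a Gemini embedding model, and the production ranker also weighs recency and dialogue context as described above.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy per-notebook index of (embedding, passage) pairs.
index = [
    ([0.9, 0.1, 0.0], "Passage about model context windows"),
    ([0.1, 0.9, 0.0], "Passage about audio overviews"),
    ([0.8, 0.2, 0.1], "Passage about long-context attention"),
]

query_embedding = [1.0, 0.0, 0.0]  # embedding of the user's prompt

# Rank every chunk by similarity to the query and keep the top two.
top_k = sorted(index, key=lambda e: cosine(query_embedding, e[0]), reverse=True)[:2]
for _, passage in top_k:
    print(passage)
```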
The generation pass takes the retrieved context plus your prompt and passes both to Gemini, which produces a grounded response. As the model generates each sentence it resolves the citation — identifying which retrieved chunk the sentence draws from and tagging it with a numbered footnote. Users can click any footnote to jump directly to the highlighted passage in the source pane.
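One common way to implement this style of grounded generation, sketched here with a hypothetical prompt template rather than NotebookLM's actual prompt, is to number the retrieved chunks so the model can emit footnote indices that map straight back to passages.

```python
def build_grounded_prompt(question, passages):
    """Number the retrieved chunks so the model can cite them by index."""
    numbered = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer only from the numbered sources below and tag each "
        "sentence with its source number.\n\n"
        f"{numbered}\n\nQuestion: {question}"
    )

prompt = build_grounded_prompt(
    "How large is the context window?",
    [
        "Gemini 1.5 Pro shipped with a one-million-token window.",
        "Gemini 2.5 Pro extended the window to two million tokens.",
    ],
)
print(prompt)
```

Because each footnote index corresponds to exactly one retrieved chunk, the interface can resolve a click on footnote [2] to the second passage without any further lookup.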
This three-stage architecture is why answers from the research notebook are qualitatively different from answers you would get from a general-purpose chatbot fed the same prompt. The chatbot may have memorised some of your topic during training; the notebook is specifically not allowed to use out-of-notebook knowledge to fill gaps. If the answer cannot be grounded in a retrieved passage, the assistant says so.
How citations resolve
Citation resolution is the step that separates a well-built research assistant from a tool that merely looks like one. When Gemini generates a sentence in the research notebook's generation pass, it simultaneously produces a pointer to the chunk that justified the sentence. The interface maps that pointer back to the original source document and highlights the span of text — often a single sentence or at most a paragraph — that was the evidentiary basis.
For sources with clear structure (PDFs with a table of contents, Google Docs with headings) the citation goes to a named section. For less structured sources such as pasted text or YouTube transcripts, the citation resolves to a character offset displayed as a highlighted region. This behaviour is what makes the tool usable in professional contexts where vague attribution is not acceptable.
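Resolving a citation to a character offset can be sketched as a simple substring search; the production system presumably also handles near-matches and text normalisation, which this toy version does not attempt.

```python
def resolve_citation(source_text, cited_chunk):
    """Map a cited chunk to a (start, end) character span for highlighting."""
    start = source_text.find(cited_chunk)
    if start == -1:
        return None  # no verbatim match; a real system would fall back to fuzzy matching
    return (start, start + len(cited_chunk))

transcript = "Intro remarks. The window grew to one million tokens. Closing."
span = resolve_citation(transcript, "The window grew to one million tokens.")
print(span)  # (15, 53)
```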
Model upgrade table
| Year | Model | Context window | NotebookLM feature unlocked |
|---|---|---|---|
| 2023 | PaLM 2 (Project Tailwind) | ~32 k tokens | Grounded Q&A, early source summaries |
| 2024 Q1 | Gemini 1.0 Pro | ~128 k tokens | Expanded source limits, improved multilingual chat |
| 2024 Q2 | Gemini 1.5 Pro | 1 M tokens | Audio overviews, 50-source free-tier notebooks |
| 2024 Q4 – 2025 Q1 | Gemini 2.0 Flash / Pro | 1 M+ tokens | Audio-file ingestion (Plus), image-in-source support |
| 2025 Q2 | Gemini 2.5 Pro | 2 M tokens | Tighter citation attribution, stronger cross-language reasoning |
Gemini and NotebookLM — questions people ask
Common questions about the model underpinning the research assistant, answered with as much technical detail as is publicly known.
Which Gemini model does the research assistant use today?
The tool runs on whichever Gemini long-context model Google designates as production-ready at a given time. As of mid-2025 that is the Gemini 2.5 Pro tier for the core generation pass. Google updates the underlying model without requiring users to reconfigure their notebooks, and the transition is generally invisible — you may notice improved citation precision or faster generation, but the interface does not announce the change.
What does the one-million-token context window actually mean for my notebook?
A token is roughly three-quarters of a word in English. One million tokens corresponds to somewhere around 750,000 words — the equivalent of six or seven average-length novels. For a research notebook, that budget covers several hundred typical academic papers. The free tier caps notebooks at around 500,000 words of source material; the paid tier raises the ceiling significantly. Within those limits, Gemini can attend to the entire corpus in a single pass rather than having to select which documents to load before answering.
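The arithmetic is easy to check; the figures below assume 0.75 words per token and a 100,000-word average novel, both rough conventions rather than exact values.

```python
TOKENS = 1_000_000
WORDS_PER_TOKEN = 0.75   # rough English average
NOVEL_WORDS = 100_000    # assumed average novel length

words = int(TOKENS * WORDS_PER_TOKEN)
print(words)                 # 750000
print(words // NOVEL_WORDS)  # roughly seven novels
```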
What is retrieval-augmented generation and why does it matter here?
Retrieval-augmented generation (RAG) is a technique where a language model is given a set of retrieved passages alongside the user's prompt, rather than being expected to generate from memorised training knowledge alone. In the research assistant, RAG means every answer is built from text the model retrieved from your uploaded documents in that specific query session. The model is constrained to that retrieved context, which is why answers carry citations and why the tool will acknowledge when a question cannot be answered from the available sources.
Does the Gemini upgrade affect audio overview quality?
Yes, substantially. Audio overviews are generated by Gemini as a structured dialogue script, which is then passed to a text-to-speech layer. Gemini 2.0 and 2.5 improved the narrative flow of these scripts — transitions between topics became smoother, the two-host dynamic felt more natural, and the model became better at identifying which passages to illustrate with a concrete example versus which to summarise at a higher level. The voice synthesis layer itself also improved across 2024 and 2025.
Can I use the research assistant with sources in languages other than English?
Yes. Gemini 1.5 and later are natively multilingual. You can upload French legal documents and ask questions in English; the retrieval pass will match semantically across languages and the generation pass will produce a correctly cited answer in whichever language you queried in. Audio overviews added non-English host voices — Japanese, Spanish, Portuguese, German, French, Italian, Korean, Hindi — during 2025, so the spoken output can also match the source language of your corpus.
See the model in action
The cleanest way to understand what Gemini's long-context window changes is to load a large corpus and watch citations resolve. Start with a notebook and a stack of sources.
Walk through the first-notebook flow.
Further reading on the AI research notebook
The NotebookLM features page catalogues every capability the tool currently offers. If you are new to the product, the how-to-use guide walks through setting up a notebook step by step, while the longer NotebookLM guide covers advanced workflows. The audio overviews deep-dive explains how Gemini's generation pass is adapted for spoken output. Anyone interested in the data handling side should visit the data and privacy page, which covers how source material is stored and whether it enters training pipelines. The pricing breakdown sets out the exact source limits for each tier.
Related context pages include the NotebookLM AI primer, the product history tracing the journey from Project Tailwind, the capabilities list, and the Google product context page. For external background on long-context model evaluation methodology, the NIST AI programme at CSRC publishes frameworks that enterprise teams use when assessing retrieval-augmented tools. The sources and uploads page explains the input formats the indexing pass can handle. Finally, the NotebookLM review benchmarks citation accuracy across several corpus types.