NotebookLM sources & uploads

Opening Notes

The tool accepts PDFs, Google Docs, Slides, web URLs, YouTube links, plain text, and audio files. Free accounts hold up to fifty sources per notebook; Plus accounts hold up to three hundred. Sources are indexed in the order they are added, but reordering and hiding are both available.

A notebook is only as useful as the sources inside it. The tool's underlying design principle is that every answer, note, and audio overview must be traceable to a specific passage in the material you uploaded, which means the quality and range of your sources determine the quality and range of everything the tool can produce. This page covers every supported format, the limits that apply to each tier, and the practical details of how indexing, reordering, and deduplication behave.

One distinction worth making at the outset: the tool does not download or permanently store your sources in the way a file-hosting service would. It indexes them — chunking the text, computing embeddings, and storing those representations so the retrieval system can find the relevant passages quickly. The original files remain where they were; the index is what lives inside the notebook. That is why a YouTube video can be a source: the tool indexes the transcript, not a video file.
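The chunk, embed, and retrieve sequence described above can be illustrated with a toy model. This is a minimal sketch and not the tool's actual pipeline: the function names are hypothetical, the fixed word-count chunking is an assumption, and a bag-of-words count vector stands in for a real learned embedding.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 50) -> list[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(sources: dict[str, str], max_words: int = 50) -> list[tuple[str, int, str, Counter]]:
    """Return (source_name, chunk_number, chunk_text, vector) for every chunk."""
    index = []
    for name, text in sources.items():
        for n, chunk in enumerate(chunk_text(text, max_words)):
            index.append((name, n, chunk, embed(chunk)))
    return index


def retrieve(index, query: str, top_k: int = 1):
    """Return the top_k chunks most similar to the query, with their source."""
    q = embed(query)
    ranked = sorted(index, key=lambda row: cosine(q, row[3]), reverse=True)
    return [(name, n, chunk) for name, n, chunk, _ in ranked[:top_k]]
```

Each retrieved chunk carries its source name and chunk number, which is the property that makes an answer traceable to a passage. Only the derived representation is kept in the index; the original file is not needed at query time.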

Supported source formats

PDF

PDF is the most commonly used source format. The tool ingests both text-layer PDFs and, with OCR processing, scanned documents where the text is embedded in images. Text-layer PDFs index faster and produce more reliable citation links. Scanned PDFs may have slightly lower citation accuracy depending on scan quality. Individual PDFs are capped at 200 MB and 500,000 words. If a single PDF exceeds either limit, splitting the file at chapter or section boundaries before upload usually works well.
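When planning such a split, it can help to work out which chapters fit together under the word cap before touching the file. The sketch below assumes you already have a word count per chapter or section; the function name is hypothetical, and the actual splitting of the PDF is left to whatever PDF tool you use.

```python
PDF_MAX_WORDS = 500_000  # per-file word cap quoted above


def group_sections(section_word_counts: list[int], max_words: int = PDF_MAX_WORDS) -> list[list[int]]:
    """Greedily pack consecutive sections into parts that do not exceed max_words.

    Returns a list of parts; each part is a list of section indices.
    A single section larger than max_words gets a part of its own and
    still needs a finer manual split.
    """
    parts: list[list[int]] = []
    current: list[int] = []
    current_words = 0
    for i, words in enumerate(section_word_counts):
        # Close the current part if adding this section would exceed the cap.
        if current and current_words + words > max_words:
            parts.append(current)
            current, current_words = [], 0
        current.append(i)
        current_words += words
    if current:
        parts.append(current)
    return parts
```

For example, chapters of 200,000, 250,000, 100,000, and 300,000 words group into two parts: the first two chapters, then the last two. The sketch checks only the word cap; the 200 MB size cap needs a separate check on the resulting files.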

Google Docs and Google Slides

Google Workspace documents connect directly from Drive without a file download step. The tool reads the current saved state of the document at the time you add it. It does not track subsequent edits — if the document changes after indexing, you need to remove the source and re-add it to get the updated version. Slides are indexed slide-by-slide; speaker notes are included in the index if the file contains them.

Web URLs

Paste a web URL and the tool fetches the readable text content of that page. It skips navigation elements, footers, and sidebars, focusing on the main article or document body. Dynamic pages that require JavaScript rendering may index partially or not at all; static pages and most news articles index cleanly. The content is fetched at add time; the tool does not re-fetch the page unless you remove and re-add the source.

YouTube videos

YouTube links are indexed via transcript. The tool retrieves the video's auto-generated or manual captions and indexes those as the source text. Citations in chat or notes from a YouTube source include a timestamp that links back to the relevant moment in the video. Videos with no transcript at all, meaning no manual captions and auto-captions disabled, cannot be indexed. Short clips and long lectures both work; most users find that lectures under ninety minutes index in under a minute.

Plain text and Markdown

Plain .txt and .md files upload directly. Markdown formatting is interpreted so heading levels, bullet lists, and emphasis carry through to the index structure. This format is useful for research notes written in a local editor, exported blog posts, or documentation files from a code repository.

Pasted text

You can paste raw text directly into an "add source" dialog without uploading a file. The pasted content is treated as a source with a generic title you can rename. This is useful for short documents, email threads, or content from systems that do not produce downloadable files.

Audio files (rolling out)

Audio ingestion is live for Plus subscribers and in partial rollout on the free tier as of 2025. The tool transcribes the audio and indexes the transcript. Supported formats include MP3, MP4 audio tracks, and WAV. Podcasts, recorded interviews, and lecture recordings are the most common use cases. Transcription quality affects citation accuracy — clearly recorded speech indexes well; highly accented or noisy audio may produce lower-quality transcripts.

Source limits by tier

The free tier caps at fifty sources per notebook and approximately 500,000 combined words across those sources. Plus raises the source cap to three hundred per notebook and the word ceiling to several million words — large enough for a full book manuscript alongside its reference corpus. There is no published limit on the number of notebooks per account on either tier.
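A quick pre-flight check against these caps can be sketched as follows. The names are hypothetical and this is not an API the tool exposes. The figures come from the paragraph above, and the Plus word ceiling is left unset because it is described only as "several million words".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TierLimits:
    max_sources: int
    max_total_words: Optional[int]  # None = no exact figure given on this page


# Figures taken from the paragraph above. The Plus word ceiling is described
# only as "several million words", so it is left unset rather than guessed.
LIMITS = {
    "free": TierLimits(max_sources=50, max_total_words=500_000),
    "plus": TierLimits(max_sources=300, max_total_words=None),
}


def check_notebook(tier: str, source_word_counts: list[int]) -> list[str]:
    """Return a list of human-readable problems; empty means the plan fits."""
    limits = LIMITS[tier]
    problems = []
    if len(source_word_counts) > limits.max_sources:
        problems.append(
            f"{len(source_word_counts)} sources exceeds the cap of {limits.max_sources}"
        )
    total = sum(source_word_counts)
    if limits.max_total_words is not None and total > limits.max_total_words:
        problems.append(
            f"{total} words exceeds the approximate ceiling of {limits.max_total_words}"
        )
    return problems
```

Because the free-tier word ceiling is approximate, a result near the boundary is best read as a warning rather than a hard verdict.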

Indexing order and re-indexing

Sources are indexed in the order they are added. Indexing is typically fast — a standard PDF of twenty pages indexes in under thirty seconds. Very large PDFs or audio files may take a minute or two. The tool shows a spinner next to each source while indexing is in progress; chat and generation requests submitted before indexing completes will only search the already-indexed sources.

If you need to update a source — for instance, a Google Doc that has been edited or a URL whose content has changed — remove the old source and re-add it. The tool will re-index from scratch. Saved notes that cited the removed source keep their text; the citation links resolve again once the source is re-added and re-indexed.

Deduplication

The tool does not currently perform automatic deduplication. If you add the same PDF twice under two different file names, it appears as two separate sources and each contributes to the source count. Duplicate sources do not break answers, although they may slightly amplify the weight of the duplicated content in retrieval. They do use up source slots, so it is worth avoiding duplicates in large notebooks that approach the free-tier cap.
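Since the tool does not deduplicate for you, one option is to check for identical files locally before uploading. The sketch below is an assumption about how you might do that, with hypothetical function names. It compares SHA-256 digests of the raw bytes, so it catches renamed copies but not re-exported or lightly edited versions of the same document.

```python
import hashlib
from collections import defaultdict


def content_digest(data: bytes) -> str:
    """SHA-256 hex digest of a file's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def find_duplicates(files: dict[str, bytes]) -> list[list[str]]:
    """Group file names whose contents are byte-for-byte identical."""
    by_digest: dict[str, list[str]] = defaultdict(list)
    for name, data in files.items():
        by_digest[content_digest(data)].append(name)
    # Keep only digests shared by more than one file name.
    return [sorted(names) for names in by_digest.values() if len(names) > 1]
```

To run it over a local folder, build the input with something like `{p.name: p.read_bytes() for p in pathlib.Path("sources").glob("*.pdf")}`, where `sources` is a placeholder directory name.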

Source format reference table

Source type | Format | Free cap | Plus cap
PDF | .pdf (text-layer or scanned) | 200 MB / 500k words per file | Same per-file limits
Google Docs | Drive link | Included in 50-source limit | Included in 300-source limit
Google Slides | Drive link | Included in 50-source limit | Included in 300-source limit
Web URL | Any static URL | Included in 50-source limit | Included in 300-source limit
YouTube | youtube.com link (transcript required) | Included in 50-source limit | Included in 300-source limit
Plain text / Markdown | .txt, .md | Included in 50-source limit | Included in 300-source limit
Pasted text | In-app paste | Included in 50-source limit | Included in 300-source limit
Audio file | MP3, WAV, MP4 audio | Partial rollout | Live for all Plus accounts

The NIST AI guidance portal recommends documenting data provenance for AI-assisted workflows — keeping a record of which sources were indexed, when, and from which version. The source panel in the tool serves that function implicitly; exporting the source list alongside a note export provides a provenance record for any research artefact produced from the notebook.
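If you want an explicit record rather than an implicit one, a small hand-maintained manifest covers the three things named above: which source, when, and which version. The sketch below is one possible shape for such a record, not a format the tool exports. The field names are assumptions, and a SHA-256 digest of the content stands in for the version.

```python
import hashlib
import json
from datetime import datetime, timezone


def provenance_entry(title: str, origin: str, content: bytes, added_at: datetime) -> dict:
    """One record: which source, where it came from, when, and which version."""
    return {
        "title": title,
        "origin": origin,
        # Store the time in UTC; pass a timezone-aware datetime.
        "added_at": added_at.astimezone(timezone.utc).isoformat(),
        "sha256": hashlib.sha256(content).hexdigest(),
    }


def provenance_manifest(entries: list[dict]) -> str:
    """Serialise the records as stable, diff-friendly JSON, oldest first."""
    ordered = sorted(entries, key=lambda e: e["added_at"])
    return json.dumps(ordered, indent=2, sort_keys=True)
```

Because a re-added source produces a new digest and a new timestamp, appending an entry each time you remove and re-add a document gives a simple version history for the notebook.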

Sources and uploads questions

The most common questions about what you can add to a notebook and how it gets processed.

What file types does the tool accept?

PDF, Google Docs, Google Slides, plain text, Markdown, web URLs, YouTube links with transcripts, pasted raw text, and audio files (MP3, WAV, MP4 audio — live for Plus, rolling out on free). Spreadsheet support is also in partial rollout. The table above has the current full list with tier availability.

How many sources can a notebook hold?

Fifty on the free tier, three hundred on NotebookLM Plus. The limit applies per notebook; there is no published limit on how many notebooks an account can hold. If you hit the free-tier cap, hiding sources temporarily does not free up a slot — only deleting a source does.

Can I upload sources in languages other than English?

Yes. The indexing step is language-agnostic. French, German, Spanish, Japanese, Chinese, and other languages index correctly. You can query in a different language from the source — upload French papers, ask questions in English, and citations will resolve to the original French passage.

What happens if I update a source document after indexing?

The index is a snapshot taken at add time. Changes to a Google Doc or a live URL are not reflected automatically. Remove the source and re-add it to get the updated version indexed. Saved notes keep their text when a source is removed; citation links resolve again once the updated source is re-indexed.

Does the tool deduplicate identical sources?

Not automatically. Adding the same file twice creates two separate source entries and uses two source slots. The duplicate does not break anything, but it is wasteful near the free-tier cap. If you notice an answer unusually weighted toward a single document, check the source list and remove any duplicate.

Start building your source corpus

Add a mix of PDFs, URLs, and YouTube links to a new notebook and watch the tool cross-reference them in a single chat query. The first-notebook walkthrough shows you the full flow.

Sources in the broader research workflow

The source layer is the foundation of everything the tool produces. The features overview explains how sources feed into chat, notes, and audio overviews. The chat mode page covers how the retrieval step selects relevant passages from the source index for each query, and the notes studio page describes how citation links in generated notes trace back to specific source passages. For anyone working with large corpora, the capabilities deep dive explains the indexing architecture in more detail.

Teams comparing tiers should check the pricing page for the current source caps and the Plus tier page for the audio and analytics features that become available at higher source volumes. The full guide has a dedicated section on corpus management — how to organise sources within a notebook for large research projects — and the data and privacy page explains what happens to source content at the infrastructure level.