scriptflow/docs/plans/2026-03-11-upload-extract-episodes-design.md

# Upload And Extract Episodes Design

**Context**

Conversion mode currently accepts only manual text input in the left-side source textarea. The app already has a streaming Doubao integration for script generation that this feature can follow. Extracted content should feed into the existing `sourceText` flow rather than replace the rest of the conversion pipeline.

**Decision**

Adopt a client-side upload flow for four file types: Word (`.docx`), text (`.txt`), PDF (`.pdf`), and Markdown (`.md`). After upload, the app will read the file in the browser, send the raw text to a new Doubao extraction call using `doubao-seed-1-6-flash-250828`, and stream the model output directly into the left-side source textarea.
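Because the flow starts from a file, upload handling can reject anything outside the four listed formats before any parsing begins. The following is a minimal sketch that classifies files by extension. The names `classifyUpload`, `ACCEPTED_EXTENSIONS`, and `ACCEPT_ATTRIBUTE` are hypothetical. Extension-based detection is also an assumption: browser-reported MIME types can be empty for formats such as `.md`, so this sketch does not rely on them.

```typescript
// Hypothetical names; only the four formats come from the plan.
type SourceKind = "docx" | "txt" | "pdf" | "md";

// Extensions accepted by the upload control, e.g. as an <input type="file"> accept value.
const ACCEPTED_EXTENSIONS: readonly string[] = [".docx", ".txt", ".pdf", ".md"];
const ACCEPT_ATTRIBUTE: string = ACCEPTED_EXTENSIONS.join(",");

// A Map avoids accidental hits on Object.prototype keys such as "constructor".
const KIND_BY_EXTENSION = new Map<string, SourceKind>([
  ["docx", "docx"],
  ["txt", "txt"],
  ["pdf", "pdf"],
  ["md", "md"],
]);

// Classify by extension, case-insensitively. Returns null for unsupported or
// extensionless files, including legacy .doc, which the plan does not cover.
function classifyUpload(fileName: string): SourceKind | null {
  const dot = fileName.lastIndexOf(".");
  if (dot < 0) return null;
  const extension = fileName.slice(dot + 1).toLowerCase();
  return KIND_BY_EXTENSION.get(extension) ?? null;
}
```

A `null` result is the natural trigger for the extraction error state, so unsupported files never reach the model.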

**Behavior**

- The source input area becomes a hybrid input surface: manual typing still works, and file upload is added alongside it.
- Uploading a file starts extraction immediately; the user does not need to click `立即转换成剧本` ("Convert to script now").
- The extraction model is instructed to identify episode boundaries and return each episode's original script content verbatim (1:1), with no rewriting, normalization, cleanup, or omission.
- The streamed extraction result replaces any existing `sourceText` and grows as chunks arrive, so the user sees the result in real time.
- Conversion stays a separate step. After extraction completes, the user can still click the existing conversion button to continue with the current workflow.
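Taken together, these behaviors form a small state machine for the source panel. The following is a minimal sketch of that machine as a pure reducer. The state shape, status names, and event names are hypothetical. The plan does not say how the app currently holds `sourceText`, so this sketch only models the transitions.

```typescript
// Hypothetical status values for the conversion-mode source panel.
type ExtractionStatus = "idle" | "reading" | "extracting" | "done" | "error";

interface SourceState {
  sourceText: string;
  status: ExtractionStatus;
  error: string | null;
}

type SourceEvent =
  | { type: "fileSelected" }          // upload starts the flow immediately
  | { type: "extractionStarted" }     // file parsed, model stream opened
  | { type: "chunk"; delta: string }  // one streamed piece of model output
  | { type: "extractionFinished" }
  | { type: "failed"; message: string }
  | { type: "manualEdit"; text: string };

function reduceSource(state: SourceState, event: SourceEvent): SourceState {
  switch (event.type) {
    case "fileSelected":
      return { ...state, status: "reading", error: null };
    case "extractionStarted":
      // Clear the textarea so streamed output overwrites, rather than appends to, old text.
      return { ...state, sourceText: "", status: "extracting" };
    case "chunk":
      // Ignore stray chunks that arrive after the stream finished or failed.
      if (state.status !== "extracting") return state;
      return { ...state, sourceText: state.sourceText + event.delta };
    case "extractionFinished":
      return { ...state, status: "done" };
    case "failed":
      return { ...state, status: "error", error: event.message };
    case "manualEdit":
      // Typing is accepted in any state and never triggers conversion by itself.
      return { ...state, sourceText: event.text };
    default: {
      const unhandled: never = event;
      return unhandled;
    }
  }
}
```

Conversion has no event in this reducer. This reflects the decision to keep the existing conversion button as the only entry point to the rest of the pipeline.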

**Parsing Strategy**

- `.txt` and `.md`: read with `File.text()`.
- `.docx`: parse in-browser with a document-text extraction library.
- `.pdf`: parse in-browser with a PDF text extraction library.
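The parsing strategy comes down to picking one reader per extension. The following is a minimal sketch of that selection. It assumes the `.docx` and `.pdf` parsers are injected as async functions from `ArrayBuffer` to plain text, because the plan does not name a library. mammoth's `extractRawText` and pdf.js `getTextContent` are common options, but neither is a decision made here. `UploadedFile`, `BinaryParsers`, and `selectReader` are hypothetical names. `UploadedFile` is a structural subset of the browser `File` interface, so a real `File` satisfies it.

```typescript
// Structural subset of the browser File API used by this sketch.
interface UploadedFile {
  name: string;
  text(): Promise<string>;
  arrayBuffer(): Promise<ArrayBuffer>;
}

// Binary-format parsers are injected; which libraries back them is left open.
interface BinaryParsers {
  docx: (data: ArrayBuffer) => Promise<string>;
  pdf: (data: ArrayBuffer) => Promise<string>;
}

type Reader = (file: UploadedFile) => Promise<string>;

// Pick a reader from the file extension; null means the format is unsupported.
function selectReader(fileName: string, parsers: BinaryParsers): Reader | null {
  const dot = fileName.lastIndexOf(".");
  if (dot < 0) return null;
  switch (fileName.slice(dot + 1).toLowerCase()) {
    case "txt":
    case "md":
      // Plain text: File.text() decodes the bytes as UTF-8.
      return (file: UploadedFile) => file.text();
    case "docx":
      return async (file: UploadedFile) => parsers.docx(await file.arrayBuffer());
    case "pdf":
      return async (file: UploadedFile) => parsers.pdf(await file.arrayBuffer());
    default:
      return null;
  }
}
```

`File.text()` always decodes as UTF-8, so a `.txt` file saved in GBK would arrive garbled. If that matters, one option is to read `arrayBuffer()` and decode it with `new TextDecoder("gbk")`.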

**UI And State**

- Add an upload control, an accepted-file-types hint, an extraction loading state, and an extraction error state to conversion mode.
- Preserve local persistence of `sourceText`.
- Keep manual editing enabled after extraction.
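Persistence needs no new mechanism if every write to `sourceText`, whether typed or streamed, goes through the same save path. The following is a minimal sketch against a `localStorage`-like interface. The storage key and function names are hypothetical, because the plan does not say how `sourceText` is persisted today.

```typescript
// Minimal subset of the Web Storage API; window.localStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical key; the app's real key is not specified in this plan.
const SOURCE_TEXT_KEY = "scriptflow.sourceText";

function loadSourceText(store: KeyValueStore): string {
  return store.getItem(SOURCE_TEXT_KEY) ?? "";
}

// Called on every change to sourceText, whether from typing or from a streamed chunk.
function saveSourceText(store: KeyValueStore, text: string): void {
  try {
    store.setItem(SOURCE_TEXT_KEY, text);
  } catch {
    // setItem throws when storage quota is exceeded (e.g. a very large extraction);
    // keep the in-memory text and skip persistence instead of breaking the UI.
  }
}
```

Saving on every streamed chunk rewrites the whole string each time. For long documents, throttling the save or saving once when extraction finishes are reasonable alternatives.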

**AI Contract**

The new extraction API will:

- use Doubao only
- stream results
- instruct the model to output episode-separated original content only
- avoid any transformations beyond episode boundary recognition

**Risks**

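- PDF text extraction quality depends on how the document was produced. Scanned PDFs have no text layer to extract, and multi-column layouts can come out in the wrong reading order.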
- Even with strict prompting, model-based extraction is probabilistic, so the prompt must strongly prohibit edits and define a deterministic output format.
- Browser-side parsing adds dependencies and increases bundle size.
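Prompting alone cannot guarantee verbatim output, but a deterministic post-check can flag drift after streaming ends. The following is a minimal sketch that assumes a hypothetical `=== EPISODE <n> ===` separator line. It strips the marker lines, then compares the remaining text to the source with all whitespace removed. Reflowed line breaks therefore pass, while any changed, added, or dropped non-whitespace character fails. The separator format and function names are illustrative and are not part of the plan.

```typescript
// Hypothetical separator: each episode starts with a line "=== EPISODE <n> ===".
const EPISODE_MARKER = /^=== EPISODE (\d+) ===$/;

interface Episode {
  number: number;
  body: string;
}

// Split model output into episodes; text before the first marker is ignored here.
function parseEpisodes(output: string): Episode[] {
  const episodes: Episode[] = [];
  let current: { number: number; lines: string[] } | null = null;
  for (const line of output.split("\n")) {
    const match = EPISODE_MARKER.exec(line.trim());
    if (match) {
      if (current) episodes.push({ number: current.number, body: current.lines.join("\n").trim() });
      current = { number: Number(match[1]), lines: [] };
    } else if (current) {
      current.lines.push(line);
    }
  }
  if (current) episodes.push({ number: current.number, body: current.lines.join("\n").trim() });
  return episodes;
}

// Episodes should be numbered 1, 2, 3, ... with no gaps or repeats.
function hasSequentialNumbers(episodes: Episode[]): boolean {
  return episodes.every((episode, index) => episode.number === index + 1);
}

function stripWhitespace(text: string): string {
  return text.replace(/\s+/g, "");
}

// True when the output, minus marker lines, matches the source up to whitespace.
function isVerbatimSplit(source: string, output: string): boolean {
  const withoutMarkers = output
    .split("\n")
    .filter((line) => !EPISODE_MARKER.test(line.trim()))
    .join("\n");
  return stripWhitespace(withoutMarkers) === stripWhitespace(source);
}
```

One limitation is that a source line which happens to match the marker pattern would be stripped and reported as a mismatch. A rarer separator string avoids that.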