scriptflow/docs/plans/2026-03-11-upload-extract-episodes-design.md
Song367 b49d703e3c
One-click conversion mode optimization
2026-03-11 21:53:41 +08:00

Upload And Extract Episodes Design

Context

The conversion mode currently accepts only manual text input in the left textarea. The app already has a Doubao streaming integration pattern for script generation. Content extracted from uploaded files should feed back into the existing sourceText flow rather than replace the rest of the conversion pipeline.

Decision

Adopt a client-side upload flow for four file types: Word (.docx), text (.txt), PDF (.pdf), and Markdown (.md). After upload, the app parses the file in the browser to obtain its raw text, sends that text to a new Doubao extraction call using the doubao-seed-1-6-flash-250828 model, and streams the model output directly into the left-side source textarea.

Behavior

  • The source input area becomes a hybrid input surface: manual typing still works, and file upload is added alongside it.
  • Uploading a file starts extraction immediately; the user does not need to click 立即转换成剧本 ("Convert to Script Now").
  • The extraction model is instructed to identify each episode and return the original script content 1:1 with no rewriting, normalization, cleanup, or omission.
  • The streamed extraction result overwrites sourceText progressively so the user can see the result arrive in real time.
  • Existing conversion generation stays separate. After extraction completes, the user can still click the existing conversion button to continue with the current workflow.
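A minimal TypeScript sketch of the upload-to-sourceText behavior above, assuming three injected functions whose names are hypothetical rather than existing app code: `parseFile` (the browser-side parsing step), `streamExtraction` (the Doubao extraction call, yielding text deltas), and `setSourceText` (the setter for the existing sourceText state). It shows extraction starting as soon as a file arrives and each delta overwriting sourceText with everything received so far; it deliberately does not call the conversion step.

```typescript
// Hypothetical dependency shapes; the real app's names and signatures may differ.
interface UploadDeps<F> {
  parseFile: (file: F) => Promise<string>;
  streamExtraction: (rawText: string) => AsyncIterable<string>;
  setSourceText: (text: string) => void;
}

// Runs immediately on upload. The existing conversion step is NOT triggered here;
// the user still clicks the conversion button afterwards.
async function handleUpload<F>(file: F, deps: UploadDeps<F>): Promise<string> {
  // Parse first, so a parsing failure leaves the current sourceText untouched.
  const rawText = await deps.parseFile(file);
  let received = "";
  // Assumption: clear old content once parsing succeeds so it is not mixed with the new extraction.
  deps.setSourceText(received);
  for await (const delta of deps.streamExtraction(rawText)) {
    received += delta;
    // Progressive overwrite: sourceText always holds the full output received so far.
    deps.setSourceText(received);
  }
  return received;
}
```

Passing the accumulated string, rather than appending to the textarea directly, keeps sourceText as the single source of truth, so local persistence and manual editing can keep working unchanged.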

Parsing Strategy

  • .txt and .md: read with File.text().
  • .docx: parse in-browser with a document-text extraction library.
  • .pdf: parse in-browser with a PDF text extraction library.
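The parsing strategy can be expressed as a dispatch on file extension. This sketch assumes hypothetical `docxToText` and `pdfToText` adapters wrapping whichever in-browser libraries are chosen, and uses a minimal `UploadLike` interface that a browser `File` satisfies. Note that `File.text()` always decodes bytes as UTF-8.

```typescript
type SourceKind = "txt" | "md" | "docx" | "pdf";

// The four accepted types; the same string can serve as an <input type="file"> accept value.
const ACCEPT = ".docx,.txt,.pdf,.md";
const KINDS: readonly SourceKind[] = ["txt", "md", "docx", "pdf"];

function isSourceKind(ext: string): ext is SourceKind {
  return (KINDS as readonly string[]).includes(ext);
}

// Case-insensitive extension check; returns null for unsupported or extension-less names.
function detectKind(fileName: string): SourceKind | null {
  const dot = fileName.lastIndexOf(".");
  if (dot < 0) return null;
  const ext = fileName.slice(dot + 1).toLowerCase();
  return isSourceKind(ext) ? ext : null;
}

// Minimal shape that a browser File satisfies.
interface UploadLike {
  name: string;
  text(): Promise<string>;
  arrayBuffer(): Promise<ArrayBuffer>;
}

// Hypothetical adapters around the chosen in-browser .docx and PDF text extraction libraries.
interface BinaryParsers {
  docxToText(data: ArrayBuffer): Promise<string>;
  pdfToText(data: ArrayBuffer): Promise<string>;
}

async function parseUploadedText(file: UploadLike, parsers: BinaryParsers): Promise<string> {
  switch (detectKind(file.name)) {
    case "txt":
    case "md":
      // Plain-text formats: File.text() decodes the bytes as UTF-8.
      return file.text();
    case "docx":
      return parsers.docxToText(await file.arrayBuffer());
    case "pdf":
      return parsers.pdfToText(await file.arrayBuffer());
    default:
      throw new Error(`Unsupported file type: ${file.name} (accepted: ${ACCEPT})`);
  }
}
```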

UI And State

  • Add an upload affordance, an accepted-file-types hint, an extraction loading state, and an extraction error state in conversion mode.
  • Preserve the existing local persistence of sourceText.
  • Keep manual editing enabled after extraction.
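For the UI state, one option is a small discriminated union with a reducer covering the loading and error states listed above. The state names, the storage key `scriptflow.sourceText`, and the choice to make the textarea read-only only while a stream is in progress are assumptions for illustration; `window.localStorage` matches the `KeyValueStore` interface used here.

```typescript
type ExtractionState =
  | { status: "idle" }
  | { status: "extracting"; fileName: string }
  | { status: "done"; fileName: string }
  | { status: "error"; fileName: string; message: string };

type ExtractionEvent =
  | { type: "start"; fileName: string }
  | { type: "finish" }
  | { type: "fail"; message: string };

function reduceExtraction(state: ExtractionState, event: ExtractionEvent): ExtractionState {
  switch (event.type) {
    case "start":
      return { status: "extracting", fileName: event.fileName };
    case "finish":
      // Late events arriving outside an active extraction are ignored.
      return state.status === "extracting" ? { status: "done", fileName: state.fileName } : state;
    case "fail":
      return state.status === "extracting"
        ? { status: "error", fileName: state.fileName, message: event.message }
        : state;
  }
}

// Hypothetical policy: block typing only while a stream is overwriting sourceText.
function canEditSource(state: ExtractionState): boolean {
  return state.status !== "extracting";
}

// Minimal subset of the Web Storage API; window.localStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const SOURCE_TEXT_KEY = "scriptflow.sourceText"; // hypothetical key

// Called on every sourceText change, whether typed manually or streamed from extraction.
function persistSourceText(store: KeyValueStore, text: string): void {
  store.setItem(SOURCE_TEXT_KEY, text);
}

function loadSourceText(store: KeyValueStore): string {
  return store.getItem(SOURCE_TEXT_KEY) ?? "";
}
```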

AI Contract

The new extraction API will:

  • use Doubao only
  • stream results
  • instruct the model to output episode-separated original content only
  • avoid any transformations beyond episode boundary recognition
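The contract above can be made concrete with a fixed marker line between episodes plus a parser for it. The marker format `=== EPISODE n ===`, the prompt wording, and the OpenAI-compatible `model`/`messages`/`stream` request shape are assumptions to verify against the Doubao (Volcengine Ark) API documentation; only the model ID comes from this design.

```typescript
const EXTRACTION_MODEL = "doubao-seed-1-6-flash-250828";

// Hypothetical delimiter line; the prompt and the parser must agree on it exactly.
const EPISODE_MARKER = /^=== EPISODE (\d+) ===$/;

interface ChatMessage {
  role: "system" | "user";
  content: string;
}

interface ExtractionRequest {
  model: string;
  stream: boolean;
  messages: ChatMessage[];
}

function buildExtractionRequest(rawText: string): ExtractionRequest {
  const system = [
    "Split the user's script text into episodes.",
    "Copy the original content 1:1: do not rewrite, normalize, clean up, summarize, translate, or omit anything.",
    "Your only task is recognizing where each episode begins.",
    "Before each episode, output exactly one line of the form: === EPISODE <n> === with n starting at 1.",
    "Output nothing else: no commentary, headings, or code fences.",
  ].join("\n");
  return {
    model: EXTRACTION_MODEL,
    stream: true,
    messages: [
      { role: "system", content: system },
      { role: "user", content: rawText },
    ],
  };
}

interface Episode {
  index: number;
  body: string;
}

// Splits model output on marker lines; any text before the first marker is dropped.
function parseEpisodes(output: string): Episode[] {
  const episodes: Episode[] = [];
  let current: { index: number; lines: string[] } | null = null;
  for (const line of output.split("\n")) {
    const match = EPISODE_MARKER.exec(line.trim());
    if (match) {
      if (current) episodes.push({ index: current.index, body: current.lines.join("\n") });
      current = { index: Number(match[1]), lines: [] };
    } else if (current) {
      current.lines.push(line);
    }
  }
  if (current) episodes.push({ index: current.index, body: current.lines.join("\n") });
  return episodes;
}
```

A single-line marker keeps the format deterministic and easy to detect even while the stream is still arriving, since each marker is complete once its line ends.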

Risks

  • PDF text extraction quality depends on document structure.
  • Even with strict prompting, model-based extraction is probabilistic, so the prompt must strongly prohibit edits and define a deterministic output format.
  • Browser-side parsing adds dependencies and increases bundle size.
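Because prompting alone cannot guarantee 1:1 output, one possible mitigation for the second risk is a check run once the stream finishes. This sketch assumes episodes were already split out of the model output (for example by a marker-based parser) and is not part of the design as written. It flags any episode whose text, ignoring leading and trailing whitespace, does not appear verbatim and in order in the raw text sent to the model, and it counts non-whitespace source characters that no matched episode covered, which may indicate omissions.

```typescript
interface Episode {
  index: number;
  body: string;
}

interface VerbatimReport {
  ok: boolean; // every episode matched verbatim, in source order
  mismatched: number[]; // indices of episodes not found (edited, reordered, or invented)
  uncoveredChars: number; // non-whitespace source characters outside all matched episodes
}

function checkVerbatim(rawText: string, episodes: Episode[]): VerbatimReport {
  const mismatched: number[] = [];
  let cursor = 0; // search only after the previous match, which enforces ordering
  let skipped = "";
  for (const ep of episodes) {
    // Tolerate only leading/trailing whitespace differences at episode boundaries.
    const body = ep.body.trim();
    const at = rawText.indexOf(body, cursor);
    if (at < 0) {
      mismatched.push(ep.index);
      continue;
    }
    skipped += rawText.slice(cursor, at);
    cursor = at + body.length;
  }
  skipped += rawText.slice(cursor);
  return {
    ok: episodes.length > 0 && mismatched.length === 0,
    mismatched,
    uncoveredChars: skipped.replace(/\s/g, "").length,
  };
}
```

Original episode headings that the model replaced with markers will show up as uncovered characters, so `uncoveredChars` is better treated as a warning signal in the UI than as a hard failure.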