Extensions allow the Collector to ingest content from sources other than local files, such as video platforms or code repositories.

YouTube Extension

Module: app/extensions/youtube.py

Fetches transcripts from YouTube videos. It tries three strategies in order, moving to the next one only when the previous one fails:
  1. Primary: youtube_transcript_api (an unofficial Python library for retrieving transcripts).
  2. Secondary: yt-dlp (extracts subtitles/captions).
  3. Fallback: playwright (browser-based automation).
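The ordered fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code in app/extensions/youtube.py: the function fetch_with_fallback and the three stand-in fetchers are hypothetical names, and the stand-ins only simulate failure and success instead of calling youtube_transcript_api, yt-dlp, or playwright.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A fetcher takes a video ID and returns transcript text, or None if it found nothing.
Fetcher = Callable[[str], Optional[str]]


def fetch_with_fallback(
    video_id: str, fetchers: Sequence[Tuple[str, Fetcher]]
) -> Tuple[str, str]:
    """Try each (name, fetcher) pair in order; return (name, transcript) of the first success."""
    errors: List[str] = []
    for name, fetch in fetchers:
        try:
            text = fetch(video_id)
        except Exception as exc:  # any failure moves on to the next strategy
            errors.append(f"{name}: {exc}")
            continue
        if text:
            return name, text
        errors.append(f"{name}: empty transcript")
    raise RuntimeError("all transcript strategies failed: " + "; ".join(errors))


# Hypothetical stand-ins for the three real strategies (assumptions, for illustration only).
def via_transcript_api(video_id: str) -> Optional[str]:
    raise ConnectionError("request blocked")  # simulate the primary strategy failing


def via_yt_dlp(video_id: str) -> Optional[str]:
    return None  # simulate a video with no downloadable captions


def via_playwright(video_id: str) -> Optional[str]:
    return f"transcript for {video_id}"  # simulate the browser fallback succeeding


STRATEGIES = [
    ("youtube_transcript_api", via_transcript_api),
    ("yt-dlp", via_yt_dlp),
    ("playwright", via_playwright),
]

if __name__ == "__main__":
    source, transcript = fetch_with_fallback("abc123", STRATEGIES)
    print(source, transcript)
```

Returning the name of the strategy that succeeded makes it easy to log which path was taken, and collecting the per-strategy errors keeps the final failure message informative.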

Cookies Configuration

For reliable extraction (especially for age-restricted or member-only content), you can provide a Netscape-format cookies file.
  • Env Variable: Set YT_COOKIES_FILE to the path of your cookies.txt file.
  • Default: If the variable is not set, the extension looks for app/cookies/youtube_cookies.txt.
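The lookup order above might be resolved along these lines. This is a sketch under assumptions: the helper name resolve_cookies_file is hypothetical, and it assumes that the environment variable takes precedence over the default path and that a missing file simply means "run without cookies". The actual module may behave differently.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Default location named in the documentation above.
DEFAULT_COOKIES_PATH = Path("app/cookies/youtube_cookies.txt")


def resolve_cookies_file(
    env: Mapping[str, str] = os.environ,
    default: Path = DEFAULT_COOKIES_PATH,
) -> Optional[Path]:
    """Return the cookies file to use, or None if no usable file exists.

    Assumption: YT_COOKIES_FILE, when set, overrides the default path.
    """
    configured = env.get("YT_COOKIES_FILE")
    candidate = Path(configured).expanduser() if configured else default
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    path = resolve_cookies_file()
    print(f"using cookies file: {path}" if path else "no cookies file found")
```

Passing the environment as a parameter keeps the helper easy to test without mutating os.environ.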

GitHub Extension

Module: app/extensions/github.py

Clones and processes repositories.
  • Endpoint: /ext/github/process
  • Auth: Requires GITHUB_TOKEN in environment variables.
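Because the extension depends on a token in the environment, it can help to fail early with a clear message when the token is absent. The following is a minimal sketch of such a check. The names require_token and MissingTokenError are hypothetical and not part of app/extensions/github.py, and how the Collector actually reads or uses the token is not shown here.

```python
import os
from typing import Mapping


class MissingTokenError(RuntimeError):
    """Raised when a required access token is absent from the environment."""


def require_token(name: str, env: Mapping[str, str] = os.environ) -> str:
    """Return the token stored under `name`, or raise if it is unset or blank."""
    token = env.get(name, "").strip()
    if not token:
        raise MissingTokenError(
            f"{name} is not set; export it before using this extension"
        )
    return token


if __name__ == "__main__":
    try:
        require_token("GITHUB_TOKEN")
        print("GITHUB_TOKEN is configured")
    except MissingTokenError as exc:
        print(f"configuration error: {exc}")
```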

GitLab Extension

Module: app/extensions/gitlab.py

Works like the GitHub extension, but for GitLab repositories.
  • Auth: Requires GITLAB_TOKEN.
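Since the two repository extensions differ mainly in which token they need, a caller could pick the right variable from the repository URL. The sketch below is illustrative only: token_variable_for and the TOKEN_VARS mapping are hypothetical, and only the public hosts github.com and gitlab.com are assumed. A self-hosted GitLab instance would need its own entry.

```python
from urllib.parse import urlparse

# Assumed mapping from public host name to the token variable documented above.
TOKEN_VARS = {
    "github.com": "GITHUB_TOKEN",
    "gitlab.com": "GITLAB_TOKEN",
}


def token_variable_for(repo_url: str) -> str:
    """Return the name of the environment variable required for `repo_url`."""
    host = (urlparse(repo_url).hostname or "").lower()
    if host not in TOKEN_VARS:
        raise ValueError(f"no known token variable for host {host!r}")
    return TOKEN_VARS[host]


if __name__ == "__main__":
    print(token_variable_for("https://gitlab.com/example-group/example-project"))
```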