Extensions allow the Collector to ingest content from sources other than local files, such as video platforms or code repositories.
YouTube Extension
Module: app/extensions/youtube.py
Fetches transcripts from YouTube videos. It tries three methods in order, moving to the next when one fails:
- Primary: youtube_transcript_api (direct transcript API, no browser required).
- Secondary: yt-dlp (extracts subtitles/captions).
- Fallback: playwright (browser-based automation).
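The ordering above can be sketched as a simple fallback chain. This is a minimal illustration rather than the module's actual code: `fetch_with_fallback`, the `Fetcher` type, and the three stand-in functions are hypothetical names, and the stand-ins only simulate what calls into youtube_transcript_api, yt-dlp, and playwright might return.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical type: a fetcher takes a video ID and returns transcript text.
Fetcher = Callable[[str], str]


def fetch_with_fallback(
    video_id: str, fetchers: Sequence[Tuple[str, Fetcher]]
) -> Tuple[str, str]:
    """Try each (name, fetcher) in order; return (name, transcript) from the first success."""
    errors: List[str] = []
    for name, fetch in fetchers:
        try:
            text = fetch(video_id)
        except Exception as exc:  # any failure moves on to the next method
            errors.append(f"{name}: {exc}")
            continue
        if text:
            return name, text
        # An empty transcript is treated as a failure as well (an assumption).
        errors.append(f"{name}: empty transcript")
    raise RuntimeError("all transcript methods failed: " + "; ".join(errors))


# Stand-in fetchers for illustration only. A real implementation would call
# youtube_transcript_api, yt-dlp, and playwright in these functions.
def via_transcript_api(video_id: str) -> str:
    raise ConnectionError("transcripts disabled")


def via_yt_dlp(video_id: str) -> str:
    return f"captions for {video_id}"


def via_playwright(video_id: str) -> str:
    return f"scraped transcript for {video_id}"


METHODS = [
    ("youtube_transcript_api", via_transcript_api),
    ("yt-dlp", via_yt_dlp),
    ("playwright", via_playwright),
]
```

Calling `fetch_with_fallback("abc123", METHODS)` with these stand-ins skips the failing primary method and returns the result from the secondary one. Collecting the per-method errors makes it easier to see why every method failed when none succeeds.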
Cookies Configuration
For reliable extraction (especially for age-restricted or member-only content), you can provide a Netscape-format cookies file.
- Env Variable: YT_COOKIES_FILE, set to the path of your cookies.txt.
- Default: Looks for app/cookies/youtube_cookies.txt.
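One way the lookup described above could work is shown below. This is a sketch under stated assumptions: `resolve_cookies_file` is a hypothetical helper, and returning `None` when no file exists (so extraction proceeds without cookies) is an assumed behavior that the source does not confirm.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Default location named in the documentation, relative to the working directory.
DEFAULT_COOKIES_PATH = Path("app/cookies/youtube_cookies.txt")


def resolve_cookies_file(env: Optional[Mapping[str, str]] = None) -> Optional[Path]:
    """Return the cookies file to use, or None if no usable file is found."""
    env = os.environ if env is None else env
    configured = env.get("YT_COOKIES_FILE")
    # The environment variable takes precedence over the default path.
    candidate = Path(configured) if configured else DEFAULT_COOKIES_PATH
    # Assumption: a missing file means "run without cookies", not an error.
    return candidate if candidate.is_file() else None
```

Passing the environment as a parameter keeps the helper easy to test without changing real environment variables.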
GitHub Extension
Module: app/extensions/github.py
Clones and processes repositories.
- Endpoint: /ext/github/process
- Auth: Requires GITHUB_TOKEN to be set in the environment.
GitLab Extension
Module: app/extensions/gitlab.py
Similar to GitHub, but for GitLab repositories.
- Auth: Requires GITLAB_TOKEN to be set in the environment.
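Since both repository extensions depend on a token in the environment, a fail-fast check at startup can surface a missing variable early. The helper below is a hypothetical sketch, not part of the documented modules; only the variable names GITHUB_TOKEN and GITLAB_TOKEN come from this page.

```python
import os
from typing import Mapping, Optional


def require_token(var_name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the token stored in var_name, or raise if it is missing or blank."""
    env = os.environ if env is None else env
    token = env.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"{var_name} is not set; export it before using this extension")
    return token
```

For example, `require_token("GITHUB_TOKEN")` could run before the GitHub extension clones a repository, and `require_token("GITLAB_TOKEN")` before the GitLab one.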