Pipeline Overview

When a bookmark is enqueued for enrichment, it passes through a three-stage pipeline:

Stage 1: Content Fetching

The domain router selects the best fetching strategy based on the URL’s domain:
Priority | Strategy | Handles | Method
1 | Instagram Fetcher | instagram.com | Platform-specific (limited due to restrictions)
2 | oEmbed Fetcher | YouTube, TikTok, Vimeo, X | oEmbed API for rich metadata
3 | Metascraper | Regular websites | HTML meta tags (Open Graph, Twitter Cards)
4 | Fallback | Everything else | Regex-based extraction from raw HTML
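The priority order can be sketched as a small router. This is an illustration under assumptions, not the project's router: the names `candidateStrategies`, `STRATEGIES` and `hostOf` are hypothetical, subdomains (such as www.youtube.com) are assumed to match their parent domain, and the router is assumed to return every matching strategy in priority order so the worker can fall through to Metascraper and then Fallback when an earlier fetcher fails.

```typescript
// Hypothetical names throughout; only the priorities and domains come from the table.
type StrategyName = "instagram" | "oembed" | "metascraper" | "fallback";

interface FetchStrategy {
  priority: number;
  name: StrategyName;
  matches: (host: string) => boolean; // can this strategy handle the host?
}

// Domains for the oEmbed fetcher, taken from the table (YouTube, TikTok, Vimeo, X).
const OEMBED_DOMAINS = ["youtube.com", "tiktok.com", "vimeo.com", "x.com"];

// True when host is the domain itself or one of its subdomains (e.g. www.youtube.com).
function hostMatches(host: string, domain: string): boolean {
  return host === domain || host.endsWith("." + domain);
}

// Extracts the lowercase hostname without relying on the URL class.
function hostOf(url: string): string {
  const host = /^[a-z][a-z0-9+.-]*:\/\/([^\/?#:]+)/i.exec(url)?.[1];
  return host ? host.toLowerCase() : "";
}

const STRATEGIES: FetchStrategy[] = [
  { priority: 1, name: "instagram", matches: (h) => hostMatches(h, "instagram.com") },
  { priority: 2, name: "oembed", matches: (h) => OEMBED_DOMAINS.some((d) => hostMatches(h, d)) },
  { priority: 3, name: "metascraper", matches: () => true }, // any regular website
  { priority: 4, name: "fallback", matches: () => true }, // last resort
];

// Returns every matching strategy in priority order. The caller would try them
// in turn, so Fallback runs only when the earlier ones fail (assumption).
function candidateStrategies(url: string): StrategyName[] {
  const host = hostOf(url);
  return STRATEGIES
    .filter((s) => s.matches(host))
    .sort((a, b) => a.priority - b.priority)
    .map((s) => s.name);
}
```

For a YouTube link this yields oembed, then metascraper, then fallback; for an unknown site only the last two. Returning a list instead of a single winner keeps the "try the next fetcher on failure" decision in the worker.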
Each fetcher produces a PageContext:
{
  "normalizedUrl": "https://youtube.com/watch?v=dQw4w9WgXcQ",
  "domain": "youtube.com",
  "title": "Rick Astley - Never Gonna Give You Up",
  "description": "The official video for...",
  "thumbnailUrl": "https://i.ytimg.com/vi/.../maxresdefault.jpg",
  "excerpt": "...",
  "authorName": "Rick Astley",
  "authorHandle": "@RickAstleyYT",
  "bodyText": "...",
  "mediaDurationSeconds": 213,
  "providerMetadata": { "provider": "youtube" }
}
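As a typed sketch, the PageContext above maps onto a TypeScript interface, and the Fallback row ("regex-based extraction from raw HTML") can be illustrated with a few regular expressions. Which fields are optional, which tags the fallback reads (og:title, og:description, og:image, the description meta tag and `<title>`), and the helper names `fallbackExtract` and `metaContent` are all assumptions for illustration.

```typescript
// Mirrors the PageContext example above; which fields are optional is an assumption.
interface PageContext {
  normalizedUrl: string;
  domain: string;
  title?: string;
  description?: string;
  thumbnailUrl?: string;
  excerpt?: string;
  authorName?: string;
  authorHandle?: string;
  bodyText?: string;
  mediaDurationSeconds?: number;
  providerMetadata?: Record<string, unknown>;
}

// Decodes a handful of common entities; &amp; goes last to avoid double-decoding.
function decodeEntities(s: string): string {
  return s
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&amp;/g, "&");
}

// Reads <meta property|name="key" content="...">. Assumes `content` follows the
// key attribute and that key contains no regex metacharacters.
function metaContent(html: string, key: string): string | undefined {
  const re = new RegExp(
    `<meta[^>]+(?:property|name)=["']${key}["'][^>]*content=["']([^"']*)["']`,
    "i",
  );
  const value = re.exec(html)?.[1];
  return value !== undefined ? decodeEntities(value.trim()) : undefined;
}

// Hypothetical Fallback fetcher: regex extraction from raw HTML.
function fallbackExtract(normalizedUrl: string, domain: string, html: string): PageContext {
  const rawTitle = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html)?.[1];
  return {
    normalizedUrl,
    domain,
    title:
      metaContent(html, "og:title") ??
      (rawTitle !== undefined ? decodeEntities(rawTitle.trim()) : undefined),
    description: metaContent(html, "og:description") ?? metaContent(html, "description"),
    thumbnailUrl: metaContent(html, "og:image"),
  };
}
```

Regexes like these are brittle (attribute order, quoting, entities), which fits their place as the last-resort strategy behind a real HTML parser such as Metascraper.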

Stage 2: AI Analysis

The page context is sent to an AI model for enrichment.

Primary: Gemini AI (Vertex AI)
  • Model: gemini-2.5-flash
  • Temperature: 0.2 (factual, low creativity)
  • Max output: 256 tokens
  • System prompt: “Enrich saved bookmarks for a personal library. Return compact, factual metadata only.”
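The settings above correspond to fields of a Vertex AI `generateContent` request body (`systemInstruction`, `contents`, `generationConfig.temperature`, `generationConfig.maxOutputTokens`). The sketch below only builds that body; the prompt layout, the 2,000-character `bodyText` cap and the use of `responseMimeType: "application/json"` are assumptions, and `buildRequest`, `PromptInput` and `endpoint` are hypothetical names.

```typescript
// Subset of PageContext fields used in the prompt (assumption).
interface PromptInput {
  normalizedUrl: string;
  domain: string;
  title?: string;
  description?: string;
  bodyText?: string;
}

const MODEL = "gemini-2.5-flash";
const SYSTEM_PROMPT =
  "Enrich saved bookmarks for a personal library. Return compact, factual metadata only.";

// Vertex AI generateContent endpoint; project and location are placeholders.
function endpoint(project: string, location: string): string {
  return (
    `https://${location}-aiplatform.googleapis.com/v1/projects/${project}` +
    `/locations/${location}/publishers/google/models/${MODEL}:generateContent`
  );
}

// Builds the JSON body to POST to the endpoint. The prompt layout and the
// 2,000-character bodyText cap are illustrative assumptions.
function buildRequest(ctx: PromptInput) {
  const prompt = [
    `URL: ${ctx.normalizedUrl}`,
    `Domain: ${ctx.domain}`,
    ctx.title ? `Title: ${ctx.title}` : "",
    ctx.description ? `Description: ${ctx.description}` : "",
    ctx.bodyText ? `Body: ${ctx.bodyText.slice(0, 2000)}` : "",
    'Reply as JSON: {"summary": string, "saveWhy": string, "tags": string[]}',
  ]
    .filter((line) => line !== "")
    .join("\n");

  return {
    systemInstruction: { parts: [{ text: SYSTEM_PROMPT }] },
    contents: [{ role: "user", parts: [{ text: prompt }] }],
    generationConfig: {
      temperature: 0.2, // factual, low creativity
      maxOutputTokens: 256, // keeps the reply compact
      responseMimeType: "application/json", // assumption: JSON mode is enabled
    },
  };
}
```

A worker would POST `JSON.stringify(buildRequest(ctx))` to `endpoint(project, location)` with an OAuth access token in the `Authorization: Bearer` header.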
Output:
{
  "summary": "Rick Astley's iconic 1987 music video...",
  "saveWhy": "Classic music video reference",
  "tags": ["music", "video", "80s", "pop"]
}
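Because output is capped at 256 tokens, a reply can arrive truncated or wrapped in Markdown fences. A defensive parser for the output shape above might look like the following sketch; `parseEnrichment` is a hypothetical name, and lowercasing and de-duplicating tags is an assumed normalization, not documented behavior.

```typescript
interface Enrichment {
  summary: string;
  saveWhy: string;
  tags: string[];
}

// Parses the model's reply. Returns null when it is not the expected shape,
// which is where the heuristic fallback could take over (assumption).
function parseEnrichment(text: string): Enrichment | null {
  // Models sometimes wrap JSON in Markdown code fences; strip them.
  const cleaned = text.trim().replace(/^`{3}(?:json)?\s*/i, "").replace(/\s*`{3}$/, "");
  let data: unknown;
  try {
    data = JSON.parse(cleaned);
  } catch {
    return null; // e.g. a reply cut off by the 256-token output limit
  }
  if (typeof data !== "object" || data === null) return null;
  const { summary, saveWhy, tags } = data as Record<string, unknown>;
  if (typeof summary !== "string" || typeof saveWhy !== "string") return null;
  if (!Array.isArray(tags) || !tags.every((t) => typeof t === "string")) return null;
  // Normalize tags: trim, lowercase, drop empties and duplicates (assumption).
  const cleanTags = (tags as string[]).map((t) => t.trim().toLowerCase()).filter((t) => t !== "");
  return { summary: summary.trim(), saveWhy: saveWhy.trim(), tags: Array.from(new Set(cleanTags)) };
}
```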
Fallback: Heuristic Enrichment

If the AI model is unavailable (rate limits, downtime), a rule-based fallback runs:
Field | Heuristic
summary | Description truncated to 140 characters, or the title
saveWhy | Keyword matching (e.g., “recipe” → “Cooking inspiration”)
tags | Words extracted from the domain, title, and description
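The three heuristics can be sketched in a few lines. Only the 140-character limit and the "recipe" → "Cooking inspiration" rule come from the table; the second keyword rule, the default saveWhy, the stopword list and the "top 4 words of at least 3 characters" tag rule are hypothetical, as is the function name `heuristicEnrich`.

```typescript
interface PageInput {
  domain: string;
  title?: string;
  description?: string;
}

interface Enrichment {
  summary: string;
  saveWhy: string;
  tags: string[];
}

// Only "recipe" -> "Cooking inspiration" comes from the docs; the rest is hypothetical.
const SAVE_WHY_RULES: Array<[RegExp, string]> = [
  [/\brecipes?\b/i, "Cooking inspiration"],
  [/\b(?:tutorial|how to)\b/i, "Learning resource"],
];
const DEFAULT_SAVE_WHY = "Saved for later"; // hypothetical default
const STOPWORDS = new Set(["the", "and", "for", "with", "you", "your", "this", "that", "from", "how"]);

// Cuts text to at most `max` characters, preferring a word boundary.
function truncate(text: string, max: number): string {
  if (text.length <= max) return text;
  const cut = text.slice(0, max - 1);
  const lastSpace = cut.lastIndexOf(" ");
  return (lastSpace > 0 ? cut.slice(0, lastSpace) : cut) + "…";
}

function heuristicEnrich(page: PageInput): Enrichment {
  const title = page.title ?? "";
  const description = page.description ?? "";

  // summary: description truncated to 140 chars, else the title.
  const summary = description ? truncate(description, 140) : title;

  // saveWhy: first keyword rule that matches the title or description.
  const text = `${title} ${description}`;
  const rule = SAVE_WHY_RULES.find(([re]) => re.test(text));
  const saveWhy = rule ? rule[1] : DEFAULT_SAVE_WHY;

  // tags: site name from the domain plus the most frequent content words
  // (top 4, at least 3 characters; both thresholds are assumptions).
  const site = page.domain.replace(/^www\./, "").split(".")[0] ?? "";
  const counts = new Map<string, number>();
  for (const word of text.toLowerCase().match(/[a-z0-9]+/g) ?? []) {
    if (word.length < 3 || STOPWORDS.has(word)) continue;
    counts.set(word, (counts.get(word) ?? 0) + 1);
  }
  const topWords = Array.from(counts.entries())
    .sort((a, b) => b[1] - a[1]) // stable sort keeps first-seen order on ties
    .slice(0, 4)
    .map(([word]) => word);
  const tags = Array.from(new Set([site, ...topWords])).filter((t) => t !== "");

  return { summary, saveWhy, tags };
}
```

For a page on www.seriouseats.com titled "Best Pizza Recipe" this would give saveWhy "Cooking inspiration" and tags starting with "seriouseats", "pizza", "recipe".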

Stage 3: Write Results

On completion, the worker:
  1. Updates the bookmark: sets enrichmentStatus to completed, writes summary, tags, and saveWhy, and sets enrichedAt
  2. Updates the job: sets its status to completed
  3. Writes the result to the enrichment cache, keyed by hash(normalizedUrl) for global deduplication
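The completion step can be sketched against in-memory maps that stand in for the real tables. The `Db` shape, the `completeJob` name and the status values other than "completed" are hypothetical; the caller is assumed to compute the cache key as hash(normalizedUrl).

```typescript
interface EnrichmentResult {
  title?: string;
  summary: string;
  tags: string[];
  saveWhy: string;
}

interface Bookmark {
  id: string;
  enrichmentStatus: string; // only "completed" is documented
  summary?: string;
  tags?: string[];
  saveWhy?: string;
  enrichedAt?: Date;
}

interface Job {
  id: string;
  bookmarkId: string;
  status: string;
}

// In-memory stand-ins for the real tables (hypothetical).
interface Db {
  bookmarks: Map<string, Bookmark>;
  jobs: Map<string, Job>;
  enrichmentCache: Map<string, EnrichmentResult>;
}

// Applies the three completion writes in the documented order. cacheKey is
// hash(normalizedUrl), computed by the caller.
function completeJob(db: Db, jobId: string, result: EnrichmentResult, cacheKey: string, now: Date = new Date()): void {
  const job = db.jobs.get(jobId);
  if (!job) throw new Error(`unknown job: ${jobId}`);
  const bookmark = db.bookmarks.get(job.bookmarkId);
  if (!bookmark) throw new Error(`unknown bookmark: ${job.bookmarkId}`);

  // 1. Update the bookmark with the enrichment fields.
  bookmark.enrichmentStatus = "completed";
  bookmark.summary = result.summary;
  bookmark.tags = result.tags;
  bookmark.saveWhy = result.saveWhy;
  bookmark.enrichedAt = now;

  // 2. Mark the job completed.
  job.status = "completed";

  // 3. Store the result in the shared cache for later saves of the same URL.
  db.enrichmentCache.set(cacheKey, result);
}
```

Against a real database, steps 1 and 2 would likely belong in one transaction so a bookmark is never marked completed while its job still looks in-flight; that is a suggestion, not documented behavior.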

Global Cache

The enrichment cache is shared across all users:
  • Key: Hash of the normalized URL
  • Value: Full enrichment result (title, summary, tags, etc.)
  • Effect: any later save of the same URL, by any user, skips the entire pipeline; the cached result is applied immediately
This makes enrichment of popular URLs near-instant after the first save and avoids repeating the fetch and AI calls.
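A cache-first lookup can be sketched as below. The docs only say "hash of the normalized URL", so 32-bit FNV-1a stands in for the real hash; `getOrEnrich`, the `Map` standing in for the shared table and the synchronous `runPipeline` callback are simplifications for illustration.

```typescript
interface CachedEnrichment {
  title?: string;
  summary: string;
  tags: string[];
  saveWhy: string;
}

// 32-bit FNV-1a over UTF-16 code units, standing in for the unspecified hash.
function fnv1a(s: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, "0");
}

// Stand-in for the shared, cross-user cache table.
const enrichmentCache = new Map<string, CachedEnrichment>();

// Checks the cache before running the pipeline (synchronous here for brevity).
function getOrEnrich(
  normalizedUrl: string,
  runPipeline: (url: string) => CachedEnrichment,
): { result: CachedEnrichment; fromCache: boolean } {
  const key = fnv1a(normalizedUrl);
  const hit = enrichmentCache.get(key);
  if (hit) return { result: hit, fromCache: true }; // skip fetch + AI entirely
  const result = runPipeline(normalizedUrl);
  enrichmentCache.set(key, result);
  return { result, fromCache: false };
}
```

A 32-bit hash is fine for a demo but collision-prone for a global, cross-user cache; a production key would more plausibly be SHA-256 (for example via Web Crypto's `crypto.subtle.digest("SHA-256", …)`). The lookup is only as good as the URL normalization that precedes it.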

Retry Policy

Attempt | Backoff | Action
1st failure | 1 minute | Retry if error is retriable
2nd failure | 5 minutes | Retry if error is retriable
3rd failure | none | Permanent failure
Retriable errors: AbortError, connect-failed, dns-failed, enrichment-failed, fetch-timeout, http-5xx, provider-rate-limited, stale-job-timeout

Non-retriable errors (fail immediately): blocked-host, http-4xx, invalid-body, quota-exhausted
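The retry table and the two error lists combine into one decision function. This is a sketch: `decideRetry` and the `reason` strings are hypothetical, and treating an error code that appears in neither list as non-retriable is an assumption.

```typescript
const RETRIABLE = new Set([
  "AbortError", "connect-failed", "dns-failed", "enrichment-failed",
  "fetch-timeout", "http-5xx", "provider-rate-limited", "stale-job-timeout",
]);
const NON_RETRIABLE = new Set(["blocked-host", "http-4xx", "invalid-body", "quota-exhausted"]);

// Delay before the retry that follows the 1st and 2nd failure.
const BACKOFF_MS = [60 * 1000, 5 * 60 * 1000];

type RetryDecision =
  | { action: "retry"; delayMs: number }
  | { action: "fail"; reason: "non-retriable" | "unknown-error" | "retries-exhausted" };

// failureCount is 1-based: 1 after the first failed attempt.
function decideRetry(errorCode: string, failureCount: number): RetryDecision {
  if (failureCount < 1) throw new RangeError("failureCount must be >= 1");
  if (!RETRIABLE.has(errorCode)) {
    // Codes in neither list are treated as non-retriable (assumption).
    return { action: "fail", reason: NON_RETRIABLE.has(errorCode) ? "non-retriable" : "unknown-error" };
  }
  const delayMs = BACKOFF_MS[failureCount - 1];
  if (delayMs === undefined) return { action: "fail", reason: "retries-exhausted" }; // 3rd failure
  return { action: "retry", delayMs };
}
```

Failing closed on unknown codes avoids retry loops on errors nobody has classified yet; the opposite choice would favor availability over predictability.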

Job Status Lifecycle

Quota Enforcement

Quotas are checked at enqueue time (not during processing):
Check | Free limit | Error
Monthly enrichments | 200 per month | quota-exhausted (429)
Per-minute rate | 5 per minute | rate-limited (429)
Pro users are unlimited: neither check applies to them.
Quota counters are stored per-user in the database and reset monthly (UTC).
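An enqueue-time check with both limits and a UTC monthly reset can be sketched as follows. The `QuotaState` shape, `checkAndConsume`, the sliding one-minute window (a fixed window would also fit the description) and the rule that rejected attempts do not count are all assumptions.

```typescript
interface QuotaState {
  plan: "free" | "pro";
  monthKey: string; // UTC "YYYY-MM"; the monthly counter resets when this changes
  monthlyCount: number;
  recentEnqueues: number[]; // timestamps (ms) of enqueues in the last minute
}

const MONTHLY_LIMIT = 200;
const PER_MINUTE_LIMIT = 5;

type QuotaResult =
  | { ok: true }
  | { ok: false; error: "quota-exhausted" | "rate-limited"; httpStatus: 429 };

function utcMonthKey(ms: number): string {
  const d = new Date(ms);
  return `${d.getUTCFullYear()}-${String(d.getUTCMonth() + 1).padStart(2, "0")}`;
}

// Checks both limits and, on success, records one enqueue. Mutates state.
function checkAndConsume(state: QuotaState, nowMs: number): QuotaResult {
  if (state.plan === "pro") return { ok: true }; // Pro: no limits

  const month = utcMonthKey(nowMs);
  if (state.monthKey !== month) {
    state.monthKey = month; // new UTC month: reset the counter
    state.monthlyCount = 0;
  }
  state.recentEnqueues = state.recentEnqueues.filter((t) => nowMs - t < 60 * 1000);

  if (state.monthlyCount >= MONTHLY_LIMIT) {
    return { ok: false, error: "quota-exhausted", httpStatus: 429 };
  }
  if (state.recentEnqueues.length >= PER_MINUTE_LIMIT) {
    return { ok: false, error: "rate-limited", httpStatus: 429 };
  }
  state.monthlyCount += 1;
  state.recentEnqueues.push(nowMs);
  return { ok: true };
}
```

With counters in a shared database, the check and increment would need to be atomic (for example a single conditional UPDATE), or two concurrent enqueues could both pass at 199.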