Replace source_host column with proper FK to domains table using
composite key (domain_host, domain_tld). This enables JOIN queries
instead of string concatenation for domain lookups.
Changes:
- Update Feed struct: SourceHost/TLD → DomainHost/DomainTLD
- Update all SQL queries to use domain_host/domain_tld columns
- Add column aliases (as source_host) for API backwards compatibility
- Update trigram index from source_host to domain_host
- Add getDomainHost() helper for extracting host from domain
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When searching for patterns like "npr.org", the search now also matches
the exact domain (host=npr, tld=org) in addition to the existing text
search across domain names, feed URLs, titles, and descriptions.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Split main.go into separate files for better organization:
crawler.go, domain.go, feed.go, parser.go, html.go, util.go
- Add PebbleDB for persistent storage of feeds and domains
- Store feeds with metadata: title, TTL, update frequency, ETag, etc.
- Track domains with crawl status (uncrawled/crawled/error)
- Normalize URLs by stripping scheme and www. prefix
- Add web dashboard on port 4321 with real-time stats:
- Crawl progress with completion percentage
- Feed counts by type (RSS/Atom)
- Top TLDs and domains by feed count
- Recent feeds table
- Filter out comment feeds from results
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>