Commit Graph

5 Commits

Author SHA1 Message Date
primal
aa6f571215 Add PDS credentials env file for service auth 2026-01-26 19:38:36 -05:00
primal
143807378f Add Docker support and refactor data layer 2026-01-26 16:02:05 -05:00
primal
219b49352e Add PebbleDB storage, domain tracking, and web dashboard
- Split main.go into separate files for better organization:
  crawler.go, domain.go, feed.go, parser.go, html.go, util.go
- Add PebbleDB for persistent storage of feeds and domains
- Store feeds with metadata: title, TTL, update frequency, ETag, etc.
- Track domains with crawl status (uncrawled/crawled/error)
- Normalize URLs by stripping scheme and www. prefix
- Add web dashboard on port 4321 with real-time stats:
  - Crawl progress with completion percentage
  - Feed counts by type (RSS/Atom)
  - Top TLDs and domains by feed count
  - Recent feeds table
- Filter out comment feeds from results

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-22 16:29:00 -05:00
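Note on 219b49352e: the commit message says URLs are normalized by stripping the scheme and the www. prefix, but the implementation is not shown here. The following is only a minimal sketch of what such a step might look like. The function name normalizeURL, the restriction to http/https, and the case-insensitive matching are assumptions for illustration, not taken from the repository.

```go
package main

import "strings"

// normalizeURL is a hypothetical helper (name and behavior assumed, not from
// the repository). It strips a leading http:// or https:// scheme and then a
// leading "www." prefix, leaving the rest of the URL untouched.
func normalizeURL(raw string) string {
	s := strings.TrimSpace(raw)

	// Drop the scheme if present. Matching is case-insensitive and anchored
	// at the start, so a "://" later in the query string is left alone.
	for _, scheme := range []string{"https://", "http://"} {
		if len(s) >= len(scheme) && strings.EqualFold(s[:len(scheme)], scheme) {
			s = s[len(scheme):]
			break
		}
	}

	// Drop a leading "www." (also case-insensitive).
	const www = "www."
	if len(s) >= len(www) && strings.EqualFold(s[:len(www)], www) {
		s = s[len(www):]
	}
	return s
}
```

Under these assumptions, "https://www.example.com/feed" and "example.com/feed" map to the same key, which would let the storage layer treat them as one feed.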
primal
0dd612b7e1 Rename feed directory to feeds
Update output directory path in main.go and .gitignore to use
feeds/ instead of feed/.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-22 15:36:56 -05:00
primal
f4cae127cc Add feed crawler with documentation
- main.go: RSS/Atom feed crawler using Common Crawl data
- CLAUDE.md: Project documentation for Claude Code
- .gitignore: Ignore binary and go.* files
- Feed output now written to feed/ directory

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-22 15:15:30 -05:00