Commit Graph

5 Commits

Author SHA1 Message Date
primal
254b751799 Add rich text links, language filter, and domain deny feature
- Use labeled links (Article · Audio) instead of raw URLs in posts
- Add language filter dropdown to dashboard with toggle selection
- Auto-deny feeds with no language on discovery
- Add deny/undeny buttons for domains to block crawling
- Denied domains set feeds to dead status, preventing future checks

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 12:36:58 -05:00
primal
f4afb29980 Migrate from SQLite to PostgreSQL
- Replace modernc.org/sqlite with jackc/pgx/v5
- Update all SQL queries for PostgreSQL syntax ($1, $2 placeholders)
- Use snake_case column names throughout
- Replace SQLite FTS5 with PostgreSQL tsvector/tsquery full-text search
- Add connection pooling with pgxpool
- Support Docker secrets for database password
- Add trigger to normalize feed URLs (strip https://, http://, www.)
- Fix anchor feed detection regex to avoid false positives
- Connect app container to atproto network for PostgreSQL access
- Add version indicator to dashboard UI

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-28 20:38:13 -05:00
primal
75835d771d Add AT Protocol publishing, media support, and SQLite stability
Publishing:
- Add publisher.go for posting feed items to AT Protocol PDS
- Support deterministic rkeys from SHA256(guid + discoveredAt)
- Handle multiple URLs in posts with facets for each link
- Image embed support (app.bsky.embed.images) for up to 4 images
- External embed with thumbnail fallback
- Podcast/audio enclosure URLs included in post text

Media extraction:
- Parse RSS enclosures (audio, video, images)
- Extract Media RSS content and thumbnails
- Extract images from HTML content in descriptions
- Store enclosure and imageUrls in items table

SQLite stability improvements:
- Add synchronous=NORMAL and wal_autocheckpoint pragmas
- Connection pool tuning (idle conns, max lifetime)
- Periodic WAL checkpoint every 5 minutes
- Hourly integrity checks with PRAGMA quick_check
- Daily hot backup via VACUUM INTO
- Docker stop_grace_period: 30s for graceful shutdown

Dashboard:
- Feed publishing UI and API endpoints
- Account creation with invite codes

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-28 15:30:02 -05:00
primal
143807378f Add Docker support and refactor data layer 2026-01-26 16:02:05 -05:00
primal
219b49352e Add PebbleDB storage, domain tracking, and web dashboard
- Split main.go into separate files for better organization:
  crawler.go, domain.go, feed.go, parser.go, html.go, util.go
- Add PebbleDB for persistent storage of feeds and domains
- Store feeds with metadata: title, TTL, update frequency, ETag, etc.
- Track domains with crawl status (uncrawled/crawled/error)
- Normalize URLs by stripping scheme and www. prefix
- Add web dashboard on port 4321 with real-time stats:
  - Crawl progress with completion percentage
  - Feed counts by type (RSS/Atom)
  - Top TLDs and domains by feed count
  - Recent feeds table
- Filter out comment feeds from results

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-22 16:29:00 -05:00