Commit Graph

15 Commits

Author SHA1 Message Date
primal
be595cb403 v100 2026-01-30 22:35:08 -05:00
primal
522233c4a2 Tune concurrency settings: 100 workers, 100 batch size, 100 buffer
- Import batch size: 100 (was 100k)
- Domain check/crawl fetch size: 100 (was 1000)
- Feed check fetch size: 100 (was 1000)
- Worker count: 100 fixed (was NumCPU)
- Channel buffers: 100 (was 256)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 23:33:57 -05:00
primal
3999e96f26 Dashboard UI overhaul: inline feed details, TLD filtering, status improvements
- Feed details now expand inline instead of navigating to new page
- Add TLD section headers with domains sorted by TLD then name
- Add TLD filter button to show/hide domain sections by TLD
- Feed status behavior: pass creates account, hold crawls only, skip stops, drop cleans up
- Auto-follow new accounts from directory account (1440.news)
- Fix handle derivation (removed duplicate .1440.news suffix)
- Increase domain import batch size to 100k
- Various bug fixes for account creation and profile updates

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 20:51:05 -05:00
primal
8e4993d3c5 Optimize stats and TLD queries for performance
- Reduce stats loop interval from 1 minute to 10 seconds
- Simplify TLD query by removing expensive feeds JOIN (30s -> 1.8s)
- Sort TLDs alphabetically

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 13:00:13 -05:00
primal
254b751799 Add rich text links, language filter, and domain deny feature
- Use labeled links (Article · Audio) instead of raw URLs in posts
- Add language filter dropdown to dashboard with toggle selection
- Auto-deny feeds with no language on discovery
- Add deny/undeny buttons for domains to block crawling
- Denied domains set feeds to dead status, preventing future checks

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 12:36:58 -05:00
primal
94d64373ed Add URL shortener for link tracking and shorter posts
- New short_urls and clicks tables for URL mapping and analytics
- /r/{code} redirect endpoint with click tracking
- Short URLs use 6-char base64 hash codes (26 chars total)
- Publish loop now shortens article links and enclosure URLs
- Enables podcast audio URLs to fit in posts (139 → 26 chars)
- Tracks: timestamp, referrer, user agent, anonymized IP

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-28 22:43:42 -05:00
primal
f4f80e91cc Add enclosure support for podcast/media items
- Load enclosure fields in GetAllUnpublishedItems query
- Only include enclosure URL if it fits within post length limit
- Shorter video/audio enclosures will be included when they fit

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-28 22:36:28 -05:00
primal
a1f02cd0bc Fix image embeds and rkey collisions
- Add image_urls to GetAllUnpublishedItems query
- Add aspectRatio to image embeds (required by Bluesky)
- Add image decoding to get dimensions (width/height)
- Fix rkey collision by using XOR of multiple hash bytes

The rkey collision was caused by using only 2 hash bytes (10 bits)
which had ~0.1% collision rate per pair of items with same timestamp.
Now XORs 8 hash bytes for better entropy distribution.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-28 21:24:35 -05:00
primal
4e4e8c939a Add favicon as profile picture for feed accounts
Fetches the site's favicon and uses it as the avatar when creating
or updating feed account profiles. Tries common favicon locations
(/favicon.ico, /favicon.png, /apple-touch-icon.png) then falls back
to Google's favicon service.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-28 21:05:50 -05:00
primal
9a43b69b4b Add profile refresh on startup to backfill feed URLs
Updates existing account profiles with the feed URL on startup.
This ensures all accounts have the source feed URL in their
profile description.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-28 21:03:10 -05:00
primal
9ecf0f700d Add feed URL to profile description
When creating new accounts, include the full RSS/Atom feed URL
in the profile description so users can find the original source.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-28 20:56:09 -05:00
primal
f4afb29980 Migrate from SQLite to PostgreSQL
- Replace modernc.org/sqlite with jackc/pgx/v5
- Update all SQL queries for PostgreSQL syntax ($1, $2 placeholders)
- Use snake_case column names throughout
- Replace SQLite FTS5 with PostgreSQL tsvector/tsquery full-text search
- Add connection pooling with pgxpool
- Support Docker secrets for database password
- Add trigger to normalize feed URLs (strip https://, http://, www.)
- Fix anchor feed detection regex to avoid false positives
- Connect app container to atproto network for PostgreSQL access
- Add version indicator to dashboard UI

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-28 20:38:13 -05:00
primal
75835d771d Add AT Protocol publishing, media support, and SQLite stability
Publishing:
- Add publisher.go for posting feed items to AT Protocol PDS
- Support deterministic rkeys from SHA256(guid + discoveredAt)
- Handle multiple URLs in posts with facets for each link
- Image embed support (app.bsky.embed.images) for up to 4 images
- External embed with thumbnail fallback
- Podcast/audio enclosure URLs included in post text

Media extraction:
- Parse RSS enclosures (audio, video, images)
- Extract Media RSS content and thumbnails
- Extract images from HTML content in descriptions
- Store enclosure and imageUrls in items table

SQLite stability improvements:
- Add synchronous=NORMAL and wal_autocheckpoint pragmas
- Connection pool tuning (idle conns, max lifetime)
- Periodic WAL checkpoint every 5 minutes
- Hourly integrity checks with PRAGMA quick_check
- Daily hot backup via VACUUM INTO
- Docker stop_grace_period: 30s for graceful shutdown

Dashboard:
- Feed publishing UI and API endpoints
- Account creation with invite codes

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-28 15:30:02 -05:00
primal
143807378f Add Docker support and refactor data layer 2026-01-26 16:02:05 -05:00
primal
219b49352e Add PebbleDB storage, domain tracking, and web dashboard
- Split main.go into separate files for better organization:
  crawler.go, domain.go, feed.go, parser.go, html.go, util.go
- Add PebbleDB for persistent storage of feeds and domains
- Store feeds with metadata: title, TTL, update frequency, ETag, etc.
- Track domains with crawl status (uncrawled/crawled/error)
- Normalize URLs by stripping scheme and www. prefix
- Add web dashboard on port 4321 with real-time stats:
  - Crawl progress with completion percentage
  - Feed counts by type (RSS/Atom)
  - Top TLDs and domains by feed count
  - Recent feeds table
- Filter out comment feeds from results

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-22 16:29:00 -05:00