- New short_urls and clicks tables for URL mapping and analytics
- /r/{code} redirect endpoint with click tracking
- Short URLs use 6-char base64 hash codes (full short URL is 26 chars)
- Publish loop now shortens article links and enclosure URLs
- Enables podcast audio URLs to fit in posts (139 → 26 chars)
- Tracks: timestamp, referrer, user agent, anonymized IP
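
Sketch of how a 6-char code could be derived, assuming SHA-256 as the hash and URL-safe base64 truncated to 6 characters (neither is confirmed by this commit; `shortCode` is a hypothetical name). A real table would also need a collision check before insert.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// shortCode returns a deterministic 6-character code for a target URL.
// Assumption: SHA-256 + URL-safe base64, keeping the first 6 characters.
func shortCode(target string) string {
	sum := sha256.Sum256([]byte(target))
	return base64.RawURLEncoding.EncodeToString(sum[:])[:6]
}

func main() {
	// The same input always maps to the same /r/{code} path.
	fmt.Println("/r/" + shortCode("https://example.com/audio/episode-42.mp3"))
}
```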
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Load enclosure fields in GetAllUnpublishedItems query
- Only include enclosure URL if it fits within post length limit
- Shorter video/audio enclosures will be included when they fit
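
Sketch of the length check, assuming a 300-character post limit and counting runes as an approximation of graphemes (`appendIfFits` and `maxPostLen` are hypothetical names, not taken from the code):

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// maxPostLen is an assumed limit; Bluesky counts graphemes, runes are close enough for a sketch.
const maxPostLen = 300

// appendIfFits adds the enclosure URL on its own line only when the
// resulting text stays within maxPostLen; otherwise the text is unchanged.
func appendIfFits(text, enclosureURL string) string {
	if enclosureURL == "" {
		return text
	}
	candidate := text + "\n" + enclosureURL
	if utf8.RuneCountInString(candidate) > maxPostLen {
		return text
	}
	return candidate
}

func main() {
	fmt.Println(appendIfFits("New episode", "https://example.com/r/abc123"))
}
```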
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The PDS restricts the first segment of local handles to 18 characters,
not the 63 allowed by the AT Protocol spec. Added an abbreviation map for
long category names:
- science-and-environment -> sci-env
- entertainment-and-arts -> ent-arts
- technology -> tech (when needed)
- etc.
Fixes "Handle too long" errors for BBC category feeds.
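
Sketch of the abbreviation step, assuming labels such as "bbc-science-and-environment" (the "bbc-" prefix, `shortenLabel`, and the map layout are illustrative, not copied from the code). Abbreviations are applied only when the label exceeds the limit, matching the "when needed" note above.

```go
package main

import (
	"fmt"
	"strings"
)

// maxLocalLabel is the PDS limit on the first handle segment described above.
const maxLocalLabel = 18

// abbreviations holds only the entries named in this commit message.
var abbreviations = map[string]string{
	"science-and-environment": "sci-env",
	"entertainment-and-arts":  "ent-arts",
	"technology":              "tech",
}

// shortenLabel abbreviates known long category names, but only when the
// label is over the limit. The result may still be too long for names
// that are not in the map; the caller has to handle that case.
func shortenLabel(label string) string {
	if len(label) <= maxLocalLabel {
		return label
	}
	for long, short := range abbreviations {
		label = strings.ReplaceAll(label, long, short)
	}
	return label
}

func main() {
	fmt.Println(shortenLabel("bbc-science-and-environment"))
}
```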
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Bluesky has a ~976KB blob limit. Images larger than 900KB are now
automatically resized using CatmullRom scaling and re-encoded as
JPEG with 85% quality. Iteratively scales down (90%, 72%, 58%...)
until under limit, with minimum dimensions of 100x100.
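
Sketch of the shrink loop. The 90%, 72%, 58% sequence is read here as "start at 90%, then multiply by 0.8 each round", which is an assumption. To stay within the standard library the sketch swaps in a nearest-neighbour resizer (`nearest`) where the commit uses CatmullRom scaling; `shrinkToFit`, `blank`, and the constants are hypothetical names.

```go
package main

import (
	"bytes"
	"fmt"
	"image"
	"image/jpeg"
)

const (
	maxBytes = 900 * 1024 // 900KB threshold; exact byte value assumed
	minDim   = 100        // minimum width and height
)

// nearest is a stand-in nearest-neighbour resizer, NOT the CatmullRom
// scaler described in the commit message.
func nearest(src image.Image, w, h int) image.Image {
	b := src.Bounds()
	dst := image.NewRGBA(image.Rect(0, 0, w, h))
	for y := 0; y < h; y++ {
		for x := 0; x < w; x++ {
			dst.Set(x, y, src.At(b.Min.X+x*b.Dx()/w, b.Min.Y+y*b.Dy()/h))
		}
	}
	return dst
}

// shrinkToFit re-encodes img as JPEG (quality 85) at 90%, 72%, 57.6%, ...
// of its original size until the output is at most limit bytes, or fails
// once a side would drop below minDim. Callers are expected to invoke it
// only for images that are already over the limit.
func shrinkToFit(img image.Image, limit int) ([]byte, error) {
	b := img.Bounds()
	var buf bytes.Buffer
	for permille := 900; ; permille = permille * 8 / 10 {
		w, h := b.Dx()*permille/1000, b.Dy()*permille/1000
		if w < minDim || h < minDim {
			return nil, fmt.Errorf("cannot fit image under %d bytes", limit)
		}
		buf.Reset()
		if err := jpeg.Encode(&buf, nearest(img, w, h), &jpeg.Options{Quality: 85}); err != nil {
			return nil, err
		}
		if buf.Len() <= limit {
			return buf.Bytes(), nil
		}
	}
}

// blank builds an empty test image of the given size.
func blank(w, h int) image.Image {
	return image.NewRGBA(image.Rect(0, 0, w, h))
}

func main() {
	out, err := shrinkToFit(blank(1600, 1200), maxBytes)
	fmt.Println(len(out) <= maxBytes, err)
}
```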
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Restrict imported domains to the .com TLD for now.
Re-enabled the import loop that was disabled for testing.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
BBC CDN supports larger image sizes by changing the URL path.
Upgrade /standard/240/ and /standard/480/ to /standard/800/.
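
Sketch of the path rewrite, assuming a plain string replacement is enough (`upgradeBBCImage` is a hypothetical name and the example host is a placeholder, not a real BBC URL):

```go
package main

import (
	"fmt"
	"strings"
)

// bbcSizeUpgrade maps the two smaller size segments to the 800px variant.
var bbcSizeUpgrade = strings.NewReplacer(
	"/standard/240/", "/standard/800/",
	"/standard/480/", "/standard/800/",
)

// upgradeBBCImage rewrites the size segment; other URLs pass through unchanged.
func upgradeBBCImage(u string) string {
	return bbcSizeUpgrade.Replace(u)
}

func main() {
	fmt.Println(upgradeBBCImage("https://cdn.example/ace/standard/240/photo.jpg"))
}
```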
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Both createdAt and rkey now use the original publication date,
so posts sort consistently by their original publication time.
Falls back to DiscoveredAt if PubDate is not available.
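
Sketch of the fallback, assuming both fields are time.Time values and that a zero PubDate means "not available" (the `Item` struct here is trimmed to the two relevant fields; `postTime` and `createdAt` are hypothetical names):

```go
package main

import (
	"fmt"
	"time"
)

// Item carries only the two timestamps relevant to this sketch.
type Item struct {
	PubDate      time.Time
	DiscoveredAt time.Time
}

// postTime prefers the original publication date and falls back to the
// discovery time when PubDate is unset.
func postTime(it Item) time.Time {
	if !it.PubDate.IsZero() {
		return it.PubDate
	}
	return it.DiscoveredAt
}

// createdAt formats the chosen time as a UTC timestamp with milliseconds.
func createdAt(it Item) string {
	return postTime(it).UTC().Format("2006-01-02T15:04:05.000Z")
}

func main() {
	it := Item{DiscoveredAt: time.Date(2025, 1, 2, 3, 4, 5, 0, time.UTC)}
	fmt.Println(createdAt(it))
}
```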
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Posts now use the item pub_date for the createdAt field instead
of the current time, so posts show their original publication time.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add image_urls to GetAllUnpublishedItems query
- Add aspectRatio to image embeds (required by Bluesky)
- Add image decoding to get dimensions (width/height)
- Fix rkey collision by using XOR of multiple hash bytes
The rkey collision was caused by using only 10 bits drawn from 2 hash
bytes, which gave a ~0.1% (1 in 1024) collision rate per pair of items
with the same timestamp. Now XORs 8 hash bytes for better entropy
distribution.
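
Sketch of one way to build such an rkey, assuming the TID layout (microsecond timestamp in the high bits, 10-bit clock ID in the low bits, 13 base32-sortable characters). How the 8 bytes are actually combined is not stated above, so the folding in `clockID` is a guess; `rkeyAt` is a hypothetical name.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"time"
)

// tidAlphabet is the sortable base32 alphabet used by TIDs.
const tidAlphabet = "234567abcdefghijklmnopqrstuvwxyz"

// clockID folds the first 8 bytes of SHA-256(guid) into 10 bits by XOR.
// Assumption: the exact folding used by the publisher may differ.
func clockID(guid string) uint64 {
	sum := sha256.Sum256([]byte(guid))
	var folded uint16
	for i := 0; i < 8; i += 2 {
		folded ^= uint16(sum[i])<<8 | uint16(sum[i+1])
	}
	// Fold the upper 6 bits into the lower 10 so every bit contributes.
	return uint64(folded^(folded>>10)) & 0x3ff
}

// rkeyAt packs the timestamp (microseconds) and clock ID into 63 bits and
// encodes them as 13 characters, so keys sort by timestamp first.
func rkeyAt(micros int64, guid string) string {
	v := (uint64(micros)<<10 | clockID(guid)) & (1<<63 - 1)
	out := make([]byte, 13)
	for i := 12; i >= 0; i-- {
		out[i] = tidAlphabet[v&31]
		v >>= 5
	}
	return string(out)
}

func main() {
	ts := time.Date(2025, 1, 2, 3, 4, 5, 0, time.UTC).UnixMicro()
	fmt.Println(rkeyAt(ts, "https://example.com/item/1"))
}
```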
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Fetches the site's favicon and uses it as the avatar when creating
or updating feed account profiles. Tries common favicon locations
(/favicon.ico, /favicon.png, /apple-touch-icon.png) then falls back
to Google's favicon service.
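
Sketch of the lookup order, assuming HTTPS for the site and the commonly used Google s2 favicon endpoint (the exact endpoint and its `sz` value are assumptions; `faviconCandidates` is a hypothetical name). The caller would try each URL in turn and keep the first that returns an image.

```go
package main

import (
	"fmt"
	"net/url"
)

// faviconCandidates lists the URLs to try, in order, for a given host.
func faviconCandidates(host string) []string {
	base := "https://" + host
	return []string{
		base + "/favicon.ico",
		base + "/favicon.png",
		base + "/apple-touch-icon.png",
		// Fallback: Google's favicon service (URL shape assumed).
		"https://www.google.com/s2/favicons?sz=128&domain=" + url.QueryEscape(host),
	}
}

func main() {
	for _, u := range faviconCandidates("example.com") {
		fmt.Println(u)
	}
}
```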
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Updates existing account profiles with the feed URL on startup.
This ensures all accounts have the source feed URL in their
profile description.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When creating new accounts, include the full RSS/Atom feed URL
in the profile description so users can find the original source.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
AT Protocol allows 63 characters per label, not 18. The previous
limit was incorrectly truncating category names like
"science-and-environment" and "entertainment-and-arts".
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Publishing:
- Add publisher.go for posting feed items to AT Protocol PDS
- Support deterministic rkeys from SHA256(guid + discoveredAt)
- Handle multiple URLs in posts with facets for each link
- Image embed support (app.bsky.embed.images) for up to 4 images
- External embed with thumbnail fallback
- Podcast/audio enclosure URLs included in post text
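
Sketch of the per-link facet step: link facets are indexed by UTF-8 byte offsets, which is what Go string indices already are. The flattened `Facet` struct and the URL regex are simplifications for illustration, not the record shape used in publisher.go.

```go
package main

import (
	"fmt"
	"regexp"
)

// Facet is a flattened stand-in for a link facet: a byte range plus its URI.
type Facet struct {
	ByteStart int
	ByteEnd   int
	URI       string
}

// urlPattern is a deliberately simple matcher; a real one would trim
// trailing punctuation.
var urlPattern = regexp.MustCompile(`https?://\S+`)

// linkFacets returns one facet per URL found in text, with byte offsets.
func linkFacets(text string) []Facet {
	var facets []Facet
	for _, span := range urlPattern.FindAllStringIndex(text, -1) {
		facets = append(facets, Facet{span[0], span[1], text[span[0]:span[1]]})
	}
	return facets
}

func main() {
	for _, f := range linkFacets("Café news https://a.example/x audio https://b.example/y.mp3") {
		fmt.Printf("%d-%d %s\n", f.ByteStart, f.ByteEnd, f.URI)
	}
}
```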
Media extraction:
- Parse RSS enclosures (audio, video, images)
- Extract Media RSS content and thumbnails
- Extract images from HTML content in descriptions
- Store enclosure and imageUrls in items table
SQLite stability improvements:
- Add synchronous=NORMAL and wal_autocheckpoint pragmas
- Connection pool tuning (idle conns, max lifetime)
- Periodic WAL checkpoint every 5 minutes
- Hourly integrity checks with PRAGMA quick_check
- Daily hot backup via VACUUM INTO
- Docker stop_grace_period: 30s for graceful shutdown
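
Sketch of the statements involved, written against a small `execer` interface that *sql.DB satisfies so it runs without a SQLite driver. The pragma values, the TRUNCATE checkpoint mode, and all function names are assumptions. PRAGMA quick_check returns rows, so it would go through Query rather than Exec and is left out here.

```go
package main

import (
	"database/sql"
	"fmt"
)

// execer is the subset of *sql.DB used by this sketch.
type execer interface {
	Exec(query string, args ...any) (sql.Result, error)
}

// startupPragmas are run once after opening the database (values assumed).
var startupPragmas = []string{
	"PRAGMA journal_mode=WAL",
	"PRAGMA synchronous=NORMAL",
	"PRAGMA wal_autocheckpoint=1000",
}

// runAll executes each statement in order and stops at the first error.
func runAll(db execer, stmts []string) error {
	for _, s := range stmts {
		if _, err := db.Exec(s); err != nil {
			return fmt.Errorf("%s: %w", s, err)
		}
	}
	return nil
}

// checkpoint is what a 5-minute ticker would call.
func checkpoint(db execer) error {
	_, err := db.Exec("PRAGMA wal_checkpoint(TRUNCATE)")
	return err
}

// backup writes a consistent copy of the live database to path.
func backup(db execer, path string) error {
	_, err := db.Exec("VACUUM INTO ?", path)
	return err
}

// recorder is a fake execer that only records the queries it receives.
type recorder struct{ queries []string }

func (r *recorder) Exec(query string, args ...any) (sql.Result, error) {
	r.queries = append(r.queries, query)
	return nil, nil
}

func main() {
	r := &recorder{}
	_ = runAll(r, startupPragmas)
	_ = checkpoint(r)
	_ = backup(r, "/data/backup.db")
	fmt.Println(r.queries)
}
```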
Dashboard:
- Feed publishing UI and API endpoints
- Account creation with invite codes
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Configure container deployment with:
- HTTPS via Traefik with LetsEncrypt certificate
- HTTP to HTTPS redirect for production (1440.news)
- HTTP-only routing for local development (1440.localhost)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The codebase evolved from a single-file app to a multi-file structure
with SQLite persistence, dashboard, and concurrent processing loops.
Updated documentation to accurately describe current architecture.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Split main.go into separate files for better organization:
  crawler.go, domain.go, feed.go, parser.go, html.go, util.go
- Add PebbleDB for persistent storage of feeds and domains
- Store feeds with metadata: title, TTL, update frequency, ETag, etc.
- Track domains with crawl status (uncrawled/crawled/error)
- Normalize URLs by stripping scheme and www. prefix
- Add web dashboard on port 4321 with real-time stats:
  - Crawl progress with completion percentage
  - Feed counts by type (RSS/Atom)
  - Top TLDs and domains by feed count
  - Recent feeds table
- Filter out comment feeds from results
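
Sketch of the URL normalization, assuming only http and https schemes need stripping and that matching is case-insensitive (`normalizeURL` is a hypothetical name):

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeURL strips a leading http:// or https:// and a leading "www."
// so that variants of the same address share one storage key.
func normalizeURL(raw string) string {
	s := strings.TrimSpace(raw)
	for _, scheme := range []string{"https://", "http://"} {
		if len(s) >= len(scheme) && strings.EqualFold(s[:len(scheme)], scheme) {
			s = s[len(scheme):]
			break
		}
	}
	if len(s) >= 4 && strings.EqualFold(s[:4], "www.") {
		s = s[4:]
	}
	return s
}

func main() {
	fmt.Println(normalizeURL("https://www.example.com/feed.xml"))
}
```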
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- main.go: RSS/Atom feed crawler using Common Crawl data
- CLAUDE.md: Project documentation for Claude Code
- .gitignore: Ignore binary and go.* files
- Feed output now written to feed/ directory
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>