Dashboard UI overhaul: inline feed details, TLD filtering, status improvements

- Feed details now expand inline instead of navigating to new page
- Add TLD section headers with domains sorted by TLD then name
- Add TLD filter button to show/hide domain sections by TLD
- Feed status behavior: pass creates account, hold crawls only, skip stops, drop cleans up
- Auto-follow new accounts from directory account (1440.news)
- Fix handle derivation (removed duplicate .1440.news suffix)
- Increase domain import batch size to 100k
- Various bug fixes for account creation and profile updates

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@@ -45,10 +45,11 @@ Multi-file Go application that crawls websites for RSS/Atom feeds, stores them i
 
 ### Concurrent Loops (main.go)
 
-The application runs six independent goroutine loops:
-- **Import loop** - Reads `vertices.txt.gz` and inserts domains into DB in 10k batches
-- **Crawl loop** - Worker pool processes unchecked domains, discovers feeds
-- **Check loop** - Worker pool re-checks known feeds for updates (conditional HTTP)
+The application runs seven independent goroutine loops:
+- **Import loop** - Reads `vertices.txt.gz` and inserts domains into DB in 10k batches (status='hold')
+- **Domain check loop** - HEAD requests to verify approved domains are reachable
+- **Crawl loop** - Worker pool crawls verified domains for feed discovery
+- **Feed check loop** - Worker pool re-checks known feeds for updates (conditional HTTP)
 - **Stats loop** - Updates cached dashboard statistics every minute
 - **Cleanup loop** - Removes items older than 12 months (weekly)
 - **Publish loop** - Autopublishes items from approved feeds to AT Protocol PDS
@@ -70,8 +71,8 @@ The application runs six independent goroutine loops:
 ### Database Schema
 
 PostgreSQL with pgx driver, using connection pooling:
-- **domains** - Hosts to crawl (status: unchecked/checked/error)
-- **feeds** - Discovered RSS/Atom feeds with metadata and cache headers
+- **domains** - Hosts to crawl (status: hold/pass/skip/fail)
+- **feeds** - Discovered RSS/Atom feeds with metadata and cache headers (publish_status: hold/pass/skip)
 - **items** - Individual feed entries (guid + feed_url unique)
 - **search_vector** - GENERATED tsvector columns for full-text search (GIN indexed)
 
@@ -79,11 +80,12 @@ Column naming: snake_case (e.g., `source_host`, `pub_date`, `item_count`)
 
 ### Crawl Logic
 
-1. Domain picked from `unchecked` status (random order)
-2. Try HTTPS, fall back to HTTP
-3. Recursive crawl up to MaxDepth=10, MaxPagesPerHost=10
-4. Extract `<link rel="alternate">` and anchor hrefs containing rss/atom/feed
-5. Parse discovered feeds for metadata, save with next_crawl_at
+1. Domain manually approved (status set to 'pass')
+2. Check stage: HEAD request verifies domain is reachable, sets last_checked_at
+3. Crawl stage: Full recursive crawl (HTTPS, fallback HTTP)
+4. Recursive crawl up to MaxDepth=10, MaxPagesPerHost=10
+5. Extract `<link rel="alternate">` and anchor hrefs containing rss/atom/feed
+6. Parse discovered feeds for metadata, save with next_crawl_at
 
 ### Feed Checking
 
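Step 5 of the revised crawl logic selects anchor hrefs containing `rss`, `atom`, or `feed`. A minimal sketch of that substring heuristic follows. The helper name `looksLikeFeedHref` and the case-insensitive matching are assumptions; the crawler's actual matching rules are not shown in this diff.

```go
package main

import (
	"fmt"
	"strings"
)

// feedHints are the three substrings the crawl-logic text names.
var feedHints = []string{"rss", "atom", "feed"}

// looksLikeFeedHref is a hypothetical helper: it reports whether href
// contains any hint. Lowercasing the href first is an assumption.
func looksLikeFeedHref(href string) bool {
	lower := strings.ToLower(href)
	for _, hint := range feedHints {
		if strings.Contains(lower, hint) {
			return true
		}
	}
	return false
}

func main() {
	for _, href := range []string{"/blog/feed/", "/RSS.xml", "/about", "/atom.xml"} {
		fmt.Printf("%-12s candidate=%v\n", href, looksLikeFeedHref(href))
	}
}
```

A plain substring match also accepts paths such as `/feedback`, so under this sketch every candidate would still need to go through the parse step that follows it in the list.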
@@ -92,7 +94,16 @@ Uses conditional HTTP (ETag, If-Modified-Since). Adaptive backoff: base 100s + 1
 ### Publishing
 
 Feeds with `publish_status = 'pass'` have their items automatically posted to AT Protocol.
-Status values: `held` (default), `pass` (approved), `deny` (rejected).
+Status values: `hold` (default/pending review), `pass` (approved), `skip` (rejected).
+
+### Domain Processing (Two-Stage)
+
+1. **Check stage** - HEAD request to verify domain is reachable
+2. **Crawl stage** - Full recursive crawl for feed discovery
+
+Domain status values: `hold` (pending), `pass` (approved), `skip` (rejected), `fail` (error).
+Domains starting with a digit (except 1440.news) are auto-skipped.
+Non-English feeds are auto-skipped.
 
 ## AT Protocol Integration
 
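The digit-prefix rule in the Domain Processing section can be sketched as a small status function. The name `initialDomainStatus` is hypothetical. Treating the `1440.news` exception as an exact host match, and treating empty hosts as `hold`, are assumptions; the diff does not say how subdomains or empty input are handled.

```go
package main

import "fmt"

// directoryDomain is the one digit-leading domain the text exempts.
const directoryDomain = "1440.news"

// initialDomainStatus is a hypothetical helper: it returns "skip" for hosts
// that start with an ASCII digit (other than directoryDomain) and "hold",
// the pending status, for everything else.
func initialDomainStatus(host string) string {
	if host == "" || host == directoryDomain {
		return "hold"
	}
	if c := host[0]; c >= '0' && c <= '9' {
		return "skip"
	}
	return "hold"
}

func main() {
	for _, host := range []string{"example.com", "123movies.example", "1440.news"} {
		fmt.Printf("%-20s %s\n", host, initialDomainStatus(host))
	}
}
```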