Commit Graph

18 Commits

Author SHA1 Message Date
primal
253e04a749 Strip HTML and truncate descriptions on input
Clean descriptions when parsing feeds rather than at publish time.
Descriptions are now stored as plain text, max 300 chars.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 22:59:21 -05:00
primal
70828bf05d Remove unused enclosure_length from items table
The enclosure length was never used when publishing to the PDS.
Added migration to drop the column.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 22:56:45 -05:00
primal
018c059924 Remove GUID from items, use Link as primary key
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 21:34:39 -05:00
primal
6314b934c1 Remove content column from items table - only description used for posts
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 21:14:29 -05:00
primal
8cf25a55dc Remove item_count column from feeds table - compute dynamically from items
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 21:00:23 -05:00
primal
be969c11db Remove site_url field from feeds
- Remove SiteURL from Feed struct
- Remove site_url from all SQL queries and scans
- Remove SiteURL parsing from RSS/Atom/JSON feed parsers
- Add migration to drop site_url column

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 20:44:35 -05:00
primal
288379804d Remove category field from feeds
- Remove classifyFeed and classifyFeedByTitle functions
- Remove Category from Feed struct
- Remove category from all SQL queries and scans
- Add migration to drop category column from database

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 20:37:26 -05:00
primal
037b453a68 Remove oldest_item_date and newest_item_date columns
- Drop columns from schema, add migration
- Remove date stats calculation from parsers
- Update all feed queries to exclude these columns

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 19:28:20 -05:00
primal
fec53f913c Remove last_build_date column from feeds schema
- Drop last_build_date from schema and add migration
- Remove parsing of RSS lastBuildDate and Atom updated date
- Update all feed SQL queries to exclude the column

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 19:16:30 -05:00
primal
2c3fa5e104 Remove discovered_at column from feeds and items tables
- Remove DiscoveredAt field from Feed and Item structs
- Remove from all SQL queries
- Remove from schema definitions
- Add migrations to drop the columns
- Remove unused 'now' variable declarations

The column wasn't providing value - all feeds had the same timestamp
from bulk import, and items weren't using it for any logic.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 19:07:20 -05:00
primal
e56bcd456d Remove next_check_at column from feeds table
- Remove NextCheckAt field from Feed struct
- Remove from all SQL queries (saveFeed, getFeed, scanFeeds, etc.)
- Remove from schema definition
- Add migration to drop the column if it exists
- Update calculateNextCheck to use MissCount field
- Update tld.go to use status column instead of feed_health

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 16:07:04 -05:00
primal
8a9001c02c Restore working codebase with all methods
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 19:08:53 -05:00
primal
be595cb403 v100
2026-01-30 22:35:08 -05:00
primal
3999e96f26 Dashboard UI overhaul: inline feed details, TLD filtering, status improvements
- Feed details now expand inline instead of navigating to new page
- Add TLD section headers with domains sorted by TLD then name
- Add TLD filter button to show/hide domain sections by TLD
- Feed status behavior: pass creates account, hold crawls only, skip stops, drop cleans up
- Auto-follow new accounts from directory account (1440.news)
- Fix handle derivation (removed duplicate .1440.news suffix)
- Increase domain import batch size to 100k
- Various bug fixes for account creation and profile updates

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 20:51:05 -05:00
primal
ad78c1a4c0 Add JSON Feed support
- Detect JSON Feed format (jsonfeed.org) via version field
- Parse JSON Feed metadata and items
- Support application/feed+json MIME type for feed discovery
- Include "json" as valid feed type (not auto-denied)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 13:16:50 -05:00
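Detection "via version field", as described in commit ad78c1a4c0, might be sketched in Go as follows. JSON Feed documents carry a top-level version string that begins with https://jsonfeed.org/version/. The function name isJSONFeed and the prefix check are assumptions, not the repository's confirmed implementation.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// jsonFeedVersionPrefix is the common prefix of the version URLs defined at
// jsonfeed.org (for example ".../version/1" and ".../version/1.1").
const jsonFeedVersionPrefix = "https://jsonfeed.org/version/"

// isJSONFeed is a hypothetical helper: it decodes only the top-level
// "version" field and reports whether it looks like a JSON Feed version URL.
func isJSONFeed(body []byte) bool {
	var probe struct {
		Version string `json:"version"`
	}
	if err := json.Unmarshal(body, &probe); err != nil {
		return false // not a JSON object, or version is not a string
	}
	return strings.HasPrefix(probe.Version, jsonFeedVersionPrefix)
}

func main() {
	body := []byte(`{"version":"https://jsonfeed.org/version/1.1","title":"Example","items":[]}`)
	fmt.Println(isJSONFeed(body))
}
```

Decoding into a one-field struct keeps the check cheap: the remaining fields are skipped until the document is known to be a feed.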
primal
75835d771d Add AT Protocol publishing, media support, and SQLite stability
Publishing:
- Add publisher.go for posting feed items to AT Protocol PDS
- Support deterministic rkeys from SHA256(guid + discoveredAt)
- Handle multiple URLs in posts with facets for each link
- Image embed support (app.bsky.embed.images) for up to 4 images
- External embed with thumbnail fallback
- Podcast/audio enclosure URLs included in post text

Media extraction:
- Parse RSS enclosures (audio, video, images)
- Extract Media RSS content and thumbnails
- Extract images from HTML content in descriptions
- Store enclosure and imageUrls in items table

SQLite stability improvements:
- Add synchronous=NORMAL and wal_autocheckpoint pragmas
- Connection pool tuning (idle conns, max lifetime)
- Periodic WAL checkpoint every 5 minutes
- Hourly integrity checks with PRAGMA quick_check
- Daily hot backup via VACUUM INTO
- Docker stop_grace_period: 30s for graceful shutdown

Dashboard:
- Feed publishing UI and API endpoints
- Account creation with invite codes

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-28 15:30:02 -05:00
primal
143807378f Add Docker support and refactor data layer
2026-01-26 16:02:05 -05:00
primal
219b49352e Add PebbleDB storage, domain tracking, and web dashboard
- Split main.go into separate files for better organization:
  crawler.go, domain.go, feed.go, parser.go, html.go, util.go
- Add PebbleDB for persistent storage of feeds and domains
- Store feeds with metadata: title, TTL, update frequency, ETag, etc.
- Track domains with crawl status (uncrawled/crawled/error)
- Normalize URLs by stripping scheme and www. prefix
- Add web dashboard on port 4321 with real-time stats:
  - Crawl progress with completion percentage
  - Feed counts by type (RSS/Atom)
  - Top TLDs and domains by feed count
  - Recent feeds table
- Filter out comment feeds from results

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-22 16:29:00 -05:00
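The URL normalization mentioned in commit 219b49352e (strip the scheme and the www. prefix) might look like the Go sketch below. The function name normalizeURL is hypothetical, and limiting the scheme handling to http and https, plus leaving the rest of the string's case untouched, are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// hasPrefixFold reports whether s starts with prefix, ignoring ASCII case.
func hasPrefixFold(s, prefix string) bool {
	return len(s) >= len(prefix) && strings.EqualFold(s[:len(prefix)], prefix)
}

// normalizeURL is a hypothetical helper: it trims surrounding whitespace,
// removes a leading http:// or https:// scheme, then removes a leading "www.".
func normalizeURL(raw string) string {
	s := strings.TrimSpace(raw)
	for _, scheme := range []string{"https://", "http://"} {
		if hasPrefixFold(s, scheme) {
			s = s[len(scheme):]
			break
		}
	}
	if hasPrefixFold(s, "www.") {
		s = s[len("www."):]
	}
	return s
}

func main() {
	fmt.Println(normalizeURL("https://www.example.com/feed")) // example.com/feed
}
```

With this form, http://example.com/feed and https://www.example.com/feed collapse to the same key, so a feed is stored once regardless of how it was linked.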