Commit Graph

8 Commits

Author SHA1 Message Date
primal
b515976fef Add clean-descriptions migration tool
Batch processes existing item descriptions to strip HTML tags,
decode HTML entities, and truncate to 300 characters. Processes
in batches of 1000 with progress output.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 23:43:58 -05:00
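The per-description cleanup described in b515976fef can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: the name `cleanDescription`, the regex-based tag stripping, and the whitespace collapsing are all assumptions, and the batching of 1000 rows against the database is left out.

```go
package main

import (
	"fmt"
	"html"
	"regexp"
	"strings"
)

// tagPattern matches anything that looks like an HTML tag. A regex is a
// simplification chosen for this sketch; the real tool may use a tokenizer.
var tagPattern = regexp.MustCompile(`<[^>]*>`)

// cleanDescription (hypothetical name) strips tags, decodes HTML entities,
// collapses runs of whitespace (an assumption, not stated in the commit),
// and truncates to at most maxChars characters, counted as runes.
func cleanDescription(s string, maxChars int) string {
	s = tagPattern.ReplaceAllString(s, " ")
	s = html.UnescapeString(s)
	s = strings.Join(strings.Fields(s), " ")
	if r := []rune(s); len(r) > maxChars {
		s = string(r[:maxChars])
	}
	return s
}

func main() {
	fmt.Println(cleanDescription("<p>Tom &amp; Jerry</p>", 300))
}
```

Truncating by runes rather than bytes avoids cutting a multi-byte UTF-8 character in half; whether the original tool counts characters or bytes is not stated.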
primal
288379804d Remove category field from feeds
- Remove classifyFeed and classifyFeedByTitle functions
- Remove Category from Feed struct
- Remove category from all SQL queries and scans
- Add migration to drop category column from database

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 20:37:26 -05:00
primal
82b40b9155 Remove domain_host/domain_tld columns from feeds table
- Remove domain_host and domain_tld columns from feeds schema
- Add migrations to drop columns and related index/FK constraint
- Update all feed queries and structs to not include these columns
- Use URL pattern search instead of domain columns for GetFeedsByHost

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 20:07:03 -05:00
primal
2c3fa5e104 Remove discovered_at column from feeds and items tables
- Remove DiscoveredAt field from Feed and Item structs
- Remove discovered_at from all SQL queries
- Remove discovered_at from schema definitions
- Add migrations to drop the columns
- Remove unused 'now' variable declarations

The column provided no value: all feeds had the same timestamp
from the bulk import, and items did not use it in any logic.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 19:07:20 -05:00
primal
1b0ff1b507 Add import-tsv tool for bulk importing TSV feed files
Standalone tool that uses pgx connection pool to import feeds from TSV.
Handles special characters in the password by using the key=value connection string format.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 15:25:29 -05:00
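One way to see why the key=value format helps with awkward passwords in 1b0ff1b507: in the libpq-style keyword/value syntax, a value can be wrapped in single quotes, with backslashes and single quotes escaped by a backslash, so characters such as `@`, `/`, or spaces need no URL encoding. The sketch below only builds such a string. The helper names `quoteDSNValue` and `buildDSN` and the sample credentials are hypothetical, and the resulting string is assumed to be handed to the pgx pool constructor.

```go
package main

import (
	"fmt"
	"strings"
)

// dsnEscaper escapes backslashes and single quotes, the two characters
// that must be escaped inside a single-quoted keyword/value setting.
var dsnEscaper = strings.NewReplacer(`\`, `\\`, `'`, `\'`)

// quoteDSNValue (hypothetical helper) wraps a value in single quotes so
// that spaces and other special characters survive parsing.
func quoteDSNValue(v string) string {
	return "'" + dsnEscaper.Replace(v) + "'"
}

// buildDSN (hypothetical helper) assembles a key=value connection string.
func buildDSN(host, user, password, dbname string) string {
	return fmt.Sprintf("host=%s user=%s password=%s dbname=%s",
		quoteDSNValue(host), quoteDSNValue(user),
		quoteDSNValue(password), quoteDSNValue(dbname))
}

func main() {
	// Sample values only; none of these come from the repository.
	fmt.Println(buildDSN("localhost", "feeds", `p@ss w'rd\`, "feeds"))
}
```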
primal
091fa8490b Filter text/html extraction by feed-like URL patterns
Reduces the extracted set from ~2B URLs to ~2-3M by keeping only URLs that contain:
rss, feed, atom, xml, syndication, frontpage, newest, etc.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 14:30:40 -05:00
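The filter in 091fa8490b amounts to a substring check against a list of feed-like hints. A rough sketch, assuming case-insensitive matching and using only the hints named in the commit message (the message ends in "etc.", so the real list is presumably longer); `looksLikeFeedURL` is a hypothetical name:

```go
package main

import (
	"fmt"
	"strings"
)

// feedHints holds the substrings listed in the commit message. The real
// list is elided there ("etc."), so this one is incomplete by design.
var feedHints = []string{
	"rss", "feed", "atom", "xml", "syndication", "frontpage", "newest",
}

// looksLikeFeedURL (hypothetical name) reports whether the URL contains
// any hint. Lowercasing first is an assumption of this sketch.
func looksLikeFeedURL(rawURL string) bool {
	lower := strings.ToLower(rawURL)
	for _, hint := range feedHints {
		if strings.Contains(lower, hint) {
			return true
		}
	}
	return false
}

func main() {
	for _, u := range []string{
		"https://example.com/blog/RSS.xml",
		"https://example.com/about",
	} {
		fmt.Println(u, looksLikeFeedURL(u))
	}
}
```

A plain substring match is deliberately loose: it will also keep URLs such as `/feedback`, trading some false positives for a cheap pass over billions of rows.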
primal
61ca7a4c7a Add html feed type for content-sniffed feeds
- Add new 'html' type for feeds served with the text/html MIME type
- feed_check content-sniffs html feeds and updates their type to rss/atom/json
- If the content sniff returns unknown, mark the feed as IGNORE
- Add cmd/extract-html tool to query local parquet files for text/html

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 14:18:26 -05:00
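The content-sniffing step in 61ca7a4c7a might look something like the sketch below. The markers it checks (`<rss`, `<rdf:RDF`, `<feed`, and the JSON Feed version URL), the 4096-byte window, and the name `sniffFeedType` are assumptions made for illustration; the commit does not say how feed_check actually decides.

```go
package main

import (
	"bytes"
	"fmt"
)

// sniffFeedType (hypothetical name) inspects the start of a response body
// and returns "rss", "atom", "json", or "unknown".
func sniffFeedType(body []byte) string {
	head := body
	if len(head) > 4096 { // only look at the first few KB (assumed limit)
		head = head[:4096]
	}
	head = bytes.TrimPrefix(head, []byte("\xef\xbb\xbf")) // drop UTF-8 BOM
	lower := bytes.ToLower(bytes.TrimSpace(head))

	switch {
	case bytes.HasPrefix(lower, []byte("{")) &&
		bytes.Contains(lower, []byte("jsonfeed.org/version")):
		return "json"
	case bytes.Contains(lower, []byte("<rss")) ||
		bytes.Contains(lower, []byte("<rdf:rdf")):
		return "rss"
	case bytes.Contains(lower, []byte("<feed")):
		return "atom"
	default:
		return "unknown"
	}
}

func main() {
	body := []byte(`<?xml version="1.0"?><rss version="2.0"></rss>`)
	kind := sniffFeedType(body)
	if kind == "unknown" {
		fmt.Println("mark feed as IGNORE")
		return
	}
	fmt.Println("update feed type to", kind)
}
```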
primal
8192bce301 Add AT Protocol OAuth 2.0 authentication for dashboard
- Implement full OAuth 2.0 with PKCE using haileyok/atproto-oauth-golang
- Backend For Frontend (BFF) pattern: tokens stored server-side only
- AES-256-GCM encrypted session cookies
- Auto token refresh when near expiry
- Restrict access to allowed handles (1440.news, wehrv.bsky.social)
- Add genkey utility for generating OAuth configuration
- Generic error messages to prevent handle enumeration
- Server-side logging of failed login attempts for security monitoring

New files:
- oauth.go: OAuth client wrapper and DID/handle resolution
- oauth_session.go: Session management with encrypted cookies
- oauth_middleware.go: RequireAuth middleware for route protection
- oauth_handlers.go: Login, callback, logout, metadata endpoints
- cmd/genkey/main.go: Generate OAuth secrets and JWK keypair
- oauth.env.example: Configuration template

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-30 15:16:51 -05:00
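As an illustration of the "AES-256-GCM encrypted session cookies" item in 8192bce301, here is a minimal sketch that uses only the Go standard library. It is not the code in oauth_session.go: the function names `sealCookie` and `openCookie`, the nonce-prefixed layout, and the base64url encoding are assumptions.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/base64"
	"errors"
	"fmt"
)

// newGCM builds an AES-GCM AEAD; a 32-byte key selects AES-256.
func newGCM(key []byte) (cipher.AEAD, error) {
	if len(key) != 32 {
		return nil, errors.New("key must be 32 bytes for AES-256")
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// sealCookie (hypothetical name) encrypts plaintext with a fresh random
// nonce and returns base64url(nonce || ciphertext || tag).
func sealCookie(key, plaintext []byte) (string, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return "", err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return "", err
	}
	sealed := gcm.Seal(nonce, nonce, plaintext, nil)
	return base64.RawURLEncoding.EncodeToString(sealed), nil
}

// openCookie (hypothetical name) reverses sealCookie and fails if the
// value was tampered with or encrypted under a different key.
func openCookie(key []byte, value string) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	raw, err := base64.RawURLEncoding.DecodeString(value)
	if err != nil {
		return nil, err
	}
	if len(raw) < gcm.NonceSize() {
		return nil, errors.New("cookie value too short")
	}
	nonce, ciphertext := raw[:gcm.NonceSize()], raw[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ciphertext, nil)
}

func main() {
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	value, err := sealCookie(key, []byte("session-id-placeholder"))
	if err != nil {
		panic(err)
	}
	plain, err := openCookie(key, value)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(plain))
}
```

Because GCM is authenticated, a modified cookie fails to decrypt instead of yielding garbage, which suits the BFF pattern where the browser holds only an opaque value.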