Batch processes existing item descriptions to strip HTML tags,
decode HTML entities, and truncate to 300 characters. Runs in
batches of 1000 with progress output.
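
A minimal sketch of the per-row cleanup, assuming a hypothetical
cleanDescription helper (the commit doesn't show the implementation;
the batching and progress loop around it is omitted here):

```go
package main

import (
	"fmt"
	"html"
	"strings"
)

// cleanDescription strips HTML tags with a small state machine,
// decodes entities via html.UnescapeString, and truncates to 300
// characters (runes, so multi-byte text isn't split mid-character).
// Tag stripping is naive: a '>' inside an attribute value would end
// the tag early. Good enough for a sketch.
func cleanDescription(s string) string {
	var b strings.Builder
	inTag := false
	for _, r := range s {
		switch {
		case r == '<':
			inTag = true
		case r == '>':
			inTag = false
		case !inTag:
			b.WriteRune(r)
		}
	}
	out := strings.TrimSpace(html.UnescapeString(b.String()))
	if runes := []rune(out); len(runes) > 300 {
		out = string(runes[:300])
	}
	return out
}

func main() {
	fmt.Println(cleanDescription("<p>Hello &amp; welcome</p>")) // Hello & welcome
}
```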
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove classifyFeed and classifyFeedByTitle functions
- Remove Category from Feed struct
- Remove category from all SQL queries and scans
- Add migration to drop the category column from the database (sketch below)
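
The drop might look like a plain ALTER TABLE run through pgx; a
sketch, assuming the table is named feeds and no dedicated migration
framework:

```go
package migrations

import (
	"context"

	"github.com/jackc/pgx/v5/pgxpool"
)

// DropCategory removes the category column. IF EXISTS makes the
// migration idempotent, so re-running it against an already-migrated
// database is a no-op.
func DropCategory(ctx context.Context, pool *pgxpool.Pool) error {
	_, err := pool.Exec(ctx, `ALTER TABLE feeds DROP COLUMN IF EXISTS category`)
	return err
}
```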
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove domain_host and domain_tld columns from feeds schema
- Add migrations to drop columns and related index/FK constraint
- Update all feed queries and structs to not include these columns
- Use URL pattern search instead of domain columns for GetFeedsByHost (see the sketch after this list)
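
A sketch of the pattern-based lookup, with assumed table and column
names:

```go
package feeds

import (
	"context"

	"github.com/jackc/pgx/v5/pgxpool"
)

// GetFeedsByHost falls back to a LIKE search on the stored URL now
// that domain_host is gone. Note: this exact pattern misses bare
// "https://host" URLs without a trailing slash and also matches
// subdomain-less substrings only; the real query may differ.
func GetFeedsByHost(ctx context.Context, pool *pgxpool.Pool, host string) ([]string, error) {
	rows, err := pool.Query(ctx,
		`SELECT url FROM feeds WHERE url LIKE $1`,
		"%://"+host+"/%")
	if err != nil {
		return nil, err
	}
	defer rows.Close()
	var urls []string
	for rows.Next() {
		var u string
		if err := rows.Scan(&u); err != nil {
			return nil, err
		}
		urls = append(urls, u)
	}
	return urls, rows.Err()
}
```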
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove DiscoveredAt field from Feed and Item structs
- Remove the field from all SQL queries
- Remove it from schema definitions
- Add migrations to drop the columns
- Remove unused 'now' variable declarations
The column wasn't providing value: all feeds had the same timestamp
from the bulk import, and items weren't using it in any logic.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Standalone tool that uses a pgx connection pool to import feeds from TSV.
Handles special characters in the password via the key=value connection
string format.
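
A sketch of the connection handling, assuming hypothetical PG*
environment variables. The key=value DSN form avoids the
percent-encoding a URL-style DSN would require for the password; only
single quotes and backslashes need escaping:

```go
package main

import (
	"context"
	"fmt"
	"log"
	"os"
	"strings"

	"github.com/jackc/pgx/v5/pgxpool"
)

// quoteDSNValue wraps a value for the key=value DSN format:
// backslashes and single quotes are backslash-escaped, and the whole
// value is single-quoted so spaces and URL-hostile characters pass
// through literally.
func quoteDSNValue(v string) string {
	v = strings.ReplaceAll(v, `\`, `\\`)
	v = strings.ReplaceAll(v, `'`, `\'`)
	return "'" + v + "'"
}

func main() {
	ctx := context.Background()
	// Hypothetical configuration; the real tool's flags aren't shown.
	dsn := fmt.Sprintf("host=%s dbname=%s user=%s password=%s",
		os.Getenv("PGHOST"), os.Getenv("PGDATABASE"),
		os.Getenv("PGUSER"), quoteDSNValue(os.Getenv("PGPASSWORD")))
	pool, err := pgxpool.New(ctx, dsn)
	if err != nil {
		log.Fatal(err)
	}
	defer pool.Close()
	// ... read the TSV and insert the feeds ...
}
```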
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Reduces the URL list from ~2B to ~2-3M by keeping only URLs containing:
rss, feed, atom, xml, syndication, frontpage, newest, etc.
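
A sketch of the filter as a streaming predicate; the keyword list is
taken from the message, and "etc." implies more terms than shown:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

var keywords = []string{"rss", "feed", "atom", "xml", "syndication", "frontpage", "newest"}

func looksLikeFeedURL(u string) bool {
	u = strings.ToLower(u)
	for _, k := range keywords {
		if strings.Contains(u, k) {
			return true
		}
	}
	return false
}

func main() {
	// Stream stdin to stdout, keeping only candidate feed URLs, so
	// the ~2B-line input never has to fit in memory.
	sc := bufio.NewScanner(os.Stdin)
	sc.Buffer(make([]byte, 0, 1024*1024), 1024*1024) // allow long lines
	for sc.Scan() {
		if looksLikeFeedURL(sc.Text()) {
			fmt.Println(sc.Text())
		}
	}
}
```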
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- New 'html' type for feeds served with a text/html MIME type
- feed_check content-sniffs html feeds and updates the type to rss/atom/json (sketch after this list)
- If the content sniff returns unknown, marks the feed as IGNORE
- Add cmd/extract-html tool to query local parquet files for text/html feeds
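
A sketch of the content sniff with a hypothetical sniffFeedType
helper; the actual feed_check logic isn't shown and likely handles
more edge cases (comments, DOCTYPEs) than this:

```go
package main

import (
	"bytes"
	"fmt"
)

// sniffFeedType inspects the body of a feed served as text/html and
// guesses the real format from its root element. Returns "" when
// unknown, in which case the caller marks the feed IGNORE.
func sniffFeedType(body []byte) string {
	b := bytes.TrimSpace(body)
	// Skip an XML declaration like <?xml version="1.0"?> if present.
	if bytes.HasPrefix(b, []byte("<?xml")) {
		if i := bytes.IndexByte(b, '>'); i >= 0 {
			b = bytes.TrimSpace(b[i+1:])
		}
	}
	switch {
	case bytes.HasPrefix(b, []byte("<rss")) || bytes.HasPrefix(b, []byte("<rdf:RDF")):
		return "rss"
	case bytes.HasPrefix(b, []byte("<feed")):
		return "atom"
	case bytes.HasPrefix(b, []byte("{")):
		return "json" // JSON Feed documents start with an object
	default:
		return "" // unknown -> IGNORE
	}
}

func main() {
	fmt.Println(sniffFeedType([]byte(`<?xml version="1.0"?><rss version="2.0"></rss>`))) // rss
}
```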
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>