# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Note: Always run applications in containers via `docker compose up -d --build` when possible. This ensures proper networking between services (database, traefik, etc.) and matches the production environment.
## Build & Run
```sh
go build -o 1440.news .   # Build
./1440.news               # Run (starts dashboard at http://localhost:4321)
go fmt ./...              # Format
go vet ./...              # Static analysis
```
## Database Setup
Requires PostgreSQL. Start the database first:
```sh
cd ../postgres && docker compose up -d
```
## Environment Variables
Set via environment or create a `.env` file:

```
# Database connection (individual vars)
DB_HOST=atproto-postgres   # Default: atproto-postgres
DB_PORT=5432               # Default: 5432
DB_USER=news_1440          # Default: news_1440
DB_PASSWORD=<password>     # Or use DB_PASSWORD_FILE
DB_NAME=news_1440          # Default: news_1440

# Or use a connection string
DATABASE_URL=postgres://news_1440:password@atproto-postgres:5432/news_1440?sslmode=disable
```
For Docker, use `DB_PASSWORD_FILE=/run/secrets/db_password` with Docker secrets.
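As a sketch, the precedence between `DATABASE_URL` and the individual variables could be resolved like this (hypothetical helper; the actual loading is in db.go, and the documented defaults are omitted for brevity):

```go
// Hypothetical helper showing config precedence; real loading is in db.go.
package main

import (
	"fmt"
	"os"
	"strings"
)

func connString() string {
	// A full DATABASE_URL wins over the individual variables.
	if url := os.Getenv("DATABASE_URL"); url != "" {
		return url
	}
	// DB_PASSWORD_FILE (Docker secrets) overrides DB_PASSWORD.
	pw := os.Getenv("DB_PASSWORD")
	if file := os.Getenv("DB_PASSWORD_FILE"); file != "" {
		if b, err := os.ReadFile(file); err == nil {
			pw = strings.TrimSpace(string(b))
		}
	}
	return fmt.Sprintf("postgres://%s:%s@%s:%s/%s?sslmode=disable",
		os.Getenv("DB_USER"), pw,
		os.Getenv("DB_HOST"), os.Getenv("DB_PORT"), os.Getenv("DB_NAME"))
}
```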
Requires `vertices.txt.gz` (Common Crawl domain list) in the working directory.
## Architecture
Multi-file Go application that crawls websites for RSS/Atom feeds, stores them in PostgreSQL, and provides a web dashboard.
### Concurrent Loops (main.go)
The application runs seven independent goroutine loops:
- Import loop - Reads `vertices.txt.gz` and inserts domains into DB in batches of 100 (status='pass')
- Domain check loop - HEAD requests to verify approved domains are reachable
- Crawl loop - Worker pool crawls verified domains for feed discovery
- Feed check loop - Worker pool re-checks known feeds for updates (conditional HTTP)
- Stats loop - Updates cached dashboard statistics every minute
- Cleanup loop - Removes items older than 12 months (weekly)
- Publish loop - Autopublishes items from approved feeds to AT Protocol PDS
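A minimal sketch of the shared ticker pattern behind these loops (names and intervals here are illustrative; the real loops live in main.go and carry DB handles and config):

```go
// Illustrative ticker loop; names, intervals, and bodies are examples only.
package main

import (
	"context"
	"log"
	"time"
)

func runLoop(ctx context.Context, name string, every time.Duration, work func(context.Context)) {
	t := time.NewTicker(every)
	defer t.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-t.C:
			log.Printf("running %s loop", name)
			work(ctx)
		}
	}
}

func main() {
	ctx := context.Background()
	go runLoop(ctx, "stats", time.Minute, func(context.Context) { /* refresh cached stats */ })
	go runLoop(ctx, "cleanup", 7*24*time.Hour, func(context.Context) { /* drop items older than 12 months */ })
	select {} // the real main also starts the dashboard HTTP server
}
```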
### File Structure
| File | Purpose |
|---|---|
| `crawler.go` | Crawler struct, worker pools, page fetching, recursive crawl logic |
| `domain.go` | Domain struct, DB operations, vertices file import |
| `feed.go` | Feed/Item structs, DB operations, feed checking with HTTP caching |
| `parser.go` | RSS/Atom XML parsing, date parsing, next-crawl calculation |
| `html.go` | HTML parsing: feed link extraction, anchor feed detection |
| `util.go` | URL normalization, host utilities, TLD extraction |
| `db.go` | PostgreSQL schema (domains, feeds, items tables with tsvector FTS) |
| `dashboard.go` | HTTP server, JSON APIs, HTML template |
| `publisher.go` | AT Protocol PDS integration for posting items |
| `oauth.go` | OAuth 2.0 client wrapper for AT Protocol authentication |
| `oauth_session.go` | Session management with AES-256-GCM encrypted cookies |
| `oauth_middleware.go` | RequireAuth middleware for protecting routes |
| `oauth_handlers.go` | OAuth HTTP endpoints (login, callback, logout, metadata) |
| `routes.go` | HTTP route registration with auth middleware |
## Database Schema
PostgreSQL with pgx driver, using connection pooling:
- `domains` - Hosts to crawl (status: hold/pass/skip/fail)
- `feeds` - Discovered RSS/Atom feeds with metadata and cache headers (publish_status: hold/pass/skip)
- `items` - Individual feed entries (`guid` + `feed_url` unique)
- `search_vector` - GENERATED tsvector columns for full-text search (GIN indexed)

Column naming: snake_case (e.g., `source_host`, `pub_date`, `item_count`)
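A hedged example of querying that index with pgx v5 (the `title`/`link` column names and the `english` text-search config are assumptions):

```go
// Hedged sketch: full-text search over items via the GIN-indexed column.
package main

import (
	"context"
	"fmt"

	"github.com/jackc/pgx/v5/pgxpool"
)

func searchItems(ctx context.Context, db *pgxpool.Pool, query string) error {
	rows, err := db.Query(ctx, `
		SELECT title, link                    -- assumed column names
		FROM items
		WHERE search_vector @@ websearch_to_tsquery('english', $1)
		ORDER BY pub_date DESC
		LIMIT 20`, query)
	if err != nil {
		return err
	}
	defer rows.Close()
	for rows.Next() {
		var title, link string
		if err := rows.Scan(&title, &link); err != nil {
			return err
		}
		fmt.Println(title, link)
	}
	return rows.Err()
}
```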
## Crawl Logic
- Domains import as `pass` by default (auto-crawled)
- Check stage: HEAD request verifies domain is reachable, sets last_checked_at
- Crawl stage: Full recursive crawl (HTTPS, fallback HTTP)
- Recursive crawl up to MaxDepth=10, MaxPagesPerHost=10
- Extract `<link rel="alternate">` and anchor hrefs containing rss/atom/feed
- Parse discovered feeds for metadata, save with next_crawl_at
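For illustration, `<link rel="alternate">` extraction with golang.org/x/net/html might look like the sketch below (the real logic, including anchor-href detection, is in html.go):

```go
// Sketch of feed autodiscovery via <link rel="alternate"> tags.
// Anchor-href detection from the list above is omitted here.
package main

import (
	"io"
	"strings"

	"golang.org/x/net/html"
)

func feedLinks(r io.Reader) ([]string, error) {
	doc, err := html.Parse(r)
	if err != nil {
		return nil, err
	}
	var found []string
	var walk func(*html.Node)
	walk = func(n *html.Node) {
		if n.Type == html.ElementNode && n.Data == "link" {
			var rel, typ, href string
			for _, a := range n.Attr {
				switch a.Key {
				case "rel":
					rel = a.Val
				case "type":
					typ = a.Val
				case "href":
					href = a.Val
				}
			}
			// Matches application/rss+xml and application/atom+xml.
			if rel == "alternate" && strings.Contains(typ, "xml") && href != "" {
				found = append(found, href)
			}
		}
		for c := n.FirstChild; c != nil; c = c.NextSibling {
			walk(c)
		}
	}
	walk(doc)
	return found, nil
}
```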
## Feed Checking
Uses conditional HTTP (ETag, If-Modified-Since). Adaptive backoff: base 100s + 100s per consecutive unchanged check. Respects RSS `<ttl>` and Syndication namespace hints.
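A sketch of such a conditional fetch (assuming the feed's previous ETag and Last-Modified values were stored):

```go
// Sketch of a conditional re-check using stored validators.
package main

import "net/http"

func feedChanged(feedURL, etag, lastModified string) (bool, *http.Response, error) {
	req, err := http.NewRequest(http.MethodGet, feedURL, nil)
	if err != nil {
		return false, nil, err
	}
	if etag != "" {
		req.Header.Set("If-None-Match", etag)
	}
	if lastModified != "" {
		req.Header.Set("If-Modified-Since", lastModified)
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return false, nil, err
	}
	if resp.StatusCode == http.StatusNotModified {
		resp.Body.Close()
		// Unchanged: the next check backs off by another 100s.
		return false, nil, nil
	}
	// Changed: caller parses the body and stores the new
	// ETag / Last-Modified headers for the next check.
	return true, resp, nil
}
```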
## Publishing
Feeds with `publish_status = 'pass'` have their items automatically posted to AT Protocol.
Status values: `hold` (default/pending review), `pass` (approved), `skip` (rejected).
## Domain Processing (Two-Stage)
- Check stage - HEAD request to verify domain is reachable
- Crawl stage - Full recursive crawl for feed discovery
Domain status values:
- `pass` (default on import) - Domain is crawled and checked automatically
- `hold` (manual) - Pauses crawling, keeps existing feeds and items
- `skip` (manual) - Takes down PDS accounts (hides posts), marks feeds inactive, preserves all data
- `drop` (manual, via button) - Permanently deletes all feeds, items, and PDS accounts (requires skip first)
- `fail` (automatic) - Set when check/crawl fails, keeps existing feeds and items
Skip vs Drop:
- `skip` is reversible - use "un-skip" to restore accounts and resume publishing
- `drop` is permanent - all data is deleted and cannot be recovered

Auto-skip patterns (imported as `skip`): bare TLDs, domains starting with a digit, domains starting with a letter-dash. Non-English feeds are auto-skipped.
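The auto-skip rules could be expressed roughly like this (patterns inferred from the description above; the real rules live in the import code):

```go
// Patterns inferred from the auto-skip description; illustrative only.
package main

import "regexp"

var autoSkip = []*regexp.Regexp{
	regexp.MustCompile(`^[a-z]+$`), // no dot at all, i.e. a bare TLD
	regexp.MustCompile(`^[0-9]`),   // starts with a digit
	regexp.MustCompile(`^[a-z]-`),  // starts with letter-dash
}

func importStatus(host string) string {
	for _, re := range autoSkip {
		if re.MatchString(host) {
			return "skip"
		}
	}
	return "pass" // default on import
}
```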
## AT Protocol Integration
Domain: 1440.news
User structure:
- `wehrv.1440.news` - Owner/admin account
- `{domain}.1440.news` - Catch-all feed per source (e.g., `wsj.com.1440.news`)
- `{category}.{domain}.1440.news` - Category-specific feeds (future)
PDS configuration in `pds.env`:

```
PDS_HOST=https://pds.1440.news
PDS_ADMIN_PASSWORD=<admin_password>
```
## Dashboard Authentication
The dashboard is protected by AT Protocol OAuth 2.0. Only the @1440.news handle can access it.
### OAuth Setup
- Generate configuration:

  ```sh
  go run ./cmd/genkey
  ```

- Create `oauth.env` with the generated values:

  ```
  OAUTH_COOKIE_SECRET=<generated_hex_string>
  OAUTH_PRIVATE_JWK=<generated_jwk_json>
  ```

- Optionally set the base URL (defaults to https://app.1440.news):

  ```
  OAUTH_BASE_URL=https://app.1440.news
  ```
### OAuth Flow
- User navigates to `/dashboard` -> redirected to `/auth/login`
- User enters their Bluesky handle
- User is redirected to Bluesky authorization
- After approval, callback verifies handle is `1440.news`
- Session cookie is set, user redirected to dashboard
### OAuth Endpoints
- `/.well-known/oauth-client-metadata` - Client metadata (public)
- `/.well-known/jwks.json` - Public JWK set (public)
- `/auth/login` - Login page / initiates OAuth flow
- `/auth/callback` - OAuth callback handler
- `/auth/logout` - Clears session
- `/auth/session` - Returns current session info (JSON)
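A hedged sketch of how RequireAuth guards the dashboard route (handler and helper names are illustrative; see oauth_middleware.go and routes.go for the real versions):

```go
// Illustrative middleware wiring; the real code is in oauth_middleware.go.
package main

import "net/http"

type session struct{ Handle string }

// sessionFromCookie stands in for the real helper in oauth_session.go,
// which decrypts the AES-256-GCM cookie and checks expiry.
func sessionFromCookie(r *http.Request) (*session, error) {
	return nil, http.ErrNoCookie // stub
}

func RequireAuth(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if _, err := sessionFromCookie(r); err != nil {
			http.Redirect(w, r, "/auth/login", http.StatusFound)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func registerRoutes(mux *http.ServeMux, dashboard http.Handler) {
	mux.Handle("/dashboard", RequireAuth(dashboard)) // protected
	// /auth/* and /.well-known/* handlers are registered without RequireAuth.
}
```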
### Security Notes
- Tokens are stored server-side only (BFF pattern)
- Browser only receives encrypted session cookie (AES-256-GCM)
- Access restricted to a single handle (`1440.news`)
- Sessions expire after 24 hours
- Automatic token refresh when within 5 minutes of expiry
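For reference, sealing a cookie value with AES-256-GCM in Go looks roughly like this (illustrative sketch; the real implementation lives in oauth_session.go):

```go
// Minimal sketch of sealing a session cookie value with AES-256-GCM.
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/base64"
)

func seal(key, plaintext []byte) (string, error) {
	block, err := aes.NewCipher(key) // a 32-byte key selects AES-256
	if err != nil {
		return "", err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return "", err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return "", err
	}
	// Prepend the nonce so it can be split off again when opening.
	sealed := gcm.Seal(nonce, nonce, plaintext, nil)
	return base64.RawURLEncoding.EncodeToString(sealed), nil
}
```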