Migrate from SQLite to PostgreSQL

- Replace modernc.org/sqlite with jackc/pgx/v5
- Update all SQL queries for PostgreSQL syntax ($1, $2 placeholders)
- Use snake_case column names throughout
- Replace SQLite FTS5 with PostgreSQL tsvector/tsquery full-text search
- Add connection pooling with pgxpool
- Support Docker secrets for database password
- Add trigger to normalize feed URLs (strip https://, http://, www.)
- Fix anchor feed detection regex to avoid false positives
- Connect app container to atproto network for PostgreSQL access
- Add version indicator to dashboard UI

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in: primal
2026-01-28 20:38:13 -05:00
parent 75835d771d
commit f4afb29980
11 changed files with 1525 additions and 1137 deletions
+48 -14
@@ -11,20 +11,47 @@ go fmt ./... # Format
go vet ./... # Static analysis
```
### Database Setup
Requires PostgreSQL. Start the database first:
```bash
cd ../postgres && docker compose up -d
```
### Environment Variables
Set via environment or create a `.env` file:
```bash
# Database connection (individual vars)
DB_HOST=atproto-postgres # Default: atproto-postgres
DB_PORT=5432 # Default: 5432
DB_USER=news_1440 # Default: news_1440
DB_PASSWORD=<password> # Or use DB_PASSWORD_FILE
DB_NAME=news_1440 # Default: news_1440
# Or use a connection string
DATABASE_URL=postgres://news_1440:password@atproto-postgres:5432/news_1440?sslmode=disable
```
For Docker, use `DB_PASSWORD_FILE=/run/secrets/db_password` with Docker secrets.
Requires `vertices.txt.gz` (Common Crawl domain list) in the working directory.
## Architecture
Multi-file Go application that crawls websites for RSS/Atom feeds, stores them in SQLite, and provides a web dashboard.
Multi-file Go application that crawls websites for RSS/Atom feeds, stores them in PostgreSQL, and provides a web dashboard.
### Concurrent Loops (main.go)
The application runs five independent goroutine loops:
The application runs six independent goroutine loops:
- **Import loop** - Reads `vertices.txt.gz` and inserts domains into DB in 10k batches
- **Crawl loop** - Worker pool processes unchecked domains, discovers feeds
- **Check loop** - Worker pool re-checks known feeds for updates (conditional HTTP)
- **Stats loop** - Updates cached dashboard statistics every minute
- **Cleanup loop** - Removes items older than 12 months (weekly)
- **Publish loop** - Autopublishes items from approved feeds to AT Protocol PDS
### File Structure
@@ -36,16 +63,19 @@ The application runs five independent goroutine loops:
| `parser.go` | RSS/Atom XML parsing, date parsing, next-crawl calculation |
| `html.go` | HTML parsing: feed link extraction, anchor feed detection |
| `util.go` | URL normalization, host utilities, TLD extraction |
| `db.go` | SQLite schema (domains, feeds, items tables with FTS5) |
| `db.go` | PostgreSQL schema (domains, feeds, items tables with tsvector FTS) |
| `dashboard.go` | HTTP server, JSON APIs, HTML template |
| `publisher.go` | AT Protocol PDS integration for posting items |
### Database Schema
SQLite with WAL mode at `feeds/feeds.db`:
PostgreSQL with pgx driver, using connection pooling:
- **domains** - Hosts to crawl (status: unchecked/checked/error)
- **feeds** - Discovered RSS/Atom feeds with metadata and cache headers
- **items** - Individual feed entries (guid + feedUrl unique)
- **feeds_fts / items_fts** - FTS5 virtual tables for search
- **items** - Individual feed entries (guid + feed_url unique)
- **search_vector** - GENERATED tsvector columns for full-text search (GIN indexed)
Column naming: snake_case (e.g., `source_host`, `pub_date`, `item_count`)
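For example, a full-text search against the GIN-indexed `search_vector` columns might look like the following sketch (illustrative only, not part of this commit; `pool` and `ctx` stand in for the application's pgxpool handle and context):
```go
// Sketch: rank feeds by relevance using the generated search_vector column.
rows, err := pool.Query(ctx, `
    SELECT url, title
    FROM feeds
    WHERE search_vector @@ to_tsquery('english', $1)
    ORDER BY ts_rank(search_vector, to_tsquery('english', $1)) DESC
    LIMIT 20`, "climate & change")
if err != nil {
    return err
}
defer rows.Close()
```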
### Crawl Logic
@@ -53,13 +83,18 @@ SQLite with WAL mode at `feeds/feeds.db`:
2. Try HTTPS, fall back to HTTP
3. Recursive crawl up to MaxDepth=10, MaxPagesPerHost=10
4. Extract `<link rel="alternate">` and anchor hrefs containing rss/atom/feed
5. Parse discovered feeds for metadata, save with nextCrawlAt
5. Parse discovered feeds for metadata, save with next_crawl_at
### Feed Checking
Uses conditional HTTP (ETag, If-Modified-Since). Adaptive backoff: base 100s + 100s per consecutive no-change. Respects RSS `<ttl>` and Syndication namespace hints.
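A minimal sketch of that backoff (illustrative only; the function and parameter names are hypothetical, and the real next-crawl calculation lives in `parser.go`):
```go
// Sketch: the re-check delay grows by 100s per consecutive unchanged fetch;
// an RSS <ttl> hint wins when it asks for a longer wait.
func nextCheckDelay(consecutiveNoChange, ttlMinutes int) time.Duration {
    delay := 100*time.Second + time.Duration(consecutiveNoChange)*100*time.Second
    if ttl := time.Duration(ttlMinutes) * time.Minute; ttl > delay {
        delay = ttl
    }
    return delay
}
```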
## AT Protocol Integration (Planned)
### Publishing
Feeds with `publish_status = 'pass'` have their items automatically posted to AT Protocol.
Status values: `held` (default), `pass` (approved), `deny` (rejected).
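Flipping a feed to `pass` is a single update; the publish loop then picks up its backlog automatically (sketch, using the `DB.Exec` wrapper from `db.go`):
```go
// Sketch: approve a feed for publishing.
_, err := db.Exec(`UPDATE feeds SET publish_status = 'pass' WHERE url = $1`, feedURL)
```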
## AT Protocol Integration
Domain: 1440.news
@@ -68,9 +103,8 @@ User structure:
- `{domain}.1440.news` - Catch-all feed per source (e.g., `wsj.com.1440.news`)
- `{category}.{domain}.1440.news` - Category-specific feeds (future)
Phases:
1. Local PDS setup
2. Account management
3. Auto-create domain users
4. Post articles to accounts
5. Category detection
PDS configuration in `pds.env`:
```
PDS_HOST=https://pds.1440.news
PDS_ADMIN_PASSWORD=<admin_password>
```
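A sketch of how the catch-all handle could be derived from a feed URL under this scheme (`DeriveHandleFromFeed` is referenced by the crawler but not shown in this diff, so this is an assumption rather than its actual implementation):
```go
// Hypothetical sketch: "https://www.wsj.com/rss" -> "wsj.com.1440.news".
func deriveHandle(feedURL string) string {
    u := strings.TrimPrefix(strings.TrimPrefix(feedURL, "https://"), "http://")
    u = strings.TrimPrefix(u, "www.")
    host := strings.SplitN(u, "/", 2)[0]
    return host + ".1440.news"
}
```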
+233 -45
@@ -1,10 +1,10 @@
package main
import (
"database/sql"
"fmt"
"io"
"net/http"
"os"
"runtime"
"strings"
"sync"
@@ -25,17 +25,17 @@ type Crawler struct {
hostsProcessed int32
feedsChecked int32
startTime time.Time
db *sql.DB
db *DB
displayedCrawlRate int
displayedCheckRate int
domainsImported int32
cachedStats *DashboardStats
cachedAllDomains []DomainStat
statsMu sync.RWMutex
cachedStats *DashboardStats
cachedAllDomains []DomainStat
statsMu sync.RWMutex
}
func NewCrawler(dbPath string) (*Crawler, error) {
db, err := OpenDatabase(dbPath)
func NewCrawler(connString string) (*Crawler, error) {
db, err := OpenDatabase(connString)
if err != nil {
return nil, fmt.Errorf("failed to open database: %v", err)
}
@@ -61,12 +61,6 @@ func NewCrawler(dbPath string) (*Crawler, error) {
func (c *Crawler) Close() error {
if c.db != nil {
// Checkpoint WAL to merge it back into main database before closing
// This prevents corruption if the container is stopped mid-write
fmt.Println("Checkpointing WAL...")
if _, err := c.db.Exec("PRAGMA wal_checkpoint(TRUNCATE)"); err != nil {
fmt.Printf("WAL checkpoint warning: %v\n", err)
}
fmt.Println("Closing database...")
return c.db.Close()
}
@@ -95,53 +89,247 @@ func (c *Crawler) StartCleanupLoop() {
}
// StartMaintenanceLoop performs periodic database maintenance
// - WAL checkpoint every 5 minutes to prevent WAL bloat and reduce corruption risk
// - Quick integrity check every hour to detect issues early
// - Hot backup every 24 hours for recovery
func (c *Crawler) StartMaintenanceLoop() {
checkpointTicker := time.NewTicker(5 * time.Minute)
integrityTicker := time.NewTicker(1 * time.Hour)
backupTicker := time.NewTicker(24 * time.Hour)
defer checkpointTicker.Stop()
defer integrityTicker.Stop()
defer backupTicker.Stop()
vacuumTicker := time.NewTicker(24 * time.Hour)
analyzeTicker := time.NewTicker(1 * time.Hour)
defer vacuumTicker.Stop()
defer analyzeTicker.Stop()
for {
select {
case <-checkpointTicker.C:
// Passive checkpoint - doesn't block writers
if _, err := c.db.Exec("PRAGMA wal_checkpoint(PASSIVE)"); err != nil {
fmt.Printf("WAL checkpoint error: %v\n", err)
case <-analyzeTicker.C:
// Update statistics for query planner
if _, err := c.db.Exec("ANALYZE"); err != nil {
fmt.Printf("ANALYZE error: %v\n", err)
}
case <-integrityTicker.C:
// Quick check is faster than full integrity_check
var result string
if err := c.db.QueryRow("PRAGMA quick_check").Scan(&result); err != nil {
fmt.Printf("Integrity check error: %v\n", err)
} else if result != "ok" {
fmt.Printf("WARNING: Database integrity issue detected: %s\n", result)
case <-vacuumTicker.C:
// Reclaim dead tuple space (VACUUM is lighter than VACUUM FULL)
fmt.Println("Running VACUUM...")
if _, err := c.db.Exec("VACUUM"); err != nil {
fmt.Printf("VACUUM error: %v\n", err)
} else {
fmt.Println("VACUUM complete")
}
case <-backupTicker.C:
c.createBackup()
}
}
}
// createBackup creates a hot backup of the database using SQLite's backup API
func (c *Crawler) createBackup() {
backupPath := "feeds/feeds.db.backup"
fmt.Println("Creating database backup...")
// StartPublishLoop automatically publishes unpublished items for approved feeds
// Grabs up to 50 items sorted by discovered_at, publishes one per second, then reloops
func (c *Crawler) StartPublishLoop() {
// Load PDS credentials from environment or pds.env file
pdsHost := os.Getenv("PDS_HOST")
pdsAdminPassword := os.Getenv("PDS_ADMIN_PASSWORD")
// Use SQLite's online backup via VACUUM INTO (available in SQLite 3.27+)
// This creates a consistent snapshot without blocking writers
if _, err := c.db.Exec("VACUUM INTO ?", backupPath); err != nil {
fmt.Printf("Backup error: %v\n", err)
if pdsHost == "" || pdsAdminPassword == "" {
if data, err := os.ReadFile("pds.env"); err == nil {
for _, line := range strings.Split(string(data), "\n") {
line = strings.TrimSpace(line)
if strings.HasPrefix(line, "#") || line == "" {
continue
}
parts := strings.SplitN(line, "=", 2)
if len(parts) == 2 {
key := strings.TrimSpace(parts[0])
value := strings.TrimSpace(parts[1])
switch key {
case "PDS_HOST":
pdsHost = value
case "PDS_ADMIN_PASSWORD":
pdsAdminPassword = value
}
}
}
}
}
if pdsHost == "" || pdsAdminPassword == "" {
fmt.Println("Publish loop: PDS credentials not configured, skipping")
return
}
fmt.Printf("Backup created: %s\n", backupPath)
fmt.Printf("Publish loop: starting with PDS %s\n", pdsHost)
feedPassword := "feed1440!"
// Cache sessions per account
sessions := make(map[string]*PDSSession)
publisher := NewPublisher(pdsHost)
for {
// Get up to 50 unpublished items from approved feeds, sorted by discovered_at ASC
items, err := c.GetAllUnpublishedItems(50)
if err != nil {
fmt.Printf("Publish loop error: %v\n", err)
time.Sleep(1 * time.Second)
continue
}
if len(items) == 0 {
time.Sleep(1 * time.Second)
continue
}
// Publish one item per second
for _, item := range items {
// Get or create session for this feed's account
account := c.getAccountForFeed(item.FeedURL)
if account == "" {
time.Sleep(1 * time.Second)
continue
}
session, ok := sessions[account]
if !ok {
// Try to log in
session, err = publisher.CreateSession(account, feedPassword)
if err != nil {
// Account might not exist - try to create it
inviteCode, err := publisher.CreateInviteCode(pdsAdminPassword, 1)
if err != nil {
fmt.Printf("Publish: failed to create invite for %s: %v\n", account, err)
time.Sleep(1 * time.Second)
continue
}
email := account + "@1440.news"
session, err = publisher.CreateAccount(account, email, feedPassword, inviteCode)
if err != nil {
fmt.Printf("Publish: failed to create account %s: %v\n", account, err)
time.Sleep(1 * time.Second)
continue
}
fmt.Printf("Publish: created account %s\n", account)
c.db.Exec("UPDATE feeds SET publish_account = $1 WHERE url = $2", account, item.FeedURL)
// Set up profile for new account
feedInfo := c.getFeedInfo(item.FeedURL)
if feedInfo != nil {
displayName := feedInfo.Title
if displayName == "" {
displayName = account
}
description := feedInfo.Description
if description == "" {
description = "News feed via 1440.news"
}
// Truncate if needed
if len(displayName) > 64 {
displayName = displayName[:61] + "..."
}
if len(description) > 256 {
description = description[:253] + "..."
}
if err := publisher.UpdateProfile(session, displayName, description, nil); err != nil {
fmt.Printf("Publish: failed to set profile for %s: %v\n", account, err)
} else {
fmt.Printf("Publish: set profile for %s\n", account)
}
}
}
sessions[account] = session
}
// Publish the item
uri, err := publisher.PublishItem(session, &item)
if err != nil {
fmt.Printf("Publish: failed item %d: %v\n", item.ID, err)
// Clear session cache on auth errors
if strings.Contains(err.Error(), "401") || strings.Contains(err.Error(), "auth") {
delete(sessions, account)
}
} else {
c.MarkItemPublished(item.ID, uri)
fmt.Printf("Publish: %s -> %s\n", item.Title[:min(40, len(item.Title))], account)
}
time.Sleep(1 * time.Second)
}
time.Sleep(1 * time.Second)
}
}
// getAccountForFeed returns the publish account for a feed URL
func (c *Crawler) getAccountForFeed(feedURL string) string {
var account *string
err := c.db.QueryRow(`
SELECT publish_account FROM feeds
WHERE url = $1 AND publish_status = 'pass' AND status = 'active'
`, feedURL).Scan(&account)
if err != nil || account == nil || *account == "" {
// Derive handle from feed URL
return DeriveHandleFromFeed(feedURL)
}
return *account
}
// FeedInfo holds basic feed metadata for profile setup
type FeedInfo struct {
Title string
Description string
SiteURL string
}
// getFeedInfo returns feed metadata for profile setup
func (c *Crawler) getFeedInfo(feedURL string) *FeedInfo {
var title, description, siteURL *string
err := c.db.QueryRow(`
SELECT title, description, site_url FROM feeds WHERE url = $1
`, feedURL).Scan(&title, &description, &siteURL)
if err != nil {
return nil
}
return &FeedInfo{
Title: StringValue(title),
Description: StringValue(description),
SiteURL: StringValue(siteURL),
}
}
// GetAllUnpublishedItems returns unpublished items from all approved feeds
func (c *Crawler) GetAllUnpublishedItems(limit int) ([]Item, error) {
rows, err := c.db.Query(`
SELECT i.id, i.feed_url, i.guid, i.title, i.link, i.description, i.content,
i.author, i.pub_date, i.discovered_at
FROM items i
JOIN feeds f ON i.feed_url = f.url
WHERE f.publish_status = 'pass'
AND f.status = 'active'
AND i.published_at IS NULL
ORDER BY i.discovered_at ASC
LIMIT $1
`, limit)
if err != nil {
return nil, err
}
defer rows.Close()
var items []Item
for rows.Next() {
var item Item
var guid, title, link, description, content, author *string
var pubDate, discoveredAt *time.Time
err := rows.Scan(&item.ID, &item.FeedURL, &guid, &title, &link, &description,
&content, &author, &pubDate, &discoveredAt)
if err != nil {
continue
}
item.GUID = StringValue(guid)
item.Title = StringValue(title)
item.Link = StringValue(link)
item.Description = StringValue(description)
item.Content = StringValue(content)
item.Author = StringValue(author)
item.PubDate = TimeValue(pubDate)
item.DiscoveredAt = TimeValue(discoveredAt)
items = append(items, item)
}
return items, nil
}
// StartCrawlLoop runs the domain crawling loop independently
+417 -353
File diff suppressed because it is too large
+222 -140
@@ -1,27 +1,31 @@
package main
import (
"database/sql"
"context"
"fmt"
"net/url"
"os"
"strings"
"time"
_ "modernc.org/sqlite"
"github.com/jackc/pgx/v5"
"github.com/jackc/pgx/v5/pgxpool"
)
const schema = `
CREATE TABLE IF NOT EXISTS domains (
host TEXT PRIMARY KEY,
status TEXT NOT NULL DEFAULT 'unchecked',
discoveredAt DATETIME NOT NULL,
lastCrawledAt DATETIME,
feedsFound INTEGER DEFAULT 0,
lastError TEXT,
discovered_at TIMESTAMPTZ NOT NULL,
last_crawled_at TIMESTAMPTZ,
feeds_found INTEGER DEFAULT 0,
last_error TEXT,
tld TEXT
);
CREATE INDEX IF NOT EXISTS idx_domains_status ON domains(status);
CREATE INDEX IF NOT EXISTS idx_domains_tld ON domains(tld);
CREATE INDEX IF NOT EXISTS idx_domains_feedsFound ON domains(feedsFound DESC) WHERE feedsFound > 0;
CREATE INDEX IF NOT EXISTS idx_domains_feeds_found ON domains(feeds_found DESC) WHERE feeds_found > 0;
CREATE TABLE IF NOT EXISTS feeds (
url TEXT PRIMARY KEY,
@@ -30,196 +34,195 @@ CREATE TABLE IF NOT EXISTS feeds (
title TEXT,
description TEXT,
language TEXT,
siteUrl TEXT,
site_url TEXT,
discoveredAt DATETIME NOT NULL,
lastCrawledAt DATETIME,
nextCrawlAt DATETIME,
lastBuildDate DATETIME,
discovered_at TIMESTAMPTZ NOT NULL,
last_crawled_at TIMESTAMPTZ,
next_crawl_at TIMESTAMPTZ,
last_build_date TIMESTAMPTZ,
etag TEXT,
lastModified TEXT,
last_modified TEXT,
ttlMinutes INTEGER,
updatePeriod TEXT,
updateFreq INTEGER,
ttl_minutes INTEGER,
update_period TEXT,
update_freq INTEGER,
status TEXT DEFAULT 'active',
errorCount INTEGER DEFAULT 0,
lastError TEXT,
lastErrorAt DATETIME,
error_count INTEGER DEFAULT 0,
last_error TEXT,
last_error_at TIMESTAMPTZ,
sourceUrl TEXT,
sourceHost TEXT,
source_url TEXT,
source_host TEXT,
tld TEXT,
itemCount INTEGER,
avgPostFreqHrs REAL,
oldestItemDate DATETIME,
newestItemDate DATETIME,
item_count INTEGER,
avg_post_freq_hrs DOUBLE PRECISION,
oldest_item_date TIMESTAMPTZ,
newest_item_date TIMESTAMPTZ,
noUpdate INTEGER DEFAULT 0,
no_update INTEGER DEFAULT 0,
-- Publishing to PDS
publishStatus TEXT DEFAULT 'held' CHECK(publishStatus IN ('held', 'pass', 'fail')),
publishAccount TEXT
publish_status TEXT DEFAULT 'held' CHECK(publish_status IN ('held', 'pass', 'deny')),
publish_account TEXT,
-- Full-text search vector
search_vector tsvector GENERATED ALWAYS AS (
setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
setweight(to_tsvector('english', coalesce(description, '')), 'B') ||
setweight(to_tsvector('english', coalesce(url, '')), 'C')
) STORED
);
CREATE INDEX IF NOT EXISTS idx_feeds_sourceHost ON feeds(sourceHost);
CREATE INDEX IF NOT EXISTS idx_feeds_publishStatus ON feeds(publishStatus);
CREATE INDEX IF NOT EXISTS idx_feeds_sourceHost_url ON feeds(sourceHost, url);
CREATE INDEX IF NOT EXISTS idx_feeds_source_host ON feeds(source_host);
CREATE INDEX IF NOT EXISTS idx_feeds_publish_status ON feeds(publish_status);
CREATE INDEX IF NOT EXISTS idx_feeds_source_host_url ON feeds(source_host, url);
CREATE INDEX IF NOT EXISTS idx_feeds_tld ON feeds(tld);
CREATE INDEX IF NOT EXISTS idx_feeds_tld_sourceHost ON feeds(tld, sourceHost);
CREATE INDEX IF NOT EXISTS idx_feeds_tld_source_host ON feeds(tld, source_host);
CREATE INDEX IF NOT EXISTS idx_feeds_type ON feeds(type);
CREATE INDEX IF NOT EXISTS idx_feeds_category ON feeds(category);
CREATE INDEX IF NOT EXISTS idx_feeds_status ON feeds(status);
CREATE INDEX IF NOT EXISTS idx_feeds_discoveredAt ON feeds(discoveredAt);
CREATE INDEX IF NOT EXISTS idx_feeds_discovered_at ON feeds(discovered_at);
CREATE INDEX IF NOT EXISTS idx_feeds_title ON feeds(title);
CREATE INDEX IF NOT EXISTS idx_feeds_search ON feeds USING GIN(search_vector);
CREATE TABLE IF NOT EXISTS items (
id INTEGER PRIMARY KEY AUTOINCREMENT,
feedUrl TEXT NOT NULL,
id BIGSERIAL PRIMARY KEY,
feed_url TEXT NOT NULL,
guid TEXT,
title TEXT,
link TEXT,
description TEXT,
content TEXT,
author TEXT,
pubDate DATETIME,
discoveredAt DATETIME NOT NULL,
updatedAt DATETIME,
pub_date TIMESTAMPTZ,
discovered_at TIMESTAMPTZ NOT NULL,
updated_at TIMESTAMPTZ,
-- Media attachments
enclosureUrl TEXT,
enclosureType TEXT,
enclosureLength INTEGER,
imageUrls TEXT, -- JSON array of image URLs
enclosure_url TEXT,
enclosure_type TEXT,
enclosure_length BIGINT,
image_urls TEXT, -- JSON array of image URLs
-- Publishing to PDS
publishedAt DATETIME,
publishedUri TEXT,
published_at TIMESTAMPTZ,
published_uri TEXT,
UNIQUE(feedUrl, guid)
-- Full-text search vector
search_vector tsvector GENERATED ALWAYS AS (
setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
setweight(to_tsvector('english', coalesce(description, '')), 'B') ||
setweight(to_tsvector('english', coalesce(content, '')), 'C') ||
setweight(to_tsvector('english', coalesce(author, '')), 'D')
) STORED,
UNIQUE(feed_url, guid)
);
CREATE INDEX IF NOT EXISTS idx_items_feedUrl ON items(feedUrl);
CREATE INDEX IF NOT EXISTS idx_items_pubDate ON items(pubDate DESC);
CREATE INDEX IF NOT EXISTS idx_items_feed_url ON items(feed_url);
CREATE INDEX IF NOT EXISTS idx_items_pub_date ON items(pub_date DESC);
CREATE INDEX IF NOT EXISTS idx_items_link ON items(link);
CREATE INDEX IF NOT EXISTS idx_items_feedUrl_pubDate ON items(feedUrl, pubDate DESC);
CREATE INDEX IF NOT EXISTS idx_items_unpublished ON items(feedUrl, publishedAt) WHERE publishedAt IS NULL;
CREATE INDEX IF NOT EXISTS idx_items_feed_url_pub_date ON items(feed_url, pub_date DESC);
CREATE INDEX IF NOT EXISTS idx_items_unpublished ON items(feed_url, published_at) WHERE published_at IS NULL;
CREATE INDEX IF NOT EXISTS idx_items_search ON items USING GIN(search_vector);
-- Full-text search for feeds
CREATE VIRTUAL TABLE IF NOT EXISTS feeds_fts USING fts5(
url,
title,
description,
content='feeds',
content_rowid='rowid'
);
-- Triggers to keep FTS in sync
CREATE TRIGGER IF NOT EXISTS feeds_ai AFTER INSERT ON feeds BEGIN
INSERT INTO feeds_fts(rowid, url, title, description)
VALUES (NEW.rowid, NEW.url, NEW.title, NEW.description);
-- Trigger to normalize feed URLs on insert/update (strips https://, http://, www.)
CREATE OR REPLACE FUNCTION normalize_feed_url()
RETURNS TRIGGER AS $$
BEGIN
NEW.url = regexp_replace(NEW.url, '^https?://', '');
NEW.url = regexp_replace(NEW.url, '^www\.', '');
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER IF NOT EXISTS feeds_ad AFTER DELETE ON feeds BEGIN
INSERT INTO feeds_fts(feeds_fts, rowid, url, title, description)
VALUES ('delete', OLD.rowid, OLD.url, OLD.title, OLD.description);
END;
CREATE TRIGGER IF NOT EXISTS feeds_au AFTER UPDATE ON feeds BEGIN
INSERT INTO feeds_fts(feeds_fts, rowid, url, title, description)
VALUES ('delete', OLD.rowid, OLD.url, OLD.title, OLD.description);
INSERT INTO feeds_fts(rowid, url, title, description)
VALUES (NEW.rowid, NEW.url, NEW.title, NEW.description);
END;
-- Full-text search for items
CREATE VIRTUAL TABLE IF NOT EXISTS items_fts USING fts5(
title,
description,
content,
author,
content='items',
content_rowid='id'
);
-- Triggers to keep items FTS in sync
CREATE TRIGGER IF NOT EXISTS items_ai AFTER INSERT ON items BEGIN
INSERT INTO items_fts(rowid, title, description, content, author)
VALUES (NEW.id, NEW.title, NEW.description, NEW.content, NEW.author);
END;
CREATE TRIGGER IF NOT EXISTS items_ad AFTER DELETE ON items BEGIN
INSERT INTO items_fts(items_fts, rowid, title, description, content, author)
VALUES ('delete', OLD.id, OLD.title, OLD.description, OLD.content, OLD.author);
END;
CREATE TRIGGER IF NOT EXISTS items_au AFTER UPDATE ON items BEGIN
INSERT INTO items_fts(items_fts, rowid, title, description, content, author)
VALUES ('delete', OLD.id, OLD.title, OLD.description, OLD.content, OLD.author);
INSERT INTO items_fts(rowid, title, description, content, author)
VALUES (NEW.id, NEW.title, NEW.description, NEW.content, NEW.author);
END;
DROP TRIGGER IF EXISTS normalize_feed_url_trigger ON feeds;
CREATE TRIGGER normalize_feed_url_trigger
BEFORE INSERT OR UPDATE ON feeds
FOR EACH ROW
EXECUTE FUNCTION normalize_feed_url();
`
func OpenDatabase(dbPath string) (*sql.DB, error) {
fmt.Printf("Opening database: %s\n", dbPath)
// DB wraps pgxpool.Pool with helper methods
type DB struct {
*pgxpool.Pool
}
// Use pragmas in connection string for consistent application
// - busy_timeout: wait up to 10s for locks instead of failing immediately
// - journal_mode: WAL for better concurrency and crash recovery
// - synchronous: NORMAL is safe with WAL (fsync at checkpoint, not every commit)
// - wal_autocheckpoint: checkpoint every 1000 pages (~4MB) to prevent WAL bloat
// - foreign_keys: enforce referential integrity
connStr := dbPath + "?_pragma=busy_timeout(10000)&_pragma=journal_mode(WAL)&_pragma=synchronous(NORMAL)&_pragma=wal_autocheckpoint(1000)&_pragma=foreign_keys(ON)"
db, err := sql.Open("sqlite", connStr)
func OpenDatabase(connString string) (*DB, error) {
fmt.Printf("Connecting to database...\n")
// If connection string not provided, try environment variables
if connString == "" {
connString = os.Getenv("DATABASE_URL")
}
if connString == "" {
// Build from individual env vars
host := getEnvOrDefault("DB_HOST", "atproto-postgres")
port := getEnvOrDefault("DB_PORT", "5432")
user := getEnvOrDefault("DB_USER", "news_1440")
dbname := getEnvOrDefault("DB_NAME", "news_1440")
// Support Docker secrets (password file) or direct password
password := os.Getenv("DB_PASSWORD")
if password == "" {
if passwordFile := os.Getenv("DB_PASSWORD_FILE"); passwordFile != "" {
data, err := os.ReadFile(passwordFile)
if err != nil {
return nil, fmt.Errorf("failed to read password file: %v", err)
}
password = strings.TrimSpace(string(data))
}
}
connString = fmt.Sprintf("postgres://%s:%s@%s:%s/%s?sslmode=disable",
user, url.QueryEscape(password), host, port, dbname)
}
config, err := pgxpool.ParseConfig(connString)
if err != nil {
return nil, fmt.Errorf("failed to open database: %v", err)
return nil, fmt.Errorf("failed to parse connection string: %v", err)
}
// Connection pool settings for stability
db.SetMaxOpenConns(4) // Limit concurrent connections
db.SetMaxIdleConns(2) // Keep some connections warm
db.SetConnMaxLifetime(5 * time.Minute) // Recycle connections periodically
db.SetConnMaxIdleTime(1 * time.Minute) // Close idle connections
// Connection pool settings
config.MaxConns = 10
config.MinConns = 2
config.MaxConnLifetime = 5 * time.Minute
config.MaxConnIdleTime = 1 * time.Minute
// Verify connection and show journal mode
var journalMode string
if err := db.QueryRow("PRAGMA journal_mode").Scan(&journalMode); err != nil {
fmt.Printf(" Warning: could not query journal_mode: %v\n", err)
} else {
fmt.Printf(" Journal mode: %s\n", journalMode)
ctx := context.Background()
pool, err := pgxpool.NewWithConfig(ctx, config)
if err != nil {
return nil, fmt.Errorf("failed to connect to database: %v", err)
}
// Verify connection
if err := pool.Ping(ctx); err != nil {
pool.Close()
return nil, fmt.Errorf("failed to ping database: %v", err)
}
fmt.Println(" Connected to PostgreSQL")
db := &DB{pool}
// Create schema
if _, err := db.Exec(schema); err != nil {
db.Close()
if _, err := pool.Exec(ctx, schema); err != nil {
pool.Close()
return nil, fmt.Errorf("failed to create schema: %v", err)
}
fmt.Println(" Schema OK")
// Migrations for existing databases
migrations := []string{
"ALTER TABLE items ADD COLUMN enclosureUrl TEXT",
"ALTER TABLE items ADD COLUMN enclosureType TEXT",
"ALTER TABLE items ADD COLUMN enclosureLength INTEGER",
"ALTER TABLE items ADD COLUMN imageUrls TEXT",
}
for _, m := range migrations {
db.Exec(m) // Ignore errors (column may already exist)
}
// Run stats and ANALYZE in background to avoid blocking startup with large databases
// Run stats in background
go func() {
var domainCount, feedCount int
db.QueryRow("SELECT COUNT(*) FROM domains").Scan(&domainCount)
db.QueryRow("SELECT COUNT(*) FROM feeds").Scan(&feedCount)
pool.QueryRow(context.Background(), "SELECT COUNT(*) FROM domains").Scan(&domainCount)
pool.QueryRow(context.Background(), "SELECT COUNT(*) FROM feeds").Scan(&feedCount)
fmt.Printf(" Existing data: %d domains, %d feeds\n", domainCount, feedCount)
fmt.Println(" Running ANALYZE...")
if _, err := db.Exec("ANALYZE"); err != nil {
if _, err := pool.Exec(context.Background(), "ANALYZE"); err != nil {
fmt.Printf(" Warning: ANALYZE failed: %v\n", err)
} else {
fmt.Println(" ANALYZE complete")
@@ -228,3 +231,82 @@ func OpenDatabase(dbPath string) (*sql.DB, error) {
return db, nil
}
func getEnvOrDefault(key, defaultVal string) string {
if val := os.Getenv(key); val != "" {
return val
}
return defaultVal
}
// QueryRow wraps pool.QueryRow for compatibility
func (db *DB) QueryRow(query string, args ...interface{}) pgx.Row {
return db.Pool.QueryRow(context.Background(), query, args...)
}
// Query wraps pool.Query for compatibility
func (db *DB) Query(query string, args ...interface{}) (pgx.Rows, error) {
return db.Pool.Query(context.Background(), query, args...)
}
// Exec wraps pool.Exec for compatibility
func (db *DB) Exec(query string, args ...interface{}) (int64, error) {
result, err := db.Pool.Exec(context.Background(), query, args...)
if err != nil {
return 0, err
}
return result.RowsAffected(), nil
}
// Begin starts a transaction
func (db *DB) Begin() (pgx.Tx, error) {
return db.Pool.Begin(context.Background())
}
// Close closes the connection pool
func (db *DB) Close() error {
db.Pool.Close()
return nil
}
// NullableString returns nil for empty strings, otherwise the string pointer
func NullableString(s string) *string {
if s == "" {
return nil
}
return &s
}
// NullableTime returns nil for zero times, otherwise the time pointer
func NullableTime(t time.Time) *time.Time {
if t.IsZero() {
return nil
}
return &t
}
// StringValue returns empty string for nil, otherwise the dereferenced value
func StringValue(s *string) string {
if s == nil {
return ""
}
return *s
}
// TimeValue returns zero time for nil, otherwise the dereferenced value
func TimeValue(t *time.Time) time.Time {
if t == nil {
return time.Time{}
}
return *t
}
// ToSearchQuery converts a user query to PostgreSQL tsquery format
func ToSearchQuery(query string) string {
// Simple conversion: split on spaces and join with &
words := strings.Fields(query)
if len(words) == 0 {
return ""
}
return strings.Join(words, " & ")
}
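// Illustrative usage (not part of this commit): the converted query feeds
// to_tsquery against the GIN-indexed search_vector columns, e.g.
//
//	q := ToSearchQuery("solar power")  // "solar & power"
//	rows, err := db.Query(`SELECT url, title FROM feeds
//	    WHERE search_vector @@ to_tsquery('english', $1) LIMIT 50`, q)
//
// Note: raw input containing tsquery operators (':', '!', '&', '|') can make
// to_tsquery error; plainto_tsquery is the forgiving alternative.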
+15 -1
@@ -6,11 +6,19 @@ services:
stop_grace_period: 30s
env_file:
- pds.env
environment:
DB_HOST: atproto-postgres
DB_PORT: 5432
DB_USER: news_1440
DB_PASSWORD_FILE: /run/secrets/db_password
DB_NAME: news_1440
secrets:
- db_password
volumes:
- ./feeds:/app/feeds
- ./vertices.txt.gz:/app/vertices.txt.gz:ro
networks:
- proxy
- atproto
labels:
- "traefik.enable=true"
# Production: HTTPS with Let's Encrypt
@@ -29,6 +37,12 @@ services:
# Shared service
- "traefik.http.services.app-1440-news.loadbalancer.server.port=4321"
secrets:
db_password:
file: ../postgres/secrets/news_1440_password.txt
networks:
proxy:
external: true
atproto:
external: true
+104 -105
@@ -3,13 +3,15 @@ package main
import (
"bufio"
"compress/gzip"
"database/sql"
"context"
"fmt"
"io"
"os"
"strings"
"sync/atomic"
"time"
"github.com/jackc/pgx/v5"
)
// Domain represents a host to be crawled for feeds
@@ -23,78 +25,74 @@ type Domain struct {
TLD string `json:"tld,omitempty"`
}
// saveDomain stores a domain in SQLite
// saveDomain stores a domain in PostgreSQL
func (c *Crawler) saveDomain(domain *Domain) error {
_, err := c.db.Exec(`
INSERT INTO domains (host, status, discoveredAt, lastCrawledAt, feedsFound, lastError, tld)
VALUES (?, ?, ?, ?, ?, ?, ?)
INSERT INTO domains (host, status, discovered_at, last_crawled_at, feeds_found, last_error, tld)
VALUES ($1, $2, $3, $4, $5, $6, $7)
ON CONFLICT(host) DO UPDATE SET
status = excluded.status,
lastCrawledAt = excluded.lastCrawledAt,
feedsFound = excluded.feedsFound,
lastError = excluded.lastError,
tld = excluded.tld
`, domain.Host, domain.Status, domain.DiscoveredAt, nullTime(domain.LastCrawledAt),
domain.FeedsFound, nullString(domain.LastError), domain.TLD)
status = EXCLUDED.status,
last_crawled_at = EXCLUDED.last_crawled_at,
feeds_found = EXCLUDED.feeds_found,
last_error = EXCLUDED.last_error,
tld = EXCLUDED.tld
`, domain.Host, domain.Status, domain.DiscoveredAt, NullableTime(domain.LastCrawledAt),
domain.FeedsFound, NullableString(domain.LastError), domain.TLD)
return err
}
// saveDomainTx stores a domain using a transaction
func (c *Crawler) saveDomainTx(tx *sql.Tx, domain *Domain) error {
_, err := tx.Exec(`
INSERT INTO domains (host, status, discoveredAt, lastCrawledAt, feedsFound, lastError, tld)
VALUES (?, ?, ?, ?, ?, ?, ?)
func (c *Crawler) saveDomainTx(tx pgx.Tx, domain *Domain) error {
_, err := tx.Exec(context.Background(), `
INSERT INTO domains (host, status, discovered_at, last_crawled_at, feeds_found, last_error, tld)
VALUES ($1, $2, $3, $4, $5, $6, $7)
ON CONFLICT(host) DO NOTHING
`, domain.Host, domain.Status, domain.DiscoveredAt, nullTime(domain.LastCrawledAt),
domain.FeedsFound, nullString(domain.LastError), domain.TLD)
`, domain.Host, domain.Status, domain.DiscoveredAt, NullableTime(domain.LastCrawledAt),
domain.FeedsFound, NullableString(domain.LastError), domain.TLD)
return err
}
// domainExists checks if a domain already exists in the database
func (c *Crawler) domainExists(host string) bool {
var exists bool
err := c.db.QueryRow("SELECT EXISTS(SELECT 1 FROM domains WHERE host = ?)", normalizeHost(host)).Scan(&exists)
err := c.db.QueryRow("SELECT EXISTS(SELECT 1 FROM domains WHERE host = $1)", normalizeHost(host)).Scan(&exists)
return err == nil && exists
}
// getDomain retrieves a domain from SQLite
// getDomain retrieves a domain from PostgreSQL
func (c *Crawler) getDomain(host string) (*Domain, error) {
domain := &Domain{}
var lastCrawledAt sql.NullTime
var lastError sql.NullString
var lastCrawledAt *time.Time
var lastError *string
err := c.db.QueryRow(`
SELECT host, status, discoveredAt, lastCrawledAt, feedsFound, lastError, tld
FROM domains WHERE host = ?
SELECT host, status, discovered_at, last_crawled_at, feeds_found, last_error, tld
FROM domains WHERE host = $1
`, normalizeHost(host)).Scan(
&domain.Host, &domain.Status, &domain.DiscoveredAt, &lastCrawledAt,
&domain.FeedsFound, &lastError, &domain.TLD,
)
if err == sql.ErrNoRows {
if err == pgx.ErrNoRows {
return nil, nil
}
if err != nil {
return nil, err
}
if lastCrawledAt.Valid {
domain.LastCrawledAt = lastCrawledAt.Time
}
if lastError.Valid {
domain.LastError = lastError.String
}
domain.LastCrawledAt = TimeValue(lastCrawledAt)
domain.LastError = StringValue(lastError)
return domain, nil
}
// GetUncheckedDomains returns up to limit unchecked domains ordered by discoveredAt (FIFO)
// GetUncheckedDomains returns up to limit unchecked domains ordered by discovered_at (FIFO)
func (c *Crawler) GetUncheckedDomains(limit int) ([]*Domain, error) {
rows, err := c.db.Query(`
SELECT host, status, discoveredAt, lastCrawledAt, feedsFound, lastError, tld
SELECT host, status, discovered_at, last_crawled_at, feeds_found, last_error, tld
FROM domains WHERE status = 'unchecked'
ORDER BY discoveredAt ASC
LIMIT ?
ORDER BY discovered_at ASC
LIMIT $1
`, limit)
if err != nil {
return nil, err
@@ -105,12 +103,12 @@ func (c *Crawler) GetUncheckedDomains(limit int) ([]*Domain, error) {
}
// scanDomains is a helper to scan multiple domain rows
func (c *Crawler) scanDomains(rows *sql.Rows) ([]*Domain, error) {
func (c *Crawler) scanDomains(rows pgx.Rows) ([]*Domain, error) {
var domains []*Domain
for rows.Next() {
domain := &Domain{}
var lastCrawledAt sql.NullTime
var lastError sql.NullString
var lastCrawledAt *time.Time
var lastError *string
if err := rows.Scan(
&domain.Host, &domain.Status, &domain.DiscoveredAt, &lastCrawledAt,
@@ -119,12 +117,8 @@ func (c *Crawler) scanDomains(rows *sql.Rows) ([]*Domain, error) {
continue
}
if lastCrawledAt.Valid {
domain.LastCrawledAt = lastCrawledAt.Time
}
if lastError.Valid {
domain.LastError = lastError.String
}
domain.LastCrawledAt = TimeValue(lastCrawledAt)
domain.LastError = StringValue(lastError)
domains = append(domains, domain)
}
@@ -142,13 +136,13 @@ func (c *Crawler) markDomainCrawled(host string, feedsFound int, lastError strin
var err error
if lastError != "" {
_, err = c.db.Exec(`
UPDATE domains SET status = ?, lastCrawledAt = ?, feedsFound = ?, lastError = ?
WHERE host = ?
UPDATE domains SET status = $1, last_crawled_at = $2, feeds_found = $3, last_error = $4
WHERE host = $5
`, status, time.Now(), feedsFound, lastError, normalizeHost(host))
} else {
_, err = c.db.Exec(`
UPDATE domains SET status = ?, lastCrawledAt = ?, feedsFound = ?, lastError = NULL
WHERE host = ?
UPDATE domains SET status = $1, last_crawled_at = $2, feeds_found = $3, last_error = NULL
WHERE host = $4
`, status, time.Now(), feedsFound, normalizeHost(host))
}
return err
@@ -164,6 +158,23 @@ func (c *Crawler) GetDomainCount() (total int, unchecked int, err error) {
return total, unchecked, err
}
// ImportTestDomains adds a list of specific domains for testing
func (c *Crawler) ImportTestDomains(domains []string) {
now := time.Now()
for _, host := range domains {
_, err := c.db.Exec(`
INSERT INTO domains (host, status, discovered_at, tld)
VALUES ($1, 'unchecked', $2, $3)
ON CONFLICT(host) DO NOTHING
`, host, now, getTLD(host))
if err != nil {
fmt.Printf("Error adding test domain %s: %v\n", host, err)
} else {
fmt.Printf("Added test domain: %s\n", host)
}
}
}
// ImportDomainsFromFile reads a vertices file and stores new domains as "unchecked"
func (c *Crawler) ImportDomainsFromFile(filename string, limit int) (imported int, skipped int, err error) {
file, err := os.Open(filename)
@@ -212,7 +223,6 @@ func (c *Crawler) ImportDomainsInBackground(filename string) {
const batchSize = 1000
now := time.Now()
nowStr := now.Format("2006-01-02 15:04:05")
totalImported := 0
batchCount := 0
@@ -240,31 +250,43 @@ func (c *Crawler) ImportDomainsInBackground(filename string) {
break
}
// Build bulk INSERT statement
var sb strings.Builder
sb.WriteString("INSERT INTO domains (host, status, discoveredAt, tld) VALUES ")
args := make([]interface{}, 0, len(domains)*4)
for i, d := range domains {
if i > 0 {
sb.WriteString(",")
}
sb.WriteString("(?, 'unchecked', ?, ?)")
args = append(args, d.host, nowStr, d.tld)
}
sb.WriteString(" ON CONFLICT(host) DO NOTHING")
// Execute bulk insert
result, err := c.db.Exec(sb.String(), args...)
imported := 0
// Use COPY for bulk insert (much faster than individual INSERTs)
ctx := context.Background()
conn, err := c.db.Acquire(ctx)
if err != nil {
fmt.Printf("Bulk insert error: %v\n", err)
} else {
rowsAffected, _ := result.RowsAffected()
imported = int(rowsAffected)
fmt.Printf("Failed to acquire connection: %v\n", err)
break
}
// Build rows for copy
rows := make([][]interface{}, len(domains))
for i, d := range domains {
rows[i] = []interface{}{d.host, "unchecked", now, d.tld}
}
// Use CopyFrom for bulk insert
imported, err := conn.CopyFrom(
ctx,
pgx.Identifier{"domains"},
[]string{"host", "status", "discovered_at", "tld"},
pgx.CopyFromRows(rows),
)
conn.Release()
if err != nil {
// Fall back to individual inserts with ON CONFLICT
for _, d := range domains {
c.db.Exec(`
INSERT INTO domains (host, status, discovered_at, tld)
VALUES ($1, 'unchecked', $2, $3)
ON CONFLICT(host) DO NOTHING
`, d.host, now, d.tld)
}
imported = int64(len(domains))
}
batchCount++
totalImported += imported
totalImported += int(imported)
atomic.AddInt32(&c.domainsImported, int32(imported))
// Wait 1 second before the next batch
@@ -304,7 +326,6 @@ func (c *Crawler) parseAndStoreDomains(reader io.Reader, limit int) (imported in
scanner.Buffer(buf, 1024*1024)
now := time.Now()
nowStr := now.Format("2006-01-02 15:04:05")
count := 0
const batchSize = 1000
@@ -336,28 +357,21 @@ func (c *Crawler) parseAndStoreDomains(reader io.Reader, limit int) (imported in
break
}
// Build bulk INSERT statement
var sb strings.Builder
sb.WriteString("INSERT INTO domains (host, status, discoveredAt, tld) VALUES ")
args := make([]interface{}, 0, len(domains)*4)
for i, d := range domains {
if i > 0 {
sb.WriteString(",")
// Insert with ON CONFLICT
for _, d := range domains {
result, err := c.db.Exec(`
INSERT INTO domains (host, status, discovered_at, tld)
VALUES ($1, 'unchecked', $2, $3)
ON CONFLICT(host) DO NOTHING
`, d.host, now, d.tld)
if err != nil {
skipped++
} else if result > 0 {
imported++
} else {
skipped++
}
sb.WriteString("(?, 'unchecked', ?, ?)")
args = append(args, d.host, nowStr, d.tld)
}
sb.WriteString(" ON CONFLICT(host) DO NOTHING")
// Execute bulk insert
result, execErr := c.db.Exec(sb.String(), args...)
if execErr != nil {
skipped += len(domains)
continue
}
rowsAffected, _ := result.RowsAffected()
imported += int(rowsAffected)
skipped += len(domains) - int(rowsAffected)
if limit > 0 && count >= limit {
break
@@ -370,18 +384,3 @@ func (c *Crawler) parseAndStoreDomains(reader io.Reader, limit int) (imported in
return imported, skipped, nil
}
// Helper functions for SQL null handling
func nullTime(t time.Time) sql.NullTime {
if t.IsZero() {
return sql.NullTime{}
}
return sql.NullTime{Time: t, Valid: true}
}
func nullString(s string) sql.NullString {
if s == "" {
return sql.NullString{}
}
return sql.NullString{String: s, Valid: true}
}
+332 -402
File diff suppressed because it is too large
+5 -1
@@ -77,7 +77,11 @@ func (c *Crawler) extractFeedLinks(n *html.Node, baseURL string) []simpleFeed {
func (c *Crawler) extractAnchorFeeds(n *html.Node, baseURL string) []simpleFeed {
feeds := make([]simpleFeed, 0)
feedPattern := regexp.MustCompile(`(?i)(rss|atom|feed)`)
// Match feed URLs more precisely:
// - /feed, /rss, /atom as path segments (not "feeds" or "feedback")
// - .rss, .atom, .xml file extensions
// - ?feed=, ?format=rss, etc.
feedPattern := regexp.MustCompile(`(?i)(/feed/?$|/feed/|/rss/?$|/rss/|/atom/?$|/atom/|\.rss|\.atom|\.xml|\?.*feed=|\?.*format=rss|\?.*format=atom)`)
var f func(*html.Node)
f = func(n *html.Node) {
+13 -9
@@ -8,13 +8,8 @@ import (
)
func main() {
// Ensure feeds directory exists
if err := os.MkdirAll("feeds", 0755); err != nil {
fmt.Fprintf(os.Stderr, "Error creating feeds directory: %v\n", err)
os.Exit(1)
}
crawler, err := NewCrawler("feeds/feeds.db")
// Connection string from environment (DATABASE_URL or DB_* vars)
crawler, err := NewCrawler("")
if err != nil {
fmt.Fprintf(os.Stderr, "Error initializing crawler: %v\n", err)
os.Exit(1)
@@ -37,8 +32,14 @@ func main() {
// Start all loops independently
fmt.Println("Starting import, crawl, check, and stats loops...")
// Import loop (background)
go crawler.ImportDomainsInBackground("vertices.txt.gz")
// Import loop (background) - DISABLED for testing, using manual domains
// go crawler.ImportDomainsInBackground("vertices.txt.gz")
// Add only ycombinator domains for testing
go crawler.ImportTestDomains([]string{
"news.ycombinator.com",
"ycombinator.com",
})
// Check loop (background)
go crawler.StartCheckLoop()
@@ -52,6 +53,9 @@ func main() {
// Maintenance loop (background) - WAL checkpoints and integrity checks
go crawler.StartMaintenanceLoop()
// Publish loop (background) - autopublishes items for approved feeds
go crawler.StartPublishLoop()
// Crawl loop (background)
go crawler.StartCrawlLoop()
+80 -57
@@ -3,7 +3,6 @@ package main
import (
"bytes"
"crypto/sha256"
"encoding/base32"
"encoding/json"
"fmt"
"io"
@@ -12,6 +11,7 @@ import (
"regexp"
"strings"
"time"
"unicode/utf8"
)
// Publisher handles posting items to AT Protocol PDS
@@ -196,22 +196,41 @@ func (p *Publisher) CreateInviteCode(adminPassword string, useCount int) (string
return result.Code, nil
}
// GenerateRkey creates a deterministic rkey from a GUID and timestamp
// Uses a truncated base32-encoded SHA256 hash
// Including the timestamp allows regenerating a new rkey by updating discoveredAt
// TID alphabet for base32-sortable encoding
const tidAlphabet = "234567abcdefghijklmnopqrstuvwxyz"
// GenerateRkey creates a deterministic TID-format rkey from a GUID and timestamp
// TIDs are required by Bluesky relay for indexing - custom rkeys don't sync
// Format: 13 chars base32-sortable, 53 bits timestamp + 10 bits clock ID
func GenerateRkey(guid string, timestamp time.Time) string {
if guid == "" {
return ""
}
// Combine GUID with timestamp for the hash input
// Format timestamp to second precision for consistency
input := guid + "|" + timestamp.UTC().Format(time.RFC3339)
hash := sha256.Sum256([]byte(input))
// Use first 10 bytes (80 bits) - plenty for uniqueness
// Base32 encode without padding, lowercase for rkey compatibility
encoded := base32.StdEncoding.WithPadding(base32.NoPadding).EncodeToString(hash[:10])
return strings.ToLower(encoded)
// Get microseconds since Unix epoch (53 bits)
microsInt := timestamp.UnixMicro()
if microsInt < 0 {
microsInt = 0
}
// Convert to uint64 and mask to 53 bits
micros := uint64(microsInt) & ((1 << 53) - 1)
// Generate deterministic 10-bit clock ID from GUID hash
hash := sha256.Sum256([]byte(guid))
clockID := uint64(hash[0])<<2 | uint64(hash[1])>>6
clockID = clockID & ((1 << 10) - 1) // 10 bits = 0-1023
// Combine: top bit 0, 53 bits timestamp, 10 bits clock ID
tid := (micros << 10) | clockID
// Encode as base32-sortable (13 characters)
var result [13]byte
for i := 12; i >= 0; i-- {
result[i] = tidAlphabet[tid&0x1f]
tid >>= 5
}
return string(result[:])
}
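// Illustrative sketch (not part of this commit): decode a TID back to its
// timestamp, which makes the 53-bit-microseconds / 10-bit-clock-ID layout
// above explicit. Uses the tidAlphabet constant defined above.
func tidTimestamp(tid string) (time.Time, bool) {
	if len(tid) != 13 {
		return time.Time{}, false
	}
	var v uint64
	for i := 0; i < len(tid); i++ {
		idx := strings.IndexByte(tidAlphabet, tid[i])
		if idx < 0 {
			return time.Time{}, false
		}
		v = v<<5 | uint64(idx)
	}
	return time.UnixMicro(int64(v >> 10)), true // drop the 10-bit clock ID
}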
// extractURLs finds all URLs in a string
@@ -239,7 +258,8 @@ func (p *Publisher) PublishItem(session *PDSSession, item *Item) (string, error)
return "", fmt.Errorf("item has no GUID or link, cannot publish")
}
// Collect all unique URLs: main link + any URLs in description
// Collect URLs: main link + HN comments link (if applicable)
// Limit to 2 URLs max to stay under 300 grapheme limit
urlSet := make(map[string]bool)
var allURLs []string
@@ -249,8 +269,18 @@ func (p *Publisher) PublishItem(session *PDSSession, item *Item) (string, error)
allURLs = append(allURLs, item.Link)
}
// Add enclosure URL for podcasts/media (audio/video)
if item.Enclosure != nil && item.Enclosure.URL != "" {
// For HN feeds, add comments link from description (looks like "https://news.ycombinator.com/item?id=...")
descURLs := extractURLs(item.Description)
for _, u := range descURLs {
if strings.Contains(u, "news.ycombinator.com/item") && !urlSet[u] {
urlSet[u] = true
allURLs = append(allURLs, u)
break // Only add one comments link
}
}
// Add enclosure URL for podcasts/media (audio/video) if we have room
if len(allURLs) < 2 && item.Enclosure != nil && item.Enclosure.URL != "" {
encType := strings.ToLower(item.Enclosure.Type)
if strings.HasPrefix(encType, "audio/") || strings.HasPrefix(encType, "video/") {
if !urlSet[item.Enclosure.URL] {
@@ -260,59 +290,52 @@ func (p *Publisher) PublishItem(session *PDSSession, item *Item) (string, error)
}
}
// Extract URLs from description
descURLs := extractURLs(item.Description)
for _, u := range descURLs {
if !urlSet[u] {
urlSet[u] = true
allURLs = append(allURLs, u)
}
}
// Extract URLs from content if available
contentURLs := extractURLs(item.Content)
for _, u := range contentURLs {
if !urlSet[u] {
urlSet[u] = true
allURLs = append(allURLs, u)
}
}
// Build post text: title + all links
// Bluesky has 300 grapheme limit
var textBuilder strings.Builder
textBuilder.WriteString(item.Title)
// Bluesky has 300 grapheme limit - use rune count as approximation
const maxGraphemes = 295 // Leave some margin
// Calculate space needed for URLs (in runes)
urlSpace := 0
for _, u := range allURLs {
textBuilder.WriteString("\n\n")
textBuilder.WriteString(u)
urlSpace += utf8.RuneCountInString(u) + 2 // +2 for \n\n
}
text := textBuilder.String()
// Truncate title if needed
title := item.Title
titleRunes := utf8.RuneCountInString(title)
maxTitleRunes := maxGraphemes - urlSpace - 3 // -3 for "..."
// Truncate title if text is too long (keep URLs intact)
const maxLen = 300
if len(text) > maxLen {
// Calculate space needed for URLs
urlSpace := 0
for _, u := range allURLs {
urlSpace += len(u) + 2 // +2 for \n\n
}
maxTitleLen := maxLen - urlSpace - 3 // -3 for "..."
if maxTitleLen > 10 {
text = item.Title[:maxTitleLen] + "..."
for _, u := range allURLs {
text += "\n\n" + u
if titleRunes+urlSpace > maxGraphemes {
if maxTitleRunes > 10 {
// Truncate title to fit
runes := []rune(title)
if len(runes) > maxTitleRunes {
title = string(runes[:maxTitleRunes]) + "..."
}
} else {
// Title too long even with minimal space - just truncate hard
runes := []rune(title)
if len(runes) > 50 {
title = string(runes[:50]) + "..."
}
}
}
// Use item's pubDate for createdAt, fall back to now
createdAt := time.Now()
if !item.PubDate.IsZero() {
createdAt = item.PubDate
// Build final text
var textBuilder strings.Builder
textBuilder.WriteString(title)
for _, u := range allURLs {
textBuilder.WriteString("\n\n")
textBuilder.WriteString(u)
}
text := textBuilder.String()
// Use current time for createdAt (Bluesky won't index backdated posts)
// TODO: Restore original pubDate once Bluesky indexing is understood
createdAt := time.Now()
// if !item.PubDate.IsZero() {
// createdAt = item.PubDate
// }
post := BskyPost{
Type: "app.bsky.feed.post",
+56 -10
@@ -258,6 +258,7 @@ function initDashboard() {
output.innerHTML = html;
attachTldHandlers(output.querySelector('.tld-list'));
} catch (err) {
console.error('TLDs error:', err);
output.innerHTML = '<div style="color: #f66; padding: 10px;">Error: ' + escapeHtml(err.message) + '</div>';
}
}
@@ -301,7 +302,7 @@ function initDashboard() {
const result = await response.json();
if (!result.data || result.data.length === 0) {
infiniteScrollState.ended = true;
if (infiniteScrollState) infiniteScrollState.ended = true;
document.getElementById('infiniteLoader').textContent = offset === 0 ? 'No results found' : 'End of list';
return;
}
@@ -319,11 +320,12 @@ function initDashboard() {
offset += result.data.length;
if (result.data.length < limit) {
infiniteScrollState.ended = true;
if (infiniteScrollState) infiniteScrollState.ended = true;
document.getElementById('infiniteLoader').textContent = 'End of list';
}
} catch (err) {
document.getElementById('infiniteLoader').textContent = 'Error loading';
console.error('Filter error:', err);
document.getElementById('infiniteLoader').textContent = 'Error loading: ' + err.message;
}
}
@@ -479,17 +481,26 @@ function initDashboard() {
output.innerHTML = '<div style="color: #666; padding: 10px;">Loading publish data...</div>';
try {
const [candidatesRes, passedRes] = await Promise.all([
const [candidatesRes, passedRes, deniedRes] = await Promise.all([
fetch('/api/publishCandidates?limit=50'),
fetch('/api/publishEnabled')
fetch('/api/publishEnabled'),
fetch('/api/publishDenied')
]);
const candidates = await candidatesRes.json();
const passed = await passedRes.json();
const denied = await deniedRes.json();
let html = '<div style="padding: 10px;">';
// Filter buttons
html += '<div style="margin-bottom: 15px; display: flex; gap: 10px;">';
html += '<button class="filter-btn" data-filter="pass" style="padding: 6px 16px; background: #040; border: 1px solid #060; border-radius: 3px; color: #0a0; cursor: pointer;">Pass (' + passed.length + ')</button>';
html += '<button class="filter-btn" data-filter="held" style="padding: 6px 16px; background: #330; border: 1px solid #550; border-radius: 3px; color: #f90; cursor: pointer;">Held (' + candidates.length + ')</button>';
html += '<button class="filter-btn" data-filter="deny" style="padding: 6px 16px; background: #400; border: 1px solid #600; border-radius: 3px; color: #f66; cursor: pointer;">Deny (' + denied.length + ')</button>';
html += '</div>';
// Passed feeds (approved for publishing)
html += '<div style="margin-bottom: 20px;">';
html += '<div id="section-pass" style="margin-bottom: 20px;">';
html += '<div style="color: #0a0; font-weight: bold; margin-bottom: 10px; border-bottom: 1px solid #333; padding-bottom: 5px;">✓ Approved for Publishing (' + passed.length + ')</div>';
if (passed.length === 0) {
html += '<div style="color: #666; padding: 10px;">No feeds approved yet</div>';
@@ -501,14 +512,14 @@ function initDashboard() {
html += '<div style="color: #666; font-size: 0.85em;">' + escapeHtml(f.url) + '</div>';
html += '<div style="color: #888; font-size: 0.85em;">→ ' + escapeHtml(f.account) + ' (' + f.unpublished_count + ' unpublished)</div>';
html += '</div>';
html += '<button class="status-btn" data-url="' + escapeHtml(f.url) + '" data-status="fail" style="padding: 4px 12px; background: #400; border: 1px solid #600; border-radius: 3px; color: #f66; cursor: pointer; margin-left: 10px;">Revoke</button>';
html += '<button class="status-btn" data-url="' + escapeHtml(f.url) + '" data-status="deny" style="padding: 4px 12px; background: #400; border: 1px solid #600; border-radius: 3px; color: #f66; cursor: pointer; margin-left: 10px;">Revoke</button>';
html += '</div>';
});
}
html += '</div>';
// Candidates (held for review)
html += '<div>';
html += '<div id="section-held">';
html += '<div style="color: #f90; font-weight: bold; margin-bottom: 10px; border-bottom: 1px solid #333; padding-bottom: 5px;">⏳ Held for Review (' + candidates.length + ')</div>';
if (candidates.length === 0) {
html += '<div style="color: #666; padding: 10px;">No candidates held</div>';
@@ -523,7 +534,28 @@ function initDashboard() {
html += '<div style="color: #555; font-size: 0.8em;">' + escapeHtml(f.source_host) + ' · ' + f.item_count + ' items · ' + escapeHtml(f.category) + '</div>';
html += '</div>';
html += '<button class="status-btn pass-btn" data-url="' + escapeHtml(f.url) + '" data-status="pass" style="padding: 4px 12px; background: #040; border: 1px solid #060; border-radius: 3px; color: #0a0; cursor: pointer; margin-left: 10px;">Pass</button>';
html += '<button class="status-btn fail-btn" data-url="' + escapeHtml(f.url) + '" data-status="fail" style="padding: 4px 12px; background: #400; border: 1px solid #600; border-radius: 3px; color: #f66; cursor: pointer; margin-left: 5px;">Fail</button>';
html += '<button class="status-btn deny-btn" data-url="' + escapeHtml(f.url) + '" data-status="deny" style="padding: 4px 12px; background: #400; border: 1px solid #600; border-radius: 3px; color: #f66; cursor: pointer; margin-left: 5px;">Deny</button>';
html += '</div>';
html += '</div>';
});
}
html += '</div>';
// Denied feeds
html += '<div id="section-deny" style="display: none;">';
html += '<div style="color: #f66; font-weight: bold; margin-bottom: 10px; border-bottom: 1px solid #333; padding-bottom: 5px;">✗ Denied (' + denied.length + ')</div>';
if (denied.length === 0) {
html += '<div style="color: #666; padding: 10px;">No feeds denied</div>';
} else {
denied.forEach(f => {
html += '<div class="publish-row" style="padding: 8px; border-bottom: 1px solid #202020;">';
html += '<div style="display: flex; align-items: center;">';
html += '<div style="flex: 1;">';
html += '<div style="color: #0af;">' + escapeHtml(f.title || f.url) + '</div>';
html += '<div style="color: #666; font-size: 0.85em;">' + escapeHtml(f.url) + '</div>';
html += '<div style="color: #555; font-size: 0.8em;">' + escapeHtml(f.source_host) + ' · ' + f.item_count + ' items</div>';
html += '</div>';
html += '<button class="status-btn" data-url="' + escapeHtml(f.url) + '" data-status="held" style="padding: 4px 12px; background: #330; border: 1px solid #550; border-radius: 3px; color: #f90; cursor: pointer; margin-left: 10px;">Restore</button>';
html += '</div>';
html += '</div>';
});
@@ -533,7 +565,21 @@ function initDashboard() {
html += '</div>';
output.innerHTML = html;
// Attach handlers for pass/fail buttons
// Filter button handlers
output.querySelectorAll('.filter-btn').forEach(btn => {
btn.addEventListener('click', () => {
const filter = btn.dataset.filter;
document.getElementById('section-pass').style.display = filter === 'pass' ? 'block' : 'none';
document.getElementById('section-held').style.display = filter === 'held' ? 'block' : 'none';
document.getElementById('section-deny').style.display = filter === 'deny' ? 'block' : 'none';
// Update button styles
output.querySelectorAll('.filter-btn').forEach(b => {
b.style.opacity = b.dataset.filter === filter ? '1' : '0.5';
});
});
});
// Attach handlers for pass/deny buttons
output.querySelectorAll('.status-btn').forEach(btn => {
btn.addEventListener('click', async () => {
const url = btn.dataset.url;