Commit Graph

9 Commits

Author SHA1 Message Date
primal
aa6f571215 Add PDS credentials env file for service auth 2026-01-26 19:38:36 -05:00
primal
67bd8339b2 Move crawler to app.1440.news subdomain 2026-01-26 17:10:50 -05:00
primal
6a3f894d6a Rename container to app-1440-news 2026-01-26 16:30:00 -05:00
primal
143807378f Add Docker support and refactor data layer 2026-01-26 16:02:05 -05:00
primal
398e7b3969 Add Docker Compose config with Traefik HTTPS routing
Configure container deployment with:
- HTTPS via Traefik with a Let's Encrypt certificate
- HTTP to HTTPS redirect for production (1440.news)
- HTTP-only routing for local development (1440.localhost)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-26 15:30:03 -05:00
primal
93ab1f8117 Update CLAUDE.md to reflect current multi-file architecture
The codebase evolved from a single-file app to a multi-file structure
with SQLite persistence, dashboard, and concurrent processing loops.
Updated documentation to accurately describe current architecture.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-26 13:47:19 -05:00
primal
219b49352e Add PebbleDB storage, domain tracking, and web dashboard
- Split main.go into separate files for better organization:
  crawler.go, domain.go, feed.go, parser.go, html.go, util.go
- Add PebbleDB for persistent storage of feeds and domains
- Store feeds with metadata: title, TTL, update frequency, ETag, etc.
- Track domains with crawl status (uncrawled/crawled/error)
- Normalize URLs by stripping scheme and www. prefix
- Add web dashboard on port 4321 with real-time stats:
  - Crawl progress with completion percentage
  - Feed counts by type (RSS/Atom)
  - Top TLDs and domains by feed count
  - Recent feeds table
- Filter out comment feeds from results

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-22 16:29:00 -05:00
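
The URL normalization mentioned in commit 219b49352e ("Normalize URLs by stripping scheme and www. prefix") could look roughly like the sketch below. This is an illustration only, not the repository's actual code: the function name `normalizeURL`, the handling of only `http://` and `https://` schemes, and the case-insensitive matching are all assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// hasPrefixFold reports whether s begins with prefix, ignoring ASCII case.
func hasPrefixFold(s, prefix string) bool {
	return len(s) >= len(prefix) && strings.EqualFold(s[:len(prefix)], prefix)
}

// normalizeURL is a hypothetical helper (name and exact rules are assumed,
// not taken from the repository). It strips a leading http:// or https://
// scheme and then a leading "www." so that variants of the same address
// map to a single storage key.
func normalizeURL(raw string) string {
	s := strings.TrimSpace(raw)

	// Strip the scheme, if present. Only http and https are handled here.
	for _, scheme := range []string{"https://", "http://"} {
		if hasPrefixFold(s, scheme) {
			s = s[len(scheme):]
			break
		}
	}

	// Strip a leading "www." prefix.
	if hasPrefixFold(s, "www.") {
		s = s[len("www."):]
	}
	return s
}

func main() {
	inputs := []string{
		"https://www.example.com/feed.xml",
		"http://example.com/feed.xml",
		"www.example.com/feed.xml",
	}
	for _, u := range inputs {
		fmt.Println(normalizeURL(u))
	}
}
```

All three inputs in `main` reduce to `example.com/feed.xml`, which is the point of normalizing before storing: the same feed discovered under different schemes or with a `www.` prefix is recorded once.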
primal
0dd612b7e1 Rename feed directory to feeds
Update output directory path in main.go and .gitignore to use
feeds/ instead of feed/.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-22 15:36:56 -05:00
primal
f4cae127cc Add feed crawler with documentation
- main.go: RSS/Atom feed crawler using Common Crawl data
- CLAUDE.md: Project documentation for Claude Code
- .gitignore: Ignore binary and go.* files
- Feed output now written to feed/ directory

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-22 15:15:30 -05:00