Archivum
Archivum is a full-stack data warehouse for Twitch.tv chat, storing 6M+ messages from 50K+ users. The platform ingests real-time chat data via a custom concurrent IRC client, enriches it through scheduled jobs, and serves analytics through a REST API and a React frontend. All components run containerized on a self-hosted Kubernetes cluster (ZimaBoard, 2 cores / 2 threads @ 1.1 GHz); under stress testing, the IRC client sustains 40 msgs/sec.
Stack architecture
The system runs entirely on Kubernetes, with each component containerized and deployed independently.
Data Ingestion
A custom IRC client with concurrent message parsing handles Twitch's chat streams. Each message is parsed for metadata (user info, subscriptions, badges, timestamps), native and third-party extension emotes are extracted, and any unknown user triggers a Twitch API lookup before the message is written to the database.
Under stress testing, the client sustains 40 msgs/sec, at which point the CPU of the limited hardware is fully saturated.
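The parsing step described above can be illustrated with a minimal Python sketch. It is not the project's actual client: the `ChatMessage` class and all function names are hypothetical, and the sketch only assumes the tagged `PRIVMSG` line format that Twitch chat sends when the tags capability is enabled (including the `emotes` tag with `id:start-end` ranges). Escaped tag values are not decoded, and third-party emotes are matched against a caller-supplied set of known emote names, which is an assumption about how such emotes might be detected.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ChatMessage:
    """Hypothetical container for one parsed chat message."""
    login: str
    channel: str
    text: str
    tags: dict = field(default_factory=dict)
    # Maps native emote id -> list of (start, end) character ranges in `text`.
    emotes: dict = field(default_factory=dict)


def parse_tags(raw: str) -> dict:
    """Split 'key=value;key=value' into a dict; empty values map to ''."""
    tags = {}
    for pair in raw.split(";"):
        key, _, value = pair.partition("=")
        tags[key] = value
    return tags


def parse_emotes(raw: str) -> dict:
    """Parse an 'emotes' tag value such as '25:0-4,12-16/1902:6-10'."""
    emotes = {}
    for entry in filter(None, raw.split("/")):
        emote_id, _, ranges = entry.partition(":")
        spans = []
        for item in ranges.split(","):
            start, _, end = item.partition("-")
            spans.append((int(start), int(end)))
        emotes[emote_id] = spans
    return emotes


def parse_privmsg(line: str) -> Optional[ChatMessage]:
    """Parse one tagged PRIVMSG line; return None for any other command."""
    tags = {}
    if line.startswith("@"):
        raw_tags, _, line = line[1:].partition(" ")
        tags = parse_tags(raw_tags)
    if not line.startswith(":"):
        return None  # e.g. 'PING :tmi.twitch.tv'
    prefix, _, rest = line[1:].partition(" ")
    command, _, params = rest.partition(" ")
    if command != "PRIVMSG":
        return None
    channel, _, text = params.partition(" :")
    return ChatMessage(
        login=prefix.split("!", 1)[0],
        channel=channel.lstrip("#"),
        text=text.rstrip("\r\n"),
        tags=tags,
        emotes=parse_emotes(tags.get("emotes", "")),
    )


def count_third_party_emotes(text: str, known_emotes: set) -> Counter:
    """Count whitespace-separated words that appear in a known emote set."""
    return Counter(word for word in text.split() if word in known_emotes)
```

In a concurrent client, a function like `parse_privmsg` would typically run in worker threads or tasks fed from the socket reader, so that slow steps such as user lookups do not block reading from the connection.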
Additional data like username changes and VOD metadata is retrieved through nightly Kubernetes CronJobs hitting Twitch's API.
Report Generator
Statistical reports for live streams are generated on user request by Kubernetes Jobs, for archival and analysis purposes.
Examples of reports include:
- Chat statistics (first-time chatters, subscriber ratios, average message length)
- Emote usage tracking (native and third-party extensions, with percentage-based filtering)
- Chat rankings (total messages per user, with filtered versions excluding emote-heavy spam)
- Word contribution analysis (percentage of stream's total words per user)
- General Statistics: View Report
- Unfiltered Chat Rankings: View Report
- Filtered Chat Rankings: View Report
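As a rough illustration of how some of the report metrics listed above could be computed, here is a small Python sketch. The `Msg` record and the function names are hypothetical and the real report generator may define these metrics differently; in particular, the subscriber ratio is assumed here to mean subscribed chatters divided by unique chatters, and emote-based filtering is left out.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Msg:
    """Hypothetical minimal view of one stored chat message."""
    login: str
    text: str
    is_subscriber: bool


def chat_rankings(messages):
    """Total messages per user, most active first."""
    return Counter(m.login for m in messages).most_common()


def word_contribution(messages):
    """Percentage of the stream's total words written by each user."""
    words = Counter()
    for m in messages:
        words[m.login] += len(m.text.split())
    total = sum(words.values())
    if total == 0:
        return {}
    return {login: 100 * count / total for login, count in words.items()}


def general_stats(messages):
    """Message count, unique chatters, subscriber ratio, average length."""
    if not messages:
        return {"messages": 0, "unique_chatters": 0,
                "subscriber_ratio": 0.0, "avg_message_length": 0.0}
    chatters = {m.login for m in messages}
    subscribers = {m.login for m in messages if m.is_subscriber}
    return {
        "messages": len(messages),
        "unique_chatters": len(chatters),
        # Assumed definition: subscribed chatters / unique chatters.
        "subscriber_ratio": len(subscribers) / len(chatters),
        "avg_message_length": sum(len(m.text) for m in messages) / len(messages),
    }
```

At the scale of a single stream these aggregations fit comfortably in memory; for the full dataset, the same counts would more likely be expressed as grouped database queries.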
Front-end
A React/Tailwind frontend provides access to the data warehouse through the REST API.
Built with performance in mind for browsing large datasets:
- Pagination for chat history across millions of messages
- Search and filtering across 50K+ tracked users and 6M+ messages
- Responsive design for mobile and desktop
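How pagination is implemented is not stated here. One common approach for tables with millions of rows is keyset (cursor) pagination, where each page is fetched relative to the last seen id rather than with a growing `OFFSET`. The Python sketch below demonstrates the idea with the standard-library `sqlite3` module; the `messages` table, its columns, and the `fetch_page` function are assumptions made for illustration, not the project's schema or API.

```python
import sqlite3


def fetch_page(conn, user_id, page_size=50, before_id=None):
    """Return (rows, next_cursor) for one user's messages, newest first.

    `before_id` is the cursor from the previous page. `next_cursor` is None
    when the page was not full, meaning there is nothing more to fetch.
    A full final page yields one extra request that returns no rows.
    """
    sql = "SELECT id, body FROM messages WHERE user_id = ?"
    params = [user_id]
    if before_id is not None:
        # Seek directly past the previous page instead of skipping rows.
        sql += " AND id < ?"
        params.append(before_id)
    sql += " ORDER BY id DESC LIMIT ?"
    params.append(page_size)
    rows = conn.execute(sql, params).fetchall()
    next_cursor = rows[-1][0] if len(rows) == page_size else None
    return rows, next_cursor


if __name__ == "__main__":
    # Small in-memory demo with a hypothetical schema.
    demo = sqlite3.connect(":memory:")
    demo.execute(
        "CREATE TABLE messages (id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT)"
    )
    demo.executemany(
        "INSERT INTO messages (user_id, body) VALUES (?, ?)",
        [(1, f"message {i}") for i in range(1, 6)],
    )
    cursor = None
    while True:
        page, cursor = fetch_page(demo, user_id=1, page_size=2, before_id=cursor)
        print(page)
        if cursor is None:
            break
```

With an index covering `(user_id, id)`, each page costs roughly the same regardless of how deep into the history the reader has scrolled, which is the main reason to prefer this over offset-based paging on large tables.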
Key features:
- User search and chat history browsing
- Subscription and gift tracking
- Username change history over time