Taalas serves Llama 3.1 8B at 17,000 tokens/second
Summary
Taalas, a Canadian hardware startup, has built custom silicon (specialized computer chips) that runs Llama 3.1 8B, an AI language model, at 17,000 tokens per second (a token is the small unit of text a model reads or writes). The hardware relies on aggressive quantization, a technique that compresses a model by storing its numerical values at lower precision, and combines 3-bit and 6-bit parameters. The company's next version will use 4-bit parameters.
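To make the bit widths in the summary concrete, here is a minimal Python sketch of symmetric uniform quantization, one common textbook scheme. It is not Taalas's method, which the source does not describe. The function names, the sample weights, and the round figure of 8 billion parameters are all assumptions made for illustration.

```python
def quantize(values, bits):
    """Map floats to signed integer codes of the given bit width (symmetric, uniform)."""
    qmax = 2 ** (bits - 1) - 1          # largest code: 3 for 3-bit, 31 for 6-bit
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs else 1.0
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from integer codes."""
    return [c * scale for c in codes]


def max_error(values, bits):
    """Largest absolute difference between original and round-tripped values."""
    codes, scale = quantize(values, bits)
    restored = dequantize(codes, scale)
    return max(abs(a - b) for a, b in zip(values, restored))


def weight_storage_gb(num_params, bits):
    """Storage for the weights alone, in decimal gigabytes (ignores scales and metadata)."""
    return num_params * bits / 8 / 1e9


def microseconds_per_token(tokens_per_second):
    """Average time budget per token at a given throughput."""
    return 1e6 / tokens_per_second


if __name__ == "__main__":
    sample_weights = [0.82, -0.41, 0.05, -0.97, 0.33, 0.64]  # made-up values
    assumed_params = 8e9  # round number assumed for an "8B" model
    for bits in (3, 4, 6):
        err = max_error(sample_weights, bits)
        size = weight_storage_gb(assumed_params, bits)
        print(f"{bits}-bit: max error {err:.4f}, weights about {size:.1f} GB")
    print(f"{microseconds_per_token(17_000):.1f} microseconds per token")
```

Fewer bits shrink storage in direct proportion but leave fewer distinct levels, so the round-trip error grows. That trade-off is why a design might mix widths such as 3-bit and 6-bit rather than use one width everywhere.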
Original source: https://simonwillison.net/2026/Feb/20/taalas/#atom-everything
First tracked: February 20, 2026 at 07:00 PM
Classified by LLM (prompt v3) · confidence: 85%