Subquadratic’s SubQ claims 56x speed and an $8 inference bill against Opus 4.6’s $2,600

An independent Appen evaluation backs some of the Miami AI startup’s most expensive-to-dismiss efficiency promises.

By Khalid Al-Harbi, Business Desk, The Executives Brief

about 11 hours ago·4 min read

Executive summary

Miami-based startup Subquadratic says its SubQ model breaks a long-running LLM bottleneck by using sparse attention instead of dense attention. If Appen’s independent results hold up, executives should rethink cost, speed, and the near-term ceiling for “transformer-era” model design.

Subquadratic, the Miami AI startup that emerged from stealth last month, is now putting numbers behind its most eyebrow-raising claim: a model it says is faster, cheaper, and dramatically more energy efficient than anything else on the market. The company’s latest evidence starts with independent testing from third-party firm Appen, which reported SubQ running 56 times faster in a speed baseline against models using FlashAttention, a widely used optimized implementation of standard dense attention. It also points to a coding benchmark result: on LiveCodeBench, SubQ scored 89.7%, placing it in the same ballpark as other top coding models.

The cost story is even more aggressive, at least in Subquadratic’s own comparisons. CEO Justin Dangel says it costs $2,600 to run Anthropic’s LLM Opus 4.6 through RULER 128, a test developed by Nvidia to assess a model’s ability to retrieve information from large data sets. For SubQ, Dangel says, “It cost us eight dollars.” Showing “receipts” matters now because Subquadratic’s earlier launch offered thin details and mostly self-published benchmarks, which left many people unconvinced.
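The size of the gap between the two figures Dangel quotes is easy to work out. A minimal Python sketch, using only the two dollar amounts cited above (the variable names are illustrative, and the inputs are Subquadratic’s own figures, not independently verified ones):

```python
# Figures as quoted by CEO Justin Dangel for one run of RULER 128.
opus_cost_usd = 2600  # claimed cost for Opus 4.6
subq_cost_usd = 8     # claimed cost for SubQ

# How many times more expensive the Opus 4.6 run is, per these claims.
ratio = opus_cost_usd / subq_cost_usd

# The same gap expressed as a percentage saving.
savings_pct = (1 - subq_cost_usd / opus_cost_usd) * 100

print(f"Opus 4.6 run costs {ratio:.0f}x the SubQ run")
print(f"Claimed saving: {savings_pct:.1f}%")
```

On these numbers the claimed difference is 325x, or a saving of about 99.7% on this one benchmark.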

So what is Subquadratic actually claiming to fix? The core technical problem is the “dense attention” operation used by most LLMs. Dense attention computes a relationship between every pair of tokens, so the work explodes as context grows. As MIT Technology Review describes it, a transformer encodes each word (or token) as a set of numbers, then multiplies those numbers against the numbers for every other token in the text. As the text gets longer, computation grows quadratically: doubling the length roughly quadruples the work, which is one reason today’s LLMs are notorious power hogs. Subquadratic’s approach is to ditch dense attention in favor of sparse attention, which computes only a selected subset of token relationships.
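The quadratic growth described above can be made concrete by counting token-to-token comparisons. The sketch below is a back-of-the-envelope count, not a model of any real system: the function names and the per-token budget `k` are hypothetical, and real costs also depend on hardware, model width, and implementation.

```python
def dense_pairs(n_tokens: int) -> int:
    """Comparisons in dense attention: every token against every token."""
    return n_tokens * n_tokens


def sparse_pairs(n_tokens: int, k: int) -> int:
    """Comparisons if each token attends to at most k others.

    k is a hypothetical budget chosen for illustration only.
    """
    return n_tokens * min(k, n_tokens)


for n in (1_000, 1_000_000, 12_000_000):
    print(
        f"{n:>12,} tokens: dense={dense_pairs(n):.2e} "
        f"sparse(k=1024)={sparse_pairs(n, 1024):.2e}"
    )

# Going from a 1M-token to a 12M-token context is 12x the text
# but 144x the dense comparisons.
growth = dense_pairs(12_000_000) / dense_pairs(1_000_000)
print(f"dense work grows {growth:.0f}x")
```

With a fixed budget per token, the sparse count grows in step with the text length, which is the property a subquadratic design is after.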

That sounds simple, but it is hard in practice because earlier sparse-attention attempts typically rely on fixed patterns, like comparing the first word to the fifth. Subquadratic says SubQ is different because it dynamically selects which tokens matter on the fly for each piece of text. Cofounder and chief technology officer Alex Whedon frames this as avoiding rigid “always compare position X to position Y” behavior and instead choosing relationships that are actually important for language tasks. The company won’t say exactly how the selection works, but it emphasizes that this on-the-fly decision is the “secret sauce.”
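The difference between a fixed pattern and content-dependent selection can be shown with a toy example. This is emphatically not Subquadratic’s method, which the company has not disclosed. It is a generic top-k illustration in plain Python, and every function name, the `stride` parameter, and the sample vectors are assumptions made for the sketch.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fixed_pattern(i, n, stride=4):
    """Fixed sparse pattern: position i always looks at every stride-th position."""
    return [j for j in range(n) if (j - i) % stride == 0]


def topk_dynamic(query, keys, k):
    """Content-dependent selection: keep the k keys that score highest for this query."""
    scores = [dot(query, key) for key in keys]
    ranked = sorted(range(len(keys)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def sparse_attention(query, keys, values, selected):
    """Standard scaled dot-product attention, restricted to the selected positions."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, keys[j]) / scale for j in selected])
    dim = len(values[0])
    return [sum(w * values[j][d] for w, j in zip(weights, selected)) for d in range(dim)]


# Hypothetical toy data: six tokens with two-dimensional keys and values.
keys = [[1, 0], [0, 1], [3, 1], [0, 0], [-1, 0], [2, 0]]
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0], [0.5, 0.5], [2.0, 0.0]]
query = [1, 0]

print(fixed_pattern(0, len(keys)))    # [0, 4]: chosen by position, whatever the content
print(topk_dynamic(query, keys, 2))   # [2, 5]: chosen because these keys match the query
print(sparse_attention(query, keys, values, topk_dynamic(query, keys, 2)))
```

Note that this naive selector still scores every key before picking the top k, so it saves nothing by itself. A practical system needs a selection step that is much cheaper than full attention, and that is the part Subquadratic keeps private.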

Executives should also notice the incentives and timing here. Subquadratic expected “healthy skepticism,” Whedon says. With hindsight, he argues that releasing third-party benchmarks alongside the initial announcement would have reduced the backlash. That is a board-level signal: the company is trying to de-risk its credibility now, using external evaluation rather than letting the narrative live or die on internal claims. In this case, Subquadratic asked Appen, which evaluates other companies’ models, to run tests on SubQ.

Appen’s generative AI research director, Jeanine Sinanan-Singh, said the results were “really exciting” because speed and efficiency are core pain points for models. She also explains why independent validation matters: when results are “shocking,” they are less credible if the company itself is the one reporting them. The larger implication for decision-makers is that cost and compute advantages are only commercially real if they survive third-party scrutiny, especially in a market where many architectures promise efficiency but do not deliver across tasks.

SubQ is not positioned as a universal replacement for all top models. But the company argues it could deliver huge speed increases at a fraction of typical cost for certain data-heavy workloads. Subquadratic also cites capacity: it says SubQ can handle a context window up to 12 million tokens long, compared with many top models that have context windows around one million tokens. In a demo, Whedon asked SubQ to reason over information in 400 documents, and it responded “in seconds.” In the same demonstration, Perplexity, a popular LLM-powered search engine, reportedly failed to load all 400 documents.

The deeper strategic stake: if Subquadratic’s breakthrough translates into real-world throughput and energy savings, it could pressure the roadmap assumptions behind “transformer-era” architectures. Subquadratic’s cofounder and CEO Justin Dangel says, “We hope we’re kicking off a new age of efficiency,” and adds, “We don’t think anybody will be building on transformers in a few years.” Even if SubQ is not a full-scale replacement, executives funding model training, deploying agents, or buying inference infrastructure should treat this as a potential inflection point in how compute budgets map to product performance.

And there is one more governance-relevant angle. Independent results are now on the table, but access remains limited. SubQ is not yet widely available for others to test themselves, which means the market will likely keep demanding corroboration. That tension will matter for investors and operators alike: model claims live in two worlds, the lab and the production bill, and skepticism tends to fade only after repeated, verifiable performance under real constraints.
