Cambridge spin-out Trismik raises £2.2M to redefine AI evaluation

Borrowing from human IQ testing, the startup applies adaptive psychometrics to large language models, cutting costs and boosting precision.

Cambridge University spin-out Trismik has just emerged from stealth with a £2.2 million Pre-Seed round to tackle the growing problem of evaluating large language models, and it's doing so with a science-backed approach borrowed from human IQ testing.

At a time when traditional benchmarks like MMLU and GSM8K are saturating, with many leading models scoring above 90 per cent, Trismik is offering a rethink of how we measure AI capabilities.

The team is applying Item Response Theory and Computerised Adaptive Testing — foundational methods in psychometrics — to LLM evaluation. This, they argue, enables faster, more scalable insights into what a model can actually do.

According to Professor Nigel Collier, an NLP researcher at Cambridge and Trismik’s Chief Scientific Officer, if we want to trust AI, our methods have to be as rigorous as our ideas.

"Benchmark saturation is creating problems in every domain, from general knowledge, to reasoning, math, and coding.

“Scientists, researchers and technical teams face mounting pressure as evaluation is exploding in importance and has become essential for tying AI to trust. We need an evaluation framework that scales and can support this.”

Trismik’s platform adapts evaluation difficulty in real time based on model responses, similar to how human aptitude tests tailor question sets to estimate intelligence. This technique allows the system to deliver near-identical accuracy rankings with a fraction of the questions.
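
To make the mechanism concrete, here is a minimal sketch of a computerised adaptive testing loop built on the two-parameter logistic (2PL) model from Item Response Theory. Everything in it, from the item pool to the grid-search ability estimate and the maximum-information selection rule, is an illustrative assumption rather than Trismik's actual implementation.

```python
# Minimal sketch of computerised adaptive testing with a 2PL IRT model.
# Illustrative only: the item parameters, estimator and simulated "model"
# are assumptions for this article, not Trismik's implementation.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical item pool: each question has a discrimination (a) and a
# difficulty (b) parameter, as in classical psychometrics.
N_ITEMS = 500
a = rng.uniform(0.5, 2.0, size=N_ITEMS)
b = rng.normal(0.0, 1.0, size=N_ITEMS)

def p_correct(theta, ai, bi):
    # 2PL model: probability that an examinee of ability theta answers
    # an item with parameters (ai, bi) correctly.
    return 1.0 / (1.0 + np.exp(-ai * (theta - bi)))

def item_information(theta):
    # Fisher information of every item at the current ability estimate;
    # CAT always asks the question that is most informative about theta.
    p = p_correct(theta, a, b)
    return a ** 2 * p * (1.0 - p)

def estimate_theta(responses):
    # Maximum-likelihood ability estimate over a fixed grid.
    grid = np.linspace(-4.0, 4.0, 401)
    log_lik = np.zeros_like(grid)
    for correct, ai, bi in responses:
        p = p_correct(grid, ai, bi)
        log_lik += np.log(p) if correct else np.log(1.0 - p)
    return grid[np.argmax(log_lik)]

def adaptive_test(true_theta, max_items=30):
    theta, responses, used = 0.0, [], []
    for _ in range(max_items):
        info = item_information(theta)
        info[used] = -np.inf          # never reuse an item
        i = int(np.argmax(info))
        used.append(i)
        # Simulate the LLM answering item i; in practice this would be a
        # real model response scored against the benchmark answer.
        correct = rng.random() < p_correct(true_theta, a[i], b[i])
        responses.append((correct, a[i], b[i]))
        theta = estimate_theta(responses)  # refine after every answer
    return theta

print(adaptive_test(true_theta=1.2))  # homes in on 1.2 with only 30 items
```

In a real evaluation the item pool would be benchmark questions with calibrated difficulty and discrimination parameters, and each answer would come from the LLM under test rather than a simulated oracle.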

Early results suggest promising efficiency: adaptive tests matched traditional evaluation rankings with Spearman correlations over 0.96 while requiring just 8.5 per cent of the test items. According to the company, this could cut evaluation costs by up to 95 per cent, a major incentive for teams spending six figures monthly on GPU compute just to assess their models.
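
For context, Spearman correlation measures how closely two rankings agree, with 1.0 meaning identical order. A sanity check of the kind Trismik describes might look like the following sketch, using SciPy and invented scores for five hypothetical models:

```python
# Hypothetical check: do rankings from a short adaptive test agree with
# rankings from the full benchmark? The scores below are invented purely
# for illustration; they are not Trismik's results.
from scipy.stats import spearmanr

full_benchmark = [0.91, 0.85, 0.78, 0.88, 0.72]  # accuracy over all items
adaptive_run   = [0.89, 0.84, 0.76, 0.90, 0.70]  # estimate from ~8.5% of items

rho, _ = spearmanr(full_benchmark, adaptive_run)
print(f"Spearman rho = {rho:.2f}")  # 0.90 on this toy data; Trismik reports >0.96
```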

This scientific approach is rooted in Professor Collier’s decades of research. Having published over 200 papers in NLP and AI, Collier has shifted focus to ensuring AI systems are measurable, explainable, and ultimately, trustworthy. His collaboration with CEO Rebekka Mikkola — a repeat founder with experience in enterprise AI sales — began in 2023 through a Cambridge Enterprise-backed design partnership with a major UK telco. The team was later joined by Marco Basaldella, a former Amazon scientist and TEDx speaker, as CTO.

With new regulatory frameworks on the horizon — from the EU AI Act to sector-specific compliance regimes — the demand for precise, transparent evaluation is intensifying. At the same time, AI development cycles are accelerating, putting pressure on teams to ship faster while ensuring models are safe, aligned, and effective. Generic benchmarks are falling short of these needs.

According to Trismik, they fail to reflect proprietary data distributions and domain-specific tasks. Worse still, traditional evaluations are static, offering no way to adapt over time as models evolve or shift objectives.

The funding round was led by Twinpath Ventures, with support from Cambridge Enterprise Ventures, Parkwalk Advisors, Fund F, Vento Ventures, and angel network Ventures Together.

“The AI evaluation market is at an inflection point. Every AI team we speak with is drowning in evaluation overhead; it has become the hidden bottleneck preventing teams from shipping faster and with confidence,” said John Spindler of lead investor Twinpath Ventures.

“Trismik's approach is compelling because it applies proven scientific methods from a completely different domain to solve this problem.

“When you can reduce evaluation time by two orders of magnitude while actually increasing measurement precision, you fundamentally change what's possible in AI development cycles.”

Trismik will now begin rolling out its LLM evaluation platform to AI builders. The product currently supports classical and adaptive testing across datasets related to factuality, alignment, reasoning, safety, and domain knowledge, offering a lightweight interface for fast experimentation.

The company envisions the platform evolving into a broader environment for LLM experimentation — incorporating fine-tuning, prompt engineering, compliance tracking, and performance visualisation.

“Trismik exemplifies Cambridge’s continued contribution to global AI development, with the team combining world-class academic credentials and practical industry experience that has given them the unique authority to define how AI capabilities should be measured,” said Dr Christine Martin, Head of Ventures at Cambridge Enterprise.

“By solving a pivotal challenge in AI adoption, Trismik is positioned to drive trust at scale — we’re excited to support their journey to market.”

The capital will go toward launching Trismik’s adaptive AI evaluation platform, which aims to replace slow, expensive benchmarking with fast, statistically precise assessments.

Early access to Trismik’s platform is available via its website, with adaptive testing capabilities already validated across seven models and five benchmark datasets. The team plans to publish further technical results and case studies later this year. Enterprise users will begin onboarding toward the end of 2025, with a full enterprise solution expected to launch in early 2026.
