Anthropic and OpenAI Bring Their AI Competition Into the World of Scientific Discovery
Anthropic launched Claude Science, an AI research workbench connecting over 60 scientific databases, while OpenAI released GeneBench-Pro, a 129-problem benchmark for computational biology — both on the same day.
On a single Tuesday, two of the biggest names in artificial intelligence extended their rivalry into a new arena: scientific research. Anthropic unveiled Claude Science, a dedicated AI workbench built for researchers, while OpenAI countered with GeneBench-Pro, a benchmark designed to measure how well AI systems handle real computational biology problems.
The simultaneous, though likely coincidental, releases signal a clear shift in the AI industry's ambitions. The competition is no longer just about chatbots or coding assistants. It is moving into laboratories, genomics pipelines, and drug discovery workflows.
Anthropic's offering takes a practical, product-first approach. Claude Science is not a new underlying model but a purpose-built application that consolidates the tools scientists rely on every day (databases, code environments, and computing resources) into a single interface. The app connects to more than 60 scientific databases spanning genomics, proteomics, and cheminformatics. Critically, every result it generates is traceable back to the specific code that produced it, giving researchers the kind of auditability that scientific work demands.
The launch builds on a life sciences initiative Anthropic started in October 2025. During its beta phase, researcher Jérôme Lecoq from the Allen Institute used the tool to dramatically cut down literature review timelines that previously stretched up to two years. Anthropic is also backing the initiative financially, offering funding for up to 50 research projects with credits of up to $30,000 each. Notably, Claude Science arrives while the company's most advanced models — Fable 5 and Mythos 5 — remain restricted under current US export regulations.
OpenAI's response came in the form of measurement rather than a deployable tool. GeneBench-Pro is a rigorous benchmark consisting of 129 problems drawn from genomics, quantitative biology, and translational medicine. The problems are deliberately complex: independent reviewers estimated that a human expert would need between 20 and 40 hours and thousands of dollars to work through a single one. OpenAI's top-performing model, GPT-5.6 Sol, reportedly completes the equivalent analysis for just a few dollars.
The raw performance numbers, however, tell a more sobering story. GPT-5.6 Sol solved 28.7% of the benchmark problems at its highest reasoning setting, climbing slightly to 31.5% in Pro mode. For context, GPT-5 scored below 5% on the original GeneBench, while Anthropic's Opus 4.8 reached 16% on the harder GeneBench-Pro. These figures underscore that even the most advanced AI systems still fail the majority of research-grade biology tasks.
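To put those percentages in concrete terms, the short sketch below converts each reported pass rate into an approximate number of problems out of 129. It is illustrative arithmetic only. It assumes each rate is a simple fraction of the full 129-problem set, which the reporting does not confirm (scores may be averaged over several runs), and the function name and labels are made up for this example.

```python
# Illustrative arithmetic only. Assumption: each reported rate is a plain
# fraction of the 129 GeneBench-Pro problems; the source does not say how
# the scores were aggregated, so the counts are approximate.
TOTAL_PROBLEMS = 129

# Pass rates as reported in the article.
reported_rates = {
    "GPT-5.6 Sol (highest reasoning)": 0.287,
    "GPT-5.6 Sol (Pro mode)": 0.315,
    "Opus 4.8": 0.16,
}


def approx_counts(rate, total=TOTAL_PROBLEMS):
    """Return (approximately solved, approximately unsolved) for a pass rate."""
    solved = round(rate * total)
    return solved, total - solved


for name, rate in reported_rates.items():
    solved, unsolved = approx_counts(rate)
    print(f"{name}: ~{solved} solved, ~{unsolved} unsolved of {TOTAL_PROBLEMS}")
```

Under that assumption, even the best reported configuration leaves roughly two thirds of the problems unsolved.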
The two launches represent genuinely different strategies. Anthropic is focused on getting a working product into researchers' hands right now. OpenAI is focused on defining what good AI-driven science looks like and how far current systems fall short of that bar. Both approaches have merit, and both reflect the enormous pressure each company faces — not only from each other, but from rapidly improving Chinese AI models that are increasingly competitive in research contexts.
The geopolitical dimension adds another layer of urgency. US export restrictions have already forced Anthropic to explore alternative host countries for some of its model deployments, while OpenAI's earlier staggered release of GPT-5.6 was reportedly carried out at Washington's request.
Looking beyond the benchmarks, scientists and medical researchers are already expressing confidence in AI's transformative potential. Aubrey de Grey, President and Chief Science Officer of the Longevity Escape Velocity Foundation, has argued that AI will soon eliminate certain bottlenecks in drug development, even if broader medical gains will take longer to materialize. He noted, however, that translating faster research into approved therapies still hinges on regulatory frameworks and public risk tolerance.
Immunologist Dr. Derya Unutmaz went further, stating that after 35 years in his field he now trusts AI-generated insights more than his own intuitions. He predicted that failing to incorporate AI into clinical practice will eventually be considered not just poor judgment, but malpractice.
Whether these ambitions are validated will depend on whether researchers actually adopt these tools at scale — and whether GeneBench-Pro scores begin to move meaningfully upward over the coming months.


