News
Image: Epoch AI. The latest results from FrontierMath, a benchmark test for generative AI on advanced ... Other rankings include: OpenAI o1, Grok-3 mini, Claude 3.7 Sonnet (16K), Grok-3, Claude 3.7 ...
A pseudonymous developer has created what they’re calling a “free speech eval,” SpeechMap, for the AI models ... the chatbot Grok. Grok 3 responds to 96.2% of SpeechMap’s test prompts ...
OpenAI has launched HealthBench, a new dataset designed to test how accurately AI models respond to real-world health care ...
Surveillance of speech by algorithm raises urgent questions about data privacy and the future of a neutral, expert public ...
Meta is scrambling to grab some of that ChatGPT and Grok buzz with the launch of its own standalone AI app. Built on its ...
“We’re seeing [internally], with o3 in aggressive test-time compute settings ... of publishing misleading benchmark charts for its latest AI model, Grok 3. Just this month, Meta admitted ...
Their latest model, Llama 4, highlights features such as text conversations, voice conversations and image editing, which ...
Baron Funds, an investment management company, released its “Baron Technology Fund” first quarter 2025 investor letter. A ...
Benchmark reveals which LLMs you can use for some SEO tasks. It also reminds us that humans are more reliable than AI (for ...
The AI app market is forecast to grow at a compound annual growth rate of 80.7% over the next five years, according to the AI App Report. Chatbots, image ...