Sapient researchers trained a 1B-parameter reasoning model on just 40B tokens, and it scores competitively with 2B-7B models at a fraction ...
Most AI language models are autoregressive: they generate text left to right, one token at a time. DiffusionGemma has ...