By Harry McCracken, 2024-09-18

When OpenAI announced an AI chatbot called ChatGPT in November 2022, it modestly described it as a “research release.” In no time at all, it was obvious that it was way, way more than that.

Still, as ChatGPT’s second anniversary approaches, it’s also less than a fully fleshed-out product. Compared to internet mainstays such as Google Search, it still has a whiff of proof-of-concept about it. Even before you get to challenges such as AI’s tendency to hallucinate, there’s a lot that can go wrong. It’s an awesomely powerful tool, but surprisingly impenetrable in some ways.

I thought about that last week when OpenAI announced a new LLM called OpenAI o1—no “GPT.” Available to paid ChatGPT Plus users as an option (choose “o1-preview” from the Model drop-down menu), it can take far longer to chug away at generating an answer than the default model, GPT-4o, which has gotten remarkably zippy. But it’s been optimized for the kind of complex reasoning tasks that can befuddle earlier ChatGPT versions. According to OpenAI, o1 “ranks in the 89th percentile on competitive programming questions (Codeforces), places among the top 500 students in the U.S. in a qualifier for the USA Math Olympiad (AIME), and exceeds human PhD-level accuracy on a benchmark of physics, biology, and chemistry problems (GPQA).” Given my profound lack of expertise in all these areas, I didn’t put its capabilities in them to the test. But I was amazed by its aptitude at answering brain teasers from an old book full of them. GPT-4o’s answers for the same puzzles were so ridiculously wrong, it was almost endearing.

Even more remarkable: When I fed ChatGPT with o1-preview a bunch of 800-word mysteries starring the famed boy sleuth, Encyclopedia Brown, it often correctly solved them, explaining its reasoning in greater detail than Encyclopedia did in the original Donald J. Sobol stories. In some cases where its solution was different from Sobol’s, it was still detailed and plausible.

But as impressed as I am with OpenAI o1, its arrival complicates the matter of figuring out which variant of ChatGPT to use for any particular purpose. The names “GPT-4o” and “o1-preview” sure don’t make clear to us mere mortals how the two versions relate to each other. Their taglines on the ChatGPT site—“Best for complex tasks” and “Uses advanced reasoning,” respectively—don’t help much, either. I haven’t even gotten to the other available models: o1-mini, GPT-4o mini, and GPT-4. Which version of ChatGPT should you use to tackle any particular project? In some cases, it might not be clear unless you try more than one and compare the results. It also helps to read OpenAI’s blog posts, which is where you’ll learn crucial details, such as the fact that o1-preview can’t handle uploaded files and images and doesn’t know how to browse the web.

Part of the problem here is that OpenAI isn’t just building new AI models to make ChatGPT more capable. It’s also offering them as cloud services for companies that need AI to power their own software. In that context, it makes plenty of sense for the company to provide several versions with different strengths at different price points. It’s also not too much to expect a technologist considering using them to figure out how they compare. It’s just that the same expectation has different implications in a product that so many consumers are using as a knowledge engine.

To be fair, I love the fact that OpenAI is the kind of company that shares its works-in-progress rather than holding off until they’re perfectly polished and integrated. All of us who love to play around with this stuff benefit from its willingness to experiment in public. And I keep reminding myself: The one time so far when OpenAI acted as if it was releasing a totally consumer-y product—the ChatGPT smartphone app’s uncannily human, Her-style “voice mode”—hasn’t yet resulted in a product that lives up to the launch. Months later, only some ChatGPT Plus users have access to voice mode, and it’s missing much of the functionality that was so dazzling in the demo. Ultimately, it’s better that ChatGPT preserve some of its “research release” rawness than indulge in the overpromising that’s standard operating practice in the tech industry—especially in this current moment of AI hype.