
Microsoft’s OpenAI-powered Bing search is astounding—but messy

ChatGPT’s thunder just got stolen by an even smarter generative AI chatbot. But it still has an unnerving tendency to get facts wrong—or even make stuff up on the fly.

[Images: Microsoft]

By Harry McCracken

This story is from Fast Company’s new Plugged In newsletter, a weekly roundup of tech insights, news, and trends from global technology editor Harry McCracken, delivered to your inbox every Wednesday morning. Sign up for Plugged In—and all of our newsletters—here.


On Tuesday, Microsoft held an extravagant unveiling at its Redmond, Washington, headquarters for an updated version of a product most of us haven’t thought much about lately: Bing, the perennial distant second-place search engine after Google.

As expected, the new Bing incorporates an improved version of the technology from OpenAI’s ChatGPT, melded with Microsoft’s own search AI (dubbed “the Prometheus model”) and integrated with its Edge web browser. After the keynote, I got early access to the new version, which is currently in preview mode with a waitlist, and began exploring the features the company had shown off.

At less than 10 weeks old, ChatGPT is hardly legacy tech. But as the Microsoft executives onstage at its event claimed, the new Bing’s AI features are a pretty dramatic stride forward in this nascent category of chatbots powered by generative AI:

Bing is far more current. ChatGPT begs off answering many questions on the grounds that it’s based on a data set created back in 2020. Bing’s AI chat feature, by contrast, answered my questions about President Joe Biden’s State of the Union speech, which had ended shortly before I asked them.

It knows way more stuff. Bing readily identifies obscure people, places, and things that flummoxed ChatGPT. It clearly pulls in facts from vast quantities of web pages in a way that ChatGPT does not, then weaves them together adeptly. And unlike ChatGPT, it often cites its sources, with links back to the original pages.

It gives better advice. When I asked ChatGPT which e-bike brands to consider and where to buy used camera equipment, it understood the query, but its recommendations were vague and unsatisfying. The new Bing, however, offered up helpful ideas that were actually a good starting point for further research.

It’s even more glib. Bing expresses itself in language that’s just as clear as ChatGPT’s but feels more sophisticated. When I asked it to generate stories such as fantasy I Love Lucy scripts, it came up with ones that were much richer, more imaginative, and more amusing than ChatGPT’s. It even reacted to iffy queries with a sly sense of humor, refusing my request for a story in which Winnie the Pooh randomly beats up strangers—and instead giving me one in which he randomly hugs them.

It adds new tricks. At the moment, Microsoft requires those of us with access to this version of Bing to use it in a test version of Edge with a Bing button. Clicking it opens a chat panel that can interact with the page you’re viewing, letting you say “summarize this page” for anything that looks too TL;DR to digest in its entirety. These summaries aren’t perfect—especially with longer articles—but they’re often eerily good. (Bing even caught and called out a weird joke I made in the kicker of one article.)

Once more people get access to the new Bing, I can’t imagine why most would spend much time with ChatGPT—especially since Bing, at least in its current semipublic form, is far more robust and reliable.


So far, so great. But Microsoft has not whipped the single biggest flaw in the GPT technology underlying both ChatGPT and this updated Bing. Rather than truly comprehending what it’s saying, GPT strings together words based on probabilistic data derived from all the existing text it’s digested. More often than not, that results in new text that’s not only comprehensible but also correct and useful. But it can also generate material that, though it may sound plausible, is either a little inaccurate or just plain fantasy—a phenomenon that AI scientists call hallucination.
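To make that mechanism concrete, here’s a minimal, hypothetical sketch of the weighted-sampling loop at the heart of models like GPT. The probability table is invented purely for illustration; a real model computes next-token probabilities from billions of learned parameters, not a lookup table.

```python
import random

# Toy next-token probabilities, invented for illustration only.
# A real GPT-style model derives these from parameters trained on web-scale text.
NEXT_TOKEN_PROBS = {
    "the Cliff House opened in": {"1909": 0.6, "1989": 0.3, "1896": 0.1},
}

def sample_next_token(context: str) -> str:
    """Pick the next token at random, weighted by its probability."""
    probs = NEXT_TOKEN_PROBS[context]
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return random.choices(tokens, weights=weights, k=1)[0]

# Most of the time the model emits the right year, but sometimes it picks
# a plausible-sounding wrong one: the fluent "hallucination" described above.
print(sample_next_token("the Cliff House opened in"))
```

Because every output is a weighted draw rather than a fact lookup, a wrong answer comes out in exactly the same confident prose as a right one, which is why these errors are so hard to spot.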

I encountered this with my very first test of the new Bing. One of its sample queries provided recommendations for vacation trips within a three-hour flight of London’s Heathrow Airport; I tweaked it to involve San Francisco’s SFO. Bing’s response declared that New York was only 3 hours and 15 minutes away from San Francisco—maybe by hyperloop!—and badly underestimated travel time to the Bahamas. It even included the Seychelles, despite its own stated flight time of 7 hours and 45 minutes, a figure that was itself way off.

Bing does seem significantly less likely than ChatGPT to indulge in outright hallucination, but its results are nowhere near airtight. It told me that San Francisco’s present Cliff House building opened in 1989, 80 years after the correct date—perhaps because it confused the city’s 1906 and 1989 earthquakes. I asked for a list of books by the children’s author Roger Bradfield, and it named at least 50 imaginary titles (I got tired of counting). As with ChatGPT, making the same request of Bing repeatedly can produce wildly different takes: When I asked it to profile, ahem, me, it made a point of saying that I was not the noted golfer of the same name. Then I asked again, and it started talking about my famous love of the game.

Making the whole matter more nebulous, AI seems likelier to get creative when it’s discussing matters that aren’t exactly household knowledge, which makes its inventions challenging to fact-check. When I asked Bing for a biography of cartoonist Chic Young, the creator of the comic strip Blondie, it fabricated a divorce, a second marriage, and some kids he never had. I know more about the history of comic strips than the average bear, but I was halfway convinced those details were real until I consulted sources such as Young’s 1973 New York Times obituary.

Microsoft isn’t claiming that Bing has conquered the hallucination issue. At the Tuesday event, Sarah Bird of its Responsible AI team told me that the search engine will get some things wrong and that the company hopes users will regard its answers as starting points for deeper research of their own. Still, I wonder if the companies unleashing these AI bots are being a bit blithe about their potential to misinform the world at scale. On Wednesday, when Google demoed its own upcoming generative AI assistant, Bard, the company didn’t notice that one of its preplanned example research requests spewed inaccurate information about the James Webb Space Telescope.

Back in 1995, a new search site called AltaVista instantly retrieved pages from across the web in response to any text query, a feat that seemed incredible at the time. People were so dazzled that they tended to gloss over the fact that it often put terrible sites at the top. Three years later, Google’s breakthrough PageRank algorithm provided radically more relevant results, which is why Google is still with us and AltaVista is not.

Generative AI search’s struggles with accuracy remind me of AltaVista’s relevancy problem. And until those struggles are under control, tools such as the new Bing, mind-bendingly impressive though they are, will remain fundamentally flawed.



ABOUT THE AUTHOR

Harry McCracken is the global technology editor for Fast Company, based in San Francisco. In past lives, he was editor at large for Time magazine, founder and editor of Technologizer, and editor of PC World.

