Cognitive scientist, author and entrepreneur Gary Marcus was an early and vocal critic of large language models, arguing their limitations were far greater than many people thought. Lately, he has been making the case that LLMs are far riskier, too—and must be regulated.
In September, Marcus published “Taming Silicon Valley: How We Can Ensure That AI Works for Us.” The book argues that the technological risks and moral problems raised by today’s AI are deeply intertwined. Marcus wrote it in roughly two months because, he said, there is an urgent need for greater skepticism in the public conversation about AI.
“The hype has led the average person to think these things are magic, but they’re not,” said Marcus, a professor emeritus at New York University who in 2014 founded Geometric Intelligence, a machine-learning company that developed techniques for learning more from modest amounts of data. The company was sold to Uber, where Marcus directed AI research for a time.
“One of the craziest things that I mentioned in my book is that some senators in Congress tried to pass a law saying you couldn’t use AI to make nuclear-weapon decisions without a human in the loop, and they couldn’t pass that,” Marcus said.
He argues for a regulatory framework that would address such challenges, and much more. There are signs that AI oversight is on the agenda of the next administration. President-elect Trump is considering naming an AI czar in the White House, Axios reported. Here are highlights of Marcus’s discussion with The Wall Street Journal:
WSJ: How do you assess the limitations of large language models, the foundation for things like text-based generative AI?
Gary Marcus: LLMs are both morally and technically inadequate.
On the technical side, what you have is basically a giant black box. I sometimes call it autocomplete on steroids: it is trying to learn the statistical structure of how humans talk to each other, and that is correlated with how humans think about the world. But it’s not identical to that. You can train them on hundreds of millions of games of chess and they still don’t really internalize the rules of chess. They still make illegal moves. In almost any domain where people have looked at LLMs, you get astonishingly good performance, but never reliable performance.
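As a rough illustration of what “autocomplete on steroids” means in practice, the sketch below, which assumes the Hugging Face transformers library and the small public gpt2 checkpoint, continues a chess-notation prompt token by token. Nothing in the model checks whether the continuation is a legal move, only whether it is statistically likely.

```python
# A minimal sketch of next-token prediction: the model continues the text
# based on the statistics of its training data, with no notion of the rules
# of chess. Assumes the `transformers` library and the public `gpt2` model.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "1. e4 e5 2. Nf3 Nc6 3."  # a chess opening in algebraic notation
inputs = tokenizer(prompt, return_tensors="pt")

# Greedy decoding: pick the most likely next token at each step. The output
# may look like chess notation without being a legal continuation.
outputs = model.generate(**inputs, max_new_tokens=10, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```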
Something that I foresaw, and some in the field really hated me for saying this, was that we might reach a peak. I think it’s partly a data limitation and partly an architectural limitation: the system is not really representing facts, it’s just approximating those kinds of things. I think we’re running into a point of diminishing returns. And, you know, for a while there was a really good run.
WSJ: How do you see the link between the technical and moral limitations of LLMs?
Marcus: Many of the moral limitations actually follow from the technical limitations. The nature of these systems is that we can’t give them simple instructions and assume that they will follow them.
If you tell them, ‘Don’t hallucinate,’ they’re still going to hallucinate. If you tell them, ‘Don’t do anything harmful’ or ‘Don’t recommend anything that’s harmful to humans,’ they still will. People have tried to build guardrails, but the guardrails are not very effective.
The largest problem, the longest-term problem, is that we have poor control over these systems. Some people have written about what they call the alignment problem, which basically boils down to getting machines to behave in ways that are consistent with human values, or with just whatever you ask. And they don’t really do that. And that is a serious risk: that we’re going to build some very intelligent machine, but it’s not really going to do what we want, that it’s going to be the source of some Sorcerer’s Apprentice-style mess.
WSJ: How do you propose to construct a regulatory framework for AI?
Marcus: If Sam Altman wants to release a technology that puts us all at risk, he can basically do that. There’s no government procedure to say, ‘Hey, slow down here, let’s make sure this thing is OK.’
I would emphasize having some kind of AI agency for the United States. It should be a cabinet-level position because AI is changing so fast. It’s affecting so many aspects of society. It’s just as important as having a cabinet-level thing for defense or health and so forth. Also at the top of my list would be some kind of FDA-like process to approve things that are released at large scale.
The third thing I would prioritize is monitoring once things are out. So, for example, it should be possible for well-qualified scientists to say, ‘I want to study the degree to which this particular large language model might discriminate against people,’ and to ask how it is actually being used in practice, in job decisions, jail sentences and so forth. There should be some auditing with government backing that allows independent scientists to ask legitimate questions of this sort.
We should have some kind of liability, especially if something very seriously goes wrong. The big tech AI companies right now are basically trying to privatize the profits and socialize the costs.
WSJ: How do you think Elon Musk, a close ally of President-elect Trump, will use his role to influence the conversation about AI regulation?
Marcus: Elon Musk is a very interesting player in this. On the one hand, he was one of the first people to warn about long-term AI risks, about machines going rogue. I’ve corresponded a little bit with him and I know that he’s sincere about that. But on the other hand, he’s building an AI company that I think is going to be valued at $40 billion or something like that.
WSJ: You argue that there’s a better alternative to the LLM. What is it?
Marcus: Daniel Kahneman drew a distinction between System One and System Two thinking. System One is fast and reflexive. It works, but it makes mistakes. System Two is deliberate; it’s about reasoning and abstraction. The AI that we are currently building is basically like System One. And what I think we need to do is bring those two traditions together. Even that is not going to magically bring us to artificial general intelligence tomorrow. But I think it’s a prerequisite to go further.
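One way to picture the pairing Marcus describes, as a rough sketch assuming the python-chess library: a fast statistical proposer (here a hypothetical stand-in for an LLM) suggests a move, and a deliberate symbolic rules engine accepts it only if it is legal.

```python
# A minimal sketch of pairing a fast, fallible proposer (System One) with a
# deliberate, rule-based checker (System Two). Assumes the python-chess
# library; the proposer is a hypothetical stand-in for an LLM suggestion.
import chess

def propose_move(board: chess.Board) -> str:
    # System One stand-in: in a real hybrid this would be an LLM's guess,
    # which may be statistically plausible but illegal in the position.
    return "Nf3"

def accept_if_legal(board: chess.Board, san: str) -> chess.Move | None:
    # System Two: a symbolic rules engine vetoes anything that breaks the rules.
    try:
        return board.parse_san(san)  # raises ValueError for illegal moves
    except ValueError:
        return None

board = chess.Board()
move = accept_if_legal(board, propose_move(board))
print(move)  # g1f3 from the starting position; None if the proposal was illegal
```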