Nope, machines still can’t think. Where does AI go now?

Exploring Apple’s Illusion of Thinking, the gap between AI marketing and reality, and data monetisation opportunities in AI.

Jun 26, 2025

This post was written by Jessica Li Gebert, who helps Neudata’s clients unlock the hidden value of their data – especially in cutting-edge AI and emerging data use cases.

If you’ve read Turing’s Computing Machinery and Intelligence (1950), you’ll realise he never answered his famous question, “Can machines think?”. Turing held that ‘machine’ and ‘think’ could not be usefully defined and focused instead on a more practical question: “Can a machine mimic human tasks such that its performance is indistinguishable from that of humans?”. And thus was born the imitation game (aka the Turing Test).

75 years on, are we closer to creating machines that can think? In this piece, I’ll dissect Apple’s latest findings in The Illusion of Thinking and discuss the implications for our broader AI and data ecosystem.

The Illusion of Thinking

Fast forward to 2024, when a UC San Diego study demonstrated that GPT-4 passed the Turing Test! But does that mean AI is thinking?

Not quite. In fact, Apple’s The Illusion of Thinking revealed the limits of current models at ‘thinking’ and raised important questions about where we’re headed with artificial general intelligence (AGI).

The Illusion study analysed the ability of three Large Reasoning Models (LRMs)1 - Claude 3.7 Sonnet (thinking), DeepSeek-R1, and OpenAI’s o1/o3 - and their non-thinking counterparts (i.e. regular large language models (LLMs)) at solving logic puzzles, and concluded that:

  • For simple puzzles, regular LLMs performed as well as, if not better than, LRMs in terms of accuracy2. In fact, LRMs tended to overthink - they continued to reason after reaching the correct answer early in their reasoning traces2, wasting compute2. In other words, LRMs are inefficient at solving simple puzzles.
  • As the complexity of puzzles increased, LRMs’ ability to use chain-of-thought reasoning gave them an advantage over regular LLMs.
  • But when complexity was dialled up further, LRMs simply gave up - they used drastically fewer thinking tokens2 and their accuracy and pass@k2 dropped to zero. They refused to ‘think’, suggesting LRMs may face a fundamental barrier to achieving generalisable reasoning.

The study also noted that our current benchmark for ‘thinking’, or intelligence3, is narrowly defined by a model’s performance at answering math questions, without assessing its reasoning traces.

In other words, we can’t ascertain if reasoning models actually think or simply use probability (prediction) and pattern recognition (regurgitation)! In human-speak, rote learning is not critical thinking. 

Now, if we understand thinking as the exercise of intelligence, what does Apple’s Illusion paper mean for the broader AI and data ecosystem?

-----

This section explains the technical terms used above. Feel free to skip to Implication 1 if you're already familiar with them.

1LRMs are a type of frontier model built on top of LLMs. LRMs are designed for problem-solving and logical reasoning, whereas LLMs are designed primarily for language understanding and generation. Reasoning models are also referred to as thinking models.

2These are some common evaluation metrics used in AI benchmarking:

  • Accuracy (of solution) measures the correctness of a model’s performance, i.e. the number of correct answers divided by the total number of attempts.
  • Reasoning trace refers to the thought process behind a model’s response. Analysing where the correct answer first appears in a reasoning trace allows us to ascertain whether a model is overthinking or oversimplifying.
  • Compute refers to the hardware and software resources required to run a model. 
  • Thinking tokens measure how much compute a model uses in its reasoning, i.e., how much effort a model puts into reasoning.
  • Pass@k measures the probability of a model producing at least one correct solution out of k generated. This metric reflects how we use AI in practice, where we often generate multiple outputs before picking the one we find most suitable.
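
The pass@k definition above can be made concrete. Below is a minimal sketch of the standard unbiased estimator popularised by OpenAI's Codex evaluation - an assumption on my part, since the Illusion paper doesn't prescribe an implementation: given n generated solutions of which c are correct, pass@k is the probability that a random subset of k contains at least one correct solution.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: the probability that at least one of
    k samples drawn (without replacement) from n generations is correct,
    given that c of the n generations are correct."""
    if n - c < k:
        # Fewer than k incorrect generations exist, so any k-subset
        # must contain a correct one.
        return 1.0
    # P(all k samples are incorrect) = C(n-c, k) / C(n, k)
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 10 generations, 3 correct; a user who inspects 5 of them
# is very likely to find at least one correct answer.
print(pass_at_k(n=10, c=3, k=5))
```

Note how the metric rewards a model that is right only occasionally, as long as the user samples enough outputs - which is exactly why accuracy and pass@k dropping to zero together is such a strong signal of collapse.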

3 François Chollet’s formulation of ‘intelligence’ is currently my favourite. I’ve been following his ARC-AGI benchmark for a while and will discuss it in a future post.

-----

Implication 1: We are a long way from AGI and we may not even be on the right path

To date, most AGI research has been built on LLMs, which represent just one narrow branch of the broader AI landscape. LLMs could be a path to AGI, or not. At this time, they just happen to be the most commercially available and visible form of AI. As Karen Hao puts it in Empire of AI:

“Nothing about this form of AI [LLMs and other GenAI applications] coming to the fore or even existing at all is inevitable; it is the culmination of thousands of subjective choices, made by the people who had the power to be in the decision-making room.”

Maybe those powerful people made the wrong decisions? Maybe our path to AGI is yet to be written?

We shall see. 

Moreover, have you ever wondered why we are constantly inundated with talks about AGI? While it captures headlines and imaginations, it’s also a convenient PR narrative.

AGI diverts our attention from what Big Tech isn't saying. Behind the scenes, the real business of AI raises tough questions – from environmental impact to labour concerns. I’ll explore the ethics behind it all in a future post.

Implication 2: Artificial narrow intelligence is where market growth lies! 

Artificial narrow intelligence (ANI) is built for specific tasks: think LLMs for text generation, image generators (i.e. generative AI, or GenAI), resume-screening tools, and speech and facial recognition.

ANI may not sound as cool as AGI, but it’s where market growth lies, because ANI is technologically and commercially attainable, and far from market saturation. Since OpenAI launched ChatGPT in November 2022, enterprise AI adoption has grown rapidly. According to McKinsey’s latest Global Survey on AI, as of July 2024, 78% of respondents had implemented AI in at least one business function, up from 55% in 2023.

Going forward, I expect to see a continued upward trajectory and here’s why:

ANI - especially LLMs - has become more reliable and usable

Chief among ANI models, LLMs are highly relevant to enterprise use cases, as most of our jobs involve communicating in natural language. So, unsurprisingly, improvements in LLMs - driven by high-quality training datasets and advanced architectures - translate into wider enterprise AI adoption.

For context, compare GPT-3.5 Turbo (March 2023) with GPT-4.5 (February 2025). GPT-3.5 Turbo scored 69.8% on the MMLU benchmark4, significantly lower than GPT-4.5’s 90.8% - a drastic improvement in the models’ ability to understand our languages. (Benchmark source: llm-stats.com)

4 The MMLU benchmark stands for Massive Multitask Language Understanding, a commonly used LLM benchmark in the industry. It measures a model’s reasoning and general language understanding abilities.

More cost-efficient LLMs

The costs of implementing an enterprise LLM solution include LLM tokens5, model hosting, security, implementation and support. 

While the latest LLM versions have costlier tokens, their improved reasoning abilities mean fewer generations - and thus fewer tokens - are required to get a desired response. This makes newer LLMs more cost-efficient. Moreover, the AI hosting landscape has evolved, too. In early 2023, the enterprise AI hosting market was dominated by Microsoft Azure and AWS. Today, Google Cloud Vertex AI, Hugging Face, Fireworks, and others have democratised the hosting market and brought down hosting costs.

With more value per token and decreasing hosting costs, the overall implementation cost has gone down, making it more affordable for enterprises to adopt AI solutions.
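
To see how “more value per token” can outweigh a higher per-token price, here is a toy calculation. All figures - prices, token counts, and attempts needed - are illustrative assumptions, not real vendor rates:

```python
def cost_per_task(price_per_1m_tokens: float, tokens_per_attempt: int,
                  attempts_needed: int) -> float:
    """USD cost to reach one acceptable response, given a per-million-token
    price, tokens consumed per attempt, and attempts until success."""
    return price_per_1m_tokens * tokens_per_attempt * attempts_needed / 1_000_000

# Hypothetical older model: cheap tokens, but verbose and often wrong,
# so several attempts are needed per task.
older = cost_per_task(price_per_1m_tokens=2.0, tokens_per_attempt=1_500,
                      attempts_needed=4)

# Hypothetical newer model: 4x the token price, but concise and usually
# right on the first attempt.
newer = cost_per_task(price_per_1m_tokens=8.0, tokens_per_attempt=800,
                      attempts_needed=1)

print(f"older: ${older:.4f} per task, newer: ${newer:.4f} per task")
```

Under these made-up numbers the pricier model still costs roughly half as much per completed task - the cost that actually matters to an enterprise.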

5 A token is the unit of information that AI models process; all input and output data is expressed as tokens. Model usage is typically priced per 1,000 (1k) or 1 million (1m) tokens.

The good ol’ FOMO

By now, most of us have experienced firsthand the productivity gains from AI tools. Enterprises that don’t implement AI risk losing out in the long run.

The growth of ANI adoption

Where does future ANI adoption lie? I expect AI to grow in two ways: increasing automation in business functions, and domain-specific AI applications.

Automation trends in business functions:

  • Risk, legal, compliance and finance: automated research, structured report writing, note and minutes-taking
  • Software development: data cleaning and coding
  • Strategy, sales/marketing, market research, business intelligence: multimodal analysis by analytical AI models 
  • Operations and manufacturing: operational analytics, such as maintenance predictive analytics, demand forecasting, resource planning
  • Sales and customer success: better chatbots with diverse and localised language understanding
  • Agentic AI: AI that does not just answer one question at a time, as current LLMs do, but can run a sequence of actions to complete a pre-defined job.

Industries that are highly structured yet demand meticulous analysis are leading the way in domain-specific AI development:

  • Healthcare: drug development, medical imaging, diagnostics
  • Finance: fundamental investment analysis, fraud detection/KYC/AML, risk management, personal wealth management
  • Legal: legal research, contract analysis
  • Supply chain and logistics: demand forecasting, route planning
  • Consumer and retail: inventory management, fulfilment, demand analysis, consumer AI products
  • Telecom: network optimisation

Next steps for business leaders

For business leaders, this means two things for your AI+data strategy:

  • If you haven’t implemented AI in those functions, you have to start now or risk falling behind!
  • If you have domain-specific enterprise data in these industries, you might just become the most popular kid in town! If you recall my AI data deals mid-year review, model makers and application developers are hungry for training data!

If you're unsure how to make the most of your AI+data strategy, reach out to consulting@neudata.co to discuss how to turn your cost centre into a revenue stream!
