
Welcome to this edition of Learning of the Week.
In this modern era, almost everyone with a smartphone has used, or at least heard of, AI. Even people with limited tech knowledge know of it. With all the buzzwords flying around and AI being called a defining point of our age, it is often said that there are no limits to what it can do. But is that really the case? Recently, Sam Altman, CEO of OpenAI, stated that we might be in an AI bubble. A recent MIT report adds to the concern: a group of U.S. enterprises collectively invested $30 to $40 billion in AI, and a staggering 95% have seen little to no measurable return so far. This raises the question: is there a limit to this seemingly limitless technology?
Before we dive into today’s learning, let’s step back and understand how these AI models typically work.
What helps AI generate astonishingly good outputs?
Large Language Models (LLMs) power the generative AI we use in the form of ChatGPT, Gemini Flash, etc. Think of LLMs as massive silent libraries: invisible on the front end, but at the core of Gen-AI’s impressive output.
Recall our Newsletter from May, which explained generative AI and what an LLM is. Simply put, LLMs learn patterns from massive data and generate text by predicting the next token based on those patterns.
What are tokens?
A simple way to think of tokens is as compact pieces of text. A token can be a whole word, part of a word, or even punctuation. For example, the model might treat “unable” as two tokens, “un” and “able”, and split “precision” into “precis” and “ion”. Occasionally, a single token can even cover a common group of words.
For the sentence “Playing with my dog.”, we see 4 words, but the model may see 6 tokens: “Play”, “ing”, “with”, “my”, “dog”, “.”. This helps models process text faster and more consistently. Tokens are not limited to text, either: for images, a token can represent a group of pixels; for audio, a small slice of sound.
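If you want to see tokenization in action, OpenAI’s open-source tiktoken library exposes tokenizers similar to the ones its models use. Here is a minimal sketch; the cl100k_base encoding is an assumption for illustration, and the exact splits will differ from model to model:

```python
import tiktoken  # pip install tiktoken

# cl100k_base is assumed here purely for illustration; every model
# family ships its own tokenizer, so the exact splits will differ.
enc = tiktoken.get_encoding("cl100k_base")

text = "Playing with my dog."
ids = enc.encode(text)                   # text -> token IDs
pieces = [enc.decode([i]) for i in ids]  # token IDs -> readable pieces
print(f"{len(text.split())} words -> {len(ids)} tokens: {pieces}")
```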

(Source: Miquido)
When you ask a model like GPT a question, the following steps occur on the backend:
Step 1 – The input is broken into tokens and processed.
Step 2 – The model matches those tokens against patterns it learned during training.
Step 3 – It generates output tokens one by one.
The model doesn’t “understand” in a human sense. Instead, it selects the next most likely token based on the context of the input. Tokens are counted for both input and output, and every model, GPT-4 or otherwise, has a context window (a maximum token limit) that covers both. For example, if a model has a context window of 4,096 tokens, the sum of your input and the model’s output cannot exceed that number. Newer models support much larger windows, such as 8,000 or 32,000 tokens, depending on the version.
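To make those steps concrete, here is a deliberately tiny Python sketch. The probability table below is invented for illustration; a real model learns billions of such patterns from its training data, but the loop is the same idea: look at the context, pick the most likely next token, repeat.

```python
# A toy illustration (not a real LLM): greedy next-token selection
# from a hand-made probability table.
next_token_probs = {
    "How":  {"do": 0.8, "are": 0.2},
    "do":   {"I": 0.7, "you": 0.3},
    "I":    {"bake": 0.6, "cook": 0.4},
    "bake": {"a": 0.9, "the": 0.1},
    "a":    {"cake": 0.5, "pie": 0.3, "?": 0.2},
    "cake": {"?": 1.0},
}

def generate(prompt_tokens, max_new_tokens=6):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        options = next_token_probs.get(tokens[-1])
        if options is None:  # no learned pattern for this context: stop
            break
        # Steps 2-3: match learned patterns, emit the most likely token
        tokens.append(max(options, key=options.get))
    return " ".join(tokens)

print(generate(["How"]))  # -> "How do I bake a cake ?"
```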
For example, your question, “How do I bake a cake?”, might be a 7-token input. Based on this, the model can produce a recipe of about 2,000 tokens, bringing the total token usage to 2,007.
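That bookkeeping is easy to sketch. Assuming the same tiktoken encoding as before (cl100k_base, an illustrative choice), a quick check that a request fits the context window might look like this:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # assumed encoding, as before

def fits_context(prompt: str, max_output_tokens: int,
                 context_window: int = 4096):
    """Return total token usage and whether input + output fit the window."""
    input_tokens = len(enc.encode(prompt))
    total = input_tokens + max_output_tokens
    return total, total <= context_window

# A ~7-token question plus a ~2,000-token recipe stays inside 4,096.
print(fits_context("How do I bake a cake?", max_output_tokens=2000))
```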
As AI becomes more sophisticated, its evolution has given rise to new models that can act as more effective problem solvers. This brings us to Large Reasoning Models (LRMs).
What are LRMs?
When you use a generative AI like ChatGPT or Gemini, you might see an option for “Reasoning” or “Thinking”. Switching it on routes your request to a Large Reasoning Model (LRM), which spends more computational resources on the problem.
The goal of LRMs is to act more like a step-by-step problem solver. Unlike many LLMs that simply map input tokens to output tokens, LRMs are trained to expose their intermediate reasoning, verify each step, and correct themselves.
For example, if you asked an older LLM how many “r”s are in the word “strawberry,” it might incorrectly answer 2 without checking its work. Newer LLMs are less prone to such simple mistakes, but they can still falter on more complex questions. An LRM, by contrast, verifies its output against the input: if its initial count were 2, it would recount and return the correct answer, 3.
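As a toy illustration of that verify-then-correct loop: the “first guess” below is just a deliberately flawed function standing in for a model’s quick pattern-matched answer, and the checker is ordinary code, whereas a real LRM performs both roles internally.

```python
def flawed_first_guess(word: str, letter: str) -> int:
    # Stand-in for a model's quick, unchecked answer:
    # deliberately undercounts, like the famous "strawberry" miss.
    return max(word.count(letter) - 1, 0)

def count_with_verification(word: str, letter: str) -> int:
    answer = flawed_first_guess(word, letter)        # draft an answer
    recount = sum(1 for ch in word if ch == letter)  # verify step by step
    if recount != answer:                            # disagreement? correct it
        answer = recount
    return answer

print(count_with_verification("strawberry", "r"))  # 3, not 2
```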

(Source: OpenAI Forum Bug Report)
How do LLMs and LRMs differ?
Traditional LLMs are good for most use cases, so the first question on your mind might be: why do we even need LRMs? Well, LLMs have certain limitations, one of which we discussed earlier: they can generate wrong output. That is because LLMs have no true understanding of content in a human sense. At their heart, they only generate the output that is statistically most likely given the memory (the giant library) they hold. LLMs are also prone to hallucinations: they make things up and stand firm behind the fabricated output, which can be extremely hazardous in applications like medicine, law, or science. Finally, an LLM’s knowledge is static, frozen at the time of training. If you used early pilot versions of ChatGPT, you will remember that questions about current events went unanswered. The model has to be retrained on fresh data, which is partly why GPT-3, GPT-4, and now GPT-5 have been released.
An LRM extends this capability with multi-step logical reasoning, designed explicitly to enhance reasoning, logic, and structured thought. At their core, LRMs aim to bridge the gap between language generation and cognitive problem-solving. They are built to go beyond simply predicting the next token: they are trained to follow logical chains, solve multi-step problems, make inferences, and simulate analytical thinking. Whether it is solving a math problem, debugging a software function, or interpreting a scientific hypothesis, LRMs focus on the process as much as the result.
How does an LRM work?
LRMs are typically trained and used with methods that emphasize the “thinking” between input and output. Common elements include:

(Source: Aryaxai)
1. Targeted training data – The model is fed carefully selected problems and detailed, step-by-step explanations on how to solve them. This approach ensures the model learns the reasoning process itself, not just the final answer.
2. Chain-of-thought prompting – Instead of being prompted for a direct answer, LRMs are encouraged to “think aloud.” This forces the model to generate and articulate its reasoning steps before arriving at a final conclusion (see the sketch after this list).
3. Step-by-step supervision and reward modeling – During its training, the model is rewarded when its individual reasoning steps are correct. This reinforcement mechanism encourages accurate and logical thinking, helping the model learn to produce sound, verifiable processes.
4. Tool use – The model can utilize external tools during its reasoning process. This might include calling on calculators, code interpreters, or even search and visual tools to aid in solving complex problems.
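As a rough sketch of what chain-of-thought prompting looks like from the outside, using the OpenAI Python SDK; the model name and prompt wording here are placeholder assumptions, not specifics from any vendor’s documentation:

```python
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

question = ("A bat and a ball cost $1.10 in total. The bat costs $1.00 "
            "more than the ball. How much does the ball cost?")

prompts = {
    # Direct prompt: just ask for the answer.
    "direct": question,
    # Chain-of-thought prompt: ask the model to show its steps first.
    "chain-of-thought": question + "\nThink through the problem step by "
                                   "step, then state the final answer "
                                   "on its own line.",
}

for name, content in prompts.items():
    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": content}],
    )
    print(f"--- {name} ---\n{reply.choices[0].message.content}\n")
```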
Practical Applications of LRMs
Newly developed LRMs are already solving real problems, especially in medicine. MedOrch, a framework that uses tool-augmented reasoning agents, was evaluated on Alzheimer’s diagnosis, chest X-ray interpretation, and medical visual question answering, achieving about 93.3% accuracy in Alzheimer’s diagnosis.
In legal work, reasoning models help scan large contracts, flag unusual clauses, detect deviations from standard playbooks, assess risk, and propose modifications. They can “reason through” a document’s structure and also bring in context and past precedent.
Even at Bastion Research, we use LRMs to a certain extent in our research process.
Are LRMs really the next big thing?
The entire premise of LRMs is that they reason more like humans and can solve more complex problems on their own. However, a study conducted by Apple suggests otherwise.
It concludes that for low-complexity problems, LLMs are much more efficient than LRMs at using tokens for thinking, while delivering similar or identical accuracy. As the complexity of the problem increases, LRMs start outperforming LLMs. But as complexity rises further still, the LRMs collapse as well.
Many times, the LRMs exhibited an “overthinking phenomenon”: they would find a correct solution early, then keep exploring incorrect alternatives.
One of the strangest findings came from the study’s two-phase design. In the first phase, the AI models explored answers on their own. In the second phase, they were handed a blueprint: a third-party answer and a middle-stage prompt laying out the process to follow, similar to how an algorithm is specified. Despite this assistance, the LRMs could not break through the complexity threshold. More tellingly, they used about the same number of tokens to “think” as in the first phase, which should not be the case, since phase two should require far less thinking from a human perspective.

How Phase 2 instructed LRMs by guiding them with middle-stage solutions
(Source: Apple)
Speaking of token requirements, these LRMs demand many more tokens, even 4x-5x more in some cases. But what about the impact on natural resources?
Remember the “Ghibli Art” trend that took social media by storm? It was amazing, wasn’t it? As OpenAI CEO Sam Altman pointed out, though, generating those images was “melting” the company’s GPUs, putting a significant load on their hardware. If you’ve been following AI news, you’ve likely also heard that AI image generation leads to water waste. How does that happen?
Is there a limit to this limitless tech? – The sustainability angle
To train cutting-edge AI models, companies use Graphics Processing Units (GPUs), which are energy-intensive. These GPUs are expensive to purchase, and the electricity they demand is equally costly. Additionally, the vast amount of data they process needs to be stored, which requires large data centers to house all the training data.
According to the US Department of Energy, the foundation of the US’s energy grid was laid in the 1960s and 1970s, and much of that infrastructure is still in use today. In fact, 70% of transmission lines are over 25 years old, approaching the end of their 50-80 year lifecycle.
Because of this, the US energy grid operates with only a 15% reserve margin: if peak demand is 100 units, the grid can supply at most 115. With new AI data centers demanding more and more energy, electricity price inflation looks imminent, at least for those in the US.
Moreover, on the water usage front, data centers generate a significant amount of heat, and to maintain performance and prevent hardware damage they use vast amounts of water for cooling. A Cornell University study even estimates that a GPT-4-class model could use up to 3 liters of water to generate a 120-200 word email.
The spending wave and the ROI question
Reports suggest Meta guided 2025 CAPEX of $64 to $72 billion, much of it for AI infrastructure. Google, Microsoft, Amazon, and others are also investing heavily. Some sources estimate that the four largest U.S. tech firms could spend about $344 billion on AI-related efforts in CY25. OpenAI’s future spending is often cited at roughly $60 billion, and one Stanford-linked estimate claims China could invest about $912 billion.
Conclusion
AI has been a blessing for repetitive work, freeing people to focus on higher-value decisions. As we push into more complex problems, both accuracy and economics become harder to scale. LRMs hold promise for deeper reasoning, but they come with higher token and compute costs. Whether this becomes a repeat of the dot-com cycle or grows into something much larger depends on sustained real-world impact, measurable ROI, and responsible scaling of power and water use. Only time will tell.
If you enjoyed reading this newsletter, please feel free to share it with others who might find it insightful. We’d also love to hear your thoughts and feedback on X. Connect with us there at @bastionresearch.
Happy Investing!!!
😂Meme of the Week🤣
