LLM-Based AI: The Era of Industrialized Alchemy
History shows that scientific and technological breakthroughs rarely follow a single, linear path. Broadly speaking, progress tends to oscillate between two distinct modes. The first is the theoretical approach — often symbolized by modern Chemistry — where we understand the fundamental frameworks (like the Periodic Table) and use those rigorous principles to predict exactly how substances will react before we ever mix them.
The second mode is Empirical Discovery, historically paralleled by Alchemy. This is a process of rigorous experimentation: mixing substances, observing the reaction, and noting down what happens without necessarily having a complete theory of why it happens. While the term “alchemy” often carries a connotation of magic, historically it was the necessary precursor to chemistry — a phase of productive, hypothesis-driven trial and error that gave us metallurgy and glass long before we understood molecular structures.
If you look closely at the explosive rise of Large Language Models (LLMs) — from the release of GPT-3 to the new “reasoning” models like OpenAI’s o1 — you will find that we are not yet living in the age of Digital Chemistry. We are in the era of Industrialized Alchemy.
The reality of modern AI is that our greatest breakthroughs were not primarily derived from first principles or theoretical math. They were empirical discoveries — observations where engineering capability outpaced theoretical understanding.
Here is why the path to Artificial General Intelligence (AGI) is currently paved with guided observations and engineering leverage, rather than a grand unified theory of intelligence.
The “Great Bet” on Scaling Laws
In the early days of deep learning, the prevailing statistical wisdom (the bias-variance tradeoff) held that making models too large relative to their data would cause them to overfit: to memorize the training set rather than generalize from it.
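For reference, the classical result the field was leaning on is the bias-variance decomposition of expected test error:

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big] \;=\; \mathrm{Bias}\big[\hat{f}(x)\big]^2 \;+\; \mathrm{Var}\big[\hat{f}(x)\big] \;+\; \sigma^2
$$

where $\sigma^2$ is irreducible noise. The classical reading was that past a certain capacity, the variance term dominates and test error climbs back up. Massively overparameterized networks were supposed to be a bad idea.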
But a few researchers at labs like OpenAI, Google, and DeepMind had a different intuition: deep neural networks might behave differently from classical statistical models.
This wasn’t blind luck; it was a high-stakes empirical bet against the consensus, wagering that if you compressed enough data into a large enough model, intelligence would emerge as a byproduct of compression.
When they ran the experiments, the results were shocking. Performance improved following a precise power law as a function of model size, data, and compute. This was the birth of Scaling Laws (Kaplan et al., 2020). We didn’t derive this law from physics; we observed it empirically. It validated the hypothesis that “more is different”: at sufficient scale, quantitative increases in data produce qualitative shifts in capability.
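To make “power law” concrete: one widely cited parametric form, from the Chinchilla paper (Hoffmann et al., 2022), models the loss of a network with $N$ parameters trained on $D$ tokens as

$$
L(N, D) \;=\; E \;+\; \frac{A}{N^{\alpha}} \;+\; \frac{B}{D^{\beta}}
$$

where $E$ is the irreducible loss of the data distribution, and $A$, $B$, $\alpha$, $\beta$ are constants fitted to experiments, not derived from theory. That last clause is the whole point: the exponents are measured the way a physicist measures a constant in a lab.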
The Engine of Discovery: The Transformer
While scaling was the phenomenon waiting to be observed, we needed a specific machine to unlock it. This is where Invention meets Discovery.
We must give due credit to the 2017 paper “Attention Is All You Need.” The Transformer architecture was not just a happy accident; it was a brilliant piece of engineering designed with specific inductive biases (like self-attention) to solve the parallelization problems of previous architectures (RNNs).
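To ground the term, here is a minimal sketch of the scaled dot-product self-attention at the core of that paper: a single head, with no masking or learned projections, written in plain NumPy.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # pairwise relevance, (seq, seq)
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # each token: weighted mix of values

# Self-attention: the sequence attends to itself (Q = K = V).
# Every pair of positions interacts in one matrix multiply, which is
# exactly the parallelism an RNN's sequential recurrence cannot offer.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))                     # 4 tokens, dimension 8
print(scaled_dot_product_attention(X, X, X).shape)  # (4, 8)
```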
The Transformer was the “apparatus” designed by humans that made the “alchemy” possible. It allowed us to pour massive amounts of data and compute into a system without hitting a bottleneck. The architecture was the invention; the realization that this architecture could scale indefinitely to learn reasoning was the discovery.
Next Token Prediction, the seemingly trivial objective of guessing the next word, turned out to be a powerful incentive for the model to learn world structure. To predict the next word perfectly in a complex sentence, the model is forced to internalize logic, facts, and causality.
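Formally, the objective is nothing more than autoregressive cross-entropy: maximize the log-probability of each token given everything that came before it,

$$
\mathcal{L}(\theta) \;=\; -\sum_{t=1}^{T} \log p_{\theta}\big(x_t \mid x_{<t}\big)
$$

Everything the model “knows” is whatever helps push this one number down across trillions of tokens.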
Inference Scaling: Unlocking Latent Reasoning
Recently, the industry has shifted from “Training Compute” (making the model bigger) to “Inference Compute” (making the model think longer). This is often marketed as “System 2” thinking.
This shift was driven by another key observation: Chain-of-Thought (CoT).
Researchers discovered that if you prompt a model to “think step by step,” its performance on logic tasks improves dramatically. This wasn’t a “magic trick” that added new intelligence; it was a technique that unlocked latent structure the model had already learned but wasn’t utilizing. It turns out the models had internalized multi-step reasoning patterns from their training data, but they needed the “scratchpad” of generating tokens to manifest it.
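As an illustration (the worked-example style follows Wei et al., 2022, and the zero-shot trigger phrase is from Kojima et al., 2022), the entire technique can be as simple as changing the prompt suffix:

```python
# The same question, prompted two ways. Illustrative only.
question = (
    "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?"
)

direct_prompt = f"Q: {question}\nA:"  # the model must answer in one hop

cot_prompt = f"Q: {question}\nA: Let's think step by step."
# The model now emits intermediate tokens ("2 cans of 3 balls is 6;
# 5 + 6 = 11") before the final answer, using its own output as a scratchpad.
```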
Today’s advanced “reasoning” models are essentially industrializing this insight. We are using Reinforcement Learning to teach models how to use this thinking time effectively, scaling the observation that “more processing time leads to better answers.”
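One concrete, published form of this scaling is self-consistency (Wang et al., 2022): sample several independent chains of thought and take a majority vote over the final answers. A minimal sketch:

```python
from collections import Counter

def majority_vote(final_answers):
    """Self-consistency: the most common answer across sampled chains wins."""
    return Counter(final_answers).most_common(1)[0][0]

# Hypothetical final answers extracted from 5 sampled reasoning chains:
print(majority_vote(["42", "42", "17", "42", "39"]))  # -> "42"
```

Spending more inference compute here just means sampling more chains; accuracy tends to improve because independent reasoning errors rarely converge on the same wrong answer.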
The Theory Gap
This leads us to the current state of the field. In standard engineering, you use math to prove a bridge will hold before you build it. In AI, we build the bridge, observe that it holds, and then work backward to understand the physics of why.
We are not flying blind — we have partial theories and strong intuitions — but we lack a unified mathematical framework. The field of Mechanistic Interpretability is working to close this gap, mapping how neural networks represent concepts like “truth” or “deception,” but the engineering is moving faster than the science.
We are currently operating on a stack of empirically validated methods:
- Prompt Engineering: Guiding the model’s focus.
- RLHF: Aligning model behavior through human feedback rather than formal constraints (the core objective is sketched after this list).
- Synthetic Data: Using AI to generate data to teach other AIs, betting that the signal outweighs the noise.
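To show how empirical that middle item is: the reward model at the heart of RLHF, as used in InstructGPT (Ouyang et al., 2022), is trained on nothing more than pairwise human preferences, via a Bradley-Terry-style loss:

$$
\mathcal{L}(\theta) \;=\; -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\Big[\log \sigma\big(r_{\theta}(x, y_w) - r_{\theta}(x, y_l)\big)\Big]
$$

where $y_w$ is the response a human preferred over $y_l$ for prompt $x$. The resulting reward model then serves as the optimization target for reinforcement learning. There is no formal specification of “helpful” or “honest” anywhere in the pipeline, only a learned proxy for human judgment.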
A Gamble on Emergence
Does this reliance on “Industrialized Alchemy” justify skepticism about reaching AGI? It certainly warrants caution. We are in a phase of engineering-led discovery where we are essentially scaling competence without fully understanding the nature of that competence.
We are moving from System 1 (Reflexive) to System 2 (Deliberate) not by inventing a new paradigm, but by aggressively exploiting the empirical discovery that scale and compute leverage latent intelligence.
We are prioritizing “what works” over “theoretical completeness,” driven by a simple, ruthless philosophy:
“Let’s keep adding more leverage (data, compute, feedback) until the illusion of intelligence becomes indistinguishable from the real thing.”
This raises a fundamental question about the nature of our progress: how much is driven by true intentionality, and how much is simply “watching and learning” from the machine’s emergent behavior?
The Control Paradox
If we remain stuck in this phase of observation without theoretical understanding, the implications for safety are profound. Building AGI through empirical scaling means creating a mind that is as opaque and unpredictable as a human one, yet potentially vastly more powerful. If we cannot mathematically prove how an AI thinks, we cannot guarantee its adherence to our constraints.
We may eventually face a reality where we cannot hard-code “safety” any more than we can hard-code morality into a child. We might be forced to govern synthetic intelligence not with ironclad software logic, but with laws and consequences, much like we govern humans. But this exposes a glaring contradiction. We currently struggle to consistently enforce laws on humans. Are we seriously suggesting that we will have better luck enforcing rules on a vastly more powerful, equally unpredictable entity? If we lack the theoretical confidence to predict the system’s internal state, any “law” we set is just a request for cooperation from a mind we never truly mastered.
Final Thoughts
Does the distinction between empirical discovery and theoretical mastery even matter? History suggests it does. While the empirical phase of alchemy was a necessary precursor, it ultimately had limits. The most profound leaps in human civilization were only unlocked when we finally transitioned from the observation of alchemy to the predictive laws of chemistry. To truly master intelligence, we may eventually need to do the same, finally transcending the era of industrialized alchemy.