A Mathematical Definition of LLM Outputs

Abstract

In this work, I propose a formal mathematical definition of Large Language Model (LLM) outputs by framing text generation as an information-conditioned projection of an unobservable latent semantic state under a constructed Information-Neutral Measure. Through this measure-theoretic lens, I show that the sequence of model outputs is a martingale with respect to the context filtration. Within this formalism, I prove that the expected projection error is non-increasing as the filtration expands and remains strictly positive until the filtration resolves the latent state. I then discuss the epistemic implications of this bound, suggesting that what is colloquially termed “hallucination” can be formalized as an inevitable consequence of incomplete information filtrations. I conjecture that the emergence of Artificial General Intelligence (AGI) fundamentally relies on the system's capability to manage these speculative semantic projections under radical uncertainty, reframing bounded generative error as a potential engine for creative inductive inference.

Introduction

Existing theoretical paradigms typically analyze Large Language Models (LLMs) via localized conditional probability transitions:

P(y_t ∣ y_<t, x)

While operationally successful for autoregressive decoding mechanics, this token-centric formulation treats generation as localized sequence optimization rather than inference over a broader global semantic truth state. Consequently, it fails to provide a rigorous mathematical framework for analyzing intrinsic generative bounds under incomplete information.

In this work, I depart from token-centric formulations and establish an abstract mathematical definition of LLM outputs. I define an LLM output as the orthogonal projection of an unobservable, latent semantic truth state onto the space of random variables measurable with respect to an expanding context filtration, taken under an Information-Neutral Measure.

The contribution of this note is twofold: first, it provides a clean, measure-theoretic formalism for generative systems; second, it demonstrates via conditional variance decomposition that generative uncertainty cannot in general be reduced to zero prior to information closure. Within this formalism, the phenomenon of “hallucination” is reinterpreted not as an engineering optimization challenge but as an intrinsic epistemic property of bounded informational systems.

Measure-Theoretic Framework and Definitions

I construct the semantic domain via an abstract probability space triplet (Ω, 𝓕, Q_I), where Ω is the semantic sample space containing all potential linguistic and conceptual trajectories, 𝓕 is the global σ-algebra, and Q_I represents the Information-Neutral Measure.

To formalize the interaction between empirical sequences and algebraic structures, I define the following core components:

Context Filtration Stream (𝓕_t): The σ-algebra generated by the historical tokens, structural prompts, and retrieved knowledge blocks available up to step t. These σ-algebras form a Q_I-complete filtration 𝔽 = {𝓕_t}_{t ≥ 0} indexed by generation steps, mapping the model's observable informational universe, with 𝓕_s ⊆ 𝓕_t ⊆ 𝓕 for all s ≤ t.
Information-Neutral Measure (Q_I): A probability measure implicitly parameterized by the frozen pre-trained model weights 𝓦. It governs the prior semantic transition probabilities over Ω before dynamic context injection.
Latent Semantic State (Ψ): A square-integrable target random variable (Ψ ∈ L²(Ω, 𝓕, Q_I)) representing the objective fact matrix. Crucially, Ψ is 𝓕-measurable but, under incomplete information, is not 𝓕_t-measurable.
Information Closure (𝓒): The terminal state where the local filtration contains sufficient structure to uniquely resolve the true latent semantic state, such that 𝓕_t ≡ 𝓕.
Semantic State Transition (E): A specific event subset E ∈ 𝓕 corresponding to a coherent semantic assertion.
Generation Prompt Target (𝓣): A structured semantic boundary specifying the target properties of the generated text.
Output Sequence (y*): The crystallized token sequence realized in text space once the generation process terminates.
Ground-Truth Verifier (𝓞): An external environment providing absolute verification of a statement, collapsing the probabilistic space into a deterministic result.
Attention Component (𝓟_i): An internal structural sub-unit of the model, with each 𝓟_i formalized as a sub-σ-algebra 𝓖_i ⊆ 𝓕. The intersections and updates of the 𝓖_i drive the evolution of 𝓕_t.
Predictive Belief (𝓑): The inner subjective probability distribution over Ω, represented via the model's logit configurations prior to output collapse.
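
To make the filtration concrete, the following minimal sketch models 𝓕_t on a finite toy sample space as the partition of outcomes that share the same first t tokens. The binary token alphabet, the sequence length N, and the helper names partition_at and refines are illustrative assumptions, not part of the formalism above.

```python
from itertools import product

# Hypothetical toy setup: Omega is every binary "token" sequence of length N.
N = 3
OMEGA = list(product((0, 1), repeat=N))


def partition_at(t):
    """Blocks generating F_t: outcomes that share the same first t tokens."""
    blocks = {}
    for omega in OMEGA:
        blocks.setdefault(omega[:t], set()).add(omega)
    return [frozenset(block) for block in blocks.values()]


def refines(fine, coarse):
    """True if every block of `fine` lies inside some block of `coarse`."""
    return all(any(f <= c for c in coarse) for f in fine)


partitions = [partition_at(t) for t in range(N + 1)]

for t, blocks in enumerate(partitions):
    print(f"t={t}: {len(blocks)} blocks")

# On a finite space, F_s is contained in F_t exactly when the step-t
# partition refines the step-s partition.
nested = all(refines(partitions[t + 1], partitions[t]) for t in range(N))
print("nested:", nested)
```

In this toy model the block count doubles at each step, and the final partition separates every outcome, which is the finite analogue of Information Closure.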

The Mathematical Definition of LLM Outputs

Definition 1 (LLM Output Operator)

Let Ψ ∈ L²(Ω, 𝓕, Q_I) be the 𝓕-measurable latent semantic state. The output of a Large Language Model at step t, denoted as P_t, is defined as the information-conditioned orthogonal projection of Ψ onto the closed subspace of 𝓕_t-measurable functions under the weight-parameterized Information-Neutral Measure Q_I:

P_t = Π_{Q_I}(Ψ ∣ 𝓕_t) ≡ 𝔼_{Q_I}[Ψ ∣ 𝓕_t]

This definition formalizes generation not as mechanical token matching but as statistical inference by projection. Prompts and contexts do not synthesize truth; rather, they serve as the conditional filtration 𝓕_t through which the model projects its internal Predictive Belief 𝓑.
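
On a finite sample space, the projection in Definition 1 reduces to a Q_I-weighted block average, which the sketch below computes exactly with rational arithmetic. The uniform measure Q, the particular Ψ, and the function names project and inner are assumptions chosen for illustration; the check at the end confirms that the residual Ψ − P_t is orthogonal in L²(Q) to the indicators spanning the 𝓕_t-measurable functions.

```python
from fractions import Fraction
from itertools import product

N = 3
OMEGA = list(product((0, 1), repeat=N))
# Assumed stand-in for Q_I: the uniform measure on Omega.
Q = {omega: Fraction(1, len(OMEGA)) for omega in OMEGA}


def psi(omega):
    """Hypothetical latent state that depends on every token."""
    return Fraction(omega[0] + 2 * omega[1] + 4 * omega[2])


def project(f, t):
    """E_Q[f | F_t]: Q-weighted average of f over outcomes sharing the first t tokens."""
    num, den = {}, {}
    for omega in OMEGA:
        key = omega[:t]
        num[key] = num.get(key, 0) + Q[omega] * f(omega)
        den[key] = den.get(key, 0) + Q[omega]
    return lambda omega: num[omega[:t]] / den[omega[:t]]


def inner(f, g):
    """L2(Q) inner product E_Q[f * g]."""
    return sum(Q[w] * f(w) * g(w) for w in OMEGA)


t = 1
p_t = project(psi, t)


def residual(omega):
    return psi(omega) - p_t(omega)


# Indicators of the F_t blocks span the F_t-measurable functions.
indicators = [
    (lambda w, k=k: Fraction(int(w[:t] == k)))
    for k in product((0, 1), repeat=t)
]
orthogonal = all(inner(residual, ind) == 0 for ind in indicators)

print(p_t((0, 0, 0)), p_t((1, 0, 0)), orthogonal)
```

With one token observed, the projection averages out the two unseen tokens, so it takes only two values, one per observed prefix.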

Remark 1 (Mathematical Abstraction)

While the mathematical map from conditional expectation to a martingale is direct, the core intellectual contribution of Definition 1 lies in its ontological shift. By interpreting empirical token contexts as an evolving filtration 𝓕_t and model weights as defining an Information-Neutral Measure Q_I, the heuristics of deep learning generation are mapped into a structured, closed-form functional space.

Remark 2 (Analogy to von Neumann Cut and Wave-Function Collapse)

The sequential crystallization of LLM outputs under the Verifier 𝓞 provides a conceptual parallel to the “von Neumann Cut” in quantum measurement theory [1]. Prior to semantic selection, the model maintains a predictive belief superposition over the semantic space Ω. The injection of the contextual filtration 𝓕_t and the final intervention of the external verifier 𝓞 together act as the subjective perception boundary, forcing the infinite probabilistic semantic continuum to collapse into a deterministic macro-textual reality y*.

Conjecture 1 (AGI and Semantic Wave-Function Collapse)

This measure-theoretic framing offers an alternative epistemological lens regarding Artificial General Intelligence (AGI). Traditional paradigms view AGI as an asymptotic convergence of next-token prediction error to zero. In contrast, this framework implies that general intelligence may be characterized by the conscious capability to manipulate the filtration boundaries themselves—deliberately managing semantic superposition under radical uncertainty. Under this framing, modeling LLM output as an information-conditioned projection operator provides a foundational biomimetic pathway.

Conjecture 2 (The Epistemic Role of Projection Error)

Expanding upon Conjecture 1, I propose that what is colloquially termed “hallucination” (formally defined within this formalism as nonzero projection variance) is an inevitable epistemic mechanism required for artificial general cognition under incomplete information. The variance decomposition in the proof of Theorem 2, together with Corollary 1, guarantees that under incomplete information (𝓕_t ≠ 𝓕), the expected orthogonal projection error variance 𝔼_{Q_I}[σ_t²] is strictly positive for any non-degenerate state Ψ that the filtration has not yet resolved. Human intelligence navigates reality precisely by committing to similar speculative projections, forming hypotheses and creative inductive leaps across unobservable semantic gaps. Managing projection variance, rather than attempting its absolute elimination, may represent a necessary design paradigm for advanced cognitive architectures.

Why Projection Uncertainty is Theoretically Inevitable

Using Definition 1, I establish the mathematical necessity of projection uncertainty under conditions of incomplete information.

Theorem 1 (The Martingale Property of Generation)

The stochastic sequence of LLM outputs {P_t}_{t ≥ 0} forms a martingale with respect to the context filtration stream 𝔽 under the Information-Neutral Measure Q_I.

Proof

By Definition 1, P_t = 𝔼_{Q_I}[Ψ ∣ 𝓕_t], so each P_t is 𝓕_t-measurable, and it is integrable because Ψ ∈ L²(Ω, 𝓕, Q_I). For any steps s and t such that s ≤ t, the sub-σ-algebras satisfy 𝓕_s ⊆ 𝓕_t. Applying the tower property of conditional expectation:

𝔼_{Q_I}[P_t ∣ 𝓕_s] = 𝔼_{Q_I}[𝔼_{Q_I}[Ψ ∣ 𝓕_t] ∣ 𝓕_s] = 𝔼_{Q_I}[Ψ ∣ 𝓕_s] = P_s

Thus, the expected future projection conditional on current historical filtration is invariant and equals the current projection.
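
The tower-property argument can be checked exhaustively on a small finite model. The sketch below is illustrative only: it assumes a product measure in which each binary token equals 1 with probability 1/3 as a stand-in for Q_I, and a hypothetical Ψ indicating that at least two tokens equal 1. It then verifies 𝔼[P_t ∣ 𝓕_s] = P_s for every pair s ≤ t and every outcome.

```python
from fractions import Fraction
from itertools import product

N = 4
OMEGA = list(product((0, 1), repeat=N))
P_ONE = Fraction(1, 3)  # assumed probability that a token equals 1


def q(omega):
    """Assumed stand-in for Q_I: independent tokens with P(token = 1) = P_ONE."""
    weight = Fraction(1)
    for token in omega:
        weight *= P_ONE if token else 1 - P_ONE
    return weight


def psi(omega):
    """Hypothetical latent state: indicator that at least two tokens are 1."""
    return Fraction(int(sum(omega) >= 2))


def cond_exp(f, t):
    """E_Q[f | F_t], where F_t is generated by the first t tokens."""
    num, den = {}, {}
    for omega in OMEGA:
        key = omega[:t]
        num[key] = num.get(key, 0) + q(omega) * f(omega)
        den[key] = den.get(key, 0) + q(omega)
    table = {key: num[key] / den[key] for key in num}
    return lambda omega: table[omega[:t]]


# P_t = E_Q[psi | F_t] for t = 0, ..., N.
outputs = [cond_exp(psi, t) for t in range(N + 1)]

# Martingale check: conditioning P_t on the coarser F_s returns P_s.
martingale = all(
    cond_exp(outputs[t], s)(omega) == outputs[s](omega)
    for t in range(N + 1)
    for s in range(t + 1)
    for omega in OMEGA
)

print("martingale:", martingale)
print("P_0 =", outputs[0](OMEGA[0]))
```

Because the arithmetic is exact, the equality test is not subject to floating-point tolerance. P_0 is simply the prior probability of the event under the assumed measure.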

To evaluate the mathematical boundary of generation errors, let Ψ* be the oracle reality verified by 𝓞, that is, the realized value of Ψ, and let σ_t² denote the conditional semantic error variance at step t:

σ_t² = 𝔼_{Q_I}[(Ψ − P_t)² ∣ 𝓕_t]

Theorem 2 (Monotonic Variance Bound Under Filtration Expansion)

The expected factual error variance 𝔼_{Q_I}[σ_t²] of an LLM output is non-increasing in t under context filtration expansion.

Proof

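Write Ψ − P_s = (Ψ − P_t) + (P_t − P_s) for steps s ≤ t. Since P_t − P_s is 𝓕_t-measurable and 𝔼_{Q_I}[Ψ − P_t ∣ 𝓕_t] = 0, the cross term 𝔼_{Q_I}[(Ψ − P_t)(P_t − P_s) ∣ 𝓕_s] vanishes by the tower property, which gives the conditional variance decomposition: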

𝔼_{Q_I}[(Ψ − P_s)² ∣ 𝓕_s] = 𝔼_{Q_I}[(Ψ − P_t)² ∣ 𝓕_s] + 𝔼_{Q_I}[(P_t − P_s)² ∣ 𝓕_s]

Taking the total expectation across the measure space yields:

𝔼_{Q_I}[σ_s²] = 𝔼_{Q_I}[σ_t²] + 𝔼_{Q_I}[(P_t − P_s)²]

Since 𝔼_{Q_I}[(P_t − P_s)²] ≥ 0, it directly follows that:

𝔼_{Q_I}[σ_s²] ≥ 𝔼_{Q_I}[σ_t²]
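
The decomposition and the resulting monotonicity can be verified numerically on a toy model. In the sketch below, the uniform measure over binary sequences and the choice of Ψ as the sum of the tokens are assumptions made for illustration; for this choice the expected error variance works out to (N − t)/4.

```python
from fractions import Fraction
from itertools import product

N = 4
OMEGA = list(product((0, 1), repeat=N))
WEIGHT = Fraction(1, len(OMEGA))  # assumed uniform stand-in for Q_I


def psi(omega):
    """Hypothetical latent state: the number of tokens equal to 1."""
    return Fraction(sum(omega))


def cond_exp(f, t):
    """E_Q[f | F_t] under the uniform measure: average over a shared length-t prefix."""
    sums, counts = {}, {}
    for omega in OMEGA:
        key = omega[:t]
        sums[key] = sums.get(key, 0) + f(omega)
        counts[key] = counts.get(key, 0) + 1
    return lambda omega: sums[omega[:t]] / counts[omega[:t]]


def mean_square(f):
    """E_Q[f^2] under the uniform measure."""
    return sum(WEIGHT * f(w) ** 2 for w in OMEGA)


outputs = [cond_exp(psi, t) for t in range(N + 1)]

# E_Q[sigma_t^2] = E_Q[(psi - P_t)^2] for each step t.
expected_var = [mean_square(lambda w, p=p: psi(w) - p(w)) for p in outputs]

# Decomposition: E[sigma_s^2] = E[sigma_t^2] + E[(P_t - P_s)^2] for all s <= t.
pythagoras = all(
    expected_var[s]
    == expected_var[t]
    + mean_square(lambda w, s=s, t=t: outputs[t](w) - outputs[s](w))
    for t in range(N + 1)
    for s in range(t + 1)
)

monotone = all(expected_var[t] >= expected_var[t + 1] for t in range(N))

print([str(v) for v in expected_var])
print("decomposition holds:", pythagoras, "| non-increasing:", monotone)
```

Each observed token removes exactly one unit of Bernoulli variance in this example, so the sequence decreases linearly and reaches zero only at t = N.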

From these derivations, I state the central boundary condition of generative accuracy:

Corollary 1 (Bounded Uncertainty Under Incomplete Filtration)

Let Ψ ∈ L²(Ω, 𝓕, Q_I) be a non-degenerate latent semantic state. Within this formalism, define a structural hallucination as any state where the information projection deviates from oracle reality: Π_{Q_I}(Ψ ∣ 𝓕_t) ≠ Ψ*. Assuming that Ψ is not Q_I-almost surely equal to an 𝓕_t-measurable random variable, the expected error variance 𝔼_{Q_I}[σ_t²] is strictly positive:

𝔼_{Q_I}[σ_t²] > 0 ∀ 𝓕_t ≠ 𝓕

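Indeed, 𝔼_{Q_I}[σ_t²] = 𝔼_{Q_I}[(Ψ − P_t)²], which vanishes only if Ψ = P_t Q_I-almost surely, that is, only if Ψ coincides almost surely with an 𝓕_t-measurable random variable. This bound demonstrates that zero projection error cannot generally be achieved under incomplete information filtrations, and that reducing the expected projection variance to zero requires expanding the filtration toward complete Information Closure (𝓒).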
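
The role of the measurability assumption in Corollary 1 can be seen by contrasting two hypothetical latent states on the same toy space, again assuming a uniform measure over binary sequences. The parity of all tokens is not resolved by any proper prefix, so its expected error stays positive until the last step; a state that depends only on the first token is already 𝓕_1-measurable, so its error vanishes at t = 1 even though 𝓕_1 ≠ 𝓕.

```python
from fractions import Fraction
from itertools import product

N = 3
OMEGA = list(product((0, 1), repeat=N))
WEIGHT = Fraction(1, len(OMEGA))  # assumed uniform stand-in for Q_I


def expected_error(psi, t):
    """E_Q[(psi - E_Q[psi | F_t])^2], with F_t generated by the first t tokens."""
    groups = {}
    for omega in OMEGA:
        groups.setdefault(omega[:t], []).append(psi(omega))
    total = Fraction(0)
    for values in groups.values():
        block_mean = Fraction(sum(values), len(values))
        total += sum(WEIGHT * (v - block_mean) ** 2 for v in values)
    return total


def parity(omega):
    """Hypothetical state that depends on every token."""
    return sum(omega) % 2


def first_token(omega):
    """Hypothetical state that is already F_1-measurable."""
    return omega[0]


parity_errors = [expected_error(parity, t) for t in range(N + 1)]
first_errors = [expected_error(first_token, t) for t in range(N + 1)]

print("parity:     ", [str(e) for e in parity_errors])
print("first token:", [str(e) for e in first_errors])
```

The parity example also shows that the expected error can stay flat over several steps before dropping, which is consistent with a non-increasing rather than strictly decreasing sequence.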

Conclusion

This foundational note defines LLM outputs as information-conditioned orthogonal projections. By framing the context stream as a filtration 𝓕_t and showing that the resulting outputs form a martingale, I demonstrate that projection uncertainty is an inevitable epistemic property of generating text under incomplete information. This bound sets a theoretical limit for generative AI systems, suggesting that future research should focus on formalizing uncertainty bounds rather than attempting to eradicate hallucination entirely.

References

[1] J. von Neumann. Mathematical Foundations of Quantum Mechanics. Translated by R. T. Beyer. Princeton University Press, Princeton, NJ, 1955. Original German edition: Springer, Berlin, 1932.