


From Dirac Notation to Probability Bracket Notation,
Information Retrieval (IR) and Artificial Intelligence (AI)
Author: Dr. Xing M (Sherman) Wang

Dirac notation (or bra-ket notation) is a powerful and indispensable tool for modern physicists.
Unfortunately, it is usually taught only in Quantum Mechanics.
I believe it would be worthwhile to introduce it in Applied Mathematics (for example, in Linear Algebra).
On the other hand, while studying probability theory, I felt it would be very helpful
to have a similar notation for representing and deriving probabilistic formulas.
That is why I posted the following articles online, in which Dirac notation
is introduced to IR, and Probability Bracket Notation is proposed and applied to IR and AI.
Whether you agree or disagree with my work, I welcome and appreciate your opinions.


How to email the author?
Subject line: About the articles on your web site
Email address: swang (at) shermanlab (dot) com,
or via arxiv.org if you are a member



1: Dirac Notation, Fock Space and Riemann Metric Tensor in IR Models

HTML; PDF: Current (06/21/2011); Archived

Abstract

Using Dirac notation as a powerful tool, we investigate the three classical Information Retrieval (IR)
models and some of their extensions. We show that almost all such models can be described by vectors
in Occupation Number Representations (ONR) of Fock spaces, with various specifications on, e.g., occupation number,
inner product or term-term interactions. As an important case study, Concept Fock Space (CFS) is introduced for the Boolean Model; the basic formulas for
Singular Value Decomposition (SVD) of the Latent Semantic Indexing (LSI) Model are manipulated in terms of Dirac notation.
Based on SVD, a Riemannian metric tensor is introduced, which not only can be used to calculate the relevance of
documents to a query, but also may be used to measure the closeness of documents in data clustering.
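As a rough illustration of the SVD machinery the abstract refers to, the sketch below builds a small invented term-document matrix, truncates its SVD to two latent concepts, folds a query into the concept space, and ranks documents by cosine similarity. All numbers are hypothetical, and the ranking formula is plain cosine similarity, not the paper's Riemannian metric tensor:

```python
import numpy as np

# Hypothetical 5-term x 4-document term-frequency matrix (rows: terms, cols: docs).
A = np.array([
    [1., 0., 1., 0.],
    [0., 1., 0., 1.],
    [1., 1., 0., 0.],
    [0., 0., 1., 1.],
    [1., 0., 0., 1.],
])

# SVD: A = U S V^T; keep k = 2 latent "concepts" (the LSI truncation).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
Uk, sk, Vtk = U[:, :k], s[:k], Vt[:k, :]

# Fold a query (containing terms 0 and 3) into concept space: q_k = S_k^{-1} U_k^T q.
q = np.array([1., 0., 0., 1., 0.])
q_k = np.diag(1.0 / sk) @ Uk.T @ q

# Documents in concept space are the columns of V_k^T; rank them by cosine similarity.
docs_k = Vtk  # shape (k, n_docs)
cos = (docs_k.T @ q_k) / (np.linalg.norm(docs_k, axis=0) * np.linalg.norm(q_k) + 1e-12)
print(cos)
```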



2: Probability Bracket Notation, Probability Vectors, Markov Chains and Stochastic Processes

PDF: Current (07/17/2007); Archived

Abstract

Dirac notation has been widely used for vectors in the Hilbert spaces of Quantum Theories. It has now also been
introduced to Information Retrieval. In this paper, we propose a new set of symbols, the Probability Bracket
Notation (PBN), for probability theories. We define new symbols like probability bra (pbra), pket, pbracket,
sample base, unit operator, state ket and more as counterparts of those in Dirac notation, which we refer to as Vector
Bracket Notation (VBN). By applying PBN to represent fundamental definitions and theorems for discrete and continuous
random variables, we show that PBN could play the same role in a probability sample space as Dirac notation in a Hilbert space.
We also find that there is a close relation between our probability state kets and the probability vectors in Markov chains,
which are involved in data clustering methods like Diffusion Maps. We summarize the similarities and differences between PBN
and VBN in the two tables of Appendix A.
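The connection between probability state kets and Markov-chain probability vectors can be shown with a minimal sketch. The two-state chain below is invented for illustration, not taken from the paper:

```python
import numpy as np

# Hypothetical 2-state Markov chain with row-stochastic transition matrix T:
# from state 0, stay with prob 0.9, move to state 1 with prob 0.1; etc.
T = np.array([[0.9, 0.1],
              [0.5, 0.5]])

# A probability vector (a "state pket" in PBN terms) evolves as p_{n+1} = p_n T.
p = np.array([1.0, 0.0])     # start surely in state 0
for _ in range(100):
    p = p @ T

# The limit is the stationary distribution pi, satisfying pi = pi T.
print(p)                     # approximately [0.8333, 0.1667]
```

For this matrix the second eigenvalue is 0.4, so the iteration converges to the stationary distribution (5/6, 1/6) very quickly.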



3: Induced Hilbert Space, Markov Chain, Diffusion Map and Fock Space in Thermophysics

PDF: Current (04/08/2007); Archived

Abstract

In this article, we continue to explore Probability Bracket Notation (PBN), proposed in our previous article.
Using both Dirac vector bracket notation (VBN) and PBN, we define induced Hilbert space and induced sample space,
and propose that there exists an equivalence relation between a Hilbert space and a probability sample space constructed
from the same base observable(s). Then we investigate Markov transition matrices and their eigenvectors to make diffusion
maps with two examples: a simple graph-theory example, serving as a prototype of a bidirectional transition operator,
and a famous text-document example from the IR literature, serving as a tutorial of diffusion maps in text document space.
We notice that, in both examples, the sample space of the Markov chain and the Hilbert space spanned by the
eigenvectors of the transition matrix are not equivalent. At the end, we apply our PBN and equivalence proposal
to Thermophysics by associating phase space with the Hilbert space or Fock space of many-particle systems.
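A minimal sketch of the diffusion-map construction mentioned above, on an invented four-node graph (not one of the paper's examples): row-normalize the adjacency matrix into a random-walk transition matrix, then embed each node with the non-trivial eigenvectors scaled by powers of their eigenvalues:

```python
import numpy as np

# Hypothetical 4-node undirected graph given by its adjacency matrix W.
W = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])

# Row-normalize W to obtain the Markov transition matrix P of a random walk.
d = W.sum(axis=1)
P = W / d[:, None]

# Diffusion map: embed node i as (lam_2^t phi_2(i), lam_3^t phi_3(i), ...),
# using the eigenvectors of P sorted by eigenvalue and skipping lam_1 = 1.
lam, phi = np.linalg.eig(P)
order = np.argsort(-lam.real)
lam, phi = lam.real[order], phi.real[:, order]

t = 2                                   # diffusion time
embedding = phi[:, 1:3] * lam[1:3]**t   # 2-D diffusion coordinates per node
print(embedding)
```

(For a random walk on an undirected graph the eigenvalues are real, with the largest equal to 1.)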



4: Probability Bracket Notation:
Term Vector Space, Concept Fock Space and Induced Probabilistic IR Models

PDF: Current (06/21/2011); Archived

Abstract

After a brief introduction to Probability Bracket Notation (PBN) for discrete random variables in time-independent probability spaces,
we apply both PBN and Dirac notation to investigate probabilistic modeling for information retrieval (IR). We derive the expressions of relevance
of document to query (RDQ) for various probabilistic models, induced by Term Vector Space (TVS) and by Concept Fock Space (CFS).
The inference network model (INM) formula is symmetric and can be used to evaluate the relevance of document to document (RDD);
the CFS-induced models contain ingredients of all three classical IR models. The relevance formulas are tested and compared in different scenarios against a famous textbook example.
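To give a flavor of what a relevance-of-document-to-query (RDQ) computation looks like, here is a sketch using a smoothed unigram model over invented term frequencies. It is illustrative only and is not the paper's TVS- or CFS-induced formula:

```python
import numpy as np

# Hypothetical term-frequency vectors for two documents over a 4-term vocabulary.
tf = {"d1": np.array([3., 1., 0., 1.]),
      "d2": np.array([0., 2., 2., 1.])}
query = np.array([1, 0, 0, 1])   # the query contains terms 0 and 3

def rdq(doc_tf, q, eps=0.01):
    """Illustrative RDQ: P(q|d) under a smoothed unigram model
    (an assumption for this sketch, not the paper's exact formula)."""
    p_t = (doc_tf + eps) / (doc_tf.sum() + eps * len(doc_tf))
    return float(np.prod(p_t ** q))

scores = {d: rdq(v, query) for d, v in tf.items()}
print(scores)   # d1 scores higher: it actually contains both query terms
```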



5: Probability Bracket Notation, Multivariable Systems and Static Bayesian Networks

PDF: Current (10/07/2012); Archived

Abstract

Probability Bracket Notation (PBN) is applied to systems of multiple random variables for a preliminary study of static Bayesian Networks (BN) and
Probabilistic Graphical Models (PGM). The famous Student BN example is explored to show the local independencies and reasoning power of a BN.
The software package Elvira is used to graphically display the Student BN. Our investigation shows that PBN provides a consistent and convenient
alternative for manipulating many expressions related to joint, marginal and conditional probability distributions in static BNs.
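The chain-rule factorization that such notation manipulates can be sketched on a tiny three-node network in the spirit of the Student BN. The conditional probability tables below are invented for illustration, not the example's actual parameters:

```python
from itertools import product

# Minimal BN factorization P(D, I, G) = P(D) P(I) P(G | D, I),
# with hypothetical CPT numbers (D = Difficulty, I = Intelligence, G = Grade).
P_D = {0: 0.6, 1: 0.4}
P_I = {0: 0.7, 1: 0.3}
P_G = {(0, 0): 0.3, (0, 1): 0.9,       # P(G = high | D, I)
       (1, 0): 0.05, (1, 1): 0.5}

def joint(d, i, g):
    pg = P_G[(d, i)] if g == 1 else 1 - P_G[(d, i)]
    return P_D[d] * P_I[i] * pg

# Reasoning in the BN: compute P(I = 1 | G = high) by marginalizing and
# applying Bayes' rule.
num = sum(joint(d, 1, 1) for d in (0, 1))
den = sum(joint(d, i, 1) for d, i in product((0, 1), repeat=2))
posterior = num / den
print(posterior)   # higher than the prior P(I = 1) = 0.3: a high grade is evidence
```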



6: Probability Bracket Notation: Markov State Chain Projector, Hidden Markov Models and Dynamic Bayesian Networks

PDF: Current (12/06/2012); Archived

Abstract

After a brief discussion of the Markov Evolution Formula (MEF) expressed in Probability
Bracket Notation (PBN), its close relation with the joint probability distribution (JPD) of
Visible Markov Models (VMM) is demonstrated by introducing the Markov State Chain
Projector (MSCP). The state basis and the observation basis are defined in the Sequential
Event Space (SES) of Hidden Markov Models (HMM). The JPD of an HMM is derived by
using basis transformation in the SES. The Viterbi algorithm is revisited and applied to the
famous Weather HMM example, whose node graph and inference results are displayed
using the software package Elvira. In the end, the formulas of VMM, HMM and some
factorial HMM (FHMM) are expressed in PBN as instances of Dynamic Bayesian
Networks (DBN).
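The Viterbi algorithm mentioned above can be sketched for a two-state weather HMM. The probabilities below are illustrative choices, not necessarily those of the cited example:

```python
# Minimal Viterbi decoding for a 2-state weather HMM.
states = ("Rainy", "Sunny")
start = {"Rainy": 0.6, "Sunny": 0.4}
trans = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
         "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
        "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}

def viterbi(obs):
    # V[t][s]: probability of the best state path ending in state s at time t.
    V = [{s: start[s] * emit[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({}); back.append({})
        for s in states:
            prev = max(states, key=lambda p: V[t - 1][p] * trans[p][s])
            V[t][s] = V[t - 1][prev] * trans[prev][s] * emit[s][obs[t]]
            back[t][s] = prev
    # Backtrack from the best final state.
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

print(viterbi(["walk", "shop", "clean"]))   # -> ['Sunny', 'Rainy', 'Rainy']
```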



7: Thematic Clustering and the Dual Representations of Text Objects

PDF: Current (01/02/2017)

Abstract

We introduce Thematic Clustering, a new methodology to discover clusters in a set of text documents and,
at the same time, to define the theme of each cluster using its top frequent keywords.
Our procedure is based on the idea of dual representations (TF rep and Concept rep) of text
objects (documents or clusters) in term space. We derive cluster TF reps in initial clustering,
use them to reduce the term space, and then refine the clusters. Our test results on three well-known
data sets (Disease, Star and Reuters) are very promising: the formed clusters and their themes almost
perfectly match our knowledge about the data sets.
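A toy sketch of the cluster TF rep and theme idea: a cluster's TF rep is the sum of its members' TF vectors, and its theme is its top-frequency terms. The vocabulary, counts and cluster assignments below are all invented for illustration:

```python
import numpy as np

# Invented 5-term vocabulary and three toy documents (rows: docs, cols: terms).
vocab = ["star", "orbit", "virus", "fever", "planet"]
docs = np.array([
    [3, 1, 0, 0, 2],   # astronomy-flavored doc
    [2, 2, 0, 0, 1],   # astronomy-flavored doc
    [0, 0, 4, 2, 0],   # disease-flavored doc
])
clusters = {"c1": [0, 1], "c2": [2]}

themes = {}
for name, members in clusters.items():
    tf_rep = docs[members].sum(axis=0)   # cluster TF rep in term space
    top = np.argsort(-tf_rep)[:2]        # two most frequent keywords
    themes[name] = [vocab[i] for i in top]

print(themes)
```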







