Hi, I've been intending to answer this post, but I have been very busy. In case there is still any interest in this question, here is a proof:
>Suppose
>Y=AX+B
>
>Y, B M x 1 vector
>X N x 1 vector
>A M x N Matrix
>
I think that this is best handled with index notation. Let the matrices A and T have entries
A^j_k and T^j_k ,
and the vectors X, Y, and B have entries
X^j, Y^j, B^j .
To keep the notation clean, let's use the Einstein summation convention (if any index is repeated in a product expression, once in the upper and once in the lower position, the expression is implicitly summed with the repeated index as the dummy variable of the sum). Then your equation becomes
Y^j = A^j_k X^k + B^j .
A note about the order of the indices: when turning indexed expressions back into matrix expressions, the upper index is the row index, and the lower is the column index. But for matrices, the first index is the row and the second is the column. Therefore, to keep the indexed expressions consistent with standard matrix notation, the index order should look like
M1^i_j M2^j_k = (M1 M2)^i_k ,
where the second (which must be lower) index of the left matrix matches the first (which must be upper) index of the right matrix. This is the index that is summed over in standard matrix multiplication.
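These index conventions can be played with directly in numpy's einsum, which implements exactly this kind of implicit summation over repeated indices. A quick sketch (not from the original post; the dimensions and random data are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 3, 4
A = rng.normal(size=(M, N))   # M x N matrix
X = rng.normal(size=N)        # N x 1 vector
B = rng.normal(size=M)        # M x 1 vector

# Y^j = A^j_k X^k + B^j : the repeated index k is implicitly summed.
Y = np.einsum('jk,k->j', A, X) + B
print(np.allclose(Y, A @ X + B))          # True

# M1^i_j M2^j_k = (M1 M2)^i_k : the shared index j is the one summed
# over in ordinary matrix multiplication.
M1, M2 = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
print(np.allclose(np.einsum('ij,jk->ik', M1, M2), M1 @ M2))  # True
```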
>
>X and B are random vectors of some distribution.
>
>The linear estimate of X given a sample of Y=y is
>
>inverse(R_{YY})R_{XY} y.
>
>I searched online for the proof, but in vain.
>
This notation R_{YY} is unfamiliar to me. It is some kind of matrix
constructed from the probabilities of Y, but how? If the derivation below works out, we'll have an estimator for X, and then I can try to see how that relates to R.
>The problem is to minimise E<||TY-X||> w r t T, where T is
>the linear MMSE estimate. For scalars the problem is easy to
>solve, However for vectors, one needs vector calculus to
>approach the problem. Can anybody help me in figuring out
>this problem.
The expectation you wish to minimize, written in index notation, is
E((T_i^j Y_j - X_i)(T^i_k Y^k - X^i)) .
Notice the position of the indices on the left factor. Upper and lower are reversed, because this corresponds to forming the transpose in the common matrix notation. However, the first index remains first, and the second index remains second. This is important, because swapping them would cause the same actual element of the array to now have different indices, confusing the whole calculation.
Let d^m_l stand for differentiation with respect to T^l_m. Then since E is linear, the differentiation comes inside the E operator and applies to the products of variables it finds there. Since these are just lots of products of scalars, the product rule applies (index notation makes this work very nicely, since matrix and vector operations are displayed as organized patterns of scalar multiplication, where the action of differentiation is obvious).
Also, since the entries of T are independent variables, the differentiation gives zero for every entry except the one containing the variable you are differentiating on. Therefore,
d^m_l T^i_j = delta^i_l delta^m_j ,
or in other words, the derivative is zero unless i and j match l and m exactly. For a transposed matrix, you get
d^m_l T_i^j = delta_{il} delta^{mj} .
Note that this is not a matrix product, but a set of derivative operations indexed by l,m operating on a set of variables indexed by i,j. As such, it is actually a tensor product of an operator and a matrix, and produces an array of values with 4 indices -- a 4-tensor. When both indices are upper or both lower, you get something that doesn't correspond well to the ordinary matrix notation. This is one of the ways in which index notation is more flexible. However, the final formula must always reduce to one upper and one lower index because this is a matrix computation to start with.
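This 4-tensor of derivatives can be checked numerically; here is a finite-difference sketch in numpy (not from the original post; the 2 x 3 size is an arbitrary illustration, and a step of 1 is exact because each entry is a linear function of itself):

```python
import numpy as np

# Finite-difference check that d^m_l T^i_j = delta^i_l delta^m_j:
# bumping entry (l, m) of T changes only the entry (i, j) = (l, m),
# so the full derivative is a 4-index array built from deltas.
T = np.zeros((2, 3))                 # arbitrary 2 x 3 illustration
grad = np.zeros((2, 3, 2, 3))        # indices ordered (l, m, i, j)
eps = 1.0                            # exact step for a linear function
for l in range(2):
    for m in range(3):
        Tb = T.copy()
        Tb[l, m] += eps
        grad[l, m] = (Tb - T) / eps  # d T^i_j / d T^l_m

# Compare with the tensor product of deltas (identity matrices).
delta2, delta3 = np.eye(2), np.eye(3)
expected = np.einsum('il,mj->lmij', delta2, delta3)
print(np.allclose(grad, expected))   # True
```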
Recall also the following property of delta:
delta^j_k X^k = X^j .
This means that your minimization equation reduces like this:
0 = d^m_l E((T_i^j Y_j - X_i)(T^i_k Y^k - X^i))
= E(d^m_l ((T_i^j Y_j - X_i)(T^i_k Y^k - X^i)))
= E((delta_{li} delta^{mj} Y_j)(T^i_k Y^k - X^i) + (T_i^j Y_j - X_i)(delta^i_l delta^m_k Y^k))
= 2E((delta_{li} delta^{mj} Y_j)(T^i_k Y^k - X^i))
= 2E(Y^m (T_{lk} Y^k - X_l)) .
(The factor of 2 appears because the two terms produced by the product rule are equal: each is the other with every contracted index pair raised and lowered, which changes nothing when the metric is just delta.)
Therefore
E(Y^m T_{lk} Y^k) = E(Y^m X_l) .
Now the indices are in funny places here, but we can raise and lower them in pairs, using the identity
delta^{jk} delta_{ki} = delta^j_i . Therefore
Y^m T_{lk} Y^k = Y^m T_{lk} delta^{kj} delta_{ji} Y^i = Y^m T_l^j Y_j .
Since E is linear, this gives
T_l^j E(Y^m Y_j) = E(Y^m X_l) .
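Writing the vectors as columns, this stationarity condition takes the conventional matrix form T E(Y Y^T) = E(X Y^T); note this uses R_XY = E(X Y^T), which may be the transpose of the index-placement convention below. A quick numerical sketch with sample moments (not from the original post; dimensions and distributions are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(2)
M, N, S = 2, 3, 100_000
A = rng.normal(size=(M, N))
X = rng.normal(size=(N, S))                # S samples of X as columns
Y = A @ X + 0.3 * rng.normal(size=(M, S))  # Y = AX + B, columnwise

# Sample moments standing in for the expectations.
R_YY = (Y @ Y.T) / S         # estimate of E(Y Y^T)
R_XY = (X @ Y.T) / S         # estimate of E(X Y^T)
T = R_XY @ np.linalg.inv(R_YY)

# The condition T E(Y Y^T) = E(X Y^T) holds exactly for the
# sample moments used to build T.
print(np.allclose(T @ R_YY, R_XY))  # True
```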
Now this appears to tell me what the definitions of the R's must be:
(R_XY)_l^m = E(X_l Y^m) ,
and putting Y in place of X,
(R_YY)_l^m = E(Y_l Y^m) .
Therefore, the formula becomes
T_l^j (R_YY)_j^m = (R_XY)_l^m .
Everything here is in transposed form, so transposing both sides and putting the multiplication in the standard matrix order gives
(R_YY)^m_j T^j_l = (R_XY)^m_l ,
which is in standard matrix order now, so we can drop the index notation and just write
(R_YY) T = R_XY ,
from which it is obvious that
T = (R_YY)^{-1} R_XY ,
so that
x_estimated = T y = (R_YY)^{-1} R_XY y ,
which completes the proof.
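As a sanity check on the whole argument, one can verify numerically that this T really is the minimizer: perturbing it in any direction increases the empirical mean squared error. A sketch in numpy using column vectors and sample moments (not from the original post; dimensions, noise level, and perturbation size are arbitrary illustrations, and R_XY here means E(X Y^T)):

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, S = 3, 4, 100_000                    # dimensions and sample count

A = rng.normal(size=(M, N))
X = rng.normal(size=(N, S))                # samples of X as columns
B = 0.5 * rng.normal(size=(M, S))          # independent noise B
Y = A @ X + B                              # Y = AX + B, columnwise

# Estimator built from the sample correlation matrices.
R_YY = (Y @ Y.T) / S
R_XY = (X @ Y.T) / S
T = R_XY @ np.linalg.inv(R_YY)

def mse(T_):
    # Empirical E(||TY - X||^2) over the sample.
    return np.mean(np.sum((T_ @ Y - X) ** 2, axis=0))

base = mse(T)
# Any small random perturbation of T must increase the empirical MSE,
# since the objective is a positive-definite quadratic in T.
worse = all(mse(T + 1e-2 * rng.normal(size=T.shape)) > base
            for _ in range(5))
print(worse)  # True
```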
As you can see, the main thing to remember with index notation is to be very, very careful to keep things in the right places. If you do that, then everything works out just as if you were working with scalars.
I hope this helps, and is still of interest.
--Stuart Anderson