Hi, I've been intending to answer this post, but I have been very busy. In case there is still any interest in this question, here is a proof:
>Suppose
>Y=AX+B
>
>Y, B M x 1 vector
>X N x 1 vector
>A M x N Matrix
>
I think that this is best handled with index notation. Let the matrices A and T have entries
A^j_k and T^j_k ,
and the vectors X, Y, and B have entries
X^j, Y^j, B^j .
To keep the notation clean, let's use the Einstein summation convention (if any index is repeated in a product expression, once in the upper and once in the lower position, the expression is implicitly summed with the repeated index as the dummy variable of the sum). Then your equation becomes
Y^j = A^j_k X^k + B^j .
A note about the order of the indices: when turning indexed expressions back into matrix expressions, the upper index is the row index, and the lower is the column index. But for matrices, the first index is the row and the second is the column. Therefore, to keep the indexed expressions consistent with standard matrix notation, the index order should look like
M1^i_j M2^j_k = (M1 M2)^i_k ,
where the second (which must be lower) index of the left matrix matches the first (which must be upper) index of the right matrix. This is the index that is summed over in standard matrix multiplication.
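These index conventions can be played with directly in numpy's einsum, which implements exactly this kind of implicit summation over repeated indices. A quick sketch (not from the original post; the dimensions and random data are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 3, 4
A = rng.normal(size=(M, N))   # M x N matrix
X = rng.normal(size=N)        # N x 1 vector
B = rng.normal(size=M)        # M x 1 vector

# Y^j = A^j_k X^k + B^j : the repeated index k is implicitly summed.
Y = np.einsum('jk,k->j', A, X) + B
print(np.allclose(Y, A @ X + B))          # True

# M1^i_j M2^j_k = (M1 M2)^i_k : the shared index j is the one summed
# over in ordinary matrix multiplication.
M1, M2 = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
print(np.allclose(np.einsum('ij,jk->ik', M1, M2), M1 @ M2))  # True
```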
>
>X and B are random vectors of some distribution.
>
>The linear estimate of X given a sample of Y=y is
>
>inverse(R_{YY})R_{XY} y.
>
>I searched online for the proof, but in vain.
>
This notation R_{YY} is unfamiliar to me. It is some kind of matrix
constructed from the probabilities of Y, but how? If the derivation below works out, we'll have an estimator for X, and then I can try to see how that relates to R.
>The problem is to minimise E<||TY-X||> w r t T, where T is
>the linear MMSE estimate. For scalars the problem is easy to
>solve, However for vectors, one needs vector calculus to
>approach the problem. Can anybody help me in figuring out
>this problem.
The expectation you wish to minimize, written in index notation, is
E((T_i^j Y_j - X_i)(T^i_k Y^k - X^i)) .
Notice the position of the indices on the left factor. Upper and lower are reversed, because this corresponds to forming the transpose in the common matrix notation. However, the first index remains first, and the second index remains second. This is important, because swapping them would cause the same actual element of the array to now have different indices, confusing the whole calculation.
Let d^m_l stand for differentiation with respect to T^l_m. Then since E is linear, the differentiation comes inside the E operator and applies to the products of variables it finds there. Since these are just lots of products of scalars, the product rule applies (index notation makes this work very nicely, since matrix and vector operations are displayed as organized patterns of scalar multiplication, where the action of differentiation is obvious).
Also, since the entries of T are independent variables, the differentiation gives zero for every entry except the one containing the variable you are differentiating on. Therefore,
d^m_l T^i_j = delta^i_l delta^m_j ,
or in other words, the derivative is zero unless i and j match l and m exactly. For a transposed matrix, you get
d^m_l T_i^j = delta_{il} delta^{mj} .
Note that this is not a matrix product, but a set of derivative operations indexed by l,m operating on a set of variables indexed by i,j. As such, it is actually a tensor product of an operator and a matrix, and produces an array of values with 4 indices -- a 4-tensor. When both indices are upper or both lower, you get something that doesn't correspond well to the ordinary matrix notation. This is one of the ways in which index notation is more flexible. However, the final formula must always reduce to one upper and one lower index because this is a matrix computation to start with.
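This 4-tensor of derivatives can be checked numerically; here is a finite-difference sketch in numpy (not from the original post; the 2 x 3 size is an arbitrary illustration, and a step of 1 is exact because each entry is a linear function of itself):

```python
import numpy as np

# Finite-difference check that d^m_l T^i_j = delta^i_l delta^m_j:
# bumping entry (l, m) of T changes only the entry (i, j) = (l, m),
# so the full derivative is a 4-index array built from deltas.
T = np.zeros((2, 3))                 # arbitrary 2 x 3 illustration
grad = np.zeros((2, 3, 2, 3))        # indices ordered (l, m, i, j)
eps = 1.0                            # exact step for a linear function
for l in range(2):
    for m in range(3):
        Tb = T.copy()
        Tb[l, m] += eps
        grad[l, m] = (Tb - T) / eps  # d T^i_j / d T^l_m

# Compare with the tensor product of deltas (identity matrices).
delta2, delta3 = np.eye(2), np.eye(3)
expected = np.einsum('il,mj->lmij', delta2, delta3)
print(np.allclose(grad, expected))   # True
```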
Recall also the following property of delta:
delta^j_k X^k = X^j .
This means that your minimization equation reduces like this:
0 = d^m_l E((T_i^j Y_j - X_i)(T^i_k Y^k - X^i))
= E(d^m_l ((T_i^j Y_j - X_i)(T^i_k Y^k - X^i)))
= E((delta_{li} delta^{mj} Y_j)(T^i_k Y^k - X^i) + (T_i^j Y_j - X_i)(delta^i_l delta^m_k Y^k))
= 2E((delta_{li} delta^{mj} Y_j)(T^i_k Y^k - X^i))
= 2E(Y^m (T_{lk} Y^k - X_l)) .
(The factor of 2 appears because the two terms produced by the product rule are equal: each is the other with every contracted index pair raised and lowered, which changes nothing when the metric is just delta.)
Therefore
E(Y^m T_{lk} Y^k) = E(Y^m X_l) .
Now the indices are in funny places here, but we can raise and lower them in pairs, using the identity
delta^{jk} delta_{ki} = delta^j_i . Therefore
Y^m T_{lk} Y^k = Y^m T_{lk} delta^{kj} delta_{ji} Y^i = Y^m T_l^j Y_j .
Since E is linear, this gives
T_l^j E(Y^m Y_j) = E(Y^m X_l) .
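Writing the vectors as columns, this stationarity condition takes the conventional matrix form T E(Y Y^T) = E(X Y^T); note this uses R_XY = E(X Y^T), which may be the transpose of the index-placement convention below. A quick numerical sketch with sample moments (not from the original post; dimensions and distributions are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(2)
M, N, S = 2, 3, 100_000
A = rng.normal(size=(M, N))
X = rng.normal(size=(N, S))                # S samples of X as columns
Y = A @ X + 0.3 * rng.normal(size=(M, S))  # Y = AX + B, columnwise

# Sample moments standing in for the expectations.
R_YY = (Y @ Y.T) / S         # estimate of E(Y Y^T)
R_XY = (X @ Y.T) / S         # estimate of E(X Y^T)
T = R_XY @ np.linalg.inv(R_YY)

# The condition T E(Y Y^T) = E(X Y^T) holds exactly for the
# sample moments used to build T.
print(np.allclose(T @ R_YY, R_XY))  # True
```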
Now this appears to tell me what the definitions of the R's must be:
(R_XY)_l^m = E(X_l Y^m) ,
and putting Y in place of X,
(R_YY)_l^m = E(Y_l Y^m) .
Therefore, the formula becomes
T_l^j (R_YY)_j^m = (R_XY)_l^m .
Everything here is in transposed form, so transposing both sides and putting the multiplication in the standard matrix order gives
(R_YY)^m_j T^j_l = (R_XY)^m_l ,
which is in standard matrix order now, so we can drop the index notation and just write
(R_YY) T = R_XY ,
from which it is obvious that
T = (R_YY)^{-1} R_XY ,
so that
x_estimated = T y = (R_YY)^{-1} R_XY y ,
which completes the proof.
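As a sanity check on the whole argument, one can verify numerically that this T really is the minimizer: perturbing it in any direction increases the empirical mean squared error. A sketch in numpy using column vectors and sample moments (not from the original post; dimensions, noise level, and perturbation size are arbitrary illustrations, and R_XY here means E(X Y^T)):

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, S = 3, 4, 100_000                    # dimensions and sample count

A = rng.normal(size=(M, N))
X = rng.normal(size=(N, S))                # samples of X as columns
B = 0.5 * rng.normal(size=(M, S))          # independent noise B
Y = A @ X + B                              # Y = AX + B, columnwise

# Estimator built from the sample correlation matrices.
R_YY = (Y @ Y.T) / S
R_XY = (X @ Y.T) / S
T = R_XY @ np.linalg.inv(R_YY)

def mse(T_):
    # Empirical E(||TY - X||^2) over the sample.
    return np.mean(np.sum((T_ @ Y - X) ** 2, axis=0))

base = mse(T)
# Any small random perturbation of T must increase the empirical MSE,
# since the objective is a positive-definite quadratic in T.
worse = all(mse(T + 1e-2 * rng.normal(size=T.shape)) > base
            for _ in range(5))
print(worse)  # True
```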
As you can see, the main thing to remember with index notation is to be very, very careful to keep things in the right places. If you do that, then everything works out just as if you were working with scalars.
I hope this helps, and is still of interest.
--Stuart Anderson