Dimensionality reduction
This Assignment involves translating several methods from pseudocode to real code.
As you do it, please be sure to pay some attention to what you are instructing the computer do.
 Implement a function that computes the first principal component of some data. Here is pseudocode:
Let X be a matrix, where each row is a pointvector.
Let c be the number of columns in X.
Let p be a random vector of size c drawn from a spherical distribution.
Let m = 0.
for i from 0 to 199, do:
Let t be a vector of size c.
Set each element of t to 0. // See Vec.fill
for each row x in X, do:
t = t + (x * p)x // * = inner product
d = t // d = magnitude of t
p = t / d // See Vec.normalize
if i < 6 or d  m > .0001 // if significant progress was made
m = d
else
break
return p
 Implement a function that uses the GramSchmidt process to remove a component from some data. Here is pseudocode:
Let X be a matrix, where each row is a pointvector.
Let r be the number of rows in X.
Let v be the component vector that you wish to remove from X.
(v should be a unit vector. That is, it should be normalized.)
for each pointvector (row) x in X, do:
Let d = x * v // * = inner product
x = x  dv
 Implement Principal Component Analysis. Here is some pseudocode:
Let X be a set of points in highdimensional space that we want to reduce.
Let n be the number of points (rows) in X.
Let d be the current number of dimensions (columns) in X.
Let t be the target number of dimensions into which we want to reduce X.
// Training phase
Let c be the centroid of X.
Subtract c from each point in X.
Let M be a matrix with 0 rows and d columns.
for i from 0 to t  1, do:
Compute p as the first principal component in X.
Use the GramSchmidt process to remove the p component from X.
Add p as a new row to M.
// Reducing phase
Let X be the original set of points that it was before the training phase.
Let Y be a matrix with 0 rows and t columns.
for each row x in X, do:
y = M * (x  c) // * = Matrixvector multiplication
Add y as a row to Y.
return Y
 Implement an iterative form of multidimensional scaling (MDS).
Here is some pseudocode:
Let X be a set of points we wish to adjust.
Let d(a,b) be the target distance between points a and b.
Let c(a,b) be the current distance between a and b.
Do until convergence is detected:
Pick a random point, a from X.
Pick a random point, b from X.
Adjust a and b minimally and equally
to make c(a,b)=d(a,b).
How does one adjust a and b? Maybe this picture will help:
If you write a function to adjust a and b, I recommend returning a value that indicates how far they moved.
This value could be useful for detecting convergence.
(For example, you might detect convergence if the total amount of movement over 1000 iterations is less than 0.01.)
 Implement an incremental version of Isomap that builds on your iterative MDS algorithm.
Here is some pseudocode:
Let X be a set of points in highdimensional space that we want to reduce.
Let n be the number of points (rows) in X.
Let t be the number of target dimensions into which we want to reduce X.
Let k be about 14.
// Training phase
Find the knearest neighbors of each point in X.
Measure the distance to each neighbor.
Apply the FloydWarshall algorithm to compute the "geodesic"
distances between all nonneighboring points.
// Reducing phase
Let Y be a set of n points, each in tdimensional space.
Initialize Y with small random values.
Apply MDS to Y. (Use the geodesic distances as the target distances.)
Geodesic distance refers to the shortest path hopping from neighbor to neighbor between two points, as computed by the FloydWarshall algorithm.
(Geodesic distance is usually longer than Euclidean distance, which hops directly between the two points in one shot.)
 Implement a simplified version of Maximum Variance Unfolding (MVU).
Here is some pseudocode:
Let X be a set of points in highdimensional space that we want to reduce.
Let n be the number of points (rows) in X.
Let t be the target number of dimensions into which we want to reduce X.
Let k be about 14.
// Training phase
Find the knearest neighbors of each point in X.
Measure the distance to each neighbor.
Let d(a,b) be the original distance between points a and b.
// Reducing phase
Do until convergence is detected:
Multiply all the values in X by some scaling factor, such as 1.05.
Do 100*k*n times:
Pick a random point, a, from X.
Pick one of a's neighbors, b.
Let c(a,b) be the current distance between a and b.
Adjust a and b minimally and equally,
to make c(a,b)=d(a,b).
Apply PCA to reduce X into t dimensions.
 Download this Swiss roll data.
Reduce it into 2 dimensions using PCA, Isomap, and MVU.
Plot each of the results.
(Your plot will look nicer if you use color to indicate the row index of each point, but this is not required.)
The original data represents points on a swiss roll manifold, shown here from 9 different perspectives:
The ideal solution should look something like this:
 Download this Crane data.
Reduce it into 2 dimensions using PCA and Isomap.
Plot each of the results.
Each row in the original data represents a raytraced image of a crane sampled from a space like this:
The ideal solution should look something like this:
 Make a PDF document named plots.pdf containing your 5 plots.
Label them, so I can tell which plot is which.
(You do not need to write a report. Just show me your 5 plots.)
Zip it up with your source code and submit it in the usual manner.
