Research Projects
Simulation of
atmospheric turbulence
We address some mathematical and Computer Science aspects of a problem in adaptive optics. "Adaptive Optics" refers to any process which compensates for the blur or distortion occuring in the path of the light signal.
The compensation is done in real-time: we correct the current image based on the series of previous images. The wavefront from the observed object (usually far away), perturbed by the atmosphere, is sampled at a high rate, so that a correction, in the form of a phase-conjugated wavefront, is added to the optical instrument, cancelling the distortion. The source of the distortion being here turbulence in the atmospheric layers, the success of the correction will depend on the robustness of any model of turbulence for short-term predictions. We are currently working on implementing the numerics and visualization of fast phase screen generation, based on the classical Kolmogorov model of turbulence.
The digital library project is and endeavor comprising libraries around the globe, to digitise their special collections; that is, to store as binary data, still photographs of the content of ancient manuscripts.
A technical difficulty is that, since the object is typically not in a flat plane (for instance, some of the British library vellums were severly dried and deformed by a fire in the eighteenth century), it is ill-advised to force it flat under glass while taking the photographs. Rather, one collects, along with the photograph of the unflattened object (and the inevitably distorted text) a positional reading of its surface using laserometer. It is then a mathematical problem of how to use the latter information to undo the deformation of the photograph before storing the digitised image. We study the analytical and implementation aspects of a variational formulation of this problem.
Statistical Methods for Pattern recognition, Machine Learning and Boinformatics
Our research is focused in nonparametric statistical methods with applications in pattern recognition, machine learning and Bioinformatics.These methods are computer intensive since the data is used at maximum to estimate the underlying distribution and the parameters involved in the modeling. In particular we are considering the following problems
1. Computational methods for ensembles of nonparametric supervised classifiers
(with Luis Daza and Elio Lozano )
From a Bayesian point of view, supervised classification with J classes is
equivalent to compare estimates of the posterior probabilities of each
class, assigning an object with measurement vector x to the class with the
largest posterior. In order to obtain such estimates, one can go directly
for the posterior probabilities f(j/x), or one can estimate them
indirectly via the class conditional density f(x/j). Kernel density
estimation and finite gaussian mixtures can be considered as estimates of
the class conditional density, whereas the k-nearest neighbor method gives
direct estimates of the f(j/x). All the above classifiers are considered
nonparametric.
Nonparametric classifiers and their ensembles perform very well but they
require a high computational effort. We are developing efficient
algorithms for sequential as well distributed computer systems to reduce
the computing running time of nonparametric classifiers. The parallel
libraries RPVM and RMPI will be extensively used.
2. Feature Selection for supervised classification
(with Frida Coaquira)
Most of the real-world classification problems deal with supervised
learning where the class conditional density is unknown and each instance
is associated with a class label. The relevant features to assign objects
to classes are unknown a priori. Hence, many features are included to
perform an efficient classification. Unfortunately some of them are either
partially or completely irrelevant or even redundant to the class
allocation. An irrelevant feature does not affect the classification in
any way and a redundant feature does not add anything new to the
classification. In many applications, the size of a dataset is so large
that the classifier might perform poorly without removing these unwanted
features. This effect is more evident in nonparametric classifiers On the
other hand reducing the number of irrelevant/redundant features
drastically reduces the computing time of training classifiers.
Feature selection procedures may be grouped in Filters and wrappers. While
those based on the misclassification error rate given by a classifier as
evaluation function are named wrapper methods, those based on any other
evaluation function independent of the classifier are named filter
methods.
We are developing filters feature selection methods, FINCO is one of them,
and evaluate their effect in building nonparametric classifiers and their
ensembles.
We have noticed that most of the feature selection methods have been
tested on datasets with a relative small number of features, less than
100. There are special techniques (sometimes combination of various
techniques) to deal with datasets with a huge amount of features, say more
than 1000 features. We firmly believe that by applying parallel computing
we can use effectively the usual feature selection techniques on datasets
with a fair amount of features, possibly more than 1000. The parallel
libraries RMPI and RPVM will be extensively used.
3. Pattern recognition in DNA microarray data using kernel density estimation and Gaussian mixtures
(with Jose Vega, Santiago Velasco,Marggie Gonzalez)
The massive amount of data generated by genomic methods has led to a need
for computational methods to manage and analyze the data that can affect
their interpretation. These computational methods are in rapid and
continuous evolution and there is no clear consensus on which methods are
best to cope with the complexities of such analysis. Statistical
microarray data analysis of gene expression can be separated in two
groups: unsupervised and supervised. In unsupervised methods the gene
expression patterns are grouped based solely on the expression data. These
methods are particularly useful to analyze the data in an exploratory
fashion. If one has prior information about which samples or genes are
expect to group together then supervised methods are applied.
In this project the following problems will be treated: 1) Choice of
good ensembles methods for nonparametric classifiers to improve the
recognition rate, 2) Use of efficient feature selection procedures for
microarray data preprocessing in order to reduce the complexity time of
clustering algorithms, and nonparametric classifiers and their ensembles;
and 3) Design and implementation of parallel algorithms to reduce the
complexity time of clustering algorithms, and nonparametric classifiers
and their ensembles.
The main outcomes of this project will be (i) a meta-analysis of
clustering algorithms, and nonparametric classifiers and their ensembles
for microarray data and (ii) the development of a software to carry out
parallel computation of clustering algorithms, ensembles of nonparametric
classifiers, dimensionality reduction methods, as well as feature
selection procedures used in nonparametric supervised classifiers applied
to microarray data.
4. Visualization techniques for microarray data
(with Caroline Rodriguez)
We are planning to build a graphics library in R that we will include
univariate, bivariate, and multidimensional graphical displays to explore
gene expression data obtained from microarrays experiments.
Finite Dynamical Systems and Applications to Bioinformatics
Dorothy Bollman
Finite dynamical systems have a number of important
applications in science and engineering. In this work we study both
theoretical aspects of these systems as well as applications in
crystallography and bioinformatics.
Problems we are studying include
1. The structure of finite dynamical systems
(with E. Orozco)
Let A = (Fn,S) be a finite dynamical system (FDS) where F is a finite field
and S is a nonsignular n X n matrix over F. If S is nonsingular, then S
decomposes Fn into dijoint cycles or "S-orbits," where x and y belong to the
same orbit if and only if y = Si x for some i. For any nonsingular matrix
M that commutes with S, we define MOS(x)=OSM(x), where for any x in Fn,
OS(x) denotes the S-orbit containing x. Now let R_S be a set of
representatives of the S-orbits. Then M decomposes RS into disjoint
"MS-orbits." In this project we study the orbits structure of the FDS
(RS,M). Applications of these results are in symmetric FFTs and genetic
networks.
2. The reverse engineering problem
(with O. Moreno and E. Orozco)
A genetic network can be described by a dynamic network consisting of a graph
G with say n nodes in which every node i is associated with some real number,
its "state," and a funcion fi which maps the set of n-tupes current states
to a new state of i. A global transition function f maps n-tuples of current
states to a next n-tuple of states by using the fi and a prescribed update
function. Descretizing the system, we assume that each node corresponds to the
same finite state set X. The reverse engineering problem is given a sequence
of elements s1,s2, ..., sm in Xn and a set of conditions, to determine
all dynamic networks f : Xn -> Xn that satisfy the conditions and
such that f(si)=s{i+1}. In this work we explore refinements of the dynamic
network model that will allow efficient solutions to the reverse engineering
problem.
Regularization Algorithms for Ill-posed and Inverse Problems
Lev Steinberg
This research deals with the development of numerical implementations of ill-posed and inverse problems arising in model- based simulation and diagnostic problems at meso- and macroscale levels arising in computational materials science.
Lev Steinberg
Current technology is beginning to allow us to manipulate rather than just observe quantum phenomena. This opens up the possibility of exploiting quantum effects in computations. For instance, quantum calculations can solve the factorization problem, which is out of reach of classical computers for large composite integers. This research deals with development of new approaches at Meso-scale material simulations that incorporate quantum computations.
Modeling Environmental Data
Dan McGee
Water Quality and Marine Ecosystem Health Indicators: A project to map indicators of contamination and ecological well being in the Phosphorescent Bay.
Change Points with Neural Networks
Dan McGee
Backpropagating Neural Networks and the Change Point Problem: A project to use the information obtained while training feedforward backpropogating neural networks to identify fundamental changes in datasets.