Research Projects

Research projects currently being conducted by the Scientific Computing Group




Simulation of atmospheric turbulence

Robert Acar

We address some mathematical and Computer Science aspects of a problem in adaptive optics. "Adaptive Optics" refers to any process which compensates for the blur or distortion occuring in the path of the light signal.

The compensation is done in real-time: we correct the current image based on the series of previous images. The wavefront from the observed object (usually far away), perturbed by the atmosphere, is sampled at a high rate, so that a correction, in the form of a phase-conjugated wavefront, is added to the optical instrument, cancelling the distortion. The source of the distortion being here turbulence in the atmospheric layers, the success of the correction will depend on the robustness of any model of turbulence for short-term predictions. We are currently working on implementing the numerics and visualization of fast phase screen generation, based on the classical Kolmogorov model of turbulence.



Mathematics of the digital library

Robert Acar

The digital library project is and endeavor comprising libraries around the globe, to digitise their special collections; that is, to store as binary data, still photographs of the content of ancient manuscripts.

A technical difficulty is that, since the object is typically not in a flat plane (for instance, some of the British library vellums were severly dried and deformed by a fire in the eighteenth century), it is ill-advised to force it flat under glass while taking the photographs. Rather, one collects, along with the photograph of the unflattened object (and the inevitably distorted text) a positional reading of its surface using laserometer. It is then a mathematical problem of how to use the latter information to undo the deformation of the photograph before storing the digitised image. We study the analytical and implementation aspects of a variational formulation of this problem.

Statistical Methods for Pattern recognition, Machine Learning and Boinformatics

Edgar Acuña

Our research is focused in nonparametric statistical methods with applications in pattern recognition, machine learning and Bioinformatics.These methods are computer intensive since the data is used at maximum to estimate the underlying distribution and the parameters involved in the modeling. In particular we are considering the following problems

1. Computational methods for ensembles of nonparametric supervised classifiers

(with Luis Daza and Elio Lozano )

From a Bayesian point of view, supervised classification with J classes is

equivalent to compare estimates of the posterior probabilities of each

class, assigning an object with measurement vector x to the class with the

largest posterior. In order to obtain such estimates, one can go directly

for the posterior probabilities f(j/x), or one can estimate them

indirectly via the class conditional density f(x/j). Kernel density

estimation and finite gaussian mixtures can be considered as estimates of

the class conditional density, whereas the k-nearest neighbor method gives

direct estimates of the f(j/x). All the above classifiers are considered

nonparametric.

Nonparametric classifiers and their ensembles perform very well but they

require a high computational effort. We are developing efficient

algorithms for sequential as well distributed computer systems to reduce

the computing running time of nonparametric classifiers. The parallel

libraries RPVM and RMPI will be extensively used.

2. Feature Selection for supervised classification

(with Frida Coaquira)

Most of the real-world classification problems deal with supervised

learning where the class conditional density is unknown and each instance

is associated with a class label. The relevant features to assign objects

to classes are unknown a priori. Hence, many features are included to

perform an efficient classification. Unfortunately some of them are either

partially or completely irrelevant or even redundant to the class

allocation. An irrelevant feature does not affect the classification in

any way and a redundant feature does not add anything new to the

classification. In many applications, the size of a dataset is so large

that the classifier might perform poorly without removing these unwanted

features. This effect is more evident in nonparametric classifiers On the

other hand reducing the number of irrelevant/redundant features

drastically reduces the computing time of training classifiers.

Feature selection procedures may be grouped in Filters and wrappers. While

those based on the misclassification error rate given by a classifier as

evaluation function are named wrapper methods, those based on any other

evaluation function independent of the classifier are named filter

methods.

We are developing filters feature selection methods, FINCO is one of them,

and evaluate their effect in building nonparametric classifiers and their

ensembles.

We have noticed that most of the feature selection methods have been

tested on datasets with a relative small number of features, less than

100. There are special techniques (sometimes combination of various

techniques) to deal with datasets with a huge amount of features, say more

than 1000 features. We firmly believe that by applying parallel computing

we can use effectively the usual feature selection techniques on datasets

with a fair amount of features, possibly more than 1000. The parallel

libraries RMPI and RPVM will be extensively used.

3. Pattern recognition in DNA microarray data using kernel density estimation and Gaussian mixtures

(with Jose Vega, Santiago Velasco,Marggie Gonzalez)

The massive amount of data generated by genomic methods has led to a need

for computational methods to manage and analyze the data that can affect

their interpretation. These computational methods are in rapid and

continuous evolution and there is no clear consensus on which methods are

best to cope with the complexities of such analysis. Statistical

microarray data analysis of gene expression can be separated in two

groups: unsupervised and supervised. In unsupervised methods the gene

expression patterns are grouped based solely on the expression data. These

methods are particularly useful to analyze the data in an exploratory

fashion. If one has prior information about which samples or genes are

expect to group together then supervised methods are applied.

In this project the following problems will be treated: 1) Choice of

good ensembles methods for nonparametric classifiers to improve the

recognition rate, 2) Use of efficient feature selection procedures for

microarray data preprocessing in order to reduce the complexity time of

clustering algorithms, and nonparametric classifiers and their ensembles;

and 3) Design and implementation of parallel algorithms to reduce the

complexity time of clustering algorithms, and nonparametric classifiers

and their ensembles.

The main outcomes of this project will be (i) a meta-analysis of

clustering algorithms, and nonparametric classifiers and their ensembles

for microarray data and (ii) the development of a software to carry out

parallel computation of clustering algorithms, ensembles of nonparametric

classifiers, dimensionality reduction methods, as well as feature

selection procedures used in nonparametric supervised classifiers applied

to microarray data.

4. Visualization techniques for microarray data

(with Caroline Rodriguez)

We are planning to build a graphics library in R that we will include

univariate, bivariate, and multidimensional graphical displays to explore

gene expression data obtained from microarrays experiments.


Finite Dynamical Systems and Applications to Bioinformatics

Dorothy Bollman


Finite dynamical systems have a number of important applications in science and engineering. In this work we study both theoretical aspects of these systems as well as applications in crystallography and bioinformatics.

Problems we are studying include

1. The structure of finite dynamical systems

(with E. Orozco)

Let A = (Fn,S) be a finite dynamical system (FDS) where F is a finite field

and S is a nonsignular n X n matrix over F. If S is nonsingular, then S

decomposes Fn into dijoint cycles or "S-orbits," where x and y belong to the

same orbit if and only if y = Si x for some i. For any nonsingular matrix

M that commutes with S, we define MOS(x)=OSM(x), where for any x in Fn,

OS(x) denotes the S-orbit containing x. Now let R_S be a set of

representatives of the S-orbits. Then M decomposes RS into disjoint

"MS-orbits." In this project we study the orbits structure of the FDS

(RS,M). Applications of these results are in symmetric FFTs and genetic

networks.

2. The reverse engineering problem

(with O. Moreno and E. Orozco)

A genetic network can be described by a dynamic network consisting of a graph

G with say n nodes in which every node i is associated with some real number,

its "state," and a funcion fi which maps the set of n-tupes current states

to a new state of i. A global transition function f maps n-tuples of current

states to a next n-tuple of states by using the fi and a prescribed update

function. Descretizing the system, we assume that each node corresponds to the

same finite state set X. The reverse engineering problem is given a sequence

of elements s1,s2, ..., sm in Xn and a set of conditions, to determine

all dynamic networks f : Xn -> Xn that satisfy the conditions and

such that f(si)=s{i+1}. In this work we explore refinements of the dynamic

network model that will allow efficient solutions to the reverse engineering

problem.


Regularization Algorithms for Ill-posed and Inverse Problems

Lev Steinberg


This research deals with the development of numerical implementations of ill-posed and inverse problems arising in model- based simulation and diagnostic problems at meso- and macroscale levels arising in computational materials science.

Quantum Computations

Lev Steinberg


Current technology is beginning to allow us to manipulate rather than just observe quantum phenomena. This opens up the possibility of exploiting quantum effects in computations. For instance, quantum calculations can solve the factorization problem, which is out of reach of classical computers for large composite integers. This research deals with development of new approaches at Meso-scale material simulations that incorporate quantum computations.

Modeling Environmental Data

Dan McGee

Water Quality and Marine Ecosystem Health Indicators: A project to map indicators of contamination and ecological well being in the Phosphorescent Bay.

Change Points with Neural Networks

Dan McGee

Backpropagating Neural Networks and the Change Point Problem: A project to use the information obtained while training feedforward backpropogating neural networks to identify fundamental changes in datasets.