PittGrid has been an invaluable resource in my research. A chapter of my dissertation required complex manipulation of a non-linear model of multiple economies, trade, and technology spillovers. Analyzing this model on a desktop would have required several weeks. My department has limited computing resources, and locking up my own computer for such a long period would have impacted my ability to work on other research projects. PittGrid's distributed computing network allowed me to run multiple jobs simultaneously, which reduced the computing time for my project to a few days.
For my dissertation, I have proposed a new method for genome-wide association analysis. These studies involve the analysis of millions of genetic markers for each research participant! With such massive data sets, proper computational equipment is crucial. In order to compare my method to others previously published, I required a massive simulation, which was ideal for PittGrid. These simulations individually did not require much computing power, but their sheer number made it impossible to work with a single CPU. One of my professors suggested I talk with the Center for Simulation and Modeling here at Pittsburgh, and they quickly referred me to PittGrid. With my limited computational background, Senthil was extremely helpful in getting me set up and teaching me how to use the grid.
After many months of continuous use, I was finally able to finish my simulation, and I am currently writing a manuscript based on this study. I would recommend PittGrid to any faculty member or student who needs a powerful parallel computing aid. I am sure I will have future projects that use this amazing tool. The University of Pittsburgh is very fortunate to have support systems such as the grid, as they only improve the quality of research around campus.
I am a recently graduated Ph.D. student from the Department of Biostatistics, and I would like to share my PittGrid success story with you. In the last term of my Ph.D. study, I was working on a project about Major Depressive Disorder (MDD) with my collaborator in the Department of Psychiatry. This project was the core of my dissertation, and I was planning to graduate within three months and start a new job. But the MDD project was computationally demanding, and I ran into a computing resource problem: even though my adviser provided a very fast multi-core server and I did my best to optimize my programs, the computing was still expected to take at least five months to finish.
In the meantime, I heard about PittGrid from one of my friends. With its help, I finished the MDD project in only one and a half months. One of my papers based on this project has now been accepted by a journal, and I finished my defense on time and started my new job smoothly. I would also like to say many thanks to Mr. Senthil for his help and time. He is truly a person with a heart of gold. Without his help, I cannot imagine how I could have finished my project on time.
I am a second-year medical student who received a summer scholarship from the American Federation on Aging to study the effect of blood pressure on the thickness of the cerebral cortex. In order to do this, I had to reconstruct 3D models of human brains from T1-weighted MRI scans using a software package known as FreeSurfer. This process typically requires more than 12 hours of computation time per brain, and I had dozens of brains in my pipeline. I used PittGrid to run eight reconstructions in parallel, and this allowed me to finish my summer research in a surprisingly short amount of time. Using PittGrid gave me enough time to analyze my results and present my findings.
Researchers at other institutions have access to powerful (and expensive) clusters dedicated to running FreeSurfer. For a medical student, PittGrid is the one place to turn for comparable supercomputing resources. I would like to personally thank Senthil for his help in setting up FreeSurfer on the PittGrid and transmitting the large data structures produced by the brain reconstructions.
Here is an abstract of our research so far:
Thinning of the cerebral cortex on MRI has been associated with hypertension and a number of other risk factors for vascular disease. We examined the association between mildly elevated blood pressure and cortical thickness in 66 healthy individuals from the MR-HYPER study. Our sample consisted of participants aged 35-59, of which 21 had prehypertensive blood pressures (systolic 120-139 mmHg and/or diastolic 80-89 mmHg). We also examined the association between cortical thickness and other vascular risk factors, including blood lipids, body mass index and blood glucose. Cortical thickness was measured on brain models reconstructed from T1-weighted MPRAGE MRIs using the FreeSurfer software package. Age was found to be a significant predictor of both regional and global cortical thickness, whereas blood pressure was not. Fasting blood triglycerides and glucose were modestly associated with thicker and thinner cortex, respectively. Neither age nor vascular risk factors were significantly associated with total cortical volume as a fraction of intracranial volume. Our findings suggest that subclinical levels of vascular risk may not affect cortical gray matter structure in middle-aged adults, while age alone is a better predictor of cortical thickness.
I have been enjoying a powerful grid computing tool called PittGrid and would like to share it with others. PittGrid allows users to submit multiple jobs to a host machine, which monitors the usage of a cluster of computers and distributes each job to an idle machine to be run. Thanks to Senthil's constant work, the pool of computers is growing: I can now use more than 100 Windows computers simultaneously, compared with a capacity of around 50 computers a few years ago (see the success stories below by Ana-Maria Iosif and Qiang Wu).
I cannot imagine how I could have completed my project in a timely fashion without the help of PittGrid. In our research, my advisor and I proposed three estimators to measure the mother-child association in dementia in the Cache County Study data on memory in aging. Some of the estimators involve very complicated algorithms that require O(n^2) computing time for a single run, plus repeated bootstrap samples to obtain the variance estimators. Using about 110 computers simultaneously, I spent around one month finishing all the simulations in my project. It would have been impossible to run such a computationally intensive simulation study on a single computer.
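The bootstrap step described above is a natural fit for a grid: every replicate is independent, so replicates can be farmed out to separate machines and combined afterwards. A minimal sketch of the pattern, with Python's multiprocessing standing in for grid workers; the estimator and data here are hypothetical stand-ins, not the actual association estimators from the study.

```python
import random
import statistics
from multiprocessing import Pool

def estimator(sample):
    # Hypothetical stand-in for the (much more expensive) association
    # estimator; here, simply the sample mean.
    return sum(sample) / len(sample)

def one_bootstrap(args):
    # One bootstrap replicate: resample with replacement, re-estimate.
    data, seed = args
    rng = random.Random(seed)
    resample = [rng.choice(data) for _ in range(len(data))]
    return estimator(resample)

def bootstrap_variance(data, n_boot=200, workers=4):
    # Replicates are independent, so they can run on separate machines;
    # here, separate processes stand in for grid nodes.
    with Pool(workers) as pool:
        estimates = pool.map(one_bootstrap, [(data, s) for s in range(n_boot)])
    return statistics.variance(estimates)

if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.gauss(0.0, 1.0) for _ in range(100)]
    print(bootstrap_variance(data))
```

On a grid, each replicate (or a batch of replicates) would be one submitted job, with the variance computed after gathering the per-job estimates.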
PittGrid makes it possible to verify new ideas in my research in a much shorter time. With its aid, I do not have to wait months for feedback, so I can debug my programs quickly. I am now developing a regression analysis for the Cache County data that requires iterative optimization of the target function with respect to a large number of parameters (say, 200). Each single simulation may take days to complete. PittGrid will definitely be my choice for completing this project successfully.
I would like to sincerely thank Senthil for his great work in providing this powerful grid service!
Dept. of Statistics
School of Arts and Science
University of Pittsburgh
Pittsburgh, PA 15260
PittGrid has been the ideal tool for performing my work on generalizing the so-called Matrix Element Method (MEM) for new physics analyses at the Large Hadron Collider (LHC). This method is particularly powerful for investigating potential signals of dark matter at the LHC. Since dark matter particles are expected to be weakly interacting, they would be invisible to the detector, which poses a challenge for the interpretation of the experimental data. The MEM addresses this problem by using all visible information in observed collision events and integrating over the unobserved momenta of the invisible particles. In a recent publication currently under review at Phys Rev D, my collaborators and I have analyzed how to extend the MEM to deal with additional radiation, which is expected to be abundant at the LHC.
In practice, for every simulated data event and every choice of values for the underlying model parameters, we needed to perform a multi-dimensional numerical integration. Running our code therefore requires a relatively large amount of computing time, but it is trivial to parallelize, since no communication between the integrations for individual events is required. PittGrid has been a very suitable and flexible tool for this task, in particular since it gives the user direct control of the jobs, which is extremely useful during the debugging stage.
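This "one independent integration per event" structure can be sketched in a few lines. The integrand below is a toy stand-in for the matrix-element weight (the real MEM integrand depends on the full event kinematics), and the event values are hypothetical.

```python
import random
from multiprocessing import Pool

def mc_integrate(event, n_samples=20_000, seed=0):
    # Toy stand-in for a per-event MEM weight: Monte Carlo integration of
    # a hypothetical integrand f(event, p) = (event + p)^2 over one
    # unobserved momentum p, uniform on [0, 1].
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        p = rng.random()            # sample the unobserved momentum
        total += (event + p) ** 2   # evaluate the hypothetical integrand
    return total / n_samples        # Monte Carlo estimate of the integral

if __name__ == "__main__":
    events = [0.1, 0.5, 0.9]   # hypothetical observed-event summaries
    with Pool() as pool:       # one independent job per event, as on the grid
        weights = pool.map(mc_integrate, events)
    print(weights)
```

Because the per-event integrations never communicate, each one can be submitted as its own grid job and the results collected at the end.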
Entromics uses quantitative computation of evolution-related entropy and enthalpy along a gene sequence to enlarge the scope of bioinformatics: beyond the conventional use of sequence similarity and homology for genome annotation, it adds the analysis of property coherences across the genome. The concept of property coherence encapsulates the principle that identity of physical properties across distant loci strongly indicates a functional or evolutionary relationship. In the entromics framework, the conventional link between sequence similarity and functional similarity is a special case of property coherence: all similar genomic sequences share their physical properties by definition and are therefore "coherent." The revolutionary information gain afforded by entromics emerges from a suite of innovative mathematical tools that have identified a large, overlooked fraction of genomic sequences that share the physical properties of entropy and enthalpy but exhibit no significant sequence similarity. The implementation of entromic methods facilitates quantitative modeling of systems and processes in which genetic factors play a role, especially in clinical applications and disease modeling.
Property coherence across the genome can, in principle, be studied by conventional computational approaches. However, these approaches require prohibitively complex algorithms, large numbers of model parameters, and extensive experiments to determine those parameters. A more efficient approach is to use the mathematics of graph theory, applied in the context of statistical thermodynamics. The enabling feature of this application is the observation that individual graphs can represent very large sets of disparate genomic sequences that share their physical properties. As a consequence, the entromic estimates of entropy and enthalpy subsume whole series of model-driven, single-sequence, deterministic methods without relying on statistical arguments. Entromics thus provides a computationally tractable way to move beyond the interpretation of DNA as merely an ordered sequence of bases, using the additional insight provided by the correspondence between the fundamental properties of metric models for random graphs and the functions of statistical thermodynamics applied to gene sequences. This cross-disciplinary approach yields a rigorous mathematical formulation of property coherence in terms of graph distances defined from rigorous set metrics on graphs representing different genomic sequences.
The algorithms of the entromics project are currently being run in the PittGrid environment as a very static, linear computation; even so, they have consumed 21,000 of the 525,000 CPU hours logged on PittGrid, and this for relatively small sequences (from 1,000 to 300,000 base pairs). We benefited from the ability to run these computations in parallel. The CPU time above was spent on piecewise computation of the entropy part of the entromic results for the complete human genome; the genomes of Plasmodium falciparum, the plant Arabidopsis thaliana, and the parasites Trypanosoma brucei and Leishmania major; and many variants of viral genomes (influenza, HIV). The largest genomic sequences we have processed so far are the complete single-chromosome genomes of Mycoplasma genitalium and Mycoplasma pneumoniae (~580,000 and ~816,000 base pairs, respectively). For problems of this size, results are obtained not in hours but in days to weeks of computation time; considering the scalability of this project and its emphasis on networks of long-range relations in genomes of up to 3,200,000,000 base pairs (Homo sapiens), there is a clear advantage in being able to parallelize the current computations.
I am a fourth-year Ph.D. student in mathematics. I started using PittGrid half a year ago, and it helps me a lot in my research and saves tons of time. My research subject is mathematical modeling in systems biology: we construct models and run simulations and parameter fittings, which are heavily numerical.
A single computer is obviously not sufficient. PittGrid offers us many CPUs to help us achieve our goal. You can submit more than 50 jobs at one time! What's more, you can log into PittGrid at any time, from anywhere. Even if you are working at home, you can still connect through the VPN and check on your jobs. It is cool!
I worked with David Swigon on a discrete mechanical model of DNA, which consists of rigid plates (representing the base pairs) with elastic connections governing torsion, bending, shear, and tension-compression. We considered the case of linear elasticity and were interested in whether multiple solutions (static equilibrium states) exist. I adapted the piecewise-linearization-based simplex algorithm, and a modified version of it, to the problem. This algorithm is suitable for finding all the solutions of a nonlinear BVP.
The basic concept of the algorithm is simple: it first divides the space spanned by the independent variables and the parameter of the problem into hypercubes; each cube is then divided into simplexes, and the linearized equations are solved in each simplex. It is possible to scan a part of the whole space for equilibrium branches, or for points of the branches at a certain value of the problem parameter, and to follow an equilibrium branch from a known point.
The disadvantage of the algorithm is that the number of computational steps is proportional to (n-1)^3 n! d^n, where n is the dimension of the space and d is the number of steps along an axis. Because of this, only short molecules, two, three, and four base pairs long, were studied. In the case of the four-base-pair chain with simple boundary conditions and one parameter, the dimension of the space is seven. With a single PC it takes a long time to get the results, even if d is small (smaller than 10).
That is why we decided to use PittGrid, which was really helpful and made it feasible to finish the simulations within a reasonable amount of time. The space to be scanned was divided into 64 equal parts, the scanning program was set up for each part, and these 64 independent computations were sent to PittGrid. In this way the results were obtained within a few days. Different base-pair step combinations and boundary conditions were studied; in the end, a few thousand simulations were run on PittGrid. I believe it would not have been possible without PittGrid.
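The 64-way partition described above follows a simple pattern: split the index range of one scan axis into equal chunks and give each chunk to an independent grid job. A minimal sketch (the cell counts here are illustrative, not the actual ones from the study):

```python
def split_ranges(total_cells, n_jobs):
    # Divide the index range [0, total_cells) of one scan axis into
    # n_jobs contiguous, near-equal chunks; each chunk becomes one
    # independent grid job scanning its slice of the hypercube grid.
    base, extra = divmod(total_cells, n_jobs)
    ranges, start = [], 0
    for j in range(n_jobs):
        size = base + (1 if j < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges

# Illustrative: 640 cells along the first axis split across 64 jobs.
jobs = split_ranges(640, 64)
```

Because the simplex scan in each chunk touches only its own cells, the 64 jobs need no communication and their equilibrium branches can simply be merged afterwards.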
As part of a research program on the inflammatory response during sepsis, we are performing Bayesian parameter estimation in equation-based models of inflammation. Analytical treatment of the posterior parameter distributions is not possible due to the non-linear and high-dimensional nature of the models. Markov chain Monte Carlo (MCMC) methods provide a tractable, but computationally intense, approximation to the true distribution.
We are developing a coarse-grained parallel MCMC method with fast mixing times for multi-modal distributions. The algorithms are currently being implemented and tested on the PittGrid system. Our group does not have a dedicated compute cluster, so PittGrid has been a valuable resource for our work.
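Coarse-grained parallelism here means running many independent chains at once, one per machine, and pooling their draws. The Metropolis sketch below uses a toy bimodal target; the actual inflammation models and the group's sampler are far more elaborate.

```python
import math
import random

def log_target(x):
    # Toy bimodal target: equal mixture of two unit-variance Gaussians
    # centered at -3 and +3 (up to a normalizing constant).
    return math.log(math.exp(-0.5 * (x - 3) ** 2) +
                    math.exp(-0.5 * (x + 3) ** 2))

def metropolis_chain(n_steps, seed, step=1.0, x0=0.0):
    # One Metropolis random-walk chain. Chains with different seeds and
    # starting points are fully independent, so each can run as its own
    # grid job (coarse-grained parallelism).
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n_steps):
        prop = x + rng.gauss(0.0, step)
        if math.log(rng.random() + 1e-300) < log_target(prop) - log_target(x):
            x = prop
        samples.append(x)
    return samples

# Four chains, started alternately near each mode; on the grid each
# chain would be a separate job and the draws pooled afterwards.
chains = [metropolis_chain(5000, seed=s, x0=6.0 * (s % 2) - 3.0)
          for s in range(4)]
pooled = [x for c in chains for x in c[1000:]]  # discard burn-in, pool draws
```

Starting chains in different modes is one simple way to cover a multi-modal posterior; faster-mixing schemes (the subject of the work above) exchange information between chains less trivially.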
PittGrid has provided me with a computing environment in which I can simultaneously simulate inflammatory effects on gas exchange in different regions of the lung.
Once data are collected from these simulations, I mix the output from the regions to model the full lung during inflammation. With PittGrid this process can be accomplished overnight; without it, it would take days.
Shortly after we started our project on biological neural networks, we considered building a small-scale cluster for our emerging computational needs. Such an effort would have diverted a significant portion of our time and budget.
Luckily, our departmental IT support informed us of the availability of PittGrid. It has since proved to be an important resource for our ongoing research.
Due to the exploratory nature of our research, our computational efforts are interleaved with non-computational efforts, including the development of new code. A dedicated cluster for our group would have sat idle during these intervals, wasting most of its CPU cycles. It makes sense for research of a similarly exploratory nature to share a single pool of computational resources that collects unused CPU cycles across the campus into a unified interface.
Given an 8-D mathematical model of the acute inflammatory response, well calibrated to sparse rat cytokine data following an endotoxin challenge, parametric sensitivity and local identifiability analyses were performed to reduce the 46-D parameter space to an 18-D core parameter set that predominantly characterizes model variability.
PittGrid was used to refit the mathematical model to the experimental data, starting from 750 different initial parameter guesses and varying only the core parameters. An ensemble of 296 parameter vectors providing a good fit to the experimental data was identified.
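The multistart refit can be sketched as follows: many random initial guesses, each refined by an independent local fit, with the good fits kept as the ensemble. The two-parameter linear model and the crude finite-difference gradient descent below are toy stand-ins for the 18-core-parameter inflammation model and its actual fitting procedure.

```python
import random

def sse(params, data):
    # Sum of squared errors of a toy linear model y = a + b*t
    # (a stand-in for the much larger inflammation model).
    a, b = params
    return sum((y - (a + b * t)) ** 2 for t, y in data)

def local_fit(start, data, iters=1000, h=1e-6, lr=0.002):
    # Crude finite-difference gradient descent from one initial guess;
    # each such fit is an independent grid job.
    p = list(start)
    for _ in range(iters):
        base = sse(p, data)
        grads = []
        for i in range(len(p)):
            q = p[:]
            q[i] += h
            grads.append((sse(q, data) - base) / h)
        p = [pi - lr * g for pi, g in zip(p, grads)]
    return p, sse(p, data)

# Synthetic "experimental" data from true parameters a=1.5, b=0.3.
data = [(t, 1.5 + 0.3 * t) for t in range(10)]
rng = random.Random(0)
starts = [(rng.uniform(-2, 2), rng.uniform(-2, 2)) for _ in range(50)]
fits = [local_fit(s, data) for s in starts]
# Keep the parameter vectors that fit well; with a multimodal model,
# only a subset of the starts would pass this threshold.
ensemble = [p for p, err in fits if err < 1e-3]
```

On the grid, each of the 750 starts would be submitted as a separate job, and the ensemble assembled after gathering the per-job fit errors.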
I am pleased to say that the email preceding yours in my mailbox was an email saying that the paper had been accepted by the journal Biometrics.
In the response, the editor commented on how impressed she was with the scope of our simulations. Without the grid, such an in-depth simulation study would not have been possible.
I have thus far used the grid for one project, which resulted in two publications. The project explores a new way to analyze longitudinal data that accounts for within-subject correlation in order to produce more accurate inference. This new method is applied to a study that examines when it is beneficial to combine the standard chemotherapy Taxol with an anti-angiogenic drug in the treatment of ovarian tumors.
This work has resulted in both a clinical publication, Holtz et al. (2008), "Should tumor VEGF expression influence decision on combining low-dose chemotherapy with antiangiogenic therapy?", Journal of Translational Medicine, 6:2, and a methodological paper, Krafty et al. (2008), "Varying coefficient model with unknown within-subject covariance for analysis of tumor growth curves," Biometrics, in press.
All the simulations in my dissertation were performed in R programming language (www.r-project.org) using PittGrid, which is the University of Pittsburgh's campus-wide computing environment.
PittGrid provides access to additional CPU time and memory for running complex calculations on existing, underutilized CPUs participating in the PittGrid network.
Since every one of the jobs we needed to run involved complex computations for 1000 data sets, we separated each of the jobs into 20 sub-jobs, each involving only 50 data sets, which we submitted to PittGrid. After the jobs were completed, we gathered the results.
The gain in efficiency was tremendous: while a normal computer needs more than five days to complete a single job involving 1000 data sets with 100 subjects per group and 4 repeated measurements, using PittGrid allowed us to run the same job divided into 20 sub-jobs in less than 24 hours.
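The split-and-gather pattern above is straightforward to script. The sketch below uses Python, with a trivial per-data-set computation standing in for the actual R analysis and local processes standing in for PittGrid machines.

```python
from multiprocessing import Pool

def analyze(dataset_id):
    # Hypothetical stand-in for the expensive per-data-set analysis.
    return dataset_id * dataset_id

def run_subjob(chunk):
    # One sub-job: analyze its share of the data sets.
    return [analyze(d) for d in chunk]

def split(items, n_chunks):
    # Partition the data sets into n_chunks sub-jobs of equal size.
    size = len(items) // n_chunks
    return [items[i * size:(i + 1) * size] for i in range(n_chunks)]

if __name__ == "__main__":
    datasets = list(range(1000))
    subjobs = split(datasets, 20)          # 20 sub-jobs of 50 data sets each
    with Pool(4) as pool:                  # on the grid: one machine per sub-job
        partial = pool.map(run_subjob, subjobs)
    results = [r for part in partial for r in part]   # gather step
    assert len(results) == 1000
```

Since the sub-jobs share nothing, the only coordination needed is the final gather, which is why the 20-way split turned a five-day job into an overnight one.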
I had small (100) to medium (1000) sets of Matlab simulations that I needed to run, and they were taking too long on my personal computer. However, the jobs were too small to warrant the use of the supercomputing center. Then I heard about PittGrid. It was the perfect resource for the size of jobs I needed to run. Getting authorization to use PittGrid was an easy process, and the staff was extremely helpful with instructions on its use. In a very short time, I was up and running simulations by myself from my office or from an off-campus location. PittGrid really helped me acquire results for my thesis quickly and efficiently.
Before I knew about PittGrid, all my computing work was done on an old computer in my office with a 1.7 GHz CPU. I would wait a couple of days for a simulation of size 1000 to complete, and it occupied all the CPU capacity, so I could not do any other work, not even browse the web.
I learned about PittGrid at one of Mr. Senthil's presentations. Afterwards, he was very kind in helping me set up an account, giving me instructions on how to use PittGrid, and installing packages. Now I can complete the same simulation work in 30 minutes, since my work is distributed to more than 50 computers.
This has been so helpful in preparing my Ph.D. dissertation, which involves a lot of statistical computing. I do not have to struggle with running algorithms on multiple computers; I can always submit my jobs from the computer in my office. It is so convenient.
Another thanks to Senthil.
Contact Senthil Natarajan:
senthil (AT) pitt (DOT) edu
Last updated: 01/10/13