The aim of the project is to investigate nuclear matter at high temperatures and pressures, using the CHPC for large-scale calculations, hydrodynamic modelling simulations and data analysis. The simulation and analysis of high-energy nuclear physics data has been selected because of the experience of the groups performing this work, the UCT-CERN Research Centre (University of Cape Town) and iThemba LABS, and because this field of science is well suited to large-scale computation. The consortium will also assist the CHPC in promoting the development of Grid computing in South Africa by running Grid computing workshops, and will furthermore promote the use of Grids in all other scientific disciplines.
Last Updated on Tuesday, 24 July 2012 18:06
Quantum information technology has gained tremendous momentum in the research arena over the past decade and its potential impact on the future was realised in South Africa when the Quantum Research Group at UKZN received a substantial amount of funding from the Innovation Fund to set up a Centre for Quantum Technologies and to develop a quantum key distribution system. One of the ultimate goals of quantum technology is to produce a completely self-sustained quantum computer. Quantum computers based on logic quantum gates are able to perform certain computational tasks more efficiently than classical computers. However, there are still technological shortcomings with respect to the realisation of quantum gates, the building blocks of the quantum computer.
One of the most novel and promising candidates for realising quantum gates is the double-optical lattice, a periodic atom structure created by atom-photon interactions. The double-optical lattice structure was first implemented by Prof A. Kastberg and his research team (Umeå, Sweden). The Quantum Research Group at UKZN collaborates with this research team, and much progress has been made in developing a quantum model that can be used to simulate quantum state manipulation of atoms and de-coherence in this lattice structure. Being able to simulate the dynamics of the atoms is vital to the construction of these quantum gates. The numerical simulation of this model is based on Monte Carlo Wave Function techniques, renowned as an efficient numerical tool, and requires the diagonalisation of matrices of 10 000 x 10 000 for a complete description of the dynamics. With these techniques an ensemble of stochastic state vectors is propagated in the open system's space so that the reduced density matrix is recovered through an appropriate ensemble average. Unravellings of time-local non-Markovian quantum master equations in a doubled Hilbert space, obtained by the time-convolutionless projection operator technique, allow for efficient Monte Carlo algorithms.
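As a rough illustration of the Monte Carlo Wave Function idea, the sketch below propagates an ensemble of stochastic state vectors for a single decaying two-level atom. The operators, rates and trajectory counts are illustrative choices only, not parameters of the double-optical lattice model itself, which involves far larger matrices:

```python
import numpy as np

# Illustrative Monte Carlo Wave Function sketch: spontaneous decay of a
# two-level atom.  All parameters are hypothetical, chosen for clarity.
rng = np.random.default_rng(0)
gamma, dt, steps, ntraj = 1.0, 0.001, 1500, 400

sm = np.array([[0, 1], [0, 0]], complex)         # lowering operator |g><e|
H_eff = -0.5j * gamma * (sm.conj().T @ sm)       # non-Hermitian effective H

pop = np.zeros(steps)                            # excited-state population
for _ in range(ntraj):
    psi = np.array([0, 1], complex)              # each trajectory starts in |e>
    for t in range(steps):
        pop[t] += abs(psi[1]) ** 2
        if rng.random() < gamma * dt * abs(psi[1]) ** 2:
            psi = sm @ psi                       # quantum jump: photon emitted
        else:
            psi = psi - 1j * dt * (H_eff @ psi)  # deterministic evolution
        psi /= np.linalg.norm(psi)               # renormalise the state vector

pop /= ntraj  # the ensemble average recovers the reduced-density-matrix
              # population, which should follow exp(-gamma * t)
```

Averaging many such trajectories reproduces the solution of the master equation without ever storing the full density matrix, which is what makes the approach attractive for large state spaces.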
A generalisation of the Lindblad theory to the regime of highly non-Markovian quantum processes in structured environments has been developed by H.P. Breuer, with whom the UKZN Quantum Research Group collaborates in research efforts. This research intends to investigate realistic quantum optical and spintronic devices that exhibit non-Markovian behaviour, using recently-developed algorithms. Exact Monte Carlo methods have been developed for representing the non-Markovian dynamics of open quantum systems. These techniques lead to an exact solution of the von Neumann equation of the total system and have been shown to be successful in studies of simple spin models of de-coherence.
RNA viruses, such as HIV, have extremely high mutation rates, largely attributable to a low-fidelity reverse transcriptase. This high mutation rate, combined with a large effective population size and a short generation time, gives HIV an enormous capacity to generate diversity and to adapt to changes in the host environment. This has important implications for the design and monitoring of drug treatment regimens and is the primary reason for the difficulty experienced in developing an HIV vaccine that is effective across the globe. Moreover, the intra-host evolution of the virus is an essential part of the HIV disease dynamics. It is therefore vital to develop tools to understand HIV evolution within and across infected individuals. HIV data collection efforts are increasing in scale and resolution as a result of ultra-high-throughput sequencing technology and the large-scale application of the Single Genome Amplification (SGA) technique. Efforts to monitor the large-scale roll-out of antiretroviral treatment in South Africa will generate large quantities of molecular sequence data. The availability of data, as well as the biologically and clinically relevant insights that can be obtained by modelling intra-host evolution, emphasises the need to develop and apply more accurate and biologically-relevant evolutionary models.
Probabilistic models of sequence evolution typically assume a continuous-time Markov process in which a rate matrix Q, with elements qij denoting the instantaneous substitution rates from codon i to codon j, governs the process. This enables the determination of a transition probability matrix P(t), as a function of time, that describes the probability of a substitution from state i to state j in a time interval t. Typically, sequences are modelled along the branches of a phylogenetic tree, which may be known already or may be included as one of the parameters to be estimated. In one approach the likelihood of a codon alignment, given an estimated phylogenetic tree of relationships, is calculated using a model of evolution that constrains all sites to evolve neutrally, and is then compared to the likelihood under a model in which a subset of sites is permitted to evolve adaptively. Positive selection is inferred when the latter model is favoured (using standard model comparison techniques).
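The relationship between Q and P(t) can be illustrated with a toy four-state nucleotide model, a stand-in for the 61-state codon matrices used in practice; the rates below are arbitrary:

```python
import numpy as np
from scipy.linalg import expm

# Toy continuous-time Markov chain on the four nucleotides, standing in for
# the 61-state codon rate matrix Q described above; rates are arbitrary.
Q = np.full((4, 4), 0.2)              # instantaneous rates q_ij, i != j
np.fill_diagonal(Q, 0.0)
np.fill_diagonal(Q, -Q.sum(axis=1))   # each row of a rate matrix sums to 0

t = 0.5
P = expm(Q * t)                       # transition probability matrix P(t)
# P[i, j] is the probability of being in state j after time t, given state i;
# each row of P(t) is a probability distribution.
```

The matrix exponential P(t) = exp(Qt) is exactly the quantity the likelihood calculations along each tree branch require.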
The distinction between neutral and adaptive evolution is captured by comparing the relative rates of substitutions that alter the encoded amino acid (referred to as non-synonymous substitutions) and substitutions that, because of the degeneracy of the genetic code, have no effect on the encoded amino acid sequence (referred to as synonymous substitutions). The latter class results in the substitution of one codon (or nucleotide triplet, encoding an amino acid) for another codon that encodes the same amino acid, often assumed not to have any effect on organism fitness, and therefore to occur at the neutral rate of evolution (which, according to evolutionary theory, is the same as the rate of mutation). When coding sequences evolve such that the rate of non-synonymous substitution (dN) is greater than the rate of synonymous substitution (dS), positive Darwinian selection (i.e. adaptive evolution) can be inferred. The inference of positive selection and the sites that are evolving under selection can reveal key aspects of organism biology – particularly in the context of pathogen infection and host resistance.
Strategies such as the one discussed above have been used to investigate the evolution of resistance to drug treatment in pathogens, including HIV-1, and have also been used to study escape from immune responses. The models proposed in this study are modifications of these general models of sequence evolution and will be designed to provide more power to detect the effects of immune responses on intra-host HIV evolution, and to detect more sensitively the footprint of host immune genotype on the virus. The study proposes to extend directional selection models, in the context of antiretroviral drug treatment, to the detection of secondary resistance mutations, and to apply these models to sequence data sets from southern Africa and to publicly available sequence data sets.
The project objectives are described in the sections that follow. Each objective will be addressed through the development of custom probabilistic models of evolution implemented in HyPhy, R or C/C++.
Develop and apply methods to evaluate the intra-host evolutionary selective pressure acting on HIV-1 protein-coding genes
Previous models of sequence evolution give very high rates of false positives when applied to recombining sequences such as HIV. A model was recently developed to alleviate the confounding effects of recombination on parameter estimation. As part of another study, the researchers intend to carry out an exhaustive survey of selective pressures in HIV protein-coding genes using this model with a view to re-evaluating the published evidence of selection and its biological significance using an unbiased method. The intention is to distinguish between evolutionary selective pressures acting on viruses within individual hosts and selective pressures that shape the evolution of the pandemic more generally. In the first case, the researchers will focus on selective pressures acting on the virus in acute infection. This critical stage of the disease is believed to have a significant impact on set-point viral load, which in turn affects the rate of disease progression. Understanding the nature of the virus isolated from acute infection and its adaptation to the new host environment has potential implications for vaccine development.
The PI of this research is an investigator on the CHAVI project (http://www.chavi.org/) through which he has access to a large set of highly accurate sequences from the HIV env gene, generated through the Single Genome Amplification technique from samples collected in North America (approximately 3 400 unpublished complete sub-type B coding sequences generated from samples from 102 acute individuals). A large number of env coding sequences isolated from southern African individuals acutely infected with HIV sub-type C became available in 2008. These data form a very comprehensive data set and give rise to a unique opportunity to understand the nature of the virus that is transmitted to newly-infected individuals (a key objective of the CHAVI project) as well as the nature of the selective forces that shape the evolution of the virus in the crucial phase of infection (the primary objective of this study). The CHAVI project will also generate whole genome HIV sequences at later stages, and this will enable investigation of the evolution of other HIV genes.
With input from other researchers, the research team has implemented a method to assess the selective pressure acting on HIV sequences in acute infection, using the HyPhy batch language for evolutionary modelling. The method combines evidence over multiple phylogenetic trees representing the evolutionary relationships of sequences from different infected individuals. For each infected individual, the time to the most recent common ancestor of the sequences in the individual is estimated using Bayesian Evolutionary Analysis Sampling Trees (BEAST). This method makes use of the Markov Chain Monte Carlo (MCMC) algorithm to sample phylogenetic trees (the graphs representing the relationships between the viral sequences) according to their posterior probabilities, given the sequence data. Given a set of sampled trees and a set of input parameters, including mutation rate and generation time, confidence intervals on demographic parameters of interest can be inferred, including the time to the most recent common ancestor of the viral sequences. If the most recent common ancestor of the sequences post-dates the estimated time since the patient was infected (based on clinical stage information), we can be confident that the observed viral diversity was generated during acute infection. The MCMC algorithm used by BEAST is naturally parallelisable, since a single chain from which posterior distributions are derived can be sub-divided across multiple nodes. The research team has developed an automated MPI wrapper for BEAST and has tested its scalability on an acute infection data set derived from the CHAVI project; because MCMC methods are well-suited to parallel implementation, good parallelisability is expected.
This research team will use the methodologies set out above to determine whether the HIV env gene evolves under positive Darwinian selective pressure in acute infection and, if so, identify the sites in HIV env that evolve rapidly at this stage. The researchers expect that these sites will include reversion of mutations that enabled the virus to escape from immune responses that were present in the infecting individual but absent from the newly-infected host, and possibly mutations that facilitate escape from the earliest antibody responses of the newly-infected host. CHAVI investigators have revealed a severe population bottleneck in HIV transmission, to the extent that most new infections result from expansion of just a single viral particle, and there is great interest in determining whether this bottleneck is selective (does the process of HIV transmission from one host to another select for a viral strain that is particularly adapted for this purpose?). This concept is topical, because if transmission involves a severe selective bottleneck, vaccines need not target the full diversity of HIV, but merely a sub-set of strains selected for improved inter-host transmission. If the bottleneck is selective, consistent reversion of amino acids that are advantageous for transmission to a state that is most advantageous for expansion in a newly-infected host will leave a detectable signature of positive selection.
Develop a method to model viral evolution and host genotype simultaneously and to use this model to identify the footprint of the host genotype on the autologous viral sequence
One of the most exciting developments in genomics recently has been the development and broad application of whole genome genotyping technologies. These technologies enable genotypes of up to a million single nucleotide polymorphisms (SNPs) to be determined in a single experiment. One of the objectives of this research is to develop the capacity to model the genome sequence of a pathogen and its host simultaneously. HIV is known to evolve under extremely strong evolutionary selective pressure. This fact is central to the current research and an essential aspect of the pathogenesis of the virus. The main driving force for the very rapid and adaptive evolution of HIV is the host environment. The dynamics of the adaptive immune response within individual hosts drives adaptive evolution of HIV within the host.
Differences in the innate and adaptive immune capacity between hosts, as well as differences between hosts in a large number of non-immune-related host proteins with which the virus must interact, drive adaptive evolution between hosts. There have been several attempts to determine the footprint of the host immune capacity on HIV sequences, but no attempt has been made to determine the more general footprint of host genotype on viral sequence evolution. Collaborators in the CHAVI project carried out the first large-scale whole-genome association study of human loci that affect HIV disease progression. This project proposes to use the data from the whole-genome association study, together with HIV sequence data from the same individuals that were genotyped as part of that study, to investigate general host factors that place a selective pressure on HIV and thereby to establish the general host molecular footprint on the virus. This has profound implications for understanding the mechanisms through which different hosts progress to disease at very different rates and for understanding the biology of HIV pathogenesis. Determining the footprint of the host genotype on HIV sequences requires simultaneous models of sequence evolution and the host environment. Previous studies that investigated the footprint of HLA alleles on HIV sequences have generally either neglected to account for the non-independence of epidemiologically-linked data or have used heuristic methods to take this into account.
This research proposes to take a more complete model-based approach to the problem of finding a relationship between viral sequence polymorphisms and host HLA genotype. This model will explicitly model the host genotype and allow sequence evolution to depend on the HLA allele. The method will sum over the genotypes of the individuals infected by the ancestral viral sequences, represented by the internal nodes of the tree, since these genotypes are unknown. It is possible to encode this analysis using the HyPhy batch language; however, the analysis is extremely computationally intensive owing to the addition of a variable (the HLA allele) over which all ancestral nodes of the tree have to be summed. This approach furthermore requires the researchers to test for relationships by fitting models independently for each pair consisting of an amino acid site and an HLA allele. Approximately 20 HLA alleles are sufficiently common in southern African populations to enable testing with this method. The HIV genome is approximately 10 000 base pairs long, encoding approximately 3 000 amino acids. The implementation of this project thus requires the optimization of 20 x 3 000 models. Computational requirements will be reduced by removing invariant and slowly-evolving sites, although the computational requirements of this part of the project remain significant. This task is highly parallelizable, since the model-fitting for each pair of amino acid site and HLA allele is independent.
Model viral evasion of host immune responses, in particular cytotoxic T-lymphocyte (CTL) responses
Escape from host immune responses can be associated with a significant cost in terms of viral fitness. Reversion of escape mutations to wild-type in hosts lacking a specific immune response, and re-escape in hosts with the response, can cause a pattern of amino acid toggling. Sequences tend to toggle between a very fit state and an easily accessible, but less fit, escape variant. This research proposes to develop a model of amino acid toggling to improve the sensitivity with which it is possible to detect evasion of immune responses where escape carries a high cost in terms of viral fitness. Positive Darwinian selection can be inferred when the rate of non-synonymous (amino-acid changing) substitution, dN, is greater than the rate of synonymous substitution, dS. Although dN > dS is sufficient to infer positive Darwinian selection, it is not necessary. The objective in this research is to develop methods to infer positive Darwinian selection from coding sequences even when dN = dS. In the case of selective pressure for escape from a specific CTL response, and reversion to the most fit amino acid in individuals lacking the CTL response, the researchers expect to observe toggling between the most replicatively fit but immune-susceptible state and the less fit but immune-escaped state. If the rate at which mutations between these states occur is greater than the rate predicted under random neutral mutation, positive Darwinian selection can be inferred, even when dN is not greater than dS.
Validation of the proposed models of sequence evolution requires extensive simulation, with significant high-performance computing implications. In order to determine whether these models are applicable to real-world HIV and other comparable sequence data sets, the researchers will simulate data sets using parameter values and sizes comparable to published data sets of HIV genes that have been used to carry out relatively heuristic investigations of the relationship between immune system alleles (in particular alleles of the Human Leukocyte Antigen, HLA) and sequence evolution. HyPhy allows simulation of data sets using flexible input models and will also be used to fit the models to the simulated data. Fitting the evolutionary models to data involves calculating the likelihood of a data set given a phylogenetic tree (a bifurcating tree structure representing the relationships between the sequences, with the lengths of the tree branches representing evolutionary distance), a model and a set of parameter values. The ancestral states of the sequence (the states of the sequence at the ancestral, or extinct, nodes of the tree) are unknown, and therefore the likelihood of the tree is calculated as a sum over all possible ancestral states. The number of ancestral nodes is more or less equal to the number of sequences (which can be a few hundred), and the alphabet size for codon models of evolution is relatively large, making this sum expensive to compute. It can, however, be calculated in a reasonable time using Felsenstein's pruning algorithm, a dynamic programming algorithm. Fitting the model to the data involves optimization, for which the EM algorithm will be applied, but this makes parallelisation difficult. The number of simulations carried out will depend on the availability of computer resources, but ideally this research would simulate 1 000 replicate data sets for each model tested in order to reduce the variance in the false positive and power estimates.
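The pruning recursion itself is compact. Below is a minimal sketch of Felsenstein's pruning algorithm on a toy two-state alphabet and a three-leaf tree ((A,B),C); real codon models use a 61-letter alphabet and hundreds of leaves, but the dynamic-programming step is identical:

```python
import numpy as np
from scipy.linalg import expm

# Felsenstein's pruning algorithm on a toy two-state alphabet and the
# three-leaf tree ((A,B),C); rates and branch lengths are illustrative.
Q = np.array([[-1.0, 1.0], [1.0, -1.0]])   # reversible two-state rate matrix
pi = np.array([0.5, 0.5])                  # stationary (root) distribution

def leaf(state):
    """Partial likelihood vector at a tip: 1 for the observed state."""
    v = np.zeros(2)
    v[state] = 1.0
    return v

def prune(child_Ls, branch_lens):
    """Dynamic-programming step: the partial likelihood at a node is the
    product over children of P(t) applied to each child's vector."""
    L = np.ones(2)
    for Lc, t in zip(child_Ls, branch_lens):
        L *= expm(Q * t) @ Lc
    return L

# observed tip states: A = 0, B = 0, C = 1
L_ab = prune([leaf(0), leaf(0)], [0.1, 0.1])  # internal node above A and B
L_root = prune([L_ab, leaf(1)], [0.2, 0.3])   # root joins (A,B) with C
likelihood = pi @ L_root   # sum over the unknown root state
```

The recursion sums over all ancestral states implicitly, which is what turns an exponential enumeration into a computation linear in the number of nodes.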
Evolutionary modelling to confirm putative CTL (and, later, antibody) epitopes
As an extension of the model proposed above, the research team will investigate the development of a model specifically designed to aid in the identification of epitopes (the regions within pathogen proteins that are recognised by the host immune system). The researchers will firstly concentrate on epitopes recognised by human cytotoxic T-lymphocytes (CTLs), since these epitopes are contiguous on the viral sequence. The PI of this research has worked with researchers at the South African National Institute for Communicable Diseases (NICD), where they have experimentally identified the approximate regions within HIV proteins recognised by the CTL response (using the interferon-gamma ELISPOT assay). They have also generated HLA genotype data for these HIV-infected individuals. This type of data has also been generated in laboratories in other parts of the world.
HLA proteins function in presenting the epitope at the cell surface, where it interacts with receptors on immune cells, triggering a CTL-mediated immune response. It is possible to hypothesise that the CTL response to a given peptide is mediated by a given HLA allele when the HLA allele is statistically over-represented among individuals exhibiting a specific CTL response. The researchers propose to confirm the existence of a relationship between specific HLA alleles and CTL-recognised peptides through the use of evolutionary modelling. In particular, the research will model the evolution of a CTL-recognised peptide, allowing different evolutionary rates along branches leading to individuals with and without a focal HLA allele. For branches of the phylogeny leading to ancestral nodes (where the HLA state is unknown because the individual infected by the ancestral virus is not present in the data set), a mixture model will be used, with mixture proportions equal to the frequency (and one minus the frequency) of the HLA allele. This will enable the researchers to determine whether the evolutionary rate within a specific peptide is dependent on the presence or absence, in an infected individual, of the HLA allele under investigation. This approach can easily be extended by developing directional models of selection analogous to the model used to investigate drug resistance. In this instance the researchers will test whether there is a tendency at specific sites within the peptide to mutate away from the wild-type (or default state) in the presence of the HLA allele and towards the wild-type in its absence. This would provide a signature of immune escape and enable the researchers to pinpoint mutations that enable the virus to evade the immune response, while also providing strong evidence for an epitope for the investigated HLA allele within the peptide region modelled. The researchers have not yet determined the parallelisability of this approach. Since it is similar in concept to the simultaneous modelling of host genotype and HIV evolution, it is likely to require substantial computational resources. The researchers predict that the project will scale more or less linearly with the number of epitope-HLA allele combinations.
Detect compensatory mutations associated with the evolution of anti-retroviral drug resistance
The recently-developed models of the evolution of drug resistance will be extended to the detection of compensatory mutations, and the extended models will be applied to serially-sampled HIV sequences. These models will be used to detect compensatory mutations associated with the evolution of resistance to anti-retroviral drugs, using the data set of Nevirapine-treated individuals that formed the basis of a recent study and the new drug treatment data sets as they become available from the National Institute for Communicable Diseases. The proposed method will test for dependence between the evolutionary rate at a candidate accessory resistance site and the resistance state at known resistance sites. This will require independent model fits and optimizations for each site and for each of the most important known drug resistance sites. This task is computationally highly intensive, but also highly parallelisable, and the researchers expect linear gains in efficiency with the number of available computer nodes.
Each of the five objectives described above requires substantial computing resources: for obtaining sufficiently large numbers of samples for accurate estimation using MCMC sampling techniques, for the analysis of multiple data sets, and for extensive testing of custom models through simulations. Preliminary analyses using some of the models were computationally extremely intensive (approximately two hours per codon site). Since these models are subdivided by codon site, they can be parallelised relatively easily.
Modern South African Astronomy and Cosmology: Confronting the Simulated and the Observed Universe
Cosmologists seek to understand the structure and evolution of the universe and its physical constituents. A field that lies at the interface of astronomy and particle physics, cosmology has undergone a major revolution over the past decade, made possible by a wealth of observational data from cutting-edge experiments and telescopes. The wide range of these cosmological observations is impressively fit by a simple concordance model with a small number of parameters.
The standard cosmological model describes an expanding universe that is smooth on the largest scales, with inhomogeneous structures, such as galaxies, galaxy clusters and super-clusters, present on smaller scales. These structures originated from small fluctuations, or irregularities, that were present in the matter distribution at earlier times and which then steadily grew via gravitational instability to form the variety of observed structures. In addition to the familiar, visible matter, there is strong evidence that the universe contains significant fractions of dark matter and dark energy that dominate the cosmic energy budget today.
Despite the remarkable success the cosmological model has enjoyed, several outstanding issues remain; these constitute the leading challenges in cosmology.
Major national and international astronomical facilities will help address these questions and further transform our understanding of the cosmos. All three investigators on this research will play a leading role in cosmological surveys on those facilities. The researchers at UKZN, UWC, UCT and NASSP will be involved in a large telescopic survey of supernovae and galaxy clusters on the Southern African Large Telescope. These data will be used in conjunction with photometric data from the Sloan Digital Sky Survey (SDSS) and microwave data from the Atacama Cosmology Telescope (ACT) to study the use of supernovae and galaxy clusters, respectively, as probes of dark energy. Both the SDSS and the ACT projects are high-profile international projects, of which the researchers are members. The microwave data from ACT can be combined with Wilkinson Microwave Anisotropy Probe (WMAP) data and future polarization data to set strong constraints on the nature of the primordial fluctuations. The South African MeerKAT radio telescope project will map out the distribution of neutral hydrogen (cold gas) in galaxies at intermediate redshift. This cold gas plays a pivotal role in galaxy and cluster evolution.
With the wealth of incoming data, cosmology has now entered the high-precision era. Analysing and interpreting this data, and using it to constrain theoretical models, requires significant computational effort. Large cosmological N-body simulations are necessary to understand the distribution of large-scale structures and the complicated gas physics that influence how galaxies and galaxy clusters evolve. These simulations are also used to predict the yield of astronomical targets (supernovae, galaxy clusters) in large cosmological surveys in different wave bands. Computationally intensive algorithms that maximise the constraints on dark energy from large supernova surveys are necessary to design optimal surveys. The N-body and hydrodynamic simulations are also used to study the systematic effects that arise in large observational surveys, and to calibrate these effects when studying real data; these issues will be studied for microwave observations. Large Markov Chain Monte Carlo simulations are necessary to estimate cosmological parameters from the variety of data sets resulting from these observational surveys, thereby further refining the cosmological model. The CHPC provides an unprecedented opportunity to complete this computationally intensive programme, which will help to address leading cosmological questions.
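The Metropolis accept/reject step at the heart of such MCMC parameter estimation can be sketched with a one-parameter toy likelihood; a real cosmological analysis samples many parameters against expensive theoretical predictions, but the mechanics are the same:

```python
import numpy as np

# Toy Metropolis MCMC: estimate a single parameter from synthetic data.
# The Gaussian likelihood and all values here are illustrative only.
rng = np.random.default_rng(1)
sigma = 0.1
data = rng.normal(0.3, sigma, size=200)   # synthetic "observations"

def log_like(theta):
    return -0.5 * np.sum((data - theta) ** 2) / sigma**2

chain, theta = [], 0.0
for _ in range(20000):
    prop = theta + rng.normal(0.0, 0.02)              # symmetric proposal
    if np.log(rng.random()) < log_like(prop) - log_like(theta):
        theta = prop                                  # accept the move
    chain.append(theta)

posterior = np.array(chain[5000:])   # discard burn-in; posterior samples
# posterior.mean() recovers the input value to within the statistical error
```

Because each likelihood evaluation is independent of the others within a step, long chains can be split across nodes or run as multiple parallel chains, which is what makes HPC resources so valuable here.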
Electromagnetic Computer Simulation for the MeerKAT and SKA
This project addresses electromagnetic computer simulation of key elements of the proposed Square Kilometre Array (SKA).
The world's radio astronomy community is working together to conceptualise the SKA, the largest and most sensitive radio telescope ever. It is likely to consist of a myriad of dishes, each 10-15 m in diameter. Special antenna tiles in the core of the array will form a “radio fish-eye lens” for all-sky monitoring at low frequencies. This will allow many independent simultaneous observations. The joint receiving area of all these dishes and panels will add up to approximately one million square metres. The SKA will require super-fast data transport networks and ever more powerful computing. South Africa and Australia are the only two countries on the short-list to site this mega-telescope. A final decision on the site is expected by 2010 and construction should start in 2014. Should the telescope be sited in South Africa, the SKA's core will be in the Northern Cape Province (in the Karoo region). The SKA will consist of thousands of antennae spread over several thousands of kilometres. Approximately half of the antennae will be concentrated in a central region of approximately 5 km in diameter.
The SKA is an international mega-science project and the projected cost is currently one billion euros. The erection of the SKA in South Africa will serve as a catalyst to firmly connect South Africa with international science.
The MeerKAT, another very important South African project with a budgeted cost of R800 million, is considered the largest government-funded science project since the early 1990s. The MeerKAT, with its approximately 80 dishes, is intended to serve as a technology demonstrator for the SKA, but will also be used independently.
The primary elements in both the SKA and the KAT are the receiving antennae. Given the large size, complex design and demanding specifications of these structures, computational electromagnetics (CEM) plays a key role in the optimal design of the antenna elements to ensure maximum scientific return on investment.
A primary application of the SKA is the mapping of the entire sky, and the ability to observe several parts of the sky simultaneously (in theory, as many as there are elements in the focal plane array) can substantially reduce survey time (which can be measured in years for some radio astronomy projects). The focal plane array (located at the feed point) places a number of receiving elements at the focus of the main dish (probably a reflector with a 12 m diameter) and allows for the effective reception of a number of beams simultaneously. Simulating even a small array over the operating band of the KAT requires high-performance computing. A major challenge in the construction of these antennae is to design them to operate effectively across a wide frequency range. Constructing large arrays is a grand challenge and thus calls for the use of high-performance computing.
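As a rough illustration of why array size drives the beam properties that CEM simulations must verify, the sketch below evaluates the far-field array factor of a uniform linear array; the element count and spacing are illustrative, not MeerKAT/SKA design values, and a full CEM simulation of mutual coupling and reflector optics is vastly more expensive:

```python
import numpy as np

# Far-field array factor of a uniform linear array: a minimal stand-in for
# the kind of pattern a focal plane array synthesises.  Element count and
# spacing are illustrative, not actual MeerKAT/SKA design parameters.
n, d = 16, 0.5                              # elements; spacing in wavelengths
theta = np.linspace(-np.pi / 2, np.pi / 2, 1801)
psi = 2 * np.pi * d * np.sin(theta)         # inter-element phase shift
af = np.abs(np.exp(1j * np.outer(psi, np.arange(n))).sum(axis=1)) / n

# af peaks at 1.0 at broadside (theta = 0) and its main-lobe width shrinks
# roughly as 1/n: more elements mean narrower beams and costlier simulation.
```

Scaling this idealised picture up to thousands of coupled, wide-band elements is what makes high-performance computing indispensable for the antenna design work described above.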