The Open University
A joint venture between the Science & MCT faculties
Stephen Lewis (Science, DPS)
Stephen Lewis (Science, DPS) runs global atmosphere models of Mars, Venus, Earth, giant planets and extrasolar planets. To remain scientifically competitive in his field, it is increasingly imperative to conduct experiments at higher spatial resolutions and to analyse larger quantities of spacecraft data in combination with the models. This requires much more computer memory (at least 8 GB), processor time and disk storage. Expanded use of the OU cluster will be essential for the success of current projects (e.g. Mars-related projects funded by NASA and ESA) and future projects (e.g. several grant applications submitted to STFC, a NERC application and the new version of the ESA contract).
Jimena Gorfinkiel (Science, DPS)
Jimena Gorfinkiel (Science, DPS) runs ab initio calculations on electron-molecule collisions. A current EPSRC grant to study electron collisions with molecular clusters relies heavily on the Linux cluster, in particular the large-memory cores. A new collaboration with a Swiss group, which studies mid-size alcohols and ethers with the aim of understanding how functional groups affect the formation of resonances and the subsequent dissociative electron attachment process, has been making intensive use of the large-memory (8 GB) cores. Running several tens of calculations routinely takes about 2-3 days of computing time. These calculations make intensive use of I/O and therefore require large local scratch areas. Future grant applications to work on other processes relevant to biological radiation damage will require even more computing resources and will therefore not be possible without expansion of the current computing capability.
James Hague (Science, DPS)
James Hague’s research includes embolic stroke modelling, optical lattice simulations and electron-phonon interactions (a grant application as CI is to be submitted in the spring). These require a large quantity of CPU time, e.g. 3 months of run time on 16 processors for a calculation similar to PRL 98 (2007) 037002; more detailed work may require even more computational resources. Both this latter application and the success of his EPSRC First Grant application to work on graphene will rely heavily on the availability of computer resources at the OU. Enhanced computing resources will significantly improve the chance of success of an application for a Responsive Mode grant on embolic stroke to be submitted in autumn 2009 (which will involve detailed fluid dynamics simulations).
Neil Edwards (Science, E3)
Neil Edwards's (Science, E3) group uses and develops simplified Earth System Models to study the interactions of climate change and economics, the origins of ice-age cycles, and mass-extinction events through the Phanerozoic. Thorough analysis of model error is essential for the credibility of this work, and requires large ensembles of order 1000 simulations to produce reliable statistics. This research therefore relies on having regularly upgraded machines, so that processor speed allows the longest integrations, and a large cluster of computing nodes, to investigate large-dimensional parameter spaces. Competitor institutions (e.g. Bristol, Exeter, Notts and Hamburg, to name just four) have recently invested in new ~1000-processor clusters. Remaining at the cutting edge in this field therefore requires significant, sustained investment in procuring, upgrading and managing a large-scale cluster.
Andrew Norton (Science, DPS)
Andrew Norton (Science, DPS) has started an STFC rolling grant project, linked to that of Jones/Horner, to investigate the stability of the orbits of exoplanets in hierarchical multiple stellar systems (binaries, triples, quadruple stars etc.). The code (swift_hjs) runs for millions of years of system time and each run takes several weeks of CPU time. Since the parameter space to explore is so large (different stellar masses, planetary masses, orbital separations / periods / eccentricities / inclinations and hierarchies), the project is currently open-ended and likely to use a significant amount of computer time for years to come.
Andrew Norton has been using a custom-written period search program to identify all the periodic stellar variables in the SuperWASP archive. This currently contains 14 billion data points on 23 million objects. Initial period searches on subsets of objects were run on the cluster at the OU whilst the code was being developed; final runs are carried out on a similar system local to the archive itself at Leicester University. Several papers are likely to follow reporting the half-million newly identified variable stars that the search has uncovered, but an initial study based on period-searching done at the OU is published as:
A&A 467, 785-905 (2007) ‘New periodic variable stars coincident with ROSAT sources discovered using SuperWASP’, A. J. Norton, P. J. Wheatley, R. G. West, C. A. Haswell, R. A. Street, A. Collier Cameron, D. J. Christian, W. I. Clarkson, B. Enoch, M. Gallaway, C. Hellier, K. Horne, J. Irwin, S. R. Kane, T. A. Lister, J. P. Nicholas, N. Parley, D. Pollacco, R. Ryans, I. Skillen, D. M. Wilson.
Andrew Norton & Ollie Butters run a hydrodynamical particle code, HyDisc, which simulates accretion flows in magnetic cataclysmic variable stars. There is a vast parameter space to explore (different stellar mass ratios, magnetic field strengths, orbital periods and spin periods), and each combination yields a different accretion flow. Establishing flows at equilibrium requires the code to be run over many system orbital periods, which can take days of CPU time. Results are published in:
The Astrophysical Journal, 672:524–530, 2008 January 1, ‘The Accretion Flows and Evolution of Magnetic Cataclysmic Variables’, A. J. Norton, O. W. Butters, T. L. Parker, G. A. Wynn.
In an ongoing investigation we are now developing an additional code which simulates the X-ray emission that arises from each flow using a ray-tracing approach. This too requires significant cluster CPU time to calculate the simulated lightcurve for each set of system parameters.
Elaine Moore (Science, C&AS)
Elaine Moore (Science, C&AS) uses the IMPACT cluster for modelling molecules and materials. She has built up a suite of programs, mostly free to academic institutions (or available at low cost) and with site or group licences.
Ab initio molecular programs: GAMESS(US), DALTON
Ab initio solid state programs: CRYSTAL06, WIEN2k
Solid state modelling: GULP3.1
Biochemical molecular dynamics: AMBER10, X3DNA
Properties of doped metal oxides
Metal oxides are widely used in industry as catalysts, semiconductors, capacitors, magnetic devices etc. The properties of these oxides can be altered by replacing a fraction of the atoms by either vacancies or foreign atoms. Computational methods are used as an adjunct to experimental methods to aid the interpretation of data, for example by predicting the positions of atoms that are hard to distinguish experimentally. This project is run in collaboration with Frank Berry and will continue after his retirement. A new project in this area will be started with a student at the University of Khartoum, looking at the effect of replacing oxygen by fluorine or other elements on materials with unusual magnetic properties (colossal magnetoresistance) to try and point the way to better materials for use in electronic devices.
Modelling of pairing of modified DNA bases (with Yao Xu)
Modified DNA bases can be incorporated into DNA strands but can cause the base to pair incorrectly, leading to DNA mutation and possibly cell death or cancer. Experimentally it is possible to determine which modifications lead to incorrect pairing, e.g. 4-thiothymine pairs correctly with adenine but 6-thioguanine pairs very poorly with either cytosine or thymine. Our aim is to model base pairing in order to aid our understanding of why some modifications lead to incorrect pairing but not others. In the longer term we intend to model interactions of modified bases that have not yet been synthesised in order to predict those that would cause mis-pairing. A substantial amount of work was done on the EPSRC national facility NSCCS, which is no longer available.
Computer Modelling Studies of Solid State Catalysts (with Eleanor Crabb)
Experimental testing of catalysts can identify modifications that affect performance. For example, the properties (activity, selectivity or stability) of catalysts composed of precious metal nanoparticles (<5 nm) supported on high-surface-area supports such as alumina, silica or carbon can be altered dramatically by alloying or modification with another metal or oxide component. However, it is not usually possible to determine how and why this happens. Computer modelling makes it possible to explore candidate mechanisms, leading to insights into why the modifications have such effects and possibly suggesting ways of enhancing performance. In 2009 the project will consider the stability and geometry of various surfaces of materials used as catalysts. In the first instance these will be noble metals such as platinum and ruthenium, which are used as electrocatalysts. It will then go on to look at the attachment of molecules to these surfaces, for example the strength of adsorption to high- or low-coordination sites, e.g. edges or planes. Differences between adsorption on different metals will be explored and applied to differences in reactivity.
The research of Joan Serras (MCT)
The research of Joan Serras (MCT) involves the study of multilevel representations on very large systems. The cluster is used to run a multi-agent transport model called TRANSIMS (TRansportation ANalysis and SIMulation System). “TRANSIMS is an integrated system of travel forecasting models designed to give transportation planners accurate and complete information on traffic impacts, congestion and pollution” (Hobeika, 2005). TRANSIMS can model urban systems of any size at a refined time-scale (1 second basis). TRANSIMS goes beyond the traditional four-step model to achieve an activity-based demand. The output data produced by the model is then integrated in a Multilevel Representation.
The cluster makes it possible to run a model for Milton Keynes successfully, modelling the movement of each inhabitant in the synthetic population (around 2 × 10^5 inhabitants) on a second-by-second basis. They are currently planning to address larger areas of the order of 10^6 agents. The use of the cluster’s parallel computing interfaces (MPI and PVM) is of key importance for executing the simulations in an acceptable time.
Mike Grannell and Terry Griggs (MCT)
Mike Grannell and Terry Griggs (MCT) use the cluster for “Orientable biembeddings of Steiner triple systems of order 15”. As part of continuing work on Topological Design Theory, it is important to study all biembeddings in an orientable surface of the Steiner triple systems of order 15. There are 80 of these and typically each one would take between 3 and 10 weeks to process, depending on the size of the automorphism group (i.e. the degree of symmetry in the system). Most systems would require the full 10 weeks. Run in series, the project would take over 10 years to complete, but the cluster is ideal because one can assign one processor to each system, and by running in parallel the calculations are completed in under three months. M. J. Grannell, T. S. Griggs, M. Knor and A. R. W. Thrower, A census of the orientable biembeddings of Steiner triple systems of order 15, Australasian Journal of Combinatorics 42, 25 details these results. Further parallel use of the cluster is planned.
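The serial and parallel estimates above can be checked with a short calculation. This is only an illustrative sketch: the figures (80 systems, 3 to 10 weeks each) come from the text, while the function names and the assumption of one processor per system with no scheduling overhead are ours.

```python
# Figures quoted in the text: 80 systems, each taking 3 to 10 weeks.
N_SYSTEMS = 80
MIN_WEEKS, MAX_WEEKS = 3, 10
WEEKS_PER_YEAR = 52


def serial_weeks(weeks_per_system, n_systems=N_SYSTEMS):
    """Total time if the systems are processed one after another."""
    return weeks_per_system * n_systems


def parallel_weeks(run_times):
    """Total time with one processor per system: the longest single run."""
    return max(run_times)


# Bounds on the serial run time, in years.
best_serial_years = serial_weeks(MIN_WEEKS) / WEEKS_PER_YEAR
worst_serial_years = serial_weeks(MAX_WEEKS) / WEEKS_PER_YEAR

# Hypothetical workload in which every system needs the full 10 weeks.
parallel_total_weeks = parallel_weeks([MAX_WEEKS] * N_SYSTEMS)

print(f"serial: {best_serial_years:.1f} to {worst_serial_years:.1f} years")
print(f"parallel: {parallel_total_weeks} weeks")
```

Because most systems need the full 10 weeks, the serial total sits near the upper bound of 800 weeks, whereas the parallel total is set by the single longest run.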
Michael Wilkinson (MCT)
Michael Wilkinson (MCT) uses the cluster in connection with ‘Strings in turbulent flow’, investigating the statistics of the configuration of a string advected by a turbulent fluid flow and assessing whether it forms a compact, folded conformation like a random walk, or an extended conformation pulled out by large-scale eddies in the flow.
Other numerical studies will address the smallest objects produced by gravitational collapse. It is commonly claimed that a cloud of interstellar gas which undergoes gravitational collapse will fragment into smaller pieces, until the fragments are dense enough that they are opaque to their own black body radiation. There are persuasive reasons to doubt this criterion, which suggest that fragmentation can continue until the pieces are significantly smaller than currently expected. Numerical work will be done to test theoretical estimates. This will use a variant of an SPH method which includes radiative transfer.
Andrey Umerski (MCT)
Andrey Umerski (MCT) uses the IMPACT cluster to model, at the atomic level, the spin-dependent electronic properties of magnetic multilayers and other nanostructured materials. These materials, and the quantum mechanical effects they exhibit, are currently of great theoretical, experimental and technological interest. This emergent research area is known as spintronics, and is the research area for which the physics Nobel prize was awarded in 2007.
Results obtained from the old cluster were crucial in a successful bid for a 3 year EPSRC PDRA fellowship. The theme of this grant is to investigate the spin dependent transport of electrons across a semiconductor/ferromagnet interface, and the new cluster will be of central importance in this project: allowing us to perform realistic simulations by including interfacial roughness and defects. Such computations are highly CPU intensive, but by using the MPI (Message Passing Interface), we distribute the workload over all the processors, making the calculations tractable.
Future applications for research grants will almost certainly be based on work performed on the new cluster.
Anne de Roeck: Computing (MCT)
Natural Language Processing: Modern Natural Language Processing and Information Retrieval techniques typically rely on statistical language models extracted from large (e.g. 200M word) corpora or text collections. More recent work has introduced increasingly context-sensitive approaches, i.e. approaches where the probability of a word occurring is not modelled as constant throughout a text, and hence where it does not suffice to rely on simple word occurrence counts. Instead, the techniques include, for example, processing of large numbers of contexts and n-grams (of which there need to be many lest the data be too sparse), advanced statistical techniques (e.g. Bayesian statistics supported by Markov Chain Monte Carlo methods), advanced machine learning techniques for the induction of probabilistic grammars, and several compute-intensive methods for dimensionality reduction in shallow text representation models (such as LSA). In other words, as well as being data-intensive, all these methods are also compute-intensive. In addition, many standard packages in NLP and IR assume an underlying Unix (or Linux) architecture.
All the examples above are drawn from NLP and IR projects over the last 6 years or so. These include 4 PhDs (Sarkar on Term Burstiness; Chantree on Ambiguity Detection; Haley on using LSA in automatic Marking; Nanas on adaptive Information Filtering) and 2 EPSRC-funded projects (Autoadapt; Context Sensitive Information Retrieval). They will also be required for a recently funded EPSRC project based on Chantree's work (Matrex) and a Jisc project, all of which will be using this range of approaches and techniques. We also envisage using the array for grammar induction, to allow the OU to scale up its current capacity for marking short answers automatically (we are in the process of bidding for strategic funding for this, in collaboration with the COMSTL CETL, the VLE and the Science Faculty).
The Linux array is a vital piece of infrastructure for this line of research. The processing volume we require is increasing, as several of the projects we have secured require us to develop tools that need not only access to large datasets (on a scale we have not engaged with before) but also the ability to deliver reasonable results at run time.
Paul Upton: Applied Maths (MCT)
Modeling surfaces of crystalline materials such as metals and semiconductors: The cluster is used for modeling surfaces of crystalline materials such as metals and semiconductors. Particular attention is paid to the kinetics of step-edge roughening of vicinal surfaces. To do this, Monte Carlo simulations of simplified lattice models are run on the cluster and results are compared against purely theoretical predictions coming from stochastic models.
It is necessary to run repeated simulations with different random number seeds to get an averaged time series. Also, simulations are carried out with lattices of different sizes (i.e., numbers of lattice sites) in order to get a "scaling collapse" of data so as to check scaling predictions coming from the theory. The Linux cluster was absolutely essential for this work since different time runs can be placed on different cluster nodes and many repeated simulations are required to get good statistics.
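The seed-averaging procedure described above can be sketched as follows. This is a minimal illustration, not the lattice model used in the research: the toy step edge (each attempted move shifts a random site by one lattice unit), the function names and the parameter values are all assumptions made for the example. Each seed is run serially here; on the cluster each seed or lattice size could be placed on a different node.

```python
import random
from statistics import mean


def step_edge_width(length, n_sweeps, seed):
    """Squared width of a toy step edge after each sweep (hypothetical model)."""
    rng = random.Random(seed)  # independent, reproducible stream per seed
    heights = [0] * length
    series = []
    for _ in range(n_sweeps):
        # One sweep = `length` attempted moves at randomly chosen sites.
        for _ in range(length):
            site = rng.randrange(length)
            heights[site] += rng.choice((-1, 1))
        avg = sum(heights) / length
        series.append(sum((h - avg) ** 2 for h in heights) / length)
    return series


def seed_averaged_width(length, n_sweeps, seeds):
    """Average the width time series over independent random number seeds."""
    runs = [step_edge_width(length, n_sweeps, s) for s in seeds]
    return [mean(values) for values in zip(*runs)]


averaged = seed_averaged_width(length=32, n_sweeps=50, seeds=range(20))
```

Repeating the call for several values of `length` would give the family of curves needed to attempt a scaling collapse.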
Avik Sarkar: Centre for Computational Linguistics - former research student (MCT)
Statistical Natural Language Processing and Computational Linguistics: My research is in the field of statistical Natural Language Processing and Computational Linguistics. These require processing large amounts of textual data.
First, I need to store these large textual data collections, which requires large hard-disk resources; the RAID disks on the Linux cluster provide these. Processed data is stored in many small files, as fast access to these files is required.
Processing large amounts of data requires large memory and CPU resources, which are only available on powerful clusters like KRONOS. To understand data characteristics, Bayesian models are fitted to this data; these are computationally intensive and require a large number of iterations for the parameters of the model to converge. The Linux cluster is the only answer to such computationally intensive methods.
Collection of real-world textual data is often required for research purposes. Crawlers have to be run to collect data from the web. Crawlers are computationally intensive and also require large storage space to store the collected files. Linux clusters with large storage are useful for such purposes.
Andrea Capiluppi: Computing (MCT)
Evolution of software systems: The research I'm conducting through the cluster involves several aspects in the study of evolution of software systems: both product and process evolutionary aspects are considered, and open source, Agile and commercial systems are currently under study.
Each of the aspects is evaluated for every release of around 15 open source systems, 1 Agile system and 3 commercial systems; each aspect is studied through a different algorithm implemented in an automated script. Among the product observations, we are evaluating the growth of systems at various levels of granularity (system level, folder level, file level and class/method/function level). The architecture of the system is evaluated through the study of folder structures.
Complexity, as a limit to growth of systems, is evaluated at each level and through different approaches, such as the McCabe cyclomatic number for functions/methods/classes, the branching factor for folder structures and the persistence of highly complex elements along the evolution. Among the process observations, we are evaluating the amount of work provided to both evolve and maintain software systems; an unbounded number of developers are specifically tracked in open source systems, and their productivity is considered in long-lived, large-size applications. Comparisons with other domains (Agile and commercial) are sought.
Simulations through an agent-based system are currently being studied and run to integrate the knowledge of complexity, growth and productivity.
CVS facilities are used on the kronos server; communication with the helios SQL server is used to store relevant results.
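As an illustration of the McCabe cyclomatic number mentioned above, the sketch below approximates it as one plus the number of decision points in a piece of code. It is not the script used in this research: it assumes Python source analysed with the standard `ast` module, and the function name, the choice of which nodes count as decision points, and the sample code are all ours.

```python
import ast

# Node types treated as decision points in this approximation.
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def cyclomatic_number(source):
    """Approximate McCabe number: 1 + number of decision points."""
    tree = ast.parse(source)
    count = 1
    for node in ast.walk(tree):
        if isinstance(node, _BRANCH_NODES):
            count += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds two extra paths, one per extra operand.
            count += len(node.values) - 1
    return count


SAMPLE = '''
def classify(n):
    if n < 0:
        return "negative"
    for d in range(2, n):
        if n % d == 0 and d > 1:
            return "composite"
    return "other"
'''

print(cyclomatic_number(SAMPLE))
```

Applying such a measure to every function in every release gives a complexity time series that can be compared with the growth measurements.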
Francis Chantree: Computing - former research student (MCT)
Building a Corpus: During the course of my research I have used the Linux cluster. I have built a corpus using this facility, and have also written many programs in the Perl programming language. These programs, some of which are highly complex, perform tagging, chunking and text manipulation functions. I have also used this implementation of Linux to run machine learning programs. I would not have been able to carry out this research so successfully on a Windows platform.
Heather Whitaker: Statistics (MCT)
Evaluating and extending the self-controlled case series method (SCCS): Our research is on evaluating and extending the self-controlled case series method (SCCS). The SCCS method is used in epidemiology to study associations between a transient exposure and an acute adverse health event, using data only on individuals who have experienced the health event of interest. The method has proved most useful for evaluating vaccine safety. The cluster is used for extensive simulations to check our methods and to carry out epidemiological studies with large data sets.
David Broadhurst (Science, DPS)
David Broadhurst (Science, P&A) works on the relation between number theory and the Feynman diagrams of quantum field theory. He has coworkers in the USA, Canada, Australia, Germany and the Netherlands whose combined computing power considerably exceeds that available at the Open University. Nonetheless, much of the work done remotely results from discoveries originally made at the OU. Notable examples include the enumeration of irreducible multiple zeta values, new solutions to the Prouhet-Tarry-Escott problem, improvements to the methods of Lenstra, Coppersmith and Howgrave-Graham for divisors in residue classes, and the discovery of singular values of elliptic integrals in Feynman diagrams and lattice Green functions. Broadhurst's work contributed to UoA 20 (Mathematics) in the RAE. He also works informally with Geoff Bradshaw to monitor usage of the IMPACT cluster and to test the system after downtime. He is particularly pleased to see that it is now being exploited at close to full capacity by a wide spectrum of appropriate users, behaving collegially, and will continue to encourage such use of the now urgently needed expansion.
Robert Hasson: Applied Mathematics (MCT)
Robert Hasson (Applied Mathematics, MCT) investigates the properties of inverse problems and in particular a brain imaging technique called Magnetoencephalography (MEG). The IMPACT cluster is used to segment MRI slices to produce smooth surfaces which accurately follow brain surfaces. In turn, these smooth surfaces are used to create integration meshes so that the Boundary Element Method (BEM) can be used to relate brain sources to MEG measurements. Once a BEM model has been constructed for a subject, the process of analysing MEG measurement data for that subject can begin. The process of constructing a BEM model is very time-consuming (currently about 4 days of computation time per subject if there are no complications). Without the IMPACT cluster, the current project, which compares data across ten subjects, would not be feasible.
Uwe Grimm (MCT)
Uwe Grimm (MCT) is working on aperiodically ordered systems and quasicrystals, and is particularly interested in combinatorial properties. He uses large-scale computer calculations to count configurations, obtaining reliable estimates and establishing bounds on their growth rates. In particular, he has been working on pattern-avoiding sequences, and the IMPACT cluster provides an invaluable resource which enables him to compete with leading groups in the world.