Chinese Crunch Human Genome With Videogame Chips
The world’s largest genome sequencing center once needed four days to analyze data describing a human genome. Now it needs just six hours.
The trick is servers built with graphics chips — the sort of processors that were originally designed to draw images on your personal computer. They’re called graphics processing units, or GPUs — a term coined by chip giant Nvidia. This fall, BGI — a mega lab headquartered in Shenzhen, China — switched to servers that use GPUs built by Nvidia, and this slashed its genome analysis time by more than an order of magnitude.
In recent years, the cost of sequencing genomes — mapping out an organism’s entire genetic code — has dropped about five-fold each year. But according to Gregg TeHennepe — a senior manager and research liaison in the IT department at The Jackson Laboratory in Bar Harbor, Maine — the cost of analyzing that sequencing data has dropped much more slowly. With its GPU breakthrough, BGI is shrinking the gap.
In the world of medicine, this is nothing but good news. It promises to dramatically boost biological exploration, the study of diseases, and efforts to realize the long-touted vision of personalized medicine — the idea of being able to tailor drugs and other treatments based on an individual’s genetic makeup.
GPUs Get Super
GPUs began life in desktop PCs. But nowadays, they’re widely used for “high-performance computing,” driving supercomputers that crunch through huge amounts of data generated by scientists, financial institutions, and government agencies. Much of this data can be broken into small pieces and spread across hundreds or thousands of processors.
Graphics processors are designed to crunch floating-point data. Floating-point processing — in which the decimal point can move — lets computers efficiently represent the very large and very small values typical of scientific data. As a bonus, graphics processors are generally less expensive and less energy-intensive than standard CPUs.
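The data-parallel pattern described above — break a big dataset into pieces, process each piece independently, then recombine — can be sketched in miniature. This toy Python example (an illustration only, not BGI's actual pipeline) splits a DNA sequence into chunks and computes each chunk's GC fraction, a floating-point result, in parallel:

```python
# Illustrative sketch: divide-and-conquer over a DNA sequence.
# The same shape maps onto a GPU, where thousands of lightweight
# cores each handle one piece of the data.
from multiprocessing import Pool

def gc_fraction(chunk: str) -> float:
    """Floating-point result: fraction of G/C bases in one chunk."""
    return sum(base in "GC" for base in chunk) / len(chunk)

def parallel_gc(sequence: str, n_chunks: int = 4) -> float:
    size = -(-len(sequence) // n_chunks)  # ceiling division
    chunks = [sequence[i:i + size] for i in range(0, len(sequence), size)]
    with Pool(len(chunks)) as pool:
        fractions = pool.map(gc_fraction, chunks)  # independent pieces
    # Recombine: length-weighted average of the partial results
    return sum(f * len(c) for f, c in zip(fractions, chunks)) / len(sequence)

if __name__ == "__main__":
    print(parallel_gc("ACGTGGCCAATT" * 1000))  # prints 0.5
```

On a GPU the work is spread across thousands of cores rather than a handful of worker processes, but the structure — independent pieces, recombined at the end — is the same.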
According to Jackson Lab’s TeHennepe, the feat BGI and Nvidia pulled off was porting key genome analysis tools to Nvidia’s GPU architecture, a nontrivial accomplishment that the open source community and others have been working toward. The development is timely. TeHennepe’s Jackson Laboratory is best known as one of the main sources of mice for the world’s biomedical research community, but it’s also a research center that focuses on the genetics of cancer and other diseases. The lab has been conducting high-throughput sequencing for more than a year, and it has been looking into GPU computing to bolster its ability to analyze the data.
TeHennepe calls BGI’s accomplishment “an important step forward in the effort to apply the promise of GPU computing to the challenge of scaling the mountain of high-throughput sequencing data” — assuming the work can be verified and applied elsewhere.
GPU computing holds the promise of delivering orders-of-magnitude performance increases — and reduced power and space requirements — for problems that can be structured to take advantage of its highly parallel architecture. The open question in the high-throughput sequencing community has been the extent to which its analysis challenges can be restructured to fit the GPU model.
Beyond the CPU
To achieve the same genome analysis speeds with traditional CPUs, BGI would need about 15 times as many compute nodes, with a corresponding increase in power and air conditioning, according to bioinformatics consultant Martin Gollery. With GPUs, Gollery says, BGI can get faster results from its existing algorithms or use more sensitive algorithms to get better results — and it can devote its existing computing resources to other tasks.
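Gollery’s 15x figure lends itself to a back-of-the-envelope comparison. The baseline GPU node count and per-node power draw below are hypothetical illustration values, not numbers from BGI — only the 15x ratio comes from the article:

```python
# Back-of-the-envelope sketch using the 15x figure Gollery cites.
CPU_TO_GPU_NODE_RATIO = 15  # from the article

def cpu_equivalent(gpu_nodes: int, watts_per_node: float = 400.0):
    """Nodes and power (kW) a CPU-only cluster would need for equal speed.

    gpu_nodes and watts_per_node are hypothetical illustration values.
    """
    cpu_nodes = gpu_nodes * CPU_TO_GPU_NODE_RATIO
    return cpu_nodes, cpu_nodes * watts_per_node / 1000.0

nodes, kw = cpu_equivalent(10)
print(f"{nodes} CPU nodes, ~{kw:.0f} kW")  # prints "150 CPU nodes, ~60 kW"
```

The power and cooling bill scales with the node count, which is why the ratio matters as much as raw speed.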
According to Chris Dwan — principal investigator and director of professional services at BioTeam, a consulting firm that specializes in technology for biomedical research — organizations that use GPU-enabled genome analysis can also pare back their computing infrastructure. Sequencing machines generate hundreds of gigabytes of data at a time. That data has to remain “hot” on disk drives for as long as the analysis software runs.
“If you can churn through data in a few hours rather than a week you might be able to save quite a bit on high-performance disk space,” Dwan says.
Another consequence of BGI’s GPU initiative is the likelihood that other institutions will be able to use BGI’s GPU-enabled applications. “Most of the genomics folks that I know have been waiting for GPU-enabled applications to appear in the wild, rather than dedicating local developers and building the apps themselves,” says Dwan.