Part 1 of this blog provided an overview of the drivers accelerating healthcare's transformation to personalized medicine. This part provides more detail on how IBM is helping to facilitate that transformation.
IBM’s role in accelerating discovery
Vendors like IBM have key roles in facilitating this transformation, from investing in technologies such as IBM Watson, cloud, Spark, and software defined infrastructure to investing in partnerships with leaders in life sciences and healthcare. One such collaboration is between IBM Systems and the Data Science Institute (DSI) at Imperial College London.[4] The partnership is focused on enhancing the performance of tranSMART, an open-source data warehouse and knowledge management system, on the IBM Software Defined Infrastructure platform. tranSMART is used for integrating, accessing, analyzing, and sharing clinical and genomic data on very large patient populations. IBM Systems and DSI are working with third parties and across IBM organizations to ensure tranSMART supports large-scale text analytics, natural language processing, machine learning, and data federation. The partnership will also link tranSMART to key IBM technologies, including IBM Watson, to accelerate mining of clinical information and curated scientific literature at very large scale.
Technology challenges impeding innovations
With large-scale projects, many organizations’ IT departments are finding that their existing analytics and infrastructures lack the intelligence and capacity to handle the data volume, data variety, and workloads. Traditional organizations and their infrastructure architectures are often siloed, which limits scalability and interaction and slows both time to completion and collaboration. These technology environments are not equipped to handle the different workload patterns, data types, and file sizes inherent in genomics research. “Computational workloads must be performed across thousands of very large files containing heterogeneous data, where just a single file containing genomic sequence data alone can be on the order of hundreds of megabytes. Moreover, biological and clinical information critical to the study must be mined from natural language, medical images, and other non-traditional unstructured data types at very large scale.”[2]
Big data and software defined infrastructure
Abundant in data, yet lacking the IT resources to analyze or store it, research and clinical organizations are augmenting or replacing existing IT systems and analytics – even those just a few years old – with more agile infrastructures and analytical frameworks. Organizations are looking to cloud, software defined infrastructures, and open source technologies like Mesos for greater infrastructure agility and scalability. For big data and analytics, many organizations are now choosing Apache Spark instead of Hadoop MapReduce, gaining up to 10 times better performance, and investing in cognitive offerings like IBM Watson.
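To make the Spark point concrete, here is a minimal PySpark sketch that counts variant records per chromosome in a VCF-style text file and caches the dataset in memory, the kind of in-memory reuse that gives Spark its edge over MapReduce for iterative analyses. The file path and column layout are illustrative assumptions, not details from any IBM deployment.

```python
# Minimal PySpark sketch: count variant records per chromosome from a
# VCF-style text file. The path and column layout are illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("variant-counts").getOrCreate()

# Each non-header VCF line is tab-separated; the first field is the chromosome.
lines = spark.sparkContext.textFile("hdfs:///genomics/cohort.vcf")
variants = lines.filter(lambda line: not line.startswith("#"))

# Cache in memory so repeated queries avoid re-reading from disk.
variants.cache()

counts = (variants
          .map(lambda line: (line.split("\t")[0], 1))
          .reduceByKey(lambda a, b: a + b))

for chrom, n in counts.collect():
    print(chrom, n)

spark.stop()
```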
Data challenges
One publication estimates that “for every 3 billion bases of human genome sequence, 30-fold more data … must be collected … This means that as much as 2–40 exabytes of storage capacity will be needed by 2025 just for the human genomes.”[1] It is no wonder that organizations are investing in more cost-effective storage and redesigning their architectures to maximize data flow.
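To put those figures in perspective, here is a rough back-of-envelope calculation in Python. The assumptions (1 byte per base, 30-fold coverage, no compression) are illustrative simplifications, not the cited paper’s model.

```python
# Rough back-of-envelope for the storage figures quoted above.
# Assumptions (illustrative): 1 byte per base, 30-fold coverage, no compression.

GENOME_BASES = 3e9          # bases in one human genome
COVERAGE = 30               # 30-fold more data collected per genome
BYTES_PER_BASE = 1          # uncompressed, sequence only

bytes_per_genome = GENOME_BASES * COVERAGE * BYTES_PER_BASE
print(f"Raw data per genome: ~{bytes_per_genome / 1e9:.0f} GB")

# How many such genomes would fill the quoted storage projections?
for exabytes in (2, 40):
    genomes = exabytes * 1e18 / bytes_per_genome
    print(f"{exabytes} EB holds roughly {genomes / 1e6:.0f} million such genomes")
```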
Use cases
- For one IBM customer, a leading research hospital at the forefront of genomics research and medicine, the IT team supporting multiple departments – including oncology and psychology – achieved a 10-times speedup for in-house disease mapping applications, along with greater reliability, performance, and scalability. By studying the job mix and usage patterns, the team improved throughput and eliminated I/O bottlenecks in the redesign of their HPC system using IBM Software Defined Infrastructure solutions. To minimize I/O bottlenecks, the team architected a new storage system with flash, tiering, and inode-based data placement: for the very large number of tiny (under 1 KB) files, the team stored the file data directly within the inodes, saving space.[2] (A sketch of the kind of file-size survey behind such a redesign follows this list.) Other providers leveraging IBM Storage and IBM software defined offerings include one of the largest healthcare groups in the world. This provider is readying itself for an onslaught of data, including images from remote devices such as monitors. To manage the coming flood, this East Coast-based health system is building a private cloud to help it store, sort, compress, and transmit the data.[3]
- At a leading U.K. research institute, NGS systems were producing 120 terabytes of raw data and generating hundreds of thousands of related analytical processes per week across dozens of heterogeneous HPC systems. Using IBM Software Defined Infrastructure software, the institute has maximized, and continually monitors, utilization of its IT resources. These capabilities have helped the organization identify and remove inefficient analytical workflows, improve resource usage, and defer significant capital expenditure.
- In addition to the tranSMART partnership, IBM continues to invest in relationships with key software vendors and business partners, including Lab 7, dataBiology, Dassault Systemes Accelrys, and Edico Genome. Edico Genome is partnering with IBM to ensure market-leading speed and accuracy for its DRAGEN data processing technology, running NGS data analyses on IBM Systems solutions including Power and software defined offerings. DRAGEN has cut the data analysis time for a whole human genome from 22.5 hours to under 30 minutes. Such an advance has a profound impact in clinical settings, where treatment outcomes and a patient’s prognosis depend on rapid clinical diagnosis.
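As promised in the first use case above, here is a minimal sketch of the kind of file-size survey that motivates storing tiny files within inodes; the directory path and the 1 KB threshold are illustrative assumptions, not the hospital’s actual tooling.

```python
# Minimal sketch: walk a data directory and report how many files are small
# enough (here, under 1 KB) to be candidates for storing directly in the inode.
import os
import sys

SMALL_FILE_BYTES = 1024  # threshold for "tiny" files (illustrative)

def survey(root):
    small, total, total_bytes = 0, 0, 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # skip files that vanish or are unreadable
            total += 1
            total_bytes += size
            if size < SMALL_FILE_BYTES:
                small += 1
    return small, total, total_bytes

if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    small, total, total_bytes = survey(root)
    print(f"{total} files, {total_bytes / 1e12:.2f} TB total")
    if total:
        print(f"{small} files ({100 * small / total:.1f}%) are under 1 KB")
```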
Conclusion
Personalized treatments are the end result of comprehensive analysis of exabytes of data, starting with raw sequencer output and ending with the addition of prior medical histories and environmental factors. Along the way, scientists have accessed, processed, and analyzed genomic sequences, longitudinal patient medical records, biomedical images, and other complex, information-rich data sources, including peer-reviewed scientific literature. The timely completion of these tasks depends on the availability of powerful analytics underpinned by high-performance, agile compute and storage architectures that must have: the flexibility to address the application needs of individual researchers; the speed and scale to process rapidly expanding stores of multimodal data within competitive time windows; and the smarts to extract facts from even the most complex unstructured information sources. Technologies such as Spark, IBM Watson, IBM Power, and IBM Software Defined Infrastructure offerings are only part of the solution, while initiatives such as ADDoPT, the IBM and DSI collaboration on tranSMART, and The 100,000 Genomes Project fill in the gaps to hyper-accelerate innovations in personalized medicine.
Come visit IBM at HIMSS16, Booth #5932 to learn more about IBM’s role in accelerating personalized medicine.
Join us in the conversation:
IBM Healthcare Twitter: @IBMHealthcare or #IBMHealthcare
IBM Healthcare YouTube: www.youtube.com/ibmhealthcare
IBM Healthcare Slideshare: http://www.slideshare.net/IBMSmarterHealthcare
We look forward to seeing you in Las Vegas on February 29 – March 4.
[1] http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002195#pbio.1002195.s006
[2] Kovatch, "Big Omics Data Experience," SC '15, November 15-20, 2015, Austin, TX, USA. ACM 978-1-4503-3723-6/15/11. http://ibm.biz/BdH3W3
[3] http://www.post-gazette.com/business/tech-news/2014/10/05/UPMC-prepares-for-onslaught-of-digital-images-and-remote-medical-data/stories/201410050008
[4] http://openpowerfoundation.org/blogs/imperial-college-london-and-ibm-join-forces-to-accelerate-personalized-medicine-research-within-the-openpower-ecosystem/
This blog is authored by Jeff Hong (Global Industry Marketing Lead, Software Defined Infrastructure, IBM Systems) and Jane Yu (IBM Worldwide Industry Architect, Healthcare & Life Sciences)