HPC and Big Data Workshop
25 Apr 2012 - 27 Apr 2012

Date: 25-27 April 2012
Venue: Potential 1 & 2 (Fusionopolis level 13)
Instructors: Dr. John Feo, Oreste Villa, and Sinan Al-Saffar from Pacific Northwest National Laboratory

The Challenge: Big Data - Technology advances have made data storage relatively inexpensive and bandwidth abundant, resulting in voluminous datasets from modeling and simulation, high-throughput instruments, and system sensors. Such data stores exist in a diverse range of application domains, including scientific research (e.g., bioinformatics, climate change), national security (e.g., cyber security, ports-of-entry), environment (e.g., carbon management, subsurface science) and energy (e.g., power grid management).

As technology advances, the list grows. The challenge of extracting valuable knowledge from massive datasets is made all the more daunting by multiple types of data, numerous sources, and various scales -- not to mention the ultimate goal of doing so in near real time. To dissect the problem, the science and technology drivers can be grouped into three primary categories:
1. Managing the explosion of data 
2. Extracting knowledge from massive datasets 
3. Reducing data to facilitate human understanding and response. 

Transformational Solution - Aggressive work to solve this big-data challenge through data intensive computing. 

Data Intensive Computing - Data Intensive Computing (DIC) is concerned with capturing, managing, analyzing, and understanding data at volumes and rates that push the frontiers of current technologies. Addressing the demands of ever-growing data volume and complexity requires epochal advances in software, hardware, and algorithm development. Effective solution technologies must also scale to handle the increased data rates while still delivering timely, effective analysis results.

About the course: 

Day 1 (25/4/12)
Leadership Class Systems (Instructor: Oreste Villa) 
1. Notable HPC Systems 
   * Cray XK6 
   * IBM Blue Gene/Q
   * K computer
2. Programming Models 
   * MPI 
   * Global Arrays
3. Program Exercises 

Day 2 (26/4/12) 
Multithreaded Systems (Instructor: John Feo) 
1. Cray XMT/2 system 
2. Programming Models 
   * Data parallelism 
   * Recursion 
   * Dataflow 
3. Program Exercises 

Day 3 (27/4/12) 
Data Intensive Science (Instructor: Sinan Al-Saffar) 
1. Introduction to semantic graphs and ontologies
2. Datasets as graphs: concepts and implementations
3. Graph algorithms for semantic graph querying and mining 
4. IN-SPIRE: visualizing large data sets

No. of Participants: 54