Visual and Scientific Computing in the MapReduce Framework

Vispark: GPU-Accelerated Distributed Visual Computing Using Spark

With the growing need for big-data processing in diverse application domains, MapReduce (e.g., Hadoop) has become one of the standard computing paradigms for large-scale computing on a cluster system. Despite its popularity, the current MapReduce framework suffers from inflexibility and inefficiency inherent to its programming model and system architecture. To address these problems, we propose Vispark, a novel extension of Spark for GPU-accelerated MapReduce processing on array-based scientific computing and image processing tasks. Vispark provides an easy-to-use, Python-like high-level language syntax and a novel data abstraction for MapReduce programming on a GPU cluster system. Vispark introduces a programming abstraction for accessing neighbor data in the mapper function, which greatly simplifies many image processing tasks using MapReduce by reducing memory footprints and bypassing the reduce stage. Vispark provides socket-based halo communication that synchronizes data partitions transparently to the user, which is necessary for many scientific computing problems in distributed systems. Vispark also provides domain-specific functions and language support specifically designed for high-performance computing and image processing applications.
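To make the role of halo data concrete, the following is a minimal single-process sketch in plain NumPy, not Vispark code. It splits an image into row-wise partitions, extends each partition by a one-row halo copied from its neighbors, filters each partition independently, and stitches the results together. The function names, the row-wise partitioning, and the edge-replicating boundary are assumptions made for illustration; Vispark itself exchanges halos over sockets between GPU nodes.

```python
import numpy as np

def four_neighbor_mean(padded):
    """Mean of the up/down/left/right neighbors for the interior of a padded array."""
    return (padded[:-2, 1:-1] + padded[2:, 1:-1] +
            padded[1:-1, :-2] + padded[1:-1, 2:]) / 4.0

def filter_with_halos(image, num_parts):
    """Filter row-wise partitions independently, each extended by a one-row halo."""
    # Assumed boundary policy: replicate edge pixels around the whole image.
    padded = np.pad(image.astype(np.float64), 1, mode="edge")
    height = image.shape[0]
    bounds = np.linspace(0, height, num_parts + 1).astype(int)
    pieces = []
    for start, stop in zip(bounds[:-1], bounds[1:]):
        # Image rows start..stop-1 sit at padded rows start+1..stop, so the
        # slice below adds one halo row above and one below the partition.
        block = padded[start:stop + 2, :]
        pieces.append(four_neighbor_mean(block))
    return np.vstack(pieces)

if __name__ == "__main__":
    img = np.arange(64, dtype=np.float64).reshape(8, 8)
    whole = four_neighbor_mean(np.pad(img, 1, mode="edge"))
    split = filter_with_halos(img, num_parts=4)
    print(np.array_equal(whole, split))
```

Because every partition carries the neighbor rows it needs, the partitioned result matches the result computed on the whole image, and no reduce stage is required to combine overlapping contributions.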

import numpy as np
from PIL import Image
# Assumed import path: Vispark extends Spark, so SparkContext is taken from PySpark.
from pyspark import SparkContext

def meanfilter(data, x, y):
  # Average the four direct neighbors of pixel (x, y).
  u = point_query_2d(data, x, y+1)
  d = point_query_2d(data, x, y-1)
  r = point_query_2d(data, x+1, y)
  l = point_query_2d(data, x-1, y)
  ret = (u+d+r+l)/4.0
  return ((x, y), ret)

if __name__ == "__main__":
  sc = SparkContext(appName="meanfilter_vispark")
  img = np.fromstring(Image.open("lenna.png").tostring())
  imgRDD = sc.parallelize(img, Tag="VISPARK")
  imgRDD = imgRDD.vmap(meanfilter(data, x, y).range(512, 512))
  ret = np.array(sorted(imgRDD.collect()))[:,1].astype(np.uint8)
  Image.fromstring("L", (512,512), ret.tostring()).save("out.png")

Simple Vispark mean image filter example code.
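The mapper in the listing above returns key-value pairs of the form ((x, y), value), which the driver then collects and turns back into an image. The sketch below emulates that flow in plain Python and NumPy without Spark or a GPU. The helper names `mean_filter_pairs` and `assemble` are hypothetical, the image is assumed to be indexed as image[y, x], and border pixels are left at zero because the listing does not state how `point_query_2d` treats out-of-range coordinates.

```python
import random
import numpy as np

def mean_filter_pairs(image):
    """Emit ((x, y), value) pairs like the mapper above, for interior pixels only."""
    height, width = image.shape
    pairs = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            total = (float(image[y + 1, x]) + float(image[y - 1, x]) +
                     float(image[y, x + 1]) + float(image[y, x - 1]))
            pairs.append(((x, y), total / 4.0))
    return pairs

def assemble(pairs, height, width):
    """Place each value at its (x, y) key; the order of the pairs does not matter."""
    out = np.zeros((height, width), dtype=np.float64)
    for (x, y), value in pairs:
        out[y, x] = value
    return out

if __name__ == "__main__":
    image = np.arange(16, dtype=np.float64).reshape(4, 4)
    pairs = mean_filter_pairs(image)
    random.shuffle(pairs)  # collected pairs may arrive in any order
    print(assemble(pairs, 4, 4))
```

Writing each value at the position named by its key is one alternative to sorting the collected pairs, and it keeps the x and y axes explicit when the image is not square.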