Vispark: GPU-Accelerated Distributed Visual Computing Using Spark
With the growing need for big-data processing in diverse application domains, MapReduce (e.g., Hadoop) has become one of the standard computing paradigms for large-scale computing on a cluster system. Despite its popularity, the current MapReduce framework suffers from inflexibility and inefficiency inherent to its programming model and system architecture. In order to address these problems, we propose Vispark, a novel extension of Spark for GPU-accelerated MapReduce processing on array-based scientific computing and image processing tasks. Vispark provides an easy-to-use, Python-like high-level language syntax and a novel data abstraction for MapReduce programming on a GPU cluster system. Vispark introduces a programming abstraction for accessing neighbor data in the mapper function, which greatly simplifies many image processing tasks using MapReduce by reducing memory footprints and bypassing the reduce stage. Vispark provides socket-based halo communication that synchronizes data partitions transparently to the user, which is necessary for many scientific computing problems in distributed systems. Vispark also provides domain-specific functions and language support specifically designed for high-performance computing and image processing applications.
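The halo communication mentioned above keeps the boundary regions of neighboring data partitions consistent, so a stencil mapper can read just outside its own partition. The following is a minimal sketch of the idea, assuming a 1-D row decomposition and zero padding at the domain edges; the function name `exchange_halos` is hypothetical, and the actual Vispark implementation exchanges these regions over sockets between workers rather than in local memory.

```python
import numpy as np

def exchange_halos(partitions, halo=1):
    """Pad each row-decomposed partition with `halo` boundary rows
    copied from its neighbors; edge partitions are zero-padded.

    This mimics, in local memory, the boundary synchronization that
    a distributed halo exchange performs across workers."""
    padded = []
    for i, part in enumerate(partitions):
        # Rows coming from the partition above (or zeros at the top edge).
        top = (partitions[i - 1][-halo:] if i > 0
               else np.zeros((halo, part.shape[1]), part.dtype))
        # Rows coming from the partition below (or zeros at the bottom edge).
        bot = (partitions[i + 1][:halo] if i < len(partitions) - 1
               else np.zeros((halo, part.shape[1]), part.dtype))
        padded.append(np.vstack([top, part, bot]))
    return padded
```

After the exchange, each worker can apply a stencil to its interior rows without any further communication, which is the property Vispark exploits to bypass the reduce stage for neighborhood operations.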
# Simple Vispark mean image filter example code.
# point_query_2d and the vmap/range syntax are Vispark language constructs.
from pyspark import SparkContext
import numpy as np
from PIL import Image

def meanfilter(data, x, y):
    u = point_query_2d(data, x, y + 1)
    d = point_query_2d(data, x, y - 1)
    r = point_query_2d(data, x + 1, y)
    l = point_query_2d(data, x - 1, y)
    ret = (u + d + r + l) / 4.0
    return ((x, y), ret)

if __name__ == "__main__":
    sc = SparkContext(appName="meanfilter_vispark")
    img = np.fromstring(Image.open("lenna.png").tostring())
    imgRDD = sc.parallelize(img, Tag="VISPARK")
    imgRDD = imgRDD.vmap(meanfilter(data, x, y).range(512, 512))
    ret = np.array(sorted(imgRDD.collect()))[:, 1].astype(np.uint8)
    Image.fromstring("L", (512, 512), ret.tostring()).save("out.png")
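The Vispark listing above relies on Vispark-specific constructs (point_query_2d, vmap) and is not runnable as plain Python. For reference, the computation it expresses can be written directly in NumPy as follows; this is a sketch of the equivalent 4-neighbor mean, assuming out-of-range neighbors read as zero (the function name `meanfilter_reference` is ours, not part of Vispark).

```python
import numpy as np

def meanfilter_reference(img):
    """4-neighbor mean filter over a 2-D array; neighbors outside the
    image read as 0, mirroring a zero-padded point_query_2d."""
    p = np.pad(img.astype(np.float64), 1)  # one-pixel zero border
    # Sum the four shifted views (down, up, right, left) and average.
    return (p[2:, 1:-1] + p[:-2, 1:-1] + p[1:-1, 2:] + p[1:-1, :-2]) / 4.0
```

The padded-view formulation avoids per-pixel Python loops, which is the same stencil pattern the Vispark mapper expresses per point via point_query_2d.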