Sliced Coordinate List Implementation Analysis on Sparse Matrix-Vector Multiplication Using Compute Unified Device Architecture

fitriyani fitriyani

Abstract


Matrices are one of the most used data representation form from real-world problems. Lot of matrix was formed very big but sparse, hence information inside the matrix is relatively small compared to its size. This caused into heavy computational resources needed to process those matrices within short time. One of the solutions to do an efficient process to the sparse matrix is to form it into a specialized form of sparse matrix, such as Sliced Coordinate List (SCOO). SCOO format for sparse matrix has been developed and combined within an implementation using Compute Unified Device Architecture (CUDA). In this research, performance of SCOO implementation using GPU – CUDA will be compared to the other sparse matrix format named Coordinate List (COO) based on its memory usage and execution time. Results obtained from this research show that although SCOO implementation for sparse matrix use memory 1.000529 larger than COO format, its serial performance is 3.18 faster than serial COO, besides that, if SCOO implementation is conducted parallel using GPU – CUDA then its performance can be achieved around 123.8 faster than parallel COO or 77 times faster than parallel COO using one of the available library for CUDA, named CUSP.


Full Text:

PDF

References


Dang, H.V and Schmidt, B. (2013). CUDA-enabled Sparse Matrix–Vector Multiplication on GPUs using atomic operations. Parallel Computing, 39 (11), 737-750.

http://dx.doi.org/10.1016/j.parco.2013.09.005">Crossref

Richard.B, Gabriel, C, & John , S.( 2014), Linear Algebra: Algorithms. Applications. and Techniques: Third Edition. USA, Academic Press.

Nicholas, Wilt. (2013). The CUDA Handbook: A Comprehensive Guide to GPU Programming. USA, Addison-Wesley Professional.

Halliday & Resnick ( 2011). Fundamental of Physics 9th Edition Extended. USA, John Wiley & Sons.

N. Bell & M. Garland (2009). Implementing Sparse Matrix–Vector Multiplication on Throughput-Oriented Processors, SC, ACM, 11.

N. Bell & M. Garland (2012). Cusp: Generic Parallel Algorithms for Sparse Matrix and Graph Computations. , version 0.3.0.

Davis, T. A. & Y. Hu. "Tim Davis: University of Florida Sparse Matrix Collectio : sparse matrices from a wide range of applications". http://www.cise.ufl.edu/research/sparse/matrices/

Davis. Tim. "Research Sparse Matrix Algorithms and Software". http://faculty.cse.tamu.edu/davis/research.html.

Rauber,T. & Runger, G. (2010) Parallel Programming For Multicore and Cluster Systems, Springer.

NVIDIA Corporation, CUDA C Programming Guide, (2011) ..

NVIDIA. "What is GPU Computing? | High Performance Computing | NVIDIA | NVIDIA". http://www.nvidia.com/object/what-is-gpu-computing.html

GilbJ.R., Reinhardt, S. & Shah V. B. (2006). Highperformance Graph Algorithms from Parallel Sparse Matrices,. Proc. of the Int'l Workshop on Applied Parallel Computing.

Farzaneh,A., Kheiri, H. & Shahmersi, M.A. (2009). An Efficient Storage Format For Large Sparse Matrices, Commun. Fac.Sci. Univ.Ank. Series A1 Volume 58 , Number 2 , 1-10

Hong,S. & Kim, S. (2009) . Memory-level and Thread-level Parallelism Aware GPU Architecture Performance Analytical Model, ISCA'09 Proceeding, ACM.

Zlatev & Zahari (1991). Computational Methods for General Sparse Matrices. Netherlands: Springer.




DOI: http://dx.doi.org/10.21108/IJOICT.2016.21.71

Refbacks

  • There are currently no refbacks.


Copyright (c) 2016 fitriyani fitriyani

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.