An Improved Sequential Pattern Mining Algorithm Based on Data Mining
DOI: 10.25236/icess.2019.236
Corresponding Author
Lili Wang
Abstract
This paper improves the PrefixSpan algorithm and proposes the ISPM (Improved Sequential Pattern Mining) algorithm. ISPM greatly reduces the number of projection databases that must be constructed, thereby improving the efficiency of sequential pattern mining. In addition, the algorithm introduces the concept of a sequential pattern value and reorders the mined sequential patterns by this value, so that the most important patterns can be identified. We verify the efficiency of ISPM experimentally under different support thresholds, different types of datasets, and different dataset sizes. In practical applications, however, the efficiency of ISPM hits a bottleneck on very large datasets. We therefore propose a Map-Reduce version of ISPM, which uses distributed processing to split a large task into multiple smaller tasks and then mines sequential patterns in parallel on each node. Two experiments verify the efficiency of this algorithm: the first measures its speedup on Hadoop relative to a single machine, and the second tests its efficiency on datasets of different sizes. The results show that the algorithm improves mining efficiency on large datasets.
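For readers unfamiliar with the baseline, the following is a minimal sketch of the standard PrefixSpan idea that ISPM builds on, not the ISPM algorithm itself. It assumes a simplified setting in which each sequence is a list of single items rather than itemsets. The names `prefixspan`, `mine`, and `min_support` are illustrative and do not come from the paper. Each recursive call builds one projection database, which is the cost the abstract says ISPM reduces.

```python
from collections import defaultdict


def prefixspan(sequences, min_support):
    """Return {pattern_tuple: support} for all frequent sequential patterns.

    Assumption: every sequence is a list of single items (no itemsets).
    """
    results = {}

    def mine(prefix, projected):
        # `projected` is a pseudo-projection database: a list of
        # (sequence_index, start_offset) pairs marking the suffix of each
        # sequence that follows the first occurrence of `prefix`.
        support = defaultdict(int)
        for seq_idx, start in projected:
            # Count each item at most once per sequence.
            for item in set(sequences[seq_idx][start:]):
                support[item] += 1

        for item, count in sorted(support.items()):
            if count < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = count

            # Build the projection database for the extended prefix.
            new_projected = []
            for seq_idx, start in projected:
                try:
                    pos = sequences[seq_idx].index(item, start)
                except ValueError:
                    continue  # item does not occur in this suffix
                new_projected.append((seq_idx, pos + 1))
            mine(pattern, new_projected)

    mine((), [(i, 0) for i in range(len(sequences))])
    return results
```

Calling `prefixspan(db, 2)` on a small list of item sequences returns a dictionary that maps each frequent pattern (as a tuple) to the number of sequences containing it.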
Keywords
Sequential pattern mining, Projection database, PrefixSpan, Large datasets, Map-Reduce