Page Header Logo
TEI of Athens eJournals

Achieving optimal average data node storage utilization in k-dimensional point data indexes

Evangelos Outsios, Georgios Evangelidis

Abstract


Indexing of k-dimensional point data is becoming again a hot research topic because of the need to efficiently index and retrieve high dimensional vectors (points) in data mining applications. The most common query on such vectors is kNN searching, which is a variation of range searching. Most multidimensional indexes for point data follow the paradigm of the ubiquitous B+tree and store data entries at the leaf level of the index (data nodes). Since this level naturally occupies the majority of nodes in a multidimensional index tree, it is crucial that an index structure achieves the best possible average storage utilization regardless of data distribution and order of data insertion. An additional conflicting goal is the minimization of the index term that is posted at the levels above when data nodes are split. In this paper we revisit data node splitting techniques for point access methods like the KDB-tree, hB-tree, and, in general, any index that stores point data at its leaf level nodes and splits them so that no overlapping subspaces are created at the leaf level. We experiment with various splitting techniques that produce the minimum index term for posting but differ in the shape of the resulting nodes and the average storage utilization. We also test our splitting techniques using uniform and skewed data distributions. The comparison is on the average data node storage utilization and the efficiency of range query searches.

Keywords


Indexing of k-dimensional point data, Point Access Methods

References


Berchtold, S., Böhm, C., and Kriegel, H.P. (1998). The Pyramid-Technique: Towards Breaking the Curse of Dimensionality. In Proc SIGMOD, pp. 142- 153.

Comer, D. (1979). The Ubiquitous B-Tree. ACM Comput. Surv. (CSUR), 11(2), pp. 121-137.

Evangelidis, G., Lomet, D.B., and Salzberg, B. (1997). The hB-Tree: A Multi-Attribute Index Supporting Concurrency, Recovery and Node Consolidation. VLDB J., 6(1), pp. 1-25.

Lomet, D.B. and Salzberg, B. (1990). The hB-Tree: A Multiattribute Indexing Method with Good Guaranteed Performance. ACM Trans. Database Syst. (TODS), 15(4), pp. 625-658.

Robinson, J.T. (1981). The K-D-B-Tree: A Search Structure For Large Multidimensional Dynamic Indexes. In Proc SIGMOD, pp. 10-18.


Full Text: PDF

DOI: 10.18780/jiim.v1i1.3050

Refbacks

  • There are currently no refbacks.

The application for presenting electronic journals TEI developed within subproject 2 "electronic publishing service" the Act "Development Services Digital Library of TEI" and financed by the operational program "Digital Convergence", NSRF 2007-2013.