Thursday, November 8, 2018

In the big data forest, we grow groves not trees

In this blog post, I describe a new indexing mechanism for big data. While the approach is general and can adapt many existing indexes to big data, this post particularly focuses on spatial index trees such as the R-tree as they tend to be more challenging. The key idea is that regular index structures are designed to write the index to disk pages in a regular file system and the indexes are expected to accommodate new records. In big data, indexes are written to the distributed file system which does not allow modifying the files.