- An Introduction to Hyracks Operators in AsterixDB
- Your first Hyracks operator
- A Hyracks operator for plane-sweep join
Monday, November 30, 2015
Reducing the memory footprint of the spatial join operator in Hyracks
This is the fourth blog post in a series that describes how to build an efficient spatial join Hyracks operator in AsterixDB. You can refer to the previous posts below:
Labels:
AsterixDB,
Hyracks,
Memory,
Plane sweep,
Spatial Join
Location:
Irvine, CA, USA
Tuesday, November 24, 2015
A Hyracks operator for plane-sweep join
This is the third blog post in a series of blog posts about creating an efficient Hyracks operator for spatial join. In the previous two posts, we gave an introduction to Hyracks operators and briefly described how to write a simple Hyracks operator. In this blog post, we describe how to make the previously created operator more efficient by using a plane-sweep spatial join algorithm instead of a naive nested loop algorithm.
Scope of this blog post
In this blog post, we will focus on improving the operator we created in the last blog post by replacing the nested-loop join subroutine with the more efficient plane-sweep join algorithm. In addition, we will do some minor code refactor to keep the code organized. For simplicity, we assume that the two inputs are already sorted on the x coordinate which can be done using one of the sort operators that ship with AsterixDB, e.g., ExternalSortOperatorDescriptor.
Labels:
AsterixDB,
Hyracks,
Plane sweep,
Spatial Join
Location:
California Ave, Irvine, CA, USA
Monday, November 23, 2015
Your first Hyracks operator
In a previous blog post, I introduced Hyracks operators and briefly described how they work. In this blog post, I'll show how to create and use a very simple Hyracks operator that performs spatial join using a naive nested loop algorithm. Although there is already an existing nested loop join operator in AsterixDB, I provide a simpler, probably less efficient, implementation for the sake of demonstration. This blog post will focus on the steps needed to create a Hyracks operator while future blog posts will show how to improve this operator to be more efficient.
Friday, November 20, 2015
An Introduction to Hyracks Operators in AsterixDB
I had the opportunity to collaborate with the AsterixDB team led by Mike Carey and Chen Li in University of California, Irvine. The main objective of this collaboration is to introduce efficient ad-hoc spatial join query processing in AsterixDB. In this blog post, I will try to summarize my work for future developers and users of AsterixDB. I believe that this post could be very helpful as it is coming from an outsider. So, I will try to make it simple and will not assume and previous knowledge of AsterixDB. In fact, this is my first real interaction with AsterixDB system internals.
Sunday, November 1, 2015
Speeding up point-in-polygon query using Pigeon and SpatialHadoop
Point-in-polygon Query
A widely used function in GIS and spatial applications is the point-in-polygon query which finds whether a point is inside a polygon or not. Typically, this function is used as a spatial-join predicate to relate a large set of points to a smaller set of polygons, for example, associate a set of geo-tagged tweets to states in the whole world. If the polygons are relatively simple, e.g., a few tens of points per polygon, then we can use any simple spatial join algorithm such as plane-sweep or partition-based spatial merge join (PBSM). However, if the polygons are rather complex, e.g., tens of thousands of points per polygon, then a single point-in-polygon query becomes too expensive and regular solutions do not scale. In this blog post, I'll describe a simple technique that we applied using Pigeon and SpatialHadoop to speed up the point-in-polygon query in the case of very complex polygons.
Subscribe to:
Posts (Atom)