tag:blogger.com,1999:blog-76980716615251768382024-03-21T19:02:51.621-07:00Ahmed EldawyAhmed Eldawyhttp://www.blogger.com/profile/12361984608543038057noreply@blogger.comBlogger21125tag:blogger.com,1999:blog-7698071661525176838.post-25213376434823701562021-01-09T22:32:00.003-08:002021-01-09T22:37:14.095-08:00Standardized generation of big spatial data in Spark
If you build a system or algorithm for spatial data processing, you might need
to generate large-scale spatial data for benchmarking. The generated data needs
to have the following characteristics:
Flexible: You should be able to easily control the characteristics of the data, e.g., size or skewness.
Reproducible: It should be relatively easy to reproduce this dataset to allow others to repeat Ahmed Eldawyhttp://www.blogger.com/profile/12361984608543038057noreply@blogger.com0tag:blogger.com,1999:blog-7698071661525176838.post-65561458408462907912019-07-29T08:00:00.001-07:002020-09-25T16:40:43.571-07:00UCR Star reveals Google Maps poor quality in Beijing, China (Or maybe not!)
We used to hear stories about some mapping applications failing due to poor data quality. However, it makes a big difference when you find a major failure of the most prevalent mapping application in the country with the biggest user base. The story is that while exploring some datasets in UCR Star, I found the following view of buildings in Beijing, China.
Update (7/31/2019)
Turns out, thereAhmed Eldawyhttp://www.blogger.com/profile/12361984608543038057noreply@blogger.com2Riverside, CA, USA33.9806005 -117.3754941999999933.7698425 -117.69821769999999 34.1913585 -117.0527707tag:blogger.com,1999:blog-7698071661525176838.post-87779624823989854322019-07-14T11:16:00.003-07:002020-09-25T16:42:34.622-07:00A Star Is Born at UCR
UCR STAR
In the Big Data Lab at UCR, we are happy to announce the first release of UCR STAR, formally, the UCR Spatio-temporal Active Repository [https://star.cs.ucr.edu/]. STAR is made available as a service to the research community to provide easy access to existing big spatio-temporal datasets through an interactive exploratory interface. Researchers and developers can choose from Ahmed Eldawyhttp://www.blogger.com/profile/12361984608543038057noreply@blogger.com0tag:blogger.com,1999:blog-7698071661525176838.post-19090695090780812652018-11-08T08:38:00.001-08:002021-01-09T22:33:43.435-08:00In the big data forest, we grow groves not trees
In this blog post, I describe a new indexing mechanism for big data. While the approach is general and can adapt many existing indexes to big data, this post particularly focuses on spatial index trees such as the R-tree as they tend to be more challenging. The key idea is that regular index structures are designed to write the index to disk pages in a regular file system and the indexes Ahmed Eldawyhttp://www.blogger.com/profile/12361984608543038057noreply@blogger.com0tag:blogger.com,1999:blog-7698071661525176838.post-4943319164591151702017-10-20T16:23:00.000-07:002018-02-15T22:51:45.689-08:00Visualize SpatialHadoop indexes
I received several requests asking for help in building visualizations for SpatialHadoop indexes. In many of my papers, posters, and presentations, I display a visualization of spatial indexes like the one shown below.
A Quad-tree-based index for a 400 GB dataset that represents the world road network extracted from OpenStreetMap.
There are actually several ways to visualize Ahmed Eldawyhttp://www.blogger.com/profile/12361984608543038057noreply@blogger.com0tag:blogger.com,1999:blog-7698071661525176838.post-55219551152614611052016-12-22T18:34:00.001-08:002017-03-14T12:34:28.546-07:00Visualize your ideas using Rasem
A major part of a researcher's work is to write papers and articles that describe their work and make posters and presentations to better communicate their ideas. We all believe that "A picture is worth a thousand words" and we are always looking for better ways to visualize our ideas. In this blog article, I present Rasem, a library that I built as I started my PhD and used in many of my Ahmed Eldawyhttp://www.blogger.com/profile/12361984608543038057noreply@blogger.com0tag:blogger.com,1999:blog-7698071661525176838.post-47450897851003193192016-03-31T20:05:00.002-07:002017-03-14T12:35:08.185-07:00Around the world in one hour! (revisit)
In this blog post, we revisit an earlier blog post about extracting data from the OpenStreetMap Planet.osm file. We still use the same extraction script in Pigeon but we make it modular and easier to reuse. We make use of the macro definitions in Pig to extract common code into a separate file. In the following part, we first describe the OSMX.pig file, which contains the reusable macros. After that, Ahmed Eldawyhttp://www.blogger.com/profile/12361984608543038057noreply@blogger.com1tag:blogger.com,1999:blog-7698071661525176838.post-1522615837967488602016-02-20T12:38:00.000-08:002017-03-14T12:35:48.548-07:00HadoopViz: Extensible Visualization of Big Spatial Data
With huge sizes of spatial data, a common functionality that users are looking for is to visualize this data to see what it looks like. This gives users the power of quickly exploring new datasets with huge sizes. For example, the video below summarizes 1 trillion points that represent the temperature of every 1 km² on the Earth's surface on every day from 2009 to 2014 (a total of six years).
This Ahmed Eldawyhttp://www.blogger.com/profile/12361984608543038057noreply@blogger.com25tag:blogger.com,1999:blog-7698071661525176838.post-86173175793619345132015-12-02T09:29:00.001-08:002017-03-14T12:36:04.037-07:00Voronoi diagram and Delaunay triangulation construction of Big Spatial Data using SpatialHadoop
Voronoi Diagram and Delaunay Triangulation
A very popular computational geometry problem is the Voronoi Diagram (VD), and its dual, the Delaunay Triangulation (DT). In both cases, the input is a set of points (sites). In VD, the output is a tessellation of the space into convex polygons, one per input site, such that each polygon covers all locations that are closest to the corresponding Ahmed Eldawyhttp://www.blogger.com/profile/12361984608543038057noreply@blogger.com1tag:blogger.com,1999:blog-7698071661525176838.post-66101140595956879412015-11-30T16:14:00.002-08:002017-03-14T12:36:45.112-07:00Reducing the memory footprint of the spatial join operator in Hyracks
This is the fourth blog post in a series that describes how to build an efficient spatial join Hyracks operator in AsterixDB. You can refer to the previous posts below:
An Introduction to Hyracks Operators in AsterixDB
Your first Hyracks operator
A Hyracks operator for plane-sweep join
Scope of this post
In the third post, I described how to implement an efficient plane-sweep join algorithm Ahmed Eldawyhttp://www.blogger.com/profile/12361984608543038057noreply@blogger.com1Irvine, CA, USA33.642884236950536 -117.8425097465515133.641231736950537 -117.84503124655151 33.644536736950535 -117.83998824655151tag:blogger.com,1999:blog-7698071661525176838.post-70582137471765741672015-11-24T12:39:00.001-08:002017-03-14T12:36:57.966-07:00A Hyracks operator for plane-sweep join
This is the third post in a series about creating an efficient Hyracks operator for spatial join. In the previous two posts, we gave an introduction to Hyracks operators and briefly described how to write a simple Hyracks operator. In this blog post, we describe how to make the previously created operator more efficient by using a plane-sweep spatial join algorithm instead of a Ahmed Eldawyhttp://www.blogger.com/profile/12361984608543038057noreply@blogger.com1California Ave, Irvine, CA, USA33.6369838 -117.8352874999999933.610545800000004 -117.87562799999999 33.6634218 -117.794947tag:blogger.com,1999:blog-7698071661525176838.post-49635817341077534802015-11-23T12:29:00.001-08:002017-03-14T12:37:44.585-07:00Your first Hyracks operator
In a previous blog post, I introduced Hyracks operators and briefly described how they work. In this blog post, I'll show how to create and use a very simple Hyracks operator that performs a spatial join using a naive nested-loop algorithm.
Although AsterixDB already has a nested-loop join operator, I provide a simpler, probably less efficient, implementation for the sake of Ahmed Eldawyhttp://www.blogger.com/profile/12361984608543038057noreply@blogger.com1tag:blogger.com,1999:blog-7698071661525176838.post-58623496358032119332015-11-20T18:21:00.000-08:002017-03-14T12:37:31.175-07:00An Introduction to Hyracks Operators in AsterixDB
I had the opportunity to collaborate with the AsterixDB team led by Mike Carey and Chen Li at the University of California, Irvine. The main objective of this collaboration is to introduce efficient ad-hoc spatial join query processing in AsterixDB. In this blog post, I will try to summarize my work for future developers and users of AsterixDB. I believe that this post could be very helpful as it is Ahmed Eldawyhttp://www.blogger.com/profile/12361984608543038057noreply@blogger.com1tag:blogger.com,1999:blog-7698071661525176838.post-87360029687143639952015-11-01T10:31:00.001-08:002017-03-14T12:38:05.358-07:00Speeding up point-in-polygon query using Pigeon and SpatialHadoop
Point-in-polygon Query
A widely used function in GIS and spatial applications is the point-in-polygon query, which determines whether a point is inside a polygon. Typically, this function is used as a spatial-join predicate to relate a large set of points to a smaller set of polygons, for example, associating a set of geo-tagged tweets with states across the whole world. If the polygons are relatively Ahmed Eldawyhttp://www.blogger.com/profile/12361984608543038057noreply@blogger.com2tag:blogger.com,1999:blog-7698071661525176838.post-35510070059285829662015-10-29T19:40:00.001-07:002017-03-14T12:38:44.514-07:00Quick tips for GPU programming
I was at the IEEE Big Data conference, and I attended a two-hour tutorial about GPU programming prepared by the folks at AMD. It was really nice and I would like to summarize the key points that I got from the tutorial for current and future GPU programmers.
There are many extensions to high-level languages to work with GPUs. I was familiar with CUDA but the tutorial described two other Ahmed Eldawyhttp://www.blogger.com/profile/12361984608543038057noreply@blogger.com5Santa Clara, CA, USA37.3541079 -121.9552355999999837.253154900000006 -122.11659709999998 37.4550609 -121.79387409999998tag:blogger.com,1999:blog-7698071661525176838.post-67377001937347607662015-06-30T16:23:00.002-07:002017-03-14T12:39:01.151-07:00Setting up Pigeon on Pig and Hadoop
From Pig to Pigeon
Pigeon
Pig is a framework that allows developers to express their MapReduce programs in a nice and easy-to-use high-level language, termed Pig Latin. Pigeon builds on top of that by providing a set of user-defined functions (UDFs) that can manipulate spatial data. In this blog post, I'll describe in a few easy steps how to install and run Pigeon on an existing Hadoop cluster Ahmed Eldawyhttp://www.blogger.com/profile/12361984608543038057noreply@blogger.com27tag:blogger.com,1999:blog-7698071661525176838.post-59891335768962557532015-03-27T22:10:00.001-07:002015-03-30T09:32:28.347-07:00Around the world in one hour!
Abstract
This blog post shows you how you can process the whole Planet file produced by OpenStreetMap in only one hour. We use SpatialHadoop, an extension to Hadoop that supports spatial data, along with its high-level language, Pigeon, to distribute the work over 50 machines and get it done within one hour instead of a week. The program is only tens of lines of code and can be easily customized Ahmed Eldawyhttp://www.blogger.com/profile/12361984608543038057noreply@blogger.com2tag:blogger.com,1999:blog-7698071661525176838.post-7866681980991296722015-01-30T16:27:00.001-08:002015-01-30T16:27:21.453-08:00Installing SpatialHadoop on an existing Hadoop cluster
I occasionally get a question about how to install SpatialHadoop on an existing cluster that runs Hadoop. So, I decided to write this blog post to describe the different ways to set up SpatialHadoop on an existing cluster.
In this blog post, I'll describe two techniques to install SpatialHadoop on an existing cluster. The first technique requires administrator access to Hadoop, not necessarily Ahmed Eldawyhttp://www.blogger.com/profile/12361984608543038057noreply@blogger.com46tag:blogger.com,1999:blog-7698071661525176838.post-4587874920169819662013-10-31T00:43:00.000-07:002013-10-31T02:53:26.931-07:00The day I changed my default search engine from Google to Bing
We use search engines more than we use anything else on the web. Your selection of a search engine greatly affects you. I'm a Google fan and use most of their services, including their very first service, the search engine. This was until I decided to move to Bing.
Before you start defending the quality of your favorite search engine, I need to tell you it's not about quality, it's Ahmed Eldawyhttp://www.blogger.com/profile/12361984608543038057noreply@blogger.com3tag:blogger.com,1999:blog-7698071661525176838.post-52416210204316654342012-09-02T12:25:00.001-07:002012-09-05T16:09:37.742-07:00Creating a tiled floor pattern
In the house I was staying in while in Bellevue, WA, there was a floor pattern that looked like the image below. It's easy to figure out how to create such a pattern using a combination of small and big tiles. I tried to create that pattern in a graphics program just for fun.
I used Inkscape to create the image below but it turned out to be a tedious job. It's really boring to clone figures and Ahmed Eldawyhttp://www.blogger.com/profile/12361984608543038057noreply@blogger.com2tag:blogger.com,1999:blog-7698071661525176838.post-82166754858349087292012-07-27T20:29:00.000-07:002012-07-27T20:31:59.530-07:00TouchDevelop: Develop with passion
During the last weekend, I've been playing around with TouchDevelop, a development environment for Windows Phone that runs completely on the phone. It's still in the labs of MSR but you can use it if you have a device that runs Windows Phone.
TouchDevelop is easy to learn and use and it makes coding more fun with its innovative interface. It was my first time to use a Windows Phone in my Ahmed Eldawyhttp://www.blogger.com/profile/12361984608543038057noreply@blogger.com1Redmond, WA, USA47.6739881 -122.12151247.6739881 -122.121512 47.6739881 -122.121512