Tuesday, June 30, 2015

Setting up Pigeon on Pig and Hadoop

From Pig to Pigeon

Pig is a framework that lets developers express MapReduce programs in an easy-to-use high-level language called Pig Latin. Pigeon builds on top of Pig by providing a set of user-defined functions (UDFs) that manipulate spatial data. In this blog post, I'll describe in a few easy steps how to install and run Pigeon on an existing Hadoop cluster running Pig.

Preliminaries

Before you work with Pigeon, you need an installed Hadoop distribution with Pig configured to run on it. You can also use Pigeon on a single machine running Pig in local mode, but that setup is intended for testing only.
How to set up Hadoop
How to set up Pig

Install Pigeon

Pigeon ships as a set of JARs that can be easily integrated with your existing Pig Latin scripts. The primary JAR file that contains the Pigeon UDFs can be downloaded here.
You will also need the third-party JAR files that Pigeon requires: JTS and the ESRI Geometry API. If you are running Pigeon on SpatialHadoop, you already have these JAR files. Otherwise, you can download them here and here.
And that's everything. You're ready to go with your first example now.
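Before starting Pig, it can help to confirm that all three JAR files are in the directory you will run from. The following is a minimal sketch, not part of Pigeon itself: `check_jars` is a hypothetical helper name, and the file names are the versions used in the example in this post, so adjust them to whatever you downloaded.

```shell
#!/bin/sh
# Hypothetical helper: report which of the given JAR files are missing.
# Returns 0 when every file exists, 1 otherwise.
check_jars() {
  missing=0
  for jar in "$@"; do
    if [ ! -f "$jar" ]; then
      echo "missing: $jar"
      missing=1
    fi
  done
  return "$missing"
}

# File names assume the versions registered in the example script.
check_jars pigeon-0.2.1.jar jts-1.8.jar esri-geometry-api-1.2.1.jar \
  && echo "all JARs found" \
  || echo "download the missing JARs before starting Pig"
```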

Example

In this example, we will process a dataset of sports areas around the world, extracted from OpenStreetMap, and compute the minimum bounding rectangle (MBR) around each object. The dataset can be downloaded from this direct link or from the SpatialHadoop datasets page.
  1. Download the file to your local drive
    wget https://s3.amazonaws.com/planet-datasets/sports.bz2/sports.bz2
  2. Start Pig in local mode
    pig -x local
  3. Load the JAR files
    REGISTER pigeon-0.2.1.jar;
    REGISTER jts-1.8.jar;
    REGISTER esri-geometry-api-1.2.1.jar;
  4. Load the dataset
    sports = LOAD 'sports.bz2' AS (id: int, geom, tags: map[chararray]);
  5. Calculate the envelope (MBR)
    sports_mbr = FOREACH sports GENERATE id, edu.umn.cs.pigeon.Envelope(geom) AS geom_mbr, tags;
  6. Store the result back to a file
    STORE sports_mbr INTO 'sports_mbr';
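
Steps 3 to 6 can also be saved to a file and run in batch mode instead of being typed into the interactive shell. The sketch below assumes the three JAR files and 'sports.bz2' are in the current directory. The script name 'sports_mbr.pig' is chosen here only for illustration.

```shell
#!/bin/sh
# Write steps 3-6 of the example into a Pig script (file name is illustrative).
cat > sports_mbr.pig <<'EOF'
-- Load the Pigeon JAR and its dependencies
REGISTER pigeon-0.2.1.jar;
REGISTER jts-1.8.jar;
REGISTER esri-geometry-api-1.2.1.jar;

-- Load the dataset, compute the MBR of each geometry, and store the result
sports = LOAD 'sports.bz2' AS (id: int, geom, tags: map[chararray]);
sports_mbr = FOREACH sports GENERATE id, edu.umn.cs.pigeon.Envelope(geom) AS geom_mbr, tags;
STORE sports_mbr INTO 'sports_mbr';
EOF

# Run the script in local mode only if Pig and the dataset are available.
if command -v pig >/dev/null 2>&1 && [ -f sports.bz2 ]; then
  pig -x local sports_mbr.pig
else
  echo "wrote sports_mbr.pig; run it with: pig -x local sports_mbr.pig"
fi
```

Pig refuses to overwrite an existing output location, so remove the 'sports_mbr' directory before running the script a second time.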

Use Short Function Names

To avoid writing the full function name (package + class name), you can use the 'pigeon_import.pig' file, which defines a short name for every function. You can download the file here and place it next to your Pigeon script. To load it, add the following line at the beginning of your script.
IMPORT 'pigeon_import.pig';
After that, you can write the envelope function as ST_Envelope instead of edu.umn.cs.pigeon.Envelope.
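
As a rough illustration, the earlier example might look like this with short names. The script name 'sports_mbr_short.pig' is made up for this sketch, and placing the IMPORT line after the REGISTER lines is an assumption made so that the Pigeon classes are already registered when the short names are defined.

```shell
#!/bin/sh
# Write a variant of the example that uses the short function name.
# Assumes pigeon_import.pig sits next to the generated script.
cat > sports_mbr_short.pig <<'EOF'
REGISTER pigeon-0.2.1.jar;
REGISTER jts-1.8.jar;
REGISTER esri-geometry-api-1.2.1.jar;
-- Assumption: import after the JARs are registered
IMPORT 'pigeon_import.pig';

sports = LOAD 'sports.bz2' AS (id: int, geom, tags: map[chararray]);
-- ST_Envelope replaces edu.umn.cs.pigeon.Envelope
sports_mbr = FOREACH sports GENERATE id, ST_Envelope(geom) AS geom_mbr, tags;
STORE sports_mbr INTO 'sports_mbr';
EOF

echo "wrote sports_mbr_short.pig; run it with: pig -x local sports_mbr_short.pig"
```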

Compile Pigeon from source

You can also compile Pigeon from source if you want the latest updates. Pigeon uses Maven to manage its build and dependencies.
  1. Download the source code using git.
    git clone https://github.com/aseldawy/pigeon.git
    Or download a ZIP archive with the source code at
    https://github.com/aseldawy/pigeon/archive/master.zip
  2. Compile the source code by issuing the command
    mvn package
  3. The generated JAR file will be located under the 'target/' directory.
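
The three steps above can be chained so that the build stops at the first failure. This is only a sketch: `build_pigeon` is a hypothetical function name, and it assumes `git` and `mvn` are on your PATH and that the checkout lands in a 'pigeon' directory. The function body runs in a subshell, so your current directory is left unchanged.

```shell
#!/bin/sh
# Hypothetical helper: clone Pigeon, build it with Maven, and list the JAR.
# The parentheses run the body in a subshell, so the caller's directory is kept.
build_pigeon() (
  git clone https://github.com/aseldawy/pigeon.git &&
  cd pigeon &&
  mvn package &&
  ls target/*.jar
)
```

Call `build_pigeon` from the directory where you want the checkout. It prints the path of the generated JAR relative to the 'pigeon' directory.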

Further Resources
