This video consists of 72 frames, one per month. While a single machine can produce these 72 images, it might take up to 60 hours due to the huge size of the input.
In this blog post, we describe how to use HadoopViz, an extensible visualization framework based on SpatialHadoop, to visualize the same dataset in just three hours using a cluster of 10 machines.
Besides single-level images, which are typically of low resolution, HadoopViz can also produce multilevel images where users can interactively zoom in and out to explore huge datasets in great detail. For example, the image below is a visualization of a 92GB dataset which represents all the objects extracted from the OpenStreetMap dataset. You can pan and zoom in this image to view more details about a specific area.
Overview
In a nutshell, HadoopViz uses the parallelization power of MapReduce along with the efficiency of SpatialHadoop to partition the data into smaller parts, visualize each part separately into a smaller image, and then put these partial images together to produce the final image. HadoopViz builds on this idea and provides four key features that make it easy to use and very efficient.
- HadoopViz piggybacks data smoothing with visualization allowing it to smooth the data on-the-fly as the image is generated.
- HadoopViz automatically decides the best way to partition the data allowing it to scale to generate both small and large images efficiently.
- HadoopViz can also generate multilevel images where users can freely pan and zoom to interactively explore a huge dataset.
- Instead of customizing the algorithm for a specific use case, e.g., satellite data, HadoopViz provides an extensible implementation that can support a wide range of visualization types.
Below, we first describe how to generate the visualizations shown above using HadoopViz, which ships with the most recent version of SpatialHadoop. Then, we describe some technical details about the smoothing, partitioning, and extensibility features.
How to ...
... generate the temperature video
- Download and set up the most recent version of SpatialHadoop, which ships with HadoopViz as its visualization package. Check this page for more details about setting up the most recent version of SpatialHadoop on both Hadoop 1.x and Hadoop 2.x.
- Download the temperature dataset you would like to visualize. The temperature dataset we used can be obtained from the LP DAAC archive at this link.
You can use this Ruby script to download all the data for the six years if you have a good internet connection and enough storage on your machine. Run it using the following command:
ruby hdf_downloader.rb http://e4ftl01.cr.usgs.gov/MOLA/MYD11A1.005/ time:2009.01.01..2014.12.31
- Once you have all the data, you can upload it to your HDFS using the 'copyFromLocal' command. Let's assume the data is available at hdfs://user/hadoop/temperature
- To visualize the 72 frames, run the following SpatialHadoop command
shadoop multihdfplot hdfs://user/hadoop/temperature combine:31 dataset:LST_Day_1km hdfs://user/hadoop/frames/ time:2009.01.01..2014.12.31
- The frames will be available in the output path hdfs://user/hadoop/frames. Download them using the 'copyToLocal' command.
- Now, upload the frames to YouTube, which will put them together into a video similar to the one shown above.
... generate the multilevel image
- Follow step 1 above to download and install SpatialHadoop, if you haven't done so already.
- Download the 'All objects' dataset at the following link
http://spatialhadoop.cs.umn.edu/datasets.html#osm2
- Upload the file to HDFS using the 'copyFromLocal' command. Let's assume it is uploaded to hdfs://user/hadoop/objects/
NB: You don't have to decompress the file as SpatialHadoop can decompress it on the fly while visualizing. However, if you upload the compressed file, you need to keep the .bz2 extension to tell SpatialHadoop it is compressed.
- To generate a multilevel image with 11 levels similar to the one shown above, type the following command
shadoop gplot hdfs://user/hadoop/objects -pyramid levels:11 hdfs://user/hadoop/multilevel shape:osm
- The generated image will be available at hdfs://user/hadoop/multilevel. Download it to your machine using the 'copyToLocal' command.
- To view the image in your browser, open the 'index.html' file available in the output directory.
Smoothing
In visualization, smoothing means fusing nearby records according to the visualization logic to produce a correct result. For example, satellite datasets typically contain holes caused by clouds that obstruct the view of the satellites. A smoothing function can fill these holes by estimating the missing values using simple interpolation techniques. The two figures below show an example of how the smoothing function can recover missing points.
HadoopViz supports on-the-fly smoothing of the data as the visualization is done. This means that you can easily plug in a different smoothing function and regenerate the image without having to run the complex smoothing function as a separate step.
Original data without smoothing
Data smoothed using HadoopViz
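To make the idea of hole filling concrete, here is a minimal sketch of one simple interpolation scheme: each missing cell takes the average of its valid neighbors. It assumes the data is a small in-memory grid with NaN marking missing values. The class and method names are hypothetical, and this is not the smoothing function that HadoopViz itself uses.

```java
// Illustrative only: a hypothetical hole-filling function, not HadoopViz code.
class HoleFillSketch {
    // Returns a copy of the grid where each NaN cell is replaced by the mean
    // of its valid up/down/left/right neighbors (left as NaN if it has none).
    static double[][] fillHoles(double[][] grid) {
        int rows = grid.length, cols = grid[0].length;
        double[][] out = new double[rows][];
        for (int r = 0; r < rows; r++) out[r] = grid[r].clone();
        int[][] offsets = {{-1, 0}, {1, 0}, {0, -1}, {0, 1}};
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                if (!Double.isNaN(grid[r][c])) continue; // value present, keep it
                double sum = 0;
                int n = 0;
                for (int[] o : offsets) {
                    int rr = r + o[0], cc = c + o[1];
                    boolean inside = rr >= 0 && rr < rows && cc >= 0 && cc < cols;
                    if (inside && !Double.isNaN(grid[rr][cc])) {
                        sum += grid[rr][cc];
                        n++;
                    }
                }
                if (n > 0) out[r][c] = sum / n;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] g = {{10, 12, 14}, {11, Double.NaN, 15}, {12, 14, 16}};
        System.out.println(fillHoles(g)[1][1]); // prints 13.0
    }
}
```

Note that the function needs the neighbors of a missing cell to be available in the same place, which is why smoothing interacts with the way the data is partitioned.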
Partitioning
HadoopViz supports two ways of partitioning the data which affect the way it merges intermediate partial images. It can use either the default HDFS partitioning or the spatial partitioning that ships with SpatialHadoop.
Default HDFS Partitioning
By default, when you upload a file to HDFS, it is partitioned into equi-sized chunks of 128MB each. Spatial locations of records are not taken into account, so nearby records may end up in different partitions. This means that every partition may cover the entire input space, and we end up overlaying intermediate images to produce the final image as shown below.
Overlay intermediate images
Spatial Partitioning
If we use the spatial partitioning that ships with SpatialHadoop, each partition contains only data from a small region of the input space, and we end up stitching intermediate images as shown below.
Stitch intermediate images
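The difference between the two merge steps can be sketched with plain integer arrays standing in for images, where 0 means an empty pixel. This is a simplified illustration under that assumption, with hypothetical names, and not the HadoopViz implementation.

```java
import java.util.Arrays;

// Illustrative only: int arrays stand in for images, 0 = empty pixel.
class MergeSketch {
    // Overlay: every partial image has the size of the final image,
    // so merging one partial image visits every pixel of the final image.
    static void overlay(int[][] dst, int[][] src) {
        for (int y = 0; y < dst.length; y++)
            for (int x = 0; x < dst[y].length; x++)
                if (src[y][x] != 0) dst[y][x] = src[y][x];
    }

    // Stitch: a partial image covers only its own region, so it is copied
    // once into the final image at the offset of that region.
    static void stitch(int[][] dst, int[][] part, int offX, int offY) {
        for (int y = 0; y < part.length; y++)
            System.arraycopy(part[y], 0, dst[offY + y], offX, part[y].length);
    }

    public static void main(String[] args) {
        int[][] full = new int[2][4];
        overlay(full, new int[][] {{1, 0, 0, 0}, {0, 0, 0, 1}});
        overlay(full, new int[][] {{0, 2, 0, 0}, {0, 0, 2, 0}});
        System.out.println(Arrays.deepToString(full));
        // prints [[1, 2, 0, 0], [0, 0, 2, 1]]

        int[][] stitched = new int[2][4];
        stitch(stitched, new int[][] {{1, 1}, {1, 1}}, 0, 0);
        stitch(stitched, new int[][] {{2, 2}, {2, 2}}, 2, 0);
        System.out.println(Arrays.deepToString(stitched));
        // prints [[1, 1, 2, 2], [1, 1, 2, 2]]
    }
}
```

With overlay, the cost of merging grows with the full image size times the number of partitions. With stitch, each pixel of the final image is written once.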
Which partitioning technique is better?
While both techniques produce the same final image, their performance may differ, so HadoopViz needs to decide automatically which one to use. First of all, if the data needs to be smoothed, then HadoopViz has to choose spatial partitioning, as it is the only one that groups nearby records together in one partition so that they can be fused.
If HadoopViz doesn't need to apply a smoothing function, then both techniques are applicable, and the choice is a trade-off between the partitioning and merging costs that depends on the image size. The default HDFS partitioning is faster than spatial partitioning, but the overlay process is more time consuming than stitching because every intermediate image is as large as the final image. HadoopViz therefore chooses spatial partitioning when the image is very large, as the cost of the overlay process grows with the image size.
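The decision described above can be summarized as a small rule. The sketch below only mirrors the reasoning in the text. The class, the enum, and the pixel threshold are hypothetical, and the actual criterion HadoopViz uses may well be different.

```java
// Illustrative only: names and the threshold are assumptions, not HadoopViz's API.
class PartitionChoiceSketch {
    enum Partitioning { DEFAULT_HDFS, SPATIAL }

    // largeImageThreshold is a hypothetical tuning parameter (in pixels).
    static Partitioning choose(boolean needsSmoothing, long imagePixels,
                               long largeImageThreshold) {
        // Smoothing needs nearby records in the same partition.
        if (needsSmoothing) return Partitioning.SPATIAL;
        // For large images, overlaying full-size partial images dominates the cost.
        return imagePixels > largeImageThreshold
                ? Partitioning.SPATIAL
                : Partitioning.DEFAULT_HDFS;
    }

    public static void main(String[] args) {
        long threshold = 1_000_000L; // assumed value, for illustration only
        System.out.println(choose(true, 1000L * 1000L, threshold));    // SPATIAL
        System.out.println(choose(false, 500L * 500L, threshold));     // DEFAULT_HDFS
        System.out.println(choose(false, 10000L * 10000L, threshold)); // SPATIAL
    }
}
```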
Multilevel images
A multilevel image consists of a pyramid of fixed-size tiles, each typically of size 256x256 pixels. The figure below shows an example of a three-level image with 1, 4, and 16 tiles in its three levels, i.e., a pyramid of three levels.
A naive way to generate a multilevel image is to generate each tile independently using the (single-level) techniques shown above. However, this would require executing the single-level algorithm millions of times. Therefore, HadoopViz provides specialized multilevel visualization algorithms that take the pyramid structure into consideration. Similar to single-level visualization, HadoopViz supports two partitioning techniques, namely, default HDFS partitioning and pyramid partitioning.
A multilevel image of three levels
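The tile counts follow directly from the pyramid structure: the number of tiles quadruples at each level, so level z has 4^z tiles. A short sketch of the arithmetic (the class name is hypothetical):

```java
// Illustrative arithmetic for the size of a full tile pyramid.
class PyramidSizeSketch {
    // Level 0 has 1 tile; each deeper level has 4 times as many: 4^level.
    static long tilesAtLevel(int level) {
        return 1L << (2 * level);
    }

    // Total number of tiles in a full pyramid with the given number of levels.
    static long totalTiles(int levels) {
        long total = 0;
        for (int z = 0; z < levels; z++) total += tilesAtLevel(z);
        return total;
    }

    public static void main(String[] args) {
        System.out.println(totalTiles(3));  // prints 21 (1 + 4 + 16)
        System.out.println(totalTiles(11)); // prints 1398101
    }
}
```

A full 11-level pyramid therefore has about 1.4 million tiles, which is why generating each tile with a separate single-level run is impractical. In practice the number of non-empty tiles depends on how the data is distributed.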
Default HDFS Partitioning
If we use default HDFS partitioning, each partition might contain records from all over the input space. In this case, each machine plots all these records to all overlapping tiles in all pyramid levels. The generated tiles are considered partial images as multiple partitions might overlap the same tile. Thus, a final merge step will need to overlay all intermediate partial images for the same tile to produce the final image for that tile.
Pyramid Partitioning
The other option for HadoopViz is to first repartition the data so that all records that overlap with one tile go to one partition. Then, these records are visualized to generate the final image for that tile. No merging is needed here as each tile is only generated by one machine.
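As a rough sketch of the repartitioning step, the function below computes, for a single point, the tile that contains it at every pyramid level. Grouping records by these tile identifiers would send all records of a tile to the same partition. The "level-column-row" identifier format and all names are assumptions made for illustration. Real shapes such as lines and polygons can overlap several tiles per level, which this point-only sketch ignores.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: maps a point to one tile id per pyramid level.
class PyramidPartitionSketch {
    // (minX, minY, maxX, maxY) is the MBR of the whole input space.
    static List<String> tilesForPoint(double x, double y,
                                      double minX, double minY,
                                      double maxX, double maxY, int levels) {
        List<String> ids = new ArrayList<>();
        for (int z = 0; z < levels; z++) {
            int n = 1 << z; // tiles per side at level z
            // Clamp so that points on the upper boundary fall in the last tile.
            int col = Math.min(n - 1, (int) ((x - minX) / (maxX - minX) * n));
            int row = Math.min(n - 1, (int) ((y - minY) / (maxY - minY) * n));
            ids.add(z + "-" + col + "-" + row);
        }
        return ids;
    }

    public static void main(String[] args) {
        // A point at longitude 10, latitude 20 in a world-sized input space.
        System.out.println(tilesForPoint(10, 20, -180, -90, 180, 90, 3));
        // prints [0-0-0, 1-1-1, 2-2-2]
    }
}
```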
Which partitioning technique is better?
Again, there is no clear winner here. It all depends on how many tiles are generated. If only a few tiles are generated, then default HDFS partitioning is better as it only needs to merge a few images. However, if a huge number of tiles are generated, pyramid partitioning is better as it avoids altogether the need for merging intermediate tiles.
HadoopViz splits a huge pyramid into two parts, the top and the base of the pyramid. The top of the pyramid contains only a few tiles and is generated by the default HDFS partitioning technique, while the base contains a very large number of tiles and is generated by the pyramid partitioning technique. The tiles are then put together to produce the final image without any extra processing.
Extensibility
While the above techniques can be customized for every visualization type, it would require a huge coding effort to build and maintain all these implementations. Therefore, HadoopViz proposes a visualization abstraction that is used to describe the visualization logic. This abstraction is then plugged into generic implementations of the above algorithms to produce the image efficiently at scale. In short, if you would like to visualize your own data in a new way, all you need to do is write a small class that extends an abstract class, and you're ready to go with both single-level and multilevel visualization techniques.
A new visualization type is defined by extending the base class Plotter. There are five main functions to implement for a new visualization type.
<S extends Shape> Iterable<S> smooth(Iterable<S> r)
This function takes a set of nearby records, fuses them together, and returns a new set of records. This function can be used to apply a user-specified smoothing logic.
Canvas createCanvas(int width, int height, Rectangle mbr)
This function initializes an empty canvas with the given size in width and height. It also associates this canvas with the given MBR in input space. Notice that Canvas can be virtually anything. We provide a simple abstract Canvas class as a skeleton.
void plot(Canvas layer, Shape shape)
The plot function updates the canvas layer by plotting the given shape on it. Users can define their own visualization logic for one shape based on the format of the shape and the canvas layer.
void merge(Canvas finalLayer, Canvas intermediateLayer)
The merge function merges two intermediate canvases. It updates the finalLayer by merging the intermediateLayer into it.
void writeImage(Canvas layer, DataOutputStream out, boolean vflip)
The writeImage function encodes the canvas layer into a standard image that can be displayed to the end user. The image is written to the given DataOutputStream, which typically goes to an output file. If the vflip flag is set to true, the image should be vertically flipped before it is written to the output. The vflip flag is useful when the y-axis of the input points in a different direction than that of the final image. For example, in PNG images, the y-axis increases from top to bottom, while in geographical coordinates, latitude increases from bottom (south) to top (north).
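To give a feel for how the functions fit together, here is a self-contained sketch that mirrors four of them (createCanvas, plot, merge, and writeImage) for a simple visualization that counts points per pixel. It does not extend the real Plotter class. Box, Pt, and CountCanvas are hypothetical stand-ins for SpatialHadoop's Rectangle, Shape, and Canvas, the smooth function is omitted, and the image is written in the plain-text PGM format only to keep the example short.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Illustrative only: stand-in types, not SpatialHadoop's actual classes.
class CountPlotterSketch {
    static class Box { // stand-in for Rectangle (the MBR)
        final double x1, y1, x2, y2;
        Box(double x1, double y1, double x2, double y2) {
            this.x1 = x1; this.y1 = y1; this.x2 = x2; this.y2 = y2;
        }
    }
    static class Pt { // stand-in for Shape
        final double x, y;
        Pt(double x, double y) { this.x = x; this.y = y; }
    }
    static class CountCanvas { // stand-in for Canvas: one counter per pixel
        final int width, height;
        final Box mbr;
        final int[][] counts;
        CountCanvas(int width, int height, Box mbr) {
            this.width = width; this.height = height; this.mbr = mbr;
            this.counts = new int[height][width];
        }
    }

    CountCanvas createCanvas(int width, int height, Box mbr) {
        return new CountCanvas(width, height, mbr);
    }

    // Maps the point from input space to a pixel and increments its counter.
    void plot(CountCanvas layer, Pt p) {
        Box m = layer.mbr;
        int px = (int) ((p.x - m.x1) / (m.x2 - m.x1) * layer.width);
        int py = (int) ((p.y - m.y1) / (m.y2 - m.y1) * layer.height);
        if (px >= 0 && px < layer.width && py >= 0 && py < layer.height)
            layer.counts[py][px]++;
    }

    // Adds the counters of the intermediate canvas into the final canvas.
    void merge(CountCanvas finalLayer, CountCanvas intermediateLayer) {
        for (int y = 0; y < finalLayer.height; y++)
            for (int x = 0; x < finalLayer.width; x++)
                finalLayer.counts[y][x] += intermediateLayer.counts[y][x];
    }

    // Encodes the canvas as plain-text PGM; with vflip, the row with the
    // largest y is written first so that "up" in the input is "up" in the image.
    String toPgm(CountCanvas layer, boolean vflip) {
        StringBuilder sb = new StringBuilder();
        sb.append("P2\n").append(layer.width).append(' ').append(layer.height).append("\n255\n");
        for (int r = 0; r < layer.height; r++) {
            int y = vflip ? layer.height - 1 - r : r;
            for (int x = 0; x < layer.width; x++) {
                sb.append(Math.min(255, layer.counts[y][x]));
                sb.append(x + 1 < layer.width ? ' ' : '\n');
            }
        }
        return sb.toString();
    }

    void writeImage(CountCanvas layer, DataOutputStream out, boolean vflip) throws IOException {
        out.write(toPgm(layer, vflip).getBytes(StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) throws IOException {
        CountPlotterSketch plotter = new CountPlotterSketch();
        Box mbr = new Box(0, 0, 2, 2);
        // Two partitions, each plotted onto its own full-size canvas.
        CountCanvas a = plotter.createCanvas(2, 2, mbr);
        plotter.plot(a, new Pt(0.5, 0.5));
        plotter.plot(a, new Pt(1.5, 1.5));
        CountCanvas b = plotter.createCanvas(2, 2, mbr);
        plotter.plot(b, new Pt(0.5, 0.5));
        plotter.merge(a, b);
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        plotter.writeImage(a, new DataOutputStream(buf), true);
        System.out.print(buf.toString("US-ASCII"));
    }
}
```

In this sketch the canvas is just a grid of counters, which illustrates the point that a canvas can be virtually anything as long as plot, merge, and writeImage agree on its meaning.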
Acknowledgement
This work was partially supported by an AWS in Education Grant.
Further References
- SpatialHadoop homepage: http://spatialhadoop.cs.umn.edu/
- Ahmed Eldawy, Mohamed F. Mokbel and Christopher Jonathan "HadoopViz: A MapReduce Framework for Extensible Visualization of Big Spatial Data". In Proceedings of the 32nd IEEE International Conference on Data Engineering, IEEE ICDE 2016, Helsinki, Finland, May 16-20, 2016
- Ahmed Eldawy, Mohamed F. Mokbel and Christopher Jonathan "A Demonstration of HadoopViz: An Extensible MapReduce System for Visualizing Big Spatial Data". In Proceedings of the International Conference on Very Large Databases, VLDB 2015, Kohala Coast, HI, 2015