- Flexible: You should be able to easily control the characteristics of the data, e.g., size or skewness.
- Reproducible: It should be relatively easy to reproduce the dataset so that others can repeat the experiments.
- Efficient: You should be able to generate large-scale data without running into performance problems.
All these characteristics are available in Spider, the award-winning open-source spatial data generator. Spider currently has three implementations: Python, Ruby, and Scala on Spark. Spider was published in SpatialGems 2019 [1], won the best paper award, and was demonstrated at SIGSPATIAL 2020 [2]. It is also publicly available at [https://spider.cs.ucr.edu]. The video below gives an overview of SpiderWeb. This article gives an overview of how the Scala implementation on Spark works.