My Google Summer of Code Experience

Extending the remote data module, Google Summer of Code '20

Organization: PEcAn Project

Student: Ayush Prasad

Mentors: Istem Fer, Shawn Serbin, Olli Nevalainen

Background

This project aimed to develop a pipeline for ingesting remote sensing data into PEcAn. To this end, the data.remote module was extended to connect to Google Earth Engine (GEE) and AppEEARS. Data requests can now be submitted to these sources from the PEcAn workflow, and the output is stored in BETYdb for further analysis.

Implementation

The remote data module consists of two parts:

  1. RpTools
    A Python package with the following modules:

    • gee2pecan_s2 for retrieving Sentinel 2 data from GEE
    • gee2pecan_l8 for retrieving Landsat data from GEE
    • gee2pecan_smap for retrieving SMAP soil moisture data from GEE
    • bands2lai_snap for computing Leaf Area Index using the SNAP toolbox
    • appeears2pecan for downloading data from AppEEARS
    • get_remote_data for handling the raw data download process
    • process_remote_data for processing raw data
    • rp_control, the main module, for controlling the above modules

    In addition to these modules, RpTools creates GeoJSON files from BETYdb data and merges netCDF files of the same type. The package is designed so that it can be used independently of the PEcAn workflow if the need arises.

  2. remote_process

    The main R function inside PEcAn’s data.remote module, which controls RpTools and adds the PEcAn dependencies. The R-Python interfacing is handled by reticulate. remote_process checks the status of the requested data in the database (BETYdb), sets the stages, calls rp_control, and finally inserts the output into BETYdb.
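
As an illustration of the GeoJSON step mentioned under RpTools, here is a minimal sketch of turning a site's coordinates into a GeoJSON file. The function names, the site_id property and the example coordinates are assumptions made for this example; they are not the actual RpTools API, and the real package reads the geometry from BETYdb rather than taking it as arguments.

```python
import json


def site_to_geojson(site_id, lon, lat):
    """Build a single-point GeoJSON FeatureCollection for one site.

    Hypothetical helper for illustration only; not the RpTools function.
    """
    feature = {
        "type": "Feature",
        "properties": {"site_id": site_id},
        # GeoJSON orders coordinates as [longitude, latitude]
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
    }
    return {"type": "FeatureCollection", "features": [feature]}


def write_geojson(collection, path):
    """Write the FeatureCollection to disk as a .geojson file."""
    with open(path, "w") as f:
        json.dump(collection, f)


if __name__ == "__main__":
    # Made-up site id and coordinates
    write_geojson(site_to_geojson("example_site", 24.29, 61.85), "example_site.geojson")
```

A file like this can then serve as the area of interest for a data request.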

Work done

Phase 1

During the first phase, the Sentinel 2 and SNAP code provided by the PEcAn team was modified for use in this module. Then the initial version of the rp_control function (previously named remote_process) was created to manage the download and processing of data.

Pull requests:

Phase 2

Two functions, gee2pecan_l8 and gee2pecan_smap, were implemented for retrieving Landsat and SMAP data, respectively, from Google Earth Engine. rp_control was improved so that new GEE collections can be added without any changes to the source code. Then the appeears2pecan function was implemented to download data from AppEEARS.
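
One way to add collections without touching the source code is to keep their parameters in a registry file that is read at run time. The sketch below shows that idea only; the registry layout, the function names and the entries are assumptions for illustration and may differ from how rp_control actually does it.

```python
import json

# Illustrative registry contents. In practice this would be a JSON file that
# users edit; the keys and values here are example entries, not the module's own.
REGISTRY_JSON = """
{
  "COPERNICUS/S2_SR": {"scale": 10, "bands": ["B4", "B8"]},
  "LANDSAT/LC08/C01/T1_SR": {"scale": 30, "bands": ["B4", "B5"]}
}
"""


def load_registry(text):
    """Parse the registry text into a dict keyed by collection name."""
    return json.loads(text)


def collection_params(registry, name):
    """Look up a collection's parameters, failing clearly if it is unknown."""
    try:
        return registry[name]
    except KeyError:
        raise ValueError(
            f"Unknown collection {name!r}; add an entry to the registry file"
        ) from None


if __name__ == "__main__":
    registry = load_registry(REGISTRY_JSON)
    print(collection_params(registry, "COPERNICUS/S2_SR"))
```

With this layout, supporting a new collection means adding one entry to the file, and the download code stays unchanged.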

Pull requests:

Phase 3

Functions were implemented for merging remote sensing data of the same type and for creating GeoJSON files from BETYdb geometry data. All of the Python code developed was packaged into the RpTools package. Finally, the main function, remote_process, was implemented in R; it added connections to the database (BETYdb) and made it possible to submit requests from the PEcAn workflow.

Pull request:

Link to all PRs: https://github.com/PecanProject/pecan/pulls?q=is%3Apr+author%3Aayushprd

Future improvements

  • Module for calculating uncertainties - in addition to the raw data download module and the data processing module, a third module could be developed to calculate the uncertainties in the remote sensing data.
  • Remote execution - PEcAn can be run on a remote server or an HPC system; this module can be improved to support such use cases.
  • Parallelization - some parts of this module can be modified to run in parallel or concurrently, which could reduce the time taken to download data for multiple sites.

I will continue to work on these in the coming weeks.
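
The parallelization idea could look roughly like the sketch below, which fans out per-site downloads over a thread pool. The fetch_site function is a hypothetical placeholder that does no network I/O; in the real module it would wrap the existing download call for one site.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_site(site_id):
    """Placeholder for a per-site download (hypothetical; no network I/O).

    Returns the name of the file the download would produce.
    """
    return f"{site_id}.nc"


def fetch_all(site_ids, max_workers=4):
    """Run fetch_site for every site concurrently and map site -> result."""
    site_ids = list(site_ids)
    # Downloads are I/O-bound, so threads are a reasonable first choice.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the same order as its inputs
        results = list(pool.map(fetch_site, site_ids))
    return dict(zip(site_ids, results))


if __name__ == "__main__":
    print(fetch_all(["site_a", "site_b", "site_c"]))
```

Threads suit this case because most of the time is spent waiting on the remote service; any rate limits imposed by the data provider would cap the useful number of workers.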

Conclusion

PEcAn has the tools to run a model anywhere on Earth. With the development of this remote data module, it is now also possible to evaluate the model anywhere, because the module can retrieve remote sensing observations from different sources for any place on Earth.

Acknowledgements

I am deeply thankful to Istem Fer for her guidance throughout the project. All of your feedback has helped me improve my skills tremendously. Thank you for suggesting ways to develop the code so that it fits into the larger scheme. I thank Olli Nevalainen and Shawn Serbin for helping me with remote sensing issues and for providing ideas to develop this module. Thank you, PEcAn Project, for this wonderful experience. I hope this project is the first step towards pursuing my interest in computer and environmental sciences.

Resources

These are some of the resources from which I benefited hugely during the course of this project.