Extending remote data module, Google Summer of Code'20
Organization: PEcAn Project
Student: Ayush Prasad
This project aimed to develop a pipeline for ingesting remote sensing data into PEcAn. For this purpose, the data.remote module was extended to connect to Google Earth Engine (GEE) and AppEEARS. Data requests to these sources can now be submitted from the PEcAn workflow, and the output is stored in BETYdb for further analysis.
The functioning of the remote data module can be divided into two parts:
RpTools, a Python package with the following modules:
- gee2pecan_s2: for retrieving Sentinel 2 data from GEE
- gee2pecan_l8: for retrieving Landsat data from GEE
- gee2pecan_smap: for retrieving SMAP soil moisture data from GEE
- bands2lai_snap: for computing Leaf Area Index using the SNAP toolbox
- appeears2pecan: for downloading data from AppEEARS
- get_remote_data: for handling the raw data download process
- process_remote_data: for processing raw data
- rp_control: main module for controlling the above modules
Along with the above modules, RpTools also creates GeoJSON files from the BETYdb data and manages the merging of netCDF files of the same type. The package is designed so that, if the need arises, it can be used independently of the PEcAn workflow.
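The GeoJSON step mentioned above can be illustrated with a minimal sketch. It assumes the site geometry is available as a WKT string (for example `POINT(lon lat)` or a single-ring `POLYGON`), which is an assumption about the input and not a description of how RpTools actually reads BETYdb. The function name `wkt_to_geojson` is hypothetical.

```python
import json
import re


def wkt_to_geojson(wkt, properties=None):
    """Convert a WKT POINT or single-ring POLYGON into a GeoJSON Feature dict.

    Hypothetical helper for illustration; only 2D coordinates are kept.
    """
    match = re.match(
        r"^\s*(POINT|POLYGON)\s*Z?\s*\((.*)\)\s*$", wkt, re.IGNORECASE | re.DOTALL
    )
    if match is None:
        raise ValueError(f"unsupported WKT: {wkt!r}")
    kind, body = match.group(1).upper(), match.group(2)

    def parse_pairs(text):
        # "x y, x y, ..." -> [[x, y], ...]; any third (Z) value is dropped
        return [[float(v) for v in pair.split()[:2]] for pair in text.split(",")]

    if kind == "POINT":
        geometry = {"type": "Point", "coordinates": parse_pairs(body)[0]}
    else:
        ring = body.strip()
        if ring.count("(") != 1:
            raise ValueError("only single-ring polygons are handled in this sketch")
        geometry = {"type": "Polygon", "coordinates": [parse_pairs(ring.strip("()"))]}

    return {"type": "Feature", "geometry": geometry, "properties": properties or {}}


def write_geojson(feature, path):
    """Write a Feature dict to a .geojson file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(feature, handle, indent=2)
```

WKT and GeoJSON both order coordinates as x (longitude) then y (latitude), so no axis swap is needed in the conversion.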
remote_process, the main R function inside PEcAn’s data.remote module, which controls RpTools and adds PEcAn dependencies. The R-Python interfacing is handled using reticulate.
remote_process checks the status of the requested data in the database (BETYdb), sets the stages, calls rp_control and finally inserts the output in BETYdb.
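The idea of checking what is already stored before deciding which stages to run can be sketched as below. This is an illustrative sketch only, not the logic of the actual R function: the names `missing_ranges` and `plan_stages`, the stage names, and the assumption that the database records a single contiguous date range per request are all hypothetical.

```python
from datetime import date, timedelta


def missing_ranges(req_start, req_end, have=None):
    """Return the date ranges that still need downloading.

    `have` is the (start, end) range already recorded, or None if nothing exists.
    """
    if have is None:
        return [(req_start, req_end)]
    have_start, have_end = have
    if have_end < req_start or have_start > req_end:
        # No overlap with the existing record: fetch the whole requested range.
        return [(req_start, req_end)]
    one_day = timedelta(days=1)
    gaps = []
    if req_start < have_start:
        gaps.append((req_start, have_start - one_day))
    if req_end > have_end:
        gaps.append((have_end + one_day, req_end))
    return gaps


def plan_stages(req_start, req_end, have=None):
    """Decide which (hypothetical) stages a request needs."""
    gaps = missing_ranges(req_start, req_end, have)
    return {
        "download": bool(gaps),                    # something is missing
        "merge": have is not None and bool(gaps),  # combine new output with old
        "ranges": gaps,
    }
```

For a request covering all of 2018 when the first half of the year is already stored, this plan asks for a download of 1 July to 31 December followed by a merge with the existing file.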
During the first phase, the Sentinel 2 and SNAP code provided by the PEcAn team was modified for use in this module. Then the initial version of the rp_control function (previously named remote_process) was created for managing the download and processing of data.
- https://github.com/PecanProject/pecan/pull/2634 [Merged]
- https://github.com/PecanProject/pecan/pull/2637 [Merged]
gee2pecan_l8 and gee2pecan_smap were implemented for retrieving Landsat and SMAP data, respectively, from Google Earth Engine. rp_control was improved so that new GEE collections can be added without any changes to the source code. Then the appeears2pecan function was implemented to download data from AppEEARS.
- https://github.com/PecanProject/pecan/pull/2642 [Merged]
- https://github.com/PecanProject/pecan/pull/2645 [Merged]
- https://github.com/PecanProject/pecan/pull/2659 [Merged]
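One common way to support new collections without touching the source code is to describe each collection in a data file and have the code look it up at run time. The sketch below shows that pattern under stated assumptions: the registry layout, its field names, and the functions `load_registry` and `build_request` are hypothetical and are not the format used by rp_control.

```python
import json

# Hypothetical registry; in practice this would be a JSON file that users edit.
REGISTRY_JSON = """
{
  "COPERNICUS/S2_SR": {"bands": ["B4", "B8"], "scale": 10},
  "LANDSAT/LC08/C01/T1_SR": {"bands": ["B4", "B5"], "scale": 30}
}
"""


def load_registry(text):
    """Parse the registry text into a dict keyed by collection name."""
    return json.loads(text)


def build_request(registry, collection, start, end):
    """Assemble the parameters for a download from a registered collection."""
    try:
        entry = registry[collection]
    except KeyError:
        raise ValueError(f"collection {collection!r} is not registered") from None
    return {
        "collection": collection,
        "bands": list(entry["bands"]),
        "scale": entry["scale"],
        "start": start,
        "end": end,
    }


registry = load_registry(REGISTRY_JSON)
request = build_request(registry, "COPERNICUS/S2_SR", "2019-01-01", "2019-12-31")
```

With this design, supporting another collection means adding one entry to the registry file; `build_request` itself stays unchanged.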
Functions were implemented for merging remote sensing data of the same type and for creating GeoJSON files from BETYdb geometry data. All of the Python code developed was packaged into the RpTools package. Finally, the main function, remote_process, was implemented in R; it added connections to the database (BETYdb) and made it possible to submit requests from the PEcAn workflow.
- https://github.com/PecanProject/pecan/pull/2672 [Approved, open]
Link to all PRs: https://github.com/PecanProject/pecan/pulls?q=is%3Apr+author%3Aayushprd
- Module for calculating uncertainties: in addition to the raw data download module and the data processing module, a third module could be developed to calculate the uncertainties in remote sensing data.
- Remote execution: PEcAn can be run on a remote server or HPC, and this module can be improved to support such use cases.
- Parallelization: some parts of this module can be modified to run in parallel or concurrently, which could reduce the time taken to download data for multiple sites.
I will continue to work on these in the coming weeks.
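As a rough illustration of the parallelization idea, the sketch below runs per-site downloads concurrently with a thread pool, which suits work that mostly waits on the network. It is a sketch under assumptions and not part of the module: `download_site` is a hypothetical placeholder that returns a made-up path instead of contacting any data source.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_site(site_id):
    """Hypothetical placeholder for a per-site download; returns a fake path."""
    return f"remote_data/site_{site_id}.nc"


def download_all(site_ids, worker=download_site, max_workers=4):
    """Run `worker` for every site concurrently.

    Returns (results, errors): results maps site id to the worker's output,
    errors maps site id to the exception raised for that site.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(worker, site): site for site in site_ids}
        for future in as_completed(futures):
            site = futures[future]
            try:
                results[site] = future.result()
            except Exception as exc:  # keep going if one site fails
                errors[site] = exc
    return results, errors
```

Collecting failures per site, instead of stopping at the first one, lets a multi-site request finish and report which sites need to be retried.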
PEcAn has the tools to run a model anywhere on Earth. With the development of this remote data module, it is now also possible to evaluate the model anywhere, because the module can retrieve remote sensing observations from different sources for any place on Earth.
I am deeply thankful to Istem Fer for her guidance throughout the project. Every piece of your feedback has helped me improve my skills tremendously. Thank you for suggesting ideas about developing the code in such a way that it could fit into the larger scheme. I thank Olli Nevalainen and Shawn Serbin for helping me with remote sensing issues and providing ideas to develop this module. Thank you, PEcAn Project, for this wonderful experience. I hope this project is my first step towards pursuing my interest in computer and environmental sciences.
These are some of the resources from which I benefited hugely during the course of this project.
- https://github.com/ollinevalainen/satellitetools by Olli Nevalainen: in addition to providing some of the code used in this project, this repository helped me a lot in learning about GEE.
- https://rabernat.github.io/research_computing_2018/xarray.html - for handling multidimensional data in Python using xarray.
- Although not directly related to this project, the materials of this ecosystem modelling course by Lund University helped me understand ecosystem modelling and also gave me some knowledge of the science behind some of the modules in PEcAn.