Streaming data from NASA's Earth Surface Mineral Dust Source Investigation (EMIT)
This proof-of-concept notebook demonstrates how earthaccess can facilitate the use of cloud-hosted NASA data with xarray and holoviews. For a formal tutorial on EMIT, please visit the official repository, where things are explained in detail: EMIT Science Tutorial
Prerequisites
- NASA EDL credentials
- Openscapes Conda environment installed
- For direct access this notebook should run in AWS
IMPORTANT: This notebook can run outside AWS, but doing so is not recommended because streaming HDF5 data out of region is slow.
from pprint import pprint
import earthaccess
import xarray as xr
print(f"using earthaccess version {earthaccess.__version__}")
auth = earthaccess.login()
using earthaccess version 0.18.1.dev26+g5d8ec2f6f
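earthaccess.login() tries the environment, netrc, and interactive strategies in turn. The sketch below checks whether the environment strategy has anything to work with before login is called. The variable names EARTHDATA_USERNAME, EARTHDATA_PASSWORD, and EARTHDATA_TOKEN are assumed from the earthaccess documentation, and edl_env_status is a hypothetical helper, not part of the library.

```python
import os

# Variable names assumed from the earthaccess documentation.
USER_VAR = "EARTHDATA_USERNAME"
PASSWORD_VAR = "EARTHDATA_PASSWORD"
TOKEN_VAR = "EARTHDATA_TOKEN"


def edl_env_status(env=None):
    """Describe which EDL credentials the given environment provides.

    Hypothetical helper: it only checks that values are present,
    not that Earthdata Login will accept them.
    """
    env = os.environ if env is None else env
    if env.get(TOKEN_VAR):
        return "token"
    if env.get(USER_VAR) and env.get(PASSWORD_VAR):
        return "username/password"
    return "missing"


print(f"EDL credentials in environment: {edl_env_status()}")
```

A result other than "missing" only means that values are set. Stale or mistyped credentials will still be rejected by Earthdata Login with an invalid_credentials error.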
Searching for the dataset with .search_datasets()
Note: See our API docs for details
results = earthaccess.search_datasets(short_name="EMITL2ARFL", cloud_hosted=True)
# Let's print our datasets
for dataset in results:
    pprint(dataset.summary())
{'cloud-info': {'Region': 'us-west-2',
                'S3BucketAndObjectPrefixNames': ['s3://lp-prod-protected/EMITL2ARFL.001',
                                                 's3://lp-prod-public/EMITL2ARFL.001'],
                'S3CredentialsAPIDocumentationURL': 'https://data.lpdaac.earthdatacloud.nasa.gov/s3credentialsREADME',
                'S3CredentialsAPIEndpoint': 'https://data.lpdaac.earthdatacloud.nasa.gov/s3credentials'},
 'concept-id': 'C2408750690-LPCLOUD',
 'file-type': "[{'FormatType': 'Native', 'AverageFileSize': 1.8, 'Format': "
              "'netCDF-4', 'TotalCollectionFileSizeBeginDate': "
              "'2022-08-09T00:00:00.000Z', 'FormatDescription': 'Network "
              "Common Data Format Version 4', 'AverageFileSizeUnit': 'GB', "
              "'Media': ['Earthdata Cloud', 'HTTPS']}]",
 'get-data': ['https://search.earthdata.nasa.gov/search/granules?p=C2408750690-LPCLOUD',
              'https://appeears.earthdatacloud.nasa.gov/'],
 'short-name': 'EMITL2ARFL',
 'version': '001'}
Searching for the data with .search_data() over Ecuador
# ~Ecuador = -82.05,-3.17,-76.94,-0.52
granules = earthaccess.search_data(
    short_name="EMITL2ARFL",
    bounding_box=(-82.05, -3.17, -76.94, -0.52),
    count=10,
)
print(len(granules))
10
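The bounding_box tuple is ordered as lower-left longitude, lower-left latitude, upper-right longitude, upper-right latitude, which is easy to get wrong. A minimal sketch of a sanity check that could run before the search is shown below; validate_bbox is a hypothetical helper, not an earthaccess function.

```python
def validate_bbox(bbox):
    """Return bbox unchanged if it looks like (west, south, east, north).

    Hypothetical helper. West may be greater than east, because a box
    is allowed to cross the antimeridian.
    """
    west, south, east, north = bbox
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError(f"longitudes must be within [-180, 180]: {bbox}")
    if not (-90 <= south <= 90 and -90 <= north <= 90):
        raise ValueError(f"latitudes must be within [-90, 90]: {bbox}")
    if south > north:
        raise ValueError(f"south latitude is greater than north: {bbox}")
    return bbox


# The Ecuador box used in this notebook passes the check.
ecuador = validate_bbox((-82.05, -3.17, -76.94, -0.52))
print(ecuador)
```

The validated tuple can then be passed as bounding_box=ecuador to earthaccess.search_data().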
earthaccess can print a preview of the data using the metadata from CMR
Note: there is a bug in earthaccess where the reported size of the granules is always 0; a fix is coming next week.
granules[7]
Data: EMIT_L2A_RFL_001_20230304T151234_2306310_003.nc, EMIT_L2A_RFLUNCERT_001_20230304T151234_2306310_003.nc, EMIT_L2A_MASK_001_20230304T151234_2306310_003.nc
Size: 3578.78 MB
Cloud Hosted: True
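Each granule in this collection carries three files: reflectance (RFL), reflectance uncertainty (RFLUNCERT), and mask (MASK). The sketch below selects files by product instead of relying on list position. The file-name pattern is inferred only from the three names in the preview above, and emit_product and pick_product are hypothetical helpers.

```python
def emit_product(link):
    """Extract the product code (e.g. 'RFL') from an EMIT L2A file name or URL.

    Assumes names of the form EMIT_L2A_<PRODUCT>_<version>_..., as seen in
    the granule preview; this is not an official naming specification.
    """
    name = link.rsplit("/", 1)[-1]
    parts = name.split("_")
    if len(parts) < 4 or parts[0] != "EMIT":
        raise ValueError(f"unexpected EMIT file name: {name}")
    return parts[2]


def pick_product(links, product="RFL"):
    """Keep only the links whose file name matches the requested product."""
    return [link for link in links if emit_product(link) == product]


files = [
    "EMIT_L2A_RFL_001_20230304T151234_2306310_003.nc",
    "EMIT_L2A_RFLUNCERT_001_20230304T151234_2306310_003.nc",
    "EMIT_L2A_MASK_001_20230304T151234_2306310_003.nc",
]
print(pick_product(files, "RFLUNCERT"))
```

With real search results, the input list would come from each granule's data_links() method.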
Streaming data from S3 with fsspec
Opening the data with earthaccess.open() and accessing the NetCDF as if it were local
If we run this code in AWS (us-west-2), earthaccess can use direct S3 links. If we run it outside AWS, earthaccess can only use HTTPS links. Direct S3 access to NASA data is only allowed in region.
# open() accepts a list of results or a list of links
file_handlers = earthaccess.open(granules)
file_handlers
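To see which kind of link a given URL is, its scheme can be inspected. This is a small sketch using only the standard library; access_kind is a hypothetical helper, the labels "direct" and "external" are borrowed from earthaccess terminology, and the example URLs are taken from the dataset summary printed earlier.

```python
from urllib.parse import urlparse


def access_kind(url):
    """Classify a data link by URL scheme (hypothetical helper)."""
    scheme = urlparse(url).scheme
    if scheme == "s3":
        return "direct"  # only usable from inside AWS us-west-2
    if scheme in ("http", "https"):
        return "external"  # works anywhere, but streams more slowly
    raise ValueError(f"unsupported link scheme: {url}")


print(access_kind("s3://lp-prod-protected/EMITL2ARFL.001"))
print(access_kind("https://data.lpdaac.earthdatacloud.nasa.gov/s3credentials"))
```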
%%time
# we can use any file from the array
file_p = file_handlers[4]
refl = xr.open_dataset(file_p)
wvl = xr.open_dataset(file_p, group="sensor_band_parameters")
loc = xr.open_dataset(file_p, group="location")
ds = xr.merge([refl, loc])
ds = ds.assign_coords(
    {
        "downtrack": (["downtrack"], refl.downtrack.data),
        "crosstrack": (["crosstrack"], refl.crosstrack.data),
        **wvl.variables,
    },
)
ds
Plotting non-orthorectified data
Use the following code to plot the Panel widget when you run this code on AWS us-west-2
import holoviews as hv
import hvplot.xarray
import numpy as np
import panel as pn
pn.extension()
# Find band nearest to value of 850 nm (NIR)
b850 = np.nanargmin(abs(ds["wavelengths"].values - 850))
ref_unc = ds["reflectance_uncertainty"]
# b850 is a positional index, so select the band with isel
image = ref_unc.isel(bands=b850).hvplot("crosstrack", "downtrack", cmap="viridis")
stream = hv.streams.Tap(source=image, x=255, y=484)
def wavelengths_histogram(x, y):
    histo = ref_unc.sel(crosstrack=x, downtrack=y, method="nearest").hvplot(
        x="wavelengths", color="green"
    )
    return histo
tap_dmap = hv.DynamicMap(wavelengths_histogram, streams=[stream])
pn.Column(image, tap_dmap)
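The band lookup used in the plotting code can be checked in isolation, without opening any remote data. The sketch below wraps the same np.nanargmin expression in a hypothetical helper, nearest_band. The wavelength values are made up for illustration and are not real EMIT band centers.

```python
import numpy as np


def nearest_band(wavelengths, target_nm):
    """Return the positional index of the band closest to target_nm.

    NaN entries are ignored, mirroring the np.nanargmin call used above.
    """
    wavelengths = np.asarray(wavelengths, dtype=float)
    return int(np.nanargmin(np.abs(wavelengths - target_nm)))


# Made-up wavelengths in nm, including one missing value.
wavelengths = np.array([381.0, 620.5, np.nan, 847.1, 854.6, 2493.0])
idx = nearest_band(wavelengths, 850)
print(idx, wavelengths[idx])
```

Because the result is a position rather than a label, it pairs naturally with positional selection on the bands dimension.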
