Accessing remote files with earthaccess
When we search for data using earthaccess, we get back a list of results from NASA's Common Metadata Repository (CMR). These results contain all the information
we need to access the files that the metadata describes. earthaccess offers two access methods that operate on these results. The first is the familiar download(),
which copies the files from their remote location to our local disk. If we are running the code in AWS, say on a JupyterHub, the files are copied to the local VM disk.
The other method is open(), which uses fsspec to open remote files as if they were local. open() has advantages and disadvantages that we should understand before using it.
The main advantage of open() is that we don't have to download the file: we can stream it into memory. However, depending on how we read it, we may run into network performance issues. Again, if we run the code next to the data, access will be fast; if we run it on our laptops, it will be slow.
import earthaccess
auth = earthaccess.login()
results = earthaccess.search_data(
    short_name="ATL06",
    cloud_hosted=False,
    temporal=("2019-01", "2019-02"),
    polygon=[(-100, 40), (-110, 40), (-105, 38), (-100, 40)],
)
results[0]
nsidc_url = "https://n5eil01u.ecs.nsidc.org/DP7/ATLAS/ATL06.005/2019.02.21/ATL06_20190221121851_08410203_005_01.h5"
lpcloud_url = "https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL2ARFL.001/EMIT_L2A_RFL_001_20220903T163129_2224611_012/EMIT_L2A_RFL_001_20220903T163129_2224611_012.nc"
session = earthaccess.get_requests_https_session()
headers = {"Range": "bytes=0-100"}
r = session.get(lpcloud_url, headers=headers)
r
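The request above asks for only part of the file through the HTTP Range header. Both ends of a byte range are inclusive, so "bytes=0-100" asks for 101 bytes, not 100. The helper below is a small sketch of how such a header could be built from a start offset and a length; the name byte_range_header is hypothetical and not part of earthaccess or requests.

```python
def byte_range_header(start: int, length: int) -> dict[str, str]:
    """Build an HTTP Range header asking for `length` bytes from offset `start`.

    Hypothetical helper for illustration only; it is not an earthaccess function.
    """
    if start < 0:
        raise ValueError("start must be non-negative")
    if length <= 0:
        raise ValueError("length must be positive")
    # The end position in a Range header is inclusive, hence the "- 1".
    end = start + length - 1
    return {"Range": f"bytes={start}-{end}"}


# The header used in the cell above corresponds to 101 bytes starting at 0.
headers = byte_range_header(0, 101)
print(headers)
```

The resulting dictionary can be passed as the headers argument of session.get(). A server that honors the range typically answers with status 206 (Partial Content) instead of 200.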
fs = earthaccess.get_fsspec_https_session()
with fs.open(lpcloud_url) as f:
    data = f.read(100)
data
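Reading the first few bytes, as in the cell above, is enough to inspect a file's header without transferring the rest. The sketch below shows one possible use: checking whether a file-like object starts with the 8-byte HDF5 signature, which netCDF-4 files also carry. It runs offline, with io.BytesIO standing in for the object returned by fs.open(); the function name is hypothetical, and the check only looks at offset 0.

```python
import io

# The 8-byte signature that begins an HDF5 file (netCDF-4 files use the same container).
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def starts_with_hdf5_signature(f) -> bool:
    """Return True if the binary file-like object `f` begins with the HDF5 signature.

    Hypothetical helper for illustration; only offset 0 is checked.
    """
    f.seek(0)
    return f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE


# Stand-in for a remote file: an in-memory buffer, so no network access is needed.
fake_remote_file = io.BytesIO(HDF5_SIGNATURE + b"\x00" * 92)
print(starts_with_hdf5_signature(fake_remote_file))
```

With a real remote file, the same function could be called on the object yielded by fs.open(lpcloud_url), in which case only the requested bytes need to be fetched, depending on how fsspec caches reads.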
%%time
import xarray as xr
files = earthaccess.open(results[0:2])
ds = xr.open_dataset(files[0], group="/gt1r/land_ice_segments")
ds

