Conversation

@ThomasMGeo

This PR adds IO for the GXF file format, a common file type for grav/mag data in USGS releases. It's an ASCII format with a variable header structure.

On the dataset I tested this on, the grid comes out upside down when decoded to xarray. That's not impossible to fix (see the sketch below), but I do like keeping the option of NumPy-plus-metadata output.
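For reference, the flip itself would be a one-liner once the grid is in xarray (sketch; assumes the vertical coordinate ends up named northing):

# Reverse the row order so the grid comes out right side up.
grid = grid.isel(northing=slice(None, None, -1))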

I bet this could be optimized further; I'll try to find more datasets to test with (the smaller the better!). Any comments/optimizations are welcome.

Relevant issues/PRs: this addresses #538, which has some more information about the format itself.

@ThomasMGeo
Author

I found a small file that works from here: https://pubs.usgs.gov/of/2000/ofr-00-0198/html/wyoming.htm

Specifically: mag5001_gxf.gz | 2000-04-24 16:50 | 57K

Potential citation:

Kucks, R.P., and Hill, P.L., 2000, Wyoming aeromagnetic and gravity maps and data—A web site for distribution of data: U.S. Geological Survey Open-File Report 00-0198, https://pubs.usgs.gov/of/2000/ofr-00-0198/html/wyoming.htm

@ThomasMGeo
Author

ToDo: Licensing of data file from USGS.

@ThomasMGeo
Author

ToDo: Add attribution to gist.

@ThomasMGeo
Author

For USGS data, it looks like we just need to add attribution as well: https://www.usgs.gov/faqs/are-usgs-reportspublications-copyrighted

@ThomasMGeo
Author

@santisoler I added some text about the USGS; the attribution to the gist was already there. Let me know if I need to do anything else! I probably won't edit this further until I have some more feedback.

@ThomasMGeo
Author

@santisoler, is there anything else you want to see?

@santisoler
Member

Hi @ThomasMGeo. Thanks for pushing these changes. Let me take a look at this and I'll come back to you.

@santisoler
Member

Thanks @ThomasMGeo for opening this PR!

I took a quick look at it and I have the following comments.

I'm not sure what the benefit is of not making read_gxf always return an xarray.DataArray. Once we have the DataArray, it's trivial to get the underlying Numpy array and the metadata as a dict. And by doing so, we only need to add one public function to Harmonica (read_gxf) instead of two (read_gxf and gxf_to_xarray, which has only a single use). In my opinion we should make read_gxf behave like the other readers in Harmonica and return a DataArray.
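For example (hypothetical usage of the read_gxf this PR adds, assuming it returns a DataArray):

import harmonica as hm

grid = hm.read_gxf("mag5001.gxf")
array = grid.values          # the underlying Numpy array
metadata = dict(grid.attrs)  # the header metadata as a plain dict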

BTW, we need to remember to add any new public function to harmonica/__init__.py and to the API reference in doc/api/index.rst.

One little thing to note is that the docstrings should follow numpydoc's style, so they can be nicely rendered in Sphinx.

I left some more comments below.

Let me know what you think. We can iterate on it later! And feel free to ask for help if you need it.

Comment on lines 10 to 15
1. Grid eXchange Format (*.gxf)

GXF (Grid eXchange File) is a standard ASCII file format for
exchanging gridded data among different software systems.
Software that supports the GXF standard will be able to import
properly formatted GXF files and export grids in GXF format.
@santisoler
Member

I suspect that these paragraphs are coming from this PDF: https://pubs.usgs.gov/of/1999/of99-514/grids/gxf.pdf. Am I right?

Do we have permission to reproduce the same text here? Can we double-check the license of those files? If in doubt, I would remove these lines from this file; a reference should be enough. If we do have permission we can leave them, but I would make sure we are compliant with the license (reference the document, acknowledgement, include the license text, etc.).

@ThomasMGeo
Author

I am happy to remove it.

Comment on lines 187 to 188
if reading_data:
    data_list.append(line)
@santisoler
Member
Nov 26, 2024

From what I understood, these lines populate data_list with the values of the grid. Since the number of data lines will usually be much larger than the number of header lines, why not use np.loadtxt to read them? It performs much faster than appending lines to a list, already returns an array in the right shape, and has some sanity checks (that the number of elements per row is consistent) that we then don't have to take care of.

I think we could read the header lines until we find the #GRID line, keep a record of the number of header rows, and stop the iteration. Then we can use np.loadtxt to read the rest of the file by passing that count to skiprows (+/- 1; we need to figure out the correct indexing). A rough sketch is below.
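Something like this (hypothetical helper; assumes every data row holds the same number of values so np.loadtxt can parse the block):

import numpy as np

def read_grid_values(infile, nrows, ncols):
    # Count header lines up to and including the #GRID marker.
    header_lines = 0
    with open(infile, "r") as f:
        for header_lines, line in enumerate(f, start=1):
            if line.strip() == "#GRID":
                break
    # Let np.loadtxt parse the data block that follows the header.
    values = np.loadtxt(infile, skiprows=header_lines)
    return values.reshape(nrows, ncols)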

Comment on lines 238 to 240
# Get grid dimensions
nrows = int(headers['ROWS'])
ncols = int(headers['POINTS'])
@santisoler
Member

In lines https://github.com/fatiando/harmonica/pull/539/files#diff-5ac31ec9d8d11b47ff39f59bf25e1cad43484857580dd7b6c65cddd8e0f504cfR219-R229 we populate the metadata dictionary with parsed values (float, int, stripped str). Isn't it better to use metadata to get the number of rows and columns, instead of having to parse them again as ints?

That being said, maybe we want the keys in metadata to be lowercase (as you do in the following lines with nx, ny, x_inc, etc.).
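Something along these lines (sketch; key names are hypothetical, assuming the values were already parsed as ints when metadata was built):

# Reuse the already-parsed metadata instead of casting again.
nrows = metadata["rows"]
ncols = metadata["points"]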

Comment on lines 283 to 308
def get_grid_info(metadata: Dict[str, Any]) -> None:
    """
    Print comprehensive information about the GXF grid.

    Parameters:
        metadata (dict): Metadata dictionary from read_gxf
    """
    print("=== Grid Information ===")
    print(f"Title: {metadata.get('TITLE', 'Not specified')}")
    print(f"Grid size: {metadata['nx']} x {metadata['ny']} points")
    print(f"Cell size: {metadata['x_inc']} x {metadata['y_inc']}")
    print(f"Origin: ({metadata['x_min']}, {metadata['y_min']})")
    print(f"Rotation: {metadata.get('ROTATION', 0)}")
    print(f"Dummy value: {metadata.get('DUMMY', 'Not specified')}")

    if 'projection' in metadata:
        print("\n=== Projection Information ===")
        print(f"Type: {metadata['projection']['type']}")
        print(f"Units: {metadata['projection']['units']}")

        params = metadata['projection']['parameters']
        if any(params.values()):
            print("\nProjection Parameters:")
            for key, value in params.items():
                if value is not None:
                    print(f"{key}: {value}")
@santisoler
Member

I think this function is nice for debugging the reader. But I hardly see users using it (we would need to expose it as well). Instead, if read_gxf returns a DataArray, then we don't need this function: xarray can nicely display all the data.

What do you think? Is there any special reason to include this function?

@ThomasMGeo
Author

We can drop this!

@ThomasMGeo
Author

Hi @santisoler, thanks for the review. I just want to touch on a few comments before I dig into the rest of the issues.

> I'm not sure what the benefit is of not making read_gxf always return an xarray.DataArray

I am happy to have it be a very plain DataArray without the CRS (usually the CRS is not given as a code in the files that I have seen). For some of my work, I have had to do some weird warping to get them to match, but this might be rare and isolated.

The get_grid_info function was just used to figure out the projections (again, these are not standard CRSs with modern EPSG codes from what I have seen). So my general workflow would be to load it as a numpy array plus metadata, look at that output, and then load it into xarray with the EPSG code attached (roughly like the sketch below).
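Roughly (sketch; the EPSG code is just an example, assumes read_gxf returns a DataArray, and uses rioxarray to attach the CRS worked out by hand):

import harmonica as hm
import rioxarray  # noqa: F401  (registers the .rio accessor)

grid = hm.read_gxf("mag5001.gxf")
print(grid.attrs)  # eyeball the projection metadata
grid = grid.rio.write_crs("EPSG:32613")  # attach the CRS figured out manually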

Let me know what you think is best, and I can tackle it!

@santisoler
Member

Hi @ThomasMGeo. I see your point. Since the gxf files don't have a strict specification for setting the CRS, I think we should just add to the attrs of the xarray.DataArray whatever information we can find in the header of the file. If that information is enough to determine the CRS, that's cool. But if it's not, too bad. I don't think there's anything the reader function can do to solve that, am I right?

I do like having the information about the projection in the metadata. A nested dict (like the one you already implemented) should be OK (we might want to make sure that structure follows the CF metadata conventions).

Another thing we discussed today during the meeting is to use easting and northing for the x and y directions, but use x and y in case the grid is rotated (very much like the read_oasis_grd function does). Something like the sketch below.
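Maybe this (sketch; assumes the rotation angle ends up in the parsed metadata under a rotation key):

# easting/northing for unrotated grids, plain x/y otherwise,
# mirroring read_oasis_grd.
dims = ("northing", "easting") if metadata["rotation"] == 0 else ("y", "x")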

@ThomasMGeo
Author

Awesome, that all makes sense. Let me work on it for a bit and get back to you. Thanks for giving it a look and for all the feedback! :)

@ThomasMGeo
Author

OK, it's been a long time, but check out some of the changes. Happy for any feedback!

@santisoler
Member

Hi @ThomasMGeo. Thanks for pinging me back. I'm sorry for the delayed reply. I'll try to provide some feedback soon.

In the meantime, I noticed that the grid values in the GXF files are not laid out with a number of rows and columns that matches the shape of the grid. They seem to be a flattened array with 5 values per row, plus some rows with fewer than 5 values. That's annoying, and it basically rules out my idea of using np.loadtxt: we cannot use it in the general case, only when the number of points per row is a multiple of 5. Sorry for bringing that up. A possible workaround is sketched below.
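For the general case we'd probably have to collect the values ourselves and reshape at the end, something like (sketch; assumes infile, nrows and ncols are already known):

import numpy as np

values = []
in_data = False
with open(infile, "r") as f:
    for line in f:
        if in_data:
            # Collect every value, however many sit on each row.
            values.extend(float(v) for v in line.split())
        elif line.strip() == "#GRID":
            in_data = True
grid = np.asarray(values).reshape(nrows, ncols)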

Also, I think we can safely remove the read_gxf_raw function, right? I don't see it being used elsewhere.

I'll leave a comment below with one more thing I saw while quickly looking at the code.

I'll try to come back soon to this PR.

Comment on lines 232 to 233
with open(infile, 'r') as f:
    lines = f.readlines()
@santisoler
Member

I'd recommend avoiding this. readlines loads every line of the file into memory as a string, so for a large file we'd use more memory than needed. I'd recommend iterating over the lines one by one instead. Something like:

with open(infile, "r") as f:
    for line in f:
        ...

@ThomasMGeo
Author

This should be fixed!

@ThomasMGeo
Copy link
Author

Deleted the unnecessary function and test.

Pulling imports to the top.
