Access Record fields as Python attributes #132

philippkraft · 2018-01-15T13:00:02Z

I've extended the shapefile module so that a record is no longer just a list but an instance of a new Record class. The Record class inherits from list and therefore preserves backwards compatibility. However, the new record can also be accessed in other ways:

>>> r = shp.record(3)
>>> r[2] == r['BKG_KEY'] == r.BKG_KEY

Usage and test in altered README.rst
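
The three access styles shown above can be sketched with a small list subclass. This is only an illustration under stated assumptions, not the code from this PR: the class name `DemoRecord`, its constructor arguments, and all sample values and field names other than `BKG_KEY` are hypothetical.

```python
class DemoRecord(list):
    """Hypothetical list subclass; NOT the Record class from the PR."""

    def __init__(self, values, field_positions):
        super().__init__(values)
        # Mapping of field name -> index in the value list.
        self._field_positions = field_positions

    def __getitem__(self, key):
        # Field names are translated to positions; ints and slices
        # behave exactly as they do on a plain list.
        if isinstance(key, str):
            try:
                key = self._field_positions[key]
            except KeyError:
                raise IndexError("no field named %r" % key) from None
        return super().__getitem__(key)

    def __getattr__(self, name):
        # Only called when normal attribute lookup has already failed.
        positions = self.__dict__.get("_field_positions", {})
        if name in positions:
            return list.__getitem__(self, positions[name])
        raise AttributeError(name)


# Made-up sample data for illustration only.
positions = {"ID": 0, "NAME": 1, "BKG_KEY": 2}
r = DemoRecord([7, "Lahn", "DE07"], positions)
same_value = r[2] == r["BKG_KEY"] == r.BKG_KEY
```

Because `DemoRecord` is still a list, code that indexes, slices, or iterates over records by position keeps working unchanged.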


karimbahgat · 2018-06-06T18:41:34Z

Thanks for this, and apologies for the late reply (#145). I agree that retrieving record values in real-world scenarios is not currently the most intuitive, so a new Record class sure could be a useful addition. I might be open to this idea, but there are some issues with the PR that I'm not yet sure how I feel about:

I think this could be rewritten with only the bare necessities, least amount of changes/new classes/new args/possible breaks. I would prefer this rewritten as just a Record class, initialized using the row values and the shapefile fields list directly. No namedtuples, no RecordFactory. Just a Record class with a fields list and a value list as args; that should be sufficient to retrieve record values based on field names, which is the main point of the PR.
Not sure I see the point of storing a record 'oid' (which scenarios would this be useful?), it might add more complexity than it's worth (e.g. would it break if the sequence of rows are changed, what about writing records, etc.). But I would be happy to hear arguments why it would be a good idea.
While I agree that accessing the values as attributes is certainly easier than with name indexing, this might prove problematic. Since the Record() class is supposed to subclass 'list' for compatibility, this would mean that when trying to access any field whose name matches a list property or method ('index', 'append', etc.), one would get those internal values or methods instead of the record value. This might prove confusing and mask the real problem for some users. But maybe it's an acceptable problem; I would be curious to hear if other libs use this approach and how they handle it (e.g. I wonder if Pandas uses this type of row value access?).
The PR also duplicated a previous PR and now has a conflict.
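
The name-clash concern raised above can be reproduced with a few lines. The sketch below is an assumption-laden illustration, not the PR's code: `DemoRecord` and the field names `index` and `label` are hypothetical. It relies only on the documented Python rule that `__getattr__` is consulted after normal attribute lookup fails, so existing list methods always win.

```python
class DemoRecord(list):
    """Hypothetical list subclass with attribute access to fields."""

    def __init__(self, values, field_names):
        super().__init__(values)
        self._positions = {name: i for i, name in enumerate(field_names)}

    def __getattr__(self, name):
        # Reached only if 'name' is not a regular attribute or method.
        positions = self.__dict__.get("_positions", {})
        if name in positions:
            return list.__getitem__(self, positions[name])
        raise AttributeError(name)


# A field that happens to be called 'index' collides with list.index.
rec = DemoRecord([12, "A-7"], ["index", "label"])

shadowed = rec.index     # the bound list.index method, not the value 12
by_position = rec[0]     # positional access still returns the field value
unaffected = rec.label   # no clash, so attribute access works
```

Positional access (and name-based indexing, where it is offered) is not affected by the clash, so a value hidden behind a list method stays reachable.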

I was considering implementing the simpler version of this, but was curious about some of these issues, and also need some more time to think/discuss if there might be some scenarios that would create compatibility problems or if handling the Record class might prove less intuitive than just a simple list.

Looking forward to hearing your thoughts on this, and potentially an updated PR.

philippkraft · 2018-06-06T19:45:00Z

Great to hear that this project is not dead! Currently I am quite busy and can work on this only at a slow pace - and I have to recall my design decisions. Hopefully we are not in a hurry.

Work for me:

tidy up the PR (it was one of my first PRs on GitHub ever)
Write more intentions into comments, and recall those intentions

When this is done, I would love to discuss your issues with the PR. Is this ok for you?


philippkraft · 2018-06-07T07:48:27Z

I'll start answering from the bottom.

The PR also duplicated a previous PR and now has a conflict.

The conflicts are resolved and the PR is updated to the recent state of development.

philippkraft · 2018-06-07T08:25:06Z

While I agree that accessing the values as attributes is certainly easier than with name indexing, this might prove problematic

This is a design choice: some people love this approach (like me, obviously), others see mainly the problems. Pandas and numpy.recarrays also use it for column access, and I have no problems with that. Personally, I make heavy use of orderedattrdict from the package of the same name, which offers this as well. For working in interactive mode, this approach is much more natural, especially if it is programmed in a way that code completion works. With the current implementation of this proposal, the fields are not available for code completion, but I will try to implement that.
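
One common way to expose dynamically resolved attributes to tab completion is to override `__dir__`, which `dir()` and most interactive shells consult. The sketch below shows that general technique; whether the PR implements code completion this way is an assumption, and `DemoRecord` and the sample field names are hypothetical.

```python
class DemoRecord(list):
    """Hypothetical list subclass whose fields show up in dir()."""

    def __init__(self, values, field_names):
        super().__init__(values)
        self._positions = {name: i for i, name in enumerate(field_names)}

    def __getattr__(self, name):
        positions = self.__dict__.get("_positions", {})
        if name in positions:
            return list.__getitem__(self, positions[name])
        raise AttributeError(name)

    def __dir__(self):
        # Default listing (list methods etc.) plus every field name that
        # is a valid Python identifier and hence usable as an attribute.
        positions = self.__dict__.get("_positions", {})
        fields = [name for name in positions if name.isidentifier()]
        return list(super().__dir__()) + fields


rec = DemoRecord([1, "DE07", 3.5], ["ID", "BKG_KEY", "2ND_VALUE"])
listing = dir(rec)
```

Field names that are not valid identifiers, such as `2ND_VALUE` here, are left out of the listing because they cannot be written as `rec.NAME` anyway.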

philippkraft · 2018-06-07T10:44:46Z

I think this could be rewritten with only the bare necessities, least amount of changes/new classes/new args/possible breaks.

I fully understand the concerns, but I would add memory overhead as a further one, since shapefiles can become quite large. The separation into Record and RecordFactory is in fact a memory conservation trick: the names of the fields are stored only once in the RecordFactory, and each Record holds only the values and a single pointer to the factory. There is no problem with replacing the factory with the Reader itself, but I wanted to keep the changes to Reader / Writer as small as possible. Another way to do it would be for the Reader to create a custom Record class (not object) with the field names. That way the field names would be stored in the class and not in the factory (as now) or replicated in the objects (as in your proposal). This makes it possible to avoid the extra class, but at the price of more Python magic. However, the user of the library would not need to know about _RecordFactory.
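
The alternative mentioned above, a record class created at runtime that carries the field names itself, could look roughly like the following. This is a sketch under assumptions, not what the PR does: `make_record_class`, `DemoRecord`, and `field_positions` are hypothetical names, and the built-in `type()` call is the "Python magic" in question.

```python
def make_record_class(field_names):
    """Hypothetical helper: build a list subclass for one set of fields."""
    positions = {name: i for i, name in enumerate(field_names)}

    def __getattr__(self, name):
        # Called only when normal lookup fails; resolve field names.
        try:
            return list.__getitem__(self, positions[name])
        except KeyError:
            raise AttributeError(name) from None

    namespace = {
        "__slots__": (),               # no per-instance __dict__
        "field_positions": positions,  # stored once, on the class
        "__getattr__": __getattr__,
    }
    return type("DemoRecord", (list,), namespace)


# One class per shapefile; every record instance holds only its values.
RecordClass = make_record_class(["ID", "NAME", "BKG_KEY"])
first = RecordClass([1, "Lahn", "DE07"])
second = RecordClass([2, "Fulda", "DE08"])
shared = type(first).field_positions is type(second).field_positions
```

With the empty `__slots__`, instances carry neither a field list nor a pointer to a factory; the cost is that a new class object exists for every opened file.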

About possible breaks, if we write the right tests for this functionality, we should be rather safe from extra breaks. And in the end, a record IS still a list - just with more features.

philippkraft · 2018-06-07T11:35:42Z

Not sure I see the point of storing a record 'oid' (which scenarios would this be useful?)

The reason I put it in was to have a simple way to access the shape (shp.shape(r.oid)), although it looks like I never implemented that possibility. The whole shapefile relies on the oid system: shapes, records and the shape index all use the position in the shapefile as their index. I guess (and I have to guess) that is the reason I implemented it. But it is an implementation detail I do not need for this proposal. I can delete it, but I do not see it doing any harm.

would it break if the sequence of rows are changed

The sequence of rows never changes in a shapefile - it would not be the same shapefile anymore (would need to rewrite .shp, .dbf and .shx)

what about writing records

Until now, the 2 classes _Record and _RecordFactory do not play a role in the writer, although they could as a return value of the Writer.record method. One could assign the correct value of oid in that function.


philippkraft · 2018-06-08T11:52:57Z

Another option is to integrate the functionality of _RecordFactory into Reader. If you like that, I can implement it easily enough.

# Conflicts: # shapefile.py # shapefiles/test/dtype.dbf # shapefiles/test/dtype.shp # shapefiles/test/dtype.shx # shapefiles/test/line.dbf # shapefiles/test/line.shp # shapefiles/test/line.shx # shapefiles/test/point.dbf # shapefiles/test/point.shp # shapefiles/test/point.shx # shapefiles/test/polygon.dbf # shapefiles/test/polygon.shp # shapefiles/test/polygon.shx

philippkraft · 2018-06-20T09:32:04Z

Simplified the _Record class and removed _RecordFactory. Now shapefile.Reader just gives each _Record object a reference to the field-to-position dictionary.

karimbahgat · 2018-08-07T15:51:47Z

I will take a look at merging this in the next few weeks and at resolving the remaining conflicts. I haven't looked at your newest changes yet, but my initial thought regarding using the factory to avoid repeated references to the fields is that setting a fields attribute on the Record class directly will not actually create any copies; it will rather refer to the memory address of the list attached to the parent Reader class. But I will look at this in more detail shortly.
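
The point about references versus copies can be checked directly, since Python attribute assignment binds a name to an existing object rather than duplicating it. The sketch below uses hypothetical stand-ins (`DemoReader`, `DemoRecord`, made-up field names) and is not the library's code.

```python
class DemoReader:
    """Hypothetical stand-in for a reader that owns the fields list."""

    def __init__(self, fields):
        self.fields = fields


class DemoRecord(list):
    """Hypothetical record that is handed the reader's fields list."""

    def __init__(self, values, fields):
        super().__init__(values)
        self.fields = fields  # binds a reference; nothing is copied


reader = DemoReader(["ID", "NAME", "BKG_KEY"])
records = [
    DemoRecord([i, "name%d" % i, "key%d" % i], reader.fields)
    for i in range(1000)
]

# Every record points at the very same list object as the reader.
shares_one_list = all(rec.fields is reader.fields for rec in records)

# A change made through the reader is therefore visible from any record.
reader.fields.append("EXTRA")
seen_from_record = records[0].fields[-1]
```

Each record still pays for one pointer (and, without `__slots__`, for its own instance dictionary), but the field names themselves exist only once.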


karimbahgat · 2018-09-02T02:36:50Z

The merge errors were caused by some strange diff issue, so I had to manually find the lines you added. I had to reimplement it with some tweaks (with credits to you) in 0428e67, and it is now available in v2.0.0. Again, thanks for this @philippkraft!

megies and others added 11 commits October 15, 2017 13:42

make all classes inherit from object

049c0f0

safer closing of files in Reader.close()

628443d

see GeospatialPython#107

make convenience mapping of shape type names/numbers

581a8d5

implement Reader.__str__() printing a general overview of the shapefile

aa333f8

add gitignore to ignore files generated during running tests

6439fc5

fix setting up SHAPE_TYPES convenience mapping on Py3

26f8937

When the dbf header is read, a RecordFactory is created from the fields and the filename. DBF-Records are now not returned as a list but as a Record. Since Record is a subclass of list, no break should happen.

dc9e193

Extended the Record by oid, fields property and as_dict method

c363214

Added Record usage to README.md and fixed a minor regression bug.

791b144

Merge branch 'context_manager' of https://github.com/megies/pyshp int…

8a47adc

…o megies-context_manager # Conflicts: # shapefile.py

All tests successfull

a887277

philippkraft mentioned this pull request Jan 15, 2018

Drop the geo_shapereader module philippkraft/cmf#7

Closed

karimbahgat added enhancement question labels Jun 6, 2018

philippkraft added 3 commits June 7, 2018 08:45

Merge branch 'master' of https://github.com/GeospatialPython/pyshp

649fe06

# Conflicts: # README.md # shapefile.py

Added more docstrings to explain the code intention better

0041f48

Marked module private use of _RecordFactory and _Record by renaming

c16d792

philippkraft added 2 commits June 7, 2018 13:44

Added code completion capacity for _Records and changed tab to space in README.md

00dd948

Removed _Record.__repr__ to fix failing tests (they expect the return value of list.__repr__)

23e50d9

philippkraft added 2 commits June 20, 2018 09:55

Simplified the _Record class and removed _RecordFactory

cdeb0dd

karimbahgat added a commit that referenced this pull request Aug 30, 2018

Access record fields as attributes (#132)

0428e67

Also accessing a record's "oid" position attribute. Thanks to @philippkraft and #132. Reworked to resolve merge conflicts.

karimbahgat closed this Sep 2, 2018
