
Access Record fields as Python attributes #132


Closed
wants to merge 18 commits into from


philippkraft

I've extended the shapefile module so that a record is no longer just a list but an instance of a new Record class. Because Record inherits from list, backwards compatibility is preserved. In addition, the new record can be accessed in other ways:

>>> r = shp.record(3)
>>> r[2] == r['BKG_KEY'] == r.BKG_KEY

Usage and tests are in the updated README.rst.
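A minimal sketch of the kind of class described above, assuming the field names are known when the record is built. The `Record(field_names, values)` signature and the `_positions` attribute are hypothetical illustrations, not pyshp's actual code. String keys are treated as field names, while integers and slices keep ordinary list behaviour:

```python
class Record(list):
    """A list of field values that can also be read by field name.

    Hypothetical sketch, not the pyshp implementation.
    """

    def __init__(self, field_names, values):
        super().__init__(values)
        # Map each field name to its column position (built per record here).
        self._positions = {name: i for i, name in enumerate(field_names)}

    def __getitem__(self, key):
        # Strings are field names; ints and slices keep normal list behaviour.
        if isinstance(key, str):
            return super().__getitem__(self._positions[key])
        return super().__getitem__(key)

    def __getattr__(self, name):
        # Called only when normal lookup fails, so list methods are found first.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self[self._positions[name]]
        except KeyError:
            raise AttributeError(name) from None


r = Record(["NAME", "AREA", "BKG_KEY"], ["Lake", 12.5, "060750601001"])
print(r[2] == r["BKG_KEY"] == r.BKG_KEY)  # True
print(isinstance(r, list))                 # True
```

Building the name-to-position dict for every record costs memory on large files. That trade-off comes up later in this thread.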

@karimbahgat
Collaborator

Thanks for this, and apologies for the late reply (#145). I agree that retrieving record values in real-world scenarios is currently not very intuitive, so a new Record class could certainly be a useful addition. I might be open to this idea, but there are some issues with the PR that I'm not yet sure how I feel about:

  • I think this could be rewritten with only the bare necessities: the fewest changes, new classes, new args and possible breaks. I would prefer it rewritten as just a Record class, initialized directly with the row values and the shapefile fields list. No namedtuples, no RecordFactory. A Record class taking a fields list and a values list as args should be sufficient to retrieve record values by field name, which is the main point of the PR.
  • I'm not sure I see the point of storing a record 'oid' (in which scenarios would this be useful?). It might add more complexity than it's worth (e.g. would it break if the sequence of rows is changed, what about writing records, etc.). But I would be happy to hear arguments for why it would be a good idea.
  • While I agree that accessing the values as attributes is certainly easier than name indexing, this might prove problematic. Since the Record() class is supposed to subclass 'list' for compatibility, accessing any field whose name matches a list attribute or method ('index', 'append', etc.) would return that method instead of the record value. This might prove confusing and mask the real problem for some users. But maybe it's an acceptable problem; I would be curious to hear whether other libraries use this approach and how they handle it (e.g. I wonder whether Pandas uses this type of row value access?).
  • The PR also duplicates a previous PR and now has a conflict.
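A small sketch of the minimal design proposed in the first point, which also reproduces the name collision raised in the third point. It assumes each field is a sequence whose first item is the field name (e.g. `["NAME", "C", 40, 0]`); the class and the `_names` attribute are hypothetical:

```python
class Record(list):
    """Hypothetical sketch of the minimal design: a fields list plus a values list."""

    def __init__(self, fields, values):
        super().__init__(values)
        # Assumption: each field is a sequence whose first item is the field name.
        self._names = [field[0] for field in fields]

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self[self._names.index(name)]
        except ValueError:
            raise AttributeError(name) from None


fields = [["NAME", "C", 40, 0], ["index", "N", 10, 0]]
r = Record(fields, ["Lake", 7])
print(r.NAME)   # Lake
print(r.index)  # the built-in list.index method, not the value 7
print(r[1])     # 7: positional access is unaffected
```

Because `__getattr__` is consulted only after normal lookup fails, a field called `index` is silently shadowed by `list.index`. Positional or name-based indexing avoids the ambiguity.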

I was considering implementing the simpler version of this, but was curious about some of these issues. I also need more time to think about and discuss whether some scenarios might create compatibility problems, or whether handling the Record class might prove less intuitive than a simple list.

Looking forward to hearing your thoughts on this, and potentially an updated PR.

@philippkraft
Author

philippkraft commented Jun 6, 2018

Great to hear that this project is not dead! I am quite busy at the moment and can only work on this at a slow pace, and I have to recall my design decisions. Hopefully we are not in a hurry.

My to-do list:

  • tidy up the PR (it was one of my first PRs on GitHub ever)
  • Document my intentions in comments, after first recalling them

When this is done, I would love to discuss your concerns about the PR. Is that OK with you?

@philippkraft
Author

I'll start answering from the bottom:

The PR also duplicated a previous PR and now has a conflict.

The conflicts are resolved and the PR is updated with the latest changes.

@philippkraft
Author

philippkraft commented Jun 7, 2018

While I agree that accessing the values as attributes is certainly easier than with name indexing, this might prove problematic

This is a design choice: some people love this approach (like me, obviously), while others see the problems more. Pandas and numpy.recarray also use it for column access, and I have no problems with that. Personally, I use the orderedattrdict from the orderedattrdict package a lot, and it does this too. For working in interactive mode, this approach is much more natural, especially if it is implemented so that code completion works. In the current implementation the field names are not available for code completion, but I will try to implement that.
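One way the code-completion point could be addressed, sketched under the assumption that the completer builds its candidates from `dir()`: the standard library's `rlcompleter` does this, and IPython typically does for live objects. Overriding `__dir__` adds the field names to that list. The class below is a hypothetical illustration:

```python
class Record(list):
    """Hypothetical sketch: attribute access plus field names in dir()."""

    def __init__(self, field_names, values):
        super().__init__(values)
        self._positions = {name: i for i, name in enumerate(field_names)}

    def __getattr__(self, name):
        # Only reached when normal lookup fails; underscores are never fields here.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self[self._positions[name]]
        except KeyError:
            raise AttributeError(name) from None

    def __dir__(self):
        # Merge the normal attributes (list methods etc.) with the field names,
        # so dir()-based completers can offer "r.BKG_KEY" after "r.<Tab>".
        return sorted(set(super().__dir__()) | set(self._positions))


r = Record(["NAME", "BKG_KEY"], ["Lake", "060750601001"])
print("BKG_KEY" in dir(r))  # True
print(r.BKG_KEY)            # 060750601001
```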

@philippkraft
Author

I think this could be rewritten with only the bare necessities, least amount of changes/new classes/new args/possible breaks.

I fully understand the concerns, but I would add memory overhead to the list, since shapefiles can become quite large. The separation into Record and RecordFactory is in fact a memory conservation trick: the field names are stored only once, in RecordFactory, and each Record holds only its values and a single pointer to the factory. There is no problem with replacing the factory with the Reader itself, but I wanted to keep the changes to Reader / Writer as small as possible. Another way to do it would be for the Reader to create a custom Record class (not object) with the field names. That way the field names would be stored in the class, rather than in the factory (as now) or replicated in every object (as in your proposal). This would avoid the extra class, at the cost of more Python magic. However, users of the library would not need to know about _RecordFactory.
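A sketch of the "custom Record class per Reader" alternative mentioned above, assuming the class is built with `type()` once the field names are known. `make_record_class` is a hypothetical helper, not part of pyshp. The name-to-position map lives on the class, and `__slots__ = ()` means instances carry no per-record `__dict__`:

```python
def make_record_class(field_names):
    """Hypothetical helper: build a list subclass that stores field positions on the class."""
    positions = {name: i for i, name in enumerate(field_names)}

    def __getattr__(self, name):
        # Runs only when normal lookup fails; resolves field names via the shared map.
        try:
            return self[positions[name]]
        except KeyError:
            raise AttributeError(name) from None

    namespace = {
        "__slots__": (),          # no per-instance __dict__
        "_positions": positions,  # stored once, on the class
        "__getattr__": __getattr__,
    }
    return type("Record", (list,), namespace)


Record = make_record_class(["NAME", "BKG_KEY"])
a = Record(["Lake", "060750601001"])
b = Record(["River", "060750601002"])
print(a.BKG_KEY, b.NAME)  # 060750601001 River
```

Every instance shares the one `_positions` dict through its class. The "magic" is the runtime `type()` call, which makes the generated class slightly harder to inspect or pickle than a normal class definition.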

As for possible breaks: if we write the right tests for this functionality, we should be reasonably safe from new breakage. And in the end, a record IS still a list, just with more features.

@philippkraft
Author

philippkraft commented Jun 7, 2018

Not sure I see the point of storing a record 'oid' (in which scenarios would this be useful?)

I put it in to have a simple way to access the matching shape (shp.shape(r.oid)), although it looks like I never implemented that. The whole shapefile format relies on the oid system: shapes, records and the shape index all use the position in the file as the index. I guess (and I have to guess) that is why I implemented it. But it is an implementation detail this proposal does not need. I can delete it, but I don't see it doing any harm.
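A sketch of the use case described above: after filtering records, the stored position still identifies the matching shape. Here a plain list of placeholder strings stands in for the shapes. With pyshp the lookup would be `shp.shape(r.oid)` as in the comment. The `Record(values, oid)` signature is hypothetical:

```python
class Record(list):
    """Hypothetical sketch: a record that remembers its position (oid) in the file."""

    def __init__(self, values, oid):
        super().__init__(values)
        self.oid = oid  # same position as the matching shape in the .shp/.shx


rows = [["Lake", 12.5], ["River", 3.0], ["Pond", 0.4]]
shapes = ["<shape 0>", "<shape 1>", "<shape 2>"]  # stand-ins for geometries

records = [Record(values, oid) for oid, values in enumerate(rows)]

# After filtering, each record still knows which shape belongs to it.
small = [r for r in records if r[1] < 5]
print([shapes[r.oid] for r in small])  # ['<shape 1>', '<shape 2>']
```

The extra attribute does not affect list behaviour: `records[1] == ["River", 3.0]` still holds.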

would it break if the sequence of rows is changed

The sequence of rows never changes in a shapefile. If it did, it would not be the same shapefile anymore (the .shp, .dbf and .shx files would all need to be rewritten).

what about writing records

So far, the two classes _Record and _RecordFactory play no role in the Writer, although they could be used as the return value of the Writer.record method, which could then assign the correct oid.

@philippkraft
Author

Another option is to integrate the functionality of _RecordFactory into Reader. If you prefer that, I can implement it easily.

# Conflicts:
#	shapefile.py
#	shapefiles/test/dtype.dbf
#	shapefiles/test/dtype.shp
#	shapefiles/test/dtype.shx
#	shapefiles/test/line.dbf
#	shapefiles/test/line.shp
#	shapefiles/test/line.shx
#	shapefiles/test/point.dbf
#	shapefiles/test/point.shp
#	shapefiles/test/point.shx
#	shapefiles/test/polygon.dbf
#	shapefiles/test/polygon.shp
#	shapefiles/test/polygon.shx
@philippkraft
Author

Simplified the _Record class and removed _RecordFactory. Now shapefile.Reader just gives each _Record object a reference to the field-name-to-position dictionary.
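A sketch of the simplified design: the reader builds the name-to-position dict once and passes the same object to every record. `TinyReader` and its in-memory rows are hypothetical stand-ins; the real Reader reads rows from the .dbf file. The `is` check shows that storing the dict on each record creates a reference, not a copy:

```python
class _Record(list):
    """Hypothetical sketch: values plus a reference to a shared name-to-position dict."""

    def __init__(self, field_positions, values):
        super().__init__(values)
        self._field_positions = field_positions  # a reference, not a copy

    def __getitem__(self, key):
        # Translate field names to positions; ints and slices pass through.
        if isinstance(key, str):
            key = self._field_positions[key]
        return super().__getitem__(key)


class TinyReader:
    """Hypothetical stand-in for shapefile.Reader, holding rows in memory."""

    def __init__(self, field_names, rows):
        self._rows = rows
        # Built once per reader and handed to every record.
        self._field_positions = {name: i for i, name in enumerate(field_names)}

    def record(self, i):
        return _Record(self._field_positions, self._rows[i])


reader = TinyReader(["NAME", "BKG_KEY"], [["Lake", "0607"], ["River", "0608"]])
r0, r1 = reader.record(0), reader.record(1)
print(r1["BKG_KEY"])                               # 0608
print(r0._field_positions is r1._field_positions)  # True: one dict for all records
```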

@karimbahgat
Collaborator

I will look at merging this in the next few weeks and resolve the remaining conflicts. I haven't looked at your newest changes yet, but my initial thought on using the factory to avoid repeated references to the fields is that setting a fields attribute directly on the Record will not create any copies; it will just reference the same list object attached to the parent Reader. But I will look at this in more detail shortly.

karimbahgat added a commit that referenced this pull request Aug 30, 2018
Also accessing a record's "oid" position attribute.
Thanks to @philippkraft and #132.
Reworked to resolve merge conflicts.
@karimbahgat
Collaborator

The merge errors were caused by some strange diff issue, so I had to manually find the lines you added and reimplement them with some tweaks (with credit to you) in 0428e67. This is now available in v2.0.0. Again, thanks for this @philippkraft!

karimbahgat closed this Sep 2, 2018