Access Record fields as Python attributes #132
fields and the filename. DBF-Records are now not returned as a list but as a Record. Since Record is a subclass of list, no break should happen.
…o megies-context_manager # Conflicts: # shapefile.py
Thanks for this, and apologies for the late reply (#145). I agree that retrieving record values in real-world scenarios is not currently the most intuitive, so a new Record class sure could be a useful addition. I might be open to this idea, but there are some issues with the PR that I'm not yet sure how I feel about:
I was considering implementing the simpler version of this, but was curious about some of these issues. I also need some more time to think about and discuss whether some scenarios might create compatibility problems, or whether handling the Record class might prove less intuitive than a simple list. Looking forward to hearing your thoughts on this, and potentially an updated PR.
Great to hear that this project is not dead! Currently I am quite busy and can only work on this at a slow pace - and I have to recall my design decisions. Hopefully we are not in a hurry. Work for me:
When this is done, I would love to discuss your issues with the PR. Is this ok for you?
I'll start answering from the bottom.
Conflicts are removed and the PR is updated with the recent development.
This is a design choice - some people love this approach (like me, obviously), others see more of the problems. Pandas and numpy.recarrays also use it for column access and I have no problems with that. Personally I use the orderedattrdict from the respective package a lot, which has this too. For working in interactive mode, this approach is much more natural - especially if it is programmed in a way that code completion works. With this proposal, the fields are not available for code completion as a result of the implementation, but I will try to implement it.
I fully understand the concerns, but I would add memory overhead too, since shapefiles can become quite large. The separation into Record and RecordFactory is in fact a memory conservation trick. The names of the fields are only stored once in RecordFactory, and Record only holds the values and a single pointer to the factory. There is no problem with replacing the factory with the Reader itself, but I wanted to keep the changes to Reader / Writer as small as possible. Another way to do it would be for the Reader to create a custom Record class (not object) with the field names. That way the field names would be stored in the class and not in the factory (as now) or replicated in the objects (as in your proposal). This makes it possible to avoid a single extra class, but at the cost of more Python magic. However, the user of the library would not need to know about _RecordFactory. About possible breaks: if we write the right tests for this functionality, we should be rather safe from extra breaks. And in the end, a record IS still a list - just with more features.
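To make the two layouts in this comment concrete, here is a minimal sketch. It is not the PR's code: `RecordFactory`, `Record` and `make_record_class` are hypothetical names, and the details (for example the `__slots__` entry) are assumptions chosen for illustration.

```python
class RecordFactory:
    """Hypothetical stand-in for _RecordFactory: stores the field names once."""

    def __init__(self, field_names):
        self.index = {name: i for i, name in enumerate(field_names)}

    def __call__(self, values):
        return Record(self, values)


class Record(list):
    """A plain list of values plus a single reference to the shared factory."""

    def __init__(self, factory, values):
        super().__init__(values)
        self._factory = factory

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        if name.startswith('_'):
            raise AttributeError(name)
        try:
            return self[self._factory.index[name]]
        except KeyError:
            raise AttributeError(name) from None


def make_record_class(field_names):
    """Alternative layout: build one list subclass per reader.

    The field names live on the class as properties, so instances carry
    no extra reference at all. Caveat: a field called e.g. 'count' or
    'index' would shadow the list method of the same name.
    """
    def getter(i):
        return property(lambda self: self[i])

    namespace = {name: getter(i) for i, name in enumerate(field_names)}
    namespace['__slots__'] = ()  # assumption: no per-instance __dict__
    return type('Record', (list,), namespace)


# Layout 1: factory shared by all records
make = RecordFactory(['NAME', 'POP'])
rec = make(['Oslo', 700000])
print(rec.NAME, rec[1], isinstance(rec, list))

# Layout 2: class created per reader
CityRecord = make_record_class(['NAME', 'POP'])
print(CityRecord(['Bergen', 285000]).POP)
```

In both variants a record remains a `list`, so existing index-based code keeps working; the difference is only where the name-to-position mapping is stored.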
The reason I put it in was to have a simple way to access the shape (
The sequence of rows never changes in a shapefile - it would not be the same shapefile anymore (would need to rewrite .shp, .dbf and .shx)
Until now, the two classes _Record and _RecordFactory do not play a role in the Writer, although they could serve as the return value of the Writer.record method. One could assign the correct value of oid in that function.
Another option is to integrate the functionality of _RecordFactory into Reader. If you like that, I can implement it easily enough.
# Conflicts: # shapefile.py # shapefiles/test/dtype.dbf # shapefiles/test/dtype.shp # shapefiles/test/dtype.shx # shapefiles/test/line.dbf # shapefiles/test/line.shp # shapefiles/test/line.shx # shapefiles/test/point.dbf # shapefiles/test/point.shp # shapefiles/test/point.shx # shapefiles/test/polygon.dbf # shapefiles/test/polygon.shp # shapefiles/test/polygon.shx
Simplified the _Record class and removed _RecordFactory. Now shapefile.Reader just gives each _Record object a reference to the field-to-position dictionary.
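A rough sketch of what this simplified design could look like, under stated assumptions: `ToyReader` and this `Record` are hypothetical stand-ins rather than the code in shapefile.py, and the `__dir__` override is only one possible way to expose field names to code completion.

```python
class Record(list):
    """List of values with attribute access through a shared name->position dict."""

    def __init__(self, field_positions, values, oid=None):
        super().__init__(values)
        self._field_positions = field_positions  # shared reference, not a copy
        self.oid = oid                           # row position in the file

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        if name.startswith('_'):
            raise AttributeError(name)
        try:
            return self[self._field_positions[name]]
        except KeyError:
            raise AttributeError(name + ' is not a field name') from None

    def __dir__(self):
        # Advertise field names so interactive completion can offer them.
        return list(super().__dir__()) + list(self._field_positions)


class ToyReader:
    """Hypothetical reader: builds the dict once and hands it to every record."""

    def __init__(self, field_names, rows):
        self.field_positions = {name: i for i, name in enumerate(field_names)}
        self._rows = rows

    def records(self):
        return [Record(self.field_positions, row, oid=i)
                for i, row in enumerate(self._rows)]


reader = ToyReader(['NAME', 'POP'], [['Oslo', 700000], ['Bergen', 285000]])
for rec in reader.records():
    print(rec.oid, rec.NAME, rec[1])
```

Each record carries one extra reference (to the dictionary) and one `oid`, which is the same per-record overhead as the factory approach without the additional class.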
Will take a look at merging this in the next few weeks, resolving the remaining conflicts. I haven't looked at your newest changes yet, but my initial thought regarding using the factory to avoid repeated references to the fields is that setting a fields attribute on the Record class directly will not actually create any copies, but will rather refer to the memory address of the list attached to the parent Reader class. But I will look at this in more detail shortly.
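The point about references can be checked with a few lines of plain Python. This is only an illustration; `Holder` and `reader_fields` are hypothetical names, and the tuples merely imitate field descriptors.

```python
class Holder:
    """Hypothetical stand-in for a record that keeps a `fields` attribute."""

    def __init__(self, fields):
        self.fields = fields  # binds a reference to the same list; no copy


# Imitation of the field list a reader would hold
reader_fields = [('NAME', 'C', 40, 0), ('POP', 'N', 10, 0)]
records = [Holder(reader_fields) for _ in range(3)]

# Every record points at the very same list object
all_shared = all(r.fields is reader_fields for r in records)

# A change made through the reader's list is visible from each record
reader_fields.append(('AREA', 'N', 12, 2))
seen_by_record = len(records[0].fields)

print(all_shared, seen_by_record)
```

So the per-record cost is one reference either way; the list itself exists only once.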
Also accessing a record's "oid" position attribute. Thanks to @philippkraft and #132. Reworked to resolve merge conflicts.
The merge errors were caused by some strange diff issue, so I had to manually find the lines you added. I had to reimplement it with some tweaks (with credits to you) in 0428e67, and it is now available in v2.0.0. Again, thanks for this @philippkraft!
I've extended the shapefile module so that a record is not just a list but an instance of a new Record class. The Record class inherits from list and therefore ensures backwards compatibility. However, the new record can also be accessed differently:
Usage and test in altered README.rst