Description
The batch-mode machinery has a few places where pydap code should be refactored. Of particular importance is the step right after downloading and deserializing data.
The following code snippet is not production ready, even though its result is correct:
Lines 1166 to 1176 in 9dee9aa
```python
# Collect results
results_dict = {}
for var in variables:
    results_dict[var.id] = np.asarray(parsed_dataset[var.id].data[:])
    var._pending_batch_slice = None
    var._is_registered_for_batch = False
    self._batch_registry.discard(var)
    var._batch_promise = None
# Resolve the promise for all waiting arrays
batch_promise.set_results(results_dict)
```
The data is deserialized into in-memory numpy arrays and held in a dictionary inside the dataset object. A better way to handle the deserialized data, one that does not keep the arrays resident in memory, is the approach Dap4BaseProxy takes. For example:
pydap/src/pydap/handlers/dap.py, Lines 876 to 886 in 9dee9aa
```python
def decode_variable(buffer, start, stop, variable, endian):
    dtype = variable.dtype
    dtype = dtype.newbyteorder(endian)
    if dtype.kind == "S":
        data = numpy.array(decode_utf8_string_array(buffer)).astype(dtype.kind)
        data = data.reshape(variable.shape)
        return data
    else:
        data = numpy.frombuffer(buffer[start:stop], dtype=dtype)
        data = data.reshape(variable.shape)
        return DapDecodedArray(data)
```
That way, once the data has been read, it is released from RAM and is no longer held in memory.
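How a decoded array actually frees its buffer is internal to DapDecodedArray, but the consume-once idea can be sketched as below; ConsumeOnceArray and its read() method are hypothetical illustration names, not pydap API:

```python
import numpy as np


class ConsumeOnceArray:
    """Hypothetical sketch of a consume-once wrapper: it hands out its
    numpy array a single time, then drops the reference so the decoded
    buffer can be garbage collected instead of lingering in RAM."""

    def __init__(self, data: np.ndarray):
        self._data = data

    def read(self) -> np.ndarray:
        if self._data is None:
            raise RuntimeError("array was already consumed")
        data, self._data = self._data, None  # release our only reference
        return data
```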
The workflow then depends on a secondary function that is called to retrieve the data from the batch promise and assign it back to the original dataset as an in-memory numpy object. The function snippet is:
Lines 449 to 452 in 9dee9aa
```python
for var in Variables:
    var = ds[var]
    data = promise.wait_for_result(var.id)
    ds[var.id].data = np.asarray(data)
```
Rather than having a dictionary retain the data arrays in memory, only to fetch them later and assign them as in-memory numpy arrays, the arrays should be assigned to the dataset itself from the initial step of deserializing the DAP response.
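A minimal sketch of that direction, reusing names from the first snippet above (variables, parsed_dataset, self._batch_registry, batch_promise); how the promise should signal waiters once the data lives on the dataset is an open design question, so this is an outline under assumptions, not the actual fix:

```python
# Hypothetical refactor sketch: attach the decoded arrays to the dataset
# variables directly, instead of staging eager numpy copies in a dictionary.
for var in variables:
    # parsed_dataset[var.id].data is assumed to hold the DapDecodedArray
    # returned by decode_variable(); assigning it directly avoids the
    # np.asarray() copy that pins the whole array in memory.
    var.data = parsed_dataset[var.id].data
    var._pending_batch_slice = None
    var._is_registered_for_batch = False
    self._batch_registry.discard(var)
    var._batch_promise = None
# Waiting arrays now only need to learn that their variable is ready;
# they read from the dataset instead of copying out of results_dict.
batch_promise.set_results({var.id: var.data for var in variables})
```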
This is a must before any release. Proper testing should show that there is no explosion of RAM usage.
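A hedged sketch of what such a test could look like, using tracemalloc (which tracks numpy buffer allocations); the server URL and variable name are placeholders, and the batch=True flag of open_url is assumed from pydap's batch-mode usage:

```python
import tracemalloc

import numpy as np
from pydap.client import open_url

# Placeholder endpoint and variable: substitute a real DAP4 dataset.
ds = open_url("http://example.com/dap4/dataset", batch=True)

tracemalloc.start()
data = np.asarray(ds["some_var"].data[:])  # triggers the batched download
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

# If deserialized buffers are duplicated before being assigned back to the
# dataset, the peak climbs toward a multiple of data.nbytes; after the
# refactor it should stay close to a single copy of the requested data.
print(f"data: {data.nbytes} bytes, peak traced: {peak} bytes")
```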