Description
AWS smithy specs have a streaming trait. This is used, for example, in S3, where the `Body` of the `PutObjectRequest` should not be just `bytes`, but actually a stream that you can iterate over chunk by chunk.
The major problem currently seems to be how to generate the code correctly in the scaffold: should the shape itself be an `io.IO[bytes]` object, or only the member declaration in the structure?
checking whether something is streaming
There seem to be three ways to determine whether something is streaming, and they produce different results:
1. `operation.has_streaming_input`
2. `service.shape_for(shape_name).serialization.get("streaming")`
3. `member_shape.serialization.get("streaming")` for each `member_name, member_shape` in `operation.input_shape.members.items()`

You'd think the last two would always return the same, but they don't.
results
I checked all three approaches.
1. operation level
All operations where `operation.has_streaming_input` is True:
service | operation |
---|---|
apigateway | ImportApiKeys |
apigateway | ImportDocumentationParts |
apigateway | ImportRestApi |
apigateway | PutRestApi |
apigatewaymanagementapi | PostToConnection |
appconfig | CreateHostedConfigurationVersion |
cloudsearchdomain | UploadDocuments |
codeguruprofiler | PostAgentProfile |
ebs | PutSnapshotBlock |
glacier | UploadArchive |
glacier | UploadMultipartPart |
iot-data | Publish |
iot-data | UpdateThingShadow |
lambda | Invoke |
lambda | InvokeAsync |
lex-runtime | PostContent |
lexv2-runtime | RecognizeUtterance |
lookoutvision | DetectAnomalies |
mediastore-data | PutObject |
mobile | CreateProject |
mobile | UpdateProject |
s3 | PutObject |
s3 | UploadPart |
s3 | WriteGetObjectResponse |
sagemaker-runtime | InvokeEndpoint |
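Table 1 includes operations such as apigateway `ImportApiKeys` and lambda `Invoke` that have no streaming member at all (they don't show up in table 3). My reading of botocore's `OperationModel` is that `has_streaming_input` never consults the `streaming` trait: it is True whenever the input structure declares a `payload` member whose shape has type `blob`. The sketch below re-implements that reading over raw model JSON; the model is a simplified, hypothetical excerpt, and `streaming_input_member` is a made-up helper, not a botocore API:

```python
from typing import Optional

# Simplified, hypothetical excerpt in the layout of a botocore service model.
MODEL = {
    "operations": {
        "Invoke": {"input": {"shape": "InvocationRequest"}},
        "UploadArchive": {"input": {"shape": "UploadArchiveInput"}},
        "ListFunctions": {"input": {"shape": "ListFunctionsRequest"}},
    },
    "shapes": {
        "InvocationRequest": {
            "type": "structure",
            "members": {"Payload": {"shape": "Blob"}},
            "payload": "Payload",
        },
        "UploadArchiveInput": {
            "type": "structure",
            "members": {"body": {"shape": "Stream"}},
            "payload": "body",
        },
        "ListFunctionsRequest": {
            "type": "structure",
            "members": {"Marker": {"shape": "String"}},
        },
        "Blob": {"type": "blob"},  # no streaming trait
        "Stream": {"type": "blob", "streaming": True},
        "String": {"type": "string"},
    },
}


def streaming_input_member(model: dict, operation: str) -> Optional[str]:
    """Return the payload member name if its shape is a blob, else None.

    Mirrors my reading of botocore's has_streaming_input: the `streaming`
    trait is never looked at, only `payload` and the member's type.
    """
    input_ref = model["operations"][operation].get("input")
    if input_ref is None:
        return None
    input_shape = model["shapes"][input_ref["shape"]]
    payload = input_shape.get("payload")
    if payload is None:
        return None
    member_ref = input_shape["members"][payload]
    if model["shapes"][member_ref["shape"]]["type"] == "blob":
        return payload
    return None


def has_streaming_input(model: dict, operation: str) -> bool:
    return streaming_input_member(model, operation) is not None


for op in MODEL["operations"]:
    print(op, has_streaming_input(MODEL, op))
```

If this reading is right, table 1 is effectively "operations whose request body is a raw blob", which is a superset of table 3.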
2. service-level shapes
Here are all the shapes where `service.shape_for(shape_name).serialization.get("streaming") == True`.
Note that the set of services differs from the results of both 1. and 3. `lakeformation.GetWorkUnitResultsResponse` (which uses `ResultStream`) shows up in neither. wat? Update: this is included in an output shape! Totally missed that, and now the world makes sense again.
service | shape |
---|---|
cloudsearchdomain | Blob |
codeartifact | Asset |
ebs | BlockData |
glacier | Stream |
kinesis-video-archived-media | Payload |
kinesis-video-media | Payload |
lakeformation | ResultStream |
lambda | BlobStream |
lex-runtime | BlobStream |
lexv2-runtime | BlobStream |
lookoutvision | Stream |
medialive | InputDeviceThumbnail |
mediastore-data | PayloadBlob |
polly | AudioStream |
workmailmessageflow | messageContentBlob |
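To reproduce table 2 and see where each shape is actually used, one can scan the raw shape map and then look at both the input and the output structure of every operation, which is what catches the lakeformation case. A stdlib sketch over a simplified, hypothetical model loosely based on lakeformation's `GetWorkUnitResults` (the member names are assumptions, not copied from the real model). It only inspects the top-level members of each operation's input/output structure, not nested shapes:

```python
from typing import List, Tuple

# Simplified, hypothetical model; member names are illustrative only.
MODEL = {
    "operations": {
        "GetWorkUnitResults": {
            "input": {"shape": "GetWorkUnitResultsRequest"},
            "output": {"shape": "GetWorkUnitResultsResponse"},
        },
    },
    "shapes": {
        "GetWorkUnitResultsRequest": {
            "type": "structure",
            "members": {"QueryId": {"shape": "String"}},
        },
        "GetWorkUnitResultsResponse": {
            "type": "structure",
            "members": {"ResultStream": {"shape": "ResultStream"}},
            "payload": "ResultStream",
        },
        "ResultStream": {"type": "blob", "streaming": True},
        "String": {"type": "string"},
    },
}


def top_level_streaming_shapes(model: dict) -> List[str]:
    """Shapes whose own definition carries streaming=True (the table 2 check)."""
    return sorted(n for n, s in model["shapes"].items() if s.get("streaming"))


def usages(model: dict, shape_name: str) -> List[Tuple[str, str, str]]:
    """(operation, 'input' | 'output', member) for top-level members using shape_name."""
    found = []
    for op_name, op in model["operations"].items():
        for direction in ("input", "output"):
            ref = op.get(direction)
            if ref is None:
                continue
            struct = model["shapes"][ref["shape"]]
            for member, member_ref in struct.get("members", {}).items():
                if member_ref["shape"] == shape_name:
                    found.append((op_name, direction, member))
    return found


for name in top_level_streaming_shapes(MODEL):
    print(name, usages(MODEL, name))
```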
3. operation-level member shapes
Here are all the operations for which a member shape returns True for `serialization.get("streaming")`.
All of these also appear in the first table, so it seems safe for the parser to assume that the member named by `operation.input_shape.serialization.get("payload")` is the streaming target.
service | operation | member | type |
---|---|---|---|
cloudsearchdomain | UploadDocuments | documents | Blob |
ebs | PutSnapshotBlock | BlockData | BlockData |
glacier | UploadArchive | body | Stream |
glacier | UploadMultipartPart | body | Stream |
lambda | InvokeAsync | InvokeArgs | BlobStream |
lex-runtime | PostContent | inputStream | BlobStream |
lexv2-runtime | RecognizeUtterance | inputStream | BlobStream |
lookoutvision | DetectAnomalies | Body | Stream |
mediastore-data | PutObject | Body | PayloadBlob |
s3 | PutObject | Body | Body |
s3 | UploadPart | Body | Body |
s3 | WriteGetObjectResponse | Body | Body |
All except the three s3 operations also return True for `service.shape_for(member_shape.name).serialization.get("streaming")` (i.e., when checking the service-level shape definition for `streaming=True`).
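Both claims above can be checked per input structure: that every streaming member is also the structure's `payload`, and that looking only at the top-level shape definition misses the S3 case. The sketch works on raw model JSON rather than botocore objects; the two input shapes are simplified, hypothetical excerpts in the style of glacier's `UploadArchiveInput` and s3's `PutObjectRequest`, and `check_input` is a made-up helper:

```python
from typing import Dict

# Simplified, hypothetical excerpts of two input structures.
SHAPES = {
    "UploadArchiveInput": {
        "type": "structure",
        "members": {
            "vaultName": {"shape": "String"},
            "body": {"shape": "Stream"},
        },
        "payload": "body",
    },
    "PutObjectRequest": {
        "type": "structure",
        "members": {
            "Bucket": {"shape": "BucketName"},
            "Body": {"shape": "Body", "streaming": True},  # trait on the member ref
        },
        "payload": "Body",
    },
    "Stream": {"type": "blob", "streaming": True},  # trait on the shape itself
    "Body": {"type": "blob"},
    "String": {"type": "string"},
    "BucketName": {"type": "string"},
}


def check_input(shapes: dict, input_shape_name: str) -> Dict[str, Dict[str, bool]]:
    """For each streaming member, report whether it is the payload and whether
    a shape-level-only lookup (like service.shape_for) would have found it."""
    struct = shapes[input_shape_name]
    payload = struct.get("payload")
    report = {}
    for member, ref in struct["members"].items():
        on_shape = bool(shapes[ref["shape"]].get("streaming"))
        on_member = bool(ref.get("streaming"))
        if on_shape or on_member:
            report[member] = {
                "is_payload": member == payload,
                "shape_level_check": on_shape,
            }
    return report


for name in ("UploadArchiveInput", "PutObjectRequest"):
    print(name, check_input(SHAPES, name))
```

Running the same check over every real input shape would confirm (or refute) the "payload is the streaming target" assumption beyond these two examples.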
specific example
The `streaming=True` serialization trait seems to be defined on the shape itself in most services.
For example, glacier's `UploadArchiveInput` defines a member `body` with shape `Stream`:
```json
"body":{
  "shape":"Stream",
  "documentation":"<p>The data to upload.</p>"
}
```
and the shape definition of `Stream` has `streaming=True`:
```json
"Stream":{
  "type":"blob",
  "streaming":true
},
```
Therefore, checking `member_shape.serialization.get("streaming")` will return True.
S3 seems to be the exception, where `streaming=True` is set in the member definition, for example in the `PutObjectRequest`:
```json
"Body":{
  "shape":"Body",
  "documentation":"<p>Object data.</p>",
  "streaming":true
},
```
and the shape definition `Body` does not contain the `streaming=True` trait:
```json
"Body":{"type":"blob"},
```
Yet `member_shape.serialization.get("streaming")` will still return True!
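As far as I can tell from reading botocore's `ShapeResolver`, this happens because when a member reference carries extra keys besides `shape` (here `documentation` and `streaming`), the resolver copies the referenced shape's model and merges those member traits over it. `member_shape.serialization` therefore sees the member-level `streaming`, while a plain top-level lookup like `service.shape_for("Body")` does not. Below is a stdlib sketch that mimics (rather than calls) that merge; `shape_for` and `resolve_member` are hypothetical stand-ins, not botocore functions:

```python
from typing import Optional

# The two S3 excerpts from above, as a raw shape map.
SHAPE_MAP = {
    "Body": {"type": "blob"},
    "PutObjectRequest": {
        "type": "structure",
        "members": {
            "Body": {
                "shape": "Body",
                "documentation": "<p>Object data.</p>",
                "streaming": True,
            },
        },
        "payload": "Body",
    },
}


def shape_for(shape_map: dict, name: str, member_traits: Optional[dict] = None) -> dict:
    """Top-level lookup; member traits, if any, are merged over a copy."""
    model = shape_map[name]
    if member_traits:
        model = {**model, **member_traits}  # never mutates shape_map
    return model


def resolve_member(shape_map: dict, member_ref: dict) -> dict:
    """Split a member reference into shape name + member traits, then merge."""
    traits = dict(member_ref)
    name = traits.pop("shape")
    return shape_for(shape_map, name, traits)


body_ref = SHAPE_MAP["PutObjectRequest"]["members"]["Body"]
print(shape_for(SHAPE_MAP, "Body").get("streaming"))         # None
print(resolve_member(SHAPE_MAP, body_ref).get("streaming"))  # True
```

So checks 2 and 3 only agree when the trait lives on the shape definition; when it lives on the member reference (S3), only the member-level check sees it.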
some observations
- Q: can we assume that every blob shape is streamed?
  - A: no. I checked: there are several blob-type shapes that are not in any of the above tables. An example is `ReportContentType` in iam.
- Q: are there any blob-type shapes with `streaming=False` that are used in both a streaming and a non-streaming context?
  - A: yes, `FunctionBlob` of cloudfront, for example, is streamed in the case of `GetFunctionResponse`, but not for `CreateFunctionRequest`. wat.
- Q: are there any shapes or member shapes that have `streaming=True` but are not of type `blob`?
  - A: no
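The three questions above can be answered mechanically by walking every structure member and recording, per blob shape, whether it is referenced with `streaming` on or off (treating a member-level trait as overriding the shape-level one, as in the S3 case). A stdlib sketch on a made-up toy model; `SharedBlob`, `UnusedBlob` and the structure names are hypothetical, and the real answers would come from running the same walk over every service model:

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical toy model; all names are made up for illustration.
SHAPES = {
    "Stream": {"type": "blob", "streaming": True},
    "SharedBlob": {"type": "blob"},
    "UnusedBlob": {"type": "blob"},
    "DownloadOutput": {
        "type": "structure",
        "members": {"Data": {"shape": "SharedBlob", "streaming": True}},
    },
    "UploadInput": {
        "type": "structure",
        "members": {
            "Data": {"shape": "SharedBlob"},
            "Body": {"shape": "Stream"},
        },
    },
}


def blob_usage(shapes: dict) -> Dict[str, Set[bool]]:
    """Map each blob shape to the set of streaming flags it is referenced with."""
    usage: Dict[str, Set[bool]] = defaultdict(set)
    for shape in shapes.values():
        for ref in shape.get("members", {}).values():
            target = shapes[ref["shape"]]
            # A member-level trait wins over the shape-level one.
            streaming = bool(ref.get("streaming", target.get("streaming", False)))
            if target["type"] == "blob":
                usage[ref["shape"]].add(streaming)
    for name, shape in shapes.items():  # blobs never referenced by any member
        if shape["type"] == "blob":
            usage.setdefault(name, set())
    return dict(usage)


def streaming_non_blobs(shapes: dict) -> List[str]:
    """Shapes or 'Structure.member' refs with streaming=True that are not blobs."""
    bad = [n for n, s in shapes.items() if s.get("streaming") and s["type"] != "blob"]
    for n, s in shapes.items():
        for m, ref in s.get("members", {}).items():
            if ref.get("streaming") and shapes[ref["shape"]]["type"] != "blob":
                bad.append(f"{n}.{m}")
    return bad


usage = blob_usage(SHAPES)
print(sorted(n for n, f in usage.items() if True not in f))       # ['UnusedBlob']
print(sorted(n for n, f in usage.items() if f == {True, False}))  # ['SharedBlob']
print(streaming_non_blobs(SHAPES))                                # []
```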