implement AWS streaming trait for ASF #6527

Closed
@thrau

AWS Smithy specs have a streaming trait. It is used, for example, in S3, where the Body of the PutObjectRequest should not be plain bytes but a stream that you can iterate over chunk by chunk.

the major problem currently seems to be how to generate the code correctly in the scaffold: should the shape itself be declared as IO[bytes] (typing.IO), or only the member declaration in the structure?
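to make the two options concrete, here is a minimal sketch of what the scaffold could emit in each case. All names (`BodyStream`, `BodyBlob`, `PutObjectRequestA`, `PutObjectRequestB`, `read_chunks`) are illustrative assumptions, not actual generated ASF code.

```python
import io
from typing import IO, Iterator, Optional, TypedDict

# Option (a): the shape itself is declared as a stream (hypothetical name).
BodyStream = IO[bytes]


class PutObjectRequestA(TypedDict, total=False):
    Bucket: str
    Key: str
    Body: Optional[BodyStream]


# Option (b): the shape stays a plain blob, and only the member
# declaration in the structure wraps it in IO[...].
BodyBlob = bytes


class PutObjectRequestB(TypedDict, total=False):
    Bucket: str
    Key: str
    Body: Optional[IO[BodyBlob]]


def read_chunks(stream: IO[bytes], chunk_size: int = 4) -> Iterator[bytes]:
    """Iterate over a stream chunk by chunk until it is exhausted."""
    while chunk := stream.read(chunk_size):
        yield chunk


request: PutObjectRequestB = {
    "Bucket": "bucket",
    "Key": "key",
    "Body": io.BytesIO(b"hello world"),
}
chunks = list(read_chunks(request["Body"]))
```

with option (b) the same blob shape could still be used as plain bytes in another structure, which may matter if a shape is streamed in one place but not in another.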

checking whether something is streaming

there seem to be three ways to determine whether something is streaming, and they produce different results:

  1. operation.has_streaming_input
  2. service.shape_for(shape_name).serialization.get("streaming")
  3. for member_name, member_shape in operation.input_shape.members.items(): member_shape.serialization.get("streaming")

you'd think the last two would always return the same result, but they don't.

results

i checked all three scenarios.

1. operation level

all operations where operation.has_streaming_input

service                  operation
apigateway               ImportApiKeys
apigateway               ImportDocumentationParts
apigateway               ImportRestApi
apigateway               PutRestApi
apigatewaymanagementapi  PostToConnection
appconfig                CreateHostedConfigurationVersion
cloudsearchdomain        UploadDocuments
codeguruprofiler         PostAgentProfile
ebs                      PutSnapshotBlock
glacier                  UploadArchive
glacier                  UploadMultipartPart
iot-data                 Publish
iot-data                 UpdateThingShadow
lambda                   Invoke
lambda                   InvokeAsync
lex-runtime              PostContent
lexv2-runtime            RecognizeUtterance
lookoutvision            DetectAnomalies
mediastore-data          PutObject
mobile                   CreateProject
mobile                   UpdateProject
s3                       PutObject
s3                       UploadPart
s3                       WriteGetObjectResponse
sagemaker-runtime        InvokeEndpoint

2. service-level shapes

here are all the shapes where service.shape_for(shape_name).serialization.get("streaming") == True

note that the set of services differs from the results of both 1. and 3.
lakeformation.GetWorkUnitResultsResponse (which uses ResultStream) shows up in neither. wat? update: ResultStream is used in an output shape, while 1. and 3. only look at inputs! totally missed that, and now the world makes sense again.

service                       shape
cloudsearchdomain             Blob
codeartifact                  Asset
ebs                           BlockData
glacier                       Stream
kinesis-video-archived-media  Payload
kinesis-video-media           Payload
lakeformation                 ResultStream
lambda                        BlobStream
lex-runtime                   BlobStream
lexv2-runtime                 BlobStream
lookoutvision                 Stream
medialive                     InputDeviceThumbnail
mediastore-data               PayloadBlob
polly                         AudioStream
workmailmessageflow           messageContentBlob

3. operation-level member shapes

here are all the operations for which a member shape of the input returns True for serialization.get("streaming").
all of them also appear in the first table, meaning it would be safe for the parser to assume that the payload member (operation.input_shape.serialization.get("payload")) is the streaming target.

service            operation               member       type
cloudsearchdomain  UploadDocuments         documents    Blob
ebs                PutSnapshotBlock        BlockData    BlockData
glacier            UploadArchive           body         Stream
glacier            UploadMultipartPart     body         Stream
lambda             InvokeAsync             InvokeArgs   BlobStream
lex-runtime        PostContent             inputStream  BlobStream
lexv2-runtime      RecognizeUtterance      inputStream  BlobStream
lookoutvision      DetectAnomalies         Body         Stream
mediastore-data    PutObject               Body         PayloadBlob
s3                 PutObject               Body         Body
s3                 UploadPart              Body         Body
s3                 WriteGetObjectResponse  Body         Body

all except the three s3 operations return True for service.shape_for(member_shape.name).serialization.get("streaming") (i.e. when checking the service-level shape definition for streaming=True).
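the payload assumption can be sketched against the raw JSON service description. This mimics the lookup on plain dicts rather than calling botocore: `streaming_target` and the toy `MODEL` are hypothetical, and the sketch assumes the payload member is named by a `payload` key on the input structure. It only checks that the payload member is a blob, not that the streaming trait is set.

```python
from typing import Optional

# Toy excerpt laid out like an AWS JSON service description (assumed layout).
MODEL = {
    "operations": {
        "UploadArchive": {"input": {"shape": "UploadArchiveInput"}},
        "ListVaults": {"input": {"shape": "ListVaultsInput"}},
    },
    "shapes": {
        "UploadArchiveInput": {
            "type": "structure",
            "members": {
                "vaultName": {"shape": "string"},
                "body": {"shape": "Stream"},
            },
            "payload": "body",
        },
        "ListVaultsInput": {
            "type": "structure",
            "members": {"marker": {"shape": "string"}},
        },
        "Stream": {"type": "blob", "streaming": True},
        "string": {"type": "string"},
    },
}


def streaming_target(model: dict, operation_name: str) -> Optional[str]:
    """Return the input member to treat as the stream, or None if there is none."""
    input_ref = model["operations"][operation_name].get("input")
    if input_ref is None:
        return None
    input_shape = model["shapes"][input_ref["shape"]]
    payload = input_shape.get("payload")
    if payload is None:
        return None
    # Follow the member reference to the shape definition it points at.
    member_shape = model["shapes"][input_shape["members"][payload]["shape"]]
    return payload if member_shape["type"] == "blob" else None
```

for the toy model, `streaming_target(MODEL, "UploadArchive")` yields the `body` member, while `ListVaults` has no payload and yields `None`.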

specific example

in most services, the streaming=True serialization trait seems to be defined on the shape itself.
for example, glacier's UploadArchiveInput defines a member body with shape Stream:

        "body":{
          "shape":"Stream",
          "documentation":"<p>The data to upload.</p>"
        }

and the shape definition of Stream has streaming=True:

    "Stream":{
      "type":"blob",
      "streaming":true
    },

therefore, checking member_shape.serialization.get("streaming") will return True.

S3 seems to be the exception: there, streaming=True is set in the member definition.
for example, in the PutObjectRequest:

        "Body":{
          "shape":"Body",
          "documentation":"<p>Object data.</p>",
          "streaming":true
        },

and the shape definition Body does not contain the streaming=True trait.

    "Body":{"type":"blob"},

yet still member_shape.serialization.get("streaming") will return True!
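one explanation that would fit this behaviour is that traits on the member reference get merged over the referenced shape definition when the member is resolved. The sketch below mimics such a merge on plain dicts; `resolve_member`, `is_streaming_shape` and `is_streaming_member` are hypothetical helpers, not botocore's API, and the merge order is an assumption.

```python
SHAPES = {
    # glacier style: the trait lives on the shape definition
    "Stream": {"type": "blob", "streaming": True},
    # s3 style: the shape definition carries no streaming trait
    "Body": {"type": "blob"},
}

GLACIER_MEMBER = {"shape": "Stream", "documentation": "<p>The data to upload.</p>"}
S3_MEMBER = {"shape": "Body", "documentation": "<p>Object data.</p>", "streaming": True}


def resolve_member(shapes: dict, member_ref: dict) -> dict:
    """Merge member-level traits over a copy of the referenced shape definition."""
    traits = dict(member_ref)
    shape_name = traits.pop("shape")
    resolved = dict(shapes[shape_name])
    resolved.update(traits)  # assumption: member-level traits win
    return resolved


def is_streaming_shape(shapes: dict, shape_name: str) -> bool:
    """Check 2: look only at the service-level shape definition."""
    return bool(shapes[shape_name].get("streaming", False))


def is_streaming_member(shapes: dict, member_ref: dict) -> bool:
    """Check 3: look at the member after merging in its own traits."""
    return bool(resolve_member(shapes, member_ref).get("streaming", False))
```

under this assumption the glacier member is streaming in both checks, while the s3 member is streaming only in the member-level check, which matches the tables above.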

some observations

  • Q: can we assume that every blob shape is streamed?

  • A: no. I checked, there are several blob type shapes that are not in any of the above tables. An example is iam ReportContentType.

  • Q: are there any blob type shapes with streaming=False that are used in both a streaming and non-streaming context?

  • A: yes. FunctionBlob of cloudfront, for example, is streamed in the case of GetFunctionResponse, but not for CreateFunctionRequest. wat.

  • Q: are there any shapes or member shapes that have streaming=True but are not of type blob?

  • A: no
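questions like these can be checked with a small scan over the raw JSON service description. The following is a rough sketch, not the script that produced the results above: `blob_streaming_contexts` and `mixed_context_blobs` are hypothetical helpers, the toy model only borrows shape names from the examples, and its member names are made up for illustration.

```python
from collections import defaultdict

# Toy model; the member names (FunctionCode, Content) are illustrative assumptions.
TOY_MODEL = {
    "shapes": {
        "FunctionBlob": {"type": "blob"},
        "ReportContentType": {"type": "blob"},
        "GetFunctionResponse": {
            "type": "structure",
            "members": {"FunctionCode": {"shape": "FunctionBlob", "streaming": True}},
        },
        "CreateFunctionRequest": {
            "type": "structure",
            "members": {"FunctionCode": {"shape": "FunctionBlob"}},
        },
        "GetCredentialReportResponse": {
            "type": "structure",
            "members": {"Content": {"shape": "ReportContentType"}},
        },
    }
}


def blob_streaming_contexts(model: dict) -> dict:
    """Map each referenced blob shape to the set of streaming flags it is used with."""
    shapes = model["shapes"]
    contexts = defaultdict(set)
    for shape in shapes.values():
        if shape.get("type") != "structure":
            continue
        for member_ref in shape.get("members", {}).values():
            target = shapes[member_ref["shape"]]
            if target.get("type") != "blob":
                continue
            # A member-level trait takes precedence over the shape-level one (assumption).
            flag = member_ref.get("streaming", target.get("streaming", False))
            contexts[member_ref["shape"]].add(bool(flag))
    return dict(contexts)


def mixed_context_blobs(model: dict) -> list:
    """Blob shapes used in both a streaming and a non-streaming context."""
    contexts = blob_streaming_contexts(model)
    return sorted(name for name, flags in contexts.items() if flags == {True, False})
```

on the toy model, `mixed_context_blobs(TOY_MODEL)` reports only FunctionBlob, and ReportContentType shows up as a blob that is never streamed.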

Labels

area: asf, status: accepted, type: feature