Use runtime helper CreateSpan for stackalloc of non-byte arrays #71261
    _builder.EmitOpCode(ILOpCode.Call, 0);
    EmitSymbolToken(getPinnableReference, syntaxNode, optArgList: null);
    _builder.EmitIntConstant(data.Length);
    _builder.EmitOpCode(ILOpCode.Cpblk, -3);
We emit cpblk in a very similar way as above (under if (elementType.EnumUnderlyingTypeOrSelf().SpecialType.SizeInBytes() == 1)), so I assumed the alignment is fine.
ECMA-335 also says:

> All such structures, allocated by the CLI, are naturally aligned for the current platform

and the field has a structure type, so I think it should be aligned properly.
> and the field has a structure type, so I think it should be aligned properly.

A blob of bytes is not an instance of a structure type. We are not copying data from a regular field. We are copying data from a blob stored in a module.
I thought we were emitting a special struct (e.g., __StaticArrayInitTypeSize=12) and a field in <PrivateImplementationDetails> which is of the type of that struct, initialized with a blob of data, and that we are using that field, so the field should be naturally aligned as it is of the struct type. Plus, CreateSpan can return a pointer to a different blob of data with fixed-up endianness, but that should also be naturally aligned, as it allocates (and caches) an array in that case.
Here is what I think:
- We are not working with a regular field. We are working with a field that is mapped into a region of memory inside the module's image, a so-called RVA field.
- The alignment of that region cannot change at runtime; it is the alignment that was used during emit.
- Some operations have certain alignment requirements. For example, we already established that the CreateSpan API has them.
- A comment for PrivateImplementationDetails.CreateDataField explains how the compiler enforces the right alignment for an RVA field. Here is a verbatim quote: "a type is generated of the same size as the data, and that type needs its .pack set to the alignment required for the underlying data. While that .pack value isn't required by anything else in the compiler (the compiler always aligns RVA fields at 8-byte boundaries, which accomodates any element type that's relevant), it is necessary for IL rewriters. Such rewriters also need to ensure an appropriate alignment is maintained for the RVA field, and while they could also simplify by choosing a worst-case alignment as does the compiler, they may instead use the .pack value as the alignment to use for that field, since it's an opaque blob with no other indication as to what kind of data is stored and what alignment might be required."
- It looks like we already addressed the alignment required by CreateSpan, which is the natural boundary of the span's element type.
- The span produced by CreateSpan points into the same blob when the data are not moved or copied anywhere else. Therefore, it still has the original alignment.
- By comparison to CreateSpan, the cpblk instruction has a different alignment requirement: "aligned to the natural size of the machine". Does the RVA field meet this requirement? The spec doesn't provide any specific number. As of commit 10, we are using different alignments for different element types. It is quite possible that some of the numbers don't match "the natural size of the machine". Perhaps we should always use 8-byte alignment. It would be good to get some guidance from the runtime team here.
@stephentoub can you please advise how we ensure we meet cpblk's alignment requirements here? Thanks.
Btw, it looks like this code creates an RVA field with alignment 1 and uses it with cpblk, which, if correct, would suggest cpblk doesn't need any special alignment of the field:
    else if (elementType.EnumUnderlyingTypeOrSelf().SpecialType.SizeInBytes() == 1)
    {
        // Initialize the stackalloc by copying the data from a metadata blob
        var field = _builder.module.GetFieldForData(data, alignment: 1, inits.Syntax, _diagnostics.DiagnosticBag);
        _builder.EmitOpCode(ILOpCode.Dup);
        _builder.EmitOpCode(ILOpCode.Ldsflda);
        _builder.EmitToken(field, inits.Syntax, _diagnostics.DiagnosticBag);
        _builder.EmitIntConstant(data.Length);
        _builder.EmitOpCode(ILOpCode.Cpblk, -3);
You can prefix the cpblk instruction with unaligned in order to avoid the alignment requirement, which IIRC wants the pointers to be aligned to the machine's natural size. It's possible the cited existing code should be using unaligned as well, or maybe the requirement is less about the machine's natural word size and more about the size of the type in use (or maybe this is a place where ECMA and coreclr differ). @jkotas?
If you would like to be compliant with ECMA, emit the unaligned prefix. It is not required to make things work on current runtimes. cpblk implementations in current runtimes do not make any assumptions about the alignment of the inputs.
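For readers who want to see the unaligned variant from C# rather than raw IL, here is a minimal sketch. It assumes that Unsafe.CopyBlockUnaligned is an acceptable stand-in for an unaligned-prefixed cpblk; the class name, method name, and buffer layout are hypothetical and are not taken from the PR.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class UnalignedCopySketch
{
    // Copies four ints out of a byte buffer whose payload starts at offset 1,
    // so the source is deliberately not aligned to sizeof(int).
    public static int[] CopyFromOddOffset()
    {
        byte[] blob = new byte[1 + 4 * sizeof(int)];
        for (int i = 0; i < 4; i++)
        {
            // Written in the machine's own byte order, which is also how the copy reads it back.
            BitConverter.TryWriteBytes(blob.AsSpan(1 + i * sizeof(int)), i + 1);
        }

        int[] result = new int[4];
        // Assumption: this helper copies bytes without assuming pointer alignment.
        Unsafe.CopyBlockUnaligned(
            ref Unsafe.As<int, byte>(ref result[0]),
            ref blob[1],
            (uint)(4 * sizeof(int)));
        return result;
    }
}
```

Calling UnalignedCopySketch.CopyFromOddOffset() should yield the values 1 through 4, since the bytes are written and read back in the same byte order.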
Done with review pass (commit 1)
    EmitSymbolToken(spanGetItem, syntaxNode, optArgList: null);

    _builder.EmitIntConstant(data.Length);
    if (sizeInBytes != 8)
Looking at the unaligned. section of ECMA-335:

> shall use unaligned. if the alignment is not known at compile time to be 8-byte.

From that, I infer that, if we align the RVA field type at 8 bytes, we will be compliant without the unaligned prefix. Should we use that option instead, rather than having conditional IL?
Are there any downsides to conditional IL? On the other hand, it seems that aligning properly to 1/2/4/8 bytes instead of always to 8 bytes saves space.
> Are there any downsides to conditional IL?

The IL is bigger. The compiler's code is more complex. The test matrix is bigger, and more scenarios need special attention.

> it seems that aligning properly to 1/2/4/8 bytes instead of always to 8 bytes saves space.

I do not think it actually does. At least not for the code emitted by our compilers because, I think, the compiler aligns all blobs at 8 bytes regardless of the requested alignment; this is mentioned in one of the comments in the code. The alignment that we explicitly set on the types is important for IL rewriting scenarios.
> The alignment that we explicitly set on the types is important for IL rewriting scenarios.

I think this means it improves size for AOT and similar precompiled code (right?), which seems important as well.
> I think this means it improves size for AOT and similar precompiled code (right?) which seems important as well.

To be honest, I do not know the answer to these questions. And even if that were the case, I am not sure that is something we should care about.
I do not think it makes sense to deoptimize the 1-byte case like that.
To make the two cases as similar as possible, you can emit unaligned.1 cpblk in both cases and emit the blob with elementType.EnumUnderlyingTypeOrSelf().SpecialType.SizeInBytes() alignment in both cases.
> did you want that to be aligned to 8 bytes as well?

Yes, this is what I would do. I would align both at 8 bytes explicitly (we are aligning blobs at 8 bytes regardless), and would completely get rid of the .unaligned instruction.
> I would align both at 8 bytes explicitly (we are aligning blobs at 8 bytes regardless), and would completely get rid of the .unaligned instruction.

It is a size de-optimization for the trimmer and AOT, which are the scenarios where we care about size the most.
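To make the size argument concrete, here is a small arithmetic sketch. It assumes a hypothetical tool that packs blobs back to back and honors each blob's requested alignment; the helper names and blob sizes are invented for illustration and do not describe how any particular trimmer or AOT compiler lays out data.

```csharp
public static class BlobLayoutSketch
{
    // Rounds offset up to the next multiple of alignment (alignment must be a power of two).
    public static int AlignUp(int offset, int alignment) =>
        (offset + alignment - 1) & ~(alignment - 1);

    // Total bytes needed when each blob starts at an offset aligned to the given value.
    public static int TotalSize(int[] blobSizes, int alignment)
    {
        int offset = 0;
        foreach (int size in blobSizes)
        {
            offset = AlignUp(offset, alignment) + size;
        }
        return offset;
    }
}
```

For three 6-byte blobs (for example, three short values each), TotalSize(new[] { 6, 6, 6 }, 2) gives 18 bytes, while TotalSize(new[] { 6, 6, 6 }, 8) gives 22 bytes, because each blob after the first is padded to the next 8-byte boundary.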
> It is a size de-optimization for the trimmer and AOT, which are the scenarios where we care about size the most.

Perhaps I misinterpreted your first message on this discussion thread.
If the blob size optimization is important for AOT, then I think we should keep the .unaligned usage on both code paths. We can get there by reverting the last commit. Sorry for the confusion.
> Perhaps I misinterpreted your first message on this discussion thread.

I was commenting on why we set the alignment and why it would not work to omit it on the types. Sorry for the confusion.
LGTM (commit 14)
Is the JIT able to optimize the new pattern well? For example, what is performance of
Yes. Here's a benchmark: https://github.com/jjonescz/RoslynIssue69325
    [Benchmark]
    public void Run_3() => Use(stackalloc int[] { 1, 2, 3 });
    [Benchmark]
    public void Run_1000() => Use(stackalloc int[] {});
    [MethodImpl(MethodImplOptions.NoInlining)]
    static void Use(Span<int> span) { }
Resolves #69325.
Based on #57123 by @svick.
- Uses RuntimeHelpers.CreateSpan on constant data for stackalloc of non-byte arrays.
- If the target of the stackalloc is ReadOnlySpan, the result of the helper is simply used. Otherwise, the data are copied to the stack via ReadOnlySpan.GetPinnableReference and the cpblk instruction.

Benchmark (source)
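The two source-level shapes that the PR description distinguishes can be sketched as follows. This is only an approximation written in ordinary C#: the Data property and the explicit CopyTo are hypothetical stand-ins for the compiler-generated blob and the copy step, not the lowering the PR actually emits.

```csharp
using System;

public static class StackAllocSketch
{
    // Hypothetical constant table used only for this illustration.
    private static ReadOnlySpan<int> Data => new int[] { 1, 2, 3 };

    // Read-only target: the values are never written, so a view over constant data would suffice.
    public static int SumReadOnly()
    {
        ReadOnlySpan<int> values = stackalloc int[] { 1, 2, 3 };
        int sum = 0;
        foreach (int v in values) sum += v;
        return sum;
    }

    // Mutable target: the values need their own stack copy before they can be modified.
    public static int SumAfterWrite()
    {
        Span<int> values = stackalloc int[3];
        Data.CopyTo(values); // hand-written stand-in for the copy from the constant blob
        values[0] = 10;
        int sum = 0;
        foreach (int v in values) sum += v;
        return sum;
    }
}
```

SumReadOnly() returns 6 and SumAfterWrite() returns 15; the second method shows why a writable Span needs a copy of the data while a ReadOnlySpan does not.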