Conversation

Member

@jjonescz jjonescz commented Dec 14, 2023

Resolves #69325.

Based on #57123 by @svick.

  • Uses RuntimeHelpers.CreateSpan on constant data for stackalloc of non-byte arrays.
  • If the destination of the stackalloc is a ReadOnlySpan, the result of the helper is used directly. Otherwise, the data are copied to the stack via ReadOnlySpan.GetPinnableReference and the cpblk instruction.
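
The two paths in the list above can be pictured at the source level roughly as follows. This is only a sketch, not the compiler's output: `StackallocLoweringSketch`, `Data`, and both method names are hypothetical, the `Data` property stands in for the `RuntimeHelpers.CreateSpan` call on constant data, and `Unsafe.CopyBlockUnaligned` stands in for the emitted block copy.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

static class StackallocLoweringSketch
{
    // Hypothetical stand-in for the constant data that the helper would expose.
    static ReadOnlySpan<int> Data => new int[] { 1, 2, 3 };

    // ReadOnlySpan destination: the span over the constant data is used as-is, no copy.
    public static int SumReadOnly()
    {
        ReadOnlySpan<int> span = Data;
        int sum = 0;
        foreach (int value in span) sum += value;
        return sum;
    }

    // Span destination: reserve stack space, then block-copy the constant data into it
    // so that the caller can mutate its own copy.
    public static int SumAfterMutation()
    {
        ReadOnlySpan<int> source = Data;
        Span<int> destination = stackalloc int[source.Length];
        Unsafe.CopyBlockUnaligned(
            ref Unsafe.As<int, byte>(ref MemoryMarshal.GetReference(destination)),
            ref Unsafe.As<int, byte>(ref MemoryMarshal.GetReference(source)),
            (uint)(source.Length * sizeof(int)));

        destination[0] = 10; // only the stack copy changes
        int sum = 0;
        foreach (int value in destination) sum += value;
        return sum;
    }
}
```

The read-only path never touches the stack for the data, which is presumably where most of the code-size difference for large initializers comes from.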

Benchmark (source)

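| Method   | Job        | BuildConfiguration  | Mean       | Ratio | Code Size |
|----------|------------|---------------------|------------|-------|-----------|
| Run_3    | Job-WJVUQY | Release             | 2.372 ns   | 1.00  | 117 B     |
| Run_3    | Job-PMTJCT | ReleaseCustomRoslyn | 2.307 ns   | 0.97  | 114 B     |
| Run_1000 | Job-WJVUQY | Release             | 164.043 ns | 1.00  | 10,023 B  |
| Run_1000 | Job-PMTJCT | ReleaseCustomRoslyn | 60.540 ns  | 0.37  | 149 B     |

[Benchmark]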
public void Run_3() => Use(stackalloc int[] { 1, 2, 3 });

[Benchmark]
public void Run_1000() => Use(stackalloc int[] { /* initializer elided here; the method name suggests 1000 elements */ });

[MethodImpl(MethodImplOptions.NoInlining)]
static void Use(Span<int> span) { }

@jjonescz jjonescz added the Code Gen Quality Room for improvement in the quality of the compiler's generated code label Dec 14, 2023
@ghost ghost added Area-Compilers untriaged Issues and PRs which have not yet been triaged by a lead labels Dec 14, 2023
@jjonescz jjonescz changed the title from "Use runtime helper CreateSpan for stackalloc of non-byte arrays into ReadOnlySpan" to "Use runtime helper CreateSpan for stackalloc of non-byte arrays" Dec 14, 2023
@jjonescz jjonescz force-pushed the 69325-stackalloc-NonByte branch from 9dd1ba3 to 5261199 Compare December 14, 2023 17:41
@jjonescz jjonescz force-pushed the 69325-stackalloc-NonByte branch from 5261199 to b452022 Compare December 14, 2023 17:46
@jjonescz jjonescz marked this pull request as ready for review December 14, 2023 19:51
@jjonescz jjonescz requested a review from a team as a code owner December 14, 2023 19:51
_builder.EmitOpCode(ILOpCode.Call, 0);
EmitSymbolToken(getPinnableReference, syntaxNode, optArgList: null);
_builder.EmitIntConstant(data.Length);
_builder.EmitOpCode(ILOpCode.Cpblk, -3);
Contributor

@AlekseyTs AlekseyTs Dec 15, 2023

_builder.EmitOpCode(ILOpCode.Cpblk, -3);

ECMA-335 says:

cpblk assumes that both destaddr and srcaddr are aligned to the natural size of the machine

Are we ensuring that this requirement is met? #Closed

Member Author

We emit cpblk in a very similar way as above (under if (elementType.EnumUnderlyingTypeOrSelf().SpecialType.SizeInBytes() == 1)), so I assumed the alignment is fine.

ECMA-335 also says:

All such structures, allocated by the CLI, are naturally aligned for the current platform

and the field has a structure type, so I think it should be aligned properly.

Contributor

@AlekseyTs AlekseyTs Dec 18, 2023

and the field has a structure type, so I think it should be aligned properly.

A blob of bytes is not an instance of a structure type. We are not copying data from a regular field. We are copying data from a blob stored in a module.

Member Author

I thought we were emitting a special struct (e.g., __StaticArrayInitTypeSize=12) and a field in <PrivateImplementationDetails> of that struct type, initialized with a blob of data, and that we use that field, so it should be naturally aligned because it is of the struct type. In addition, CreateSpan can return a pointer to a different blob of data with fixed-up endianness, but that should also be naturally aligned, since it allocates (and caches) an array in that case.

Contributor

@AlekseyTs AlekseyTs Dec 20, 2023

Here is what I think:

  1. We are not working with a regular field. We are working with a field that is mapped into a region of memory inside the module's image, a so-called RVA field.
  2. The alignment of that region cannot change at runtime; it is the alignment that was used during emit.
  3. Some operations have certain alignment requirements. For example, we already established that the CreateSpan API has them.
  4. A comment for PrivateImplementationDetails.CreateDataField explains how compiler enforces the right alignment for an RVA field. Here is a verbatim quote: "a type is generated of the same size as
    the data, and that type needs its .pack set to the alignment required for the underlying data. While that
    .pack value isn't required by anything else in the compiler (the compiler always aligns RVA fields at 8-byte
    boundaries, which accomodates any element type that's relevant), it is necessary for IL rewriters. Such rewriters
    also need to ensure an appropriate alignment is maintained for the RVA field, and while they could also simplify
    by choosing a worst-case alignment as does the compiler, they may instead use the .pack value as the alignment
    to use for that field, since it's an opaque blob with no other indication as to what kind of data is
    stored and what alignment might be required."
  5. It looks like we already addressed the alignment required by CreateSpan, which is the natural boundary of the span's element type.
  6. The span produced by CreateSpan points into the same blob when the data are not moved or copied anywhere else, so it still has the original alignment.
  7. Compared to CreateSpan, the cpblk instruction has a different alignment requirement: "aligned to the natural size of the machine". Does the RVA field meet this requirement? The spec doesn't provide any specific number. As of commit 10, we are using different alignments for different element types. It is quite possible that some of those numbers do not match "the natural size of the machine". Perhaps we should always use 8-byte alignment. It would be good to get some guidance from the runtime team here.
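
The "type of the same size as the data" with a `.pack` value from point 4 can be approximated in C# with an explicit struct layout. This is a hand-written analogue under stated assumptions, not what the compiler emits: `BlobOf12Bytes` and `PackSketch` are hypothetical names, and the size 12 and pack 4 are chosen to match three `int` elements.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical stand-in for a generated type such as __StaticArrayInitTypeSize=12:
// no fields, an explicit size equal to the blob length, and a pack value equal to
// the element alignment (4 for int).
[StructLayout(LayoutKind.Explicit, Size = 12, Pack = 4)]
struct BlobOf12Bytes { }

static class PackSketch
{
    // Size the runtime reports for the type; it comes from the explicit Size above.
    public static int BlobSize() => Unsafe.SizeOf<BlobOf12Bytes>();

    // Pack value recorded in metadata; this is the hint an IL rewriter could read
    // to decide how to align the opaque blob.
    public static int DeclaredPack() =>
        typeof(BlobOf12Bytes).StructLayoutAttribute?.Pack ?? 0;
}
```

The pack value is the only alignment information attached to such a type, which is why the quoted comment describes it as the hint available to IL rewriters.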

Member Author

@jjonescz jjonescz Dec 21, 2023

@stephentoub can you please advise how we ensure we meet cpblk's alignment requirements here? Thanks.

Btw, looks like this code creates an RVA field with alignment 1 and uses it with cpblk, which, if correct, would suggest cpblk doesn't need any special alignment of the field:

else if (elementType.EnumUnderlyingTypeOrSelf().SpecialType.SizeInBytes() == 1)
{
// Initialize the stackalloc by copying the data from a metadata blob
var field = _builder.module.GetFieldForData(data, alignment: 1, inits.Syntax, _diagnostics.DiagnosticBag);
_builder.EmitOpCode(ILOpCode.Dup);
_builder.EmitOpCode(ILOpCode.Ldsflda);
_builder.EmitToken(field, inits.Syntax, _diagnostics.DiagnosticBag);
_builder.EmitIntConstant(data.Length);
_builder.EmitOpCode(ILOpCode.Cpblk, -3);

Member

You can prefix the cpblk instruction with unaligned in order to avoid the alignment requirement, which IIRC wants the pointers to be aligned to the machine's natural size. It's possible the cited existing code should be using unaligned as well, or maybe the requirement is less about the machine's natural word size and more about the size of the type in use (or maybe this is a place where ECMA and coreclr differ). @jkotas?

Member

If you would like to be compliant with ECMA, emit the unaligned prefix. It is not required to make things work on current runtimes. cpblk implementations in current runtimes do not make any assumptions about the alignment of the inputs.
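
At the library level, a block copy that makes no alignment assumption is available as `Unsafe.CopyBlockUnaligned`, which is documented as copying without assuming architecture-dependent alignment of the addresses. The sketch below uses it as a rough analogue of `unaligned. cpblk`; the class and method names are hypothetical, and the source is deliberately placed at an odd offset inside a byte buffer.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

static class UnalignedCopySketch
{
    public static int[] CopyFromOddOffset()
    {
        int[] values = { 1, 2, 3 };
        int byteCount = values.Length * sizeof(int);

        // Store the data one byte into the buffer, so the source offset is not a
        // multiple of sizeof(int) relative to the start of the buffer.
        byte[] blob = new byte[1 + byteCount];
        Buffer.BlockCopy(values, 0, blob, 1, byteCount);

        // Copy the bytes onto the stack without assuming any source alignment.
        Span<int> destination = stackalloc int[values.Length];
        Unsafe.CopyBlockUnaligned(
            ref Unsafe.As<int, byte>(ref MemoryMarshal.GetReference(destination)),
            ref blob[1],
            (uint)byteCount);

        return destination.ToArray();
    }
}
```

The round trip returns the original values regardless of endianness, because the bytes are written and read back in the same order.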

@AlekseyTs
Contributor

Done with review pass (commit 1)

EmitSymbolToken(spanGetItem, syntaxNode, optArgList: null);

_builder.EmitIntConstant(data.Length);
if (sizeInBytes != 8)
Contributor

if (sizeInBytes != 8)

Looking at the unaligned. section of ECMA-335

shall use unaligned. if the alignment is not known at compile time to be 8-byte.

From that, I infer that, if we align the RVA field type at 8 bytes, we will be compliant without the unaligned prefix. Should we use that option instead, rather than having conditional IL?

Member Author

Are there any downsides to conditional IL? On the other hand, it seems that aligning properly to 1/2/4/8 bytes instead of always to 8 bytes saves space.

Contributor

Are there any downsides to conditional IL?

IL is bigger. Compiler's code is more complex. Test matrix is bigger, more scenarios need special attention.

it seems that aligning properly to 1/2/4/8 bytes instead of always to 8 bytes saves space.

I do not think it actually does. At least not for the code emitted by our compilers because, I think, compiler aligns all blobs at 8 bytes regardless of the requested alignment, this is mentioned in one of the comments in code. The alignment that we explicitly set on the types is important for IL rewriting scenarios.

Member Author

The alignment that we explicitly set on the types is important for IL rewriting scenarios.

I think this means it improves size for AOT and similar precompiled code (right?) which seems important as well.

Contributor

@AlekseyTs AlekseyTs Jan 10, 2024

I think this means it improves size for AOT and similar precompiled code (right?) which seems important as well.

To be honest, I do not know the answer to these questions. And even if that were the case, I am not sure that is something we should care about.

Member

I do not think it makes sense to deoptimize the 1-byte case like that.

To make the two cases as similar as possible, you can emit unaligned.1 cpblk in both cases and emit the blob with elementType.EnumUnderlyingTypeOrSelf().SpecialType.SizeInBytes() alignment in both cases.

Contributor

did you want that to be aligned to 8 bytes as well?

Yes, this is what I would do. I would align both at 8 bytes explicitly (we are aligning blobs at 8 bytes regardless), and would completely get rid of the .unaligned instruction.

Member

I would align both at 8 bytes explicitly (we are aligning blobs at 8 bytes regardless), and would completely get rid of the .unaligned instruction.

It is a size de-optimization for trimmer and AOT, which are the scenarios where we care about size the most.
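
The size argument can be made concrete with a toy layout model. This is purely illustrative arithmetic under assumed blob sizes, not how any compiler, trimmer, or AOT toolchain actually places data; `BlobLayoutSketch`, `AlignUp`, `TotalSize`, and `Example` are hypothetical names.

```csharp
using System;

static class BlobLayoutSketch
{
    // Rounds offset up to the next multiple of alignment (a power of two).
    public static int AlignUp(int offset, int alignment) =>
        (offset + alignment - 1) & ~(alignment - 1);

    // Places the blobs back to back and returns the total size in bytes.
    // When fixedAlignment is 0, each blob uses its own element alignment.
    public static int TotalSize((int Size, int Alignment)[] blobs, int fixedAlignment = 0)
    {
        int offset = 0;
        foreach (var (size, alignment) in blobs)
        {
            offset = AlignUp(offset, fixedAlignment > 0 ? fixedAlignment : alignment);
            offset += size;
        }
        return offset;
    }

    // Assumed example data: three bytes, three shorts, three ints.
    public static readonly (int Size, int Alignment)[] Example =
    {
        (3, 1),
        (6, 2),
        (12, 4),
    };
}
```

With these three blobs, `TotalSize(Example)` gives 24 bytes and `TotalSize(Example, 8)` gives 28 bytes; the difference is the padding a tool would save by honoring the per-type alignment instead of always using 8.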

Contributor

It is a size de-optimization for trimmer and AOT, which are the scenarios where we care about size the most.

Perhaps I misinterpreted your first message on this discussion thread.

If the blob size optimization is important for AOT, then, I think we should keep the .unaligned usage on both code paths. We can get there by reverting the last commit. Sorry for the confusion.

Member

Perhaps I misinterpreted your first message on this discussion thread.

I was commenting on why we set the alignment and why it would not work to omit it on the types. Sorry for the confusion.

Contributor

@AlekseyTs AlekseyTs left a comment

LGTM (commit 14)

@jkotas
Member

jkotas commented Jan 10, 2024

Is the JIT able to optimize the new pattern well? For example, what is performance of stackalloc int[] { 1, 2, 3 } before/after this change?

@jjonescz
Member Author

Is the JIT able to optimize the new pattern well? For example, what is performance of stackalloc int[] { 1, 2, 3 } before/after this change?

Yes. Here's a benchmark: https://github.com/jjonescz/RoslynIssue69325

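| Method   | Job        | BuildConfiguration  | Mean       | Ratio | Code Size |
|----------|------------|---------------------|------------|-------|-----------|
| Run_3    | Job-WJVUQY | Release             | 2.372 ns   | 1.00  | 117 B     |
| Run_3    | Job-PMTJCT | ReleaseCustomRoslyn | 2.307 ns   | 0.97  | 114 B     |
| Run_1000 | Job-WJVUQY | Release             | 164.043 ns | 1.00  | 10,023 B  |
| Run_1000 | Job-PMTJCT | ReleaseCustomRoslyn | 60.540 ns  | 0.37  | 149 B     |

[Benchmark]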
public void Run_3() => Use(stackalloc int[] { 1, 2, 3 });

[Benchmark]
public void Run_1000() => Use(stackalloc int[] { /* initializer elided here; the method name suggests 1000 elements */ });

[MethodImpl(MethodImplOptions.NoInlining)]
static void Use(Span<int> span) { }

@jjonescz jjonescz enabled auto-merge (squash) January 11, 2024 16:45
@jjonescz jjonescz merged commit 514b536 into dotnet:main Jan 11, 2024
@jjonescz jjonescz deleted the 69325-stackalloc-NonByte branch January 11, 2024 18:21
@ghost ghost added this to the Next milestone Jan 11, 2024
@Cosifne Cosifne modified the milestones: Next, 17.10 P1 Jan 30, 2024
Development

Successfully merging this pull request may close these issues.

Consider using RuntimeHelpers.CreateSpan as part of stackalloc initialization
7 participants