[LLVM][Triple] Drop unknown object types from normalized triples #135571

Open
wants to merge 1 commit into
base: main

UsmanNadeem
Contributor

@UsmanNadeem UsmanNadeem commented Apr 13, 2025

[LLVM][Triple] Drop unknown object types from normalized triples

According to the LangRef, the longest canonical form for the triple is:
    `ARCHITECTURE-VENDOR-OPERATING_SYSTEM-ENVIRONMENT`

The object format may also appear at the end of the triple, separated by
an additional `-`, but it looks like the object format is part of the environment
as opposed to a separate identifier. This appears to be the case because
various pieces of code that parse the environment substring also handle the
object format, and the code often assumes only four components, where the
environment string may also hold the version number and the object format.
Also see: `getEnvironmentName()`.
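
To illustrate the four-component view described above, here is a minimal sketch. It is not LLVM's actual code: `splitTripleSketch` is a hypothetical helper that splits on at most the first three `-` characters, so anything after the operating system stays in the environment string.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not an LLVM API: split a triple into at most four
// components. Everything after the third '-' is kept in the last component,
// so a version number or object format remains part of the environment.
std::array<std::string, 4> splitTripleSketch(const std::string &Triple) {
  std::array<std::string, 4> Parts;
  std::size_t Begin = 0;
  for (std::size_t I = 0; I < 4; ++I) {
    // The fourth component takes the rest of the string unconditionally.
    std::size_t End = (I == 3) ? std::string::npos : Triple.find('-', Begin);
    Parts[I] = Triple.substr(
        Begin, End == std::string::npos ? std::string::npos : End - Begin);
    if (End == std::string::npos)
      break;
    Begin = End + 1;
  }
  return Parts;
}
```

Under this split, `x86_64-pc-windows-msvc-elf` yields the environment string `msvc-elf`, while `aarch64-none-elf` has only three components and leaves the fourth one empty.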

While creating a Triple, in case of an invalid or unknown object format we
call the `getDefaultFormat()` function, which sets the appropriate format. So
the object format is never really unknown. Since we always set a default
format, having `unknown` as a placeholder can cause issues. This is supported
by the fact that the expected string for an `UnknownObjectFormat` is `""`
rather than `"unknown"`, as seen in `getObjectFormatTypeName()`. So, to me, it
makes sense to drop `unknown` from the triple for the object format.

#122629 introduces some build bot failures. The failures occur because
`getEnvironmentVersionString()` expects that if the environment
string contains a `"-"`, then it has the object format at the end, and the object
format name and type should match, which is not the case if `-unknown` is
present in the triple.
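
The expectation described above can be sketched roughly as follows. This is an illustration only, not the LLVM implementation: both function names are hypothetical, and the list of format names is an assumed subset chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <string>

// Hypothetical stand-in for a format-name lookup. The list is illustrative
// and deliberately does not contain "unknown".
bool isKnownObjectFormatName(const std::string &Name) {
  for (const char *Known : {"coff", "elf", "goff", "macho", "wasm", "xcoff"})
    if (Name == Known)
      return true;
  return false;
}

// Returns true when the environment string has no '-' suffix, or when the
// part after the last '-' names a known object format.
bool envSuffixIsConsistent(const std::string &Env) {
  std::size_t Dash = Env.rfind('-');
  if (Dash == std::string::npos)
    return true; // No suffix, so there is nothing to check.
  return isKnownObjectFormatName(Env.substr(Dash + 1));
}
```

In this sketch `msvc-elf` passes the check, while `elf-unknown` does not, because `unknown` is not an object format name.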

As a part of this patch I also removed Triple::CanonicalForm::FIVE_IDENT.

@UsmanNadeem UsmanNadeem requested review from shiltian and removed request for shiltian April 13, 2025 22:12
@UsmanNadeem UsmanNadeem marked this pull request as draft April 13, 2025 23:25
Change-Id: I5c6ef8fef4ff029ab28f4c3afdab573251cf629c
@UsmanNadeem UsmanNadeem marked this pull request as ready for review April 14, 2025 01:20
@UsmanNadeem UsmanNadeem requested a review from shiltian April 14, 2025 01:20
@@ -1265,6 +1266,14 @@ std::string Triple::normalize(StringRef Str, CanonicalForm Form) {
}
}

// Environment "unknown-elf" is just "elf".
Contributor

I think the rest of the changes are fine, but this doesn't look right to me, especially looking at `parseEnvironment`: they are not the same. I agree that the object format is treated as part of the environment, but that doesn't mean that `unknown-elf` is `elf`. Also, I think the `unknown` environment will yield `UnknownEnvironment`, based on `parseEnvironment`.

Contributor Author

parseEnvironment("elf") will also yield UnknownEnvironment though.

Before this patch, the following command
`clang -### --target=aarch64-pc-linux--elf`
gives an error due to a bug in version parsing, but the resulting triple is:
`Target: aarch64-pc-linux-elf-unknown`, which seems wrong.

After this patch we get:
`Target: aarch64-pc-linux-elf`
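
A rough sketch of the effect on the printed triple, assuming the only change is that a trailing `unknown` in the fifth position is dropped. `dropUnknownObjectFormat` is a hypothetical helper written for illustration, not the code in this patch.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical helper: remove a trailing "-unknown" only when it sits in the
// fifth position, i.e. where an object format would appear.
std::string dropUnknownObjectFormat(std::string Triple) {
  const std::string Suffix = "-unknown";
  auto Dashes = std::count(Triple.begin(), Triple.end(), '-');
  bool EndsWithSuffix =
      Triple.size() >= Suffix.size() &&
      Triple.compare(Triple.size() - Suffix.size(), Suffix.size(), Suffix) == 0;
  // Four or more dashes means there are at least five components.
  if (Dashes >= 4 && EndsWithSuffix)
    Triple.erase(Triple.size() - Suffix.size());
  return Triple;
}
```

With this helper, `aarch64-pc-linux-elf-unknown` becomes `aarch64-pc-linux-elf`, while a four-component triple such as `aarch64-unknown-linux-unknown` is left alone.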

Contributor

Which is wrong, right? We need to treat the root cause, not the symptoms. Even if we take the format as part of the environment, `aarch64-pc-linux--elf` should also be a valid one, since -elf means the environment part is empty and `elf` is the format part. We should not treat `aarch64-pc-linux--elf` as `aarch64-pc-linux-elf-unknown`. Instead, we should fix it so that it is treated as `aarch64-pc-linux-unknown-elf` rather than `aarch64-pc-linux-elf`.
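
The reading proposed here, where an empty component becomes `unknown` in place, can be sketched as follows. `fillEmptyComponents` is a hypothetical helper for illustration, not an LLVM function.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: split on every '-', replace each empty component with
// "unknown" in its original position, and join the pieces again.
std::string fillEmptyComponents(const std::string &Triple) {
  std::string Result;
  std::size_t Begin = 0;
  bool First = true;
  while (true) {
    std::size_t End = Triple.find('-', Begin);
    std::string Part = Triple.substr(
        Begin, End == std::string::npos ? std::string::npos : End - Begin);
    if (!First)
      Result += '-';
    Result += Part.empty() ? std::string("unknown") : Part;
    First = false;
    if (End == std::string::npos)
      break;
    Begin = End + 1;
  }
  return Result;
}
```

Under this interpretation `aarch64-pc-linux--elf` maps to `aarch64-pc-linux-unknown-elf`, and a triple without empty components is returned unchanged.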

Contributor Author

@UsmanNadeem UsmanNadeem Apr 14, 2025

Your explanation makes sense to me, but the problem is that it goes against current toolchain expectations and will cause clang to not find libraries for some targets, e.g. some baremetal toolchains use triples like `aarch64-none-elf` and `riscv64-unknown-elf`.
These would normally be canonicalized to `aarch64-unknown-none-elf` and `riscv64-unknown-unknown-elf`. Adding an extra `unknown` to account for the missing environment will break things.

Also, if you look at some of the code in Baremetal.cpp, it checks for `Triple.getEnvironmentName() == "elf"`, so it seems like both `"elf"` and `"unknown-elf"` should be synonymous.
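
A minimal sketch of why a literal comparison against `"elf"` is sensitive to an extra `unknown`. Both functions are hypothetical and only mirror the shape of the check mentioned above; they are not taken from Baremetal.cpp.

```cpp
#include <cassert>
#include <string>

// Same shape as the comparison mentioned above: plain string equality.
bool isElfEnvLiteral(const std::string &Env) { return Env == "elf"; }

// Hypothetical normalization that treats "unknown-elf" as "elf" by removing
// a leading "unknown-" from the environment string.
std::string stripUnknownEnvPrefix(const std::string &Env) {
  const std::string Prefix = "unknown-";
  if (Env.compare(0, Prefix.size(), Prefix) == 0)
    return Env.substr(Prefix.size());
  return Env;
}
```

Here `isElfEnvLiteral("unknown-elf")` is false, but it becomes true once the string has gone through `stripUnknownEnvPrefix`, which is the sense in which the two spellings would be synonymous.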

Contributor

I mean, their code is also part of the LLVM code base, so it can be updated as well, right?

Contributor Author

Added more reviewers to get additional opinions.

Collaborator

For context on five-part triples, see f80b49b / 3547633 / feb805f. Five-part triples are not commonly used for anything; it was just implemented that way because it was convenient at the time for the JIT. (So, for example, `msvc-elf` is a variant of the msvc environment, but using ELF object files.)

Everyone else wants nothing to do with five-part triples. "-elf" is just a generic baremetal ELF environment not tied to any particular operating system or libraries. Similar for "-coff" etc.

Changing the canonical form of "aarch64-pc-linux-elf" to "aarch64-pc-linux-unknown-elf" seems like it's making a lot of work without any real benefit.
