[LLVM][Triple] Drop unknown object types from normalized triples #135571
According to the LangRef, the longest canonical form for the triple is:
`ARCHITECTURE-VENDOR-OPERATING_SYSTEM-ENVIRONMENT`

The object format may also appear at the end of the triple, separated by an additional `-`, but it looks like the object format is part of the `environment` as opposed to a separate identifier. This appears to be the case because various pieces of code that parse the environment substring also handle the object format, and often the code assumes only four components, where the environment string may also hold the version number and the object format. Also see: `getEnvironmentName()`.

While creating a Triple, in case of an invalid or unknown object format we call the `getDefaultFormat()` function, which sets the appropriate format. So the object format is never really unknown. Since we always set a default format, having `unknown` as a placeholder can cause issues. This is supported by the fact that the string expected for an `UnknownObjectFormat` is `""`, as seen in `getObjectFormatTypeName()`, instead of `"unknown"`. So, to me it makes sense to drop "unknown" from the triple for the object format.

#122629 introduces some build bot failures. The failures occur because `getEnvironmentVersionString()` expects that if the environment string contains a `"-"`, then it has the object format at the end, and the object format name and type should match, which is not the case if "-unknown" is present in the triple.

As a part of this patch I also removed `Triple::CanonicalForm::FIVE_IDENT`.

Change-Id: I5c6ef8fef4ff029ab28f4c3afdab573251cf629c
@@ -1265,6 +1266,14 @@ std::string Triple::normalize(StringRef Str, CanonicalForm Form) {
    }
  }

  // Environment "unknown-elf" is just "elf".
I think the rest of the changes is fine, but this doesn't look right to me, especially looking at `parseEnvironment`; they are not the same. I agree that the object format is treated as part of the environment, but that doesn't mean that `unknown-elf` is `elf`. Also, I think an `unknown` environment will yield `UnknownEnvironment`, based on `parseEnvironment`.
`parseEnvironment("elf")` will also yield `UnknownEnvironment` though.
Before this patch the following command
`clang -### --target=aarch64-pc-linux--elf`
or `clang -### --target=aarch64-pc-linux--elf`
gives an error due to a bug in version parsing, but the resultant triple is:
`Target: aarch64-pc-linux-elf-unknown`
which seems wrong.
After this patch we get:
`Target: aarch64-pc-linux-elf`
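To make the before/after difference concrete, here is a minimal string-level sketch of the effect described above. It is not the code from the patch, which works inside `Triple::normalize`; `splitTriple` and `dropUnknownObjectFormat` are hypothetical helpers invented for illustration, and the sketch assumes the only case of interest is a five-component triple whose last component is the placeholder `unknown`.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not LLVM API): split a triple string on '-',
// keeping empty components.
std::vector<std::string> splitTriple(const std::string &triple) {
  std::vector<std::string> parts;
  std::string::size_type start = 0;
  while (true) {
    std::string::size_type dash = triple.find('-', start);
    if (dash == std::string::npos) {
      parts.push_back(triple.substr(start));
      break;
    }
    parts.push_back(triple.substr(start, dash - start));
    start = dash + 1;
  }
  return parts;
}

// Hypothetical helper (not LLVM API): if a five-component triple ends in the
// placeholder "unknown" in the object-format slot, drop that component.
// Every other triple is returned unchanged.
std::string dropUnknownObjectFormat(const std::string &triple) {
  std::vector<std::string> parts = splitTriple(triple);
  if (parts.size() == 5 && parts.back() == "unknown")
    parts.pop_back();

  std::string result;
  for (std::size_t i = 0; i < parts.size(); ++i) {
    if (i != 0)
      result += '-';
    result += parts[i];
  }
  return result;
}
```

Under these assumptions, `dropUnknownObjectFormat("aarch64-pc-linux-elf-unknown")` yields `aarch64-pc-linux-elf`, while a triple that names a real object format in the fifth slot, such as one ending in `msvc-elf`, is left alone.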
Which is wrong, right? We need to treat the root cause, not the symptoms. Even if we take the format as part of the environment, `aarch64-pc-linux--elf` should also be a valid one, since `-elf` means the environment part is empty and `elf` is the format part. We should not treat `aarch64-pc-linux--elf` as `aarch64-pc-linux-elf-unknown`. Instead, we should fix it such that it would be treated as `aarch64-pc-linux-unknown-elf` instead of `aarch64-pc-linux-elf`.
Your explanation makes sense to me, but the problem is that it goes against current toolchain expectations and will cause clang to not find libraries for some targets, e.g. some baremetal toolchains use triples like `aarch64-none-elf` and `riscv64-unknown-elf`.
These would normally be canonicalized to `aarch64-unknown-none-elf` and `riscv64-unknown-unknown-elf`. Adding an extra `unknown` to account for the missing environment will break things.
Also, if you look at some of the code in `Baremetal.cpp`, it checks for `Triple.getEnvironmentName() == "elf"`, so it seems like both "elf" and "unknown-elf" should be synonymous.
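The concern about checks of the form `getEnvironmentName() == "elf"` can be illustrated with a small standalone sketch. It assumes that the environment name is simply everything after the third `-` in the triple string, which is my understanding of how `getEnvironmentName()` behaves rather than something stated in this thread; `environmentName` below is a hypothetical stand-in, not the LLVM function.

```cpp
#include <string>

// Hypothetical stand-in for Triple::getEnvironmentName(): return everything
// after the third '-' in the triple, or an empty string if the triple has
// fewer than three dashes. This is an assumption made for illustration.
std::string environmentName(const std::string &triple) {
  std::string::size_type pos = 0;
  for (int i = 0; i < 3; ++i) {
    pos = triple.find('-', pos);
    if (pos == std::string::npos)
      return "";
    ++pos; // step past the dash just found
  }
  return triple.substr(pos);
}
```

With this model, `aarch64-unknown-none-elf` has the environment name `elf`, whereas a form with an explicit placeholder, `aarch64-pc-linux-unknown-elf`, has the environment name `unknown-elf`, so a plain string comparison against `"elf"` would stop matching unless the two spellings are treated as synonymous.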
I mean, their code is also part of the LLVM code base, so it can be updated as well, right?
Added more reviewers to get additional opinion.
For context on five-part triples, see f80b49b / 3547633 / feb805f. Five-part triples are not commonly used for anything; it was just implemented that way because it was convenient at the time for the JIT. (So, for example, msvc-elf is a variant of the msvc environment, but using ELF object files.)
Everyone else wants nothing to do with five-part triples. "-elf" is just a generic baremetal ELF environment not tied to any particular operating system or libraries. Similar for "-coff" etc.
Changing the canonical form of "aarch64-pc-linux-elf" to "aarch64-pc-linux-unknown-elf" seems like it's making a lot of work without any real benefit.
[LLVM][Triple] Drop unknown object types from normalized triples
According to the LangRef, the longest canonical form for the triple is:
`ARCHITECTURE-VENDOR-OPERATING_SYSTEM-ENVIRONMENT`
The object format may also appear at the end of the triple, separated by an additional `-`, but it looks like the object format is part of the `environment` as opposed to a separate identifier. This appears to be the case because various pieces of code that parse the environment substring also handle the object format, and often the code assumes only four components, where the environment string may also hold the version number and the object format. Also see: `getEnvironmentName()`.
While creating a Triple, in case of an invalid or unknown object format we call the `getDefaultFormat()` function, which sets the appropriate format. So the object format is never really unknown. Since we always set a default format, having `unknown` as a placeholder can cause issues. This is supported by the fact that the string expected for an `UnknownObjectFormat` is `""`, as seen in `getObjectFormatTypeName()`, instead of `"unknown"`. So, to me it makes sense to drop "unknown" from the triple for the object format.
#122629 introduces some build bot failures. The failures occur because `getEnvironmentVersionString()` expects that if the environment string contains a `"-"`, then it has the object format at the end, and the object format name and type should match, which is not the case if "-unknown" is present in the triple.
As a part of this patch I also removed `Triple::CanonicalForm::FIVE_IDENT`.