Thanks to visit codestin.com
Credit goes to github.com

Skip to content

ipaddress: Shared Address Space (100.64.0.0/10) is neither private nor global #119812

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
jstasiak opened this issue May 30, 2024 · 10 comments
Open
Labels
stdlib Python modules in the Lib dir

Comments

@jstasiak
Copy link
Contributor

jstasiak commented May 30, 2024

Something I discovered when working on GH-65056 and GH-113171.

IPv4Address' and IPv4Network's is_global and is_private are false at the same time for addresses in this range:

>>> ipaddress.IPv4Address('100.64.0.0').is_global
False
>>> ipaddress.IPv4Address('100.64.0.0').is_private
False
>>> ipaddress.IPv4Network('100.64.0.0/10').is_global
False
>>> ipaddress.IPv4Network('100.64.0.0/10').is_private
False

I don't believe this is right and I'll explain why.

Historical context

Initial additions

is_private introduced in dc9b255 ("Issue #14814: addition of the ipaddress module (stage 1 - code and tests)") on 2012-05-20.

It only handled the actual private-use ranges for IPv4 (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16).

Documented as such.

100.64.0.0/10 handling and is_private semantics changes

There have been four ipaddress patches involving this range:

  1. 22c3176 ("bpo-26730: Fix SpooledTemporaryFile data corruption #17400; ipaddress should make it easy to identify rfc6598 addresses")
  2. be9c1b1 ("bpo-26730: Fix SpooledTemporaryFile data corruption #17400: fix documentation, add cache to is_global and correctly handle 100.64.0.0/10")
  3. e5019d5 ("bpo-26730: Fix SpooledTemporaryFile data corruption #17400: correct handling of 100.64.0.0/10, fixing the docs and updating NEWS")
  4. 742192a ("Issue bpo-41205: Document Decimal power 0 to the 0 #21386: Implement missing IPv4Address.is_global property")

The first three were part of GH-61602 (https://bugs.python.org/issue17400), the fourth one just adds a missing IPv4Address.is_global property to match the IPv4Network.is_global semantics and follows the semantics established in commits 1-3.

Commit 1 changed the semantics of is_private from "Is this a Private-use IPv4 address or a Unique-Local IPv6 address?" to roughly "Do [1] and [2] say globally reachable = false for this address". The documentation change was not quite precise

         A boolean, True if the address is reserved per
        iana-ipv4-special-registry or iana-ipv6-special-registry.

but the intent of the implementation is quite clear.

Commit 1 also added is_global that was effectively not is_private.

In commit 3 an exception for 100.64.0.0/10 is made and both is_global and is_private are false for that range.

is_global/is_private semantics clarification

As part of GH-65056 we changed the documentation to make it clear that we follow the "globally reachable" information from IANA (with some caveats, like special handling of IPv4-mapped IPv6 addresses, see 83f0f8d ("bpo-33433 Fix private address checking for IPv4 mapped IPv6. (GH-26172)")).

The problem

The motivation for handling 100.64.0.0/10 like this can be found here:

#61602 (comment)

The rationale for RFC 6598 is precisely that 100.64.0.0/10 is not private in the common sense, so it would deserve a different treatment in the ipaddress module as well.

I have to admit I don't find it convincing enough to make an exception for it.

[1] says the range is not globally reachable so in my opinion is_private should return true for it as it does for the rest of the not globally reachable address blocks (again, with the exception of IPv4-mapped IPv6 address handling).

I'd find is_private being false for the range surprising if I wasn't clearly aware of the semantics after reading this code multiple times. I believe it may lead to real-world issues.

Additionally the behavior where is_private is not the opposite of is_global is likely to be surprising too.

In short: IMO there should be no exception.

The downside of the proposed solution: this is technically breaking backwards compatibility if some code depends on the current semantics. I'm not sure I'd classify the proposed change strictly as a bug fix.

[1] https://www.iana.org/assignments/iana-ipv4-special-registry/iana-ipv4-special-registry.xhtml
[2] https://www.iana.org/assignments/iana-ipv6-special-registry/iana-ipv6-special-registry.xhtml

Linked PRs

@encukou
Copy link
Member

encukou commented May 31, 2024

It seems to me that if is_private is always supposed to be exactly the opposite of is_global, then one of them is redundant and should be deprecated.

@jstasiak
Copy link
Contributor Author

Yes, agreed. If the Shared Address Space exception is removed I'll be happy to push that agenda.

ccing @pitrou @ncoghlan – you participated in the GH-61602 discussion, wondering what your thoughts are.

@encukou
Copy link
Member

encukou commented Jun 3, 2024

If the Shared Address Space exception is removed I'll be happy to push that agenda.

If we decide to deprecate something, it should keep its existing warts.
Both a breaking change and a deprecation mean that we're asking users to adapt their code. We want them to only do that once.

@jstasiak
Copy link
Contributor Author

jstasiak commented Jun 3, 2024

Are you saying we should should deprecate and eventually remove is_private and the removal will effectively get rid of the problem of the 100.64.0.0/10 exception?

IMO such deprecation and removal would have to be connected to a claim about the Shared Address Space exception being needless or unwarranted.

Please help me understand what should happen here (a hypothetical sequence or a timeline of events could be useful).

@encukou
Copy link
Member

encukou commented Jun 4, 2024

This needs an expert, or someone looking at the usage and use cases and related context, to determine if this is a problem that needs fixing. Right now I can't commit the necessary time to do this.

If it should be fixed, with a deprecation we can avoid changing existing behaviour and make the (non-deprecated) API smaller/simpler.

@ncoghlan
Copy link
Contributor

ncoghlan commented Jun 4, 2024

Re-reading the old issue, I'm still of the view that since there are 3 categories of interest (global, private, private-to-carrier), there should be three flags rather than having the third category be soley implied by the first two flags both being false. The fact that one of the categories is much smaller than the other two shouldn't be a major consideration.

Peter didn't want a public is_carrier_private flag for a single network range though, and he was the lead module maintainer at the time. (I also genuinely didn't have a strong preference on the topic, so I have no idea if he might have relented given further consideration of the documentation challenges inherent in not explicitly naming the third category)

I still think it would be wrong to make is_private True for these ranges though (for the reasons given in the original issue). It's genuinely the case that is_private and not is_global don't mean exactly the same thing (is_private excludes private-to-carrier addresses, while not is_global includes them).

If the documentation of is_private has changed such that it suggests that doing so would be correct, then it's the documentation that has diverged from the code's intentional behaviour, not the code.

@ncoghlan
Copy link
Contributor

ncoghlan commented Sep 7, 2024

The is_private/is_global CVE fix going out brought this to my attention again.

Considering options for new properties, I wonder if the following might be the best option:

  1. Document that the is_global and is_private flags are short for is_global_for_all and is_private_for_all, so it's possible for them both to be False when the same address is considered private for some use cases, but global for others.
  2. Add is_global_for_endpoints and is_private_for_endpoints
  3. Add is_global_for_carriers and is_private_for_carriers

There's no way for the standard library to know how the carrier shared address space should be handled in a given application unless it's told whether it's an endpoint use case or a carrier use case.

If we went down this road, the four new property implementations would be:

@property
def is_global_for_endpoints(self):
    return not self.is_private  # Carrier shared addresses are global for endpoints

@property
def is_private_for_endpoints(self):
    return self.is_private  # Addresses are only private for endpoints if they're private for everyone

@property
def is_global_for_carriers(self):
    return self.is_global  # Addresses are only global for carriers if they're global for everyone

@property
def is_private_for_carriers(self):
    return not self.is_global  # Carrier shared addresses are private for carriers

It feels like a fairly heavy solution for a relatively rare case, but I'm thinking that "Complex is better than complicated" applies here: exposing separate property pairs for endpoint use cases and carrier use cases (in addition to the existing "for all" flag attributes) forces people to decide which pair they want for their use case, but once you've made that decision, you can just ignore the other two property pairs.

The main alternative to that "choose a pair of properties and use them consistently" solution is the following:

# Existing `is_global` setting means "Is global for everyone"

@property
def is_global_for_endpoints(self):
    return not self.is_private  # Carrier shared addresses are global for endpoints

@property
def is_private_for_carriers(self):
    return not self.is_global  # Carrier shared addresses are private for carriers

# Existing `is_private` setting means "Is private for everyone"

While superficially simpler (only 2 new properties rather than 4), this feels more complicated to use in practice, since endpoint use cases will want the is_global_for_endpoints and is_private pair, while carrier use cases will want the is_global and is_private_for_carriers pair. Even identifying that those are the valid pairs assumes a reader already understands the problem we're wanting to provide a clearer API for.

@picnixz picnixz added the stdlib Python modules in the Lib dir label Sep 8, 2024
@ncoghlan
Copy link
Contributor

ncoghlan commented Oct 9, 2024

I thought of a potential solution that moves the handling of this problem to the point where addresses/networks/etc are defined rather than every time is_global and is_private are queried.

Network range semantics

We want to define 3 different sets of potential network range categorisation semantics for IPv6 address and network instances to follow:

  • universal: the current behaviour, where both is_private and is_global are False for the carrier private address range
  • carrier: the carrier private range reports addr.is_private == True (addr.is_global remains False)
  • endpoint: the carrier private range reports addr.is_global == True (addr.is_private remains False)

The first case will ideally be transitional - the eventual default should be endpoint semantics, with the ability to explicitly request carrier semantics.

To allow for deprecation of the universal case, we'd need to allow carrier or endpoint semantics to be explicitly requested:

  • ipaddress.set_default_network_range_semantics(carrier=True): makes carrier semantics the default
  • ipaddress.set_default_network_range_semantics(carrier=False): makes endpoint semantics the default

Which implies a public API to query the default semantics as well:

  • ipaddress.get_default_network_range_semantics(): returns a string value ("universal" if no default semantics have been set, "carrier" or "endpoint" as appropriate if they have)

Defining a real enum seems like overkill here (ipaddress currently only depends on functools and nothing else, so bringing in 2000 lines of enum support code isn't a trivial change, especially for an obscure issue like this one), but a tri-state boolean would be too cryptic as a status reporting API (even though it's reasonably clear when setting the value) , so returning string names seems like a decent compromise.

IPv6 address and network changes

IPv6 address and network instances would take a new tri-state boolean carrier option (where the default None means "use the module level default").

That option (and potentially the module default setting) would control how is_private and is_global behave on that instance.

They would gain a new string property range_semantics that would report "universal"/"carrier"/"endpoint" as appropriate.

Defaulting to "endpoint" semantics

We don't actually know if telcos and other backbone operators are relying on the current is_global and is_private behaviour to identify addresses in the shared address space. This means the conservative option for getting to endpoint semantics as the default behaviour would be to:

  • emit DeprecationWarning if an address or network in the carrier private range is defined and no default network semantics have been set (with directions to call set_default_semantics() or specify carrier to make the warning go away)
  • eventually drop the deprecation warning, get rid of the "unknown" default semantics, and make endpoint semantics the default behaviour

Alternatively, we could decide that endpoint use cases so massively outweigh carrier use cases that it's OK to skip the deprecation period, and instead just add a porting note that carriers relying on is_global being False for carrier-private addresses will need to call ipaddress.set_default_semantics(carrier=True) in their applications (which will also make is_private return True).

Addresses in the private range could still be specifically identified with that default behaviour, but you'd have to create two address instances (one with carrier range semantics, one with endpoint semantics) and see if they reported different is_global or is_private values.

We'd need SC approval for that (as per PEP 387), and I'm not sure they'd grant it (we can reasonably issue a deprecation notice, it's just highly arguable how useful it would be to do so when we're pretty sure 99.9% of the userbase will want the new behaviour and would see it as a bug fix rather than as functional change). A straw poll on discuss.python.org might be helpful to get a sense of whether folks think it's OK to spend a couple of years getting away from the currently ambiguous default range handling semantics.

Edit: changed the suggested name of the current default semantics from "unknown" to "universal" ("unknown" made it sound like the range semantics were unknown, which isn't the case. Calling that case "universal range semantics" reflects the fact that is_global and is_private are only true when they're true under both "carrier" and "endpoint" semantics)

Edit 2: I realised what this case was reminding me of: the fold flag in the datetime module where the need to disambiguate the edge case of the overlapping hour when time zones switch out of daylight saving ends up needing to be tracked through the entire data model. Here, the network range semantics for endpoint and carrier use cases are almost identical... except in one particular range, where they're precise mirrors of each other. At least this case has the benefit that setting an application wide default is likely to be a reasonable approach (unlike timezone DST handling) (see next comment).

Edit 3: An even simpler third option would be to never change the default. carrier=None would request "universal" semantics, while any other false value would request "endpoint" semantics, and the current "universal" behaviour would remain the default indefinitely. I didn't like that idea with my original name for the "universal" semantics, but I'm more amenable to it now they have a reasonable name for the query APIs to return. Assuming we do end up using this "explicit range semantics" design, changing the default semantics should be split out into a follow-up issue, rather than being part of the same issue that adds the ability to explicitly request "carrier" or "endpoint" semantics.

@ncoghlan
Copy link
Contributor

ncoghlan commented Oct 9, 2024

There's a concrete problem in the "Default range semantics" idea as currently written: if instances eagerly save the default semantics that were in effect when the application started up, then attempting to change the default later will end up with an unpredictable mix of instances using universal range semantics and those using specifically carrier or endpoint semantics.

So maybe a true enum solution would be justified:

@enum
class NetworkRangeSemantics:
    Universal = "Common range properties for carrier and endpoint networks"
    Carrier = "Range properties for carrier networks"
    Endpoint = "Range properties for endpoint networks"

Given a true enum, the initial solution could completely ignore any prospect of changing the default behaviour, and instead just accept a range_semantics parameter to the relevant instance construction APIs (with the default being Universal for backwards compatibility reasons).

@encukou
Copy link
Member

encukou commented Oct 11, 2024

IMO, we're getting into a situation where it's better to ask interested people to put a better-ipaddress module on PyPI, and after the design is solid, start talking about including it (or parts of it) back in the stdlib.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stdlib Python modules in the Lib dir
Projects
None yet
Development

No branches or pull requests

4 participants