UserAgentParser does not support patch_minor, but patch_minor is supported in uap-core #142
Comments
Is there any update on this issue? We are also running into this problem when using the Python module.
Same issue with missing `patch_minor`.
If somebody wants to contribute it, I don't mind merging it, but I won't be the one doing that: as I wrote on ua-parser/uap-core#322#issuecomment-1118955704, the thing doesn't seem to be finished or to have been adopted by any other implementation (not even the reference impl). As I also do not see its point, I do not care for it.
New API with full typing
========================

Seems pretty self-explanatory: rather than returning somewhat ad-hoc dicts, this API works off of dataclasses. It should be compatible with the legacy version through the magic of ~~buying two of them~~ `dataclasses.asdict`.

Parser API
==========

The legacy version had "parsers" which really represented individual parsing rules. In the new API, a parser does what the top-level functions did: it wraps the entire job of parsing a user-agent string. The core API is just `__call__`, with a selection flag for the domains ("domain" seems like the least bad term for what "user agent", "os", and "device" are; other alternatives I considered are "component" and "category", but I'm still ambivalent). Overridable helpers are provided which match the old API's methods (with PEP 8 conventions), as well as the same style of helpers at the package top level.

This resolves a number of limitations:

Concurrency
-----------

While the library should be thread-safe (and I need to find a way to test that), the ability to instantiate parsers should provide the opportunity for things like thread-local parsers, or actual parallelism if we start using native extensions (regex, re2).

It also allows running multiple *parser configurations* concurrently, including e.g. multiple independent custom YAML sets. Not sure there's a use for it, but why not? At the very least it should make using custom YAML datasets much easier than having to set envvars.

The caching parser being stateful, protecting it with an optional lock seems like the best way to make caching thread-safe. When only using a single thread, or when using thread-local parsers, the locking overhead can be avoided by passing a `contextlib.nullcontext` as the lock.

Customization
-------------

Public APIs are provided both to instantiate and tune parsers, and to set the global parser. Hopefully this makes it easier to evaluate proposed parsers as well as to evaluate and tune caches (algorithm and size). Even more so once we provide some sort of evaluation CLI in ua-parser#163.

Caches
------

In the old API, the package-provided cache could only be global and came with a single implementation, as it had to integrate with the top-level parsing functions. By reifying the parsing job, a cache is just a parser which delegates the parse if it doesn't have a hit. This makes it much easier to provide, test, and evolve alternative cache strategies.

Bulk APIs
---------

The current parser checks rules (regexes) one at a time against the input, but there are advanced regex APIs which can check a regex *set* and return which one(s) matched, allowing much more efficient bulk matching, e.g. Google's re2 or Rust's regex. With the old scheme, this would have been a pretty significant change in use/behaviour, obviating the use of the "parsers" with no recourse. Under the new parsing scheme, these can just be different "base" parsers: they can be the default, they can be cached, and users can instantiate their own parser instead.

Misc
----

The new API's UA extractor pipeline supports `patch_minor`, though that requires excluding that bit from the tests, as there are apparently broken test cases around that item (ua-parser/uap-core#562).

Init Helpers
============

Having proper parsers is the opportunity to allow setting parsers at runtime more easily (instead of via load-time envvars); however, optional constructors (classmethods) turn out to be iffy from both an API and a typing perspective. Instead, the "base" parsers (the ones doing the actual parsing of the UAs) just take a uniform parsed data set, and utility loaders provide that from various data sources (precompiled, preformatted, or data files). This avoids redundancy and the need for mixins/inheritance, and mypy is *much* happier.

Legacy Parsers -> New Matchers
==============================

The bridging of the legacy parsers and the new results turned out to be pretty mediocre. Instead, the new API relies on similar but better-typed matcher classes with a slightly different API: they return `None` on a match failure instead of a triplet, which makes them compose better in iteration (e.g. they can just be `filter`ed out). Add a `Matchers` alias (a tuple of lists of matchers) to carry them around for convenience, as well as a base parser parameter. Also clarify the replacer rules, and hopefully implement the thing more clearly.

Fixes ua-parser#93, fixes ua-parser#142, closes ua-parser#116
Ended up adding it to the new API and the eventual future 1.0, so closing this as fixed, though it's not available in a packaged form yet.
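For reference, reading `patch_minor` under the new dataclass-based API should look roughly like the sketch below. The `parse` entry point and field names follow the PR description above; treat them as provisional until the 1.0 packaging lands.

```python
# Rough sketch of reading patch_minor from the new dataclass-based API.
# Import path and field layout are assumptions based on the PR description.
from ua_parser import parse

ua = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/41.0.2272.104 Safari/537.36"
)

result = parse(ua)
if result.user_agent is not None:
    # Expected: family="Chrome", major="41", minor="0", patch="2272", patch_minor="104"
    print(result.user_agent.patch_minor)
```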
Thank you!
The string `Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.104 Safari/537.36` should parse out to a User Agent result that includes `patch_minor` (here `104`), but instead we get a result without it.

`patch_minor` is already supported in uap-core; see ua-parser/uap-core#22 and ua-parser/uap-core#322 for an example. `UserAgentParser.Parse` and `_ParseUserAgent` should be updated to support the User Agent `patch_minor` as well.
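A minimal reproduction against the legacy module, using the standard `user_agent_parser.Parse` entry point, shows the missing field; the printed dict below is what the reporter describes, not verified output.

```python
# Reproduction with the legacy API: the user_agent dict carries
# family/major/minor/patch but no patch_minor key.
from ua_parser import user_agent_parser

ua = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/41.0.2272.104 Safari/537.36"
)

parsed = user_agent_parser.Parse(ua)
print(parsed["user_agent"])
# Roughly: {'family': 'Chrome', 'major': '41', 'minor': '0', 'patch': '2272'}
# The '104' from the UA string never surfaces as patch_minor.
```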