stream.hls: parse M3U8 from Response obj directly #4552
Updated from 27a4344 to 4f4bd5c.
This sounds like a valid change and I don't see any downside to it.
I'm not too concerned with breaking changes. I'm not really sure what other feedback to provide here; what would be the downside of implementing this change? Is there more you want to do in this PR? If not, I'll merge it. The changes look good.
- Accept both `str` and `requests.Response` in `M3U8Parser.parse`
- Always set encoding of variant and media HLS playlists to utf-8 https://datatracker.ietf.org/doc/html/rfc8216#section-9
- Remove old custom utf-8 overrides
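The first two points (accepting a `requests.Response` and forcing UTF-8) can be illustrated with a minimal sketch. `iter_playlist_lines`, `LineResponse` and `FakeResponse` are hypothetical names, not streamlink's API. A real `requests.Response` has a settable `encoding` attribute and an `iter_lines(decode_unicode=True)` method that decodes lines with that encoding while streaming; the stand-in class mimics only those two members so the example runs without network access.

```python
from typing import Iterable, Iterator, Optional, Protocol, Union


class LineResponse(Protocol):
    """The two members of requests.Response this sketch relies on."""

    encoding: Optional[str]

    def iter_lines(self, decode_unicode: bool = False) -> Iterator[str]:
        ...


def iter_playlist_lines(data: Union[str, LineResponse]) -> Iterator[str]:
    """Hypothetical helper: yield the non-empty lines of an M3U8 playlist."""
    lines: Iterable[str]
    if isinstance(data, str):
        # Backwards-compatible path: the whole playlist is already in memory
        lines = data.splitlines()
    else:
        # RFC 8216 requires UTF-8 playlists, so don't let the HTTP library guess
        data.encoding = "utf-8"
        # Decode line by line while streaming instead of building one big str
        lines = data.iter_lines(decode_unicode=True)
    for line in lines:
        line = line.strip()
        if line:
            yield line


class FakeResponse:
    """Stand-in for requests.Response (illustration only, no network)."""

    def __init__(self, body: bytes) -> None:
        self._body = body
        self.encoding: Optional[str] = None

    def iter_lines(self, decode_unicode: bool = False) -> Iterator[str]:
        # requests would yield raw bytes without a known encoding; this stand-in refuses
        if not (decode_unicode and self.encoding):
            raise ValueError("this stand-in only supports decoded lines")
        for raw in self._body.splitlines():
            yield raw.decode(self.encoding)
```

Setting `encoding` before iterating is what makes decoding deterministic: when `encoding` is `None`, `requests` guesses the charset for `.text` and yields undecoded bytes from `iter_lines`.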
Updated from 4f4bd5c to 4870866.
No, there's nothing left to add here apart from new tests for invalid encodings, which wouldn't be that useful.
Passing the HTTP `Response` object to the parser and iterating its content avoids having to keep a copy of the entire response content in memory. Support for `str` is kept for backwards compatibility and because some tests rely on it and would otherwise need to be rewritten.

The content is now also always read as UTF-8, as defined by RFC 8216. Previously, the encoding was guessed by `chardet`/`charset_normalizer` if no HTTP response headers were set (see #4329), and custom overrides were required whenever the unknown encoding couldn't be determined correctly. All HLS tests have been using implicit UTF-8 encoding the entire time. We could add tests for invalid encodings in the future, but I don't think it's important.

I have a couple more changes planned for the `HLSStream`+ and `M3U8`+ implementations. They're not ready yet and are unrelated to this PR, but I want to quickly talk about them.

One of the changes is using `typing.Generic` + `typing.TypeVar` to make it easier to subclass HLS streams, the parser, and other related classes, without having to suppress invalid type information or method signatures (see the Twitch plugin, for example). Switching from named tuples to dataclasses is another change that will make subclassing and extending the HLS logic easier.

Another change is passing the parsed master playlist object to the HLSStreams created by `parse_variant_playlist`, so that additional data can be read by the media playlists and their parsers. Currently, only the master playlist URL is added to the `{Muxed,}HLSStream` for the `to_manifest_url` method (I also want to change this interface eventually, but that may be a breaking change). I noticed the master playlist issue two days ago while rewriting Twitch's low latency implementation, because the master playlist contains metadata that should be read by the media playlist. It's also relevant for `EXT-X-SESSION-DATA` and `EXT-X-SESSION-KEY`, which are currently not implemented.
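The `typing.Generic` + `typing.TypeVar` idea, combined with dataclasses, could look roughly like the sketch below. All class names here (`Segment`, `LowLatencySegment`, `MediaPlaylist`, `MediaPlaylistParser`, `LowLatencyParser`) and the `prefetch` rule are hypothetical and far simpler than streamlink's real `M3U8Parser`. The point is only that a subclass binds the type variable once, and every inherited method then returns the subclass's own types without any `type: ignore`.

```python
from dataclasses import dataclass, field
from typing import Generic, Iterable, List, Type, TypeVar


@dataclass
class Segment:
    uri: str
    duration: float


@dataclass
class LowLatencySegment(Segment):
    # Hypothetical plugin-specific field, added without redefining the base
    prefetch: bool = False


TSegment = TypeVar("TSegment", bound=Segment)


@dataclass
class MediaPlaylist(Generic[TSegment]):
    segments: List[TSegment] = field(default_factory=list)


class MediaPlaylistParser(Generic[TSegment]):
    """Hypothetical parser that is generic over its segment type."""

    def __init__(self, segment_cls: Type[TSegment]) -> None:
        self.segment_cls = segment_cls

    def create_segment(self, uri: str, duration: float) -> TSegment:
        return self.segment_cls(uri=uri, duration=duration)

    def parse(self, lines: Iterable[str]) -> MediaPlaylist[TSegment]:
        playlist: MediaPlaylist[TSegment] = MediaPlaylist()
        duration = 0.0
        for line in lines:
            if line.startswith("#EXTINF:"):
                # "#EXTINF:<duration>,[<title>]": keep only the duration
                duration = float(line[len("#EXTINF:"):].split(",", 1)[0])
            elif line and not line.startswith("#"):
                # Any other non-empty, non-tag line is treated as a segment URI
                playlist.segments.append(self.create_segment(line, duration))
        return playlist


class LowLatencyParser(MediaPlaylistParser[LowLatencySegment]):
    """Binds TSegment once; the override below keeps a compatible signature."""

    def __init__(self) -> None:
        super().__init__(LowLatencySegment)

    def create_segment(self, uri: str, duration: float) -> LowLatencySegment:
        segment = super().create_segment(uri, duration)
        # Made-up rule, only to show the subclass-specific field is typed
        segment.prefetch = uri.endswith(".prefetch.ts")
        return segment


playlist = LowLatencyParser().parse(
    ["#EXTM3U", "#EXTINF:2.0,", "a.ts", "#EXTINF:1.5,", "b.prefetch.ts"]
)
print([(s.uri, s.prefetch) for s in playlist.segments])
# [('a.ts', False), ('b.prefetch.ts', True)]
```

Compared to named tuples, a dataclass subclass can add a defaulted field like `prefetch` directly, which keeps the extension points small.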