Conversation

@tompng
Member

@tompng tompng commented Apr 30, 2025

XPath.match, XPath.first, XPath.each, XPathParser#parse and XPathParser#match accepted a nodeset in place of an element.
This pull request changes these methods to take a single element instead of a nodeset.
Passing a nodeset is deprecated.

# Documented usage. OK
REXML::XPath.match(element, xpath)

# Undocumented usage. Deprecate in this pull request
nodeset = [element]
REXML::XPath.match(nodeset, xpath)

Background

#249 will introduce a temporary cache.

def parse path, nodeset
  path_stack = @parser.parse( path )
  nodeset.first.document.send(:enable_cache) do
    match( path_stack, nodeset )
  end
end

But the signature XPathParser#match(path, nodeset) does not guarantee that all nodes in the nodeset have the same root document.
So the cache does not work for the code below. It's still slow.

REXML::XPath.match(2.times.map { REXML::Document.new('<a>'*400+'</a>'*400) }, 'a//a')

The interface is holding us back, so I propose that we stop accepting an array as the element.
This change is backward incompatible, but it only drops an undocumented feature. I think only the test code was using this feature, and unintentionally.

XPath.match with array

XPath.match only traverses the first element of the array for some selectors.

nodeset = [REXML::Document.new("<a><b/></a>"), REXML::Document.new("<a><c/></a>")]

REXML::XPath.match(nodeset, "a/*")
#=> [<b/>, <c/>]

REXML::XPath.match(nodeset, "//a/*")
#=> [<b/>] # I expect [<b/>, <c/>] but the second document is ignored

It indicates that XPath.match is not designed to search inside multiple nodes/documents.
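
When results from several documents are needed, one approach that does not rely on the undocumented array form is to call XPath.match once per document and concatenate the results. A minimal sketch, reusing the two documents from the example above (the variable names are illustrative):

```ruby
require "rexml/document"

docs = [
  REXML::Document.new("<a><b/></a>"),
  REXML::Document.new("<a><c/></a>")
]

# Call XPath.match once per document, passing a single node each time,
# then flatten the per-document result arrays into one list.
matches = docs.flat_map { |doc| REXML::XPath.match(doc, "//a/*") }

puts matches.map(&:name).inspect
```

Because every call receives exactly one context node, the second document is searched as well.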

XPath.match, first and each accepted an array as an element.
This behavior is not documented, and it makes the code hard to optimize and refactor.
The second argument of XPathParser#parse and XPathParser#match is also changed from nodeset to node.
@naitoh
Contributor

naitoh commented May 3, 2025

But the signature XPathParser#match(path, nodeset) does not guarantee that all nodes in the nodeset have the same root document. So the cache does not work for the code below. It's still slow.

In this case, I made further improvements to #249, which eliminated the slowness.

Drop accepting array as an element in XPath.match, first and each

Instead of removing the feature, how about displaying a deprecation message when an array is passed?

What do you think @kou?

Member

@kou kou left a comment

I'm OK with dropping support for nodeset because we don't have enough resources to complete nodeset support.

In general, we want to keep backward compatibility as much as possible. But we can remove the feature without keeping backward compatibility because:

  1. It's not documented
  2. It doesn't work in some cases

But could you emit a warning, as @naitoh suggested? Something like:

diff --git a/lib/rexml/xpath_parser.rb b/lib/rexml/xpath_parser.rb
index 5eb1e5a..a2b2ef5 100644
--- a/lib/rexml/xpath_parser.rb
+++ b/lib/rexml/xpath_parser.rb
@@ -136,11 +136,12 @@ module REXML
     end
 
 
-    def match(path_stack, nodeset)
-      nodeset = nodeset.collect.with_index do |node, i|
-        position = i + 1
-        XPathNode.new(node, position: position)
+    def match(path_stack, node)
+      if node.is_a?(Array)
+        warn("REXML::XPath.XXX dropped support for nodeset...", uplevel: N)
+        node = node.first
       end
+      nodeset = [XPathNode.new(node, position: 1)]
       result = expr(path_stack, nodeset)
       case result
       when Array # nodeset
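
The pattern in the suggested diff (accept the old argument shape, warn, then fall back to the first element) can be tried in isolation. The sketch below is a standalone illustration, not REXML code: `coerce_to_single_node` and the message text are hypothetical names chosen for this example. It relies only on Kernel#warn with the `uplevel:` keyword, which is available in Ruby 2.5 and later.

```ruby
# Hypothetical helper, not part of REXML: accept a single node, but keep
# tolerating an array for now and emit a deprecation warning when one is passed.
def coerce_to_single_node(node)
  if node.is_a?(Array)
    # uplevel: 1 attributes the warning to the caller of this helper,
    # so users see the line in their own code that passed the array.
    warn("passing an array is deprecated; only the first element is used", uplevel: 1)
    node = node.first
  end
  node
end

single = coerce_to_single_node(:element)          # no warning
from_array = coerce_to_single_node([:first, :rest]) # warns, keeps :first
```

In a library, the `uplevel` value has to account for every internal frame between the public method and the `warn` call, which is why the diff leaves it as `N`.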

@tompng tompng force-pushed the xpath_no_array branch from 6a02a69 to e2968dc Compare May 5, 2025 06:20
Contributor

@naitoh naitoh left a comment

LGTM
Thanks!

@naitoh
Contributor

naitoh commented May 6, 2025

@tompng
Can you update this PR's description?

@tompng tompng changed the title from "Drop accepting array as an element in XPath.match, first and each" to "Deprecate accepting array as an element in XPath.match, first and each" May 6, 2025
@tompng
Member Author

tompng commented May 6, 2025

@naitoh
Updated. I changed "Drop" to "Deprecate" in the title and mentioned the deprecation in the description.

@naitoh naitoh merged commit cd575a1 into ruby:master May 7, 2025
66 of 67 checks passed
@naitoh
Contributor

naitoh commented May 7, 2025

Thanks!

@tompng tompng deleted the xpath_no_array branch May 7, 2025 12:55
@voxik

voxik commented Oct 31, 2025

voxik added a commit to voxik/vagrant-libvirt that referenced this pull request Oct 31, 2025
REXML 3.4.2+ deprecated accepting array as an element in `XPath.match`
[1]. This led to test errors such as:

~~~
  3) VagrantPlugins::ProviderLibvirt::Action::ResolveDiskSettings#call when vm box is in use when box metadata is not available when multiple volumes in domain config should populate domain volumes with devices
     Failure/Error:
       expect(env[:domain_volumes]).to match(
         [
           hash_including(
             device: 'vda',
             absolute_path: '/var/lib/libvirt/images/vagrant-test_default.img'
           ),
           hash_including(
             device: 'vdb',
             absolute_path: '/var/lib/libvirt/images/vagrant-test_default_1.img'
           ),
       expected [{absolute_path: "/var/lib/libvirt/images/vagrant-test_default.img", bus: "virtio", cache: "default", device: "vda", name: "vagrant-test_default.img"}] to match [#<RSpec::Mocks::ArgumentMatchers::HashIncludingMatcher:0x00007fff921b3db8 @expected={device: "vda", absolute_path: "/var/lib/libvirt/images/vagrant-test_default.img"}>, #<RSpec::Mocks::ArgumentMatchers::HashIncludingMatcher:0x00007fff921b3d40 @expected={device: "vdb", absolute_path: "/var/lib/libvirt/images/vagrant-test_default_1.img"}>, #<RSpec::Mocks::ArgumentMatchers::HashIncludingMatcher:0x00007fff921b3cc8 @expected={device: "vdc", absolute_path: "/var/lib/libvirt/images/vagrant-test_default_2.img"}>]
       Diff:
       @@ -1,4 +1,6 @@
       -[hash_including(device: "vda", absolute_path: "/var/lib/libvirt/images/vagrant-test_default.img"),
       - hash_including(device: "vdb", absolute_path: "/var/lib/libvirt/images/vagrant-test_default_1.img"),
       - hash_including(device: "vdc", absolute_path: "/var/lib/libvirt/images/vagrant-test_default_2.img")]
       +[{absolute_path: "/var/lib/libvirt/images/vagrant-test_default.img",
       +  bus: "virtio",
       +  cache: "default",
       +  device: "vda",
       +  name: "vagrant-test_default.img"}]
     # ./spec/unit/action/resolve_disk_settings_spec.rb:200:in 'block (6 levels) in <top (required)>'
     # ./spec/support/unit_context.rb:51:in 'block (3 levels) in <top (required)>'
     # ./spec/support/unit_context.rb:43:in 'block (2 levels) in <top (required)>'
     # ./spec/support/unit_context.rb:51:in 'block (3 levels) in <top (required)>'
     # ./spec/support/unit_context.rb:43:in 'block (2 levels) in <top (required)>'
~~~

This changes the logic so that XPath matches against the whole
XML document instead of an array of XML elements.

[1]: ruby/rexml#252
@naitoh
Contributor

naitoh commented Nov 1, 2025

@voxik

Just FTR, this likely breaks vagrant-libvirt:

I have no idea ATM how to fix this :/

We apologize for causing an incompatibility with vagrant-libvirt.
To avoid the impact of this change, please apply the following countermeasures wherever the message "REXML::XPath.each, REXML::XPath.first, REXML::XPath.match dropped support for nodeset..." appears.

  • Pass a single node, not an array of nodes, to REXML::XPath.match.
  • If you have an array of nodes and need results for all of them, iterate over the array with each and pass one node at a time.

https://github.com/vagrant-libvirt/vagrant-libvirt/blob/a94ce0d7b6c90129a38435698ca97b364130313d/lib/vagrant-libvirt/action/resolve_disk_settings.rb#L53-L62

Fixed example:

$ git diff -ub
diff --git a/lib/vagrant-libvirt/action/resolve_disk_settings.rb b/lib/vagrant-libvirt/action/resolve_disk_settings.rb
index f1a3f67..bf5d951 100644
--- a/lib/vagrant-libvirt/action/resolve_disk_settings.rb
+++ b/lib/vagrant-libvirt/action/resolve_disk_settings.rb
@@ -51,14 +51,16 @@ module VagrantPlugins
               xml_descr = REXML::Document.new(domain_xml)
               domain_name = xml_descr.elements['domain'].elements['name'].text
               disks_xml = REXML::XPath.match(xml_descr, '/domain/devices/disk[@device="disk"]')
-              have_aliases = !REXML::XPath.match(disks_xml, './alias[@name="ua-box-volume-0"]').first.nil?
+              have_aliases = !REXML::XPath.match(disks_xml.first, './alias[@name="ua-box-volume-0"]').first.nil?
               env[:ui].warn(I18n.t('vagrant_libvirt.domain_xml.obsolete_method')) unless have_aliases
 
               if have_aliases
-                REXML::XPath.match(disks_xml,
+                disks_xml.each do |disk_xml|
+                  REXML::XPath.match(disk_xml,
                                      './alias[contains(@name, "ua-box-volume-")]').each_with_index do |alias_xml, idx|
                     domain_volumes.push(volume_from_xml(alias_xml.parent, domain_name, idx))
                   end
+                end
               else
                 # fallback to try and infer which boxes are box images, as they are listed first
                 # as soon as there is no match, can exit
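
One detail worth checking in a change shaped like the diff above: when each_with_index is called inside the per-disk loop, idx restarts at 0 for every disk. If a running index across all disks is wanted (an assumption about the intent, not something confirmed in this thread), the per-disk results can be merged before indexing. A minimal sketch on a made-up domain XML:

```ruby
require "rexml/document"

# Made-up sample data for illustration only.
domain_xml = <<~XML
  <domain>
    <devices>
      <disk device="disk"><alias name="ua-box-volume-0"/></disk>
      <disk device="disk"><alias name="ua-box-volume-1"/></disk>
      <disk device="cdrom"><alias name="ide0-0-0"/></disk>
    </devices>
  </domain>
XML

doc = REXML::Document.new(domain_xml)
disks = REXML::XPath.match(doc, '/domain/devices/disk[@device="disk"]')

# Query each disk on its own (one context node per call), then merge the
# results before indexing so idx counts across all disks.
aliases = disks.flat_map do |disk|
  REXML::XPath.match(disk, './alias[contains(@name, "ua-box-volume-")]')
end

indexed = aliases.each_with_index.map do |alias_xml, idx|
  [idx, alias_xml.attributes["name"], alias_xml.parent.attributes["device"]]
end

puts indexed.inspect
```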

@voxik

voxik commented Nov 1, 2025

Fixed example:

Thanks, it is very useful to see a real-life example from experts 👍

Since I have proposed a slightly different fix in vagrant-libvirt/vagrant-libvirt#1861, would you mind checking it and elaborating there on the pros and cons of your proposal versus mine?

@naitoh
Contributor

naitoh commented Nov 2, 2025

@voxik

Since I have proposed a slightly different fix in vagrant-libvirt/vagrant-libvirt#1861, would you mind checking it and elaborating there on the pros and cons of your proposal versus mine?

Fixed example:

  • pros

The existing logic remains unchanged, so there is no impact on performance.

  • cons

Nothing.

vagrant-libvirt/vagrant-libvirt#1861

  • pros

XPath specifies elements starting from the top of the XML document, making the processing easier to understand.

  • cons

Each time, since we specify the XPath from the top of the XML, the number of processing steps increases.
Therefore, when the target XML is large, there may be a performance impact.
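
The two styles being compared can be put side by side on a small document. This is only a sketch of the general shapes (a relative path per node versus one absolute path from the root); the XML and the paths are made up and are not taken from either proposal.

```ruby
require "rexml/document"

# Made-up sample data for illustration only.
doc = REXML::Document.new(
  '<domain><devices>' \
  '<disk device="disk"><alias name="v0"/></disk>' \
  '<disk device="disk"><alias name="v1"/></disk>' \
  '</devices></domain>'
)

# Style 1: select the disks once, then evaluate a short relative path per disk.
disks = REXML::XPath.match(doc, '/domain/devices/disk[@device="disk"]')
per_node = disks.flat_map { |disk| REXML::XPath.match(disk, "./alias") }

# Style 2: evaluate one absolute path starting from the document root.
from_root = REXML::XPath.match(doc, '/domain/devices/disk[@device="disk"]/alias')

per_node_names = per_node.map { |a| a.attributes["name"] }
from_root_names = from_root.map { |a| a.attributes["name"] }
puts per_node_names == from_root_names
```

Both styles select the same nodes here; which one is cheaper depends on the document size and on how many separate queries start from the root, so it is worth measuring on realistic input.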

@voxik

voxik commented Nov 2, 2025

Each time, since we specify the XPath from the top of the XML, the number of processing steps increases. Therefore, when the target XML is large, there may be a performance impact.

That was also my worry, right. Thanks for confirming.

On top of that, I think your proposal uncovers a flaw in the logic that was described in the original description: "XPath.match only traverses the first element of the array for some selectors.", because it does not seem to me that the intention was to look only at the first "disk". But I am not the author of the code, so I'll leave the decision to upstream.

Thx for the suggestion and elaborating 👍 I appreciate that.
