According to Richard, the XmlXPathReader problem described below has the following cause:
"It looks like the processNode() method in XmlXPathReader expects the nodes selected by PARAM_XPATH_EXPRESSION to have sub-elements. It doesn't seem to consider that the nodes only have text content" (Richard).
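To illustrate the case Richard describes, here is a minimal sketch that uses only the standard JDK XPath API, not DKPro Core. The class XPathTextOnlyDemo, its helper methods, and the SAMPLE document are all hypothetical; the element names are merely modeled on the expression used in this report. It shows that an expression can select elements that have no sub-elements, only text, and that every match can still be visited:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; it is NOT part of DKPro Core or XmlXPathReader.
class XPathTextOnlyDemo {

    // Invented sample input with two topics (an assumption, not the real test file).
    static final String SAMPLE =
        "<topics>"
        + "<topic><title>First title</title><description>First description</description></topic>"
        + "<topic><title>Second title</title><description>Second description</description></topic>"
        + "</topics>";

    // Evaluates the XPath expression and returns the matching nodes.
    static NodeList select(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    // Returns the trimmed text content of every selected node, in document order.
    static List<String> selectTexts(String xml, String expression) {
        NodeList nodes = select(xml, expression);
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            texts.add(nodes.item(i).getTextContent().trim());
        }
        return texts;
    }

    // Counts the element children of the first selected node.
    static int countChildElements(String xml, String expression) {
        NodeList children = select(xml, expression).item(0).getChildNodes();
        int count = 0;
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Nodes selected here contain text only, no sub-elements.
        System.out.println(countChildElements(SAMPLE, "/topics/topic/description"));
        System.out.println(selectTexts(SAMPLE, "/topics/topic/description"));
    }
}
```

A reader that walks only the child elements of each selected node would find nothing to process for such text-only nodes, which would be consistent with the behavior reported here. Whether this is what happens inside processNode() is not confirmed by this sketch.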
The problem description is as follows:
I have a problem using XmlXPathReader. I tried to parse the XML file provided with the DKPro tests (full_tag_format.xml) using the following code:
================
CollectionReaderDescription reader = createReaderDescription(XmlXPathReader.class,
        XmlXPathReader.PARAM_LANGUAGE, "en",
        XmlXPathReader.PARAM_SOURCE_LOCATION, "input",
        XmlXPathReader.PARAM_XPATH_EXPRESSION, "/topics/topic/description",
        XmlXPathReader.PARAM_PATTERNS, new String[] { "[+]full*.xml" });
================
The resulting XMI contains sofaString="[Parse error]":
================
<xmi:XMI xmlns:WhatAliceDoesExample="http:///de/tudarmstadt/ukp/tutorial/gscl2013/ruta/WhatAliceDoesExample.ecore" xmlns:pos="http:///de/tudarmstadt/ukp/dkpro/core/api/lexmorph/type/pos.ecore" xmlns:tcas="http:///uima/tcas.ecore" xmlns:xmi="http://www.omg.org/XMI" xmlns:cas="http:///uima/cas.ecore" xmlns:type9="http:///org/apache/uima/ruta/type.ecore" xmlns:html="http:///org/apache/uima/ruta/type/html.ecore" xmlns:tweet="http:///de/tudarmstadt/ukp/dkpro/core/api/lexmorph/type/pos/tweet.ecore" xmlns:morph="http:///de/tudarmstadt/ukp/dkpro/core/api/lexmorph/type/morph.ecore" xmlns:dependency="http:///de/tudarmstadt/ukp/dkpro/core/api/syntax/type/dependency.ecore" xmlns:type5="http:///de/tudarmstadt/ukp/dkpro/core/api/semantics/type.ecore" xmlns:type8="http:///de/tudarmstadt/ukp/dkpro/core/api/transform/type.ecore" xmlns:type7="http:///de/tudarmstadt/ukp/dkpro/core/api/syntax/type.ecore" xmlns:type2="http:///de/tudarmstadt/ukp/dkpro/core/api/metadata/type.ecore" xmlns:type3="http:///de/tudarmstadt/ukp/dkpro/core/api/ner/type.ecore" xmlns:type4="http:///de/tudarmstadt/ukp/dkpro/core/api/segmentation/type.ecore" xmlns:type="http:///de/tudarmstadt/ukp/dkpro/core/api/coref/type.ecore" xmlns:type6="http:///de/tudarmstadt/ukp/dkpro/core/api/structure/type.ecore" xmlns:constituent="http:///de/tudarmstadt/ukp/dkpro/core/api/syntax/type/constituent.ecore" xmlns:chunk="http:///de/tudarmstadt/ukp/dkpro/core/api/syntax/type/chunk.ecore" xmi:version="2.0">
<cas:NULL xmi:id="0"/>
<type2:DocumentMetaData xmi:id="1" sofa="12" begin="0" end="13" language="en" documentTitle="full_tag_format.xml" documentId="full_tag_format.xml" documentUri="file:/C:/Workspace/LunaWS/DkproTrial-2/input/full_tag_format.xml" collectionId="input" documentBaseUri="file:/C:/Workspace/LunaWS/DkproTrial-2/input/" isLastSegment="false"/>
<type4:Sentence xmi:id="19" sofa="12" begin="0" end="13"/>
<type4:Token xmi:id="24" sofa="12" begin="0" end="6"/>
<type4:Token xmi:id="34" sofa="12" begin="7" end="12"/>
<type4:Token xmi:id="44" sofa="12" begin="12" end="13"/>
<cas:Sofa xmi:id="12" sofaNum="1" sofaID="_InitialView" mimeType="text" sofaString="[Parse error]"/>
<cas:View sofa="12" members="1 19 24 34 44"/>
</xmi:XMI>
=================
However, if I use
XmlXPathReader.PARAM_XPATH_EXPRESSION, "/topics/topic",
instead, the reader runs without a parse error and I get a partially correct result:
===================
2
Gender bias and poverty
Find documents on gender bias and resultant poverty
problems.
Documents and research reports that look into gender bias
as well as its revelations on poverty problems.
===================
Please note that the result is only partially correct: the file contains two topic tags, but only one topic shows up in the output. So I think this is another bug.
Thanks