Entityhub Query
Subresource /query
Description | Allows JSON-serialized field queries to be sent to the Entityhub. Only Entities managed by the Entityhub are searched. |
---|---|
Request | -X POST -H "Content-Type:application/json" --data "@fieldQuery.json" /entityhub/query |
Parameter | The JSON serialised FieldQuery |
Produces | The results of the query serialised in the format as specified by the Accept header |
Example
curl -X POST -H "Content-Type:application/json" --data "@fieldQuery.json" https://enrich.acdh.oeaw.ac.at/entityhub/query
Note: "@fieldQuery.json" refers to a local file that contains the FieldQuery to send (see the section "FieldQuery JSON format" for examples).
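The curl call above can also be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the helper name `build_query_request` and the `Accept: application/json` value are assumptions made for illustration. The request is only built here; sending it is left as a commented line.

```python
import json
import urllib.request


def build_query_request(base_url, field_query, accept="application/json"):
    """Build (but do not send) the POST request for the /entityhub/query subresource."""
    body = json.dumps(field_query).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/entityhub/query",
        data=body,
        headers={"Content-Type": "application/json", "Accept": accept},
        method="POST",
    )


# The query mirrors the examples in this document: three labels of dbpedia Places.
field_query = {
    "selected": ["http://www.w3.org/2000/01/rdf-schema#label"],
    "offset": "0",
    "limit": "3",
    "constraints": [{
        "type": "reference",
        "field": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
        "value": "http://dbpedia.org/ontology/Place",
    }],
}
request = build_query_request("https://enrich.acdh.oeaw.ac.at", field_query)

# To actually send the query (requires network access):
# with urllib.request.urlopen(request) as response:
#     results = json.load(response)
```

Note that `json.dumps` writes "/" unescaped, whereas the examples in this document write "\/". Both spellings denote the same JSON string.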
FieldQuery Documentation:
The FieldQuery is part of the Java API defined in the org.apache.stanbol.entityhub.servicesapi bundle.
Main Elements
- "selected": JSON array with the names of the fields selected by this query
- "offset": the offset of the first result returned by this query
- "limit": the maximum number of results returned
- "constraints": JSON array holding all the constraints of the query
- "ldpath": LDPath program that is executed for all results of the query. A more powerful alternative to the "selected" parameter for defining the information returned for query results.
Examples:
Simple FieldQuery that selects rdfs:label and rdf:type, uses no offset and returns at most three results. The constraints are omitted here.
{
"selected": [
"http:\/\/www.w3.org\/2000\/01\/rdf-schema#label",
"http:\/\/www.w3.org\/1999\/02\/22-rdf-syntax-ns#type"],
"offset": "0",
"limit": "3",
"constraints": [...]
}
The following example uses an LDPath program to select rdf:type and the rdfs:label values as schema:name. The offset is set to 5 and a maximum of 5 results is returned. This corresponds to the second page when the page size is 5.
{
"ldpath": "schema:name = rdfs:label;rdf:type;",
"offset": "5",
"limit": "5",
"constraints": [...]
}
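The relation between "offset", "limit" and result pages can be sketched as follows. The helper `paged_query` and its zero-based `page` parameter are hypothetical names introduced for illustration. The numbers are emitted as strings only because the examples in this document write them that way.

```python
def paged_query(constraints, page, page_size, selected=None, ldpath=None):
    """Build a FieldQuery dict for a zero-based result page."""
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    query = {
        "offset": str(page * page_size),  # index of the first result on this page
        "limit": str(page_size),          # maximum number of results returned
        "constraints": constraints,
    }
    if selected:
        query["selected"] = list(selected)
    if ldpath:
        query["ldpath"] = ldpath
    return query


# Second page (page index 1) with five results per page, as in the LDPath example.
# The empty constraints list is a placeholder for illustration only.
second_page = paged_query([], page=1, page_size=5,
                          ldpath="schema:name = rdfs:label;rdf:type;")
```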
FieldQuery Constraints:
Constraints are always applied to a field. Currently only a single constraint per field is supported. This is a limitation of the implementation and not a theoretical one.
While there are five different Constraint types the following attributes are required by all types.
- field: the field to which the constraint is applied
- type: the type of the constraint. One of "reference", "value", "text", "range" or "similarity"
In addition the following optional attributes are supported by all constraints
- boost: Defines a boost for the constraint. If supported, boosts influence the ranking of query results. The boost value MUST be a number >= 0. The default is 1.
There are five different constraint types.
- ValueConstraint: Checks if the value of the field is equal to the given value and data type
- ReferenceConstraint: A special form of the ValueConstraint that defaults the data type to references (links to other entities)
- TextConstraint: Checks if the value of the field is equal to the given value and language. It also supports wildcard and regex searches.
- RangeConstraint: Checks if the value of the field is within the given range
- SimilarityConstraint: Selects entities whose field values are similar to the given context
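The attributes shared by all constraint types can be checked on the client side before a query is sent. The following is a minimal sketch of such a check. The function name `check_common_attributes` is hypothetical, and the Entityhub may validate differently or more strictly.

```python
CONSTRAINT_TYPES = {"reference", "value", "text", "range", "similarity"}


def check_common_attributes(constraint):
    """Return a list of problems with the attributes shared by all constraint types."""
    problems = []
    if not constraint.get("field"):
        problems.append("missing 'field'")
    if constraint.get("type") not in CONSTRAINT_TYPES:
        problems.append("unknown or missing 'type'")
    boost = constraint.get("boost", 1)  # the documented default boost is 1
    if isinstance(boost, bool) or not isinstance(boost, (int, float)) or boost < 0:
        problems.append("'boost' must be a number >= 0")
    return problems


label_constraint = {
    "type": "text",
    "field": "http://www.w3.org/2000/01/rdf-schema#label",
    "text": "Frankfurt",
    "boost": 2,
}
print(check_common_attributes(label_constraint))
```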
Reference Constraint:
Additional keys:
- value (required): the URI value(s). A single value can be given as a string. Multiple values must be given as a JSON array.
- mode: If multiple values are given, this specifies whether query results must have "any" or "all" of the given values (default: "any")
Example:
Search for instances of the type Place as defined in the dbpedia ontology
{
"type": "reference",
"field": "http:\/\/www.w3.org\/1999\/02\/22-rdf-syntax-ns#type",
"value": "http:\/\/dbpedia.org\/ontology\/Place"
}
Search for Entities that link to all of the following Entities. NOTE that the
field "http://stanbol.apache.org/ontology/entityhub/query#references"
is special, as it causes a search in any outgoing relation. See the section
"Special Fields" for details.
{
"type": "reference",
"field": "http:\/\/stanbol.apache.org\/ontology\/entityhub\/query#references",
"value": [
"http:\/\/dbpedia.org\/resource\/Category:Capitals_in_Europe",
"http:\/\/dbpedia.org\/resource\/Category:Host_cities_of_the_Summer_Olympic_Games",
"http:\/\/dbpedia.org\/ontology\/City"
],
"mode": "all"
}
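Both reference examples can be produced by one small builder. This is a sketch under the assumption that the keys described above are all that is needed. The function `reference_constraint` is a hypothetical helper, not part of any Stanbol client library.

```python
import json


def reference_constraint(field, value, mode=None):
    """Build a Reference Constraint for one URI (string) or several URIs (iterable)."""
    constraint = {"type": "reference", "field": field}
    if isinstance(value, str):
        constraint["value"] = value
    else:
        values = list(value)
        if not values:
            raise ValueError("at least one URI is required")
        constraint["value"] = values
    if mode is not None:
        if mode not in ("any", "all"):
            raise ValueError("mode must be 'any' or 'all'")
        constraint["mode"] = mode
    return constraint


linked_to_all = reference_constraint(
    "http://stanbol.apache.org/ontology/entityhub/query#references",
    [
        "http://dbpedia.org/resource/Category:Capitals_in_Europe",
        "http://dbpedia.org/ontology/City",
    ],
    mode="all",
)
print(json.dumps(linked_to_all, indent=2))
```

Leaving `mode` unset omits the key, so the documented default ("any") applies.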
Value Constraint:
Value Constraints are very similar to Reference Constraints however they can
be used to check values of fields for any data type.
If no data type is defined the data type will be guessed based on the provided
JSON type of the value. For details please see the table below.
Additional keys:
- value (required): the value(s). For multiple values a JSON array must be used.
- datatype: the data type of the value as a string. Multiple data types can be given as a JSON array. If no datatype is defined, the default is guessed from the JSON type of the given value. Note especially that string values are mapped to "xsd:string" and not to "entityhub:text", which the Entityhub uses for natural language texts. Users who want to query for natural language text values should use Text Constraints instead.
- mode: If multiple values are given, this specifies whether query results must have "any" or "all" of the given values (default: "any"). For a usage example see the second Reference Constraint example.
Example:
Search for all entities with an altitude of 34 meters. Note that a string is passed as the value, but the datatype is explicitly set to "xsd:int".
{
"selected": [
"http:\/\/www.w3.org\/2000\/01\/rdf-schema#label"],
"offset": "0",
"limit": "3",
"constraints": [{
"type": "value",
"value": "34",
"field": "http:\/\/www.w3.org\/2003\/01\/geo\/wgs84_pos#alt",
"datatype": "xsd:int"
}]
}
The same can be achieved by passing the number 34 without specifying the datatype. In this case "xsd:integer" would be guessed from the provided value. Note, however, that this would not work for "xsd:long".
{
"type": "value",
"value": 34,
"field": "http:\/\/www.w3.org\/2003\/01\/geo\/wgs84_pos#alt"
}
Expected Results on DBPedia.org for this query include Berlin and Baghdad
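Because the datatype is guessed from the JSON type of the value, it matters whether a client serializes 34 as a number or as a string. The sketch below illustrates the difference. The helper `value_constraint` is a hypothetical name, and the guessing itself happens on the server, not in this code.

```python
import json


def value_constraint(field, value, datatype=None, mode=None):
    """Build a Value Constraint; the datatype is only sent when given explicitly."""
    constraint = {"type": "value", "field": field, "value": value}
    if datatype is not None:
        constraint["datatype"] = datatype
    if mode is not None:
        constraint["mode"] = mode
    return constraint


ALT = "http://www.w3.org/2003/01/geo/wgs84_pos#alt"

# A string value with an explicit datatype, as in the first example above.
as_string = value_constraint(ALT, "34", datatype="xsd:int")
# A JSON number without a datatype, leaving the guess to the Entityhub.
as_number = value_constraint(ALT, 34)

print(json.dumps(as_string))
print(json.dumps(as_number))
```

For fields that hold other types, such as "xsd:long", passing the datatype explicitly avoids relying on the guess.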
Text Constraint:
Additional keys:
- text (required): the text to search for. Multiple values can be given as a JSON array. Note that multiple values are considered optional: "Barack Obama" returns Entities that contain both "Barack" and "Obama", while ["Barack","Obama"] also returns Entities that contain either of the two words. Combinations such as ["Barack Obama","USA","United States"] are also allowed.
- language: the language of the searched text as a string. Multiple languages can be given as a JSON array. Passing "" as language includes values with missing language information. If no language is defined, values in any language are used.
- patternType: one of "wildcard", "regex" or "none" (default: "none")
- caseSensitive: boolean (default: false)
- proximityRanking: boolean (default: undefined). Tells Sites that the proximity of the given texts should be used for ranking. If undefined, the behaviour may depend on the actual Site executing the query.
Example:
(1) Searches for entities with a German rdfs:label starting with "Frankf".
(2) Searches for entities that contain "Frankfurt" OR "Main" OR "Airport" in
any language.
Typically, "Frankfurt am Main Airport" should be ranked first because it
contains all the optional terms.
{
"type": "text",
"language": "de",
"patternType": "wildcard",
"text": "Frankf*",
"field": "http:\/\/www.w3.org\/2000\/01\/rdf-schema#label"
}
{
"type": "text",
"text": ["Frankfurt","Main","Airport"],
"field": "http:\/\/www.w3.org\/2000\/01\/rdf-schema#label"
}
Expected Results on DBPedia.org for (1) include "Frankfurt am Main", "Eintracht Frankfurt" and "Frankfort, Kentucky" and for (2) the Airport of Frankfurt am Main, Frankfurt as well as Airport.
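The two text examples differ only in which optional keys are set. A small builder makes that explicit. This is a sketch: `text_constraint` and its snake_case parameters are hypothetical names that map onto the JSON keys described above.

```python
def text_constraint(field, text, language=None, pattern_type=None, case_sensitive=None):
    """Build a Text Constraint; optional keys are omitted so server defaults apply."""
    constraint = {"type": "text", "field": field, "text": text}
    if language is not None:
        constraint["language"] = language
    if pattern_type is not None:
        if pattern_type not in ("wildcard", "regex", "none"):
            raise ValueError("patternType must be 'wildcard', 'regex' or 'none'")
        constraint["patternType"] = pattern_type
    if case_sensitive is not None:
        constraint["caseSensitive"] = bool(case_sensitive)
    return constraint


LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# (1) German labels starting with "Frankf".
prefix_search = text_constraint(LABEL, "Frankf*", language="de", pattern_type="wildcard")
# (2) Any of the three terms, in any language.
any_term = text_constraint(LABEL, ["Frankfurt", "Main", "Airport"])
```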
Range Constraint:
Additional keys:
- lowerBound: the lower bound of the range (at least one of lower and upper bound MUST be defined)
- upperBound: the upper bound of the range (at least one of lower and upper bound MUST be defined)
- inclusive: applies to both the upper and the lower bound (default: false)
Example:
The following query combines two range constraints and a reference constraint to search for cities with at least one million inhabitants that are at least 1000 meters above sea level.
Note that the range constraint for the population needs to specify the datatype "xsd:long", because otherwise the given value would be converted to "xsd:integer".
{
"selected": [
"http:\/\/www.w3.org\/2000\/01\/rdf-schema#label",
"http:\/\/dbpedia.org\/ontology\/populationTotal",
"http:\/\/www.w3.org\/2003\/01\/geo\/wgs84_pos#alt"],
"offset": "0",
"limit": "3",
"constraints": [{
"type": "range",
"field": "http:\/\/dbpedia.org\/ontology\/populationTotal",
"lowerBound": 1000000,
"inclusive": true,
"datatype": "xsd:long"
},{
"type": "range",
"field": "http:\/\/www.w3.org\/2003\/01\/geo\/wgs84_pos#alt",
"lowerBound": 1000,
"inclusive": true
},{
"type": "reference",
"field": "http:\/\/www.w3.org\/1999\/02\/22-rdf-syntax-ns#type",
"value": "http:\/\/dbpedia.org\/ontology\/City"
}]
}
Expected Results on DBPedia.org include Mexico City, Bogota and Quito.
The following query searches for persons born in 1946
{
"selected": [
"http:\/\/www.w3.org\/2000\/01\/rdf-schema#label",
"http:\/\/dbpedia.org\/ontology\/birthDate",
"http:\/\/dbpedia.org\/ontology\/deathDate"],
"offset": "0",
"limit": "3",
"constraints": [{
"type": "range",
"field": "http:\/\/dbpedia.org\/ontology\/birthDate",
"lowerBound": "1946-01-01T00:00:00.000Z",
"upperBound": "1946-12-31T23:59:59.999Z",
"inclusive": true,
"datatype": "xsd:dateTime"
},{
"type": "reference",
"field": "http:\/\/www.w3.org\/1999\/02\/22-rdf-syntax-ns#type",
"value": "http:\/\/dbpedia.org\/ontology\/Person"
}]
}
Expected Results on DBPedia.org include Bill Clinton, George W. Bush and Donald Trump.
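The date range used for "born in 1946" generalizes to any year. The sketch below builds such constraints and enforces the rule that at least one bound must be present. Both `range_constraint` and `year_range` are hypothetical helper names, and the timestamp format simply copies the example above.

```python
def range_constraint(field, lower=None, upper=None, inclusive=False, datatype=None):
    """Build a Range Constraint; at least one of the two bounds is required."""
    if lower is None and upper is None:
        raise ValueError("at least one of lower and upper bound must be defined")
    constraint = {"type": "range", "field": field, "inclusive": inclusive}
    if lower is not None:
        constraint["lowerBound"] = lower
    if upper is not None:
        constraint["upperBound"] = upper
    if datatype is not None:
        constraint["datatype"] = datatype
    return constraint


def year_range(field, year):
    """Inclusive xsd:dateTime range covering one calendar year (UTC)."""
    return range_constraint(
        field,
        lower=f"{year:04d}-01-01T00:00:00.000Z",
        upper=f"{year:04d}-12-31T23:59:59.999Z",
        inclusive=True,
        datatype="xsd:dateTime",
    )


born_1946 = year_range("http://dbpedia.org/ontology/birthDate", 1946)
```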
Similarity Constraint:
This constraint selects entities similar to the given context. It is
currently only supported by the Solr-based storage of the Entityhub.
It cannot be implemented on storages that use SPARQL for search.
NOTE also that only a single Similarity Constraint can be used per Field Query.
Additional keys:
- context (required): The text used as context to search for similar entities. Users can pass values ranging from single words up to the text of the current section or a whole document.
- addFields: Additional fields (properties) to use for the similarity search. These fields are added to the value of "field".
Example:
This example combines a filter for Entities with the type Place with a
similarity search for "Wolfgang Amadeus Mozart". The field
http://stanbol.apache.org/ontology/entityhub/query#fullText
is a special field that allows searching the full text (all textual and
xsd:string values) of an Entity.
{
"type": "reference",
"value": "http:\/\/dbpedia.org\/ontology\/Place",
"field": "http:\/\/www.w3.org\/1999\/02\/22-rdf-syntax-ns#type"
},
{
"type": "similarity",
"context": "Wolfgang Amadeus Mozart",
"field": "http:\/\/stanbol.apache.org\/ontology\/entityhub\/query#fullText"
}
Expected results with the default DBpedia dataset include Salzburg. However, because the default dataset only includes the short rdfs:comment texts, the results of similarity searches are very limited. Typically, the use of similarity searches needs to be considered already when indexing data sets.
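Two rules stated in this document apply to a query as a whole: only one constraint per field, and only one Similarity Constraint per Field Query. A client-side check for both could look like the following sketch. The function name `check_query_rules` is hypothetical, and the Entityhub itself may report such problems differently.

```python
from collections import Counter


def check_query_rules(constraints):
    """Return problems for the per-query rules: one constraint per field, one similarity."""
    problems = []
    per_field = Counter(c.get("field") for c in constraints)
    for field, count in per_field.items():
        if count > 1:
            problems.append(f"{count} constraints on field {field}")
    similarity = sum(1 for c in constraints if c.get("type") == "similarity")
    if similarity > 1:
        problems.append(f"{similarity} similarity constraints in one query")
    return problems


# The two constraints of the Mozart example above.
mozart_constraints = [
    {
        "type": "reference",
        "value": "http://dbpedia.org/ontology/Place",
        "field": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
    },
    {
        "type": "similarity",
        "context": "Wolfgang Amadeus Mozart",
        "field": "http://stanbol.apache.org/ontology/entityhub/query#fullText",
    },
]
print(check_query_rules(mozart_constraints))
```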
Special Fields
Currently the following special fields are defined:
- http://stanbol.apache.org/ontology/entityhub/query#fullText: Allows searching within all natural language and xsd:string values that are linked with the Entity. This field is especially useful for Text Constraint and Similarity Constraint searches. NOTE that for text queries language constraints may be ignored, as the full text field MAY NOT be able to support language constraints.
- http://stanbol.apache.org/ontology/entityhub/query#references: Allows searching for all entities referenced by this Entity. This includes other entities and xsd:anyURI values (e.g. foaf:homepage values). Because of this, Reference Constraints applied to this field are queries for the semantic context of an Entity.