Search documents

Created: 27-06-2024 - Last updated: 27-06-2024

The engine (Elasticsearch)

SONAR uses Elasticsearch, a fast and powerful search engine. The key to its performance is that it does not query the database directly: it searches indexes built from the records, which are stored as JSON documents. This lets it find information quickly even in very large amounts of data. Its features make it a Google-like search engine, flexible and easy to use for non-specialists, while also offering more advanced functions for technical users.

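Elasticsearch assigns a relevance score to each resource returned by a query and ranks the results by this score. By default, Elasticsearch computes the score with the BM25 ranking function, which favours search terms that occur often in a resource but rarely across the whole index.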
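To make relevance scoring concrete, here is a minimal Python sketch of the textbook BM25 formula. It is an illustration only: the tokenised documents are made up, k1 = 1.2 and b = 0.75 are Elasticsearch's default BM25 parameters, and Lucene's actual implementation (which Elasticsearch relies on) differs in some details, such as how document lengths are stored.

```python
import math
from collections import Counter


def bm25_scores(query_terms, documents, k1=1.2, b=0.75):
    """Score each document (a list of tokens) against the query terms with textbook BM25."""
    n_docs = len(documents)
    avg_len = sum(len(doc) for doc in documents) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in documents for term in set(doc))
    scores = []
    for doc in documents:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            # Terms that are rare across the index get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates (k1) and is normalised by document length (b).
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Made-up example documents, already split into lowercase tokens.
docs = [
    ["mountain", "biology", "study"],
    ["mountain", "mountain", "hiking"],
    ["history", "study"],
]
scores = bm25_scores(["mountain", "biology"], docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)  # [0, 1, 2]
```

The first document ranks highest because it contains both terms, including the rarer "biology"; the third contains neither and scores zero.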

Boolean operators and search tips

By default, spaces between words are treated as AND operators. Search modifiers and other boolean operators can be used with Elasticsearch's simple query string syntax.

See the basic help for examples of boolean search shortcuts.

Building advanced searches

To build an advanced search, two tools are available and can be combined:

  • URL parameters are predefined by the system to simplify frequent queries. They also control how search results and filtering facets are displayed.
  • Elasticsearch query string queries target one or more fields of the index and are entered as the value of the q parameter (after q=).

URL parameters

Parameters are the elements in the URL after the question mark. They are combined with each other using the & symbol.

| Parameter | Use | API example | Interface examples |
| --- | --- | --- | --- |
| Resource PID | Accesses a specific resource. The user interfaces show the detailed view of the resource; the API returns the raw record as stored in the database, without indexing enrichments. | api/documents/111874 | public / admin |
| q | Enters an Elasticsearch query | api/documents/?q=mountain | public / admin |
| page | Sets the results page number | api/documents?q=mountain&page=5 | public / admin |
| size | Sets the number of results displayed per page | api/documents?q=mountain&size=25 | public / admin |
| sort | Defines how the results are sorted. The available sort options may vary by resource type. | api/documents?q=mountain&sort=title | public / admin |
| prettyprint | Formats the JSON output | api/documents/?q=mountain&prettyprint=1 | API only |
| Other preset filters | Applies a filter/facet predefined by the system | api/documents?q=mountain&document_type=coar:c_6501 | public / admin |
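As an illustration of how these parameters combine, the following Python sketch builds API URLs with the standard library's urlencode, which percent-encodes each value and joins the pairs with &. The host https://sonar.ch is an assumption; substitute the instance you actually query. Nothing is sent over the network.

```python
from urllib.parse import urlencode

# Assumption: replace with the address of the SONAR instance you are querying.
BASE_URL = "https://sonar.ch/api/documents/"


def search_url(query, **params):
    """Build a documents API URL from a query and optional URL parameters."""
    # urlencode percent-encodes each value and joins the key=value pairs with "&".
    return BASE_URL + "?" + urlencode({"q": query, **params})


print(search_url("mountain", page=5, size=25, sort="title"))
# https://sonar.ch/api/documents/?q=mountain&page=5&size=25&sort=title
print(search_url("mountain", document_type="coar:c_6501"))
# https://sonar.ch/api/documents/?q=mountain&document_type=coar%3Ac_6501
```

urlencode encodes the colon in coar:c_6501 as %3A; the server decodes it, so this URL should be equivalent to the one in the table.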

Query syntax

A query is entered in the q parameter. The query syntax is described in detail in the Elasticsearch documentation: Query String Syntax.

A query can target fields or subfields of the resource by their names in the index. Each . in a query indicates that a subfield is being searched, and the : introduces the value sought. It is therefore important to know the structure of the document fields when constructing a query. For example, provisionActivity.statement.value:Zürich searches for "Zürich" in value, a subfield of statement, which is itself a subfield of provisionActivity in the document index.
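The following Python sketch illustrates how a dotted path walks down a nested record. The record is a hypothetical, simplified excerpt, not the actual SONAR schema, and values_at is an illustrative helper: it only shows how a field path is resolved, not how Elasticsearch evaluates queries.

```python
def values_at(record, path):
    """Collect the values reached by a dotted field path such as "a.b.c".

    Lists are traversed transparently, since a field may hold several objects.
    """
    current = [record]
    for key in path.split("."):
        next_level = []
        for node in current:
            items = node if isinstance(node, list) else [node]
            for item in items:
                if isinstance(item, dict) and key in item:
                    next_level.append(item[key])
        current = next_level
    # Flatten a final level of lists into individual values.
    result = []
    for node in current:
        result.extend(node if isinstance(node, list) else [node])
    return result


# Hypothetical, simplified document record (not the real SONAR schema).
record = {
    "provisionActivity": [
        {"statement": [{"value": "Zürich"}, {"value": "2022"}]}
    ]
}
print(values_at(record, "provisionActivity.statement.value"))  # ['Zürich', '2022']
```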

In some cases, a backslash is used in a query to escape characters that Elasticsearch would otherwise interpret as query syntax. In a browser's address bar, this backslash must be URL-encoded as %5C. For example, the * operator, which searches all subfields of a structured field, must be escaped, which gives %5C* in the address bar (title.%5C*:study).
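If you build query URLs in code, Python's urllib.parse.quote can do this encoding. In the sketch below, encode_query is a hypothetical helper; passing safe=":*()" keeps the colon, asterisk and parentheses readable, while the backslash becomes %5C and spaces become %20.

```python
from urllib.parse import quote


def encode_query(query):
    """Percent-encode a query for the address bar, keeping ':', '*', '(' and ')' as-is."""
    return quote(query, safe=":*()")


# Raw strings keep the backslash literal in Python source.
print(encode_query(r"title.\*:study"))                     # title.%5C*:study
print(encode_query(r"title.\*:(mountain AND biology)"))    # title.%5C*:(mountain%20AND%20biology)
```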

Operators

| Operator | Description | API example | Interface examples |
| --- | --- | --- | --- |
| \* (search in multiple subfields) | Searches all subfields of a field; for title, this includes subtitles and additional titles | api/documents/?q=title.\*:study | public / admin |
| * | Truncation: matches words beginning with the characters before the asterisk | api/documents/?q=title.mainTitle.value:"myopath*" | public / admin |
| AND | Boolean AND operator | api/documents/?q=title.\*:(mountain AND biology) | public / admin |
| OR | Boolean OR operator | api/documents/?q=title.\*:(mountain OR biology) | public / admin |
| NOT | Boolean NOT operator | api/documents/?q=title.\*:(mountain NOT biology) | public / admin |
| _exists_:<field name> | Finds resources in which a specific field is present | api/documents/?q=_exists_:partOf | public / admin |
| Quotes "" | Finds resources containing an exact phrase | api/documents/?q=title.\*:"à la recherche du temps" | public / admin |

Operators can combine several terms within one field: ?q=subjects.\*:(mountain AND Matterhorn) returns documents with both "mountain" and "Matterhorn" in the subjects field. They can also combine several subqueries: ?q=subjects.\*:mountain AND contribution.\*:"ramuz" returns documents with "mountain" in the subjects field and "ramuz" in the contribution field.
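When queries are generated by a script, small helpers keep operators and parentheses consistent. field_query and combine below are hypothetical helpers written for this illustration; they only assemble strings and reproduce the two examples above.

```python
def field_query(field, terms, operator="AND"):
    """Combine several terms within one field, wrapped in parentheses."""
    joined = f" {operator} ".join(terms)
    return f"{field}:({joined})"


def combine(subqueries, operator="AND"):
    """Combine several complete subqueries with a boolean operator."""
    return f" {operator} ".join(subqueries)


# Raw strings keep the backslash in "\*" literal.
print(field_query(r"subjects.\*", ["mountain", "Matterhorn"]))
# subjects.\*:(mountain AND Matterhorn)
print(combine([r"subjects.\*:mountain", r'contribution.\*:"ramuz"']))
# subjects.\*:mountain AND contribution.\*:"ramuz"
```

When a query mixes AND and OR, wrapping each subquery in parentheses makes the intended grouping explicit.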

Search examples

| Query | Syntax | Example |
| --- | --- | --- |
| Everywhere | (no field) | ?q=study |
| By title | title.\*: | ?q=title.\*:study |
| By author | contribution.\*: | ?q=contribution.\*:(rené schneider) |
| In the full text | fulltext: | ?q=fulltext:remerciements |
| By identifier | identifiers.\*: | ?q=identifiers.\*:333332 |
| By type of diploma | dissertation.degree: | ?q=dissertation.degree:(Mémoire de bachelor) |
| By place, publisher or date | provisionActivity.\*: | ?q=provisionActivity.\*:(Fribourg 2022) |
| By record creation date range (square brackets include the bounds) | [date TO date] | ?q=_created:[2022-01-01 TO 2022-12-31] |
| By record update date range (curly braces exclude the bounds; * leaves a bound open) | {date TO date} | ?q=_updated:{2021-10-24 TO *} |
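Range clauses follow a fixed pattern, so they are easy to generate. The hypothetical date_range helper below is a Python sketch that reproduces the two date examples in the table: square brackets include both bounds, curly braces exclude them, and * leaves a bound open.

```python
def date_range(field, start="*", end="*", inclusive=True):
    """Build a range clause: [a TO b] includes the bounds, {a TO b} excludes them."""
    opening, closing = ("[", "]") if inclusive else ("{", "}")
    return f"{field}:{opening}{start} TO {end}{closing}"


print(date_range("_created", "2022-01-01", "2022-12-31"))
# _created:[2022-01-01 TO 2022-12-31]
print(date_range("_updated", "2021-10-24", inclusive=False))
# _updated:{2021-10-24 TO *}
```

Elasticsearch's query string syntax also accepts mixed brackets, e.g. [2022-01-01 TO 2023-01-01} to include the start date but exclude the end date; this helper does not cover that case.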
