Merged
43 commits
- 317690b fix/datagen (mfshao, Mar 10, 2020)
- 2fe2910 fix/dep (mfshao, Mar 10, 2020)
- c11d057 fix/gen data (mfshao, Mar 18, 2020)
- 6759eca lint (mfshao, Mar 18, 2020)
- d4c9ce9 feat/schema (mfshao, Apr 7, 2020)
- d54aba8 feat/resolver (mfshao, Apr 7, 2020)
- 06d8791 feat/resolver (mfshao, Apr 10, 2020)
- 8f6fc4f fix/disable no console for genData (mfshao, Apr 10, 2020)
- 9092d45 feat/update es version (mfshao, Apr 13, 2020)
- 40a1f4a feat/update resolver (mfshao, Apr 13, 2020)
- 65b8e29 feat/nested test agg (mfshao, Apr 13, 2020)
- fc264e2 feat/nested numeric aggs (mfshao, Apr 13, 2020)
- 6bb5315 fix/tests (mfshao, Apr 13, 2020)
- 0adc762 chore/rename test (mfshao, Apr 13, 2020)
- 90502bc feat/unit tests (mfshao, Apr 13, 2020)
- 6f19424 feat/new doc (mfshao, Apr 13, 2020)
- 191c2f5 chore/comments (mfshao, Apr 13, 2020)
- cbf5de3 Merge branch 'master' into feat/nested-agg (mfshao, Apr 13, 2020)
- 13ebe53 update package (mfshao, Apr 13, 2020)
- bb22c6a fix/doc (mfshao, Apr 13, 2020)
- fd1c2b2 fix/bot alert (mfshao, Apr 13, 2020)
- af2877c chore/doc update (mfshao, Apr 13, 2020)
- 84975f4 fix/typo (mfshao, Apr 14, 2020)
- 2fa9965 fix/nested query (mfshao, Apr 14, 2020)
- 8c1194f fix/doc (mfshao, Apr 14, 2020)
- a1ee95c fix/doc (mfshao, Apr 14, 2020)
- 002f6fe Merge branch 'master' into feat/nested-agg (mfshao, Apr 14, 2020)
- dbe782c Update doc/queries.md (mfshao, Apr 14, 2020)
- 828a57a Update genData/genData.js (mfshao, Apr 14, 2020)
- 6e2db0a Update generate_data.sh (mfshao, Apr 14, 2020)
- bd20991 chore/remove unnecessary printouts (mfshao, Apr 15, 2020)
- 4923f7d chore/explaination for nested vs sub aggs (mfshao, Apr 15, 2020)
- 4e319fc chore/move chapt (mfshao, Apr 15, 2020)
- 9f897e0 chore/missing line break (mfshao, Apr 15, 2020)
- 2f7e5a5 chore/no dir jumpings in doc (mfshao, Apr 16, 2020)
- 7e1915e fix/dont add missing alias to numeric field by default (mfshao, Apr 17, 2020)
- 44ec235 feat/generate mock array and update config index (mfshao, Apr 25, 2020)
- 562d005 feat/handle array as nested field (mfshao, Apr 25, 2020)
- 455f404 comment (mfshao, Apr 25, 2020)
- 12f2acd chore/config index default (mfshao, Apr 26, 2020)
- b0c9b74 chore/doc update (mfshao, Apr 26, 2020)
- 5422c18 chore/rename var (mfshao, Apr 27, 2020)
- 48364a1 fix/typo (mfshao, Apr 27, 2020)

11 changes: 7 additions & 4 deletions README.md
@@ -8,6 +8,9 @@ Please see [this doc](https://github.com/uc-cdis/guppy/blob/master/doc/queries.m

Run `npm start` to start the server on port 80.

### Local Deployment and Development:
Guppy provides helper scripts that let developers set up a local ES service using Docker, generate example ES indices for testing, and populate those indices with mock data. Please refer to [the DEV Helper doc](https://github.com/uc-cdis/guppy/blob/master/devHelper/README.md) for more information.

### Configurations:
Before launching, write a config file that tells Guppy which Elasticsearch indices and which authorization control field to use.
You can use the following as your config file:
@@ -40,7 +43,7 @@ export GUPPY_CONFIG_FILEPATH=./example_config.json
npm start
```

#### Authorization
### Authorization:
Guppy connects to Arborist for authorization.
The `auth_filter_field` item in your config file is the field used for authorization.
You can set the endpoint by:
@@ -54,7 +57,7 @@ skip all authorization steps. But if you just want to mock your own authorizatio
behavior for local testing without Arborist, just set `INTERNAL_LOCAL_TEST=true`. Please
look into `/src/server/auth/utils.js` for more details.

#### Tier access
### Tiered Access:
Guppy also supports 3 different levels of tiered access, set by `TIER_ACCESS_LEVEL`:
- `private` by default: only allows access to authorized resources
- `regular`: allows all kinds of aggregations (with limitations for unauthorized resources), but forbids access to raw data without authorization
@@ -94,7 +97,7 @@ export TIER_ACCESS_LIMIT=100
npm start
```

> ##### Tier Access Sensitive Record Exclusion
> #### Tier Access Sensitive Record Exclusion
> It is possible to configure Guppy to hide some records from being returned in `_aggregation` queries when Tiered Access is enabled (tierAccessLevel: "regular").
> The purpose of this is to "hide" information about certain sensitive resources, essentially making this an escape hatch from Tiered Access.
> Crucially, Sensitive Record Exclusion only applies to records which the user does not have access to. If the user has access to a record, it will
@@ -104,5 +107,5 @@ npm start
>
> (E.g., `"tier_access_sensitive_record_exclusion_field": "sensitive"` in the Guppy config tells Guppy to look for a field in the ES index called `sensitive`, and to exclude records in the ES index which have `sensitive: "true"`)
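
The exclusion rule above can be sketched in JavaScript over records already held in memory. This is a hypothetical illustration, not Guppy's implementation: the function name `visibleForAggregation`, the use of `auth_resource_path` as the `auth_filter_field`, and representing the user's accessible resources as a `Set` are all assumptions, and any other tiered-access limits are ignored.

```javascript
// Hypothetical sketch of Sensitive Record Exclusion (not Guppy's code).
// Returns the records that would remain visible to `_aggregation` queries.
function visibleForAggregation(records, accessibleResources, authField, sensitiveField) {
  return records.filter((record) => {
    // Records the user is authorized for are never excluded.
    if (accessibleResources.has(record[authField])) return true;
    // Unauthorized records are hidden only when flagged as sensitive ("true").
    return record[sensitiveField] !== 'true';
  });
}

const records = [
  { subject_id: 's1', auth_resource_path: '/programs/p1', sensitive: 'true' },
  { subject_id: 's2', auth_resource_path: '/programs/p2', sensitive: 'true' },
  { subject_id: 's3', auth_resource_path: '/programs/p2', sensitive: 'false' },
];

// Assumed: the user can access only /programs/p1.
const visible = visibleForAggregation(
  records, new Set(['/programs/p1']), 'auth_resource_path', 'sensitive',
);
console.log(visible.map((r) => r.subject_id)); // [ 's1', 's3' ]
```

Note that `s1` stays visible even though it is flagged as sensitive, because the exclusion only applies to records the user cannot access.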

#### Download endpoint
### Download Endpoint:
Guppy also has a special endpoint, `/download`, for fetching raw data from Elasticsearch. Please see [here](https://github.com/uc-cdis/guppy/blob/master/doc/download.md) for more details.
41 changes: 35 additions & 6 deletions devHelper/README.md
@@ -1,20 +1,50 @@
# How to generate mock data and start developing locally

## Step.1 start elasticsearch

In the root directory of this repo, run:
```
docker-compose -f ./esearch.yml up -d
docker-compose -f ./devHelper/docker/esearch.yml up -d
```

## Step.2 import mock data into elasticsearch index
Go to the repository's root directory and run the following command.
In the root directory of this repo, run the following command:

```
sh ./generate_data.sh
```

This automatically generates 3 ES indices (one each for `subject`, `file`, and `config`) and populates each index with 100 records.

### Manually generate more mock data for a specific elasticsearch index (optional)
If you need more mock data, Guppy has a helper command that generates mock data for a specific ES index. For example, to generate data for an ES index called `gen3-dev-subject` with document type `subject`, run the following command:
```
npm run gendata -- -i gen3-dev-subject -d subject
```

Here is the complete list of arguments that `npm run gendata` accepts:

| argument | description | default |
|------------------------------|--------------------------------------------------------|-------------------|
| -v, --verbose | verbose output | false |
| -h, --hostname `<hostname>` | elasticsearch hostname | http://localhost |
| -p, --port `<port>` | elasticsearch port | 9200 |
| -i, --index `<index>` | elasticsearch index | undefined |
| -d, --doc_type `<doc_type>` | document type | undefined |
| -c, --config_index `<config_index>` | array config index | gen3-dev-config |
| -n, --number `<number>` | number of documents to generate | 500 |
| -r, --random | generate a random number of documents, up to `number` | false |
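
The flags and defaults in the table can be summarized as a small, dependency-free parser. This is a hypothetical sketch (the name `parseGenDataArgs` is made up); `genData/genData.js` may parse its arguments differently, for example with a CLI library.

```javascript
// Hypothetical parser mirroring the `npm run gendata` argument table.
// Not the actual genData.js implementation; it only documents the defaults.
const DEFAULTS = {
  verbose: false,
  hostname: 'http://localhost',
  port: 9200,
  index: undefined,
  doc_type: undefined,
  config_index: 'gen3-dev-config',
  number: 500,
  random: false,
};

// Short and long flags mapped to option names.
const FLAGS = {
  '-v': 'verbose', '--verbose': 'verbose',
  '-h': 'hostname', '--hostname': 'hostname',
  '-p': 'port', '--port': 'port',
  '-i': 'index', '--index': 'index',
  '-d': 'doc_type', '--doc_type': 'doc_type',
  '-c': 'config_index', '--config_index': 'config_index',
  '-n': 'number', '--number': 'number',
  '-r': 'random', '--random': 'random',
};
const BOOLEAN_OPTIONS = new Set(['verbose', 'random']);
const NUMERIC_OPTIONS = new Set(['port', 'number']);

function parseGenDataArgs(argv) {
  const options = { ...DEFAULTS };
  for (let i = 0; i < argv.length; i += 1) {
    const name = FLAGS[argv[i]];
    if (!name) throw new Error(`unknown argument: ${argv[i]}`);
    if (BOOLEAN_OPTIONS.has(name)) {
      options[name] = true; // boolean flags take no value
    } else {
      const value = argv[i + 1];
      if (value === undefined) throw new Error(`missing value for ${argv[i]}`);
      options[name] = NUMERIC_OPTIONS.has(name) ? Number(value) : value;
      i += 1; // skip the value we just consumed
    }
  }
  return options;
}

// Equivalent of: npm run gendata -- -i gen3-dev-subject -d subject
const opts = parseGenDataArgs(['-i', 'gen3-dev-subject', '-d', 'subject']);
console.log(opts.index, opts.doc_type, opts.number, opts.config_index);
// gen3-dev-subject subject 500 gen3-dev-config
```

In a script, the input would typically be `process.argv.slice(2)`. Numeric options are converted with `Number`, so a value like `-n abc` would yield `NaN` and should be validated in real code.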

Some predefined mock values are stored in `/genData/valueBank.json`.

:information_source: **Special handling for generating mock data for array type fields**

In Elasticsearch, arrays do not require a dedicated field datatype: when defining ES fields in the mapping object, array fields are syntactically no different from regular fields. GraphQL, however, does distinguish arrays from other data types.

So, to add an array field to the mock data, the field name must explicitly contain the word `array`, and some predefined values for that field must be added to `/genData/valueBank.json`.

This ensures that the array config index is updated with the names of all array fields in an ES index. If any of your ES indices has array fields, a correct array config index is required to generate the corresponding GraphQL schemas and resolvers. To specify the name of the array config index, pass the `-c` or `--config_index` argument to `npm run gendata`. The default name of the test array config index is `gen3-dev-config`.
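
The naming convention above can be illustrated with a hypothetical JavaScript helper that scans an index mapping for field names containing `array` and builds the document to store in the array config index. The `{ id, body: { array: [...] } }` shape is an assumption inferred from the `array` keyword field in the config index mapping created by `devHelper/scripts/commands.sh`; using the index name as the document id and picking a random subset of the predefined values are illustrative choices, not necessarily what `genData.js` does.

```javascript
// Hypothetical helpers illustrating the `array` naming convention.

// Names of top-level mapping properties whose name contains "array".
function findArrayFields(mappingProperties) {
  return Object.keys(mappingProperties).filter((name) => name.includes('array'));
}

// Document for the array config index (id = index name is an assumption).
function buildArrayConfigDoc(indexName, mappingProperties) {
  return { id: indexName, body: { array: findArrayFields(mappingProperties) } };
}

// Mock value for an array field: a random, non-empty subset of the
// predefined values (e.g. taken from valueBank.json).
function mockArrayValue(predefinedValues, random = Math.random) {
  if (predefinedValues.length === 0) throw new Error('no predefined values');
  const picked = predefinedValues.filter(() => random() < 0.5);
  return picked.length > 0 ? picked : [predefinedValues[0]];
}

const properties = {
  subject_id: { type: 'keyword' },
  some_array_string_field: { type: 'keyword' }, // hypothetical array field
  file_count: { type: 'integer' },
};
const doc = buildArrayConfigDoc('gen3-dev-subject', properties);
console.log(JSON.stringify(doc));
// {"id":"gen3-dev-subject","body":{"array":["some_array_string_field"]}}
console.log(mockArrayValue(['a', 'b', 'c']));
```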

## Step.3 start server for developing server side code
Go to repo root directory, and run
In the root directory of this repo, run:

```
GUPPY_PORT=3000 INTERNAL_LOCAL_TEST=true npm start
@@ -24,11 +54,10 @@ The Guppy server will be hosted at [localhost:3000/graphql](http://localhost:300
We use nodemon to start the server, so all code changes are hot-applied to the running server in real time.

## Step.4 start storybook for developing front-end components
Go to repo root directory, and run
In the root directory of this repo, run:

```
npm run storybook
```

[Storybook](https://storybook.js.org/) will be hosted at [localhost:6006](http://localhost:6006).

2 changes: 1 addition & 1 deletion devHelper/docker/esearch.yml
@@ -3,7 +3,7 @@ version: "3.3"
services:
  # see https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html#docker-cli-run-prod-mode
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch-oss:6.5.4
    image: docker.elastic.co/elasticsearch/elasticsearch-oss:6.7.0
    ports:
      - "9200:9200"
      - "9300:9300"
16 changes: 14 additions & 2 deletions devHelper/scripts/commands.sh
@@ -93,7 +93,13 @@ curl -iv -X PUT "${ESHOST}/${indexName}" \
"auth_resource_path": { "type": "keyword", "fields": { "analyzed": {"type": "text", "analyzer": "ngram_analyzer", "search_analyzer": "search_analyzer", "term_vector": "with_positions_offsets"} } },
"file_count": { "type": "integer" },
"whatever_lab_result_value": { "type": "float" },
"some_string_field": { "type": "keyword", "fields": { "analyzed": {"type": "text", "analyzer": "ngram_analyzer", "search_analyzer": "search_analyzer", "term_vector": "with_positions_offsets"} } },
"some_nested_array_field": {
"type": "nested",
"properties": {
"some_integer_inside_nested": { "type": "integer" },
"some_string_inside_nested": { "type": "keyword", "fields": { "analyzed": {"type": "text", "analyzer": "ngram_analyzer", "search_analyzer": "search_analyzer", "term_vector": "with_positions_offsets"} } }
}
},
"some_integer_field": { "type": "integer" },
"some_long_field": { "type": "long" },
"sensitive": { "type": "keyword" }
@@ -133,7 +139,13 @@ curl -iv -X PUT "${ESHOST}/${configIndexName}" \
"number_of_shards" : 1,
"number_of_replicas" : 0
}
}
},
"mappings": {
"_doc": {
"properties": {
"array": { "type": "keyword" }
}
}
}
}
'
169 changes: 163 additions & 6 deletions doc/queries.md
@@ -7,10 +7,12 @@ Table of Contents
- [Text Aggregation](#aggs-text)
- [Numeric Aggregation](#aggs-numeric)
- [Nested Aggregation](#aggs-nested)
- [Sub-aggregations](#aggs-sub)
- [Filters](#filter)
- [Basic Filter Unit](#filter-unit)
- [Text Search Unit in Filter](#filter-search)
- [Combined Filters](#filter-comb)
- [Nested Filter](#filter-nested)
- [Some other queries and arguments](#other)

<a name="query"></a>
@@ -395,11 +397,166 @@ Result:
<a name="aggs-nested"></a>

### 4. Nested Aggregation
Guppy supports nested aggregations (sub-aggregations) for fields. Currently Guppy only supports two-level-sub-aggregations.
:bangbang: **This section covers aggregations on documents that contain nested fields. For information about sub-aggregations, such as terms aggregation and missing aggregation, please refer to [Sub-aggregations](#aggs-sub).**

There are two types of nested aggregations that is supported by Guppy: terms aggregation and missing aggregation, user can mix-and-match the using of both aggregations.
> The difference between nested aggregations and sub-aggregations is that nested aggregations are performed on multi-level nested fields, while sub-aggregations are performed on different fields at the same level.

#### 4.1. Terms Aggregation
Guppy supports aggregations (both text and numeric) on nested fields. For information about using nested fields in filters, see [Nested Filter](#filter-nested).
> Suppose the ES index has the following mapping:
>```
> "mappings": {
>   "subject": {
>     "properties": {
>       "subject_id": { "type": "keyword" },
>       "visits": {
>         "type": "nested",
>         "properties": {
>           "days_to_visit": { "type": "integer" },
>           "visit_label": { "type": "keyword" },
>           "follow_ups": {
>             "type": "nested",
>             "properties": {
>               "days_to_follow_up": { "type": "integer" },
>               "follow_up_label": { "type": "keyword" }
>             }
>           }
>         }
>       }
>     }
>   }
> }
>```
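
With this mapping, each nesting level (`visits`, then `visits.follow_ups`) is its own ES `nested` scope, so an aggregation on a nested field has to be wrapped in one `nested` aggregation per level. The hypothetical JavaScript helper below builds such a wrapper for a single dotted field path. It assumes every path segment except the last is a `nested` field, and using a fixed-interval `histogram` for a numeric field is also an assumption; this is not necessarily the request body Guppy's resolver sends.

```javascript
// Hypothetical builder: wraps a leaf aggregation in one ES `nested`
// aggregation per nesting level of a dotted field path.
// Assumes every segment except the last is a `nested` field.
function buildNestedAgg(fieldPath, leafAgg) {
  const parts = fieldPath.split('.');
  // The innermost aggregation targets the full field path.
  let aggs = { [parts[parts.length - 1]]: leafAgg(fieldPath) };
  // Wrap from the deepest nested path outwards.
  for (let depth = parts.length - 1; depth >= 1; depth -= 1) {
    const path = parts.slice(0, depth).join('.');
    aggs = { [parts[depth - 1]]: { nested: { path }, aggs } };
  }
  return aggs;
}

const terms = (field) => ({ terms: { field } });
const histogram = (interval) => (field) => ({ histogram: { field, interval } });

// Two-level nested numeric aggregation and one-level nested text aggregation.
const followUpAgg = buildNestedAgg('visits.follow_ups.days_to_follow_up', histogram(1));
const visitLabelAgg = buildNestedAgg('visits.visit_label', terms);
console.log(JSON.stringify({ size: 0, aggs: followUpAgg }, null, 2));
```

A real request covering several fields under `visits` would merge their aggregations under a single `visits` wrapper; the sketch builds one path at a time to keep the nesting logic visible.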

An example nested query that Guppy can perform against that ES index:
```
query: {
  _aggregation: {
    subject: {
      subject_id: { --> regular non-nested aggregation
        histogram: {
          key
          count
        }
      }
      visits: {
        visit_label: { --> one-level nested text aggregation
          histogram: {
            key
            count
          }
        }
        follow_ups: {
          days_to_follow_up: { --> two-level nested numeric aggregation
            histogram(rangeStep: 1) {
              key
              count
            }
          }
        }
      }
    }
  }
}
```

Result:
```
{
  "data": {
    "_aggregation": {
      "subject": {
        "subject_id": {
          "histogram": [
            {
              "key": "subject_id_1",
              "count": 24
            },
            {
              "key": "subject_id_2",
              "count": 24
            },
            {
              "key": "subject_id_3",
              "count": 21
            }
          ]
        },
        "visits": {
          "visit_label": {
            "histogram": [
              {
                "key": "vst_lbl_3",
                "count": 29
              },
              {
                "key": "vst_lbl_1",
                "count": 21
              },
              {
                "key": "vst_lbl_2",
                "count": 19
              }
            ]
          },
          "follow_ups": {
            "days_to_follow_up": {
              "histogram": [
                {
                  "key": [
                    1,
                    2
                  ],
                  "count": 21
                },
                {
                  "key": [
                    2,
                    3
                  ],
                  "count": 19
                },
                {
                  "key": [
                    3,
                    4
                  ],
                  "count": 29
                }
              ]
            }
          }
        }
      }
    }
  }
}
```
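
The numeric histogram in this result uses `[lower, upper]` keys whose width equals `rangeStep`. Assuming the underlying ES response is a standard `histogram` aggregation (buckets with a numeric `key` lower bound and a `doc_count`), the hypothetical helper below reshapes it into the format shown above. Dropping empty buckets is a design choice of this sketch, because ES `histogram` returns zero-count buckets between the minimum and maximum values by default; Guppy's own handling may differ.

```javascript
// Hypothetical: reshape ES histogram buckets ({ key, doc_count }) into
// { key: [lower, upper], count } entries like the result above.
function toRangeHistogram(esBuckets, rangeStep) {
  return esBuckets
    // ES histogram defaults to min_doc_count 0, so gaps appear as empty buckets.
    .filter((bucket) => bucket.doc_count > 0)
    .map((bucket) => ({
      key: [bucket.key, bucket.key + rangeStep],
      count: bucket.doc_count,
    }));
}

const esBuckets = [
  { key: 1, doc_count: 21 },
  { key: 2, doc_count: 19 },
  { key: 3, doc_count: 29 },
];
console.log(JSON.stringify(toRangeHistogram(esBuckets, 1)));
// [{"key":[1,2],"count":21},{"key":[2,3],"count":19},{"key":[3,4],"count":29}]
```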


<a name="aggs-sub"></a>

### 5. Sub-aggregations
:warning: **This section covers sub-aggregations (terms and missing aggregations) on documents. Before Guppy 0.5.0, this section was incorrectly titled "Nested Aggregation"; it has since been corrected.**

Guppy supports sub-aggregations for fields. Currently, Guppy only supports two-level sub-aggregations.

Guppy supports two types of sub-aggregations, terms aggregation and missing aggregation, and users can mix and match both.

For more information about ES terms aggregation and missing aggregation, please read: [Terms Aggregation](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html) and [Missing Aggregation](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-missing-aggregation.html).

> For the examples in the following sections, assume the ES index has the following mapping:
>```
> "mappings": {
>   "subject": {
>     "properties": {
>       "project": { "type": "keyword" },
>       "gender": { "type": "keyword" }
>     }
>   }
> }
>```

#### 5.1. Terms Aggregation
Terms aggregation requires a single `field` for the parent aggregation and an array of fields for the nested sub-aggregations. The sub-aggregations are computed for each bucket that the parent aggregation generates. The result shows, for each `key` of the parent `field`, the distribution of values of each field in the sub-aggregation array.
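
In ES terms, a terms sub-aggregation is a `terms` aggregation on the parent field with a child `terms` aggregation for each sub-aggregation field. The hypothetical builder below produces such a request body for the `project`/`gender` mapping above; it is a sketch of the ES concept, not the exact body Guppy sends.

```javascript
// Hypothetical builder for a two-level terms sub-aggregation.
function buildTermsSubAgg(parentField, subFields) {
  const subAggs = {};
  for (const field of subFields) {
    // Evaluated separately inside every bucket of the parent aggregation.
    subAggs[field] = { terms: { field } };
  }
  return {
    size: 0, // only aggregation results are needed, no hits
    aggs: {
      [parentField]: { terms: { field: parentField }, aggs: subAggs },
    },
  };
}

const termsBody = buildTermsSubAgg('project', ['gender']);
console.log(JSON.stringify(termsBody));
// {"size":0,"aggs":{"project":{"terms":{"field":"project"},"aggs":{"gender":{"terms":{"field":"gender"}}}}}}
```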

Results are wrapped in the keyword `field`, along with `key` and `count` for that `field`. For example:
@@ -521,7 +678,7 @@ Result:
}
```

#### 4.2. Missing Aggregation
#### 5.2. Missing Aggregation
Missing aggregation also requires a single `field` for the parent aggregation and an array of fields for the nested sub-aggregations. The sub-aggregations are computed for each bucket that the parent aggregation generates. The result shows, for each `key` of the parent `field`, how many documents in that bucket have no value for each field in the sub-aggregation array.
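
Similarly, a missing sub-aggregation can be expressed as a `terms` aggregation on the parent field with a child ES `missing` aggregation per field; each `missing` aggregation yields a single bucket whose `doc_count` is the number of documents in the parent bucket without a value for that field. The builder below is a hypothetical sketch using the `project`/`gender` mapping from the start of this section, not the exact body Guppy sends.

```javascript
// Hypothetical builder for a two-level missing sub-aggregation: for every
// bucket of the parent `terms` aggregation, count the documents that have
// no value for each listed field.
function buildMissingSubAgg(parentField, subFields) {
  const subAggs = {};
  for (const field of subFields) {
    subAggs[field] = { missing: { field } };
  }
  return {
    size: 0, // only aggregation results are needed, no hits
    aggs: {
      [parentField]: { terms: { field: parentField }, aggs: subAggs },
    },
  };
}

const missingBody = buildMissingSubAgg('project', ['gender']);
console.log(JSON.stringify(missingBody.aggs.project.aggs));
// {"gender":{"missing":{"field":"gender"}}}
```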

Results are wrapped in the keywords `field` and `count`. For example:
@@ -789,7 +946,7 @@ In future Guppy will support `SQL` like syntax for filter, like `
{"filter": "(race = 'hispanic' OR race='asian') AND (file_count >= 15 AND file_count <= 75) AND project = 'Proj-1' AND gender = 'female'"}
`.

<a name="other"></a>
<a name="filter-nested"></a>

### Nested filter
Guppy now supports queries on nested Elasticsearch schemas. Querying and filtering a nested index works similarly to an ES nested query.
@@ -833,6 +990,7 @@ Assuming that there is `File` node nested inside `subject`. The nested query wil
Elasticsearch only supports nested filters at the document level when returning data. This means that the filter `file_count >= 15` and `file_count <= 75` returns each whole document that has a `file_count` in the range `[15, 75]`.
Nested `file_count` values outside that range are not filtered out of the returned documents.
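
The document-level behavior can be illustrated with a hypothetical in-memory JavaScript sketch. It assumes the nested path is named `file` (an illustrative name) and shows both an approximate ES `nested` query for the filter and a plain-JavaScript equivalent of its matching rule: a document matches when one nested entry satisfies the whole condition, and it is returned with all of its nested entries.

```javascript
// Hypothetical illustration of document-level nested filtering.
// A document matches if at least one nested entry satisfies the predicate;
// matching documents are returned with ALL of their nested entries.
function nestedFilter(documents, nestedPath, predicate) {
  return documents.filter((doc) => (doc[nestedPath] || []).some(predicate));
}

// Approximate ES query for the same filter (nested path name assumed).
const esQuery = {
  nested: {
    path: 'file',
    query: {
      bool: {
        must: [
          { range: { 'file.file_count': { gte: 15 } } },
          { range: { 'file.file_count': { lte: 75 } } },
        ],
      },
    },
  },
};

const subjects = [
  { subject_id: 's1', file: [{ file_count: 10 }, { file_count: 20 }] },
  { subject_id: 's2', file: [{ file_count: 90 }] },
];
const matched = nestedFilter(subjects, 'file', (f) => f.file_count >= 15 && f.file_count <= 75);
console.log(JSON.stringify(matched));
// [{"subject_id":"s1","file":[{"file_count":10},{"file_count":20}]}]
```

Because `s1` has one `file_count` in range, it is returned unchanged, including the `file_count: 10` entry that falls outside `[15, 75]`.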

<a name="other"></a>
## Some other queries and arguments

### Mapping query
@@ -1086,4 +1244,3 @@ Result:
}
}
```
