diff --git a/README.md b/README.md index 1d630bb4..1e73e63b 100644 --- a/README.md +++ b/README.md @@ -20,11 +20,13 @@ You could put following as your config files: "indices": [ { "index": "${ES_INDEX_1}", - "type": "${ES_DOC_TYPE_1}" + "type": "${ES_DOC_TYPE_1}", + "tier_access_level": "${ES_TIER_ACCESS_LEVEL_1}" // optional, set this if there is no global tierAccessLevel }, { "index": "${ES_INDEX_2}", - "type": "${ES_DOC_TYPE_2}" + "type": "${ES_DOC_TYPE_2}", + "tier_access_level": "${ES_TIER_ACCESS_LEVEL_2}" // optional, set this if there is no global tierAccessLevel }, ... ], @@ -35,6 +37,8 @@ You could put following as your config files: } ``` +Note: Guppy expects that either all indices in the guppy config block will have a tier_access_level set OR that a site-wide TIER_ACCESS_LEVEL is set as an environment variable (or in the global block of a commons' manifest). Guppy will throw an error if the config settings do not meet one of these two expectations. See `doc/index_scoped_tiered_access.md` for more information. + Following script will start server using at port 3000, using config file `example_config.json`: ``` @@ -58,17 +62,17 @@ behavior for local test without Arborist, just set `INTERNAL_LOCAL_TEST=true`. P look into `/src/server/auth/utils.js` for more details. ### Tiered Access: -Guppy also support 3 different levels of tier access, by setting `TIER_ACCESS_LEVEL`: +The tiered-access setting is configured through either the `TIER_ACCESS_LEVEL` environment variable or the `tier_access_level` properties on individual indices in the esConfig. 
Guppy supports 3 different levels of tiered access: - `private` by default: only allows access to authorized resources - `regular`: allows all kind of aggregation (with limitation for unauthorized resources), but forbid access to raw data without authorization - `libre`: access to all data -For `regular` level, there's another configuration environment variable `TIER_ACCESS_LIMIT`, which is the minimum visible count for aggregation results. +For the `regular` level, there's another configuration environment variable `TIER_ACCESS_LIMIT`, which is the minimum visible count for aggregation results. -`regular` level commons could also take in a whitelist of values that won't be encrypted. It is set by `config.encrypt_whitelist`. +`regular` level commons can also take in a whitelist of values that won't be encrypted. It is set by `config.encrypt_whitelist`. By default the whitelist contains missing values: ['\_\_missing\_\_', 'unknown', 'not reported', 'no data']. Also the whitelist is disabled by default due to security reasons. If you would like to enable whitelist, simply put `enable_encrypt_whitelist: true` in your config. 
-For example `regular` leveled commons with config looks like this will skip encrypting value `do-not-encrypt-me` even if its count is less than `TIER_ACCESS_LIMIT`: +For example, a `regular` leveled commons with config that looks like this will skip encrypting the value `do-not-encrypt-me` even if its count is less than `TIER_ACCESS_LIMIT`: ``` { @@ -89,7 +93,7 @@ For example `regular` leveled commons with config looks like this will skip encr } ``` -For example following script will start a Guppy server with `regular` tier access level, and minimum visible count set to 100: +The following script will start a Guppy server with a site-wide `regular` tier access level, and minimum visible count set to 100: ``` export TIER_ACCESS_LEVEL=regular @@ -97,6 +101,8 @@ export TIER_ACCESS_LIMIT=100 npm start ``` +To learn how to configure Guppy's tiered-access system using a per-index scoping, and which use cases might warrant such a configuration, please see `doc/index_scoped_tiered_access.md`. + > #### Tier Access Sensitive Record Exclusion > It is possible to configure Guppy to hide some records from being returned in `_aggregation` queries when Tiered Access is enabled (tierAccessLevel: "regular"). > The purpose of this is to "hide" information about certain sensitive resources, essentially making this an escape hatch from Tiered Access. diff --git a/doc/index_scoped_tiered_access.md b/doc/index_scoped_tiered_access.md new file mode 100644 index 00000000..19cf8450 --- /dev/null +++ b/doc/index_scoped_tiered_access.md @@ -0,0 +1,41 @@ +# Index-scoped Tiered-Access + +Most commons use a site-wide tiered access configuration that applies across indices. However, some use cases require index-scoped permissioning. One example is the case of an open-access study viewer where studies have a mix of public properties and controlled-access properties. 
Another example is a Data Explorer that presents data types with different permission requirements meant to serve a variety of audiences. For these use cases, tiered-access settings can be specified at the index-level rather than the site-wide level. + +Guppy expects that either all indices in the guppy config block will have a tiered-access level set OR that a site-wide tiered-access level is set in the global block of the manifest. Guppy will throw an error if the config settings do not meet one of these two expectations. + +You can set index-scoped tiered-access levels using the `tier_access_level` properties in the guppy block of a common's `manifest.json`. Note that the `tier_access_limit` setting is still site-wide and configurable in the manifest's `global` block. +``` +... +"guppy": { + "indices": [ + { + "index": "subject_regular", + "type": "subject", + "tier_access_level": "regular" + }, + { + "index": "subject_private", + "type": "subject_private", + "tier_access_level": "private" + }, + { + "index": "file_private", + "type": "file", + "tier_access_level": "private" + }, + { + "index": "studies_open", + "type": "studies_open", + "tier_access_level": "libre" + }, + { + "index": "studies_controlled_access", + "type": "studies_controlled_access", + "tier_access_level": "private" + } + ], + "auth_filter_field": "auth_resource_path", + ... 
+ }, +``` diff --git a/src/server/__tests__/config.test.js b/src/server/__tests__/config.test.js index 6beb4287..be7f4199 100644 --- a/src/server/__tests__/config.test.js +++ b/src/server/__tests__/config.test.js @@ -37,6 +37,26 @@ describe('config', () => { expect(() => (require('../config'))).toThrow(new Error(`Invalid TIER_ACCESS_LEVEL "${process.env.TIER_ACCESS_LEVEL}"`)); }); + test('should show error if invalid tier access level in guppy block', async () => { + process.env.TIER_ACCESS_LEVEL = null; + const fileName = './testConfigFiles/test-invalid-index-scoped-tier-access.json'; + process.env.GUPPY_CONFIG_FILEPATH = `${__dirname}/${fileName}`; + const invalidItemType = 'subject_private'; + expect(() => (require('../config'))).toThrow(new Error(`tier_access_level invalid for index ${invalidItemType}.`)); + }); + + test('clears out site-wide default tiered-access setting if index-scoped levels set', async () => { + process.env.TIER_ACCESS_LEVEL = null; + process.env.TIER_ACCESS_LIMIT = 50; + const fileName = './testConfigFiles/test-index-scoped-tier-access.json'; + process.env.GUPPY_CONFIG_FILEPATH = `${__dirname}/${fileName}`; + const config = require('../config').default; + const { indices } = require(fileName); + expect(config.tierAccessLevel).toBeUndefined(); + expect(config.tierAccessLimit).toEqual(50); + expect(JSON.stringify(config.esConfig.indices)).toEqual(JSON.stringify(indices)); + }); + /* --------------- For whitelist --------------- */ test('could disable whitelist', async () => { const config = require('../config').default; diff --git a/src/server/__tests__/schema.test.js b/src/server/__tests__/schema.test.js index 8afecefd..1cecb9f9 100644 --- a/src/server/__tests__/schema.test.js +++ b/src/server/__tests__/schema.test.js @@ -6,6 +6,7 @@ import { getAggregationSchema, getAggregationSchemaForEachType, getMappingSchema, + getHistogramSchemas, } from '../schema'; import esInstance from '../es/index'; import config from '../config'; @@ -143,4 
+144,36 @@ describe('Schema', () => { expect(removeSpacesNewlinesAndDes(mappingSchema)) .toEqual(removeSpacesAndNewlines(expectedMappingSchema)); }); + + const expectedHistogramSchemas = ` + type HistogramForString { + histogram: [BucketsForNestedStringAgg] + } + type RegularAccessHistogramForString { + histogram: [BucketsForNestedStringAgg] + } + type HistogramForNumber { + histogram( + rangeStart: Int, + rangeEnd: Int, + rangeStep: Int, + binCount: Int, + ): [BucketsForNestedNumberAgg], + asTextHistogram: [BucketsForNestedStringAgg] + } + type RegularAccessHistogramForNumber { + histogram( + rangeStart: Int, + rangeEnd: Int, + rangeStep: Int, + binCount: Int, + ): [BucketsForNestedNumberAgg], + asTextHistogram: [BucketsForNestedStringAgg] + }`; + test('could create histogram schemas for each type', async () => { + await esInstance.initialize(); + const histogramSchemas = getHistogramSchemas(); + expect(removeSpacesNewlinesAndDes(histogramSchemas)) + .toEqual(removeSpacesAndNewlines(expectedHistogramSchemas)); + }); }); diff --git a/src/server/__tests__/testConfigFiles/test-index-scoped-tier-access.json b/src/server/__tests__/testConfigFiles/test-index-scoped-tier-access.json new file mode 100644 index 00000000..0dd87e65 --- /dev/null +++ b/src/server/__tests__/testConfigFiles/test-index-scoped-tier-access.json @@ -0,0 +1,29 @@ +{ + "indices": [ + { + "index": "subject_regular", + "type": "subject", + "tier_access_level": "regular" + }, + { + "index": "subject_private", + "type": "subject_private", + "tier_access_level": "private" + }, + { + "index": "file_private", + "type": "file", + "tier_access_level": "private" + }, + { + "index": "studies_open", + "type": "studies_open", + "tier_access_level": "libre" + }, + { + "index": "studies_controlled_access", + "type": "studies_controlled_access", + "tier_access_level": "private" + } + ] +} diff --git a/src/server/__tests__/testConfigFiles/test-invalid-index-scoped-tier-access.json 
b/src/server/__tests__/testConfigFiles/test-invalid-index-scoped-tier-access.json new file mode 100644 index 00000000..a8b5ee94 --- /dev/null +++ b/src/server/__tests__/testConfigFiles/test-invalid-index-scoped-tier-access.json @@ -0,0 +1,14 @@ +{ + "indices": [ + { + "index": "subject_regular", + "type": "subject", + "tier_access_level": "regular" + }, + { + "index": "subject_private", + "type": "subject_private", + "tier_access_level": "private____typo" + } + ] +} diff --git a/src/server/config.js b/src/server/config.js index 6f65dd5e..894e2a10 100644 --- a/src/server/config.js +++ b/src/server/config.js @@ -26,7 +26,6 @@ const config = { aggregationIncludeMissingData: typeof inputConfig.aggs_include_missing_data === 'undefined' ? true : inputConfig.aggs_include_missing_data, missingDataAlias: inputConfig.missing_data_alias || 'no data', }, - port: 80, path: '/graphql', arboristEndpoint: 'http://arborist-service', @@ -56,6 +55,15 @@ if (process.env.GUPPY_PORT) { config.port = process.env.GUPPY_PORT; } +const allowedTierAccessLevels = ['private', 'regular', 'libre']; + +if (process.env.TIER_ACCESS_LEVEL) { + if (!allowedTierAccessLevels.includes(process.env.TIER_ACCESS_LEVEL)) { + throw new Error(`Invalid TIER_ACCESS_LEVEL "${process.env.TIER_ACCESS_LEVEL}"`); + } + config.tierAccessLevel = process.env.TIER_ACCESS_LEVEL; +} + if (process.env.TIER_ACCESS_LIMIT) { config.tierAccessLimit = process.env.TIER_ACCESS_LIMIT; } @@ -72,14 +80,26 @@ if (process.env.ANALYZED_TEXT_FIELD_SUFFIX) { config.analyzedTextFieldSuffix = process.env.ANALYZED_TEXT_FIELD_SUFFIX; } -// only three options for tier access level: 'private' (default), 'regular', and 'libre' -if (process.env.TIER_ACCESS_LEVEL) { - if (process.env.TIER_ACCESS_LEVEL !== 'private' - && process.env.TIER_ACCESS_LEVEL !== 'regular' - && process.env.TIER_ACCESS_LEVEL !== 'libre') { - throw new Error(`Invalid TIER_ACCESS_LEVEL "${process.env.TIER_ACCESS_LEVEL}"`); +// Either all indices should have explicit 
index-scoped tiered-access values or +// the manifest should have a site-wide TIER_ACCESS_LEVEL value. +// This approach is backwards-compatible with commons configured for past versions of tiered-access. +let allIndicesHaveTierAccessSettings = true; +config.esConfig.indices.forEach((item) => { + if (!item.tier_access_level && !config.tierAccessLevel) { + throw new Error('Either set all index-scoped tiered-access levels or a site-wide tiered-access level.'); } - config.tierAccessLevel = process.env.TIER_ACCESS_LEVEL; + if (item.tier_access_level && !allowedTierAccessLevels.includes(item.tier_access_level)) { + throw new Error(`tier_access_level invalid for index ${item.type}.`); + } + if (!item.tier_access_level) { + allIndicesHaveTierAccessSettings = false; + } +}); + +// If the indices all have settings, empty out the default +// site-wide TIER_ACCESS_LEVEL from the config. +if (allIndicesHaveTierAccessSettings) { + delete config.tierAccessLevel; } // check whitelist is enabled diff --git a/src/server/download.js b/src/server/download.js index 981c1ca3..b4816ad3 100644 --- a/src/server/download.js +++ b/src/server/download.js @@ -11,21 +11,23 @@ const downloadRouter = async (req, res, next) => { } = req.body; log.debug('[download] ', JSON.stringify(req.body, null, 4)); - const esIndex = esInstance.getESIndexByType(type); + const esIndexConfig = esInstance.getESIndexConfigByType(type); + const tierAccessLevel = (config.tierAccessLevel + ? config.tierAccessLevel : esIndexConfig.tier_access_level); const jwt = headerParser.parseJWT(req); const authHelper = await getAuthHelperInstance(jwt); try { let appliedFilter; /** - * Tier acces strategy for download endpoint: - * 1. if data commons is secure, add auth filter layer onto filter - * 2. if data commons is regular: + * Tier access strategy for download endpoint: + * 1. if the data commons or the index is private, add auth filter layer onto filter + * 2. if the data commons or the index is regular: * a. 
if request contains out-of-access resource, return 401 * b. if request contains only accessible resouces, return response - * 3. if data commons is private, always return reponse without any auth check + * 3. if the data commons or the index is libre, always return reponse without any auth check */ - switch (config.tierAccessLevel) { + switch (tierAccessLevel) { case 'private': { appliedFilter = authHelper.applyAccessibleFilter(filter); break; @@ -36,7 +38,7 @@ const downloadRouter = async (req, res, next) => { appliedFilter = authHelper.applyAccessibleFilter(filter); } else { const outOfScopeResourceList = await authHelper.getOutOfScopeResourceList( - esIndex, type, filter, + esIndexConfig.index, type, filter, ); // if requesting resources > allowed resources, return 401, if (outOfScopeResourceList.length > 0) { @@ -54,13 +56,14 @@ const downloadRouter = async (req, res, next) => { break; } default: - throw new Error(`Invalid TIER_ACCESS_LEVEL "${config.tierAccessLevel}"`); + throw new Error(`Invalid TIER_ACCESS_LEVEL "${tierAccessLevel}"`); } const data = await esInstance.downloadData({ - esIndex, esType: type, filter: appliedFilter, sort, fields, + esIndex: esIndexConfig.index, esType: type, filter: appliedFilter, sort, fields, }); res.send(data); } catch (err) { + log.error(err); next(err); } return 0; diff --git a/src/server/es/index.js b/src/server/es/index.js index a10019d7..54625998 100644 --- a/src/server/es/index.js +++ b/src/server/es/index.js @@ -318,6 +318,34 @@ class ES { ); } + /** + * Get es indexConfig by es type + * Throw 400 error if there's no existing es type + * @param {string} esType + */ + getESIndexConfigByType(esType) { + const index = this.config.indices.find((i) => i.type === esType); + if (index) return index; + throw new CodedError( + 400, + `Invalid es type: "${esType}"`, + ); + } + + /** + * Get es index config by es index name + * Throw 400 error if there's no existing es index of that name + * @param {string} esIndexName + */ + 
getESIndexConfigByName(esIndexName) { + const indexConfig = this.config.indices.find((i) => i.index === esIndexName); + if (indexConfig) return indexConfig; + throw new CodedError( + 400, + `Invalid es index name: "${esIndexName}"`, + ); + } + /** * Get all es indices and their alias */ diff --git a/src/server/middlewares/authMiddleware/index.js b/src/server/middlewares/authMiddleware/index.js index a1c5fa70..10f4aeaf 100644 --- a/src/server/middlewares/authMiddleware/index.js +++ b/src/server/middlewares/authMiddleware/index.js @@ -1,31 +1,5 @@ -import assert from 'assert'; -import log from '../../logger'; import config from '../../config'; - -const authMWResolver = async (resolve, root, args, context, info) => { - assert(config.tierAccessLevel === 'private', 'Auth middleware layer only for "private" tier access level'); - const { authHelper } = context; - - // if mock arborist endpoint, just skip auth middleware - if (!config.internalLocalTest) { - if (config.arboristEndpoint === 'mock') { - log.debug('[authMiddleware] using mock arborist endpoint, skip auth middleware'); - return resolve(root, args, context, info); - } - } - - // asking arborist for auth resource list, and add to filter args - const parsedFilter = args.filter; - const appliedFilter = await authHelper.applyAccessibleFilter(parsedFilter); - const newArgs = { - ...args, - filter: appliedFilter, - }; - if (typeof newArgs.filter === 'undefined') { - delete newArgs.filter; - } - return resolve(root, newArgs, context, info); -}; +import authMWResolver from './resolvers'; // apply this middleware to all es types' data/aggregation resolvers const typeMapping = config.esConfig.indices.reduce((acc, item) => { diff --git a/src/server/middlewares/authMiddleware/resolvers.js b/src/server/middlewares/authMiddleware/resolvers.js new file mode 100644 index 00000000..c3bc9819 --- /dev/null +++ b/src/server/middlewares/authMiddleware/resolvers.js @@ -0,0 +1,28 @@ +import log from '../../logger'; +import config 
from '../../config'; + +const authMWResolver = async (resolve, root, args, context, info) => { + const { authHelper } = context; + + // if mock arborist endpoint, just skip auth middleware + if (!config.internalLocalTest) { + if (config.arboristEndpoint === 'mock') { + log.debug('[authMiddleware] using mock arborist endpoint, skip auth middleware'); + return resolve(root, args, context, info); + } + } + + // asking arborist for auth resource list, and add to filter args + const parsedFilter = args.filter; + const appliedFilter = await authHelper.applyAccessibleFilter(parsedFilter); + const newArgs = { + ...args, + filter: appliedFilter, + }; + if (typeof newArgs.filter === 'undefined') { + delete newArgs.filter; + } + return resolve(root, newArgs, context, info); +}; + +export default authMWResolver; diff --git a/src/server/middlewares/index.js b/src/server/middlewares/index.js index eaae74ca..4f56d526 100644 --- a/src/server/middlewares/index.js +++ b/src/server/middlewares/index.js @@ -1,18 +1,28 @@ import authMiddleware from './authMiddleware'; import tierAccessMiddleware from './tierAccessMiddleware'; +import perIndexTierAccessMiddleware from './perIndexTierAccessMiddleware'; import config from '../config'; +import log from '../logger'; const middlewares = []; + +// If a universal tierAccessLevel has not been applied in the manifest, +// we apply ES-index-specific tiered access settings. 
switch (config.tierAccessLevel) { case 'libre': + log.info('[Server] applying libre middleware across indices.'); break; case 'regular': + log.info('[Server] applying regular middleware across indices.'); middlewares.push(tierAccessMiddleware); break; case 'private': + log.info('[Server] applying private middleware across indices.'); middlewares.push(authMiddleware); break; default: - throw new Error(`Invalid tier access level ${config.tierAccessLevel}`); + log.info('[Server] applying index-scoped middleware.'); + middlewares.push(perIndexTierAccessMiddleware); + break; } export default middlewares; diff --git a/src/server/middlewares/perIndexTierAccessMiddleware/index.js b/src/server/middlewares/perIndexTierAccessMiddleware/index.js new file mode 100644 index 00000000..d39d92c8 --- /dev/null +++ b/src/server/middlewares/perIndexTierAccessMiddleware/index.js @@ -0,0 +1,51 @@ +import config from '../../config'; +import { firstLetterUpperCase } from '../../utils/utils'; +import authMWResolver from '../authMiddleware/resolvers'; +import { tierAccessResolver, hideNumberResolver } from '../tierAccessMiddleware/resolvers'; + +const queryTypeMapping = {}; +const aggsTypeMapping = {}; +const totalCountTypeMapping = {}; +let atLeastOneIndexIsRegularAccess = false; + +config.esConfig.indices.forEach((item) => { + if (item.tier_access_level === 'private') { + queryTypeMapping[item.type] = authMWResolver; + aggsTypeMapping[item.type] = authMWResolver; + } else if (item.tier_access_level === 'regular') { + atLeastOneIndexIsRegularAccess = true; + queryTypeMapping[item.type] = tierAccessResolver({ + isRawDataQuery: true, + esType: item.type, + esIndex: item.index, + }); + aggsTypeMapping[item.type] = tierAccessResolver({ esType: item.type, esIndex: item.index }); + const aggregationName = `${firstLetterUpperCase(item.type)}Aggregation`; + totalCountTypeMapping[aggregationName] = { + _totalCount: hideNumberResolver(true), + }; + } + // No additional resolvers necessary for 
tier_access_level == 'libre' +}, {}); + +const perIndexTierAccessMiddleware = { + Query: { + ...queryTypeMapping, + }, + Aggregation: { + ...aggsTypeMapping, + }, + ...totalCountTypeMapping, +}; + +if (atLeastOneIndexIsRegularAccess) { + perIndexTierAccessMiddleware.RegularAccessHistogramForNumber = { + histogram: hideNumberResolver(false), + }; + + perIndexTierAccessMiddleware.RegularAccessHistogramForString = { + histogram: hideNumberResolver(false), + }; +} + +export default perIndexTierAccessMiddleware; diff --git a/src/server/middlewares/tierAccessMiddleware/index.js b/src/server/middlewares/tierAccessMiddleware/index.js index f13fa066..86212f30 100644 --- a/src/server/middlewares/tierAccessMiddleware/index.js +++ b/src/server/middlewares/tierAccessMiddleware/index.js @@ -1,218 +1,6 @@ -import _ from 'lodash'; -import assert from 'assert'; -import { ApolloError, UserInputError } from 'apollo-server'; -import log from '../../logger'; import config from '../../config'; -import esInstance from '../../es/index'; -import CodedError from '../../utils/error'; -import { firstLetterUpperCase, isWhitelisted, addTwoFilters } from '../../utils/utils'; - -const ENCRYPT_COUNT = -1; - -const resolverWithAccessibleFilterApplied = ( - resolve, root, args, context, info, authHelper, filter, -) => { - const appliedFilter = authHelper.applyAccessibleFilter(filter); - const newArgs = { - ...args, - filter: appliedFilter, - needEncryptAgg: false, - }; - return resolve(root, newArgs, context, info); -}; - -const resolverWithUnaccessibleFilterApplied = ( - resolve, root, args, context, info, authHelper, filter, -) => { - const appliedFilter = authHelper.applyUnaccessibleFilter(filter); - const newArgs = { - ...args, - filter: appliedFilter, - needEncryptAgg: true, - }; - return resolve(root, newArgs, context, info); -}; - -const tierAccessResolver = ( - { - isRawDataQuery, - esType, - }, -) => async (resolve, root, args, context, info) => { - try { - assert(config.tierAccessLevel === 
'regular', 'Tier access middleware layer only for "regular" tier access level'); - const { authHelper } = context; - const esIndex = esInstance.getESIndexByType(esType); - const { filter, filterSelf, accessibility } = args; - - const outOfScopeResourceList = await authHelper.getOutOfScopeResourceList( - esIndex, esType, filter, filterSelf, - ); - // if requesting resources is within allowed resources, return result - if (outOfScopeResourceList.length === 0) { - // unless it's requesting for `unaccessible` data, just resolve this - switch (accessibility) { - case 'accessible': - return resolve(root, { ...args, needEncryptAgg: false }, context, info); - case 'unaccessible': - return resolverWithUnaccessibleFilterApplied( - resolve, root, args, context, info, authHelper, filter, - ); - default: - return resolve(root, { ...args, needEncryptAgg: true }, context, info); - } - } - // else, check if it's raw data query or aggs query - if (isRawDataQuery) { // raw data query for out-of-scope resources are forbidden - if (accessibility === 'accessible') { - return resolverWithAccessibleFilterApplied( - resolve, root, args, context, info, authHelper, filter, - ); - } - log.info('[tierAccessResolver] requesting out-of-scope resources, return 401'); - log.info(`[tierAccessResolver] the following resources are out-of-scope: [${outOfScopeResourceList.join(', ')}]`); - throw new ApolloError('You don\'t have access to all the data you are querying. Try using \'accessibility: accessible\' in your query', 401); - } - - /** - * Here we have a bypass for `regular`-tier-access-leveled commons: - * `accessibility` has 3 options: `all`, `accessible`, and `unaccessible`. 
- * For `all`, behavior is the same as usual - * For `accessible`, we will apply auth filter on top of filter argument - * For `unaccessible`, we apply unaccessible filters on top of filter argument - */ - const sensitiveRecordExclusionEnabled = !!config.tierAccessSensitiveRecordExclusionField; - if (accessibility === 'all') { - if (sensitiveRecordExclusionEnabled) { - // Sensitive study exclusion is enabled: For all of the projects user does - // not have access to, hide the studies marked 'sensitive' from the aggregation. - // (See doc/queries.md#Tiered_Access_sensitive_record_exclusion) - const projectsUserHasAccessTo = authHelper.getAccessibleResources(); - const sensitiveStudiesFilter = { - OR: [ - { - IN: { - [config.esConfig.authFilterField]: projectsUserHasAccessTo, - }, - }, - { - '!=': { - [config.tierAccessSensitiveRecordExclusionField]: 'true', - }, - }, - ], - }; - return resolve( - root, - { - ...args, - filter: addTwoFilters(filter, sensitiveStudiesFilter), - needEncryptAgg: true, - }, - context, - info, - ); - } - - return resolve( - root, - { - ...args, - filter, - needEncryptAgg: true, - }, - context, - info, - ); - } - if (accessibility === 'accessible') { - // We do not need to apply sensitive studies filter here, because - // user has access to all of these projects. - log.debug('[tierAccessResolver] applying "accessible" to resolver'); - return resolverWithAccessibleFilterApplied( - resolve, root, args, context, info, authHelper, filter, - ); - } - // The below code executes if accessibility === 'unaccessible'. - if (sensitiveRecordExclusionEnabled) { - // Apply sensitive studies filter. Hide the studies marked 'sensitive' from - // the aggregation. 
- const sensitiveStudiesFilter = { - '!=': { - [config.tierAccessSensitiveRecordExclusionField]: 'true', - }, - }; - return resolverWithUnaccessibleFilterApplied( - resolve, - root, - args, - context, - info, - authHelper, - addTwoFilters(filter, sensitiveStudiesFilter), - ); - } - return resolverWithUnaccessibleFilterApplied( - resolve, root, args, context, info, authHelper, filter, - ); - } catch (err) { - if (err instanceof ApolloError) { - if (err.extensions.code >= 500) { - console.trace(err); // eslint-disable-line no-console - } - } else if (err instanceof CodedError) { - if (err.code >= 500) { - console.trace(err); // eslint-disable-line no-console - } - } else if (!(err instanceof UserInputError)) { - console.trace(err); // eslint-disable-line no-console - } - throw err; - } -}; - -/** - * This resolver middleware is appended after aggregation resolvers, - * it hide number that is less than allowed visible number for regular tier access - * @param {bool} isGettingTotalCount - */ -const hideNumberResolver = (isGettingTotalCount) => async (resolve, root, args, context, info) => { - // for aggregations, hide all counts that are greater than limited number - const { needEncryptAgg } = root; - const result = await resolve(root, args, context, info); - log.debug('[hideNumberResolver] result: ', result); - if (!needEncryptAgg) return result; - - const newRoot = root; - newRoot.accessibility = 'unaccessible'; - const { authHelper } = context; - newRoot.filter = authHelper.applyUnaccessibleFilter(newRoot.filter); - const unaccessibleResult = await resolve(newRoot, args, context, info); - log.debug('[hideNumberResolver] unaccessibleResult: ', unaccessibleResult); - - // if getting total count, only encrypt if unaccessibleResult is between (0, tierAccessLimit) - if (isGettingTotalCount) { - return (unaccessibleResult > 0 - && unaccessibleResult < config.tierAccessLimit) ? 
ENCRYPT_COUNT : result; - } - - const encryptedResult = result.map((item) => { - // we don't encrypt whitelisted results or if result is not found in unaccessibleResult - if (isWhitelisted(item.key) || !(unaccessibleResult.some((e) => e.key === item.key))) { - return item; - } - // we only encrypt if count from no-access item is small - const unaccessibleResultItem = _.find(unaccessibleResult, (e) => e.key === item.key); - if (unaccessibleResultItem.count < config.tierAccessLimit) { - return { - key: item.key, - count: ENCRYPT_COUNT, - }; - } - return item; - }); - return encryptedResult; -}; +import { firstLetterUpperCase } from '../../utils/utils'; +import { tierAccessResolver, hideNumberResolver } from './resolvers'; // apply this middleware to all es types' data/aggregation resolvers const queryTypeMapping = {}; @@ -222,8 +10,9 @@ config.esConfig.indices.forEach((item) => { queryTypeMapping[item.type] = tierAccessResolver({ isRawDataQuery: true, esType: item.type, + esIndex: item.index, }); - aggsTypeMapping[item.type] = tierAccessResolver({ esType: item.type }); + aggsTypeMapping[item.type] = tierAccessResolver({ esType: item.type, esIndex: item.index }); const aggregationName = `${firstLetterUpperCase(item.type)}Aggregation`; totalCountTypeMapping[aggregationName] = { _totalCount: hideNumberResolver(true), diff --git a/src/server/middlewares/tierAccessMiddleware/resolvers.js b/src/server/middlewares/tierAccessMiddleware/resolvers.js new file mode 100644 index 00000000..5d9bec34 --- /dev/null +++ b/src/server/middlewares/tierAccessMiddleware/resolvers.js @@ -0,0 +1,222 @@ +import _ from 'lodash'; +import assert from 'assert'; +import { ApolloError, UserInputError } from 'apollo-server'; +import log from '../../logger'; +import config from '../../config'; +import esInstance from '../../es/index'; +import CodedError from '../../utils/error'; +import { isWhitelisted, addTwoFilters } from '../../utils/utils'; + +const ENCRYPT_COUNT = -1; + +const 
resolverWithAccessibleFilterApplied = ( + resolve, root, args, context, info, authHelper, filter, +) => { + const appliedFilter = authHelper.applyAccessibleFilter(filter); + const newArgs = { + ...args, + filter: appliedFilter, + needEncryptAgg: false, + }; + return resolve(root, newArgs, context, info); +}; + +const resolverWithUnaccessibleFilterApplied = ( + resolve, root, args, context, info, authHelper, filter, +) => { + const appliedFilter = authHelper.applyUnaccessibleFilter(filter); + const newArgs = { + ...args, + filter: appliedFilter, + needEncryptAgg: true, + }; + return resolve(root, newArgs, context, info); +}; + +export const tierAccessResolver = ( + { + isRawDataQuery, + esType, + esIndex, + }, +) => async (resolve, root, args, context, info) => { + try { + // Assert that either this index is "regular" access or + // that the index has no setting and site-wide config is "regular". + const indexConfig = esInstance.getESIndexConfigByName(esIndex); + const indexIsRegularAccess = indexConfig.tier_access_level === 'regular'; + const siteIsRegularAccess = config.tierAccessLevel === 'regular'; + assert(indexIsRegularAccess || siteIsRegularAccess, 'Tier access middleware layer only for "regular" tier access level'); + + const { authHelper } = context; + const { filter, filterSelf, accessibility } = args; + + const outOfScopeResourceList = await authHelper.getOutOfScopeResourceList( + esIndex, esType, filter, filterSelf, + ); + // if requesting resources is within allowed resources, return result + if (outOfScopeResourceList.length === 0) { + // unless it's requesting for `unaccessible` data, just resolve this + switch (accessibility) { + case 'accessible': + return resolve(root, { ...args, needEncryptAgg: false }, context, info); + case 'unaccessible': + return resolverWithUnaccessibleFilterApplied( + resolve, root, args, context, info, authHelper, filter, + ); + default: + return resolve(root, { ...args, needEncryptAgg: true }, context, info); + } + } + // 
else, check whether it's a raw data query or an aggs query + if (isRawDataQuery) { // raw data queries for out-of-scope resources are forbidden + if (accessibility === 'accessible') { + return resolverWithAccessibleFilterApplied( + resolve, root, args, context, info, authHelper, filter, + ); + } + log.info('[tierAccessResolver] requesting out-of-scope resources, return 401'); + log.info(`[tierAccessResolver] the following resources are out-of-scope: [${outOfScopeResourceList.join(', ')}]`); + throw new ApolloError('You don\'t have access to all the data you are querying. Try using \'accessibility: accessible\' in your query', 401); + } + + /** + * Here we have a bypass for `regular`-tier-access-leveled commons: + * `accessibility` has 3 options: `all`, `accessible`, and `unaccessible`. + * For `all`, behavior is the same as usual + * For `accessible`, we will apply auth filter on top of filter argument + * For `unaccessible`, we apply unaccessible filters on top of filter argument + */ + const sensitiveRecordExclusionEnabled = !!config.tierAccessSensitiveRecordExclusionField; + if (accessibility === 'all') { + if (sensitiveRecordExclusionEnabled) { + // Sensitive study exclusion is enabled: For all of the projects the user does + // not have access to, hide the studies marked 'sensitive' from the aggregation. 
+ // (See doc/queries.md#Tiered_Access_sensitive_record_exclusion) + const projectsUserHasAccessTo = authHelper.getAccessibleResources(); + const sensitiveStudiesFilter = { + OR: [ + { + IN: { + [config.esConfig.authFilterField]: projectsUserHasAccessTo, + }, + }, + { + '!=': { + [config.tierAccessSensitiveRecordExclusionField]: 'true', + }, + }, + ], + }; + return resolve( + root, + { + ...args, + filter: addTwoFilters(filter, sensitiveStudiesFilter), + needEncryptAgg: true, + }, + context, + info, + ); + } + + return resolve( + root, + { + ...args, + filter, + needEncryptAgg: true, + }, + context, + info, + ); + } + if (accessibility === 'accessible') { + // We do not need to apply sensitive studies filter here, because + // user has access to all of these projects. + log.debug('[tierAccessResolver] applying "accessible" to resolver'); + return resolverWithAccessibleFilterApplied( + resolve, root, args, context, info, authHelper, filter, + ); + } + // The below code executes if accessibility === 'unaccessible'. + if (sensitiveRecordExclusionEnabled) { + // Apply sensitive studies filter. Hide the studies marked 'sensitive' from + // the aggregation. 
+ const sensitiveStudiesFilter = { + '!=': { + [config.tierAccessSensitiveRecordExclusionField]: 'true', + }, + }; + return resolverWithUnaccessibleFilterApplied( + resolve, + root, + args, + context, + info, + authHelper, + addTwoFilters(filter, sensitiveStudiesFilter), + ); + } + return resolverWithUnaccessibleFilterApplied( + resolve, root, args, context, info, authHelper, filter, + ); + } catch (err) { + if (err instanceof ApolloError) { + if (err.extensions.code >= 500) { + console.trace(err); // eslint-disable-line no-console + } + } else if (err instanceof CodedError) { + if (err.code >= 500) { + console.trace(err); // eslint-disable-line no-console + } + } else if (!(err instanceof UserInputError)) { + console.trace(err); // eslint-disable-line no-console + } + throw err; + } +}; + +/** + * This resolver middleware is appended after aggregation resolvers. + * It hides counts whose unaccessible portion is smaller than the minimum visible count (tierAccessLimit) for regular tier access. + * @param {boolean} isGettingTotalCount + */ +export const hideNumberResolver = (isGettingTotalCount) => async ( + resolve, root, args, context, info) => { + // for aggregations, hide all counts whose unaccessible portion is smaller than tierAccessLimit + const { needEncryptAgg } = root; + const result = await resolve(root, args, context, info); + log.debug('[hideNumberResolver] result: ', result); + if (!needEncryptAgg) return result; + + const newRoot = root; + newRoot.accessibility = 'unaccessible'; + const { authHelper } = context; + newRoot.filter = authHelper.applyUnaccessibleFilter(newRoot.filter); + const unaccessibleResult = await resolve(newRoot, args, context, info); + log.debug('[hideNumberResolver] unaccessibleResult: ', unaccessibleResult); + + // if getting total count, only encrypt if unaccessibleResult is between (0, tierAccessLimit) + if (isGettingTotalCount) { + return (unaccessibleResult > 0 + && unaccessibleResult < config.tierAccessLimit) ? 
ENCRYPT_COUNT : result; + } + + const encryptedResult = result.map((item) => { + // we don't encrypt whitelisted results or if result is not found in unaccessibleResult + if (isWhitelisted(item.key) || !(unaccessibleResult.some((e) => e.key === item.key))) { + return item; + } + // we only encrypt if count from no-access item is small + const unaccessibleResultItem = _.find(unaccessibleResult, (e) => e.key === item.key); + if (unaccessibleResultItem.count < config.tierAccessLimit) { + return { + key: item.key, + count: ENCRYPT_COUNT, + }; + } + return item; + }); + return encryptedResult; +}; diff --git a/src/server/resolvers.js b/src/server/resolvers.js index 19579582..e9d2be4b 100644 --- a/src/server/resolvers.js +++ b/src/server/resolvers.js @@ -265,6 +265,13 @@ const getResolver = (esConfig, esInstance) => { HistogramForString: { histogram: textHistogramResolver, }, + RegularAccessHistogramForNumber: { + histogram: numericHistogramResolver, + asTextHistogram: textHistogramResolver, + }, + RegularAccessHistogramForString: { + histogram: textHistogramResolver, + }, Mapping: { ...mappingResolvers, }, diff --git a/src/server/schema.js b/src/server/schema.js index 3cdc2ea6..b39e7473 100644 --- a/src/server/schema.js +++ b/src/server/schema.js @@ -17,6 +17,8 @@ const esgqlTypeMapping = { nested: 'Object', }; +const histogramTypePrefix = 'RegularAccess'; + const getGQLType = (esInstance, esIndex, field, esFieldType) => { const gqlType = esgqlTypeMapping[esFieldType]; if (!gqlType) { @@ -146,12 +148,15 @@ const getAggregationType = (entry) => { return ''; }; -const getAggregationSchemaForOneIndex = (esInstance, esIndex, esType) => { +const getAggregationSchemaForOneIndex = (esInstance, esConfigElement) => { + const esIndex = esConfigElement.index; + const esType = esConfigElement.type; + const includeHistogramPrefix = Object.prototype.hasOwnProperty.call(esConfigElement, 'tier_access_level') && esConfigElement.tier_access_level === 'regular'; const esTypeObjName = 
firstLetterUpperCase(esType); const fieldGQLTypeMap = getFieldGQLTypeMapForOneIndex(esInstance, esIndex); const fieldAggsTypeMap = fieldGQLTypeMap.filter((f) => f.esType !== 'nested').map((entry) => ({ field: entry.field, - aggType: getAggsHistogramName(entry.type), + aggType: (includeHistogramPrefix ? histogramTypePrefix : '') + getAggsHistogramName(entry.type), })); const fieldAggsNestedTypeMap = fieldGQLTypeMap.filter((f) => f.esType === 'nested'); return `type ${esTypeObjName}Aggregation { @@ -188,10 +193,11 @@ export const getAggregationSchema = (esConfig) => ` * Multi-level nested fields are "flattened" level by level. * For each level of nested field a new type in schema is created. */ -const getAggregationSchemaForOneNestedIndex = (esInstance, esIndex) => { +const getAggregationSchemaForOneNestedIndex = (esInstance, esDict) => { + const esIndex = esDict.index; const fieldGQLTypeMap = getFieldGQLTypeMapForOneIndex(esInstance, esIndex); const fieldAggsNestedTypeMap = fieldGQLTypeMap.filter((f) => f.esType === 'nested'); - + const includeHistogramPrefix = Object.prototype.hasOwnProperty.call(esDict, 'tier_access_level') && esDict.tier_access_level === 'regular'; let AggsNestedTypeSchema = ''; while (fieldAggsNestedTypeMap.length > 0) { const entry = fieldAggsNestedTypeMap.shift(); @@ -207,7 +213,7 @@ const getAggregationSchemaForOneNestedIndex = (esInstance, esIndex) => { ${propsKey}: NestedHistogramFor${firstLetterUpperCase(propsKey)}`; } return ` - ${propsKey}: ${getAggsHistogramName(esgqlTypeMapping[entryType])}`; + ${propsKey}: ${(includeHistogramPrefix ? 
histogramTypePrefix : '') + getAggsHistogramName(esgqlTypeMapping[entryType])}`; })} }`; } @@ -216,9 +222,27 @@ const getAggregationSchemaForOneNestedIndex = (esInstance, esIndex) => { return AggsNestedTypeSchema; }; -export const getAggregationSchemaForEachType = (esConfig, esInstance) => esConfig.indices.map((cfg) => getAggregationSchemaForOneIndex(esInstance, cfg.index, cfg.type)).join('\n'); +export const getAggregationSchemaForEachType = (esConfig, esInstance) => esConfig.indices.map((cfg) => getAggregationSchemaForOneIndex(esInstance, cfg)).join('\n'); + +export const getAggregationSchemaForEachNestedType = (esConfig, esInstance) => esConfig.indices.map((cfg) => getAggregationSchemaForOneNestedIndex(esInstance, cfg)).join('\n'); + +const getNumberHistogramSchema = (isRegularAccess) => ` + type ${(isRegularAccess ? histogramTypePrefix : '') + EnumAggsHistogramName.HISTOGRAM_FOR_NUMBER} { + histogram( + rangeStart: Int, + rangeEnd: Int, + rangeStep: Int, + binCount: Int, + ): [BucketsForNestedNumberAgg], + asTextHistogram: [BucketsForNestedStringAgg] + } + `; -export const getAggregationSchemaForEachNestedType = (esConfig, esInstance) => esConfig.indices.map((cfg) => getAggregationSchemaForOneNestedIndex(esInstance, cfg.index)).join('\n'); +const getTextHistogramSchema = (isRegularAccess) => ` + type ${(isRegularAccess ? 
histogramTypePrefix : '') + EnumAggsHistogramName.HISTOGRAM_FOR_STRING} { + histogram: [BucketsForNestedStringAgg] + } + `; export const getMappingSchema = (esConfig) => ` type Mapping { @@ -228,6 +252,20 @@ export const getMappingSchema = (esConfig) => ` } `; +export const getHistogramSchemas = () => { + const textHistogramSchema = getTextHistogramSchema(false); + + const regularAccessTextHistogramSchema = getTextHistogramSchema(true); + + const numberHistogramSchema = getNumberHistogramSchema(false); + + const regularAccessNumberHistogramSchema = getNumberHistogramSchema(true); + + const histogramSchemas = [textHistogramSchema, regularAccessTextHistogramSchema, numberHistogramSchema, regularAccessNumberHistogramSchema].join('\n'); + + return histogramSchemas; +}; + export const buildSchemaString = (esConfig, esInstance) => { const querySchema = getQuerySchema(esConfig); @@ -263,11 +301,7 @@ export const buildSchemaString = (esConfig, esInstance) => { const aggregationSchemasForEachNestedType = getAggregationSchemaForEachNestedType(esConfig, esInstance); - const textHistogramSchema = ` - type ${EnumAggsHistogramName.HISTOGRAM_FOR_STRING} { - histogram: [BucketsForNestedStringAgg] - } - `; + const histogramSchemas = getHistogramSchemas(); const textHistogramBucketSchema = ` type BucketsForNestedStringAgg { @@ -278,6 +312,20 @@ export const buildSchemaString = (esConfig, esInstance) => { } `; + const numberHistogramBucketSchema = ` + type BucketsForNestedNumberAgg { + """Lower and higher bounds for this bucket""" + key: [Float] + min: Float + max: Float + avg: Float + sum: Float + count: Int + missingFields: [BucketsForNestedMissingFields] + termsFields: [BucketsForNestedTermsFields] + } + `; + const nestedMissingFieldsBucketSchema = ` type BucketsForNestedMissingFields { field: String @@ -299,32 +347,6 @@ export const buildSchemaString = (esConfig, esInstance) => { } `; - const numberHistogramSchema = ` - type ${EnumAggsHistogramName.HISTOGRAM_FOR_NUMBER} { - 
histogram( - rangeStart: Int, - rangeEnd: Int, - rangeStep: Int, - binCount: Int, - ): [BucketsForNestedNumberAgg], - asTextHistogram: [BucketsForNestedStringAgg] - } - `; - - const numberHistogramBucketSchema = ` - type BucketsForNestedNumberAgg { - """Lower and higher bounds for this bucket""" - key: [Float] - min: Float - max: Float - avg: Float - sum: Float - count: Int - missingFields: [BucketsForNestedMissingFields] - termsFields: [BucketsForNestedTermsFields] - } - `; - const mappingSchema = getMappingSchema(esConfig); const schemaStr = ` @@ -337,8 +359,7 @@ export const buildSchemaString = (esConfig, esInstance) => { ${aggregationSchema} ${aggregationSchemasForEachType} ${aggregationSchemasForEachNestedType} - ${textHistogramSchema} - ${numberHistogramSchema} + ${histogramSchemas} ${textHistogramBucketSchema} ${nestedMissingFieldsBucketSchema} ${nestedTermsFieldsBucketSchema}
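
The schema changes above choose a histogram type name per index: an index whose own `tier_access_level` is `regular` gets the `RegularAccess`-prefixed types, and every other index keeps the unprefixed ones. A minimal sketch of that selection follows. It assumes the enum value is `HistogramForNumber` (as the resolver map in `resolvers.js` suggests); `histogramTypeName` and the sample `indices` array are hypothetical names used only for illustration.

```javascript
// Stand-ins for values defined in the project's own modules (assumed here).
const histogramTypePrefix = 'RegularAccess';
const HISTOGRAM_FOR_NUMBER = 'HistogramForNumber';

// Hypothetical helper mirroring the includeHistogramPrefix check in the diff:
// only an index-level tier_access_level of 'regular' adds the prefix.
const histogramTypeName = (indexConfig, baseName) => {
  const isRegular = Object.prototype.hasOwnProperty.call(indexConfig, 'tier_access_level')
    && indexConfig.tier_access_level === 'regular';
  return (isRegular ? histogramTypePrefix : '') + baseName;
};

// Hypothetical index configs: one regular, one private, one with no setting.
const indices = [
  { index: 'subject', type: 'subject', tier_access_level: 'regular' },
  { index: 'file', type: 'file', tier_access_level: 'private' },
  { index: 'study', type: 'study' },
];

const typeNames = indices.map((cfg) => histogramTypeName(cfg, HISTOGRAM_FOR_NUMBER));
console.log(typeNames);
// → [ 'RegularAccessHistogramForNumber', 'HistogramForNumber', 'HistogramForNumber' ]
```

Note that in this sketch, as in the diff, an index without its own `tier_access_level` stays on the unprefixed types even if the site-wide level is `regular`; the prefix depends only on the index-level setting.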