This repository was archived by the owner on Sep 7, 2023. It is now read-only.
What is the data lifecycle ? #2052
Open
Maybe I'm overthinking.
Which data ?
- DOI
  - https://github.com/asciimoo/searx/blob/52eba0c7210c53230fb574176d4bf1af771bf0b4/searx/settings.yml#L905-L911
  - Update script: none.
  - Data source: various; https://www.wikidata.org/wiki/Q21980377 (the "official website" property) may help.
  - When should it be updated? Check every day (?).
  - When is it updated? Never.
  - Is it a problem not to update? ❌ Outdated URLs, disappointing user experience.
  - Is it a problem to update? No.
- `searx/data/bangs.json`:
  - Update script: none (it could be updated automatically, then cleaned up manually).
  - Data source: jivesearch (useful sources are DuckDuckGo bangs and Wikidata).
  - When should it be updated? It requires manual checking, or perhaps automatic checking (automatically extract the opensearch.xml, check the URL, etc.). Related to my off-topic comment on "reduce the number of external bangs" #2045 (comment).
  - When is it updated? N/A.
  - Is it a problem not to update? ❌ Outdated URLs, disappointing user experience.
  - Is it a problem to update? No.
- `searx/data/currencies.json`:
  - Update script: https://github.com/asciimoo/searx/blob/master/utils/fetch_currencies.py
  - Data source: https://github.com/asciimoo/searx/pull/993/files
  - When should it be updated? Check every month or year (?).
  - When is it updated? Never.
  - Is it a problem not to update? ❔ It should not be a problem.
  - Is it a problem to update? No.
- `searx/data/useragents.json`:
  - Update script: https://github.com/asciimoo/searx/blob/master/utils/fetch_firefox_version.py
  - Data source: https://ftp.mozilla.org/pub/firefox/releases/
  - When should it be updated? As soon as there is a new Firefox version, but engine compatibility must be checked first.
  - When is it updated? Sometimes.
  - Is it a problem not to update? ❔ An old Firefox version may be a problem with some engines.
  - Is it a problem to update? Some engines may stop working.
- `searx/data/engines_languages.json`:
  - Update script: https://github.com/asciimoo/searx/blob/master/utils/fetch_languages.py
  - Data source: the results of the `fetch_supported_languages` / `_fetch_supported_languages` functions.
  - When should it be updated? Check every week, month, or year (?).
  - When is it updated? Sometimes, when an engine is updated.
  - Is it a problem not to update? ❔ I don't know. Most probably it doesn't change much.
  - Is it a problem to update? If the `fetch_supported_languages` function doesn't match the actual website, the updated result may be worse.
- `searx/engines`:
  - It is code, but it is related to `searx/data/engines_languages.json`, and its life cycle is different from the core.
  - Related to "Embedded searx-checker" #1559.
- certifi package
  - https://github.com/asciimoo/searx/blob/52eba0c7210c53230fb574176d4bf1af771bf0b4/requirements.txt#L1
  - Update script: none.
  - When should it be updated? As soon as there is a new version (is there a reason not to update?).
  - When is it updated? Rarely.
  - Is it a problem not to update? ❔ From a security point of view, it would be better to update.
  - Is it a problem to update? No.
- HSTS preload package (if `httpx` replaces `requests`)
  - When should it be updated? As soon as there is a new version.
  - Is it a problem not to update? ✔️ No, since engines use the HTTPS protocol (it can be a safety net).
  - Is it a problem to update? No.
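The "when should it be updated" answers above amount to a maximum age per data set. As a rough sketch only, the check could look like the following. The `MAX_AGE` table, the `stale_entries` helper, the `"DOI resolvers"` key, and the dates in the demo are all hypothetical; the intervals are taken from the tentative "every day / week / month" suggestions in the list, not from any agreed policy.

```python
from datetime import datetime, timedelta

# Hypothetical policy: maximum allowed age per data set.
# The intervals mirror the tentative suggestions in the list above.
MAX_AGE = {
    "DOI resolvers": timedelta(days=1),
    "searx/data/currencies.json": timedelta(days=30),
    "searx/data/engines_languages.json": timedelta(days=7),
}


def stale_entries(last_updated, now, max_age=MAX_AGE):
    """Return the names whose last update is older than the allowed age.

    Names with no recorded update are reported as stale as well.
    """
    stale = []
    for name, limit in max_age.items():
        updated = last_updated.get(name)
        if updated is None or now - updated > limit:
            stale.append(name)
    return stale


# Illustrative dates only; they do not describe the real repository state.
now = datetime(2020, 7, 1)
last_updated = {
    "DOI resolvers": datetime(2020, 6, 30, 12),
    "searx/data/currencies.json": datetime(2020, 1, 1),
}
print(stale_entries(last_updated, now))
```

Where the "last updated" timestamps come from is left open here; the date of the last commit touching each file would be one option.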
Data and searx installation
After the data are updated in the git repository, they only reach an instance through searx itself: once searx is cloned or installed, the data remain the same as long as searx is not updated.
How to update the data more often?
- Do nothing, keep the same process.
- Keep the data in the searx git repository, and add `make data.update` to update everything.
  - When to call it?
    - Manually: same problem as now.
    - Cron in Travis / GitHub Actions: the script can create a PR.
- Create a different package, `searx-data`, automatically updated. It requires trust in this process.
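To make the `make data.update` option a bit more concrete, here is a minimal sketch of a driver that such a target might call. It is an assumption, not existing searx code: the `update_all`, `run_script`, and `main` names are hypothetical, and only the three script paths come from the list above. It runs every update script even if an earlier one fails, then reports the failures.

```python
import subprocess
import sys

# Update scripts named in this issue, relative to the repository root.
UPDATE_SCRIPTS = [
    "utils/fetch_currencies.py",
    "utils/fetch_firefox_version.py",
    "utils/fetch_languages.py",
]


def run_script(path):
    """Run one update script with the current interpreter; True on success."""
    result = subprocess.run([sys.executable, path], check=False)
    return result.returncode == 0


def update_all(scripts=UPDATE_SCRIPTS, runner=run_script):
    """Run every script, continuing after failures; return the failed paths."""
    failed = []
    for path in scripts:
        try:
            ok = runner(path)
        except OSError:
            # The interpreter or script could not be started at all.
            ok = False
        if not ok:
            failed.append(path)
    return failed


def main():
    """Hypothetical entry point: exit code 1 if any update failed."""
    failures = update_all()
    for path in failures:
        print(f"update failed: {path}", file=sys.stderr)
    return 1 if failures else 0
```

A scheduled CI job could call `sys.exit(main())` from the repository root and open a PR only when the working tree has changed. That keeps a human review step in the loop, which matters for `useragents.json` and `engines_languages.json`, where an update can break engines.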