Open
Labels: enhancement (New feature or request)
Description
It was found that although Brandwatch claims the "Expanded URLs" column contains all URLs, it may miss some URLs that appear in the "Content" column. Several other columns may also contain valid URLs or references to websites. Therefore, the suggestion is to check every column that could contain a URL. This task may be expanded to all data sources, including Reddit and 4chan.
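A minimal Python sketch of checking several columns of one record for URLs instead of relying on "Expanded URLs" alone. The column names come from this issue; the regular expression, the trailing-punctuation rule, and the helper names `find_urls` and `search_row_for_urls` are assumptions for illustration, not the project's code. The pattern only catches strings that start with `http://`, `https://` or `www.`, so bare domains such as `example.com` would need an extra rule.

```python
import re
from typing import Any, Iterable, Mapping

# Assumed heuristic: a URL is a run of non-whitespace starting with http://, https:// or www.
# This is not a full RFC 3986 parser.
URL_PATTERN = re.compile(r"""(?:https?://|www\.)[^\s<>"']+""", re.IGNORECASE)

# Punctuation that usually ends a sentence rather than a URL.
# Note: this also trims a legitimate closing ")" from URLs that end in one.
TRAILING_PUNCTUATION = ".,;:!?)]}"


def find_urls(text: str) -> list[str]:
    """Return every URL-like substring of text, in order of appearance."""
    urls = []
    for match in URL_PATTERN.finditer(text):
        url = match.group(0).rstrip(TRAILING_PUNCTUATION)
        if url.lower().startswith("www."):
            url = "http://" + url  # give scheme-less hits a scheme so they can be expanded later
        urls.append(url)
    return urls


def search_row_for_urls(row: Mapping[str, Any], columns: Iterable[str]) -> list[str]:
    """Collect URLs from the given columns of one record, skipping missing or non-text cells."""
    found: list[str] = []
    for column in columns:
        value = row.get(column)
        if isinstance(value, str):
            found.extend(find_urls(value))
    return found
```

For example, `search_row_for_urls(row, ["Title", "Full Text", "Expanded URLs"])` would return URLs found in the title and body even when "Expanded URLs" omits them; duplicates are intentionally kept at this stage and removed later.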
Requirements:
- Detect all URLs
- Expand all URLs
- Detect all unique URLs
- Put them in "article_urls"
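The expand, deduplicate and store steps could look roughly like the Python sketch below. It assumes expansion means following HTTP redirects (for shorteners such as `t.co` or `bit.ly`) and that two URLs count as the same if they match after lowercasing the scheme and host, defaulting an empty path to `/`, and dropping the `#fragment`. These normalization rules and the names `resolve_redirects`, `normalize` and `build_article_urls` are hypothetical choices, not something the issue specifies; dropping fragments, for instance, could merge distinct pages on sites that route by fragment.

```python
import urllib.request
from typing import Callable, Iterable
from urllib.parse import urlsplit, urlunsplit


def resolve_redirects(url: str, timeout: float = 10.0) -> str:
    """Follow HTTP redirects with a HEAD request and return the final URL.

    Needs network access; servers that reject HEAD raise an HTTPError here.
    """
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.url


def normalize(url: str) -> str:
    """Canonical form used for the uniqueness check (assumed rules, adjust as needed)."""
    parts = urlsplit(url.strip())
    # Scheme and host are case-insensitive; the fragment is never sent to the server.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query, ""))


def build_article_urls(
    urls: Iterable[str], resolve: Callable[[str], str] = resolve_redirects
) -> list[str]:
    """Expand every URL, then keep the first occurrence of each normalized result."""
    article_urls: list[str] = []
    seen: set[str] = set()
    for url in urls:
        try:
            expanded = resolve(url)
        except (OSError, ValueError):
            expanded = url  # unreachable or malformed: keep the URL as found
        key = normalize(expanded)
        if key not in seen:
            seen.add(key)
            article_urls.append(key)
    return article_urls
```

Passing `resolve` as a parameter keeps the network call swappable, so the deduplication logic can be tested offline or backed by a cache of already-expanded URLs.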
Subtask list:
- Detect all URLs and add them to the "search_article_urls" column
  - Brandwatch
    - Search the Brandwatch URL columns, the "Title" column, and the "Full Text" column for URLs and put them in the "search_article_urls" column
  - 4chan
    - Search the required 4chan columns for URLs and put them in the "search_article_urls" column
    - Do a full search in 4chan for the "search_article_urls" column
  - Reddit
    - Search the required Reddit columns for URLs and put them in the "search_article_urls" column
    - Do a full search in Reddit for the "search_article_urls" column
    - For RedditComments, use the respective parent RedditSubmission post for URLs? (Do we do the same for Twitter then?)
- Expand all URLs
- Detect all unique URLs
- Put them in "article_urls"
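If the team decides that Reddit comments should inherit URLs from their parent submission (the open question above), one option is to merge the submission's already-extracted list into the comment's. This Python sketch assumes each comment record carries Reddit's `link_id` field (the parent submission's fullname, prefixed `t3_`) and that submissions' URL lists are indexed by bare submission id; the record layout and the names `submission_id` and `inherit_parent_urls` are hypothetical.

```python
from typing import Any, Mapping, Sequence


def submission_id(link_id: str) -> str:
    """Drop Reddit's "t3_" type prefix from a submission fullname, if present."""
    return link_id.removeprefix("t3_")


def inherit_parent_urls(
    comment: Mapping[str, Any],
    urls_by_submission: Mapping[str, Sequence[str]],
) -> list[str]:
    """Return the comment's own URLs followed by its parent submission's, without duplicates."""
    own = list(comment.get("search_article_urls") or [])
    parent_id = submission_id(str(comment.get("link_id") or ""))
    parent = list(urls_by_submission.get(parent_id, []))
    # dict.fromkeys keeps first-seen order while dropping repeats.
    return list(dict.fromkeys(own + parent))
```

Keeping the comment's own URLs first preserves the distinction between what the commenter linked and what was inherited, which may matter if the same question is later answered differently for Twitter replies.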