@sourcery-ai sourcery-ai bot commented Aug 29, 2023

Branch python3 refactored by Sourcery.

If you're happy with these changes, merge this Pull Request using the Squash and merge strategy.

See our documentation here.

Run Sourcery locally

Reduce the feedback loop during development by using the Sourcery editor plugin:

Review changes via command line

To manually merge these changes, make sure you're on the python3 branch, then run:

git fetch origin sourcery/python3
git merge --ff-only FETCH_HEAD
git reset HEAD^

Help us improve this pull request!

@sourcery-ai sourcery-ai bot requested a review from elsiehupp August 29, 2023 11:14

@sourcery-ai sourcery-ai bot left a comment

Due to GitHub API limits, only the first 60 comments can be shown.

  except URLError as reason:  # https://docs.python.org/3/library/urllib.error.html
      if reason.isinstance(HTTPError):
-         print(api + "is dead or has errors because:")
+         print(f"{api}is dead or has errors because:")

Function checkcore refactored with the following changes:

  apis = []
  for api in open("wikistocheck.txt").read().strip().splitlines():
-     if not api in apis:
+     if api not in apis:

Lines 93-93 refactored with the following changes:

  • Simplify logical expression using De Morgan identities (de-morgan)
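
As a minimal sketch of the rewrite in the diff above: `not api in apis` is parsed as `not (api in apis)`, so it gives the same result as `api not in apis`. The sample list and the two function names are hypothetical; the script in the diff reads `wikistocheck.txt` instead.

```python
# Hypothetical sample data standing in for the lines of wikistocheck.txt.
SAMPLE = [
    "https://a.example/w/api.php",
    "https://b.example/w/api.php",
    "https://a.example/w/api.php",
]


def dedupe_before(lines):
    """Order-preserving de-duplication, written as in the old code."""
    apis = []
    for api in lines:
        if not api in apis:  # parsed as: not (api in apis)
            apis.append(api)
    return apis


def dedupe_after(lines):
    """The same loop with the dedicated `not in` operator."""
    apis = []
    for api in lines:
        if api not in apis:
            apis.append(api)
    return apis
```

Both functions keep the first occurrence of each entry, so the refactoring changes readability only.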

Comment on lines -40 to +43
- req = requests.get("https://community.fandom.com%s" % lvl3)
+ req = requests.get(f"https://community.fandom.com{lvl3}")
  if req.status_code != 200:
      time.sleep(5)
-     req = requests.get("https://community.fandom.com%s" % lvl3)
+     req = requests.get(f"https://community.fandom.com{lvl3}")

Function main refactored with the following changes:


- wikis = list(set(wikis))
- wikis.sort()
+ wikis = sorted(set(wikis))

Function main refactored with the following changes:
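
A short sketch of why the one-line form in the diff above is a safe replacement: `sorted()` accepts any iterable, so the intermediate `list(...)` and in-place `sort()` are not needed. The function names and sample values are hypothetical.

```python
def dedupe_sort_before(items):
    """Two-step form from the old code: build a list, then sort it in place."""
    items = list(set(items))
    items.sort()
    return items


def dedupe_sort_after(items):
    """One-step form: sorted() consumes the set directly and returns a new list."""
    return sorted(set(items))
```

Either way the result is a new sorted list without duplicates; the original order of the input is not preserved.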

  m = [w.replace("http://", "https://") + "/w/api.php" for w in m]
- m = list(set(m))
- m.sort()
+ m = sorted(set(m))

Function main refactored with the following changes:

Comment on lines -37 to +40
- f = open("wikis.txt")
- for x in f.read().strip().splitlines():
-     wikiname = x.split(",")[0]
-     numusers = x.split(",")[1]
-     wikis[wikiname] = numusers
- f.close()
+ with open("wikis.txt") as f:
+     for x in f.read().strip().splitlines():
+         wikiname = x.split(",")[0]
+         numusers = x.split(",")[1]
+         wikis[wikiname] = numusers

Function loadWikis refactored with the following changes:

Comment on lines -47 to +49
- f = open("users.txt", "w")
- output = [f"{x},{y}" for x, y in users.items()]
- output.sort()
- output = "\n".join(output)
- f.write(str(output))
- f.close()
+ with open("users.txt", "w") as f:
+     output = [f"{x},{y}" for x, y in users.items()]
+     output.sort()
+     output = "\n".join(output)
+     f.write(output)

Function saveUsers refactored with the following changes:

Comment on lines -56 to +57
- f = open("wikis.txt", "w")
- output = [f"{x},{y}" for x, y in wikis.items()]
- output.sort()
- output = "\n".join(output)
- f.write(str(output))
- f.close()
+ with open("wikis.txt", "w") as f:
+     output = [f"{x},{y}" for x, y in wikis.items()]
+     output.sort()
+     output = "\n".join(output)
+     f.write(output)

Function saveWikis refactored with the following changes:
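
The load and save diffs above all replace a manual `open()`/`close()` pair with a `with` block, which closes the file even if an exception is raised inside it. The following is a sketch only: the snake_case names, the `path` parameters, and the `round_trip` helper are assumptions for illustration, not the project's functions.

```python
import os
import tempfile


def save_wikis(path, wikis):
    """Write sorted 'name,numusers' lines; the with block closes the file."""
    with open(path, "w") as f:
        output = sorted(f"{x},{y}" for x, y in wikis.items())
        f.write("\n".join(output))


def load_wikis(path):
    """Read 'name,numusers' lines back into a dict."""
    wikis = {}
    with open(path) as f:
        for line in f.read().strip().splitlines():
            wikiname, numusers = line.split(",")[:2]
            wikis[wikiname] = numusers
    return wikis


def round_trip(wikis):
    """Hypothetical helper: save to a temporary file and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "wikis.txt")
        save_wikis(path, wikis)
        return load_wikis(path)
```

Dropping `str(...)` around the joined output, as the diffs do, is safe because `"\n".join(...)` already returns a string.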

Comment on lines -65 to +67
- wikiurl = (
-     "https://%s.wikispaces.com/wiki/members?utable=WikiTableMemberList&ut_csv=1"
-     % (wiki)
- )
+ wikiurl = f"https://{wiki}.wikispaces.com/wiki/members?utable=WikiTableMemberList&ut_csv=1"
  try:
      wikireq = urllib.Request(wikiurl, headers={"User-Agent": "Mozilla/5.0"})
      wikicsv = urllib.request.urlopen(wikireq)
      reader = csv.reader(wikicsv, delimiter=",", quotechar='"')
      headers = next(reader, None)
-     usersfound = {}
-     for row in reader:
-         usersfound[row[0]] = "?"
-     return usersfound
+     return {row[0]: "?" for row in reader}

Function getUsers refactored with the following changes:
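
To illustrate the dict comprehension in the diff above without a network request, here is a sketch that parses CSV text from memory. The function name, the sample CSV, and the `if row` guard against blank lines are assumptions added for this example.

```python
import csv
import io


def users_from_csv(text):
    """Map each username in the first CSV column to the placeholder '?'."""
    reader = csv.reader(io.StringIO(text), delimiter=",", quotechar='"')
    next(reader, None)  # skip the header row, as the original code does
    # The `if row` guard is an addition for this sketch; it skips blank lines.
    return {row[0]: "?" for row in reader if row}
```

For example, `users_from_csv("username,joined\nalice,2010\nbob,2011\n")` yields one `"?"` entry each for `alice` and `bob`.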

Comment on lines 84 to 85
- wikiurl = "https://www.wikispaces.com/user/view/%s" % (user)
+ wikiurl = f"https://www.wikispaces.com/user/view/{user}"
  try:
      wikireq = urllib.Request(wikiurl, headers={"User-Agent": "Mozilla/5.0"})
      html = urllib.request.urlopen(wikireq).read()
      if "Wikis: " in html:
          html = html.split("Wikis: ")[1].split("</div>")[0]
-         wikisfound = {}
-         for x in re.findall(r'<a href="https://([^>]+).wikispaces.com/">', html):
-             wikisfound[x] = "?"
-         return wikisfound
+         return {
+             x: "?"
+             for x in re.findall(
+                 r'<a href="https://([^>]+).wikispaces.com/">', html
+             )
+         }

Function getWikis refactored with the following changes:

Comment on lines -117 to +109
- print("Scanning https://%s.wikispaces.com for users" % (wiki))
+ print(f"Scanning https://{wiki}.wikispaces.com for users")

Function main refactored with the following changes:

  def download(wiki):
      f = urllib.request.urlopen(
-         "%s/wiki/Special:Statistics" % (wiki), context=ssl_context
+         f"{wiki}/wiki/Special:Statistics", context=ssl_context

Function download refactored with the following changes:

Comment on lines -81 to +78
- f = open("./wikiteam3/listsofwikis/mediawiki/wikia.com")
- wikia = f.read().strip().split("\n")
- f.close()

+ with open("./wikiteam3/listsofwikis/mediawiki/wikia.com") as f:
+     wikia = f.read().strip().split("\n")
  print(len(wikia), "wikis in Wikia list")

- start = "!"
- if len(sys.argv) > 1:
-     start = sys.argv[1]

+ start = sys.argv[1] if len(sys.argv) > 1 else "!"

Lines 81-109 refactored with the following changes:

Comment on lines -47 to +54
- f = urllib.request.urlopen("%s/backup-index.html" % (dumpsdomain))
+ f = urllib.request.urlopen(f"{dumpsdomain}/backup-index.html")
  raw = f.read()
  f.close()

  m = re.compile(
      r'<a href="(?P<project>[^>]+)/(?P<date>\d+)">[^<]+</a>: <span class=\'done\'>Dump complete</span>'
  ).finditer(raw)
- projects = []
- for i in m:
-     projects.append([i.group("project"), i.group("date")])
+ projects = [[i.group("project"), i.group("date")] for i in m]

Function main refactored with the following changes:

This removes the following comments (why?):

# enwiki is splitted in several files, thats why we need a loop
# here

Comment on lines -66 to +72
-     print("Error while retrieving: %s" % (url))
-     print("Retry in %s seconds..." % (sleep))
+     print(f"Error while retrieving: {url}")
+     print(f"Retry in {sleep} seconds...")
      time.sleep(sleep)
      urllib.request.urlretrieve(url, filename2)
      return
  except:
-     sleep = sleep * 2
+     sleep *= 2

Function saveURL refactored with the following changes:
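
The diff above doubles the wait after each failed download, which is a simple exponential backoff. The sketch below shows that pattern in isolation; the function name, the `action` and `sleeper` parameters, and the `initial_sleep`/`max_sleep` defaults are assumptions and do not come from `saveURL`, whose surrounding loop is not shown in the diff.

```python
import time


def retry_with_backoff(action, initial_sleep=10, max_sleep=100, sleeper=time.sleep):
    """Call action() until it succeeds, doubling the wait after each failure.

    The limits are hypothetical; `sleeper` is injectable so the wait can be
    stubbed out in tests.
    """
    sleep = initial_sleep
    while sleep <= max_sleep:
        try:
            return action()
        except Exception as exc:
            print(f"Error while retrieving: {exc}")
            print(f"Retry in {sleep} seconds...")
            sleeper(sleep)
            sleep *= 2  # same update as in the diff
    raise RuntimeError("giving up after repeated failures")
```

Catching `Exception` rather than using a bare `except:` keeps `KeyboardInterrupt` working, which is a deliberate difference from the code shown in the diff.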

Comment on lines -211 to +207
- "Duplicate title found: %s" % self.page["title"]
+ f'Duplicate title found: {self.page["title"]}'

Function TitlesHandler.endElement refactored with the following changes:

Comment on lines -283 to +286
-         print("Duplicate title found: %s" % self.page["title"])
+         print(f'Duplicate title found: {self.page["title"]}')
  else:
      self.mediaNsPagesName_set.add(self.page["title"])
  # self.mediaNsPages.append(self.page)
  # print(self.page)
  if self.page["id"] in self.mediaNsPagesID_set:
      if not self.silent:
-         print("Duplicate id found: %s" % self.page["id"])
+         print(f'Duplicate id found: {self.page["id"]}')

Function MediaNsHandler.endElement refactored with the following changes:

Comment on lines -334 to +330
- titles = handler.set_titles if return_type == "set" else handler.list_titles
-
- return titles
+ return handler.set_titles if return_type == "set" else handler.list_titles

Function get_titles_from_xml refactored with the following changes:

Comment on lines -39 to +41
- if re.search("Internet Archive", wtext):
-     # print('It has IA parameter')
-     pass
- else:
+ if not re.search("Internet Archive", wtext):
      print("\n", "#" * 50, "\n", wtitle, "\n", "#" * 50)
-     print("https://wikiapiary.com/wiki/%s" % (re.sub(" ", "_", wtitle)))
+     print(f'https://wikiapiary.com/wiki/{re.sub(" ", "_", wtitle)}')

Function main refactored with the following changes:

This removes the following comments (why?):

# print('It has IA parameter')

Comment on lines -41 to +51
-         print("ERROR, dont understand date format in %s" % (identifier))
+         print(f"ERROR, dont understand date format in {identifier}")
  elif len(t) == 2:
      if len(t[0]) == 4 and len(t[1]) == 2:  # YYYY-MM
          identifiers[f"{t[0]}-{t[1]}"] = identifier
      else:
-         print("ERROR, dont understand date format in %s" % (identifier))
+         print(f"ERROR, dont understand date format in {identifier}")
  elif len(t) == 3:
      if len(t[0]) == 4 and len(t[1]) == 2 and len(t[2]) == 2:  # YYYY-MM-DD
          identifiers[f"{t[0]}-{t[1]}-{t[2]}"] = identifier
      else:
-         print("ERROR, dont understand date format in %s" % (identifier))
+         print(f"ERROR, dont understand date format in {identifier}")

Function main refactored with the following changes:

Comment on lines -71 to +74
-     "Checking Wikimedia Commons files from %s to %s"
-     % (startdate.strftime("%Y-%m-%d"), enddate.strftime("%Y-%m-%d"))
+     f'Checking Wikimedia Commons files from {startdate.strftime("%Y-%m-%d")} to {enddate.strftime("%Y-%m-%d")}'
  )
  while startdate <= enddate:
-     print("== %s ==" % (startdate.strftime("%Y-%m-%d")))
+     print(f'== {startdate.strftime("%Y-%m-%d")} ==')

Function main refactored with the following changes:

Comment on lines -88 to +91
-     "Downloading Wikimedia Commons files from %s to %s"
-     % (startdate.strftime("%Y-%m-%d"), enddate.strftime("%Y-%m-%d"))
+     f'Downloading Wikimedia Commons files from {startdate.strftime("%Y-%m-%d")} to {enddate.strftime("%Y-%m-%d")}'
  )
  while startdate <= enddate:
-     print("== %s ==" % (startdate.strftime("%Y-%m-%d")))
+     print(f'== {startdate.strftime("%Y-%m-%d")} ==')

Function main refactored with the following changes:

This removes the following comments (why?):

# do not use r'', it is encoded
# csv header

Comment on lines -26 to +30
- filename = "commonssql-%s.csv" % (year)
- f = open(filename, "w")
- f.write(
-     "img_name|img_timestamp|img_user|img_user_text|img_size|img_width|img_height\n"
- )
- f.close()

+ filename = f"commonssql-{year}.csv"
+ with open(filename, "w") as f:
+     f.write(
+         "img_name|img_timestamp|img_user|img_user_text|img_size|img_width|img_height\n"
+     )

Function main refactored with the following changes:

* advanced: batch downloads, upload to Internet Archive or anywhere
"""


Lines 74-75 refactored with the following changes:

- elif not size or size.lower() == "unknown":
-     pass
- else:
+ elif size and size.lower() != "unknown":

Function App.sumSizes refactored with the following changes:
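
The condition in the diff above is negated with De Morgan's law: `not (not size or size.lower() == "unknown")` is equivalent to `size and size.lower() != "unknown"`, which removes the empty `pass` branch. A sketch with hypothetical function names, checking the two forms against each other:

```python
def is_countable_before(size):
    """Old shape: skip empty or unknown sizes with `pass`, count the rest."""
    if not size or size.lower() == "unknown":
        return False  # stands in for the `pass` branch
    else:
        return True


def is_countable_after(size):
    """New shape: the negated condition selects the counted sizes directly."""
    return bool(size and size.lower() != "unknown")
```

Short-circuit evaluation matters here: when `size` is `None` or empty, `size.lower()` is never called in either version.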


  def __str__(self):
-     return "page '%s' not found" % self.title
+     return f"page '{self.title}' not found"

Function PageMissingError.__str__ refactored with the following changes:


  def __str__(self):
-     return "Export from '%s' did not return anything." % self.index
+     return f"Export from '{self.index}' did not return anything."

Function ExportAbortedError.__str__ refactored with the following changes:

Comment on lines -70 to +103
- # API
- m = re.findall(
+ if m := re.findall(
      r'(?im)<\s*link\s*rel="EditURI"\s*type="application/rsd\+xml"\s*href="([^>]+?)\?action=rsd"\s*/\s*>',
      result,
- )
- if m:
+ ):
      api = m[0]
      if api.startswith("//"):  # gentoo wiki
          api = url.split("//")[0] + api
- else:
-     pass  # build API using index and check it
-
- # Index.php
- m = re.findall(
-     r'<li id="ca-viewsource"[^>]*?>\s*(?:<span>)?\s*<a href="([^\?]+?)\?', result
- )
- if m:
+ if m := re.findall(
+     r'<li id="ca-viewsource"[^>]*?>\s*(?:<span>)?\s*<a href="([^\?]+?)\?',
+     result,
+ ):
      index = m[0]
+ elif m := re.findall(
+     r'<li id="ca-history"[^>]*?>\s*(?:<span>)?\s*<a href="([^\?]+?)\?',
+     result,
+ ):
+     index = m[0]
- else:
-     m = re.findall(
-         r'<li id="ca-history"[^>]*?>\s*(?:<span>)?\s*<a href="([^\?]+?)\?', result
-     )
-     if m:
-         index = m[0]
  if index:
      if index.startswith("/"):
-         if api:
-             index = urljoin(api, index.split("/")[-1])
-         else:
-             index = urljoin(url, index.split("/")[-1])
+         index = (
+             urljoin(api, index.split("/")[-1])
+             if api
+             else urljoin(url, index.split("/")[-1])
+         )
          # api = index.split("/index.php")[0] + "/api.php"
      if index.endswith("/Main_Page"):
          index = urljoin(index, "index.php")
- else:
-     if api:
-         if len(re.findall(r"/index\.php5\?", result)) > len(
-             re.findall(r"/index\.php\?", result)
-         ):
-             index = "/".join(api.split("/")[:-1]) + "/index.php5"
-         else:
-             index = "/".join(api.split("/")[:-1]) + "/index.php"
+ elif api:
+     if len(re.findall(r"/index\.php5\?", result)) > len(
+         re.findall(r"/index\.php\?", result)
+     ):
+         index = "/".join(api.split("/")[:-1]) + "/index.php5"
+     else:
+         index = "/".join(api.split("/")[:-1]) + "/index.php"

Function mwGetAPIAndIndex refactored with the following changes:

This removes the following comments (why?):

# Index.php
# build API using index and check it
# API
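
The main change in this function is the assignment expression (`:=`, available from Python 3.8), which binds the `re.findall` result and tests it in one step. A minimal sketch reusing the EditURI pattern from the diff; the constant name, the `find_api` function, and the sample HTML in the usage note are hypothetical.

```python
import re

# Pattern copied from the diff; it captures the API URL from the EditURI link tag.
EDIT_URI = r'(?im)<\s*link\s*rel="EditURI"\s*type="application/rsd\+xml"\s*href="([^>]+?)\?action=rsd"\s*/\s*>'


def find_api(html):
    """Return the first API URL advertised in the page, or None."""
    # `m` is assigned and checked for truthiness in the same expression.
    if m := re.findall(EDIT_URI, html):
        return m[0]
    return None
```

Given `<link rel="EditURI" type="application/rsd+xml" href="https://wiki.example.org/w/api.php?action=rsd" />`, `find_api` returns `https://wiki.example.org/w/api.php`; for a page without that tag it returns `None`.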

Comment on lines -124 to +117
- print("Connection error: %s" % (str(e)))
+ print(f"Connection error: {str(e)}")

Function checkRetryAPI refactored with the following changes:

      sys.exit(1)

- elif statuscode == 401 or statuscode == 403:
+ elif statuscode in [401, 403]:

Function handleStatusCode refactored with the following changes:
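
The diff above collapses two equality checks into one membership test. A small sketch of the equivalence; both function names are hypothetical.

```python
def is_auth_error_before(statuscode):
    """Old form: two comparisons joined with `or`."""
    return statuscode == 401 or statuscode == 403


def is_auth_error_after(statuscode):
    """New form: one membership test against the list of codes."""
    return statuscode in [401, 403]
```

The membership form scales more cleanly if further status codes need the same handling later.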

@elsiehupp elsiehupp merged commit 69cb2eb into python3 Aug 29, 2023
@elsiehupp elsiehupp deleted the sourcery/python3 branch August 29, 2023 13:22