@DiLiuNEUexpresscompany

Fixes #411

Analysis of the original problem

Cause of the error

# Original code (removed)
result = re.search(r"novel:\s({.+}),\s+isOwnWork", r.text)
novel_data = json.loads(result.groups()[0])  # crashes when result is None

Failure chain:

  1. re.search() returns None (the page structure has changed)
  2. Calling None.groups()[0] raises AttributeError: 'NoneType' object has no attribute 'groups'
  3. The whole method crashes
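The chain above can be reproduced offline. This is a minimal sketch, assuming hypothetical page snippets: one matching the layout the regex was written for, and one where the `isOwnWork` marker no longer follows the novel object. `extract_novel_json` is a hypothetical guarded version of the removed code; it shows the regex approach failing with a clear message instead of an AttributeError, not how the PR fixes it.

```python
import json
import re

# The pattern used by the original (removed) code.
NOVEL_PATTERN = re.compile(r"novel:\s({.+}),\s+isOwnWork")


def extract_novel_json(page_text: str) -> dict:
    """Hypothetical guarded variant of the original regex extraction."""
    match = NOVEL_PATTERN.search(page_text)
    if match is None:
        # Without this check, match.groups() raises
        # AttributeError: 'NoneType' object has no attribute 'groups'
        raise ValueError("novel data not found; the page structure may have changed")
    return json.loads(match.group(1))


# Hypothetical sample in the layout the regex expects.
old_page = 'novel: {"id": "1", "title": "t"},\n  isOwnWork: false'
# Hypothetical sample after a layout change: the isOwnWork marker is gone.
new_page = 'novel: {"id": "1", "title": "t"},\n  viewerInfo: {}'

print(extract_novel_json(old_page)["title"])

try:
    extract_novel_json(new_page)
except ValueError as exc:
    print(exc)
```

Any change to the surrounding markup, not just to the novel object itself, is enough to make `search()` return `None`.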

Before and after

Before (removed code)

    # 2. Parse the embedded JavaScript data with a regular expression
    result = re.search(r"novel:\s({.+}),\s+isOwnWork", r.text)
    novel_data = json.loads(result.groups()[0])  # crash-prone

After (new code)

def webview_novel(self, novel_id, raw=False, req_auth=True):
    # 1. Call the AJAX API directly
    #    (base_url is the pixiv web host; it is defined outside this excerpt)
    url = f"{base_url}/ajax/novel/{novel_id}"
    
    # 2. Send a full set of browser headers
    headers = {
        'Accept': 'application/json, text/plain, */*',
        'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8',
        'Accept-Encoding': 'gzip, deflate, br',
        'X-Requested-With': 'XMLHttpRequest',
        'Referer': f'{base_url}/novel/show.php?id={novel_id}',
        'User-Agent': 'Mozilla/5.0...',
        'DNT': '1',
        'Connection': 'keep-alive',
        'Sec-Fetch-Dest': 'empty',
        'Sec-Fetch-Mode': 'cors',
        'Sec-Fetch-Site': 'same-origin',
        'Cache-Control': 'no-cache',
        'Pragma': 'no-cache',
        'Sec-Ch-Ua': '"Not_A Brand";v="8", "Chromium";v="120", "Google Chrome";v="120"',
        'Sec-Ch-Ua-Mobile': '?0',
        'Sec-Ch-Ua-Platform': '"Windows"',
    }
    
    # 3. Add a random delay before the request
    import time
    import random
    time.sleep(random.uniform(1, 3))
    
    # 4. Make the API call
    r = self.no_auth_requests_call("GET", url, headers=headers, req_auth=req_auth)
    
    # 5. Parse the JSON response
    json_data = self.parse_result(r)
    
    # 6. Check for an API-level error
    if json_data.get('error'):
        error_msg = json_data.get('message', 'unknown error')
        raise PixivError(f"API error: {error_msg}")
    
    # 7. Check the response shape, then take the novel object from 'body'
    if 'body' not in json_data:
        msg = f"Unexpected AJAX API response format: {list(json_data.keys())}"
        raise PixivError(msg, header=r.headers, body=r.text)
    novel_data = json_data['body']
    
    # 8. Build a flexible result object
    class SimpleNovelResult:
        def __init__(self, data):
            self.raw_data = data
            self.title = data.get('title', '')
            # Accept several possible field names
            self.text = data.get('text', '') or data.get('content', '') or data.get('novelText', '')
            self.description = data.get('description', '')
            self.author_name = data.get('authorName', '') or data.get('userName', '')
            # ... other fields
            
        def __getattr__(self, name):
            # Only called for attributes not set in __init__; fall back to the raw JSON.
            # Avoid infinite recursion if raw_data has not been set yet.
            if name == 'raw_data':
                raise AttributeError(name)
            return self.raw_data.get(name, None)
    
    return SimpleNovelResult(novel_data)
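Step 3 sleeps a uniformly random 1 to 3 seconds before every request, which also makes any test of this method slow. A minimal sketch of the same jitter with the random source and the sleep function injected, so a test can pass a seeded `random.Random` and a recording fake instead of actually waiting; `polite_delay` is a hypothetical name, not part of pixivpy.

```python
import random
import time
from typing import Callable, List, Optional


def polite_delay(
    low: float = 1.0,
    high: float = 3.0,
    rng: Optional[random.Random] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Sleep for a random duration in [low, high] seconds and return it.

    Hypothetical helper: with the defaults it behaves like step 3,
    time.sleep(random.uniform(1, 3)).
    """
    # Use the caller's RNG if given, otherwise the module-level one (as step 3 does).
    uniform = rng.uniform if rng is not None else random.uniform
    delay = uniform(low, high)
    sleep(delay)
    return delay


# In a test: record the requested delay instead of sleeping.
recorded: List[float] = []
d = polite_delay(rng=random.Random(0), sleep=recorded.append)
print(round(d, 3), recorded == [d])
```

The delay is paid on every call, so fetching N novels in a loop takes at least N seconds with the defaults.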

What does this PR do?

1. Remove the dependence on regular expressions

# ❌ Before: crash-prone
result = re.search(r"novel:\s({.+}),\s+isOwnWork", r.text)
novel_data = json.loads(result.groups()[0])

# ✅ After: safe API call
json_data = self.parse_result(r)
novel_data = json_data['body']

2. Improved error handling

# ❌ Before: no error checking
result.groups()[0]  # may crash

# ✅ After: explicit error checking
if 'body' not in json_data:
    msg = f"Unexpected AJAX API response format: {list(json_data.keys())}"
    raise PixivError(msg, header=r.headers, body=r.text)
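A standalone sketch of the two checks the new code performs (steps 6 and 7), runnable without pixivpy. `unwrap_body` and `AjaxResponseError` are hypothetical names; `AjaxResponseError` stands in for `PixivError`, and the `error`, `message` and `body` keys are the ones the PR's code reads.

```python
from typing import Any, Dict


class AjaxResponseError(Exception):
    """Hypothetical stand-in for PixivError in this sketch."""


def unwrap_body(json_data: Dict[str, Any]) -> Dict[str, Any]:
    """Return json_data['body'], or raise a descriptive error."""
    # Step 6: the PR's code treats a truthy 'error' flag as an API failure.
    if json_data.get("error"):
        raise AjaxResponseError(f"API error: {json_data.get('message', 'unknown error')}")
    # Step 7: a missing 'body' means the response shape is not what we expect.
    if "body" not in json_data:
        raise AjaxResponseError(
            f"Unexpected AJAX API response format: {list(json_data.keys())}"
        )
    return json_data["body"]


print(unwrap_body({"error": False, "message": "", "body": {"title": "t"}}))

for bad in ({"error": True, "message": "not found"}, {"unexpected": 1}):
    try:
        unwrap_body(bad)
    except AjaxResponseError as exc:
        print(exc)
```

Unlike `result.groups()[0]`, each failure mode produces its own message, and the second one lists the keys that actually came back.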

3. Flexible field mapping

# ❌ Before: a single fixed field name
self.text = data.get('text', '')

# ✅ After: several field names accepted
self.text = data.get('text', '') or data.get('content', '') or data.get('novelText', '')
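The `or` chain picks the first truthy value, so an empty string under `text` falls through to `content` and then `novelText`; `__getattr__` is consulted only when normal attribute lookup fails. A minimal sketch of both behaviours, using a hypothetical `first_present` helper and a trimmed-down copy of `SimpleNovelResult`. The PR does not say which of these keys the AJAX endpoint actually returns; the sample dict is made up.

```python
from typing import Any, Dict, Optional


def first_present(data: Dict[str, Any], *keys: str, default: Any = "") -> Any:
    """Hypothetical helper: first truthy value among keys, else default."""
    for key in keys:
        value = data.get(key)
        if value:
            return value
    return default


class SimpleNovelResult:
    """Trimmed-down copy of the PR's result class, for illustration."""

    def __init__(self, data: Dict[str, Any]):
        self.raw_data = data
        self.title = data.get("title", "")
        self.text = first_present(data, "text", "content", "novelText")

    def __getattr__(self, name: str) -> Optional[Any]:
        # Only reached when normal attribute lookup fails.
        if name == "raw_data":
            raise AttributeError(name)
        return self.raw_data.get(name)


novel = SimpleNovelResult({"title": "t", "text": "", "content": "body text", "id": "42"})
print(novel.text)     # prints: body text (the empty 'text' falls through to 'content')
print(novel.id)       # prints: 42 (not set in __init__, so read from raw_data)
print(novel.missing)  # prints: None (no AttributeError for unknown names)
```

The trade-off of the `__getattr__` fallback is that typos in attribute names silently return `None` instead of raising.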

4. Bypass Cloudflare

# ❌ Before: minimal headers
headers = {
    'Accept': 'application/json, text/plain, */*',
    'X-Requested-With': 'XMLHttpRequest',
    'User-Agent': 'Mozilla/5.0...',
}

# ✅ After: full set of browser headers
headers = {
    'Accept': 'application/json, text/plain, */*',
    'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8',
    'Accept-Encoding': 'gzip, deflate, br',
    'X-Requested-With': 'XMLHttpRequest',
    'Referer': f'{base_url}/novel/show.php?id={novel_id}',
    'User-Agent': 'Mozilla/5.0...',
    'DNT': '1',
    'Connection': 'keep-alive',
    'Sec-Fetch-Dest': 'empty',
    'Sec-Fetch-Mode': 'cors',
    'Sec-Fetch-Site': 'same-origin',
    'Cache-Control': 'no-cache',
    'Pragma': 'no-cache',
    'Sec-Ch-Ua': '"Not_A Brand";v="8", "Chromium";v="120", "Google Chrome";v="120"',
    'Sec-Ch-Ua-Mobile': '?0',
    'Sec-Ch-Ua-Platform': '"Windows"',
}
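Since `Referer` depends on the novel ID, the header set has to be assembled per request. A minimal sketch, assuming a hypothetical `build_ajax_headers` helper and `https://www.pixiv.net` as the host (the excerpt never defines `base_url`). The static values are copied from the PR, with `User-Agent` left elided as it is there.

```python
from typing import Dict

# Assumption: the excerpt leaves base_url undefined; pixiv's web host is used here.
BASE_URL = "https://www.pixiv.net"

# Request-independent headers, copied from the PR (Accept-Encoding omitted, see below).
STATIC_HEADERS: Dict[str, str] = {
    "Accept": "application/json, text/plain, */*",
    "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8",
    "X-Requested-With": "XMLHttpRequest",
    "User-Agent": "Mozilla/5.0...",  # elided in the PR; left elided here
    "DNT": "1",
    "Connection": "keep-alive",
    "Sec-Fetch-Dest": "empty",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Site": "same-origin",
    "Cache-Control": "no-cache",
    "Pragma": "no-cache",
    "Sec-Ch-Ua": '"Not_A Brand";v="8", "Chromium";v="120", "Google Chrome";v="120"',
    "Sec-Ch-Ua-Mobile": "?0",
    "Sec-Ch-Ua-Platform": '"Windows"',
}


def build_ajax_headers(novel_id: int, base_url: str = BASE_URL) -> Dict[str, str]:
    """Hypothetical helper: copy the static headers and add a per-novel Referer."""
    headers = dict(STATIC_HEADERS)  # copy so the shared dict is never mutated
    headers["Referer"] = f"{base_url}/novel/show.php?id={novel_id}"
    return headers


def ajax_novel_url(novel_id: int, base_url: str = BASE_URL) -> str:
    """URL of the AJAX endpoint the PR calls."""
    return f"{base_url}/ajax/novel/{novel_id}"


print(ajax_novel_url(12345))
print(build_ajax_headers(12345)["Referer"])
```

Keeping the static part in one constant avoids maintaining two copies of the same header list. `Accept-Encoding` is left out on purpose: if the HTTP client is `requests`, it sets that header itself and only advertises `br` when a Brotli decoder is installed, whereas a hand-written `br` can yield a compressed body that the JSON parser cannot read.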

Summary

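How this change addresses the original problem:

  • Root cause: a change in the page structure broke the regular expression
  • Solution: switch to the stable AJAX API
  • Additional benefits: bypasses Cloudflare, improves stability, and improves compatibility

The new code no longer depends on the volatile page structure. It uses the stable API endpoint instead, which eliminates the 'NoneType' object has no attribute 'groups' error at its source.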

@DiLiuNEUexpresscompany
Author

Hi, this PR is from a fork. Could a maintainer please review and run the pending workflow(s)? Thanks!

@upbit

@DiLiuNEUexpresscompany
Author

All checks are green and there are no conflicts.
Kindly asking for a review and merge when you have a moment. Thanks! @upbit
