Thanks to visit codestin.com
Credit goes to github.com

Skip to content

if-always/tiktok_reverse

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

19 Commits
 
 
 
 
 
 
 
 

Repository files navigation

TikTok/Douyin API: Reverse Engineering Security Algorithms, Full Generation from Scripts

A technical overview of TikTok's internal API security, focusing on the various signature algorithms used to authenticate and protect client-server communication. This document is intended for educational and research purposes.

Disclaimer: Interacting with private APIs without authorization may violate the terms of service of the platform. Proceed with caution and at your own risk.


1. Introduction to TikTok's API Security

To combat automated bots and ensure platform integrity, TikTok and its Chinese counterpart, Douyin, employ a sophisticated suite of security algorithms. These algorithms generate dynamic signatures that are attached as headers to most API requests. A request lacking a valid signature will be rejected by TikTok's servers.

These signatures are generated client-side (on the mobile app or web browser) and are designed to be difficult to replicate, proving that the request originates from a legitimate client instance. The primary algorithms are known by their HTTP header names:

  • X-Argus
  • X-Gorgon
  • X-Bogus
  • X-Ladon
  • X-Typhon
  • X-Medusa (More prevalent in Douyin)

While the global version of TikTok and Douyin share similar algorithmic foundations, their API endpoints and risk control strictness differ. Global TikTok's risk management is generally considered more aggressive.


2. Deep Dive into Signature Generation Algorithms

Based on reverse-engineering of the client-side code, we can detail the precise steps involved in generating the X-Bogus and X-Argus signatures.

2.1. The X-Bogus Signature Algorithm

The X-Bogus algorithm is a multi-stage process designed to create a unique signature from request data, user agent, and a timestamp. It relies heavily on hashing, custom ciphers, and data manipulation to achieve obfuscation. The generation process can be broken down into the following stages:

Step 1: Initial Hashing of Core Components

The algorithm first computes three separate MD5 hashes to create a condensed and irreversible representation of the request's key elements.

  1. Data Hashing: The request body is double-MD5 hashed, likely an extra obfuscation step to prevent simple hash lookups.

  2. URL Parameters Hashing: The URL query parameters are also double-MD5 hashed.

  3. User-Agent Hashing: The user agent string undergoes a more complex, three-step process: a. It's first encrypted using a simple RC4 stream cipher with a hardcoded key. b. The resulting ciphertext is then encoded using a custom Base64 implementation. c. Finally, the Base64 string is MD5 hashed.

Step 2: Assembling the "Salt Array"

A core data array is created, serving as the input for the main obfuscation engine. It's a carefully constructed list of integers containing:

  • The current Unix timestamp.
  • A hardcoded "magic" number (536919696).
  • A series of constants.
  • The last two bytes from each of the three MD5 hashes generated in Step 1. This is a key detail—it uses only a fraction of the hashes.
  • The timestamp and the magic number, broken down into their individual bytes.
  • A final checksum byte, calculated by XORing all previous bytes together.

Step 3: Data Filtering and Scrambling

The data array is not used as-is. It's first manipulated to change its order and structure:

  1. Filtering: A custom filtering function selects 19 specific bytes from the array in a predefined, non-sequential order.
  2. Scrambling: These 19 bytes are then interleaved into a new character string, effectively shuffling the data in a deterministic way.

Step 4: Final Encryption and Encoding

The scrambled string from the previous step undergoes a final round of encryption and encoding to produce the signature:

  1. RC4 Encryption: The string is encrypted again using RC4, this time with a very simple key.
  2. Prefixing: The resulting ciphertext is prefixed with two specific control bytes.
  3. Custom Base64 Encoding: The final binary string is encoded using a custom Base64 alphabet. This is a critical obfuscation technique; a standard Base64 decoder will fail on this signature. The custom alphabet makes it impossible to read the underlying data without knowing the exact character mapping.

The output of this final step is the X-Bogus signature string.


2.2. The X-Argus Signature Algorithm

The X-Argus algorithm is significantly more complex than X-Bogus. It involves modern cryptographic primitives, a custom block cipher, and a binary data format (Protocol Buffers) to structure its data.

Step 1: Data Structuring with Protocol Buffers (Protobuf)

Instead of a simple list, X-Argus organizes its input data into a highly structured object containing over 20 fields, including:

  • Device identifiers (device_id, sec_device_id).
  • Application details (aid, version_name, sdk_version).
  • The current timestamp.
  • Cryptographic hashes of the request's query and body.

This object is then serialized into a compact binary format using a Protobuf encoder. Protobuf is a format designed for efficient data exchange between services.

Step 2: Hashing the Request Query and Body

Before being added to the Protobuf structure, the query and body are hashed using the SM3 cryptographic hash function. SM3 is a Chinese national standard, similar in purpose to SHA-256, and is significantly stronger than the MD5 algorithm used in X-Bogus.

Step 3: Core Encryption with the Simon Cipher

This is the cryptographic heart of X-Argus.

  1. Padding: The serialized Protobuf data is padded to be a multiple of the cipher's block size (16 bytes).
  2. Key Preparation: The encryption key is derived from a hardcoded constant found in the application's binary. This key is prepared for use with the Simon cipher.
  3. Simon Encryption: The data is encrypted block-by-block using the Simon block cipher. Simon is a lightweight cipher designed for high performance, making it suitable for mobile devices. This is a far more robust encryption method than the simple RC4 stream cipher.

Step 4: Post-Encryption Obfuscation

The encrypted data from the Simon cipher is further obscured:

  1. Header Prepending: A fixed 8-byte header is prepended to the ciphertext.
  2. Byte Reversal and XORing: A custom function reverses the byte order of the entire buffer and XORs each byte with a byte from the header, effectively scrambling the ciphertext.
  3. Final AES Encryption: The resulting buffer is padded again and then encrypted one last time using the standard AES cipher in CBC mode. The AES key and initialization vector (IV) are themselves derived by MD5-hashing parts of another hardcoded key.

Step 5: Final Formatting

  1. Prefixing: The AES-encrypted data is prefixed with two specific control bytes.
  2. Base64 Encoding: The final binary blob is encoded into a standard Base64 string.

This string is the final X-Argus signature.


3. Endpoint Limitations and Operational Challenges

Interacting with the TikTok API, even with valid signatures, is not straightforward. The platform employs a multi-layered defense system that poses significant challenges.

  • Rate Limiting and IP Reputation: Aggressive rate limits are in place for all endpoints. An IP address making too many requests in a short period will be temporarily or permanently blocked. The reputation of the IP address (e.g., residential vs. datacenter) also plays a crucial role in the level of scrutiny it receives.

  • Geographic Restrictions (Geo-fencing): Many API endpoints return different content or fail entirely based on the geographical location of the client's IP address. This is fundamental to how TikTok customizes the user experience and enforces regional content policies.

  • Device and Account Trust Score: TikTok maintains an internal "trust score" for both devices and user accounts.

    • Device Trust: A newly registered device_id is considered untrusted. It must be "warmed up" through a series of activation requests that mimic real user behavior over time. Without a trusted device, many core API calls will return empty data or errors.
    • Account Trust: Accounts that exhibit bot-like behavior (e.g., rapid following, liking, or commenting) are flagged. A flagged account may lose access to certain features or have its API requests systematically denied.
  • CAPTCHA Challenges: If the risk control engine detects suspicious activity, it will serve a CAPTCHA challenge. Automated clients must be able to detect these challenges and integrate with a solving service, adding another layer of complexity and cost.

  • Signature Algorithm Versioning: The signing algorithms are not static. TikTok frequently pushes updates to the client applications that modify the constants, logic, or entire structure of these algorithms. An implementation that works today may become obsolete overnight, requiring continuous reverse-engineering efforts.


4. Analysis of Historical API Vulnerabilities

Like any large-scale platform, TikTok's API has had its share of security vulnerabilities in the past. Analyzing these provides insight into potential weak points in complex systems.

  • Account Takeover via SMS Link Spoofing (2020): Researchers discovered a flaw in the /v1/mobile/verify/ endpoint. By manipulating the request, an attacker could send a fraudulent SMS to a user that appeared to be from TikTok. If the user clicked the link, it allowed the attacker to associate their own phone number with the victim's account, leading to a full account takeover. This highlighted weaknesses in the logic of account recovery and linking mechanisms.

  • Unauthenticated Information Disclosure (2020): Several API endpoints were found to be improperly secured, allowing unauthenticated access to user data such as secondary email addresses, birth dates, and other profile information. This was a classic case of missing authorization checks on sensitive data endpoints.

  • Deep Link Hijacking and Content Spoofing (2021): A vulnerability was identified in how the application handled deep links (tiktok://). An attacker could craft a malicious link that, when opened, could force the app to make arbitrary API calls on behalf of the user. This could be used to make a user's private videos public or to display spoofed content within the official app.

  • Content Processing Vulnerabilities (CVE-2022-28799): While not a direct API logic flaw, the way the API handled content uploads presented a significant risk. A vulnerability was found in the third-party media processing library used by TikTok on Android. By uploading a specially crafted video file, an attacker could trigger a memory corruption issue, potentially leading to remote code execution on the user's device. This demonstrates that the attack surface extends beyond the API logic itself to how the platform processes user-submitted data.

These historical examples underscore the immense challenge of securing a platform with such a vast and dynamic API. They show that vulnerabilities can arise not just from cryptographic weaknesses, but also from flawed business logic, improper authorization checks, and insecure handling of user-generated content.


Limitations

Don't think that with these signatures you can takeover TikTok 😂

You will need :

- **Unflagged and (Real) Accounts/Cookies, you can buy them from me :) for 0.10$/unit if you buy in mass.**

- **MsTokenEnc and MsTokenDec to generate msToken's, Instead of wasting proxy data on generating while getting page source, you can get it from me (pay).**

About

TikTok Reverse Engineering - Security Algorithms - Useful for Scrapping/Automation

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages

  • Python 100.0%