Fix bug in normalize_and_hash_email_address function #752

Merged: 2 commits, Apr 11, 2025

Conversation

arnau126
Contributor

First commit

If email_address has no "@" in it, the function raises IndexError (list index out of range) at:

is_gmail = re.match(r"^(gmail|googlemail)\.com$", email_parts[1])

because email_parts is a list with only one item.

This PR fixes the bug by moving the above line inside the if-block if len(email_parts) > 1.
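The failure described above can be reproduced with a minimal sketch. The helper names is_gmail_unsafe and is_gmail_safe are hypothetical and only illustrate the order of the length check and the index access; they are not functions from the repository.

```python
import re

# Same pattern as in the PR description.
GMAIL_DOMAIN = re.compile(r"^(gmail|googlemail)\.com$")


def is_gmail_unsafe(email_address):
    """Hypothetical helper mirroring the original ordering: index first."""
    email_parts = email_address.lower().split("@")
    # With no "@", email_parts has one item, so email_parts[1] raises IndexError.
    return bool(GMAIL_DOMAIN.match(email_parts[1]))


def is_gmail_safe(email_address):
    """Hypothetical helper mirroring the fix: check the length before indexing."""
    email_parts = email_address.lower().split("@")
    if len(email_parts) > 1:
        return bool(GMAIL_DOMAIN.match(email_parts[1]))
    return False


if __name__ == "__main__":
    try:
        is_gmail_unsafe("no-at-sign")
    except IndexError as exc:
        print("unsafe version raised:", exc)
    print("safe version returned:", is_gmail_safe("no-at-sign"))
```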

Second commit

I've added a strip call on email_address to honor the Enhanced Conversions doc, which says:

[...] In order to standardize the hash results, prior to hashing one of these values you must:

  • Remove leading/trailing whitespaces.
  • [...]

@arnau126 arnau126 requested a review from a team as a code owner February 25, 2023 16:28
@google-cla

google-cla bot commented Feb 25, 2023

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@arnau126
Contributor Author

About the second commit, I've just realized that the function calls normalize_and_hash, which already does the strip. However, I think we should keep this extra strip at the very beginning of the function so that a string like "[email protected] " (with trailing whitespace) works.
If we don't perform this initial strip, the domain part still ends with the whitespace, so the regex (which is anchored with $) won't match and the periods won't be removed.
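A small sketch of why the early strip matters, assuming the same regex as in the PR. The function domain_matches and the sample address are hypothetical and used only for illustration.

```python
import re

PATTERN = r"^(gmail|googlemail)\.com$"


def domain_matches(email_address, strip_first):
    """Hypothetical helper: report whether the domain part matches the Gmail regex."""
    value = email_address.strip() if strip_first else email_address
    parts = value.lower().split("@")
    # Without the strip, the domain is "gmail.com " and the "$" anchor fails.
    return len(parts) > 1 and re.match(PATTERN, parts[1]) is not None


if __name__ == "__main__":
    sample = "jane.doe@gmail.com "  # note the trailing space
    print(domain_matches(sample, strip_first=False))
    print(domain_matches(sample, strip_first=True))
```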

@BenRKarl
Contributor

Question - are these changes just to address email addresses that don't have @ symbols in them?

@arnau126
Contributor Author

arnau126 commented Mar 17, 2025

@BenRKarl
Yes.

The original function already tries to do so, but incorrectly:

    email_parts = normalized_email.split("@")
    is_gmail = re.match(r"^(gmail|googlemail)\.com$", email_parts[1])

    # Check that there are at least two segments and the second segment
    # matches the above regex expression validating the email domain name.
    if len(email_parts) > 1 and is_gmail:

I've just moved the is_gmail regex inside the if, so it's only evaluated when there are at least two segments.

    email_parts = normalized_email.split("@")

    # Check that there are at least two segments
    if len(email_parts) > 1:
        # Checks whether the domain of the email address is either "gmail.com"
        # or "googlemail.com". If this regex does not match then this statement
        # will evaluate to None.
        if re.match(r"^(gmail|googlemail)\.com$", email_parts[1]):
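Putting both commits together, the whole function might look roughly like the sketch below. This is an approximation rather than the merged code: the body of the inner if (removing periods from the local part) is inferred from the discussion above, and the SHA-256 helper normalize_and_hash is an assumed stand-in for the one in the example files.

```python
import hashlib
import re


def normalize_and_hash(value):
    """Assumed helper: strip, lowercase, then SHA-256 hash the value."""
    return hashlib.sha256(value.strip().lower().encode()).hexdigest()


def normalize_and_hash_email_address(email_address):
    """Sketch of the fixed function; safe for input without an "@"."""
    # Strip first so trailing whitespace cannot defeat the domain regex.
    normalized_email = email_address.strip().lower()
    email_parts = normalized_email.split("@")

    # Only look at the domain when there are at least two segments.
    if len(email_parts) > 1:
        if re.match(r"^(gmail|googlemail)\.com$", email_parts[1]):
            # Periods in the local part are dropped for Gmail addresses.
            email_parts[0] = email_parts[0].replace(".", "")
            normalized_email = "@".join(email_parts)

    return normalize_and_hash(normalized_email)


if __name__ == "__main__":
    # Input without "@" no longer raises; it is hashed as-is after normalization.
    print(normalize_and_hash_email_address("no-at-sign"))
    print(normalize_and_hash_email_address("Jane.Doe@gmail.com "))
```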

@BenRKarl
Contributor

@arnau126 ok great, sorry for the delay. Looks like these changes are being added to now-deleted files. Could you move them over to the new files where this validation occurs?

remarketing/upload_enhanced_conversions_for_leads.py
remarketing/upload_enhanced_conversions_for_web.py

@arnau126
Contributor Author

@BenRKarl
Done (and rebased).

@BenRKarl BenRKarl merged commit bbe1466 into googleads:main Apr 11, 2025
1 check passed