

@guybedford
Member

For #311, here is a tokenizing approach for CommonJS require extraction.

I'm hosting the tokenizer project separately at https://github.com/guybedford/extract-requires, and will be expanding the tests; this is just the very first commit for now.

Looking promising so far. Seeing how performance benchmarks compare will be the big thing.

//cc @theefer

@guybedford
Member Author

@theefer well it turns out this passes all our tests, including the ones in #312!

@guybedford guybedford force-pushed the cjs-tokens branch 2 times, most recently from ac41caa to c6f98b4 Compare January 21, 2015 18:18
@guybedford
Member Author

I did some perf measurements on this, and unfortunately the tokenizer runs at less than a quarter of the speed of the regex approach - http://jsperf.com/tokenizer-v-regex-commonjs-require-extraction/2.

Will experiment with optimizations.

@guybedford
Member Author

@crisptrutski your magical performance eye would be much valued here.

@guybedford
Member Author

I managed to optimize the function by using regular expressions for the seeking states instead of stepping everything - http://jsperf.com/tokenizer-v-regex-commonjs-require-extraction/4 (guybedford/extract-requires@817736a).

It's still 60% slower but better than 98% slower!
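To give a rough idea of what "regular expressions for the seeking states" can look like, here is a minimal sketch. It is not the code in extract-requires; the names `extractRequires` and `skipString` are made up for illustration, and it assumes only comments and quoted strings need skipping. It does nothing about regex literals, so a quote inside one will still confuse it.

```javascript
// Hypothetical sketch: one global regex jumps to the next interesting token
// (comment start, quote, or require call) instead of stepping per character.
function extractRequires(source) {
  const deps = [];
  const seek = /\/\/|\/\*|['"]|\brequire\s*\(\s*(?:'([^'\\\n]+)'|"([^"\\\n]+)")\s*\)/g;
  let match;
  while ((match = seek.exec(source)) !== null) {
    const token = match[0];
    if (token === '//') {
      // Line comment: resume scanning after the end of the line.
      const end = source.indexOf('\n', seek.lastIndex);
      if (end === -1) break;
      seek.lastIndex = end + 1;
    } else if (token === '/*') {
      // Block comment: resume scanning after the closing marker.
      const end = source.indexOf('*/', seek.lastIndex);
      if (end === -1) break;
      seek.lastIndex = end + 2;
    } else if (token === "'" || token === '"') {
      // String literal: resume scanning after the matching unescaped quote.
      seek.lastIndex = skipString(source, seek.lastIndex, token);
    } else {
      // A require call with a literal module id in group 1 or group 2.
      deps.push(match[1] !== undefined ? match[1] : match[2]);
    }
  }
  return deps;
}

// Returns the index just past the closing quote, honouring backslash escapes.
function skipString(source, index, quote) {
  while (index < source.length) {
    const ch = source[index];
    if (ch === '\\') index += 2;
    else if (ch === quote) return index + 1;
    else index += 1;
  }
  return source.length;
}
```

Because the regex only stops at tokens that change state, long runs of ordinary code are crossed in a single `exec` call, which is where the speed-up over character stepping would come from.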

I also conveniently remembered just now that tokenizing can't be done comprehensively, because a division operator is indistinguishable from the start of a regular expression literal without deeper lexing knowledge.
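A small sketch of the ambiguity, under stated assumptions: `extractWithGuess` is a hypothetical helper (not part of extract-requires) that commits to a single guess about what a bare `/` means, and it only understands single-quoted module ids. Whichever guess it makes, one of the two inputs comes out wrong.

```javascript
// Hypothetical helper: extracts require ids under a fixed guess about "/".
function extractWithGuess(source, slashIsRegex) {
  const deps = [];
  const seek = /\/|\brequire\(\s*'([^'\n]+)'\s*\)/g;
  let match;
  while ((match = seek.exec(source)) !== null) {
    if (match[0] !== '/') {
      deps.push(match[1]);
    } else if (slashIsRegex) {
      // Guess "regex literal": skip everything up to the next slash.
      const end = source.indexOf('/', seek.lastIndex);
      if (end === -1) break;
      seek.lastIndex = end + 1;
    }
    // Guess "division": ignore the slash and keep scanning.
  }
  return deps;
}

// Minified code where both slashes are division operators.
const minified = "var r=a/b;var t=require('dep');var s=c/d;";
// A regex literal whose body happens to look like a require call.
const regexLiteral = "var re=/require('fake')/;";

extractWithGuess(minified, true);      // [] : the real require is swallowed
extractWithGuess(minified, false);     // ['dep']
extractWithGuess(regexLiteral, true);  // []
extractWithGuess(regexLiteral, false); // ['fake'] : a false positive
```

Telling the two cases apart needs to know whether the previous token ends an expression, which is the "deeper lexing knowledge" mentioned above.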

So I believe that leaves us with:

  1. Just pick up requires inside strings as real requires, and draw the line there.
  2. Use tokenizing as above with a 60% speed reduction (which wouldn't be noticed), accepting that some minified sources containing division will have requires missed in places.
  3. Use Traceur / 6to5 for CommonJS parsing as well, accepting a much larger speed reduction for full accuracy.

I'm tempted to go with (1) because it doesn't seem a strong enough reason on its own to use a full parser. But I'm open to (3).
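For reference, option (1) behaves roughly like the sketch below. The regex is a simplified stand-in, not necessarily the one the loader actually uses, and `naiveExtract` is a hypothetical name.

```javascript
// Simplified, hypothetical require regex: no awareness of strings or comments.
const requireRegEx = /\brequire\s*\(\s*("[^"\n]+"|'[^'\n]+')\s*\)/g;

function naiveExtract(source) {
  const deps = [];
  let match;
  requireRegEx.lastIndex = 0; // reset shared state before each scan
  while ((match = requireRegEx.exec(source)) !== null) {
    deps.push(match[1].slice(1, -1)); // strip the surrounding quotes
  }
  return deps;
}

const source = [
  "var fs = require('fs');",
  "var help = 'usage: require(\"not-a-module\")';"
].join('\n');

naiveExtract(source); // ['fs', 'not-a-module']
```

The second entry comes from text inside a string literal, which is the false positive that option (1) would accept.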

@theefer
Contributor

theefer commented Jan 23, 2015

(1) will mean some libraries won't be able to be installed at all (as per #311), right? I encountered this bug while trying to install the npm:twitter library, which is the most popular Node Twitter client AFAICS.

Could we still solve this by running a more expensive parsing "offline" (when installing locally or via CDN), and perhaps accept a more naive (and in this case broken) parsing when done dynamically from the original source?

@guybedford
Member Author

Ok I've managed to work out a way to combine tokenizing and regexes to form a MORE accurate method that won't fall down with division confusion.

@theefer that is, I now have a replacement PR that can work for this problem.

The performance results are in http://jsperf.com/tokenizer-v-regex-commonjs-require-extraction/5.

The question here is now:

  1. Accept a 4x performance loss in exchange for adding this feature.
  2. Continue to work around the issue.

@crisptrutski if you have any further perf suggestions for this updated version at https://github.com/guybedford/extract-requires/blob/master/extract-requires.js let me know.

I'm going to look into the specific twitter API case now and see how plausible a work-around is.

@guybedford
Member Author

For what it's worth, I've updated the branch with this valid option as well.

@guybedford guybedford reopened this Jan 23, 2015
@guybedford
Member Author

Closing for now in favour of a workaround, as @theefer says. We can come back to this one day if necessary, possibly comparing the performance of this approach to Traceur parsing directly.

@guybedford guybedford closed this Jan 27, 2015
@guybedford guybedford deleted the cjs-tokens branch April 2, 2015 15:45