CommonJS tokenizer attempt #326

guybedford · 2015-01-21T17:55:02Z

For #311, here is a tokenizing approach for CommonJS require extraction.
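For context, #311 is about `require(...)` text inside a string literal being picked up as a dependency. A minimal sketch of how a purely regex-based extractor can end up doing that; the regex and the name `naiveExtractRequires` are illustrative assumptions, not the code used in this project:

```javascript
// Hypothetical naive extractor: one global regex over the raw source text.
// It has no notion of strings or comments, so it matches anywhere.
function naiveExtractRequires(source) {
  const requireRegex = /\brequire\s*\(\s*(["'])([^"'()]+)\1\s*\)/g;
  const deps = [];
  let match;
  while ((match = requireRegex.exec(source)) !== null) {
    deps.push(match[2]); // capture group 2 is the module name
  }
  return deps;
}

const naiveSample = [
  'var fs = require("fs");',
  'var msg = "call require(\'not-a-dep\') to load it";',
].join('\n');

// The second entry comes from inside a string literal: a false positive.
console.log(naiveExtractRequires(naiveSample)); // → ['fs', 'not-a-dep']
```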

I'm hosting the tokenizer project separately at https://github.com/guybedford/extract-requires and will be expanding the tests. This is just the very first commit for now.

Looking promising so far. Seeing how performance benchmarks compare will be the big thing.

//cc @theefer

Test for shared dependency bundle bug

guybedford · 2015-01-21T18:00:22Z

@theefer well it turns out this passes all our tests, including the ones in #312!

guybedford · 2015-01-22T16:22:35Z

I did some perf measurements on this, and unfortunately the tokenizer runs at less than a quarter of the speed of the regex: http://jsperf.com/tokenizer-v-regex-commonjs-require-extraction/2.

Will experiment with optimizations.

guybedford · 2015-01-22T16:24:24Z

@crisptrutski your magical performance eye would be much valued here...

guybedford · 2015-01-23T01:12:46Z

I managed to optimize the function by using regular expressions for the seeking states instead of stepping everything - http://jsperf.com/tokenizer-v-regex-commonjs-require-extraction/4 (guybedford/extract-requires@817736a).

It's still 60% slower but better than 98% slower!
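A rough sketch of what "regex seeking" can look like, assuming a small state machine where each state uses a regex (or `indexOf`) to jump to the next interesting position instead of stepping one character at a time. The function name, the regexes, and the structure are assumptions for illustration and are not taken from extract-requires; regex literals are deliberately not handled here:

```javascript
// Hypothetical sketch, not the extract-requires implementation.
function extractRequires(source) {
  const deps = [];
  // "Code" state: jump to the next comment opener, quote, or require call.
  const codeSeek = /\/\/|\/\*|["']|\brequire\s*\(\s*(["'])([^"'\\\n]+)\1\s*\)/g;
  let pos = 0;
  while (pos < source.length) {
    codeSeek.lastIndex = pos;
    const m = codeSeek.exec(source);
    if (m === null) break; // nothing interesting left
    const token = m[0];
    if (token === '//') {
      // Line comment state: seek to the end of the line.
      const end = source.indexOf('\n', codeSeek.lastIndex);
      pos = end === -1 ? source.length : end + 1;
    } else if (token === '/*') {
      // Block comment state: seek to the closing delimiter.
      const end = source.indexOf('*/', codeSeek.lastIndex);
      pos = end === -1 ? source.length : end + 2;
    } else if (token === '"' || token === "'") {
      // String state: a sticky regex consumes up to the closing quote,
      // skipping escaped characters.
      const stringEnd = token === '"'
        ? /(?:[^"\\]|\\[\s\S])*"/y
        : /(?:[^'\\]|\\[\s\S])*'/y;
      stringEnd.lastIndex = codeSeek.lastIndex;
      pos = stringEnd.test(source) ? stringEnd.lastIndex : source.length;
    } else {
      // A require call found while in the code state.
      deps.push(m[2]);
      pos = codeSeek.lastIndex;
    }
  }
  return deps;
}

const seekSample = [
  'var fs = require("fs");',
  'var msg = "call require(\'not-a-dep\') to load it";',
  '// require("commented-out")',
  "var path = require('path');",
].join('\n');

console.log(extractRequires(seekSample)); // → ['fs', 'path']
```

The requires inside the string and the comment are skipped because those spans are consumed whole before the code-state regex runs again.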

I also conveniently remembered just now that tokenizing can't be done comprehensively, because division is indistinguishable from regular expression literals without deeper lexing knowledge.
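To make the ambiguity concrete: at the character level the same `/` can be a division operator or the start of a regex literal, and only the preceding token tells them apart. A tiny illustration (the variable names are made up for the example):

```javascript
const total = 6, count = 3, g = 1;

// Here both slashes are division operators: (total / count) / g.
const ratio = total / count / g;

// Here the slashes delimit a regex literal, and the `"` between them is a
// pattern character, not the start of a string.
const hasQuote = /"/g.test('say "hi"');

console.log(ratio, hasQuote); // → 2 true
```

A tokenizer that guesses "division" in the second case would treat the `"` as opening a string, and could then skip over a real `require` call that follows it.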

So I believe that leaves us with:

1. Just pick up requires inside strings as requires, and draw a line there.
2. Use tokenizing as above with a 60% speed reduction (which wouldn't be noticed), accepting that some sources combining minification and division will have requires missed in some places.
3. Use Traceur / 6to5 for CommonJS parsing as well, accepting a much larger speed reduction for full accuracy.

I'm tempted to go with (1) because it doesn't seem a strong enough reason on its own to use a full parser. But I'm open to (3).

theefer · 2015-01-23T01:40:56Z

(1) will mean some libraries won't be able to be installed at all (as per #311), right? I encountered this bug while trying to install the npm:twitter library, which is the most popular Node Twitter client AFAICS.

Could we still solve this by running a more expensive parsing "offline" (when installing locally or via CDN), and perhaps accept a more naive (and in this case broken) parsing when done dynamically from the original source?

guybedford · 2015-01-23T09:23:11Z

Ok I've managed to work out a way to combine tokenizing and regexes to form a MORE accurate method that won't fall down with division confusion.

@theefer that is, I now have a replacement PR that can work for this problem.

The performance results are in http://jsperf.com/tokenizer-v-regex-commonjs-require-extraction/5.

The question here is now:

1. Accept a 4x performance loss for adding this feature in.
2. Continue to work around.

@crisptrutski if you have any further perf suggestions for this updated version at https://github.com/guybedford/extract-requires/blob/master/extract-requires.js let me know.

I'm going to look into the specific twitter API case now and see how plausible a work-around is.

guybedford · 2015-01-23T09:41:29Z

For what it's worth, I've updated the branch with this valid option as well.

guybedford · 2015-01-27T11:01:34Z

Closing for now in favour of a workaround, as @theefer says. We can come back to this one day if necessary, possibly comparing the performance of this approach to Traceur parsing directly.

guybedford and others added 5 commits January 20, 2015 12:37

dist build

50b67c1

Test for shared dependency bundle bug

b0c3a19

Merge pull request #324 from Bubblyworld/master

0d17966

Test for shared dependency bundle bug

fix linking bug for shared bundles

b8e1ded

tokenizer, first attempt

8cfea68

guybedford force-pushed the cjs-tokens branch from affcddd to 8cfea68 Compare January 21, 2015 17:57

it works..

bf58160

guybedford force-pushed the cjs-tokens branch 2 times, most recently from ac41caa to c6f98b4 Compare January 21, 2015 18:18

more tests

8c75664

guybedford force-pushed the cjs-tokens branch from c6f98b4 to 8c75664 Compare January 21, 2015 18:22

guybedford force-pushed the master branch from b8e1ded to b2e4cf2 Compare January 22, 2015 08:24

use regex seeking states for perf

37e8e57

guybedford closed this Jan 23, 2015

guybedford mentioned this pull request Jan 23, 2015

'require(...)' calls erroneously parsed even within strings in CommonJS #311

Closed

combined tokenizing, regex approach

30c54fa

guybedford reopened this Jan 23, 2015

guybedford closed this Jan 27, 2015

guybedford deleted the cjs-tokens branch April 2, 2015 15:45


4 participants
