manrajgrover · 2018-07-09T19:42:55Z

This PR adds a multivariate linear regression example.

manrajgrover · 2018-07-10T13:01:28Z

This looks ready for first review.

bileschi

Hi Manraj, thanks for putting this together! It looks like this is designed for Node. Is it possible to adjust this to focus on the browser instead? A pattern for loading data without a file system can be cloned from the mnist example. After this, a simple index.html and ui.js should do the trick. Thanks!

Reviewable status: 0 of 1 LGTMs obtained

multivariate-linear-regression/data.js, line 3 at r1 (raw file):

Copyright 2018 Google LLC. All Rights Reserved.

This should read

Copyright 2018 the tfjs-examples Authors.

multivariate-linear-regression/data.js, line 130 at r1 (raw file):

this.dataset[2] = normalizeDataset(this.dataset[2]);

Technically, the normalization parameters are learned parameters. They should be estimated from the training set and applied to the test set.

multivariate-linear-regression/index.js, line 21 at r1 (raw file):

const timer = require('node-simple-timer');

would Date().getTime(); work for these?

nsthorat

I think it's fine to keep it node since we're lacking node tutorials -- however can you name the directory as such: "multivariate-linear-regression-node"?

Nice work Manraj, thank you!

multivariate-linear-regression/data.js, line 3 at r1 (raw file):

Previously, bileschi (Stanley Bileschi) wrote…

Copyright 2018 Google LLC. All Rights Reserved.

This should read

Copyright 2018 the tfjs-examples Authors.

I think it's fine to keep as is -- none of the other examples do this.

multivariate-linear-regression/data.js, line 33 at r1 (raw file):

// Downloads a test file only once and returns the csv
async function loadCsv(filename) {

Put this and readCsv in utils (since their API is pretty easy to understand)

multivariate-linear-regression/data.js, line 70 at r1 (raw file):

// Shuffles data and label using Fisher-Yates algorithm.
const shuffle = (data, label) => {

put this in utils
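
For reference, a Fisher-Yates shuffle that keeps data rows and labels paired might look like the sketch below. This is not the code from the PR; the `random` parameter and the sample arrays are assumptions added for illustration.

```javascript
// Fisher-Yates shuffle applied in place to two parallel arrays, so that
// data[i] and label[i] still belong together after shuffling.
// `random` is an injectable source of numbers in [0, 1) (an assumption of
// this sketch, handy for deterministic tests).
function shuffle(data, label, random = Math.random) {
  if (data.length !== label.length) {
    throw new Error('data and label must have the same length');
  }
  for (let i = data.length - 1; i > 0; i--) {
    // Pick an index from the not-yet-fixed prefix [0, i].
    const j = Math.floor(random() * (i + 1));
    [data[i], data[j]] = [data[j], data[i]];
    [label[i], label[j]] = [label[j], label[i]];
  }
}

// Made-up rows where each label is ten times its feature value.
const data = [[1], [2], [3], [4]];
const label = [10, 20, 30, 40];
shuffle(data, label);
```

Swapping both arrays with the same index pair is what keeps the pairing intact; shuffling them separately would silently scramble the targets.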

multivariate-linear-regression/data.js, line 130 at r1 (raw file):

Previously, bileschi (Stanley Bileschi) wrote…

this.dataset[2] = normalizeDataset(this.dataset[2]);
Technically, the normalization parameters are learned parameters. They should be estimated from the training set and applied to the test set.

+1 -- they don't necessarily have to be learned but the normalization parameters should span both training and test set (computing mean / variance across the entire dataset).

multivariate-linear-regression/data.js, line 174 at r1 (raw file):

  }

  _generateBatch(isTrainingData, batchSize) {

no need for underscores

multivariate-linear-regression/data.js, line 223 at r1 (raw file):

}

module.exports = new BostonHousingDataset();

use export const BostonHousingDataset

multivariate-linear-regression/index.js, line 28 at r1 (raw file):

const TEST_SIZE = 173;
const LEARNING_RATE = 0.01;

any reason not to use the layers API? will make the model definition much simpler

multivariate-linear-regression/utils.js, line 1 at r1 (raw file):

// Calculate the arithmetic mean of a vector.

license at the top of this file

multivariate-linear-regression/utils.js, line 37 at r1 (raw file):

};

module.exports = {

you can't just export const stddev? why module.exports?

manrajgrover · 2018-07-10T13:54:22Z

Hi Stanley,

it looks like this is designed for node. Is it possible to adjust this to instead focus on the browser?

Would this example be completely browser focused?

A pattern for loading data without a file system can be cloned from the mnist example

I'll have a look, but the current dataset contains CSV files, not images.

Technically, the normalization parameters are learned parameters. They should be estimated from the training set and applied to the test set.

Agreed, will fix this

would Date().getTime(); work for these?

For web, we have performance.now() which would be a better option

nsthorat

Ignore me regarding Node vs Browser, didn't see the other conversation. Browser SGTM.

manrajgrover · 2018-07-10T21:19:06Z

Made the changes

Apologies, I'm not comfortable with the permissions Reviewable asks for, so I don't use it.

would Date().getTime(); work for these?

Done

Made use of performance.now()

any reason not to use the layers API? will make the model definition much simpler

There is already an example for regression making use of Layers API.

I think it's fine to keep as is -- none of the other examples do this.

Okay

+1 -- they don't necessarily have to be learned but the normalization parameters should span both training and test set (computing mean / variance across the entire dataset).

I feel we should compute mean and variance on the training set and apply it to testing to avoid any leakage of information. I've made the changes accordingly. Please correct me if I'm wrong.
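
To make the leakage point concrete, here is a minimal plain-array sketch of the approach: estimate the statistics on the training rows only, then reuse them unchanged on the test rows. The function names and sample values are hypothetical and not taken from data.js.

```javascript
// Column-wise mean and (population) standard deviation of a 2D array,
// where each row is one sample and each column one feature.
function computeStats(rows) {
  const numFeatures = rows[0].length;
  const mean = new Array(numFeatures).fill(0);
  const variance = new Array(numFeatures).fill(0);
  for (const row of rows) {
    row.forEach((v, j) => { mean[j] += v / rows.length; });
  }
  for (const row of rows) {
    row.forEach((v, j) => { variance[j] += (v - mean[j]) ** 2 / rows.length; });
  }
  return {mean, std: variance.map((v) => Math.sqrt(v))};
}

// Applies previously estimated statistics; it never recomputes them.
// A zero standard deviation falls back to 1 to avoid division by zero.
function normalize(rows, {mean, std}) {
  return rows.map((row) => row.map((v, j) => (v - mean[j]) / (std[j] || 1)));
}

// Made-up numbers purely for illustration.
const train = [[1, 10], [3, 30], [5, 50]];
const test = [[7, 70]];

const stats = computeStats(train);        // estimated from training rows only
const trainNorm = normalize(train, stats);
const testNorm = normalize(test, stats);  // same parameters reused on test rows
```

Because `stats` never sees the test rows, nothing about the held-out data influences the preprocessing.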

use export const BostonHousingDataset

Done

no need for underscores

Done

put this in utils

Done

Put this and readCsv in utils (since their API is pretty easy to understand)

Done

license at the top of this file

Done

you can't just export const stddev? why module.exports?

I wasn't sure which Node.js version the examples need to be compatible with, so I went with module.exports.

caisq

Thanks, Manraj!! In this first round of review, I made some high-level comments. I will dive deeper once you have addressed these comments.

multivariate-linear-regression-core/data.js, line 66 at r3 (raw file):

utils.shuffle

This custom shuffling logic won't be necessary if you use the Layers API's fit() method, which performs shuffling by default. (See my comment below.)

multivariate-linear-regression-core/utils.js, line 34 at r3 (raw file):

/**

I suggest we write full JSDocs with arguments and return values included, see example at:
https://github.com/tensorflow/tfjs-examples/blob/master/lstm-text-generation/index.js#L48

multivariate-linear-regression-core/utils.js, line 35 at r3 (raw file):

  

nit: The comment should be separated from the asterisk by one space.

To avoid the overhead of manual formatting, you can run
clang-format -i --style=google util.js
and also on other js files.

multivariate-linear-regression-core/utils.js, line 35 at r3 (raw file):

csv

Nit: end comment lines with period per Google style.

multivariate-linear-regression-core/utils.js, line 55 at r3 (raw file):

export const shuffle

Is there any reason why this is an arrow function, while others like loadCsv are normal functions? If not, can we make them consistent?

multivariate-linear-regression-core/README.md, line 3 at r3 (raw file):

Multivariate Linear Regression

For beginners, please briefly explain what Multivariate Linear Regression is, something like, "linear regression with more than one numerical input feature".
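
As a rough illustration of that definition, the prediction of such a model is a weighted sum of several input features plus a bias. The weights and feature values below are invented for the example and are not fitted to the Boston Housing data.

```javascript
// Prediction of a linear model with more than one numerical input feature:
//   y = w[0] * x[0] + w[1] * x[1] + ... + w[n-1] * x[n-1] + b
function predict(features, weights, bias) {
  if (features.length !== weights.length) {
    throw new Error('features and weights must have the same length');
  }
  return features.reduce((sum, x, i) => sum + weights[i] * x, bias);
}

// Hypothetical example with three features and hand-picked parameters.
const features = [6, 2, 15];
const weights = [4, -1, -0.5];
const bias = 10;
const prediction = predict(features, weights, bias);  // 10 + 24 - 2 - 7.5 = 24.5
```

Training the example amounts to finding the weights and bias that make such predictions close to the observed targets.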

multivariate-linear-regression-core/README.md, line 3 at r3 (raw file):

 Boston Housing Dataset.

For beginners, please briefly explain the background of this dataset, possibly by referring to https://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html

multivariate-linear-regression/index.js, line 28 at r1 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

any reason not to use the layers API? will make the model definition much simpler

+1.

We plan to start the teaching material from the simpler layers API. Can you please rewrite this in that API? It'll take only a single tf.layers.dense layer and it'll also simplify the training code.

manrajgrover · 2018-07-12T07:17:27Z

@caisq Made the changes. Any input on the user interface? I was thinking of adding a plot for train and test loss.

caisq

multivariate-linear-regression-core/index.js, line 34 at r4 (raw file):

kernelInitializer: 'randomNormal',

Just curious: Why use this instead of the default glorotNormal? Same for the line below.

multivariate-linear-regression-core/index.js, line 36 at r4 (raw file):

useBias: true

This is true by default. So no need for this line.

multivariate-linear-regression-core/index.js, line 46 at r4 (raw file):

    const history = await model.fit(
        batch.data, batch.target, {batchSize: BATCH_SIZE, shuffle: true});

It is my vote that we feed the whole dataset as a single xs tensor and a single ys tensor to this fit() call. The fit() method will take care of the batching and shuffling by itself, under the hood. This will greatly simplify data.js.

Remember this is the 2nd example we will show in the teaching material. We still want to keep it simple.

Also, please note that shuffle is true by default. So no need to specify it explicitly here.

multivariate-linear-regression-core/index.js, line 50 at r4 (raw file):

    if (step && step % 2 === 0) {
      const loss = history.history.loss[0].toFixed(6);
      console.log(`  - step: ${step}: loss: ${loss}`);
    }

This should be replaced with an onEpochEnd callback option to fit(). See example at: https://github.com/tensorflow/tfjs-examples/blob/master/iris/index.js#L64

multivariate-linear-regression-core/index.js, line 53 at r4 (raw file):

await tf.nextFrame();

This should go into the onEpochEnd callback. Please also add a comment here about what this is for.

multivariate-linear-regression-core/index.js, line 66 at r4 (raw file):

 model.predict(evalData.data);

This entire function can and should be replaced with a single call to model.evaluate(). See example at:

https://js.tensorflow.org/api/0.12.0/#tf.Model.evaluate

caisq

Thanks a lot for making the changes so far, @manrajgrover. It's getting very close! I just have a small number of remaining comments.

multivariate-linear-regression-core/index.js, line 34 at r5 (raw file):

.fit(trainData.data, trainData.target, {

We need a validationSplit for this fit(). Validation loss is important because it tells us when to stop the training, so that the value of NUM_EPOCHS can be less arbitrary.
Maybe set validationSplit to a value around 0.15.

Also, if the original training data CSV is not in a randomized order, using validationSplit will require shuffling the data, because the validation set is always the fraction of the data at the end of xs and ys. I may have previously advised you to remove random shuffling in data.js - if that's the case, sorry about the back and forth. But here we just need to shuffle once, i.e., not for every epoch.
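
To illustrate why ordered data is a problem here, the sketch below mimics a tail-based validation split on plain arrays. It is an illustration of the behavior described above, not the library's implementation; `splitTail` and the synthetic data are assumptions.

```javascript
// Holds out the LAST `fraction` of the rows for validation, which is the
// behavior described above for validationSplit.
function splitTail(xs, ys, fraction) {
  const splitAt = Math.floor(xs.length * (1 - fraction));
  return {
    train: {xs: xs.slice(0, splitAt), ys: ys.slice(0, splitAt)},
    val: {xs: xs.slice(splitAt), ys: ys.slice(splitAt)},
  };
}

// Synthetic rows sorted by target value (y = 2x), standing in for a CSV
// that is not in randomized order.
const xs = Array.from({length: 20}, (_, i) => [i]);
const ys = xs.map(([x]) => 2 * x);

const {train, val} = splitTail(xs, ys, 0.15);

// Without a one-time shuffle, every validation target is larger than every
// training target, so the validation loss says little about generalization.
const largestTrainTarget = Math.max(...train.ys);
const smallestValTarget = Math.min(...val.ys);
```

Shuffling xs and ys together once before the split removes this ordering effect.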

multivariate-linear-regression-core/index.js, line 36 at r5 (raw file):

Can we set this to NUM_EPOCHS, so that we can get rid of the loop in the run function below?

multivariate-linear-regression-core/index.js, line 65 at r5 (raw file):

time

IMO, benchmarking the training speed is not necessary for this example. We aim for simplicity and essence in this example. Can we remove the benchmarking code?

caisq

multivariate-linear-regression-core/index.js, line 34 at r6 (raw file):

num_features

nit: Use camelCase for consistency. Same elsewhere in this PR.

multivariate-linear-regression-core/index.js, line 51 at r6 (raw file):

Test

--> Validation. We will use "validation" and "test" to mean different things. "Validation" is the held-out data that we run evaluation on during training. It is used for things like deciding when to stop training. "Test" is the held-out data that the model never sees during training. It is used only after the training completes.

caisq

multivariate-linear-regression-core/utils.js, line 22 at r6 (raw file):

'https://gist.githubusercontent.com/ManrajGrover/a4b2b6bf0abda231b4b49af8b9950688/raw/661367f1ab938642ff0d216276b77ace5d288b04/';

Please use this version of Boston Housing dataset instead: https://www.datasciencecentral.com/profiles/blogs/boston-housing-dataset-without-the-racial-profiling-field

Thanks.

manrajgrover · 2018-07-17T19:14:13Z

--> Validation. We will use "validation" and "test" to mean different things. "Validation" is the held-out data that we run evaluation on during training. It is used for things like deciding when to stop training.

@caisq In this case, although the model is used to predict on the test data after every epoch, early stopping is not decided on it. It is decided based on the validation data, which I assume model.fit handles internally (since it accepts a validation split as a param). Hence, the model's learning is not influenced by the test data. Does model.fit return the validation loss? I plan to plot it along with the training loss (which is one of the reasons I had moved the test prediction inside the callback).

"Test" is the held-out data that the model never sees during training. It is used only after the training completes.

I'll move it out of the callback.

caisq · 2018-07-17T20:15:35Z

@manrajgrover Not sure I fully understood your last comment. But to answer your question: yes, model.fit() returns loss values for both the training and validation sets if validationSplit is set to a value > 0. For example, const history = await model.fit(...) will return a history object that contains 'loss' and 'val_loss' in its 'history' field, which can be plotted. The other way to get the validation loss is to use the callbacks and use logs.val_loss, as you are currently doing.

I actually think getting the validation loss from the return value of model.fit() is better than getting it from the callbacks, because of simpler code. This is the 2nd concrete code example of the teaching material and we are probably not quite ready to dive into the details of callbacks yet.

In any case, I think plotting val_loss alongside loss is important because it shows the important concept of underfitting and overfitting. Running model.evaluate() on a separate test set to get the test loss value will also be nice, because that's the most objective evaluation of the model's accuracy.
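
A small sketch of turning such a history object into plot-ready points is shown below. The `fakeHistory` literal is a hand-written stand-in with the 'loss' and 'val_loss' fields mentioned above; its numbers and the `toSeries` helper are made up for illustration.

```javascript
// Converts a history-like object into one point per epoch, carrying both
// the training loss and the validation loss so they can be plotted together.
function toSeries(history) {
  const {loss, val_loss: valLoss} = history.history;
  return loss.map((trainLoss, epoch) => ({
    epoch,
    trainLoss,
    valLoss: valLoss[epoch],
  }));
}

// Stand-in for the object returned by `await model.fit(...)` when
// validationSplit is greater than 0 (values invented for this sketch).
const fakeHistory = {
  history: {
    loss: [4.0, 2.5, 1.8],
    val_loss: [4.2, 2.9, 2.6],
  },
};

const points = toSeries(fakeHistory);
```

A widening gap between `valLoss` and `trainLoss` across epochs is the overfitting signal the plot is meant to show.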

manrajgrover · 2018-07-18T07:28:25Z

yes, model.fit() returns loss values for both the training and validation sets if validationSplit is set to a value > 0. For example, const history = await model.fit(...) will return a history object that contains 'loss' and 'val_loss' in its 'history' field, which can be plotted. The other way to get the validation loss is to use the callbacks and use logs.val_loss, as you are currently doing.

@caisq Thanks for sharing this. Using a callback will enable plotting a live graph. We can certainly use the history to make the plot in one go. I'll make the changes accordingly.

Regarding early stopping, I'm not sure if model.fit is handling it or if it requires a callback like Keras.

In any case, I think plotting val_loss alongside loss is important because it shows the important concept of underfitting and overfitting. Running model.evaluate() on a separate test set to get the test loss value will also be nice, because that's the most objective evaluation of the model's accuracy.

Agreed

caisq · 2018-07-18T17:41:51Z

@manrajgrover Yep. Sorry what I wrote before might be misleading. Early stopping is not done in your code. To do early stopping, you need a callback like https://keras.io/callbacks/#earlystopping, which is not implemented in tensorflow.js yet. For this example, showing plots of training and validation losses (along with a test loss value) suffices. There is no need to actually do early stopping.
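
For readers unfamiliar with the idea, the decision an early-stopping callback makes can be sketched on a plain array of validation losses. This is only a conceptual illustration with a hypothetical `patience` rule (stop after that many epochs without improvement); it is not an API of tensorflow.js and is not needed for this example.

```javascript
// Returns the epoch index at which training would stop, or -1 if the
// validation loss keeps improving often enough to never trigger a stop.
function earlyStopEpoch(valLosses, patience) {
  let best = Infinity;  // lowest validation loss seen so far
  let wait = 0;         // consecutive epochs without improvement
  for (let epoch = 0; epoch < valLosses.length; epoch++) {
    if (valLosses[epoch] < best) {
      best = valLosses[epoch];
      wait = 0;
    } else {
      wait += 1;
      if (wait >= patience) {
        return epoch;
      }
    }
  }
  return -1;
}

// Made-up validation losses: improvement stalls after epoch 2.
const stopAt = earlyStopEpoch([3.0, 2.0, 1.5, 1.6, 1.7, 1.4], 2);  // 4
```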

caisq · 2018-07-18T17:42:12Z

Looking forward to your next revision before I can approve this PR. Thanks.

manrajgrover · 2018-07-19T15:01:10Z

@caisq Done

caisq

Reviewable status: complete! 1 of 1 LGTMs obtained

caisq

Thanks, @manrajgrover. This looks great!

manrajgrover · 2018-07-20T15:26:53Z

@caisq Thanks for reviewing. If you could put these CSVs on Google Cloud Storage and share the URL, I can update the link and, if required, the processing code accordingly.

caisq · 2018-07-20T15:28:10Z

@manrajgrover Let's do that later. We will need to settle on a centralized, uniform scheme for storing those data files. When we are done with that, we will update the URLs like this one. Thanks.

caisq · 2018-07-20T15:28:43Z

@bileschi @ericdnielsen let us know if you have any remaining comments.

The review has been ongoing for about two weeks. Manraj has addressed multiple rounds of comments. I am going to merge the PR now so we can iterate on it.

Multi Linear Regression Example: Adds initial data loading api

d1051a8

manrajgrover changed the title ~~[WIP] Multi Linear Regression Example~~ [WIP] Multivariate Linear Regression Example Jul 9, 2018

manrajgrover added 3 commits July 10, 2018 01:26

Multivariate Linear Regression: Fixes fetch and test batch bugs in da…

d3fc30d

…ta api

Multivariate Linear Regression: Normalize the dataset

eef78c1

Multivariate Linear Regression: Apply linear regression

b8ad480

bileschi requested review from bileschi and caisq July 10, 2018 13:02

manrajgrover changed the title ~~[WIP] Multivariate Linear Regression Example~~ Multivariate Linear Regression Example Jul 10, 2018

bileschi previously requested changes Jul 10, 2018

nsthorat reviewed Jul 10, 2018

manrajgrover added 4 commits July 11, 2018 02:25

Multivariate Linear Regression: Setup example for web

cfd81ab

Multivariate Linear Regression: Move utilities to separate module

05cbcc6

Multivariate Linear Regression: Make index.js web ready

265f48c

Multivariate Linear Regression: Rename example to multivariate-linear…

6d1ef5e

…-regression-core

Multivariate Linear Regression: Update readme with build and watch in…

68f6acf

…structions

caisq requested changes Jul 11, 2018

Multivariate Linear Regression: Makes use of Layers API, fixes docume…

c513833

…ntation, inconsistencies in syntax

caisq requested changes Jul 13, 2018

manrajgrover added 2 commits July 14, 2018 00:18

Multivariate Linear Regression: Simplify model, data api

8bb79d3

Merge branch 'master' into multi-linear-regression

f858db1

caisq requested changes Jul 16, 2018

manrajgrover added 2 commits July 16, 2018 23:40

Multivariate Linear Regression: Cleans up index.js

f7b1507

Merge branch 'multi-linear-regression' of https://github.com/ManrajGr…

1bc03d0

…over/tfjs-examples into multi-linear-regression

caisq requested changes Jul 17, 2018

manrajgrover added 5 commits July 19, 2018 02:00

Merge branch 'master' into multi-linear-regression

44d8d09

Merge branch 'master' into multi-linear-regression

01277ad

Multivariate Linear Regression: Logs validation loss

5cfdf17

Multivariate Linear Regression: Adds live plotting

1811531

Merge branch 'multi-linear-regression' of https://github.com/ManrajGr…

2bbc7ee

…over/tfjs-examples into multi-linear-regression

Multivariate Linear Regression: Remove open console message

0f6e187

caisq reviewed Jul 20, 2018

caisq approved these changes Jul 20, 2018

Merge branch 'master' into multi-linear-regression

e29baf2

manrajgrover mentioned this pull request Jul 20, 2018

Neural Network Regression Example #111

Merged

Merge branch 'master' into multi-linear-regression

2f01009

caisq merged commit de25b48 into tensorflow:master Jul 22, 2018
