Rank normalization of features #1062
Comments
This could be a Ranker object next to the Scaler we have in the preprocessing module.
BTW, how does it work for unseen data?
The difficulty is that you have to store all the rank information in the transformer. So if you have roughly 50K different ranks in each feature and 100 features, that's a rank matrix of 50K x 100. I think there should be a parameter that controls the number of ranks per feature. A default of 1000 sounds reasonable.
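The bounded-memory idea above can be sketched by storing a fixed grid of quantiles per feature instead of the full rank table. This is only a sketch under that assumption; the `RankScaler` name and API here are hypothetical:

```python
import numpy as np

class RankScaler:
    """Hypothetical sketch: keep only n_quantiles reference values per
    feature instead of the full 50K-rank table."""

    def __init__(self, n_quantiles=1000):
        self.n_quantiles = n_quantiles

    def fit(self, X):
        # references_ has shape (n_quantiles, n_features): bounded memory
        qs = np.linspace(0, 100, self.n_quantiles)
        self.references_ = np.percentile(X, qs, axis=0)
        return self

    def transform(self, X):
        grid = np.linspace(0.0, 1.0, self.n_quantiles)
        out = np.empty(X.shape, dtype=float)
        for j in range(X.shape[1]):
            # interpolate each value onto its approximate normalized rank;
            # np.interp maps values outside the training range to 0 or 1
            out[:, j] = np.interp(X[:, j], self.references_[:, j], grid)
        return out
```

With `n_quantiles` fixed, memory no longer grows with the number of training samples, at the cost of an approximate rank for values between stored quantiles.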
RankScaler is a better name IMO
Sounds good. PR welcome :)
+1.
+1 for a PR ;)
I think this is more-or-less fixed by QuantileTransformer...
Is there any resource from which I can learn about rank transformation? I can't find it explained anywhere in detail!
I think this is now QuantileTransformer |
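For reference, a minimal usage sketch of the scikit-learn transformer mentioned above (the data and outlier value are illustrative):

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer

X = np.array([[1.0], [2.0], [3.0], [1000.0]])  # one huge outlier

# uniform output = normalized ranks in [0, 1]
qt = QuantileTransformer(n_quantiles=4, output_distribution="uniform")
Xt = qt.fit_transform(X)
# the outlier maps to 1.0 regardless of its magnitude
```

Note that `n_quantiles` plays exactly the role discussed earlier in the thread: it bounds how much rank information is stored per feature.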
I was talking about this feature with @ogrisel and he asked me to place an issue for it.
This is a technique suggested by Yoshua Bengio to handle features with unknown scale:
Convert the features to rank scale, so the lowest rank is 0 and the highest rank is 1. This is superior to the z-transform (zero mean, unit variance) because a single huge outlier can distort the mean and variance, whereas a rank transform is robust to it.
You can see a description here of Python code to do this:
http://stackoverflow.com/questions/3071415/efficient-method-to-calculate-the-rank-vector-of-a-list-in-python
scipy.stats.rankdata does it.
It converts values to ranks, but doesn't normalize by the number of samples.
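For example, combining rankdata with the normalization step (dividing by n - 1 so the range is exactly [0, 1]; the sample values are illustrative):

```python
import numpy as np
from scipy.stats import rankdata

x = np.array([1.0, 100.0, 2.0, 3.0])  # 100.0 is a huge outlier
ranks = rankdata(x)                    # [1., 4., 2., 3.]
scaled = (ranks - 1) / (len(x) - 1)    # lowest -> 0.0, highest -> 1.0
# the outlier lands at 1.0 no matter how large it is
```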
One thing to be careful of:
If a feature has a lot of zeros and you do a rank transform using scipy.stats.rankdata and then normalize by the number of samples, the zeros will end up with rank > 0, so you lose sparsity. To preserve sparsity, I would recommend scaling the range of the ranks to [0, 1] and clipping any test-set feature value that falls outside the training range.
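A sketch of that recommendation (function names are hypothetical): store the sorted training column, map each value to the fraction of training values strictly below it so that zeros stay exactly zero, and clip unseen test values into [0, 1]:

```python
import numpy as np

def fit_rank(train_col):
    # the stored "rank table" is just the sorted training values
    return np.sort(train_col)

def transform_rank(x, sorted_train):
    n = len(sorted_train)
    # fraction of training values strictly below each x;
    # ties at the minimum (e.g. zeros) map to exactly 0, preserving sparsity
    r = np.searchsorted(sorted_train, x, side="left") / n
    # clip test values that fall outside the training range
    return np.clip(r, 0.0, 1.0)
```

Zeros map to 0.0 as long as zero is the minimum training value, and a test value larger than anything seen in training is clipped to 1.0.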