This repository was archived by the owner on Jan 20, 2022. It is now read-only.

Conversation

@ianoc (Collaborator) commented Mar 5, 2014

Faster in microbenchmarks on the cache:

import com.twitter.summingbird.online.option._
import com.twitter.summingbird.option.CacheSize
import com.twitter.algebird._
import com.twitter.summingbird.online._
import com.twitter.util.Duration
import scala.util.Random
import com.twitter.bijection._

val cacheSize = CacheSize(1000)
val flushFrequency = FlushFrequency(Duration.fromMilliseconds(100))
val memFlushPer = SoftMemoryFlushPercent(80.0f)
val combiner = ValueCombinerCacheSize(10)
val poolSize = AsyncPoolSize(1)
val numInputItems = 100000L


implicit val hllMonoid = new HyperLogLogMonoid(24)

def hll[T](t: T)(implicit monoid: HyperLogLogMonoid, inj: Injection[T, Array[Byte]]): HLL = monoid.create(inj(t))

val inputItems: List[(Long, HLL)] = Random.shuffle((0L until numInputItems).map { inputKey =>
  if (inputKey % 100 == 0) { // 1% of keys are hot
    (0L until 100L).map { valIndex => // hot keys carry 10x more values
      (inputKey, hll(Random.nextLong))
    }
  } else {
    (0L until 10L).map { valIndex =>
      (inputKey, hll(Random.nextLong))
    }
  }
}.flatten).toList

// The three cache implementations under comparison; note that MultiTriggerCache
// is constructed with a tenth of the cache size.
def buildCaches[Key, Value](implicit sg: Semigroup[Value]) =
  List(
    YetAnotherCache[Key, Value](cacheSize, flushFrequency, memFlushPer),
    SummingQueueCache[Key, Value](cacheSize, flushFrequency),
    MultiTriggerCache[Key, Value](CacheSize(cacheSize.lowerBound / 10), combiner, flushFrequency, memFlushPer, poolSize)
  )

val caches = buildCaches[Long, HLL]

val th = new ichi.bench.Thyme

import com.twitter.util.{Await, Future}
// Push every item through the cache, force a final flush, then wait on all the futures.
def fn(cache: AsyncCache[Long, HLL]): Int = {
  val results = inputItems.map(item => cache.insert(List(item)).unit) :+ cache.forceTick.unit
  Await.result(Future.collect(results.toIndexedSeq)).size
}

println("Yet another vs Summing queue")
th.benchOffPair()(fn(caches(0)))(fn(caches(1)))


println("Yet another vs MultiTrigger")
th.benchOffPair()(fn(caches(0)))(fn(caches(2)))

A collaborator commented on the diff:

should this be a val?

ianoc (Author) replied:

Sorry, it should really just be left undefined here for the concrete class to specify. The idea is that it's a def here but a val in the concrete class, so the cleanup will work OK from here.

(If we decide to keep the MultiTriggerCache it can hopefully just use these traits too.)
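
A minimal sketch of the pattern being discussed, with hypothetical names (CacheFlusher and flushFrequencyMs are illustrative, not the actual trait): the trait leaves the member abstract as a def, and the concrete class implements it as a val so it is evaluated exactly once.

// Hypothetical names, for illustration only.
trait CacheFlusher[K, V] {
  // A def in the trait leaves the storage strategy up to implementors.
  def flushFrequencyMs: Long
}

class ConcreteCache[K, V](freqMs: Long) extends CacheFlusher[K, V] {
  // Overriding the def with a val: computed once, stable for the instance's lifetime.
  override val flushFrequencyMs: Long = freqMs
}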

A contributor commented on the diff:

Trying to understand the CacheSize class: wouldn't the two calls to cacheSizeOpt.size.get in this class return different values? Is that okay in this case?

ianoc (Author) replied:

Yeah, this is just badness; I'm not entirely sure why we have the fudge factor on CacheSize, but this isn't good. I'll change it to a single call.

A collaborator replied:

Yeah, no docs there :) It's because if all the cache sizes are the same, Storm will buffer in every bolt for roughly the same period and then EMIT!!! Then the network dies and @singhala commits seppuku.

The fuzz allows buffering with a good distribution of network activity.
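
A hedged sketch of the fuzzing idea (FuzzedCacheSize and its bounds are illustrative assumptions, not the actual CacheSize implementation): each call to size samples a fresh value within a band around the requested size, which is also why two calls can return different values, as noted above.

import scala.util.Random

// Illustrative only: a cache size with built-in jitter, so bolts configured
// identically still fill up and flush at slightly different points.
case class FuzzedCacheSize(requested: Int, fuzz: Double = 0.1) {
  val lowerBound: Int = (requested * (1.0 - fuzz)).toInt
  val upperBound: Int = (requested * (1.0 + fuzz)).toInt
  // Each call draws a fresh size in [lowerBound, upperBound].
  def size: Option[Int] =
    Some(lowerBound + Random.nextInt(upperBound - lowerBound + 1))
}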

ianoc (Author) replied:

Ahhh, thanks for the info. I wonder if we need it anymore with the MaxEmitForExecute options; it seems it still didn't like cache flushes, which lock up all the queue insertion points :) It doesn't do any harm anyway.

@ianoc (Author) commented Mar 11, 2014

Sorry about the delay in updating; I finally got back to this. Merged up to master, and I think the comments are addressed.

@ianoc (Author) commented Mar 12, 2014

@johnynek Does this look OK to merge?

@ianoc (Author) commented Mar 14, 2014

bump

A contributor commented on the diff:

typo: Background

@ianoc (Author) commented Apr 1, 2014

Everything should be fixed and building now.

jcoveney added a commit that referenced this pull request on Apr 9, 2014
@jcoveney merged commit aa14f09 into develop on Apr 9, 2014
@jcoveney deleted the YetAnotherCache branch on Apr 9, 2014 at 16:15
@jcoveney (Contributor) commented Apr 9, 2014

Thanks for the epic patience. Any other reviews can be in the form of bug fixes ;)
