
Tags: twitter/scalding


tlazaro/twitter/20210917

Add support for cogroups in beam-backend (#1945)

In this change we add support for `HashCoGroup` and `CoGroupedPipe`.
To evaluate `HashCoGroup`, we create a `ParDo` transformation on the larger pipe, with the smaller pipe as a side input.
To evaluate `CoGroupedPipe`, we use the `MultiJoinFunction` to produce the final output iterator.

As part of this change, we also do a minor refactor of code exceeding 100 lines.

TESTS: Added unit tests for both `HashCoGroup` and `CoGroupedPipe`.
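
A rough sketch of the side-input pattern described above, written against the Beam Java API from Scala (hypothetical names; this is not the actual scalding beam-backend code):

```scala
import org.apache.beam.sdk.transforms.DoFn.ProcessElement
import org.apache.beam.sdk.transforms.{DoFn, ParDo, View}
import org.apache.beam.sdk.values.{KV, PCollection, PCollectionView}

object HashJoinSketch {
  // Materialize the smaller pipe as a multimap side input and stream the
  // larger pipe through a ParDo that probes it per element. (A real pipeline
  // would also need a Coder registered for the Scala tuple output.)
  def hashJoin[K, V, W](
      larger: PCollection[KV[K, V]],
      smaller: PCollection[KV[K, W]]): PCollection[KV[K, (V, W)]] = {
    val side: PCollectionView[java.util.Map[K, java.lang.Iterable[W]]] =
      smaller.apply(View.asMultimap[K, W]())

    larger.apply(
      ParDo
        .of(new DoFn[KV[K, V], KV[K, (V, W)]] {
          @ProcessElement
          def process(c: DoFn[KV[K, V], KV[K, (V, W)]]#ProcessContext): Unit = {
            val key = c.element.getKey
            val matches = c.sideInput(side).get(key) // null when the key is absent
            if (matches != null) {
              matches.forEach(w => c.output(KV.of(key, (c.element.getValue, w))))
            }
          }
        })
        .withSideInputs(side))
  }
}
```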

twitter/20210415

Fix race between jvm shutdown and `writer.finished` (#1938)

Currently `writer.finished` happens in an `onComplete` callback on the `Future` result in `Execution`. However, since `onComplete` is not called before the future is resolved, but asynchronously after it is resolved, this leads to a race and a runtime error:
- The user's code executes an `Execution` as the last operation in `main`
- `onComplete` with `writer.finished` is scheduled
- The result `Future` gets resolved and the JVM starts to shut down
- `writer.finished` starts to execute and, in the case of the cascading backend, adds a shutdown hook
- Adding a shutdown hook is not permitted during JVM shutdown, so it breaks

To fix this behaviour, I made the `onComplete` logic happen before the result future is resolved, by changing `onComplete` to `andThen`.
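
A minimal standalone illustration of the difference (not Scalding code): `andThen` ties the side effect into the returned future, while `onComplete` only schedules it.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object AndThenVsOnComplete extends App {
  val result: Future[Int] = Future(42)

  // onComplete merely schedules the callback: `result` can already be
  // observed as resolved (and main can return, starting JVM shutdown)
  // before the callback runs.
  result.onComplete(_ => println("cleanup via onComplete (may race shutdown)"))

  // andThen returns a new future that resolves only after the side effect
  // has run, so anyone waiting on `chained` sees the cleanup happen first.
  val chained: Future[Int] =
    result.andThen { case _ => println("cleanup via andThen") }

  Await.result(chained, Duration.Inf)
}
```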

twitter/20210128

Alternative implementation to DeprecatedParquetInputFormat with fix (#1937)

When combining N Parquet files, the first record of files 2 to N gets skipped, while the last record from the previous file is returned instead. This means some records are lost while others are duplicated, which is quite bad.

This was fixed a month ago in apache/parquet-java#844, but we would need to update the dependencies.

Should we do this approach or work towards updating deps?
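
To make the failure mode concrete, here is a hypothetical sketch (not the actual `DeprecatedParquetInputFormat` code) of how a reader that concatenates files can emit a stale record at each file boundary:

```scala
// Hypothetical reader interface standing in for Hadoop's RecordReader.
trait RecordReaderLike[T] {
  def nextKeyValue(): Boolean
  def getCurrentValue: T
}

// Sketch of a concatenating reader. The buggy variant returns `true` at a
// file boundary without refreshing `value`, so the last record of file i is
// emitted again and the first record of file i+1 is silently consumed.
final class ConcatReader[T](readers: Iterator[RecordReaderLike[T]]) {
  private var current: RecordReaderLike[T] = readers.next()
  private var value: Option[T] = None

  def getCurrentValue: Option[T] = value

  def nextKeyValue(): Boolean =
    if (current.nextKeyValue()) {
      value = Some(current.getCurrentValue)
      true
    } else if (readers.hasNext) {
      current = readers.next()
      nextKeyValue() // correct: advance into the new file before emitting
      // buggy variant: `current.nextKeyValue(); true` -- leaves the stale
      // `value` in place and skips the new file's first record
    } else {
      false
    }
}
```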

twitter/20210121

Remove mapreduce.input.fileinputformat.inputdir setting in memory source (#1936)

The memory source sets the mapreduce.input.fileinputformat.inputdir property to
a random UUID value. In clusters with HDFS federation, paths like that are often
not valid namespaces. While this path is usually not checked, since this is a
memory source, in clusters where Kerberos is enabled Hadoop lists a job's input
dirs to get delegation tokens. Since this path is not valid, this results in a
FileNotFoundException on a Kerberized cluster.

This patch removes the setting in Scalding memory sources, since the paths are not valid anyway.

Co-authored-by: Navin Viswanath <[email protected]>
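
A minimal sketch of the change being described, assuming Hadoop's standard `Configuration` API (the helper name is hypothetical):

```scala
import org.apache.hadoop.conf.Configuration

object MemorySourceConfSketch {
  // Hypothetical helper: instead of pointing the input dir at a bogus UUID
  // path (which a Kerberized cluster will try to list to get delegation
  // tokens), leave the property unset for in-memory sources.
  def configureMemorySource(conf: Configuration): Unit = {
    // Previously, something along the lines of:
    //   conf.set("mapreduce.input.fileinputformat.inputdir",
    //            java.util.UUID.randomUUID.toString)
    // Now: clear the property if present, and never set it.
    conf.unset("mapreduce.input.fileinputformat.inputdir")
  }
}
```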

twitter/20200929

[temporary] bring twitter internal changes to release

twitter/20200601

Add more guards on ReferencedClassFinder (#1931)

twitter/20200508

Add type ascriptions to serialization code (#1926)

Scalding macros expand into a large amount of code, most of which
contains few or no type ascriptions, leaving a lot of unnecessary
work to the compiler. By explicitly adding type ascriptions to the
generated code, we can reduce compilation times.
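
A hedged before/after illustration of the idea (hypothetical code, not the actual macro output):

```scala
object AscriptionSketch {
  // Stand-in for a macro-generated comparison in serialization code.
  def compareTuples(a: (Long, String), b: (Long, String)): Int =
    Ordering.Tuple2[Long, String].compare(a, b)

  // Without an ascription, the compiler must infer the type of every such
  // val node in the expanded tree:
  val cmp = compareTuples((1L, "a"), (2L, "b"))

  // With an explicit ascription, the inferencer is told the answer up front:
  val cmpAscribed: Int = compareTuples((1L, "a"), (2L, "b"))
}
```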

twitter/20190422

Add explicit type annotation to make code compatible with Scala 2.12