xRay extracts your S3 metadata into two data formats:
- Parquet files that you can import into a Spark cluster for analytics
- An Elasticsearch file that you can import into Elasticsearch
 
- Generate a binary for your system
 
$ git clone https://github.com/vardhanv/xray.git
$ cd xray
$ sbt universal:packageBin
$ cd target/universal
$ unzip xray-<version>.zip
$ cd xray-<version>/bin
$ ./xray --help
- If you generate an Elasticsearch output file (assume xray.out)
 - Create an Elasticsearch cluster on AWS
 - Upload the data into Elasticsearch
$ curl --tr-encoding -XPOST 'http://<your_elastic_search_url>/_bulk' --data-binary @xray.out
 - Now you can analyze it in the AWS Elasticsearch / Kibana service
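
If xray.out is large, a single _bulk request may exceed the request size limit of the cluster. The upload step can then be done in pieces, roughly as sketched here. The sketch assumes the file is in bulk format with two lines per object (an action line followed by a source line), so every piece must hold an even number of lines. The names split_bulk and upload_chunks and the chunk size are hypothetical and not part of xRay.

```shell
#!/usr/bin/env bash
# Sketch only: split an Elasticsearch bulk file and upload it piece by piece.
# Assumption: two lines per object (action line + source line).

# split_bulk <file> <lines_per_chunk> <out_dir>
split_bulk() {
  local file=$1 lines=$2 out_dir=$3
  # An odd chunk size would separate an action line from its source line.
  if [ $((lines % 2)) -ne 0 ]; then
    echo "lines per chunk must be even" >&2
    return 1
  fi
  mkdir -p "$out_dir"
  # -a 4 allows more chunk files than the default two-letter suffix.
  split -a 4 -l "$lines" "$file" "$out_dir/chunk_"
}

# upload_chunks <out_dir> <elastic_search_url>
upload_chunks() {
  local out_dir=$1 es_url=$2 chunk
  for chunk in "$out_dir"/chunk_*; do
    # Newer Elasticsearch versions require the Content-Type header.
    # --fail only catches HTTP-level errors, not per-item bulk errors.
    curl --fail -sS -XPOST "$es_url/_bulk" \
         -H 'Content-Type: application/x-ndjson' \
         --data-binary "@$chunk" > /dev/null || return 1
  done
}

# Example (not run here):
#   split_bulk xray.out 10000 chunks
#   upload_chunks chunks 'http://<your_elastic_search_url>'
```

Stopping at the first failed chunk keeps it obvious which piece needs to be retried.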
 
- If you generate a parquet file, you can analyze it in a Spark cluster
 - Go to http://www.databricks.com
 - Click on "Manage Account"
 - Select community edition
 - Create a cluster and wait for it to come online
 - Create a table using the parquet file (assume it is named "giab")
 - Create a notebook - Workspace/users/.../Create/Notebook/Language Scala
 
 
> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
> import sqlContext.implicits._
> import org.apache.spark.sql.functions._
> val df = sqlContext.table("giab")
> df.count
> df.show()
> // Deduplicated Storage Used in TB
> val TB : Long = 1000000000000L
> val totalTB  = df.select(col("Content_Length")).rdd.map(_(0).asInstanceOf[Long].toDouble/TB).reduce(_+_)
> val totalTB_unique = df.dropDuplicates("ETag").select(col("Content_Length")).rdd.map(_(0).asInstanceOf[Long].toDouble/TB).reduce(_+_)
> val totalSavings = totalTB - totalTB_unique

- Command line options

$ ./xray --help
xRay 1.0
Usage: xRay [options]
  -b, --bucket <value>     target s3 bucket
  -p, --parquet            generate parquet file output
  -l, --elastic            generate elastic search output
  -x, --number-obj:maxObj=objAtATime
                           optional, <x=y>, index "x" objects "y" at a time. defaults: x [1 billion], y:[1000]
  -e, --ep-url <value>     optional, endpoint. default = https://s3.amazonaws.com
  -f, --profile <value>    optional, aws profile. default = default. create using "aws --configure"
  -r, --region <value>     optional, s3 region.
  -o, --output <value>     optional, output file. default = xray.out
  --help                   prints help text
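
The options above can be combined in a small wrapper. This is a sketch that uses only the long options listed in the help text. The function name export_bucket, the XRAY_BIN variable, and the bucket and file names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: wrap the xray binary using options from its help text.
# XRAY_BIN and export_bucket are hypothetical names.
XRAY_BIN="${XRAY_BIN:-./xray}"

# export_bucket <bucket> <parquet|elastic> <output_file>
export_bucket() {
  local bucket=$1 format=$2 out=$3 flag
  case "$format" in
    parquet) flag=--parquet ;;
    elastic) flag=--elastic ;;
    *) echo "format must be parquet or elastic" >&2; return 1 ;;
  esac
  "$XRAY_BIN" --bucket "$bucket" "$flag" --output "$out"
}

# Example (not run here; my-bucket is a placeholder):
#   export_bucket my-bucket elastic xray.out
#   export_bucket my-bucket parquet my-bucket.parquet
```

The endpoint, profile, and region options are left at their defaults here. They can be appended to the command in the same way when a non-default S3 endpoint or AWS profile is needed.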