Spice is a small, portable runtime that provides developers with a unified SQL query interface to locally materialize, accelerate, and query datasets sourced from any database, data warehouse, or data lake.
📣 Read the Spice.ai OSS announcement blog post.
Spice makes it easy to build data-driven and data-intensive applications by streamlining the use of data and machine learning (ML) in software.
The Spice runtime is written in Rust and leverages industry-leading technologies like Apache DataFusion, Apache Arrow, Apache Arrow Flight, and DuckDB.
Spice makes querying data with SQL across one or more data sources simple and fast. Easily co-locate a managed working set of data with your application or ML, locally accelerated in-memory with Arrow, with SQLite/DuckDB, or with an attached database like PostgreSQL for high-performance, low-latency queries. Accelerated engines run in your infrastructure, giving you flexibility and control over price and performance.
- Local acceleration with both OLAP (Arrow/DuckDB) and OLTP (SQLite/PostgreSQL) databases at dataset granularity, compared to other OLAP-only or OLTP-only systems.
- Separation of materialization and storage/compute, compared with monolithic data systems and data lakes. Keep compute co-located with source data while bringing a materialized working set next to your application, dashboard, or data/ML pipeline.
- Edge to cloud native. Chainable and designed to be deployed standalone, as a container sidecar, as a microservice, in a cluster across laptops, the Edge, On-Prem, to a POP, and to all public clouds.
1. Faster applications and frontends. Accelerate and co-locate datasets with applications and frontends to serve more concurrent queries and users with faster page loads and data updates. Try the CQRS sample app.
2. Faster dashboards, analytics, and BI. Faster, more responsive dashboards without massive compute costs. Watch the Apache Superset demo.
3. Faster data pipelines, machine learning training and inferencing. Co-locate datasets in pipelines where the data is needed to minimize data movement and improve query performance. Predict hard drive failure with the SMART data demo.
4. Easily query many data sources. Federated SQL query across databases, data warehouses, and data lakes using Data Connectors.
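As a rough illustration of what a federated query looks like, the sketch below joins tables that live in two separate databases with a single SQL statement. It uses Python's built-in sqlite3 module and SQLite's ATTACH as a stand-in; it does not use Spice or its Data Connectors. The table names (orders, customers), the crm alias, and the rows are all made up for the example.

```python
import sqlite3

# Two independent in-memory databases stand in for two upstream sources.
# This is an analogy built on SQLite's ATTACH, not Spice's Data Connectors.
conn = sqlite3.connect(":memory:")                 # hypothetical "orders" source
conn.execute("ATTACH DATABASE ':memory:' AS crm")  # hypothetical "customers" source

conn.execute("CREATE TABLE main.orders (customer_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE crm.customers (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO main.orders VALUES (?, ?)", [(1, 20.0), (1, 5.0), (2, 12.5)]
)
conn.executemany(
    "INSERT INTO crm.customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")]
)

# One SQL statement joins tables that live in different databases.
federated_rows = conn.execute(
    """
    SELECT c.name, SUM(o.amount) AS total
    FROM main.orders AS o
    JOIN crm.customers AS c ON c.id = o.customer_id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()
print(federated_rows)  # [('Ada', 25.0), ('Grace', 12.5)]
```

With Spice, the equivalent query would be written once against datasets registered in the runtime, while the connectors handle reaching each upstream source.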
Currently supported data connectors for upstream datasets. More coming soon.
| Name | Description | Status | Protocol/Format | Refresh Modes |
|---|---|---|---|---|
| databricks | Databricks | Alpha | Spark Connect, S3/Delta Lake | full |
| postgres | PostgreSQL | Alpha | | full |
| spiceai | Spice.ai | Alpha | Arrow Flight | append, full |
| s3 | S3 | Alpha | Parquet | full |
| dremio | Dremio | Alpha | Arrow Flight | full |
| mysql | MySQL | Alpha | | full |
| duckdb | DuckDB | Alpha | | full |
| snowflake | Snowflake | Coming soon! | Arrow Flight SQL | full |
| bigquery | BigQuery | Coming soon! | Arrow Flight SQL | full |
Currently supported data stores for local materialization/acceleration. More coming soon.
| Name | Description | Status | Engine Modes | Refresh Modes |
|---|---|---|---|---|
| arrow | In-Memory Arrow Records | Alpha | memory | append, full |
| duckdb | Embedded DuckDB | Alpha | memory, file | append, full |
| sqlite | Embedded SQLite | Alpha | memory, file | append, full |
| postgres | Attached PostgreSQL | Alpha | | append, full |
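The tables above list full and append refresh modes. The sketch below illustrates one common reading of those terms, assuming that full replaces the local copy entirely and append copies only rows not yet seen. That reading is an assumption made for illustration, not a description of how the Spice runtime implements refreshes. The events table, the refresh function, and the use of a monotonically increasing id are all hypothetical.

```python
import sqlite3

def make_db():
    """Create an in-memory database with an empty events table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
    return db

def refresh(source, local, mode):
    """Copy rows from source into local and return how many rows were copied."""
    if mode == "full":
        # Assumed meaning of "full": discard the local copy and reload everything.
        local.execute("DELETE FROM events")
        rows = source.execute("SELECT id, payload FROM events").fetchall()
    elif mode == "append":
        # Assumed meaning of "append": fetch only rows newer than the local maximum id.
        (last_id,) = local.execute(
            "SELECT COALESCE(MAX(id), 0) FROM events"
        ).fetchone()
        rows = source.execute(
            "SELECT id, payload FROM events WHERE id > ?", (last_id,)
        ).fetchall()
    else:
        raise ValueError(f"unknown refresh mode: {mode}")
    local.executemany("INSERT INTO events VALUES (?, ?)", rows)
    local.commit()
    return len(rows)

source, local = make_db(), make_db()
source.executemany("INSERT INTO events VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
first = refresh(source, local, "full")     # copies all 3 rows

source.executemany("INSERT INTO events VALUES (?, ?)", [(4, "d"), (5, "e")])
second = refresh(source, local, "append")  # copies only rows 4 and 5

local_count = local.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(first, second, local_count)  # 3 2 5
```

The trade-off the sketch makes visible: append moves less data per refresh but only picks up new rows, while full also reflects upstream changes to existing rows at the cost of reloading everything.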
Spice enables developers to build both data-driven and AI-driven applications by co-locating data and ML models with applications. Read more about the vision to enable the development of intelligent AI-driven applications.
Step 1. Install the Spice CLI:
On macOS, Linux, and WSL:
curl https://install.spiceai.org | /bin/bash
Or using brew:
brew install spiceai/spiceai/spice
On Windows:
curl -L "https://install.spiceai.org/Install.ps1" -o Install.ps1 && PowerShell -ExecutionPolicy Bypass -File ./Install.ps1
Step 2. Initialize a new Spice app with the spice init command:
spice init spice_qs
A spicepod.yaml file is created in the spice_qs directory. Change to that directory:
cd spice_qs
Step 3. Start the Spice runtime:
spice run
Example output will be shown as follows:
Spice.ai runtime starting...
Using latest 'local' runtime version.
2024-02-21T06:11:56.381793Z INFO runtime::http: Spice Runtime HTTP listening on 127.0.0.1:3000
2024-02-21T06:11:56.381853Z INFO runtime::flight: Spice Runtime Flight listening on 127.0.0.1:50051
2024-02-21T06:11:56.382038Z INFO runtime::opentelemetry: Spice Runtime OpenTelemetry listening on 127.0.0.1:50052
The runtime is now started and ready for queries.
Step 4. In a new terminal window, add the spiceai/quickstart Spicepod. A Spicepod is a package of configuration defining datasets and ML models.
spice add spiceai/quickstart
The spicepod.yaml file will be updated with the spiceai/quickstart dependency.
version: v1beta1
kind: Spicepod
name: PROJECT_NAME
dependencies:
- spiceai/quickstart
The spiceai/quickstart Spicepod will add a taxi_trips data table to the runtime which is now available to query by SQL.
2024-02-22T05:53:48.222952Z INFO runtime: Loaded dataset: taxi_trips
2024-02-22T05:53:48.223101Z INFO runtime::dataconnector: Refreshing data for taxi_trips
Step 5. Start the Spice SQL REPL:
spice sql
The SQL REPL interface will be shown:
Welcome to the interactive Spice.ai SQL Query Utility! Type 'help' for help.
show tables; -- list available tables
sql>
Enter show tables; to display the available tables for query:
sql> show tables;
+------------+
| table_name |
+------------+
| taxi_trips |
+------------+
Query took: 0.007505084 seconds. 1/1 rows displayed.
Enter a query to display the longest taxi trips:
sql> SELECT trip_distance, total_amount FROM taxi_trips ORDER BY trip_distance DESC LIMIT 10;
Output:
+---------------+--------------+
| trip_distance | total_amount |
+---------------+--------------+
| 312722.3 | 22.15 |
| 97793.92 | 36.31 |
| 82015.45 | 21.56 |
| 72975.97 | 20.04 |
| 71752.26 | 49.57 |
| 59282.45 | 33.52 |
| 59076.43 | 23.17 |
| 58298.51 | 18.63 |
| 51619.36 | 24.2 |
| 44018.64 | 52.43 |
+---------------+--------------+
Query took: 0.002458976 seconds
Using the Docker image locally:
docker pull spiceai/spiceai
In a Dockerfile:
FROM spiceai/spiceai:latest
Using Helm:
helm repo add spiceai https://helm.spiceai.org
helm install spiceai spiceai/spiceai
You can use any number of predefined datasets available from the Spice.ai Cloud Platform in the Spice runtime.
A list of publicly available datasets from Spice.ai can be found here: https://docs.spice.ai/building-blocks/datasets.
In order to access public datasets from Spice.ai, you will first need to create an account with Spice.ai by selecting the free tier membership.
Navigate to spice.ai and create a new account by clicking on Try for Free.
After creating an account, you will need to create an app in order to create an API key.
You will now be able to access datasets from Spice.ai. For this demonstration, we will be using the spice.ai/eth.recent_blocks dataset.
Step 1. Log in and authenticate from the command line using the spice login command. A pop-up browser window will prompt you to authenticate:
spice login
Step 2. Initialize a new project and start the runtime:
# Initialize a new Spice app
spice init spice_app
# Change to app directory
cd spice_app
# Start the runtime
spice run
Step 3. Configure the dataset:
In a new terminal window, configure a new dataset using the spice dataset configure command:
spice dataset configure
You will be prompted to enter a name. Enter a name that represents the contents of the dataset:
dataset name: (spice_app) eth_recent_blocks
Enter the description of the dataset:
description: eth recent logs
Enter the location of the dataset:
from: spice.ai/eth.recent_blocks
Select y when prompted whether to accelerate the data:
Locally accelerate (y/n)? y
You should see the following output from your runtime terminal:
2024-02-21T22:49:10.038461Z INFO runtime: Loaded dataset: eth_recent_blocks
Step 4. In a new terminal window, use the Spice SQL REPL to query the dataset:
spice sql
SELECT number, size, gas_used from eth_recent_blocks LIMIT 10;
The output displays the results of the query along with the query execution time:
+----------+--------+----------+
| number | size | gas_used |
+----------+--------+----------+
| 19281345 | 400378 | 16150051 |
| 19281344 | 200501 | 16480224 |
| 19281343 | 97758 | 12605531 |
| 19281342 | 89629 | 12035385 |
| 19281341 | 133649 | 13335719 |
| 19281340 | 307584 | 18389159 |
| 19281339 | 89233 | 13391332 |
| 19281338 | 75250 | 12806684 |
| 19281337 | 100721 | 11823522 |
| 19281336 | 150137 | 13418403 |
+----------+--------+----------+
Query took: 0.004057791 seconds
You can experiment with how long queries take when using non-accelerated datasets by changing the acceleration setting from true to false in the datasets.yaml file.
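To compare accelerated and non-accelerated runs, one option is to paste the REPL output into a small script that pulls out the "Query took" lines. The sketch below assumes the timing line keeps the format shown in this README; the helper names are hypothetical, and the non-accelerated timing is an invented placeholder, not a measured result.

```python
import re

# Matches the timing line printed by the REPL, e.g. "Query took: 0.004057791 seconds".
QUERY_TIME = re.compile(r"Query took: ([0-9]*\.?[0-9]+) seconds")

def query_seconds(repl_output):
    """Return every query time (in seconds) found in pasted REPL output."""
    return [float(value) for value in QUERY_TIME.findall(repl_output)]

def speedup(slow_seconds, fast_seconds):
    """Return how many times faster the fast run was than the slow run."""
    return slow_seconds / fast_seconds

# Timing taken from the accelerated example output in this README.
accelerated = query_seconds("Query took: 0.004057791 seconds")[0]
# Placeholder timing for the same query with acceleration set to false (not measured).
non_accelerated = query_seconds("Query took: 0.85 seconds")[0]

print(f"accelerated run was {speedup(non_accelerated, accelerated):.0f}x faster")
```

Running each query several times and comparing the medians gives a steadier picture than a single measurement.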
Comprehensive documentation is available at docs.spiceai.org.
🚀 See the Roadmap to v1.0-stable for upcoming features.
We greatly appreciate and value your support! You can help Spice in a number of ways:
- Build an app with Spice.ai and send us feedback and suggestions at [email protected] or on Discord, X, or LinkedIn.
- File an issue if you see something not quite working correctly.
- Join our team (We’re hiring!)
- Contribute code or documentation to the project (see CONTRIBUTING.md).
- Follow our blog at blog.spiceai.org
⭐️ star this repo! Thank you for your support! 🙏
