test: add code snippets for using bigframes.ml #159
Merged
12 commits
a8576f9  test: add code snippets for using bigframes.ml (ashleyxuu)
aa442ce  test: add code snippets for loading data from BigQuery Job (#154) (ashleyxuu)
14bf68d  feat: add bigframes.options.compute.maximum_bytes_billed option that … (orrbradford)
9a42f03  docs: fix indentation on `read_gbq_function` code sample (#163) (tswast)
fdeb9f2  move ml samples to e2e, make the tests standalone (ashleyxuu)
d95732d  Merge branch 'main' into add-code-snippets-use-ml (ashleyxuu)
32e03cc  feat: add pd.get_dummies (#149) (milkshakeiii)
a005bd1  fix: fix the failed test (ashleyxuu)
80a524e  Merge branch 'main' into add-code-snippets-use-ml (ashleyxuu)
7be45b6  fix: reorganize the directory (ashleyxuu)
31a7164  Merge remote-tracking branch 'origin/add-code-snippets-use-ml' into a… (ashleyxuu)
591c54d  Merge branch 'main' into add-code-snippets-use-ml (ashleyxuu)
@@ -0,0 +1,35 @@

    # Copyright 2023 Google LLC
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    # http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.


    def test_clustering_model():
        # [START bigquery_dataframes_clustering_model]
        from bigframes.ml.cluster import KMeans
        import bigframes.pandas as bpd

        # Load data from BigQuery
        query_or_table = "bigquery-public-data.ml_datasets.penguins"
        bq_df = bpd.read_gbq(query_or_table)

        # Create the KMeans model
        cluster_model = KMeans(n_clusters=10)
        cluster_model.fit(bq_df["culmen_length_mm"], bq_df["sex"])

        # Predict using the model
        result = cluster_model.predict(bq_df)
        # Score the model
        score = cluster_model.score(bq_df)
        # [END bigquery_dataframes_clustering_model]
        assert result is not None
        assert score is not None
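
For readers without BigQuery access, here is a minimal, standard-library-only sketch of what a k-means `fit`/`predict` pair does conceptually. It is not how `bigframes.ml.cluster.KMeans` works internally; the helper names `kmeans_1d` and `predict_1d`, the evenly spaced initialization, and the sample culmen lengths are all assumptions made for illustration. The sketch takes no label argument, because k-means is unsupervised; the second argument passed to `fit` in the snippet above is presumably accepted only for API consistency.

```python
def kmeans_1d(values, k, iterations=20):
    """Toy 1-D k-means (Lloyd's algorithm); returns the k centroids."""
    data = sorted(values)
    # Assumed deterministic initialization: k evenly spaced points of the sorted data.
    step = (len(data) - 1) / (k - 1) if k > 1 else 0
    centroids = [data[round(i * step)] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: attach every value to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for value in data:
            nearest = min(range(k), key=lambda i: abs(value - centroids[i]))
            clusters[nearest].append(value)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids


def predict_1d(centroids, value):
    """Return the index of the centroid closest to value."""
    return min(range(len(centroids)), key=lambda i: abs(value - centroids[i]))


# Hypothetical culmen lengths in millimetres, not taken from the penguins table.
culmen_length_mm = [35.0, 36.0, 37.0, 48.0, 49.0, 50.0]
centroids = kmeans_1d(culmen_length_mm, k=2)
print(centroids)
print(predict_1d(centroids, 38.0), predict_1d(centroids, 47.0))
```

With only six sample values, `k=2` is used here rather than the `n_clusters=10` of the snippet.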
@@ -0,0 +1,39 @@

    # Copyright 2023 Google LLC
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    # http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.


    def test_llm_model():
        PROJECT_ID = "bigframes-dev"
        REGION = "us"
        CONN_NAME = "bigframes-ml"
        # [START bigquery_dataframes_gen_ai_model]
        from bigframes.ml.llm import PaLM2TextGenerator
        import bigframes.pandas as bpd

        # Create the LLM model
        session = bpd.get_global_session()
        connection = f"{PROJECT_ID}.{REGION}.{CONN_NAME}"
        model = PaLM2TextGenerator(session=session, connection_name=connection)

        df_api = bpd.read_csv("gs://cloud-samples-data/vertex-ai/bigframe/df.csv")

        # Prepare the prompts and send them to the LLM model for prediction
        df_prompt_prefix = "Generate Pandas sample code for DataFrame."
        df_prompt = df_prompt_prefix + df_api["API"]

        # Predict using the model
        df_pred = model.predict(df_prompt.to_frame(), max_output_tokens=1024)
        # [END bigquery_dataframes_gen_ai_model]
        assert df_pred["ml_generate_text_llm_result"] is not None
        assert df_pred["ml_generate_text_llm_result"].iloc[0] is not None
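
The two string-building steps in this snippet (the dotted connection name and the prefixed prompts) can be tried without any cloud resources. The sketch below uses plain Python strings and lists in place of the BigQuery DataFrames Series; the helper names, the placeholder project and connection values, and the API names `head` and `describe` are assumptions for illustration, since the contents of the CSV are not shown in this PR.

```python
def build_connection_name(project_id, region, conn_name):
    """Join the three parts into the dotted form used in the snippet above."""
    return f"{project_id}.{region}.{conn_name}"


def build_prompts(prefix, api_names):
    """Prepend the shared prefix to every API name, mirroring prefix + df_api["API"]."""
    return [prefix + name for name in api_names]


# Hypothetical placeholder values, not the project used in the test.
connection = build_connection_name("my-project", "us", "my-connection")

# The prefix ends in "DataFrame." so that appending a method name yields
# a qualified name such as "DataFrame.head".
prefix = "Generate Pandas sample code for DataFrame."
prompts = build_prompts(prefix, ["head", "describe"])  # assumed API names

print(connection)
for prompt in prompts:
    print(prompt)
```

The trailing period in the prefix looks like a typo at first glance, but under the assumption that the `API` column holds bare method names, it is what turns each prompt into a request about `DataFrame.<method>`.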
@@ -0,0 +1,57 @@

    # Copyright 2023 Google LLC
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    # http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.


    def test_regression_model():
        # [START bigquery_dataframes_regression_model]
        from bigframes.ml.linear_model import LinearRegression
        import bigframes.pandas as bpd

        # Load data from BigQuery
        query_or_table = "bigquery-public-data.ml_datasets.penguins"

        bq_df = bpd.read_gbq(query_or_table)

        # Filter the data down to the Adelie Penguin species
        adelie_data = bq_df[bq_df.species == "Adelie Penguin (Pygoscelis adeliae)"]

        # Drop the species column
        adelie_data = adelie_data.drop(columns=["species"])

        # Drop rows with nulls to get training data
        training_data = adelie_data.dropna()

        # Specify your feature (or input) columns and the label (or output) column:
        feature_columns = training_data[
            ["island", "culmen_length_mm", "culmen_depth_mm", "flipper_length_mm", "sex"]
        ]
        label_columns = training_data[["body_mass_g"]]

        test_data = adelie_data[adelie_data.body_mass_g.isnull()]

        # Create the linear model
        model = LinearRegression()
        model.fit(feature_columns, label_columns)

        # Score the model
        score = model.score(feature_columns, label_columns)

        # Predict using the model
        result = model.predict(test_data)
        # [END bigquery_dataframes_regression_model]
        assert test_data is not None
        assert feature_columns is not None
        assert label_columns is not None
        assert model is not None
        assert score is not None
        assert result is not None
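
The way this snippet derives its training and prediction sets can be illustrated on a few in-memory rows. The sketch below uses plain dictionaries instead of a BigQuery DataFrames table; the function name `split_rows` and the sample values are assumptions for illustration only. It mirrors three steps from the snippet: filter to one species and drop that column, keep fully populated rows for training (the `dropna()` step), and keep rows whose label is missing as the rows to predict (the `isnull()` step).

```python
ADELIE = "Adelie Penguin (Pygoscelis adeliae)"

# Hypothetical rows; None stands in for a NULL value in the table.
rows = [
    {"species": ADELIE, "flipper_length_mm": 181.0, "body_mass_g": 3750.0},
    {"species": ADELIE, "flipper_length_mm": 186.0, "body_mass_g": None},
    {"species": ADELIE, "flipper_length_mm": None, "body_mass_g": 3800.0},
    {"species": "Gentoo penguin (Pygoscelis papua)", "flipper_length_mm": 210.0, "body_mass_g": 4500.0},
]


def split_rows(rows, species, label):
    """Return (training_rows, rows_to_predict) for a single species."""
    # Filter to the species, then drop the species column.
    subset = [
        {key: value for key, value in row.items() if key != "species"}
        for row in rows
        if row["species"] == species
    ]
    # Training data: rows with no missing values at all.
    training = [row for row in subset if all(v is not None for v in row.values())]
    # Rows to predict: rows whose label is missing.
    to_predict = [row for row in subset if row[label] is None]
    return training, to_predict


training, to_predict = split_rows(rows, ADELIE, "body_mass_g")
print(len(training), len(to_predict))
```

A row with a known label but a missing feature lands in neither set. Also, as in the snippet, the model is scored on the same rows it was fitted on, so the score describes fit quality rather than held-out performance.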