Explore query results in notebooks

You can explore BigQuery query results by using Colab Enterprise notebooks in BigQuery.

In this tutorial, you query data from a BigQuery public dataset and explore the query results in a notebook.

Required permissions

To create and run notebooks, you need the following Identity and Access Management (IAM) roles:

  - BigQuery User (roles/bigquery.user)
  - Notebook Runtime User (roles/aiplatform.notebookRuntimeUser)
  - Code Creator (roles/dataform.codeCreator)

Open query results in a notebook

You can run a SQL query and then use a notebook to explore the data. This approach is useful if you want to modify the data in BigQuery before working with it, or if you need only a subset of the fields in the table.

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the Type to search field, enter bigquery-public-data.

    If the project is not shown, enter bigquery in the search field, and then click Search to all projects to match the search string against existing projects.

  3. Select bigquery-public-data > ml_datasets > penguins.

  4. For the penguins table, click View actions, and then click Query.

  5. In the generated query, add an asterisk (*) for field selection so that the query reads like the following example:

    SELECT * FROM `bigquery-public-data.ml_datasets.penguins` LIMIT 1000;
  6. Click Run.

  7. In the Query results section, click Explore data, and then click Explore with Python notebook.
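
Alternatively, you can load the same result set into a BigQuery DataFrame directly from code. The following is a minimal sketch using bigframes.pandas, the same library the generated notebook uses; the query string mirrors the example above:

import bigframes.pandas as bpd

# read_gbq accepts either a table ID or a SQL query string.
sql = """
SELECT *
FROM `bigquery-public-data.ml_datasets.penguins`
LIMIT 1000
"""
bq_df = bpd.read_gbq(sql)
bq_df.head()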

Prepare the notebook for use

Prepare the notebook for use by connecting to a runtime and setting application default values.

  1. In the notebook header, click Connect to connect to the default runtime.
  2. In the Setup code block, click Run cell.
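
The contents of the generated Setup code block can vary, but it typically sets application default values for BigQuery DataFrames along these lines (a sketch; the project ID and location shown are placeholders):

import bigframes.pandas as bpd

# Application default values for BigQuery DataFrames (placeholder values).
bpd.options.bigquery.project = "your-project-id"
bpd.options.bigquery.location = "US"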

Explore the data

  1. To load the penguins data into a BigQuery DataFrame and show the results, click Run cell in the code block in the Result set loaded from BigQuery job as a DataFrame section.
  2. To get descriptive statistics for the data, click Run cell in the code block in the Show descriptive statistics using describe() section.
  3. Optional: Use other Python functions or packages to explore and analyze the data.
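
The first two generated code blocks resemble the following minimal sketch (the exact contents depend on your query; bq_df is the DataFrame name also used in the sample below):

import bigframes.pandas as bpd

# Load the result set as a BigQuery DataFrame (sketch of the generated cell).
bq_df = bpd.read_gbq(
    "SELECT * FROM `bigquery-public-data.ml_datasets.penguins` LIMIT 1000"
)

# Show descriptive statistics for the numeric columns.
bq_df.describe()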

The following code sample uses bigframes.pandas to analyze data and bigframes.ml to create a linear regression model from the penguins data in a BigQuery DataFrame:

import bigframes.pandas as bpd

# Load data from BigQuery
query_or_table = "bigquery-public-data.ml_datasets.penguins"
bq_df = bpd.read_gbq(query_or_table)

# Inspect one of the columns (or series) of the DataFrame:
bq_df["body_mass_g"]

# Compute the mean of this series:
average_body_mass = bq_df["body_mass_g"].mean()
print(f"average_body_mass: {average_body_mass}")

# Find the heaviest species using the groupby operation to calculate the
# mean body_mass_g:
(
    bq_df["body_mass_g"]
    .groupby(by=bq_df["species"])
    .mean()
    .sort_values(ascending=False)
    .head(10)
)

# Create the Linear Regression model
from bigframes.ml.linear_model import LinearRegression

# Filter down to the data we want to analyze
adelie_data = bq_df[bq_df.species == "Adelie Penguin (Pygoscelis adeliae)"]

# Drop the columns we don't care about
adelie_data = adelie_data.drop(columns=["species"])

# Drop rows with nulls to get our training data
training_data = adelie_data.dropna()

# Pick feature columns and label column
X = training_data[
    [
        "island",
        "culmen_length_mm",
        "culmen_depth_mm",
        "flipper_length_mm",
        "sex",
    ]
]
y = training_data[["body_mass_g"]]

model = LinearRegression(fit_intercept=False)
model.fit(X, y)
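
# Evaluate the fit; score() returns model evaluation metrics (such as
# r2_score) as a DataFrame.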
model.score(X, y)
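
After fitting, you can also generate predictions with the model's predict() method. A brief follow-on sketch (the predictions name is chosen for illustration; the output includes a predicted_* column for the label):

# Predict body mass for the training features.
predictions = model.predict(X)
predictions.head()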