214 changes: 213 additions & 1 deletion docs/source/recipes/continuous.ipynb
@@ -573,6 +573,214 @@
"session.select(model=session.recommended_model, notes=\"Lowest AIC; recommended model\")"
]
},
{
"cell_type": "markdown",
"id": "44b45cdf",
"metadata": {},
"source": [
"## Changing how parameters are counted\n",
"\n",
"For the purposes of statistical calculations, e.g., calcuation the AIC value and p-values, BMDS only counts the number of parameters off their respective boundaries by default. But users have the option of parameterizing BMDS so that it counts all the parameters in a model, irrespective if they were estimated on or off of their boundaries. \n",
"\n",
"Parameterizing BMDS to count all parameters will impact AIC and p-value calculations and therefore may impact the selection of models using the automatic model selection logic. For example, consider the following dataset run using the default setting of only counting parameters that are off a boundary."
]
},
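{
"cell_type": "markdown",
"id": "f3a1b2c4",
"metadata": {},
"source": [
"To see why the parameter count matters for the AIC, recall that AIC = 2k - 2ln(L), where k is the number of counted parameters and L is the maximized likelihood. The following minimal sketch, which uses a hypothetical log-likelihood value rather than `pybmds` output, shows that, holding the log-likelihood fixed, each additional counted parameter raises the AIC by 2:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b7c8d9e0",
"metadata": {},
"outputs": [],
"source": [
"# A minimal sketch of the AIC definition with hypothetical numbers;\n",
"# not pybmds output.\n",
"def aic(log_likelihood, k):\n",
"    return 2 * k - 2 * log_likelihood\n",
"\n",
"log_like = -210.5    # hypothetical maximized log-likelihood\n",
"k_off_bounds = 3     # default: count only parameters off their bounds\n",
"k_all = 4            # also count a parameter that sits on a bound\n",
"\n",
"print(aic(log_like, k_off_bounds))  # 427.0\n",
"print(aic(log_like, k_all))         # 429.0"
]
},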
{
"cell_type": "code",
"execution_count": null,
"id": "a1a0109b",
"metadata": {},
"outputs": [],
"source": [
"import pybmds\n",
"import pandas as pd\n",
"\n",
"dataset = pybmds.ContinuousDataset(\n",
" doses=[0, 25, 50, 100, 200],\n",
" ns=[20, 20, 20, 20, 20],\n",
" means=[6.1, 6.3, 9.5, 15.2, 30.1],\n",
" stdevs=[3.5, 3.3, 4.1, 3.6, 4.5],\n",
")\n",
"\n",
"# create a BMD session\n",
"session = pybmds.Session(dataset=dataset)\n",
"\n",
"# add all default models\n",
"session.add_default_models()\n",
"\n",
"# execute the session\n",
"session.execute()\n",
"\n",
"# recommend a best-fitting model\n",
"session.recommend()\n",
"\n",
"# # Print out a summary table for the recommended model\n",
"\n",
"model = session.recommended_model\n",
"if model is not None:\n",
" df = pd.DataFrame([[\n",
" model.name(),\n",
" model.results.bmdl,\n",
" model.results.bmd,\n",
" model.results.bmdu,\n",
" model.results.tests.dfs[3],\n",
" model.results.tests.p_values[1],\n",
" model.results.tests.p_values[2],\n",
" model.results.tests.p_values[3],\n",
" model.results.fit.aic\n",
" ]], columns=[\"Model\", \"BMDL\", \"BMD\", \"BMDU\", \"DF\", \"P-Value 2\", \"P-Value 3\", \"P-Value 4\", \"AIC\"])\n",
" df = df.T # Transpose for vertical display\n",
" df.columns = [\"Value\"]\n",
" display(df)\n",
" \n",
"else:\n",
" print(\"No recommended model.\")\n",
"\n",
"fig = session.recommended_model.plot()\n"
]
},
{
"cell_type": "markdown",
"id": "1d150eed",
"metadata": {},
"source": [
"So, for this dataset, we can see that the automated model selection workflow chose the Power model. We can also summarize the modeling results for all models in the session by using the `summary_table` function defined above:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0621419d",
"metadata": {},
"outputs": [],
"source": [
"summary_table(session)"
]
},
{
"cell_type": "markdown",
"id": "ae415af0",
"metadata": {},
"source": [
"Further, we can also define a custom function to print out a table of models that had at least one parameter estimated on a bound:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "569f9d40",
"metadata": {},
"outputs": [],
"source": [
"def summary_parameter_bounds_table(session):\n",
" rows = []\n",
" for model in session.models:\n",
" params = model.results.parameters\n",
" for name, value, bounded in zip(params.names, params.values, params.bounded):\n",
" if bool(bounded):\n",
" rows.append({\n",
" \"Model\": model.name(),\n",
" \"Parameter\": name,\n",
" \"Value\": value,\n",
" \"On Bound\": bool(bounded)\n",
" })\n",
" df = pd.DataFrame(rows)\n",
" display(df)\n",
"\n",
"# Usage:\n",
"summary_parameter_bounds_table(session)"
]
},
{
"cell_type": "markdown",
"id": "48893a23",
"metadata": {},
"source": [
"We can see above that only two models, the Polynomial 3 and Exponential 3, had parameters estimated on a bound. So, although it seems like the issue of parameters not being counted for AIC or p-value calculation when on a boundary, we can can re-model this dataset parameterizing BMDS to count all parameters in a model to to confirm whether that assumption is accurate:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9599e12e",
"metadata": {},
"outputs": [],
"source": [
"session2 = pybmds.Session(dataset=dataset)\n",
"session2.add_default_models(\n",
" settings={\n",
" \"count_all_parameters_on_boundary\": True\n",
" }\n",
")\n",
"session2.execute()\n",
"\n",
"# recommend a best-fitting model\n",
"session2.recommend()\n",
"\n",
"# Print out a summary table for the recommended model\n",
"model = session2.recommended_model\n",
"if model is not None:\n",
" df2 = pd.DataFrame([[\n",
" model.name(),\n",
" model.results.bmdl,\n",
" model.results.bmd,\n",
" model.results.bmdu,\n",
" model.results.tests.dfs[3],\n",
" model.results.tests.p_values[1],\n",
" model.results.tests.p_values[2],\n",
" model.results.tests.p_values[3],\n",
" model.results.fit.aic\n",
" ]], columns=[\"Model\", \"BMDL\", \"BMD\", \"BMDU\", \"DF\", \"P-Value 2\", \"P-Value 3\", \"P-Value 4\", \"AIC\"])\n",
" df2 = df2.T # Transpose for vertical display\n",
" df2.columns = [\"Value\"]\n",
" display(df2)\n",
" \n",
"else:\n",
" print(\"No recommended model.\")\n",
"\n",
"fig = session2.recommended_model.plot()"
]
},
{
"cell_type": "markdown",
"id": "99f79e14",
"metadata": {},
"source": [
"The Power model was still selected as the best fitting model, even when counting all parameters on a boundary. Printing out a summary table that compares the p-values and AICs for `session` and `session2`lets us examine why:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5a595b09",
"metadata": {},
"outputs": [],
"source": [
"def compare_pvalue_aic(session_a, session_b):\n",
" rows = []\n",
" for model_a, model_b in zip(session_a.models, session_b.models):\n",
" rows.append({\n",
" \"Model\": model_a.name(),\n",
" \"Session P-Value\": model_a.results.tests.p_values[3],\n",
" \"Session AIC\": model_a.results.fit.aic,\n",
" \"Session2 P-Value\": model_b.results.tests.p_values[3],\n",
" \"Session2 AIC\": model_b.results.fit.aic,\n",
" })\n",
" df = pd.DataFrame(rows)\n",
" display(df)\n",
"\n",
"# Usage:\n",
"compare_pvalue_aic(session, session2)"
]
},
{
"cell_type": "markdown",
"id": "b54885c4",
"metadata": {},
"source": [
"We can see that for the models that have parameters on a boundary (as reported above: Polynomial 3 and Exponential 3), the `session2` AICs are higher than the `session` AICs due to more parameters being counted for the penalization term. The p-values for those models are also lower given the decrease in the degrees of freedom. Hence, because neither of these models were selected previously, the ultimate model selection results were not different between the`session2` and `session` analyses."
]
},
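{
"cell_type": "markdown",
"id": "c1d2e3f4",
"metadata": {},
"source": [
"Both effects follow from the definitions: each additional counted parameter adds 2 to the AIC penalty term, and counting one more parameter removes one degree of freedom from the goodness-of-fit test, which lowers the p-value for the same test statistic. A minimal sketch with a hypothetical test statistic (not `pybmds` output) illustrates the degrees-of-freedom effect:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d5e6f7a8",
"metadata": {},
"outputs": [],
"source": [
"# A minimal sketch with a hypothetical statistic; not pybmds output.\n",
"from scipy.stats import chi2\n",
"\n",
"stat = 5.8  # hypothetical goodness-of-fit test statistic\n",
"\n",
"# Boundary parameter not counted: more residual degrees of freedom\n",
"print(chi2.sf(stat, df=4))  # ~0.215\n",
"\n",
"# Boundary parameter counted: one fewer degree of freedom, lower p-value\n",
"print(chi2.sf(stat, df=3))  # ~0.122"
]
},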
{
"cell_type": "markdown",
"id": "5324f887",
@@ -582,7 +790,7 @@
"\n",
"### Individual Response Data\n",
"\n",
"To run the Jonckheere-Terpstra trend test on a single dataset with a strong apparent trend, first load an individual response continuous dataset:"
"To test for a trend in a continuous dataset with responses measured in individuals using the Jonckheere-Terpstra trend test, first load an individual response continuous dataset. Note that this individual response dataset has an *apparent* strong increasing trend and the Jonckheere-Terpstra test will provided results that support or refute that observation."
]
},
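{
"cell_type": "markdown",
"id": "a9b8c7d6",
"metadata": {},
"source": [
"Before loading the data, the following is a minimal, self-contained sketch (not the `pybmds` implementation) of how the Jonckheere-Terpstra statistic can be computed, using the large-sample normal approximation and omitting tie corrections. The toy responses below are hypothetical and are not this recipe's dataset:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e1f2a3b4",
"metadata": {},
"outputs": [],
"source": [
"# A minimal, self-contained sketch of the Jonckheere-Terpstra statistic\n",
"# using the large-sample normal approximation (no tie correction).\n",
"# The toy data below are hypothetical, not this recipe's dataset.\n",
"import numpy as np\n",
"from scipy.stats import norm\n",
"\n",
"def jonckheere_terpstra(groups):\n",
"    # Sum pairwise Mann-Whitney counts over all ordered group pairs\n",
"    jt = 0.0\n",
"    for i in range(len(groups)):\n",
"        for j in range(i + 1, len(groups)):\n",
"            for x in groups[i]:\n",
"                for y in groups[j]:\n",
"                    jt += (y > x) + 0.5 * (y == x)\n",
"    n = np.array([len(g) for g in groups])\n",
"    total = n.sum()\n",
"    mean = (total**2 - (n**2).sum()) / 4\n",
"    var = (total**2 * (2 * total + 3) - (n**2 * (2 * n + 3)).sum()) / 72\n",
"    z = (jt - mean) / np.sqrt(var)\n",
"    return jt, norm.sf(z)  # one-sided p-value for an increasing trend\n",
"\n",
"# Hypothetical responses for four increasing dose groups\n",
"groups = [[3.1, 2.8, 3.4], [3.6, 3.9, 3.2], [4.4, 4.1, 4.8], [5.2, 5.5, 5.1]]\n",
"jt, p = jonckheere_terpstra(groups)\n",
"print(f\"JT = {jt}, one-sided p = {p:.4f}\")"
]
},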
{
@@ -623,13 +831,17 @@
"outputs": [],
"source": [
"# Run the Jonckheere-Terpstra trend test for an increasing trend\n",
"#Note that dataset.trend will use the Jonckheere-Terpstra trend test given\n",
"# that the dataset is continuous.\n",
"result1 = dataset1.trend(hypothesis=\"increasing\")\n",
"\n",
"# Display the results for the increasing trend test\n",
"# The alternative hypothesis for the increasting trend test is that there is a monotonic increasing trend in the data.\n",
"print(\"Jonckheere-Terpstra Trend Test Result:\")\n",
"print(result1.tbl())\n",
"\n",
"# Run the two-sided test\n",
"# The alternative hypothesis for the two-sided trend test is that there is a monotonic trend in the data, either increasing or decreasing.\n",
"result1_two_sided = dataset1.trend(hypothesis=\"two-sided\")\n",
"\n",
"# Display the results for the two-sided trend test\n",