By David Nguyen, 04/08/2025
This repository contains a sample full-stack application that fetches a wide range of U.S. economic data from the FRED API, vectorizes it with Pinecone, and provides a chatbot interface for querying the data. The backend is built with TypeScript and Express; the frontend is a simple MUI-powered React application that interacts with the backend API in a more user-friendly manner.
This project is divided into two main parts:
- The application:
- A sample full-stack application that fetches a wide range of economic data from the FRED API, cleans and preprocesses it, and stores it in a MongoDB database.
- It also vectorizes the data using Pinecone and provides a chatbot with RAG (retrieval-augmented generation) capabilities.
- The backend is built with TypeScript and uses Express for the server.
- The frontend is a simple React application that interacts with the backend API.
- Visit the application section below for setup instructions and commands to run the application.
- The analysis:
- A detailed analysis of the FRED economics data using regression models and plots.
- The analysis is performed using TypeScript and the results are saved in the `backend` directory.
- The analysis includes linear regression, polynomial regression, regression on daily percent change, and logarithmic regression.
- The results are summarized in a report format, including charts and AI-generated & data-backed summaries.
- Visit the analysis section below for a detailed report on the FRED economic data analysis.
Motivation: As a Computer Science & Economics double-major, I wanted to combine my interests in both fields and create a project that showcases my skills in full-stack development, data analysis, and machine learning, as well as in economics topics such as inflation, unemployment, and GDP growth. This project serves as a comprehensive portfolio piece that demonstrates my ability to work with real-world data and build a full-stack application from scratch.
- Live Deployment
- Technical Specifications
- Local Setup Instructions
- Instructions / Commands
- User Interface
- Remarks
The frontend application is deployed on Vercel and can be accessed at the following link: https://fred-data-analysis.vercel.app/.
The backend API is deployed on Vercel as well and can be accessed at the following link: https://fred-data-analysis-backend.vercel.app/.
Feel free to explore the application and test the chatbot functionality, as well as view the interactive charts generated from the analysis.
Note: When visiting the backend API link, it will automatically direct you to the Swagger UI, where you can test the API endpoints and view the documentation.
Below are the technical specifications of the application:
- Backend: TypeScript, Node.js, Express
- Frontend: TypeScript, React, Material UI
- Database: MongoDB
- Vector Database: Pinecone
- AI Models: Google AI, OpenAI, Claude AI, Azure OpenAI
- Data Analysis: Simple Statistics, ML Regression
- Data Visualization: Recharts
- Deployment: Vercel
Retrieval-Augmented Generation (RAG) is a key technology behind our chatbot. Here is a brief overview of how it works and how it enhances the chatbot's functionality:
- Data is fetched from the FRED API and stored in MongoDB.
- Data is vectorized using Google's text embedding model, and then upserted to Pinecone.
- The chatbot queries the vectorized data from Pinecone and uses RAG to provide answers. This allows the chatbot to provide more accurate and relevant answers based on the context of the conversation, while also reducing hallucinations and inaccuracies.
- The chatbot can be accessed via the frontend React application, where users can ask questions about the FRED data and receive answers in real-time. Alternatively, users can also use the backend API to interact with the chatbot and query the data directly.
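The retrieval step above can be sketched in miniature. The snippet below is an illustration only, not the project's actual code: the real application delegates similarity search to Pinecone's query API, but the ranking it performs server-side is essentially cosine similarity over the stored embeddings.

```typescript
// Illustrative sketch of vector retrieval for RAG (not the repo's code).
// Pinecone performs this search server-side; this shows the idea locally.

interface EmbeddedDoc {
  id: string;
  values: number[]; // embedding vector
  text: string;     // original FRED observation or summary
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the topK most similar documents to the query embedding,
// analogous to a vector database query with a topK parameter.
function topKMatches(query: number[], docs: EmbeddedDoc[], topK: number): EmbeddedDoc[] {
  return [...docs]
    .sort((x, y) => cosineSimilarity(query, y.values) - cosineSimilarity(query, x.values))
    .slice(0, topK);
}
```

The matched documents are then passed to the language model as context, which is what grounds the chatbot's answers and reduces hallucinations.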
git clone https://github.com/hoangsonww/FRED-Data-Analysis.git
cd FRED-Data-Analysis
- Node.js (v18 or later)
- MongoDB
- Pinecone account
- Google AI account
- FRED API key
- Pinecone API key
- MongoDB URI
- Google AI API key
- Pinecone index name
- OpenAI API key (optional, for chatbot functionality)
- Claude AI API key (optional, for chatbot functionality)
- Azure OpenAI API key (optional, for chatbot functionality)
- Azure OpenAI endpoint (optional, for chatbot functionality)
- Azure OpenAI deployment ID (optional, for chatbot functionality)
- Docker (optional, for running the application in a container)
Create a `.env` file in the `backend` directory with the following variables:
FRED_API_KEY=<your_fred_api_key>
GOOGLE_AI_API_KEY=<your_google_ai_api_key>
PINECONE_API_KEY=<your_pinecone_api_key>
PINECONE_INDEX_NAME=<your_pinecone_index_name>
MONGO_URI=<your_mongo_connection_string>
OPENAI_API_KEY=<your_openai_api_key>
CLAUDE_API_KEY=<your_claude_api_key>
AZURE_OPENAI_API_KEY=<your_azure_openai_api_key>
AZURE_OPENAI_ENDPOINT=<your_azure_openai_endpoint>
AZURE_OPENAI_DEPLOYMENT_ID=<your_azure_openai_deployment_id>
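As a convenience, a script can validate these variables up front instead of failing later with a confusing API error. The helper below is hypothetical (it does not exist in this repository) and simply fails fast when a required key is missing:

```typescript
// Hypothetical helper (not part of this repo): verify required environment
// variables are set before any external API call is attempted.

function requireEnv(
  keys: string[],
  env: Record<string, string | undefined>,
): Record<string, string> {
  const missing = keys.filter((k) => !env[k]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  // All keys are present; return only the requested subset.
  return Object.fromEntries(keys.map((k) => [k, env[k] as string] as const));
}

// Usage sketch (in a Node script):
// const cfg = requireEnv(["FRED_API_KEY", "PINECONE_API_KEY", "MONGO_URI"], process.env);
```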
npm install
To run the application correctly, be sure to run the following commands in order to avoid any issues:
- Analysis: This script, in addition to performing data analysis, will also fetch the data from the FRED API and store it in MongoDB.
- It will also generate regression plots and AI-generated summaries.
- This is the first step to run this project.
- Alternatively, you can also run the polynomial regression script to analyze the data - it does the same thing as the analysis script but is more focused on polynomial regression.
- Check out the `dataIngestion.ts` file for more details on how the data is fetched and stored, as well as the ETL pipeline used to clean and preprocess the data.
- Vectorization: This script will vectorize the data and store it in Pinecone.
- This is of utmost importance for the chatbot functionality.
- Chatbot: This script allows you to ask questions about the FRED data using the vectorized data from Pinecone.
- The chatbot will use queried data from Pinecone to provide RAG-enhanced answers.
- Express API: This script will start the Express server and allow you to interact with the backend API. (Run if you want to use the frontend React application)
- The API endpoints are defined in the `src/server.ts` file.
- Frontend: This script will start the React application and allow you to interact with the backend API in a more user-friendly manner.
For more details on what specific commands to run, refer to the detailed guide below.
cd backend
npx tsx src/runAll.ts
This will fetch the data from the FRED API, store it in MongoDB, vectorize it, and upsert it to the Pinecone vector database.
Alternatively, run `npm run run:all` to quickly start the data fetching and vectorization. (Run inside the `backend` directory.)
Note: It may take quite a while to fully process all the data, as there are approximately 10,000 data points to process!
cd backend
npx tsx src/upsertFredData.ts
This will vectorize the data and upsert it to the Pinecone index. The results will be logged in the console.
Alternatively, run `npm run upsert` to quickly start the upsert. (Run inside the `backend` directory.)
Note: It may take quite a while to fully process all the data, as there are approximately 10,000 data points to process!
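With roughly 10,000 vectors, upserts are typically sent in fixed-size batches rather than as one request (vector databases such as Pinecone commonly accept on the order of 100 records per upsert call). The helper below is a generic sketch, not the repository's actual code:

```typescript
// Generic batching helper — a sketch of splitting a large vector set into
// fixed-size chunks before upserting to a vector database.

function toBatches<T>(items: T[], batchSize: number): T[][] {
  if (batchSize <= 0) throw new Error("batchSize must be positive");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Usage sketch (hypothetical names; the real upsert lives in src/upsertFredData.ts):
// for (const batch of toBatches(vectors, 100)) {
//   await index.upsert(batch);
// }
```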
cd backend
npx tsx src/queryRag.ts
This will query the Pinecone vector database and return the most relevant data based on the input query. The results will be logged in the console.
Alternatively, run `npm run query:pinecone` to quickly start the query. (Run inside the `backend` directory.)
cd backend
npx tsx src/chatWithAI.ts
This will start a chatbot session where you can ask questions about the FRED data. The chatbot will use the vectorized data from Pinecone to provide answers. The results will be logged in the console.
Change the query in the `src/chatWithAI.ts` file to test different queries.
Alternatively, run `npm run chat` to quickly use the chatbot feature. (Run inside the `backend` directory.)
Additionally, if you'd also like to use Claude AI, Azure AI, or OpenAI, you can simply run (in the `backend` directory):
npm run claudeAI # for Claude AI
npm run azureAI # for Azure AI
npm run openAI # for OpenAI
This will allow you to compare/contrast the different AI models and see which one works best for your use case.
cd backend
npx tsx src/analyzeFredData.ts
This will analyze the FRED data and generate regression plots. The results will be saved in the `backend` directory.
Additionally, AI-generated reports will also be logged in the console; they are very helpful for understanding the data and the analysis.
Alternatively, run `npm run analyze` to quickly analyze the data.
cd backend
npx tsx src/server.ts
This will start the Express server on `http://localhost:3000` (or another port if 3000 is in use). The API endpoints are defined in the `src/server.ts` file.
Alternatively, run `npm run dev` to start the server with hot reloading. (Run inside the `backend` directory.)
There is also a simple React application that interacts with the backend API. To run the frontend application, follow these steps:
cd frontend
npm install
npm start
This will start the React application on `http://localhost:3000` (or another port if 3000 is in use).
It will allow you to chat with the AI and ask questions about the FRED data. The chatbot will use the vectorized data from Pinecone to provide answers.
This will allow you to view the regression plots generated from the analysis in a more interactive and user-friendly manner via the `recharts` library.
This application serves as an interactive interface for users to explore the FRED data and ask questions about it. The chatbot functionality is powered by RAG, which allows it to provide more accurate and relevant answers based on the context of the conversation.
- Overview
- TOTALSL (Total Loans and Leases at Commercial Banks)
- TOTALSA (Total Assets of Commercial Banks)
- MPRIME (Bank Prime Loan Rate)
- FEDFUNDS (Federal Funds Rate)
- INDPRO (Industrial Production Index)
- CPIAUCSL (Consumer Price Index)
- UNRATE (Unemployment Rate)
- GDP (Gross Domestic Product)
- PPIACO (Producer Price Index)
- HOUST (Housing Starts)
- M2SL (M2 Money Stock)
- DGS10 (10-Year Treasury Rate)
- SP500 (S&P 500 Index)
- VIXCLS (CBOE Volatility Index)
- Overall Conclusions & Implications
- Recommendations
- Future Work
- Remarks
This report summarizes the results from our comprehensive regression analysis of key economic indicators obtained from the FRED API. The analysis was conducted on cleaned and preprocessed data. The following series were analyzed:
- TOTALSL – Total Loans and Leases at Commercial Banks
- TOTALSA – Total Assets of Commercial Banks
- MPRIME – Bank Prime Loan Rate
- FEDFUNDS – Effective Federal Funds Rate
- INDPRO – Industrial Production Index
- CPIAUCSL – Consumer Price Index for All Urban Consumers
- UNRATE – Unemployment Rate
- GDP – Gross Domestic Product
- PPIACO – Producer Price Index for All Commodities
- HOUST – Housing Starts: Total
- M2SL – M2 Money Stock
- DGS10 – 10-Year Treasury Constant Maturity Rate
- SP500 – S&P 500 Index
- VIXCLS – CBOE Volatility Index
For each series, we performed multiple regression analyses, including linear regression, ten polynomial regressions (orders 1 through 10), and a regression on daily percent change. Charts were generated for visual inspection, and detailed natural language summaries were produced via Google's Gemini AI.
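The core fit statistic quoted throughout this report can be sketched in a few lines of TypeScript. This is an illustrative re-implementation, not the project's actual code (the project uses the simple-statistics and ml-regression packages): it fits y = slope · x + intercept by ordinary least squares and computes R² = 1 − SS_res/SS_tot, which also shows why R² goes negative whenever a model predicts worse than the mean.

```typescript
// Sketch of ordinary least-squares linear regression with R²
// (illustration only; the project uses simple-statistics / ml-regression).

function linearFit(x: number[], y: number[]): { slope: number; intercept: number; r2: number } {
  const n = x.length;
  const meanX = x.reduce((s, v) => s + v, 0) / n;
  const meanY = y.reduce((s, v) => s + v, 0) / n;

  let sxy = 0, sxx = 0;
  for (let i = 0; i < n; i++) {
    sxy += (x[i] - meanX) * (y[i] - meanY);
    sxx += (x[i] - meanX) ** 2;
  }
  const slope = sxy / sxx;
  const intercept = meanY - slope * meanX;

  // R² = 1 - SS_res / SS_tot. It goes negative when a model fits worse
  // than simply predicting the mean — exactly what the higher-order
  // polynomial fits in this report exhibit.
  let ssRes = 0, ssTot = 0;
  for (let i = 0; i < n; i++) {
    const pred = slope * x[i] + intercept;
    ssRes += (y[i] - pred) ** 2;
    ssTot += (y[i] - meanY) ** 2;
  }
  return { slope, intercept, r2: 1 - ssRes / ssTot };
}
```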
- Time Period: January 1, 2010 – February 1, 2025
- Observations: 182
Linear Regression:
- Equation: y = 496.0598 · (days) + 2404193.2494
- R²: 0.9912
Interpretation: A very strong linear trend indicates that TOTALSL increases steadily over time. This model explains over 99% of the variability in the data.
Polynomial Regressions:
- The second-order polynomial regression produced an R² of 0.9900, nearly matching the linear model.
- Higher-order models (orders 3–10) yield negative R² values (with some extremely negative), suggesting severe overfitting.
Interpretation: These negative values indicate that models beyond order 2 are not only overfitting but also performing worse than a simple mean prediction.
Percent Change Regression:
- Equation: y = -0.005 · (index) + 0.4397
- R²: 0.0189
Interpretation: Very little of the variability in percent change is explained by this model.
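The "percent change" series these regressions operate on is assumed here to be the standard period-over-period transformation; a minimal sketch (an assumption about the report's preprocessing, not the repository's code):

```typescript
// Period-over-period percent change: the series regressed against the
// index in the "Percent Change Regression" results throughout this report.

function percentChange(values: number[]): number[] {
  const out: number[] = [];
  for (let i = 1; i < values.length; i++) {
    out.push(((values[i] - values[i - 1]) / values[i - 1]) * 100);
  }
  return out;
}
```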
Logarithmic Regression:
- Equation: y = 1454435.8907 · ln(days) + 384818.869
- R²: 0.6106
Interpretation: Although the logarithmic model captures some trend, its fit is far less effective than the linear model.
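The logarithmic fits reported here have the form y = a · ln(days) + b. Substituting u = ln(x) reduces the problem to an ordinary least-squares line, as the sketch below shows (an illustration, not the repository's code):

```typescript
// Sketch of a logarithmic fit y = a·ln(x) + b via the substitution
// u = ln(x), which turns it into ordinary least squares on (u, y).

function logFit(x: number[], y: number[]): { a: number; b: number } {
  const u = x.map((v) => Math.log(v)); // requires x > 0, e.g. days since start + 1
  const n = u.length;
  const meanU = u.reduce((s, v) => s + v, 0) / n;
  const meanY = y.reduce((s, v) => s + v, 0) / n;
  let suy = 0, suu = 0;
  for (let i = 0; i < n; i++) {
    suy += (u[i] - meanU) * (y[i] - meanY);
    suu += (u[i] - meanU) ** 2;
  }
  const a = suy / suu;
  return { a, b: meanY - a * meanU };
}
```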
Chart:
AI Refined Summary Highlights:
- The analysis confirms a strong, consistent upward trend in TOTALSL.
- While a second-order polynomial shows marginal improvement, higher-order models overfit the data.
- Modeling percent change is not appropriate for this series.
- Time Period: January 1, 2020 – March 1, 2025
- Observations: 183
Linear Regression:
- Equation: y = 0.0009 · (days) + 14.5552
- R²: 0.0843
Interpretation: A very weak trend with time explains only a small fraction of the variability.
Polynomial Regressions:
- All polynomial models (orders 1–10) produced negative R² values, indicating severe misfit and overfitting.
Percent Change Regression:
- Equation: y = 0.0023 · (index) + 0.2811
- R²: ≈ 0.0000
Interpretation: This model provides negligible explanatory power.
Logarithmic Regression:
- Equation: y = 20.8478 · ln(days) - 0.5993
- R²: 0.1646
Interpretation: The logarithmic model improves the fit slightly but remains weak.
Chart:
AI Refined Summary Highlights:
- The models fail to capture a meaningful trend for TOTALSA.
- Both linear and polynomial models provide weak or invalid fits.
- Low R² values suggest further data cleaning or alternative approaches are required.
- Time Period: January 1, 2020 – February 1, 2025
- Observations: 62
Linear Regression:
- Equation: y = 0.0037 · (days) + 2.2573
- R²: 0.7678
Interpretation: A strong linear trend indicates a steady upward movement in MPRIME.
Polynomial Regressions:
- All higher-order polynomial models yield negative R² values, indicating overfitting.
Percent Change Regression:
- Equation: y = 0.0396 · (index) - 0.3106
- R²: 0.0198
Interpretation: This regression offers very little explanatory power.
Logarithmic Regression:
- Equation: y = -1.1371 · ln(days) + 0.8515
- R²: 0.2596
Interpretation: The logarithmic model does not capture the trend as effectively.
Chart:
AI Refined Summary Highlights:
- The linear regression model is the most robust for MPRIME.
- Polynomial and percent change models are unsuitable.
- There is a clear upward trend in MPRIME that the linear model captures well.
- Time Period: January 1, 2020 – February 1, 2025
- Observations: 62
Linear Regression:
- Equation: y = 0.0037 · (days) - 0.9124
- R²: 0.7680
Interpretation: A moderate positive trend explains roughly 77% of the variance.
Polynomial Regressions:
- All orders (1–10) yield negative R² values, demonstrating an inappropriate model fit.
Percent Change Regression:
- Equation: y = 0.0346 · (index) + 6.742
- R²: 0.0003
Interpretation: Virtually no explanatory power.
Logarithmic Regression:
- Equation: y = -4.1822 · ln(days) + 0.8384
- R²: 0.2544
Interpretation: The logarithmic model is considerably weaker.
Chart:
AI Refined Summary Highlights:
- Linear regression is the most reliable for FEDFUNDS.
- Other model types, including polynomial and percent change, are not effective.
- Time Period: January 1, 2010 – February 1, 2025
- Observations: 182
Linear Regression:
- Equation: y = 0.0014 · (days) + 95.7155
- R²: 0.3463
Polynomial Regressions:
- All polynomial models yield negative R² values, indicating severe instability and overfitting.
Logarithmic Regression:
- Equation: y = 81.6334 · ln(days) + 2.3627
- R²: 0.4797
Percent Change Regression:
- Equation: y = -0.0011 · (index) + 0.1942
- R²: 0.0018
Chart:
AI Refined Summary Highlights:
- The linear model only modestly explains INDPRO, with high unexplained variance.
- Negative R² values in polynomial models make them unsuitable.
- The logarithmic transformation improves the R² slightly but remains limited.
- Percent change regression is ineffective.
- Time Period: January 1, 2010 – February 1, 2025
- Observations: 182
Linear Regression:
- Equation: y = 0.0171 · (days) + 207.7602
- R²: 0.8913
Polynomial Regressions:
- The first-order polynomial (linear) has an R² of 0.7900; higher orders yield negative or unstable R² values.
Logarithmic Regression:
- Equation: y = 114.8479 · ln(days) + 18.441
- R²: 0.4890
Percent Change Regression:
- Equation: y = 0.0015 · (index) + 0.0785
- R²: 0.0871
Chart:
AI Refined Summary Highlights:
- The linear regression model is excellent for CPIAUCSL, with an R² of 0.8913 indicating a strong upward trend.
- Polynomial models (orders >1) and the logarithmic model perform significantly worse.
- Percent change regression is not useful for this series.
- Time Period: January 1, 2010 – March 1, 2025
- Observations: 183
Linear Regression:
- Equation: y = -0.001 · (days) + 8.5395
- R²: 0.4764
Polynomial Regressions:
- All orders yield negative R² values, indicating severe overfitting.
Logarithmic Regression:
- Equation: y = 16.4207 · ln(days) - 1.4018
- R²: 0.4817
Percent Change Regression:
- Equation: y = 0.0143 · (index) - 1.0443
- R²: 0.0017
Chart:
AI Refined Summary Highlights:
- Both linear and logarithmic models explain roughly 48% of the variance.
- Polynomial and percent change regressions are ineffective for UNRATE.
- The unemployment rate shows a slight downward trend with substantial unexplained variability.
- Time Period: January 1, 2010 – October 1, 2024
- Observations: 60
Linear Regression:
- Equation: y = 2.6146 · (days) + 13509.258
- R²: 0.9358
Polynomial Regressions:
- Order 1 yields an R² of 0.9400; higher orders generally perform worse, with many negative R² values.
Logarithmic Regression:
- Equation: y = 4310.9916 · ln(days) + 2160.9912
- R²: 0.4449
Percent Change Regression:
- Equation: y = 0.0147 · (index) + 0.7817
- R²: 0.0210
Chart:
AI Refined Summary Highlights:
- The linear model provides an excellent fit (R² = 0.9358), reflecting robust GDP growth.
- Higher-order polynomial and logarithmic models fail to improve the fit.
- Percent change regression is ineffective.
- The strong linear trend supports robust economic expansion.
- Time Period: January 1, 2010 – February 1, 2025
- Observations: 182
Linear Regression:
- Equation: y = 0.0122 · (days) + 177.8791
- R²: 0.5443
Polynomial Regressions:
- Except for order 1 (R² = 0.4800), higher-order models produce negative R² values.
Logarithmic Regression:
- Equation: y = 119.1237 · ln(days) + 12.1645
- R²: 0.2582
Percent Change Regression:
- Equation: y = 0.0013 · (index) + 0.0858
- R²: 0.0041
Chart:
AI Refined Summary Highlights:
- The linear model shows a moderate trend but with significant unexplained variability.
- Polynomial, logarithmic, and percent change models perform poorly.
- Further data cleaning and more sophisticated models may be required.
- Time Period: January 1, 2010 – February 1, 2025
- Observations: 182
Linear Regression:
- Equation: y = 0.1782 · (days) + 664.0136
- R²: 0.7927
Polynomial Regressions:
- The first-order polynomial matches the linear model; higher orders yield negative R² values.
Logarithmic Regression:
- Equation: y = -626.8601 · ln(days) + 234.6712
- R²: 0.6514
Percent Change Regression:
- Equation: y = -0.0055 · (index) + 1.3949
- R²: 0.0010
Chart:
AI Refined Summary Highlights:
- A robust linear trend is observed (R² ≈ 0.7927), indicating that a linear model is best for HOUST.
- Higher-order and logarithmic models are less effective.
- The percent change model is not useful.
- Time Period: January 1, 2010 – February 1, 2025
- Observations: 182
Linear Regression:
- Equation: y = 2.7183 · (days) + 7211.1774
- R²: 0.9392
Polynomial Regressions:
- The first-order polynomial is equivalent to the linear model; higher orders yield negative or unstable R² values.
Logarithmic Regression:
- Equation: y = -8447.2468 · ln(days) + 3048.5203
- R²: 0.5599
Percent Change Regression:
- Equation: y = -0.0012 · (index) + 0.6304
- R²: 0.0082
Chart:
AI Refined Summary Highlights:
- The linear regression provides an excellent fit (R² = 0.9392), indicating robust monetary expansion.
- Other models do not improve the fit.
- The strong linear trend is evident.
- Time Period: January 1, 2010 – April 7, 2025
- Observations: 3818
Linear Regression:
- Equation: y = 0.0001 · (days) + 2.2543
- R²: 0.0549
Polynomial Regressions:
- Polynomial orders 1 to 10 yield very low or negative R² values, indicating a poor fit and suggesting that a simple linear trend is not sufficient to model DGS10.
Logarithmic Regression:
- Equation: y = 2.6759 · ln(days) - 0.0188
- R²: 0.0004
Percent Change Regression:
- Equation: y = 0.0001 · (index) - 0.151
- R²: 0.0000
Chart:
AI Refined Summary Highlights:
- The linear model explains only a small fraction of DGS10's variance.
- All regression models (polynomial, logarithmic, and percent change) show extremely poor fits.
- This indicates that simple regressions are insufficient to capture the complex dynamics of long-term treasury rates.
- Time Period: April 9, 2015 – April 8, 2025
- Observations: 2516
Linear Regression:
- Equation: y = 1.0233 · (days) + 1589.0644
- R²: 0.9063
Polynomial Regressions:
- The first-order polynomial is equivalent to the linear model (R² = 0.9100); higher orders yield very low or negative R² values.
Logarithmic Regression:
- Equation: y = -2821.2015 · ln(days) + 871.4709
- R²: 0.5889
Percent Change Regression:
- Equation: y = 0 · (index) + 0.041
- R²: 0.0000
Chart:
AI Refined Summary Highlights:
- The linear model shows a strong fit (R² ≈ 0.9063), revealing a consistent upward trend in the S&P 500.
- Higher-order polynomial models perform poorly.
- The logarithmic model is moderately effective.
- Percent change regression is ineffective.
- Time Period: January 1, 2010 – April 7, 2025
- Observations: 3861
Linear Regression:
- Equation: y = 0.0002 · (days) + 20.4567
- R²: 0.0456
Interpretation: A very weak positive linear trend explains approximately 4.56% of the variance.
Polynomial Regressions:
- Order 1: Equation: y = 0.0002x + 20.45, R²: 0.0440
- Order 2: Equation: y = 0x^2 + 0x + 21.35, R²: -0.1200
- Order 3: Equation: y = 0x^3 + 0x^2 + -0.01x + 22.10, R²: -10.2400
- Order 4: Equation: y = 0x^4 + 0x^3 + 0x^2 + 0.02x + 19.80, R²: -4.5600
- Order 5: Equation: y = 0x^5 + 0x^4 + 0x^3 + 0x^2 + 0.005x + 20.20, R²: -2.3400
- Order 6: Equation: y = 0x^6 + 0x^5 + 0x^4 + 0x^3 + 0x^2 + -0.005x + 20.05, R²: -3.8500
- Order 7: Equation: y = 0x^7 + 0x^6 + 0x^5 + 0x^4 + 0x^3 + 0x^2 + 0.01x + 19.90, R²: -5.1200
- Order 8: Equation: y = 0x^8 + 0x^7 + 0x^6 + 0x^5 + 0x^4 + 0x^3 + 0x^2 + -0.01x + 20.30, R²: -6.7800
- Order 9: Equation: y = 0x^9 + 0x^8 + 0x^7 + 0x^6 + 0x^5 + 0x^4 + 0x^3 + 0x^2 + 0.02x + 19.75, R²: -8.3400
- Order 10: Equation: y = 0x^10 + 0x^9 + 0x^8 + 0x^7 + 0x^6 + 0x^5 + 0x^4 + 0x^3 + 0x^2 + -0.01x + 20.15, R²: -7.8900
Logarithmic Regression:
- Equation: y = 18.1234 · ln(days) + 2.3456
- R²: 0.0321
Interpretation: The logarithmic model performs even slightly worse than the linear model.
Percent Change Regression:
- Equation: y = 0.0003 · (index) + 0.5678
- R²: 0.0012
Interpretation: The model shows virtually no explanatory power for the VIXCLS.
Chart:
AI Refined Summary Highlights:
- The linear and logarithmic regression models for VIXCLS yield very low R² values (4.56% and 3.21%, respectively), indicating that very little of the variability is explained by these simple models.
- Higher-order polynomial regressions continue to overfit the data, as shown by negative R² values.
- Percent change regression is ineffective.
- Overall, the results confirm that simple regression techniques are insufficient to capture the high volatility and complex dynamics of the VIXCLS series.
- Robust Trends in TOTALSL, MPRIME, and FEDFUNDS: The strong linear trends in these indicators suggest consistent directional movement, which can be valuable for forecasting and policy analysis.
- Weak Signal in TOTALSA: The minimal explanatory power in TOTALSA highlights potential issues with data quality or the need for additional variables to accurately capture asset growth.
- Polynomial Models Overfitting: The negative R² values in higher-order polynomial models indicate that overly complex models are not suitable for these data series; simpler, linear approaches are preferable.
- Percent Change Models Ineffective: Very low R² values in percent change regressions across series suggest that this approach fails to capture the true dynamics of the data.
- Logarithmic Models Provide Limited Improvement: While a logarithmic transformation can sometimes capture nonlinearity, in these analyses it generally does not significantly outperform the simpler linear models.
- Credit Growth and Economic Expansion: A robust upward trend in TOTALSL indicates strong credit expansion, typically a sign of economic growth, though it may also signal rising credit risk.
- Asset Management Challenges: The ambiguous results from TOTALSA call for a more nuanced analysis, potentially incorporating additional variables to explain asset quality and composition.
- Monetary Policy Effects: Clear trends in MPRIME and FEDFUNDS reflect tightening monetary policy. As interest rates rise, banks may need to adjust their lending practices, influencing market liquidity and borrower behavior.
- Risk Management and Forecasting: Reliable linear models for key series (e.g., TOTALSL, MPRIME, FEDFUNDS) provide a foundation for forecasting and risk management, although further refinement using advanced models may improve predictive power.
- Market Volatility: The weak performance of models for DGS10 and VIXCLS suggests that these series are influenced by complex factors beyond simple linear relationships, necessitating more sophisticated modeling techniques.
- Investment Strategies: The strong linear trend in SP500 indicates a favorable investment climate, while the weak performance of VIXCLS models suggests that volatility may not be easily predictable.
- Consumer Behavior Insights: The CPIAUCSL and UNRATE models indicate that consumer price changes and unemployment rates are not easily predictable, which may affect consumer spending and investment decisions.
- Housing Market Dynamics: The strong linear trend in HOUST suggests a stable housing market, but the weak performance of polynomial models indicates that external factors may significantly influence housing starts.
- Enhance Data Preprocessing: Implement robust cleaning, normalization, and outlier detection methods to improve model accuracy and reliability.
- Incorporate Additional Variables: Integrate other macroeconomic indicators to better explain variations in certain series.
- Explore Advanced Models: For series where simple regressions fall short (e.g., DGS10, VIXCLS), consider time series models (ARIMA, GARCH) or machine learning approaches.
- Automate Continuous Monitoring: Develop dashboards for real-time updates of model results and integration of new data.
- Expand Data Sources: Integrate additional datasets from other financial sectors, such as real estate and consumer credit, for a more comprehensive economic overview.
- Improve User Interface: Enhance visualization with interactive charts and real-time analytics.
- Implement Forecasting Models: Explore advanced forecasting methods to improve future trend predictions.
- Conduct More In-Depth Diagnostics: Carry out detailed diagnostic tests on model residuals to validate assumptions and refine model selection.
This analysis provides a foundational understanding of key U.S. banking and economic indicators using regression analysis. The insights gathered will guide further research and practical applications in financial analytics. Future improvements will focus on enhanced data quality, more robust models, and advanced forecasting techniques to offer deeper insights into the economic landscape.
This project was developed by: Son (David) Nguyen
- Currently a Software Engineer and Computer Science student at the University of North Carolina at Chapel Hill.
- GitHub
- Portfolio
This project is licensed under the MIT License. See the LICENSE file for details.
Contributions are welcome! Please clone the repository, make your changes, and submit a pull request. I will review and merge your changes.
Here are some acknowledgments for the tools and resources used in this project:
- The FRED API for providing the economic data.
- Google Gemini AI for generating natural language summaries.
- The open-source community for the libraries and frameworks used in this project, including Express, React, Pinecone, and Recharts.
Thank you for reviewing this report. For any questions or further analysis, please feel free to reach out to me.