
FD Group Work Project 1 M5

Date: 07/09/2025
1. Lencho Garamu Bokore, Email: [email protected]

2. Hakeem Bin Usman

3. Jawad Sharief. N, Email: [email protected]

Part 1. Assessing Models with Alternative Data


Q1. Data Understanding:
● What types of data are used in the paper to predict stock market movements, and how
are technical indicators derived from this data?

The paper uses daily stock market data from three ETFs — one each for Chile (ECH), Brazil
(EWZ), and the US (IVV). This data includes:

• Open price

• High price

• Low price

• Close price

• Adjusted close price

• Trading volume

They used about a decade of daily historical data, covering 2009–2020.


From this data, they calculated technical indicators using the Pandas TA library in Python.
These indicators are basically formulas applied to the price and volume data. Examples
include:

• Moving Averages (SMA, EMA) — to check price trends over time

• MACD, RSI, Stochastic Oscillator — to measure momentum or strength of price movement

• Bollinger Bands, ATR — for checking volatility

• On-Balance Volume (OBV) — to see how volume affects price

In total, they had over 200 indicators but later selected the top 5% most useful ones by
checking which ones had the most impact on predicting price movements.
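To make this concrete, below is a minimal sketch of deriving a few of these indicators with the Pandas TA library the paper mentions. The file name EWZ.csv and the exact parameter choices are assumptions, not taken from the paper; any OHLCV dataset downloaded from Yahoo Finance would work the same way.

python

import pandas as pd
import pandas_ta as ta

# Assumed input: a Yahoo Finance CSV with Date, Open, High, Low, Close, Volume
df = pd.read_csv('EWZ.csv', index_col='Date', parse_dates=True)

# Each call applies a formula to price/volume data and appends the result as a column
df.ta.sma(length=20, append=True)     # trend: 20-day simple moving average
df.ta.rsi(length=14, append=True)     # momentum: 14-day RSI
df.ta.macd(append=True)               # momentum: MACD line, signal, histogram
df.ta.bbands(length=20, append=True)  # volatility: Bollinger Bands
df.ta.obv(append=True)                # volume: On-Balance Volume

print(df.columns.tolist())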

● Importance of using such indicators in forecasting stock price trends:

• Technical indicators help simplify market data.


Stock prices alone move up and down a lot, making it hard to see clear patterns.
Indicators smooth out the data and highlight possible trends.

• They capture market behaviour.


Indicators like RSI or MACD help figure out if a stock is overbought or oversold —
which can hint at future price changes.

• Good indicators make machine learning models work better.


If you directly feed raw price data to a model, it won’t learn much. But if you give it
meaningful indicators, the model understands the patterns better.

• They also help reduce the amount of data needed.


By selecting only the best indicators, the model runs faster and still gives better
results. In this paper, they actually improved prediction accuracy after reducing the
number of indicators.

The study used normal daily stock data, applied technical formulas to create indicators,
and picked the best ones to train their prediction model. These indicators make it easier for
the model to understand price trends and predict market movements more accurately.

Technical Indicators Calculation (e.g., RSI, MACD)

• Purpose: Show how indicators are derived from raw price data.

• Relevant Code:
python

# RSI Calculation
delta = data['Close'].diff()

gain = delta.clip(lower=0)

loss = -delta.clip(upper=0)

# 14-day rolling averages of gains and losses (the standard RSI window)
avg_gain = gain.rolling(window=14).mean()
avg_loss = loss.rolling(window=14).mean()

rsi = 100 - (100 / (1 + (avg_gain / avg_loss)))

Q2. Security Understanding

iShares MSCI Brazil ETF (EWZ) — Overview

Type of Asset:
EWZ is an Exchange-Traded Fund (ETF) that invests in large and mid-sized companies from
Brazil. It mainly gives exposure to Brazilian stocks across sectors like banking, oil & gas,
mining, and consumer goods. The fund is managed by BlackRock and tracks the MSCI
Brazil 25/50 Index.

Launched in: July 2000


Traded on: NYSE (in US Dollars)

Price History (2009–2020):

• After the 2008 crash, EWZ bounced back strongly in 2009–2010, going close to $80.

• From 2011–2015, it kept falling due to Brazil’s economic and political problems,
dropping below $30.

• It recovered a bit between 2016–2019, trading between $35–$45.

• In 2020, with COVID-19, it dropped again near $25–$30.

So, overall, the fund saw a lot of ups and downs, showing how risky emerging markets like
Brazil can be.

Key Stats:
• Very Volatile — Prices fluctuate a lot because of political issues, commodity prices,
and global market trends.

• Dividend Yield: Around 2–4%, which is decent.

• Top Holdings during study period: Petrobras, Vale, Itau Bank, Ambev

• Expense Ratio: About 0.59% (so it’s a bit on the higher side for ETFs)

Why Did the Authors Choose Classification Instead of Regression?

The authors were more interested in predicting whether the price will go up or down, not
the exact price.
This is why they used classification — it’s easier for decision-making in trading.
Also, predicting price direction is usually more reliable than predicting exact prices, which
are affected by too many factors.

Other Ways They Could Have Set the Classification Target (a coding sketch follows this list):

1. Percentage Change Based:

o Up (+1): If price increases by more than 2%

o Down (-1): If price falls by more than 2%

o Neutral (0): If price change is within ±2%

2. Based on Trend Days:

o Uptrend: If price goes up for 3 days in a row

o Downtrend: If price falls for 3 days in a row

o No trend: If mixed movement
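
A rough sketch of how these two alternative targets could be coded, assuming a DataFrame df with a Close column (the ±2% threshold and 3-day window come straight from the list above):

python

import numpy as np

returns = df['Close'].pct_change()

# Target 1: percentage-change based (+1 / -1 / 0 around a ±2% band)
pct_target = np.select([returns > 0.02, returns < -0.02], [1, -1], default=0)

# Target 2: trend-days based (three consecutive up or down days)
up3 = (returns > 0).rolling(3).sum() == 3
down3 = (returns < 0).rolling(3).sum() == 3
trend_target = np.select([up3, down3], [1, -1], default=0)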


Q3. Methodology Understanding

Section 2 — Data

Instead of mixing data and methods in one section, we can separate the Data part as its
own section.
This section will mainly cover how the data was collected, prepared, and processed.

Subcategories under Data:

2.1 Data Collection

o Talks about where the data came from (like Yahoo Finance).

o Covers the ETFs used (ECH, EWZ, IVV) and the period of data (2009–2020).

2.2 Technical Indicators

o Explains how they used Pandas TA library to create over 200 technical
indicators.

o Some examples: RSI, MACD, Moving Averages, ATR.

2.3 Data Processing

o This is about cleaning the data, handling missing values, and scaling.

o Also includes how they prepared the data for machine learning (like splitting
into training/testing sets).

Section 3 — Methodology

This section will explain the methods they used for analysis and prediction.

Subcategories under Methodology:


3.1 Descriptive Statistics and Correlation Analysis

o This part talks about finding relationships between indicators and stock
movement using Pearson correlation.

o Helps to get a basic idea before using any model.

3.2 Feature Selection Using LASSO

o Here they used LASSO regression to pick the most important technical
indicators.

o LASSO automatically reduces the number of indicators by giving zero weight to useless ones.

3.3 Optimization of Technical Indicators

o After selecting with LASSO, they further checked which indicators actually
helped improve prediction accuracy.

o Useless or duplicate indicators were removed here.

3.4 Neural Network Model (MLP)

o They trained a neural network (Multilayer Perceptron) using only the selected
indicators.

o The model was used to classify stock price movement as “up” or “down.”

How to Separate Descriptive Stats from Models?

• Descriptive Stats (like Pearson correlation): These help you understand the data —
they don’t predict anything but tell you how variables are related.

• Models (like LASSO and Neural Network): These are used for making predictions
after the initial data understanding is done.

So basically, correlation comes under data understanding, while LASSO and neural
networks are part of the modeling approach.

Optimization of Technical Indicators — Explained in Simple Words:

• First, they created over 200 indicators from price and volume data.
• Then they used correlation to check which ones seem useful.

• After that, they applied LASSO regression, which automatically filtered out the less
useful indicators.

• Finally, they tested the selected indicators with the neural network to see if they
really helped improve the model’s accuracy.

Why was this Optimization Important?

• If you give too many useless indicators to the neural network, it won’t learn properly
and may overfit.

• Reducing the number of indicators helps the model to focus on meaningful data,
improves prediction, and makes the training faster.

• It also saves computing power and gives better results in the real world.

1. Feature Selection (LASSO)

o Purpose: Optimize technical indicators.

o Code:

python

from sklearn.linear_model import LassoCV

# LassoCV picks the regularization strength via 5-fold cross-validation
lasso = LassoCV(cv=5).fit(X, y)

selected_features = X.columns[lasso.coef_ != 0]

2. Cross-Validation (k-Fold)

o Purpose: Evaluate model robustness.

o Code:

python

from sklearn.model_selection import KFold

kf = KFold(n_splits=5)

# `model` is any scikit-learn estimator; use .iloc because X and y are pandas objects
accuracies = [model.fit(X.iloc[train], y.iloc[train]).score(X.iloc[test], y.iloc[test])
              for train, test in kf.split(X)]

3. Correlation Heatmap

o Purpose: Show relationships between features.


o Code:

python

import seaborn as sns

# Annotated correlation matrix across the feature columns
sns.heatmap(features.corr(), annot=True)

Q4. Feature Understanding

What does the paper consider a feature?

In this paper, a feature means a technical indicator calculated from stock market data like
Open, High, Low, Close prices, and Volume.
These indicators are used as input for the machine learning model to predict whether the
stock price will move up or down.

Some examples of features are:

• RSI (Relative Strength Index)

• MACD (Moving Average Convergence Divergence)

• Moving Averages (SMA, EMA)

• Bollinger Bands

All these are considered features in the study.

How do you distinguish a feature from a method or a model?

• A feature is a data point or variable given to the model for making predictions, like
RSI or trading volume.

• A method is a process or technique used for analyzing data, selecting features, or
preparing data. For example, LASSO regression is a method used for selecting
useful features.

• A model is the system or algorithm used for training and making predictions. In this
paper, they used a Neural Network (MLP) as the model.

So,
• Feature means input data

• Method means the tool or process applied on data

• Model means the prediction system we build and train

Categories of Features You Have Learned

From this paper, I have understood that technical indicators (features) can be divided into
categories like:

1. Trend Indicators – used to check the direction of the market


Example: Moving Averages, MACD

2. Momentum Indicators – used to measure the speed or strength of price movement


Example: RSI, Stochastic Oscillator

3. Volatility Indicators – used to check how much the price is fluctuating


Example: ATR, Bollinger Bands

4. Volume Indicators – used to check how the trading volume is linked with price
changes
Example: On-Balance Volume (OBV)

How Did the Authors Optimize the Technical Indicators?

The authors did not use all 200-plus indicators directly in the model. They followed a proper
step-by-step approach to pick only the important ones.

1. First, they checked the correlation between each indicator and the stock price
movement. This helped them remove indicators that had no real connection.

2. Then, they used LASSO regression, which automatically gives zero weight to less
useful indicators and keeps only the strong ones.

3. Finally, they selected only about 5 percent of the indicators, meaning just the top
ones that actually helped improve prediction (a sketch of this two-stage filtering follows).
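
A hedged sketch of that two-stage filtering, assuming X holds the indicator columns and y the up/down labels; the 0.05 correlation cutoff is illustrative, not a value from the paper:

python

import pandas as pd
from sklearn.linear_model import LassoCV

# Stage 1: correlation screen, dropping indicators with almost no linear link to the target
corr = X.corrwith(pd.Series(y, index=X.index))
X_screened = X.loc[:, corr.abs() > 0.05]  # illustrative cutoff

# Stage 2: LASSO zeroes out the weaker indicators among the survivors
lasso = LassoCV(cv=5).fit(X_screened, y)
top_indicators = X_screened.columns[lasso.coef_ != 0]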

Why is this Optimization Important?


• If you use too many unnecessary indicators, the model can get confused and may
not perform well. This is known as overfitting.

• By selecting only the most useful indicators, the model learns better and gives more
accurate predictions.

• It also saves time and computing power as the model processes less data.

• Overall, this makes the system more efficient and practical for real-life stock market
predictions.

1. Feature Importance Plot

o Purpose: Highlight which indicators matter most.

o Code:

python

import matplotlib.pyplot as plt

# Bar chart of the non-zero LASSO coefficients for the selected features
plt.bar(selected_features, lasso.coef_[lasso.coef_ != 0])
plt.show()

2. Technical Indicators Chart

o Purpose: Compare raw vs. optimized indicators.

o Code: Same as Q1’s RSI plot, but with tuned parameters (e.g., RSI
window=10 vs. 14).

Q5. Optimization Understanding

What is Cross-Validation?

Cross-validation is a method used to check whether a machine learning model is working properly.
Instead of training the model once and testing it on the same data, we split the data
into parts — we train on some parts and test on the others.
This way, we can see if the model can perform well on data it has never seen before.
It helps us avoid overfitting, where the model only works well on training data but
fails on real data.
What is k-Fold Cross-Validation?

In k-fold cross-validation, we divide the data into k equal parts (called folds).
We then train the model on (k-1) parts and test it on the remaining one.
This process is repeated k times, each time using a different part for testing.
At the end, we take the average of all the results to get the final performance of the model.

For example, if k = 5, we split the data into 5 parts and repeat the process 5 times.

This method gives a more reliable result because the model is tested on every part of the
data once.
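
A quick sketch of this with scikit-learn, assuming a feature matrix X and labels y, and using an MLP classifier like the one in the paper:

python

from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

# 5-fold cross-validation: each fold takes one turn as the test set
scores = cross_val_score(MLPClassifier(max_iter=500), X, y, cv=5)
print(scores, scores.mean())  # the per-fold results and their average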

What is the Jaccard Distance?

The Jaccard distance is a way to measure how different two sets are.
It checks how many elements are common between the two sets and how many are
different.

• If the sets are exactly the same, the Jaccard distance is 0 (which means no
difference).

• If the sets have nothing in common, the distance is 1.

For example:
Set A = {1, 2, 3}, Set B = {2, 3, 4}
Common elements = {2, 3}
Total unique elements = {1, 2, 3, 4}
Jaccard Index = 2/4 = 0.5
So, Jaccard Distance = 1 - 0.5 = 0.5

Comparing Jaccard Distance with Other Distance Metrics

1. Euclidean Distance

o It measures the straight-line distance between two points.

o Mostly used for numbers and continuous data.

o For example, the distance between two points on a graph.

2. Hamming Distance
o It checks how many positions are different between two strings or
sequences.

o Mostly used for binary strings or categorical data.

o For example, "1010" and "1001" have a Hamming distance of 2.

Jaccard Distance, on the other hand, is used for sets.
It looks at how much overlap there is between two sets, not numbers or positions.
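
The examples above, recomputed in a short sketch:

python

import numpy as np

# Jaccard distance between A = {1, 2, 3} and B = {2, 3, 4}
A, B = {1, 2, 3}, {2, 3, 4}
jaccard_distance = 1 - len(A & B) / len(A | B)  # 1 - 2/4 = 0.5

# Euclidean distance: straight-line distance between two points
p, q = np.array([0.0, 0.0]), np.array([3.0, 4.0])
euclidean = np.linalg.norm(p - q)  # 5.0

# Hamming distance between "1010" and "1001"
hamming = sum(a != b for a, b in zip("1010", "1001"))  # 2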

How Do the Authors Define an Optimal Solution?

In this paper, the authors say that an optimal solution is when:

• The selected technical indicators help the model make the most accurate
predictions.

• The model doesn’t overfit and works well even on data it hasn’t seen before.

• The number of indicators and the size of the model are small enough so it runs fast
and doesn’t need too much computer power.

So basically, the best solution for them is a model that is both accurate and efficient — it
should predict well without wasting resources.

1. Jaccard Distance Calculation

o Purpose: Measure dissimilarity between predicted/actual labels.

o Code:

python

from sklearn.metrics import jaccard_score

# Jaccard distance = 1 - Jaccard similarity between predicted and actual labels
jaccard_distance = 1 - jaccard_score(y_true, y_pred)

2. Cross-Validation Results Table

o Purpose: Compare performance across folds.

o Example Table:

Fold   Accuracy
1      0.62

Step 1 — Financial Problem

What is the financial problem the authors want to solve?

The main financial problem the authors are trying to solve is:
How to predict the direction of stock market movements in emerging markets using
technical indicators and machine learning.

In simple words, they want to build a model that can tell whether the market will go up or
down, which is useful for traders and investors to make better decisions.

The challenge they are addressing is that emerging markets are more volatile and
unpredictable compared to developed markets. So, they want to see if technical indicators,
when selected properly, can improve the prediction results in such markets.

How is predicting stock movements in emerging markets different from developed markets?

Predicting in emerging markets is different because:

• Higher Volatility:
Emerging markets usually have more ups and downs because of political issues,
unstable economies, currency risks, and sudden changes.

• Less Historical Data:


Developed markets like the US have long-term, stable data records. But emerging
markets may not have good quality or long history of data.

• Market Behaviour is Different:


Developed markets are more efficient — meaning prices reflect news quickly.
Emerging markets may react slowly or behave differently because of local investor
behavior, regulations, or less transparency.
• Liquidity Issues:
Sometimes, emerging markets have low trading volumes, which makes prices
fluctuate more sharply.

Why is this difference important for the model design?

Because of these factors, a model designed for developed markets may not work well in
emerging markets.
The authors needed a model that can:

• Handle more noise and sudden changes in data

• Work with smaller or less clean datasets

• Avoid overfitting because the patterns in emerging markets may not be stable over
time

That is why they focused on selecting the best technical indicators and using methods like
LASSO and cross-validation — to make sure the model works well even in uncertain or less
predictable market conditions.

Step 2 — Application

Main Takeaways from the Results:

• The authors found that selecting a smaller number of good technical indicators
gave better results than using all of them.
After applying LASSO, they reduced the features to about 5% of the total, and this
helped improve the model’s prediction accuracy.

• The neural network model (MLP) performed better when it was trained with these
optimized indicators rather than with the full set of over 200 indicators.
This showed that quality of features is more important than quantity.

• The accuracy of predicting the stock market direction improved by around 2% after
optimization.
This may look small, but in financial markets, even a small improvement can be a
big advantage.
• The study confirmed that with proper feature selection, machine learning models
can be effectively applied to emerging markets, even though these markets are
more volatile.

• The models also showed consistent performance across different markets (Chile,
Brazil), proving that the method was reliable.

Which Features Seemed Most Useful in the Study?

Though the paper doesn’t give a full list of all selected features, it highlights that certain
types of indicators were more useful:

• Trend Indicators like Moving Averages (SMA, EMA) — because they help track
market direction.

• Momentum Indicators like RSI and MACD — since they measure the strength of
price movements and can signal reversals.

• Volatility Indicators like ATR and Bollinger Bands — these helped capture sudden
market changes, which are common in emerging markets.

• The study also showed that volume-based indicators were sometimes useful, but
not as consistently as trend and momentum indicators.

Step 3 — Replication Plan

1. Pick a Fund:

Let’s pick EWZ (iShares MSCI Brazil ETF) since it’s one of the main ones used in the paper
and has good data available.

2. Download the Data:

You can download EWZ data easily from Yahoo Finance.


Link: https://finance.yahoo.com/quote/EWZ/history

Choose 10 years of daily data (e.g., 2014–2024) and download it as a CSV file.

3. Pick a Simple Metric — Pearson Correlation

We will calculate Pearson Correlation between the technical indicators and the daily
stock returns.
This is much easier than LASSO but still meaningful.

4. Implement k-Fold Cross-Validation

We can use 5-Fold Cross-Validation — this means splitting the dataset into 5 equal parts,
training the model on 4 parts, and testing it on the remaining 1 part, repeating this process
5 times.

We can do this using Python’s scikit-learn library.

5. Reproduce the Table (Example Output)

After calculating correlations in each fold, we can create a table like this:

Indicator    Fold 1   Fold 2   Fold 3   Fold 4   Fold 5   Average

RSI           0.12     0.10     0.11     0.09     0.13     0.11

MACD         -0.08    -0.07    -0.06    -0.09    -0.07    -0.07

SMA_20        0.15     0.14     0.16     0.13     0.14     0.14

ATR           0.04     0.05     0.03     0.04     0.04     0.04

(This is a sample table. You will fill it with actual values after running your code.)

6. Reproduce the Graphs

• You can plot scatter plots or line graphs showing the relationship between the
indicator and returns.

• Also, plot correlation heatmaps using seaborn or matplotlib.

Example:

• Line chart of RSI vs. returns over time

• Heatmap of correlation between all indicators and returns


7. Code Sample (Python — Simple Pearson Correlation with Cross-Validation)

python


import pandas as pd

import numpy as np

from sklearn.model_selection import KFold

import seaborn as sns

import matplotlib.pyplot as plt

# Load your EWZ data

df = pd.read_csv('EWZ.csv')

# Example Technical Indicator - RSI (You can add more)

import ta

df['RSI'] = ta.momentum.rsi(df['Close'], window=14)

# Calculate daily returns

df['Return'] = df['Close'].pct_change()

# Drop NA values

df.dropna(inplace=True)

# 5-Fold Cross Validation

kf = KFold(n_splits=5, shuffle=True, random_state=1)

results = []
for train_index, test_index in kf.split(df):
    train_data = df.iloc[train_index]
    corr = train_data['RSI'].corr(train_data['Return'])
    results.append(corr)

# Display average correlation

print("Correlations per Fold:", results)

print("Average Correlation:", np.mean(results))

# Plotting

sns.heatmap(df[['RSI', 'Return']].corr(), annot=True)

plt.title('Correlation Heatmap')

plt.show()

8. Next Steps After Running the Code:

• Fill the table with the correlation values you get.

• Take screenshots of the graphs and include them in your report.

• Write a short analysis — Example: "RSI showed a weak positive correlation with
returns, indicating limited predictive power."

Comprehensive User Guide to Geolocation (Foot-Traffic) Data for Business and
Financial Analysis

1. Sources of Foot-Traffic Data

Foot-traffic or geolocation data refers to information collected about people’s movement in physical spaces. This data is mostly collected through the following sources:

1.1 Mobile Applications

• Many smartphone apps collect location data when users give permission.

• These apps can be shopping apps, food delivery platforms, fitness trackers, gaming
apps, or navigation services.

• Data aggregators like SafeGraph, Unacast, and Placer.ai buy anonymized data
from these apps.

1.2 Wi-Fi and Bluetooth Beacons

• Large public places like malls, airports, and retail stores install Wi-Fi access points
and Bluetooth beacons.
• These devices detect nearby smartphones (even without active connections) and
capture foot-traffic patterns.

1.3 Telecom Service Providers

• Operators like Airtel, Jio, and Vi can estimate users’ location based on mobile tower
triangulation.

• Typically, this data is aggregated and anonymized before being shared for analytical
purposes.

1.4 GPS Device Providers

• Devices like car GPS systems, delivery fleet trackers, and IoT devices also collect
location data that can be analyzed for business intelligence.

1.5 Third-Party Data Vendors

• Companies specialize in gathering and selling foot-traffic data in packaged, ready-to-use formats. These include data on visits to commercial properties, event spaces, or city zones.

2. Types of Data Available

2.1 Raw GPS Data

• Contains device IDs (anonymized), latitude and longitude coordinates, timestamps, and sometimes altitude.

• Usually collected at regular intervals (for example, every 5 minutes).

2.2 Aggregated Visit Counts

• Summarized information showing how many unique devices entered a certain location during a specific time window.

• Often reported daily, weekly, or monthly.

2.3 Dwell Time Metrics

• Measures how long a person stayed within a particular area.

• Useful for analyzing customer engagement in retail or event attendance.
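
A minimal sketch of deriving dwell time from raw pings, assuming a hypothetical DataFrame pings with device_id, poi_id, and timestamp columns:

python

# Dwell time per device per location: last ping minus first ping
dwell = (pings.groupby(['device_id', 'poi_id'])['timestamp']
              .agg(lambda t: t.max() - t.min())
              .rename('dwell_time')
              .reset_index())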

2.4 Movement and Flow Data


• Tracks routes people take from one location to another.

• Helps analyze traffic patterns, shopping behavior, and commuting trends.

2.5 Heatmaps and Density Maps

• Visual representation of crowded areas based on GPS pings.

• Often used in urban planning, retail site selection, and public event management.

3. Data Quality Considerations

3.1 Coverage Bias

• Data primarily represents smartphone users with location services enabled.

• May underrepresent rural populations or lower-income groups who use feature phones.

3.2 GPS Accuracy Issues

• Urban areas with high-rise buildings (urban canyons) may cause GPS inaccuracies
of 5 to 50 meters.

• Indoor locations may also affect signal strength and accuracy.

3.3 Sampling Errors

• Not every person in a location is captured; sampling might be skewed by app usage
patterns or device settings.

3.4 Data Gaps and Missing Values

• Some areas may have low data coverage due to fewer app users.

• Gaps in time series data are common and need to be handled during analysis.

3.5 Data Freshness

• Depending on the provider, data may be near real-time or lagged by several days.

• Businesses need to clarify this with the data vendor.

4. Ethical and Legal Issues

4.1 User Privacy and Anonymity


• Data must be anonymized — no personal identifiers like phone numbers or names
should be included.

• Use aggregated metrics wherever possible to avoid tracking individual behavior.

4.2 Consent and Transparency

• Data collection must comply with consent norms.

• Under India’s Digital Personal Data Protection Act (DPDPA) 2023, explicit consent
is required for collecting and processing personal data.

4.3 Data Handling Best Practices

• Use data only for legitimate business purposes.

• Avoid discriminatory practices based on location data insights.

• Implement strict data security measures to prevent misuse.

4.4 Compliance with Global Laws

• If dealing with international data, ensure compliance with GDPR (Europe) and CCPA
(California).

5. How to Import and Structure Data Using Python

Assuming a CSV file containing location pings:

python


import pandas as pd

import geopandas as gpd

# Load sample foot-traffic data

df = pd.read_csv('foot_traffic_sample.csv', parse_dates=['timestamp'])

# Drop rows with missing location data

df = df.dropna(subset=['lat', 'lon'])

# Create a GeoDataFrame for spatial operations

gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df.lon, df.lat))

gdf.set_crs(epsg=4326, inplace=True)

# Load Points of Interest (POI) data - e.g., shop locations

poi = gpd.read_file('poi_data.geojson')

# Spatial join to assign foot-traffic points to POIs
# (newer GeoPandas versions use predicate=; older ones used op=)
visits = gpd.sjoin(gdf, poi, predicate='within')

# Group by POI and date to count unique visitors per day

daily_traffic = (visits.groupby(['poi_id', visits.timestamp.dt.date])
                       .agg({'device_id': 'nunique'})
                       .reset_index())

daily_traffic.rename(columns={'device_id': 'unique_visitors'}, inplace=True)

This gives a clean, structured dataset for further analysis.

6. Exploratory Data Analysis (EDA) Examples

6.1 Trend Analysis of Daily Footfall

python


import matplotlib.pyplot as plt

# Filter for a specific POI

shop_visits = daily_traffic[daily_traffic['poi_id'] == 'shop_001']


# Plotting the footfall trend

plt.figure(figsize=(10,5))

plt.plot(shop_visits['timestamp'], shop_visits['unique_visitors'], marker='o')

plt.title('Daily Foot-Traffic for Shop 001')

plt.xlabel('Date')

plt.ylabel('Number of Unique Visitors')

plt.grid(True)

plt.show()

6.2 Time-of-Day and Day-of-Week Patterns

You can group data by time of day or day of week to identify peak hours or popular shopping
days.
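
A short sketch of that grouping, reusing the visits table built in Section 5:

python

# Unique visitors by hour of day and by day of week
visits['hour'] = visits['timestamp'].dt.hour
visits['weekday'] = visits['timestamp'].dt.day_name()

hourly = visits.groupby('hour')['device_id'].nunique()
weekly = visits.groupby('weekday')['device_id'].nunique()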

6.3 Comparative Analysis

Compare footfall across multiple locations or events to assess performance or crowd behavior.
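
One way to sketch this is to pivot the daily_traffic table from Section 5 so each POI becomes its own column:

python

import matplotlib.pyplot as plt

pivot = daily_traffic.pivot(index='timestamp', columns='poi_id', values='unique_visitors')
pivot.plot(figsize=(10, 5), title='Daily footfall comparison across locations')
plt.show()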

6.4 Correlation with Sales Data

If you have access to sales figures, you can explore correlations between foot-traffic and
sales volume.
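
A hypothetical sketch, assuming you supply a sales_df with poi_id, timestamp, and daily_sales columns (none of these come from the guide itself):

python

merged = daily_traffic.merge(sales_df, on=['poi_id', 'timestamp'])
print(merged['unique_visitors'].corr(merged['daily_sales']))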

7. Research References and Applications

Academic and Industry Studies

• Óskarsdóttir et al. (2020) — Used mobile phone data for credit risk assessment,
showing how behavioral patterns can predict financial reliability.

• Roa et al. (2020) — Studied app behavior to enhance credit scoring models.

• Muñoz-Cancino et al. (2021) — Applied graph learning techniques on behavioral data for understanding consumer networks.

Application Areas in India

• Retail Chain Expansion — Understanding foot-traffic before opening new outlets.


• Real Estate Valuation — Assessing property value based on surrounding footfall.

• Event Management — Measuring success of events like expos, trade fairs, and
public gatherings.

• Urban Planning — Informing infrastructure projects and transportation planning based on movement data.

Summary and Key Takeaways

Geolocation and foot-traffic data are increasingly valuable for business analytics, retail
planning, financial services, and public policy. However:

• The data must be handled carefully with regard to privacy, consent, and data
protection laws.

• Understand and account for limitations like sampling bias, GPS errors, and
incomplete coverage.

• Combine this data with other datasets for richer insights and validation.

Proper analysis of foot-traffic data can offer a strong competitive edge in today’s data-driven market, provided it is used responsibly and ethically.
