SNA Module 1-6
2. Organizational Analysis
Application: Analyzes employee interactions to improve communication
and collaboration.
Example: A company uses SNA to detect isolated departments and
restructure teams to enhance knowledge sharing.
6. Recommender Systems
Application: Improves suggestions by analyzing user connections and
preferences.
Example: LinkedIn recommends connections and job opportunities
based on your professional network using SNA.
7. Knowledge Management
Application: Identifies experts and knowledge hubs within
organizations.
Example: SNA is used in research institutions to find influential
scientists based on co-authorship and citation networks.
8. Political Analysis
Application: Maps relationships between politicians, voters, and issues.
Example: During elections, SNA analyzes Twitter conversations to
understand public opinion and influence networks.
Conclusion:
SNA is a powerful tool for understanding relationships, identifying key actors,
optimizing processes, and enhancing decision-making across fields such as
business, healthcare, education, law enforcement, and social media.
17 Explain web-based networks' features and point out how they vary from
conventional social networks. Discuss how web-based networks affect
connectivity and information flow. CO1 10
Features of Web-Based Networks:
1. Global Accessibility: Users can connect and interact from any location
with internet access.
2. Asynchronous Communication: Interactions don’t require real-time
presence (e.g., emails, forum posts).
3. Hyperlinking: Connections are created through hyperlinks, enabling
non-linear navigation of content.
4. Scalability: Web networks can support millions of nodes and
interactions simultaneously.
5. Multimedia Integration: Support for text, images, videos, and
interactive content enhances communication richness.
6. Search and Discovery Tools: Efficient search engines and algorithms
help users navigate massive information structures.
7. User-Generated Content: Individuals can contribute content (blogs,
comments, posts), making the network dynamic and evolving.
8. Data Traceability: Interactions and information flow can be logged,
monitored, and analyzed.
9. Open Standards and APIs: Facilitate interoperability between
platforms and services (e.g., using social logins).
10. Algorithmic Mediation: Content visibility and connections are often
shaped by algorithms (e.g., news feed personalization).
Conclusion:
Web-based networks expand the scope of human interaction and knowledge
sharing by enabling large-scale, content-driven, and asynchronous connections.
They differ from traditional social networks in structure and dynamics, leading
to new forms of connectivity and more complex, faster patterns of information
flow.
18 Apply the concepts of centrality and solve the following: CO1 10
To analyze the given directed graph using centrality concepts (Degree,
Betweenness, and Closeness), we first list all the connections between the nodes.
Conclusion:
Node A: High degree and closeness centrality.
Node C: High betweenness and good closeness.
Node B: Least central—only one incoming link.
Node D: Moderate centrality, mostly connected to/from C.
Node E: Moderate, supports information flow through A and C.
19 Give two specific instances of real-world applications of social network
analysis. Discuss the lessons learned from these applications and how they
affect problem-solving or decision-making. CO1 10
Here are two specific real-world applications of Social Network Analysis
(SNA), along with the lessons learned and their impact on decision-making:
Conclusion:
SNA provides actionable insights in various fields—from public safety to
corporate efficiency. It reveals hidden patterns and supports strategic
decisions by identifying key players and optimizing information flow. The main
takeaway is that understanding relationships, not just individuals, is essential for
effective problem-solving.
20 What are the main benefits of employing Social Network Analysis (SNA) in
both practical and research settings? Discuss in depth two important
restrictions or difficulties related to its application. CO1 10
Main Benefits of Employing Social Network Analysis (SNA):
1. Reveals Hidden Patterns and Structures:
SNA uncovers invisible relationships and communication flows within a
group, organization, or system, which are not evident through traditional
analysis.
2. Identifies Key Actors and Influencers:
It helps locate central individuals (high degree or betweenness centrality)
who are vital for information dissemination, decision-making, or
controlling influence.
3. Enhances Decision-Making and Strategy:
Businesses and governments use SNA to guide actions—such as
targeting key opinion leaders in marketing, or restructuring teams for
better collaboration.
4. Improves Resource Allocation:
SNA allows for targeted interventions (e.g., in healthcare or security) by
focusing on the most influential or connected nodes, saving time and
costs.
5. Supports Predictive Analysis:
By understanding how information or behavior spreads, organizations
can anticipate outcomes like disease outbreaks, product adoption, or even
organizational failure.
Conclusion:
SNA has evolved from hand-drawn sociograms to AI-powered network
algorithms analyzing millions of connections in real time. With key figures like
Moreno and White shaping its foundation, the field now plays a critical role in
understanding and navigating complex social systems in a hyper-connected
world.
22 Why are communities essential in today’s interconnected world? Discuss their
role in fostering collaboration, knowledge exchange, and social cohesion. CO2 2
Communities are essential in today’s interconnected world because they foster
collaboration, enable knowledge exchange, and build social cohesion among
individuals with shared interests or goals. They provide a platform for collective
problem-solving, mutual support, and the spread of innovation, strengthening
both online and offline social ties in a rapidly globalizing society.
23 Differentiate between Data Mining and Social Media Mining, highlighting their
objectives, methodologies, and applications. CO2 2
Data Mining is the process of discovering patterns, trends, and relationships in
large datasets using statistical, machine learning, and database techniques.
Social Media Mining is a specialized branch of data mining focused on
extracting insights specifically from social media platforms (e.g., Twitter,
Facebook), incorporating social context and network structure.
Aspect | Data Mining | Social Media Mining
Objective | Extract general patterns from any data source | Understand user behavior, influence, and trends
Methodology | Statistical analysis, clustering, classification | NLP, sentiment analysis, social network analysis
Applications | Fraud detection, market analysis, healthcare | Brand monitoring, political trend tracking, user profiling
24 Explain Graph Modularity. CO2 2
Graph Modularity is a measure used in network analysis to quantify the
strength of division of a network into communities or modules. It compares the
density of edges inside communities with the density of edges between
communities.
High modularity indicates that nodes within the same community are
highly connected, while connections between communities are sparse.
It's often used to evaluate the quality of community detection
algorithms.
Example: In a social network, if friends mostly interact within their friend group
and rarely with outsiders, the network has high modularity.
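In symbols, one common formulation is Newman's modularity. The notation below is an assumption introduced for illustration and does not come from these notes:

```latex
Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)
```

Here A_ij is 1 if nodes i and j are connected (0 otherwise), k_i is the degree of node i, m is the total number of edges, and δ(c_i, c_j) is 1 when i and j are in the same community. The term k_i k_j / 2m is the number of edges expected between i and j if edges were placed at random with the same degrees, so Q is positive when communities contain more internal edges than chance would give.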
25 Name any 5 communities popular in their fields today. CO2 2
Here are five popular communities across various fields today:
1. GitHub (Software Development) – A leading platform for open-source
software development, where developers collaborate on projects.
2. Stack Overflow (Programming/Technology) – A community of
developers asking questions and providing answers on programming-
related topics.
3. Reddit (General/Multiple Fields) – A vast online community with
various subreddits dedicated to almost every field of interest.
4. Kaggle (Data Science/AI) – A platform for data science competitions,
where professionals and enthusiasts collaborate on machine learning and
data analysis projects.
5. Dev.to (Software Development) – A community-driven platform for
software developers to share articles, tutorials, and experiences.
26 Define Giant Component with an example. CO2 2
A Giant Component in graph theory refers to a connected component that
contains a significant fraction of the vertices in a large graph. In a random
graph, the giant component is the largest connected component, and it keeps
spanning a constant fraction of all vertices as the graph grows in size, even
when the graph is sparse.
Example:
Consider a random graph where each edge is added with a certain probability.
As the number of vertices increases, a "giant component" will form when the
probability of edges is high enough, causing a large connected subgraph that
includes most of the vertices in the graph.
For instance, in the Erdős–Rényi random graph model G(n, p), when the
edge probability p exceeds the threshold 1/n, the graph contains (with high
probability) a giant component whose size grows in proportion to n, the number
of vertices. If p is large enough, a single component can include almost all the
nodes in the graph, leaving only a few isolated ones.
This phenomenon occurs in many real-world networks, such as social networks
or the internet.
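The threshold behaviour can be observed with a small simulation. The sketch below is illustrative only: the function name, the graph size of 400 nodes, and the fixed random seed are assumptions chosen for the example. It builds a G(n, p) graph and reports the fraction of nodes in the largest connected component.

```python
import random
from collections import deque

def largest_component_fraction(n, p, seed=0):
    """Build one G(n, p) random graph and return the share of nodes
    that belong to its largest connected component."""
    rng = random.Random(seed)
    adj = [[] for _ in range(n)]
    # Add each possible edge independently with probability p.
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].append(j)
                adj[j].append(i)
    seen = [False] * n
    best = 0
    # Breadth-first search from every unvisited node to size each component.
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        queue = deque([start])
        size = 0
        while queue:
            v = queue.popleft()
            size += 1
            for w in adj[v]:
                if not seen[w]:
                    seen[w] = True
                    queue.append(w)
        best = max(best, size)
    return best / n

n = 400
below = largest_component_fraction(n, 0.5 / n)  # p below the 1/n threshold
above = largest_component_fraction(n, 3.0 / n)  # p above the 1/n threshold
print(f"p = 0.5/n: largest component holds {below:.0%} of nodes")
print(f"p = 3/n:   largest component holds {above:.0%} of nodes")
```

Below the threshold the largest component should stay a small fraction of the graph, while above it a single component should absorb most nodes.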
27 State the various V's of Big Data. CO2 2
The V's of Big Data refer to the key characteristics that define big data. The
most commonly referenced V's are:
1. Volume – Refers to the vast amount of data generated every second. This
includes data from various sources such as social media, sensors,
transactions, etc.
2. Velocity – Refers to the speed at which data is generated, processed, and
analyzed. Big data requires the ability to handle real-time or near-real-
time processing.
3. Variety – Refers to the different types of data, such as structured, semi-
structured, and unstructured data, coming from various sources like text,
images, videos, etc.
4. Veracity – Refers to the uncertainty or reliability of the data. With big
data, some data might be incomplete, inconsistent, or noisy, and it is
important to ensure data quality.
5. Value – Refers to the usefulness or business value that can be derived
from big data. It's about extracting meaningful insights that can inform
decisions.
These V's help in understanding the challenges and considerations involved in
handling big data.
28 Define Web mining with an application. CO2 2
Web Mining refers to the process of using data mining techniques to extract
useful information, patterns, and knowledge from web data. This can include
data from websites, web content, web structure, and web usage patterns. Web
mining is typically divided into three main categories: Web Content Mining,
Web Structure Mining, and Web Usage Mining.
Application:
An example of web mining is Personalized Recommendation Systems used by
e-commerce websites like Amazon. Web mining techniques analyze user
behavior, such as past purchases, browsing history, and click patterns, to
recommend products that are likely to interest the user, improving the customer
experience and driving sales.
29 Describe the shingling algorithm with an example in detail. CO2 5
The Shingling Algorithm is used in text mining and data comparison to create a
set of substrings, called shingles, from a given text or document. These shingles
can then be used to detect similarities or measure the similarity between
different documents. Shingling is particularly useful for detecting plagiarism or
finding near-duplicate content.
Steps of the Shingling Algorithm:
1. Tokenization: Break the document into tokens (e.g., words or
characters).
2. Sliding Window: Use a sliding window approach of size k (shingle
size) to create contiguous subsequences of tokens.
3. Hashing: Often, each shingle is hashed to reduce space complexity.
4. Shingle Set: The set of all shingles forms a representation of the
document.
Example:
Consider the document:
"The quick brown fox."
Step 1 (Tokenization): Tokenize the document into words:
["The", "quick", "brown", "fox"]
Step 2 (Sliding Window): For k = 2 (bigrams), extract the shingles:
o ("The", "quick")
o ("quick", "brown")
o ("brown", "fox")
Step 3 (Hashing): Apply a hash function to each bigram to reduce
storage space:
o Hash("The", "quick") → Hash1
o Hash("quick", "brown") → Hash2
o Hash("brown", "fox") → Hash3
Step 4 (Shingle Set): The shingle set for this document is {Hash1,
Hash2, Hash3}.
Application:
Shingling is useful in duplicate detection and similarity comparison. For
example, when comparing two documents, their shingles are compared, and if
the intersection of their shingle sets is large, the documents are considered
similar. This method is commonly used in plagiarism detection and finding near-
duplicate content on the web.
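The four steps can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names are hypothetical, tokenization is a simple regular expression that drops punctuation, and CRC32 (from the standard `zlib` module) stands in for whatever hash function a real system would use.

```python
import re
import zlib

def word_shingles(text, k=2):
    """Steps 1-2: tokenize into words, then slide a window of k words."""
    tokens = re.findall(r"\w+", text)  # drops punctuation such as the final "."
    return [tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)]

def hashed_shingle_set(text, k=2):
    """Steps 3-4: hash each shingle and collect the hashes in a set."""
    return {
        zlib.crc32(" ".join(shingle).encode("utf-8"))
        for shingle in word_shingles(text, k)
    }

shingles = word_shingles("The quick brown fox.")
hashes = hashed_shingle_set("The quick brown fox.")
print(shingles)
print(hashes)
```

The first line printed is the list of three word bigrams from the example; the second is the set of their integer hashes, which plays the role of {Hash1, Hash2, Hash3}.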
30 Describe the Leiden Community detection algorithm with advantages and
disadvantages in detail. Use suitable diagrams to support your example. CO2 5
Leiden Community Detection Algorithm
The Leiden Algorithm is a community detection algorithm designed to find
clusters or communities in large networks. It improves upon the Louvain
algorithm by providing better performance and quality of community structure.
The Leiden algorithm focuses on optimizing modularity while refining the
community structure in a hierarchical way.
Steps of the Leiden Algorithm:
1. Initialization: Each node starts in its own community.
2. Local Moving Phase: Each node is moved to a neighboring community
to maximize modularity. This is done by evaluating the quality of
community structure using modularity optimization.
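3. Refinement Phase: Within each community found by local moving, nodes
are regrouped into smaller, well-connected sub-communities. This step
prevents badly connected communities from being carried forward.
4. Aggregation Phase: A new network is constructed in which each refined
sub-community becomes a node. The unrefined communities from step 2
provide the initial partition of this aggregated network.
5. Hierarchical Process: Steps 2 to 4 are repeated iteratively, refining the
community structure at each level, until no further improvements in
modularity can be made.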
Advantages of the Leiden Algorithm:
1. Higher Quality Communities: It provides more accurate and well-
defined community structures compared to the Louvain algorithm.
2. Efficiency: The Leiden algorithm is faster and more scalable than many
other community detection algorithms, especially for large networks.
3. Guaranteed Connectivity: Moves are accepted only when they improve
modularity, and the refinement phase guarantees that every community
returned is internally connected, which Louvain does not guarantee.
Disadvantages of the Leiden Algorithm:
1. Computational Complexity: While it is faster than Louvain, the
algorithm can still be computationally intensive for very large networks.
2. Memory Usage: The refinement phase may require considerable
memory when working with very large graphs.
3. Dependency on Modularity: Like other modularity-based methods, the
Leiden algorithm may sometimes produce communities that are not
optimal from a domain-specific perspective (e.g., if modularity doesn't
capture meaningful real-world communities).
Example (Visualization):
Consider a simple network of 6 nodes connected as shown below:
A -- B
|    |
C -- D

E -- F
Initially, each node is in its own community: {A}, {B}, {C}, {D}, {E},
{F}.
During the local moving phase, nodes will move between communities
to increase modularity. For example, node B might move to the same
community as node A if it increases modularity.
The refinement phase then splits any badly connected community into
well-connected sub-communities; the result is aggregated into a new
network and the process repeats.
After applying the Leiden algorithm, the final result might show two
communities: {A, B, C, D} and {E, F}.
Diagram:
Here’s a simple diagram showing the steps of the Leiden algorithm:
1. Initial Network:
A -- B
|    |
C -- D

E -- F
2. After Local Moving Phase (initial communities):
{A, B, C, D} {E, F}
3. After Refinement Phase (final communities):
{A, B, C, D} {E, F}
Conclusion:
The Leiden algorithm is an efficient and high-quality method for detecting
communities in networks, offering significant improvements over earlier
methods like Louvain. However, it may still face challenges with very large
datasets in terms of memory and computational requirements.
31 Describe the Louvain Community detection algorithm with advantages and
disadvantages in detail. Use suitable diagrams to support your example. CO2 5
Louvain Community Detection Algorithm
The Louvain Algorithm is one of the most popular community detection
methods used in large networks. It aims to detect communities by optimizing
modularity, a measure that quantifies the strength of division of a network into
communities. The algorithm is hierarchical and works in two main phases that
are repeated iteratively.
Steps of the Louvain Algorithm:
1. Initialization: Initially, each node is placed in its own community.
2. Phase 1 (Local Moving Phase): Each node is moved to the community
of its neighbor that maximizes the modularity. This is done for all nodes
in the network.
3. Phase 2 (Community Aggregation): After the local moves, the
algorithm constructs a new network where each community from Phase 1
is treated as a single node, and edges between these communities are
weighted based on the total weight of edges between nodes in the
original communities.
4. Repetition: Steps 2 and 3 are repeated iteratively until no further
improvement in modularity can be achieved.
Advantages of the Louvain Algorithm:
1. Efficiency: The Louvain algorithm is fast and scalable, making it
suitable for large networks.
2. Modularity Optimization: It is based on optimizing modularity, which
is a good indicator of the quality of community structures in many cases.
3. Hierarchical Nature: The hierarchical approach makes it possible to
detect communities at different levels of granularity.
4. Widely Used: The algorithm is well-known and widely adopted in many
research and real-world applications due to its simplicity and
effectiveness.
Disadvantages of the Louvain Algorithm:
1. Resolution Limit: The Louvain algorithm may have trouble detecting
small communities in large networks (this is a known issue with
modularity optimization).
2. Greedy Nature: Since it is based on local optimization (greedy
approach), the algorithm can get stuck in suboptimal solutions, leading to
less accurate community structures in some cases.
3. Dependency on Modularity: Like many modularity-based algorithms,
the quality of community detection depends on the modularity function,
which may not always reflect meaningful community structures in some
domains.
Example (Visualization):
Consider a simple network of 6 nodes with the following connections:
A -- B
|    |
C -- D

E -- F
Step 1 (Initialization): Initially, each node is its own community: {A},
{B}, {C}, {D}, {E}, {F}.
Step 2 (Local Moving Phase): Nodes will move to communities of their
neighbors to increase modularity. For example:
o Node B might join community {A, C, D}, as this increases
modularity.
Step 3 (Community Aggregation): After the local moving, the
algorithm aggregates the communities into a new network:
o The new network has communities {A, B, C, D} and {E, F}.
Step 4 (Repeat): The algorithm repeats this process, but since no further
improvement in modularity can be made, it stops.
Final Communities:
{A, B, C, D}
{E, F}
Diagram:
1. Initial Network:
A -- B
|    |
C -- D

E -- F
2. After Local Moving Phase (Communities Formed):
{A, B, C, D} {E, F}
3. Final Community Structure:
{A, B, C, D} {E, F}
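To see why merging nodes raises modularity in this example, the partitions can be scored directly. The sketch below is only an illustration: it assumes the edges A-B, A-C, B-D, C-D and E-F read off the diagram, and it uses the per-community form of modularity, Q = Σ_c [L_c / m − (d_c / 2m)²], where L_c is the number of edges inside community c, d_c is the sum of degrees in c, and m is the total number of edges. The function name is hypothetical.

```python
def modularity(edges, communities):
    """Modularity of a partition of an unweighted, undirected graph."""
    m = len(edges)
    community_of = {
        node: idx for idx, group in enumerate(communities) for node in group
    }
    internal = [0] * len(communities)    # L_c: edges inside each community
    degree_sum = [0] * len(communities)  # d_c: total degree of each community
    for u, v in edges:
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in range(len(communities))
    )

# Edges assumed from the diagram: a square A-B-D-C plus the pair E-F.
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("E", "F")]

q_singletons = modularity(edges, [{"A"}, {"B"}, {"C"}, {"D"}, {"E"}, {"F"}])
q_pairs = modularity(edges, [{"A", "B"}, {"C", "D"}, {"E", "F"}])
q_final = modularity(edges, [{"A", "B", "C", "D"}, {"E", "F"}])

print(round(q_singletons, 2), round(q_pairs, 2), round(q_final, 2))
# prints -0.18 0.24 0.32
```

Modularity rises from the starting partition (every node alone) through an intermediate grouping to the final two communities, which is the direction the local moving and aggregation phases push in.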
Conclusion:
The Louvain algorithm is efficient and widely used for community detection in
networks, offering a good balance between performance and scalability.
However, it may struggle with detecting smaller communities and can get stuck
in local optima due to its greedy approach. Despite these limitations, it remains
one of the most popular methods for large-scale community detection.
32 What is the visualization of social networks? Give examples to illustrate. CO2 5
Visualization of Social Networks
Social network visualization refers to the graphical representation of
relationships, interactions, and connections within a social network. It uses
nodes (representing individuals or entities) and edges (representing relationships
or interactions) to illustrate the structure and dynamics of the network.
Visualization helps to understand the network's topology, identify key players,
community structures, and analyze the flow of information.
Key Components in Visualization:
1. Nodes (Vertices): Represent individuals, organizations, or entities.
2. Edges (Links): Represent relationships, interactions, or connections
between the nodes.
3. Community Clusters: Groups of nodes that are densely connected
internally, representing subgroups within the network.
Examples:
1. Facebook Friendship Network:
o Example: In a Facebook social network, each user is represented
as a node, and the friendships between users are represented as
edges. A visualization can show how individuals are connected,
who are the central figures (e.g., people with many friends), and
how people are grouped together in communities of friends.
Visualization:
o Nodes (users) are connected by edges (friendships), and clusters
of tightly-knit friends can be seen.
2. Twitter Follower Network:
o Example: A Twitter follower network visualizes users (nodes)
and the follower-following relationships (edges). This can show
which users are central (e.g., influencers with many followers), as
well as identify groups of users who are interacting frequently.
Visualization:
o A network with nodes (users) and edges (follower-following
relationships) can highlight influencers, communities, and
connections.
3. Co-authorship Network in Academia:
o Example: In academic research, authors (nodes) are connected by
edges if they have co-authored a paper together. This
visualization helps in identifying research groups, influential
authors, and collaborations.
Visualization:
o A graph with nodes (authors) and edges (co-authorships) shows
collaboration patterns and community structures within academic
fields.
Advantages of Social Network Visualization:
Identifying Key Players: Visualizations help to identify influential
nodes (e.g., celebrities, influencers).
Community Detection: Helps to detect communities or clusters within
the network.
Understanding Network Dynamics: Provides insights into the flow of
information or influence within the network.
Conclusion:
Social network visualization is a powerful tool for analyzing the structure,
interactions, and dynamics within social systems. It helps to visualize complex
relationships in a clear and intuitive way, making it easier to analyze large-scale
social interactions.
33 Solve the graph using Page ranking: Nodes: {A, B, C, D} CO2 5
Edges:
A→C
B→A
B→D
C→D
34 Discuss the concept of PageRank and its significance in ranking web pages
within a network. How does the algorithm utilize link structures to determine the
importance of a webpage? Illustrate the working of PageRank with a structured
example, including calculations and an interpretation of results. CO2 5
Concept of PageRank
PageRank is an algorithm developed by Larry Page and Sergey Brin, the
founders of Google, to rank web pages based on their importance. It is based on
the idea that a page is important if it is linked to by many other important pages.
The algorithm uses the link structure of the web to assign a rank to each
webpage, effectively measuring the webpage’s "importance" in relation to
others.
Significance in Ranking Web Pages
PageRank is significant because it allows search engines like Google to rank
pages in search results by considering not just the content of the page, but also
the network of links pointing to it. This helps to identify authoritative or
reputable pages. A page with more high-quality inbound links from authoritative
pages is likely to be more relevant and useful.
How PageRank Uses Link Structures
The PageRank algorithm treats links as "votes" for a page. However, not all
votes are equal:
A link from an important page (one with a high PageRank) is more
valuable than a link from an unimportant page.
Each page distributes its rank to the pages it links to, with the rank being
divided equally among all outgoing links.
Example and Calculations
Consider a simple web with 3 pages: A, B, and C. The links between the pages
are:
A→B
B → A, C
C→A
Initial PageRank (Assume equal distribution):
PR(A) = PR(B) = PR(C) = 1 (initial value for all pages).
Damping Factor (d): Set to 0.85, commonly used in PageRank calculations.
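The calculation can be carried out iteratively. The sketch below is a minimal illustration under stated assumptions: it uses the non-normalized update PR(p) = (1 − d) + d × Σ PR(q) / L(q), summed over the pages q that link to p, where L(q) is the number of outgoing links of q. That form matches the initial value of 1 per page, but the function name and the fixed count of 100 iterations are choices made for this example.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate PR(p) = (1 - d) + d * sum(PR(q) / L(q)) over pages q linking to p."""
    ranks = {page: 1.0 for page in links}  # initial value 1 for every page
    for _ in range(iterations):
        new_ranks = {page: 1.0 - d for page in links}
        for page, targets in links.items():
            if not targets:
                continue  # in this sketch a page without out-links passes on no rank
            share = d * ranks[page] / len(targets)  # rank is split equally
            for target in targets:
                new_ranks[target] += share
        ranks = new_ranks
    return ranks

# Link structure from the example: A -> B, B -> A and C, C -> A.
links = {"A": ["B"], "B": ["A", "C"], "C": ["A"]}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

The first pass from the initial values gives PR(A) = 0.15 + 0.85 × (1/2 + 1) = 1.425, PR(B) = 0.15 + 0.85 × 1 = 1.0 and PR(C) = 0.15 + 0.85 × 1/2 = 0.575. Repeating the update, the values settle at roughly A ≈ 1.192, B ≈ 1.163 and C ≈ 0.644.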
Interpretation:
Page A has the highest PageRank because it is linked by both B and C.
Page B ranks just below A: its only inbound link comes from A, but A
passes its entire rank to B through its single outgoing link.
Page C has the lowest rank because its only inbound link comes from B,
which splits its rank between two outgoing links (A and C).
This demonstrates how the PageRank algorithm works by considering both the
quantity and quality of incoming links when determining the importance of a
page.
35 Describe the Shingling algorithm and its role in detecting near-duplicate
documents. How does it transform text into sets of overlapping substrings for
similarity computation? Illustrate the process with a step-by-step example,
including shingle generation and comparison. CO2 5
Shingling Algorithm and its Role in Detecting Near-Duplicate Documents
The Shingling algorithm is a technique used to break down a document into
overlapping substrings (called "shingles") of fixed length. These shingles are
then used to compare documents for similarity and detect near-duplicate
content. By converting text into sets of shingles, the algorithm can identify
shared patterns between documents, helping to identify near-duplicates even if
they are not exactly the same.
How Shingling Works:
1. Text Transformation: A document is divided into overlapping
substrings (called shingles) of a fixed length (typically 3-5 characters).
2. Shingle Representation: Each shingle is treated as an element in a set.
3. Similarity Computation: By comparing the sets of shingles between
documents, we can calculate a Jaccard similarity score, which measures
the intersection of the shingles divided by their union.
Step-by-Step Example:
Consider the following text from two documents:
Document 1: "I love programming in Python."
Document 2: "I love coding in Python."
Let's assume the shingle length is 2 (bigrams), meaning we will create shingles
of 2 consecutive characters.
Step 1: Generate shingles
For Document 1:
"I love programming in Python."
Bigrams (shingles), in order of appearance:
o "I ", " l", "lo", "ov", "ve", "e ", " p", "pr", "ro", "og", "gr", "ra",
"am", "mm", "mi", "in", "ng", "g ", " i", "in", "n ", " P", "Py", "yt",
"th", "ho", "on", "n."
For Document 2:
"I love coding in Python."
Bigrams (shingles), in order of appearance:
o "I ", " l", "lo", "ov", "ve", "e ", " c", "co", "od", "di", "in", "ng",
"g ", " i", "in", "n ", " P", "Py", "yt", "th", "ho", "on", "n."
Step 2: Shingle Comparison
Now, we compare the sets of shingles from both documents. A set keeps each
shingle only once, so the repeated "in" is counted a single time.
Shingles for Document 1 (27 distinct):
o {"I ", " l", "lo", "ov", "ve", "e ", " p", "pr", "ro", "og", "gr", "ra",
"am", "mm", "mi", "in", "ng", "g ", " i", "n ", " P", "Py", "yt", "th",
"ho", "on", "n."}
Shingles for Document 2 (22 distinct):
o {"I ", " l", "lo", "ov", "ve", "e ", " c", "co", "od", "di", "in", "ng",
"g ", " i", "n ", " P", "Py", "yt", "th", "ho", "on", "n."}
Step 3: Calculate Jaccard Similarity
The Jaccard similarity is calculated as:
J(D1, D2) = |D1 ∩ D2| / |D1 ∪ D2|
Intersection of shingles:
{"I ", " l", "lo", "ov", "ve", "e ", "in", "ng", "g ", " i", "n ", " P",
"Py", "yt", "th", "ho", "on", "n."}
There are 18 common shingles between the two documents.
Union of shingles:
{"I ", " l", "lo", "ov", "ve", "e ", " p", "pr", "ro", "og", "gr", "ra",
"am", "mm", "mi", "in", "ng", "g ", " i", "n ", " P", "Py", "yt", "th",
"ho", "on", "n.", " c", "co", "od", "di"}
There are 31 unique shingles across both documents.
Jaccard Similarity:
J = 18 / 31 ≈ 0.58
Step 4: Interpretation
The Jaccard similarity score of about 0.58 means the documents share 58% of
their distinct shingles. This substantial overlap suggests that the two
documents are closely related, even though they are not identical. Whether
they count as near-duplicates depends on the similarity threshold chosen.
Conclusion
The Shingling algorithm is a powerful tool for detecting near-duplicate
documents. By converting text into sets of overlapping substrings (shingles) and
comparing these sets, we can measure the similarity between documents. This
method is widely used in tasks like plagiarism detection, web crawling, and
deduplication in large document collections.
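The character-level comparison can be reproduced with Python sets. This is a minimal sketch with hypothetical function names. It treats every character as significant, including spaces and the final full stop, and is case-sensitive.

```python
def char_shingles(text, k=2):
    """Return the set of all overlapping k-character substrings of text."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity |a ∩ b| / |a ∪ b| of two sets."""
    if not a and not b:
        return 1.0  # two empty sets are treated as identical
    return len(a & b) / len(a | b)

doc1 = "I love programming in Python."
doc2 = "I love coding in Python."

s1 = char_shingles(doc1)
s2 = char_shingles(doc2)

print("shared shingles:", len(s1 & s2))
print("distinct shingles overall:", len(s1 | s2))
print("Jaccard similarity:", round(jaccard(s1, s2), 4))
```

Because sets discard repeated shingles, the counts printed here are counts of distinct shingles rather than of every occurrence in the text.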
36 Explain the page ranking algorithm and perform on following graph: CO2 10
Iteration 1 Result
A: 0.3625
B: 0.46875
C: 0.46875
D: 0.15
Continue this process for several more iterations until the values
stabilize (converge).
37 Explain the shingling algorithm. Perform the same on: CO2 10
Document A - As the sun dipped below the horizon, casting hues of orange and
pink across the sky, the weary traveler found solace in the warmth of a crackling
fire under a blanket of stars.
Document B - Solace was found by the weary traveler in the warmth of a
crackling fire under a blanket of stars as hues of orange and pink were cast
across the sky by the setting sun.
Shingling Algorithm (Short Answer - 10 Marks)
The Shingling Algorithm is a method used to compare the similarity between
documents. It involves:
1. Preprocessing: Normalize text (remove punctuation, lowercase, etc.).
2. Tokenization: Split the text into overlapping substrings (called k-
shingles) of a chosen size (e.g., 3 words).
3. Shingle Set Creation: Convert each document into a set of shingles.
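4. Similarity Measurement: Use Jaccard Similarity:
J(A, B) = |A ∩ B| / |A ∪ B|
Conclusion
The documents share several phrases but differ in word order. After lowercasing
and removing punctuation, Document A yields 31 distinct 3-word shingles and
Document B yields 32, of which 15 are shared. The Jaccard Similarity is
therefore 15 / 48 ≈ 0.31, a moderate score.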
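The word-level computation for the two documents can be checked with a short script. It is a sketch under stated assumptions: text is lowercased, only alphabetic words are kept (so punctuation is discarded), and the shingle size is 3 words. The function name is hypothetical.

```python
import re

def word_shingles(text, k=3):
    """Lowercase the text, keep alphabetic words, and return the set of k-word shingles."""
    words = re.findall(r"[a-z]+", text.lower())
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

doc_a = ("As the sun dipped below the horizon, casting hues of orange and "
         "pink across the sky, the weary traveler found solace in the warmth "
         "of a crackling fire under a blanket of stars.")
doc_b = ("Solace was found by the weary traveler in the warmth of a "
         "crackling fire under a blanket of stars as hues of orange and pink "
         "were cast across the sky by the setting sun.")

a = word_shingles(doc_a)
b = word_shingles(doc_b)
similarity = len(a & b) / len(a | b)

print(sorted(a & b))  # the shingles the two documents have in common
print(len(a), len(b), len(a & b), len(a | b), round(similarity, 4))
```

Most of the shared shingles come from the long phrase "in the warmth of a crackling fire under a blanket of stars", which both documents contain word for word.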
3. Semantic Similarity
Though the sentence structures vary slightly, semantic meaning is
nearly identical.
NLP models would rate this similarity as very high (above 0.9).
Conclusion
The texts are highly similar in both content and semantics, differing mainly in
word order.
✅ Resemblance Score: Very High (~90–95%)
40 Investigate how centrality measures can aid in the early detection of disease
spread within a healthcare network. Given a network where nodes represent
individuals and edges represent direct physical interactions, compute the
Betweenness and Closeness centrality of individuals to identify potential
super-spreaders and optimal intervention points. CO2 10
Centrality Measures in Disease Spread Detection (Short Answer – 10
Marks)
In a healthcare network where nodes = individuals and edges = physical
interactions, centrality measures can help identify key individuals for early
disease control.
1. Betweenness Centrality
Definition: Measures how often a node lies on the shortest path between
other nodes.
Interpretation: High betweenness nodes connect clusters and can
spread disease across groups.
Use:
o Identify super-spreaders who act as bridges between
communities.
o Useful for quarantine or vaccination to break transmission
chains.
2. Closeness Centrality
Definition: Measures how close a node is to all others in the network,
usually as the inverse of its average shortest-path distance to them.
Interpretation: High closeness nodes can reach others quickly.
Use:
o Prioritize these individuals for monitoring or early
intervention.
o Effective for fast containment of outbreaks.
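Both measures can be computed with breadth-first search. The sketch below is illustrative only. The contact network is hypothetical (two groups of people, {a, b} and {d, e}, who all also meet person c), the function names are made up for the example, and betweenness is left unnormalized. It uses Brandes' algorithm for unweighted, undirected graphs.

```python
from collections import deque

def closeness(adj):
    """Closeness = (reachable nodes) / (sum of shortest-path distances to them)."""
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result

def betweenness(adj):
    """Unnormalized betweenness via Brandes' algorithm (unweighted, undirected)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        order = []                     # nodes in order of increasing distance
        preds = {v: [] for v in adj}   # predecessors on shortest paths from s
        sigma = {v: 0 for v in adj}    # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):      # accumulate dependencies back toward s
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints, so halve the totals.
    return {v: b / 2 for v, b in score.items()}

# Hypothetical contact network: person c links the groups {a, b} and {d, e}.
contacts = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d", "e"],
    "d": ["c", "e"],
    "e": ["c", "d"],
}

bet = betweenness(contacts)
clo = closeness(contacts)
for person in sorted(contacts):
    print(person, bet[person], round(clo[person], 3))
```

In this made-up network, c lies on all four shortest paths between the two groups (betweenness 4.0, everyone else 0.0) and is one step from everyone (closeness 1.0 versus about 0.667 for the others), so c would be the first candidate for monitoring or intervention.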
Conclusion
High Betweenness (e.g., C) → Likely super-spreader.
High Closeness (e.g., A or B) → Ideal for early monitoring or
intervention.
✅ Centrality measures help prioritize testing, isolation, and vaccination to
efficiently control disease spread.
41 Solve the following: CO2 10
Document A:
"The global economy is experiencing rapid shifts due to technological
advancements, market fluctuations, and policy changes affecting businesses
worldwide significantly."
Document B:
"Rapid shifts in the global economy are influenced by technological progress,
market instability, and new government policies impacting companies globally."
Text Similarity Analysis (Short Answer – 10 Marks)
To evaluate the similarity between Document A and Document B, we use
semantic and structural analysis.
2. Semantic Similarity
Though the wording differs, semantic meaning is almost identical.
NLP models like BERT would return a semantic similarity score above
0.85, indicating very high similarity.
Conclusion
Lexical similarity: Medium
Semantic similarity: High (~0.85–0.90)
Overall Resemblance: Strong — paraphrased version of the same
idea
✅ Score: 9/10 (for similarity, structure, and semantic retention)
Conclusion
Jaccard Similarity ≈ 0.18 using 3-word shingles.
Despite semantic similarity, word order differences reduce shingle
overlap.
Interpretation: Low-to-moderate textual similarity, though conceptually
they're quite close.
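A Jaccard score over k-word shingles can be sketched as follows. The tokenization (lowercasing, stripping punctuation) is an assumption, and different choices will change the score; the two short sentences are toy inputs, not the documents from the question.

```python
import string

def shingles(text, k=3):
    """Set of k-word shingles from lowercased, punctuation-stripped words."""
    words = [w.strip(string.punctuation).lower() for w in text.split()]
    words = [w for w in words if w]
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """|A intersect B| / |A union B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Toy inputs: 3 shared shingles out of 5 distinct ones.
s1 = shingles("The cat sat on the mat.")
s2 = shingles("The cat sat on the rug.")
score = jaccard(s1, s2)
print(round(score, 2))
```

A smaller k makes the measure more tolerant of reordering, while a larger k rewards only long verbatim overlaps.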
43 Examine the idea of homophily in relation to social networks. Give a practical CO3 2
example to demonstrate its significance.
Homophily in social networks refers to the tendency of individuals to form
connections with others who are similar to themselves in terms of attributes such
as age, race, gender, social status, or interests. This phenomenon plays a
significant role in shaping the structure and dynamics of social networks.
Practical Example:
On Facebook, users are more likely to form connections with people who share
similar interests, educational background, or location. For instance, college
students are likely to connect with other students from their university, creating
tightly-knit communities. This homophily helps reinforce group identities,
fosters stronger social ties, and influences how information spreads within
networks.
44 How are important nodes identified using Degree, Betweenness, and Closeness CO3 2
centrality? Give an example to help clarify.
Important nodes in a network can be identified using different centrality
measures:
1. Degree Centrality: Measures the number of direct connections (edges) a
node has. Nodes with higher degree centrality are considered important
because they have more immediate connections.
o Example: In a social network, a user with many friends (high
degree) is influential and well-connected.
2. Betweenness Centrality: Measures how often a node lies on the shortest
path between other nodes. Nodes with high betweenness centrality
control the flow of information and act as bridges between different parts
of the network.
o Example: In a communication network, a node that connects two
distinct groups is crucial for information transfer.
3. Closeness Centrality: Measures how near a node is to all other nodes,
computed as the inverse of its average shortest-path distance to them.
Nodes with high closeness centrality can reach other nodes more quickly
and are considered influential in spreading information.
o Example: A person in an organization who can quickly
communicate with everyone, regardless of their position in the
hierarchy, has high closeness centrality.
45 Explain the meaning of triadic closure and discuss its significance for network CO3 2
evolution. Give an example.
Triadic Closure refers to the concept where if two individuals (A and B) are
both connected to a third individual (C), they are likely to form a direct
connection (A and B will connect) over time. This process creates a triangle of
relationships in a social network.
Significance for Network Evolution:
Triadic closure is significant because it fosters the growth of social networks by
increasing the likelihood of new connections, enhancing network stability, and
strengthening ties between individuals. It also facilitates the emergence of
communities or clusters within a network.
Example:
In a social network, if person A is friends with both B and C, triadic closure
suggests that B and C are more likely to become friends themselves. This
increases the density of the network, leading to stronger and more cohesive
groups.
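As a rough illustration, the pairs that triadic closure predicts will connect are those that share a friend but are not yet linked. The sketch below lists such pairs for a small friendship network given as adjacency sets; the function name and data are hypothetical.

```python
from itertools import combinations

def closure_candidates(adj):
    """Map each unlinked pair to the set of friends they have in common."""
    candidates = {}
    for person, friends in adj.items():
        for u, v in combinations(sorted(friends), 2):
            if v not in adj[u]:           # open triad: u - person - v
                candidates.setdefault((u, v), set()).add(person)
    return candidates

# A is friends with B and C, who do not know each other yet.
friends = {"A": {"B", "C"}, "B": {"A"}, "C": {"A"}}
before = closure_candidates(friends)

# Closing the triangle removes the candidate pair.
friends["B"].add("C")
friends["C"].add("B")
after = closure_candidates(friends)
print(before, after)
```

Pairs with many common friends are usually treated as the most likely to close, so the size of each set can serve as a simple ranking score.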
46 Describe the idea of a social network's strong and weak relationships. What CO3 2
effects do they have on structural cohesiveness and information flow? Talk
about it using examples from actual life.
In a social network, strong relationships refer to close, frequent interactions,
typically between friends or family members, while weak relationships are
more distant, occasional connections, such as acquaintances or colleagues.
Effects on Structural Cohesiveness:
Strong relationships contribute to cohesiveness within a small group,
creating tight-knit communities where trust and support are high.
Weak relationships enhance network reach, connecting different
groups and facilitating broader social connections, which helps in
spreading information across larger, less cohesive networks.
Effects on Information Flow:
Strong relationships result in fast, reliable information flow within
small groups, but may limit the diversity of information.
Weak relationships act as bridges between distinct groups, enabling
diverse information flow across the entire network.
Example:
In a workplace, your strong relationships (close colleagues or friends) ensure
smooth communication within your team. However, your weak relationships
(acquaintances from different departments) allow you to access and share
information across the entire organization, enhancing collaboration and
innovation.
47 Explain the Data Wrangling process and its importance in the preparation of CO3 2
data. Give instances of typical data wrangling strategies.
Data Wrangling is the process of cleaning, transforming, and organizing raw
data into a structured and usable format for analysis. It is an essential step in data
preparation, ensuring that data is accurate, consistent, and ready for further
analysis or modeling.
Importance:
Improves Data Quality: Ensures data is accurate, complete, and
consistent.
Prepares for Analysis: Allows for easier and more effective analysis by
structuring data correctly.
Reduces Errors: Helps in identifying and fixing errors or
inconsistencies early in the data pipeline.
Typical Data Wrangling Strategies:
1. Handling Missing Values: Filling in missing data with averages,
medians, or removing incomplete rows/columns.
2. Data Transformation: Converting data into a consistent format (e.g.,
converting text dates into date-time objects).
3. Filtering and Removing Outliers: Identifying and removing data points
that are significantly different from others.
4. Normalization: Scaling numerical values to a standard range (e.g., 0 to
1) to ensure comparability.
5. Data Aggregation: Summarizing data by grouping and aggregating (e.g.,
calculating average sales per region).
Example:
For a sales dataset, data wrangling might involve removing incomplete entries,
converting all date columns to a consistent format, and filling in missing values
for sales figures based on averages for that region.
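The sales example can be sketched with the standard library alone (in practice a library such as pandas is commonly used for this). The column names, the two accepted date formats, and the sample rows are assumptions made for illustration.

```python
from datetime import datetime
from statistics import mean

# Hypothetical raw rows: mixed date formats, a missing sale, a missing region.
raw_rows = [
    {"region": "North", "date": "2024-01-05", "sales": 100.0},
    {"region": "North", "date": "05/02/2024", "sales": None},
    {"region": "North", "date": "2024-03-01", "sales": 140.0},
    {"region": "South", "date": "2024-01-09", "sales": 80.0},
    {"region": None, "date": "2024-01-10", "sales": 55.0},
]

def parse_date(text):
    """Accept ISO (YYYY-MM-DD) or day-first (DD/MM/YYYY) dates."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")

def wrangle(rows):
    # 1. Remove incomplete entries (no region or no date).
    kept = [dict(r) for r in rows if r["region"] and r["date"]]
    # 2. Convert every date to one consistent type.
    for r in kept:
        r["date"] = parse_date(r["date"])
    # 3. Fill missing sales with the average of known sales in that region.
    known = {}
    for r in kept:
        if r["sales"] is not None:
            known.setdefault(r["region"], []).append(r["sales"])
    for r in kept:
        if r["sales"] is None and r["region"] in known:
            r["sales"] = mean(known[r["region"]])
    return kept

clean = wrangle(raw_rows)
for row in clean:
    print(row)
```

Rows whose region has no known sales are left unfilled here; whether to drop them or use a global average is a judgment call that depends on the analysis.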
48 Discuss the role of Network Visualization in Social Network Analysis. CO3 2
Network Visualization plays a crucial role in Social Network Analysis (SNA)
by providing a graphical representation of relationships between individuals or
entities in a network. It helps to easily identify patterns, structures, and key
nodes within the network, which might be difficult to detect from raw data
alone.
Role:
1. Identifying Key Nodes: Visualization helps highlight influential nodes
(e.g., central individuals, hubs, or bridges) by visually representing their
connections and centrality within the network.
2. Understanding Network Structure: It reveals network patterns such as
clusters, communities, and groups of strong or weak ties, facilitating the
understanding of how information or influence flows through the
network.
Example:
In a social media network, visualization might show how a few central
influencers (high-degree nodes) are connected to a large number of users,
illustrating the spread of content and interactions.
49 State the relationship tie with key Feature: The relationship exists only within CO3 2
the specific context of the shared activity with examples.
The relationship tie with the key feature of "existing only within the specific
context of the shared activity" refers to weak ties that are formed when
individuals connect through a common activity or purpose, but these connections
are limited to that context. These ties are not necessarily strong or long-term
outside the shared activity.
Example:
Workplace Projects: Colleagues may form a relationship while working
on a specific project but may not interact much once the project ends.
Online Communities: Users participating in a specific online forum or
game may interact frequently within that platform but may not maintain
contact outside of that activity.
50 Examine the various types of relationships found in social networks. Give CO3 5
instances.
In social networks, relationships between individuals or entities can vary in
nature based on the level of interaction, trust, and purpose. These relationships
can be broadly classified into several types:
1. Strong Ties
These are close, frequent, and deeply personal relationships, often found
between friends or family members. These ties are characterized by high trust,
emotional support, and frequent communication.
Example: A best friend or a close family member. They provide
emotional support, share personal information, and maintain regular
communication.
2. Weak Ties
Weak ties are more distant connections that do not involve frequent interactions
or deep emotional involvement. They can serve as bridges to new information,
groups, or networks.
Example: An acquaintance from work or a distant friend on social
media. While not offering deep emotional support, they help in
connecting to different groups or offering new opportunities (e.g., a job
referral).
3. Directed Ties
These relationships have a clear direction, where one individual is the source of
influence or communication, and the other is the recipient. They are commonly
seen in hierarchical or asymmetric relationships.
Example: A manager and an employee. The manager influences the
employee's tasks, but the employee's influence on the manager may be
limited.
4. Reciprocal Ties
Reciprocal relationships are mutual, where both individuals benefit or interact
with each other on equal terms. These relationships are marked by mutual trust
and exchanges.
Example: Two colleagues who help each other with tasks and
collaborate on projects. Their relationship is bidirectional, benefiting
both parties equally.
5. Structural Ties (Bridges)
These ties link different social groups or networks and are crucial for the flow of
information across the network. They are typically formed between individuals
who connect separate groups.
Example: A person who works in both marketing and product
development, connecting these two different departments in an
organization. They act as a bridge for information sharing between the
groups.
Conclusion:
Different types of relationships in social networks—strong, weak, directed,
reciprocal, and structural ties—play various roles in shaping the flow of
information, influence, and support within a network. They are vital for
understanding the structure and dynamics of social connections.
51 Describe the causes of social network triadic closure. Give an example and an CO3 5
analytical viewpoint.
Triadic Closure occurs when two individuals who share a common connection
(a third person) are likely to form a direct connection themselves. This process
creates a "triangle" in the network. The causes of triadic closure are rooted in
psychological, social, and network dynamics.
Causes of Triadic Closure:
1. Social Psychological Factors:
o Similarity and Homophily: People are more likely to form
relationships with others who are similar to themselves in terms
of interests, values, or background, which increases the chance of
closure.
o Trust and Familiarity: Shared friends or common acquaintances
can create a sense of familiarity and trust, encouraging new
connections.
2. Network Structural Factors:
o Network Density: In dense networks, where many individuals
are interconnected, the likelihood of triadic closure increases. As
individuals are already embedded in interconnected clusters,
forming a new tie is more likely.
o Mutual Friendships: When two individuals are already
connected through a mutual friend, it provides a social context for
them to meet and interact more easily.
3. Social Influence:
o Peer Pressure and Social Influence: Social pressure from
mutual acquaintances can push individuals to form ties, especially
when social norms emphasize the importance of creating a
cohesive group.
4. Common Interests or Goals:
o Shared Activities: If individuals are involved in a common
activity, event, or organization, the existing link through the third
person can encourage interaction, leading to closure.
Example:
Consider a workplace scenario where Person A is friends with both Person B
and Person C. If Person B and Person C share similar interests (e.g., both work
in the same department or attend the same professional events), they are more
likely to connect through Person A, resulting in triadic closure.
Analytical Viewpoint:
Triadic closure strengthens the structural cohesiveness of a network by creating
more tightly-knit clusters or communities. It also facilitates information flow,
as individuals within a closed triad are more likely to share information quickly
and trust each other. Moreover, triadic closure helps in the stability of social
networks, as connections between individuals are reinforced by mutual
acquaintances, reducing the risk of ties being broken or becoming weak. It can
also enhance the resilience of a network by ensuring that even if one tie is lost,
the overall communication within the group remains intact.
Thus, triadic closure plays a crucial role in shaping the dynamics and robustness
of social networks.
52 Analyze the significance of network visualization for practical uses. Give CO3 5
examples to illustrate your points.
Network Visualization is a powerful tool for understanding and analyzing the
structure of complex relationships in a network. It converts abstract data into a
visual format, making it easier to identify patterns, key nodes, and the flow of
information or influence.
Significance of Network Visualization:
1. Identifying Key Nodes and Influencers:
o Example: In a social media network, network visualization can
highlight users with the most connections or influence (e.g., high-
degree centrality), helping marketers identify influencers or
opinion leaders for campaigns.
2. Understanding Network Structure:
o Network visualization reveals clusters, communities, and patterns
in relationships. It helps detect sub-networks where individuals
or nodes are more tightly connected.
o Example: In a corporate setting, visualization can uncover
departments or teams that are highly interconnected and those
that might need improved collaboration.
3. Analyzing Information Flow:
o Visualizing how information flows through a network helps to
identify bottlenecks or key bridges in the flow of communication.
o Example: In an emergency response system, network
visualization can show which areas are most connected, helping
to identify where resources and information should be directed.
4. Detecting Vulnerabilities and Risks:
o Visualizing relationships makes weak ties and isolated nodes
easier to identify. This helps in understanding potential
vulnerabilities in a network’s structure.
o Example: In cybersecurity, a network visualization can highlight
weak links in an organization’s internal communication systems
that might be vulnerable to attacks.
5. Decision-Making and Strategy:
o Visualizations assist in making informed decisions about network
interventions, such as fostering connections between isolated
groups or reinforcing communication channels.
o Example: In a supply chain network, visualization can show
dependencies between suppliers and manufacturers, allowing for
better risk management and resource allocation.
Conclusion:
Network visualization simplifies complex data, making it accessible for strategic
decision-making, identifying influential nodes, understanding structure, and
improving overall network efficiency and resilience. It is a critical tool in fields
ranging from business to healthcare to cybersecurity.
53 Give examples to illustrate the many benefits of network visualization. CO3 5
Network Visualization offers several benefits across various fields, enhancing
the understanding and analysis of complex systems. Below are examples that
illustrate its practical benefits:
1. Enhanced Understanding of Relationships:
Example: In a social network, visualizing connections between
individuals helps identify clusters, communities, and central influencers.
For example, a social media platform can use network visualization to
show groups of users with common interests, making it easier for
marketers to target specific audiences.
2. Identifying Key Influencers:
Example: In marketing, network visualization helps identify
influencers in a brand’s network who have the most connections (high-
degree centrality) or the ability to spread information across the network
(betweenness centrality). This helps in selecting the right influencers for
campaigns.
3. Optimizing Resource Allocation:
Example: In supply chain management, visualizing the network of
suppliers, distributors, and customers helps identify critical links and
potential bottlenecks. For instance, if one supplier has many connections
but is geographically distant, resources may be allocated differently to
avoid delays.
4. Detecting Vulnerabilities and Risks:
Example: In cybersecurity, network visualization can map out the
relationships between devices, networks, and users, helping to identify
potential vulnerabilities. A node with a high number of connections
(central node) might be a target for attacks, and visualization helps
prioritize protection.
5. Improving Collaboration and Communication:
Example: In a corporate setting, network visualization can identify
isolated departments or teams with fewer interconnections. By
visualizing communication paths, managers can foster better
collaboration by encouraging connections between departments that are
less connected.
Conclusion:
Network visualization simplifies the analysis of complex systems, making it
easier to identify patterns, detect risks, optimize resources, and improve
decision-making in fields such as marketing, supply chain, cybersecurity, and
corporate management.
54 Explain Balance Theory concisely yet thoroughly, with an emphasis on its CO3 5
underlying ideas and applications to social network dynamics.
Balance Theory is a psychological theory that focuses on the relationships and
stability between elements within a network, particularly in social networks. It
was developed by Fritz Heider in 1946 and is based on the idea that people
prefer harmony and consistency in their relationships.
Underlying Ideas:
Triadic Relationships: The theory examines triads (sets of three
individuals or entities) and the balance or imbalance of their
relationships. A triad is balanced when its relationships are
consistent (i.e., all three positive, or one positive and two negative);
otherwise it is unbalanced.
Positive and Negative Relations: The relationships between entities are
either positive (liking, agreement) or negative (disliking, disagreement).
A balanced triad occurs when there are either:
o Three positive relations (all like each other),
o Two negative and one positive relation (two people like each
other and both dislike the third person).
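The balance rule can be checked mechanically: encode each relation as +1 (positive) or -1 (negative) and multiply the three signs, with a positive product meaning a balanced triad. The sketch below enumerates all eight sign combinations; the function name is illustrative.

```python
from itertools import product

def is_balanced(ab, bc, ac):
    """A triad is balanced when the product of its three signs is positive."""
    return ab * bc * ac > 0

results = {}
for signs in product((+1, -1), repeat=3):
    results[signs] = is_balanced(*signs)
    label = "balanced" if results[signs] else "unbalanced"
    print(signs, "positives:", signs.count(+1), label)

balanced_count = sum(results.values())
```

Exactly half of the combinations are balanced: the all-positive triad and the three triads with one positive and two negative relations.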
Applications to Social Network Dynamics:
1. Social Cohesion: Balanced networks lead to social stability and
harmony. For example, if two friends disagree about a third person
(one likes that person and the other does not), the triad is unbalanced,
and the resulting tension can push one of the relationships to change
so that balance is restored.
2. Conflict Resolution: In group dynamics or organizations, imbalance in
relationships (e.g., conflicts between colleagues) can lead to changes in
opinions or alliances to restore balance. This is especially useful for
understanding social conflicts and how they may resolve.
3. Predicting Social Behavior: Balance theory helps predict how
relationships evolve within groups. For instance, when one individual in
a triad switches their opinion (positive to negative or vice versa), the
entire network may shift to restore balance, influencing group decisions
or collective behavior.
Example:
In a workplace, consider three colleagues: A, B, and C. If A likes both B and C,
but B and C dislike each other, the situation is unbalanced. According to balance
theory, this imbalance may lead to B and C eventually reconciling, or to A
distancing from one of them, since either change restores balance.
Conclusion:
Balance theory provides insight into the dynamics of social relationships and
offers a framework for understanding how individuals adjust their attitudes and
interactions to maintain harmony in their social networks. Its applications extend
to conflict resolution, predicting relationship changes, and understanding group
behavior in social, organizational, and political settings.
55 Explain the example: Triadic relationship between influencer marketing and CO3 5
brand perception.
The triadic relationship between influencer marketing and brand perception
involves three key elements: the brand, the influencer, and the audience
(consumers). These three elements interact to shape consumer attitudes toward
the brand, forming a triadic network where the relationship dynamics can
influence brand perception.
Explanation of the Triadic Relationship:
1. Brand and Influencer:
o The brand partners with an influencer to promote products or
services. The influencer uses their credibility, trust, and influence
over their followers to endorse the brand. If the influencer is
aligned with the values or image of the brand, it strengthens the
association between them.
o Example: A skincare brand partners with a well-known beauty
influencer to showcase its products in a positive light. The
influencer's trustworthiness and expertise in beauty will influence
the perception of the brand as reliable and high-quality.
2. Influencer and Audience:
o The audience (consumers) typically trusts the influencer's
opinions and looks to them for guidance. The relationship
between the influencer and the audience is based on trust and
relatability. The influencer's endorsement directly impacts how
the audience views the product or brand.
o Example: A follower who trusts the influencer's reviews may
develop a positive opinion about the brand simply due to the
influencer's recommendation.
3. Brand and Audience:
o The brand's perception is influenced by how the audience
perceives the influencer and their endorsement. If the influencer
is highly regarded and their opinion resonates with the audience,
the brand’s image is positively impacted.
o Example: If an influencer promotes a product that aligns with
their values (e.g., eco-friendly or ethical), their audience, who
shares similar values, may perceive the brand as more trustworthy
and responsible.
Balance in the Triad:
If the relationship between the brand and influencer is positive, and the
audience also has a positive perception of both, a balanced triad is
formed, reinforcing the brand's reputation.
If there is tension (e.g., the audience disagrees with the influencer's
endorsement or the influencer’s image doesn’t match the brand’s values),
this can lead to imbalanced perceptions that may harm brand
reputation.
Example in Action:
A fitness brand partners with a fitness influencer to promote their
workout gear. The influencer shares authentic experiences, showing how
the gear improves performance. The audience, who follows the
influencer for workout tips, views the brand as reliable and high-
performing, leading to a positive brand perception. If the audience
trusts the influencer and aligns with their values, they are more likely to
perceive the brand positively and make purchases.
Conclusion:
The triadic relationship between the influencer, the brand, and the audience is
dynamic and mutually reinforcing. It helps shape brand perception through the
influencer’s credibility and the audience’s trust, ultimately affecting the success
of marketing campaigns. The balance within this triad is crucial in creating a
positive and sustainable brand image.
56 Using diagrammatic examples, differentiate between the Louvain and Leiden CO3 5
approaches.
The Louvain and Leiden algorithms are two popular community detection
methods in network science. Both aim to optimize modularity, but they differ in
how they refine and improve community structure, especially in terms of
stability and speed. Below is a diagrammatic comparison to highlight the core
differences.
Conclusion
The Jaccard similarity between Document A and Document B using 2-shingles
is 0.18.
59 Explain what network visualization is and why it is so important to modern data CO3 10
analysis. Provide examples of its real-world uses to demonstrate its practical
significance.
Network visualization is the graphical representation of relationships or
connections within a network, where entities (such as people, systems, or
organizations) are represented as nodes, and their interactions or relationships as
edges (lines connecting nodes). It helps to visually map complex data to identify
patterns, structures, and connections.
Importance in modern data analysis:
1. Reveals Patterns: Visualizing networks helps uncover hidden
relationships and dependencies that may not be immediately obvious in
raw data.
2. Improves Understanding: It simplifies complex data, making it easier
to interpret and analyze.
3. Identifies Key Nodes: It highlights influential or central nodes in the
network, crucial for decision-making.
4. Enhanced Decision-Making: By illustrating connections and flows,
network visualization aids in strategic planning and forecasting.
Real-world uses:
1. Social Media Analysis: Identifying influencers, communities, and trends
(e.g., Twitter’s network of users and hashtags).
2. Cybersecurity: Detecting vulnerabilities and potential threats by
mapping connections between devices or users.
3. Supply Chain Management: Optimizing processes by visualizing
supplier relationships and logistics networks.
4. Biological Networks: Understanding gene interactions or protein
networks in healthcare research.
Network visualization is essential for extracting actionable insights from
complex datasets and driving informed decisions.
60 Give a methodical explanation of each step in the data wrangling process. CO3 10
Data wrangling is the process of cleaning, structuring, and enriching raw data
into a desired format for better decision-making in data analysis. Here is a step-
by-step explanation of the process:
1. Data Collection:
Gather data from various sources such as databases, CSV files, APIs, or
web scraping. This is the first step where all available raw data is
collected for processing.
2. Data Discovery & Assessment:
Explore and understand the structure, content, and quality of the data.
Identify inconsistencies, missing values, or outliers that need to be
addressed.
3. Data Cleaning:
Fix errors such as missing values, duplicates, typos, and inconsistencies.
This may involve removing nulls, correcting formats, or standardizing
data.
4. Data Structuring:
Reformat or reshape data into a usable structure (e.g., transforming
unstructured data into rows and columns or normalizing tables for
relational databases).
5. Data Enrichment:
Enhance the dataset by merging with additional data sources to provide
more context or fill in gaps (e.g., adding demographic data to customer
records).
6. Data Validation:
Ensure the data is accurate, consistent, and reliable. This includes
checking data types, verifying ranges, and testing business rules.
7. Data Storage:
Save the cleaned and structured data in a suitable format or database for
easy access during analysis or modeling.
These steps ensure the data is high-quality, making it ready for effective
analysis, visualization, or machine learning.
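The validation step (step 6) can be sketched as a function that returns a list of problems for each record. The field names, the age range, and the discount rule below are hypothetical stand-ins for whatever types, ranges, and business rules a real dataset requires.

```python
def validate(record):
    """Return a list of validation errors; an empty list means the record passes."""
    errors = []
    # Type check.
    if not isinstance(record.get("customer_id"), int):
        errors.append("customer_id must be an integer")
    # Range check.
    age = record.get("age")
    if not isinstance(age, int) or not 0 <= age <= 120:
        errors.append("age must be an integer between 0 and 120")
    # Business rule (assumed): a discount cannot exceed the order total.
    total = record.get("order_total", 0.0)
    discount = record.get("discount", 0.0)
    if discount > total:
        errors.append("discount exceeds order_total")
    return errors

good = {"customer_id": 17, "age": 34, "order_total": 50.0, "discount": 5.0}
bad = {"customer_id": "17", "age": 150, "order_total": 20.0, "discount": 25.0}
good_errors = validate(good)
bad_errors = validate(bad)
print(good_errors)
print(bad_errors)
```

Collecting every error, rather than stopping at the first, makes it easier to report all the problems in a batch at once.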
61 Explain, using an example, why Leiden community detection is superior to CO3 10
Louvain.
Leiden community detection is considered superior to Louvain because it
addresses key limitations of Louvain, especially regarding community quality,
stability, and connectedness.
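The connectedness issue can be illustrated with a small check. Louvain can leave a community internally disconnected when a node that held it together is later moved to another community, whereas Leiden's refinement phase is designed to keep communities connected. The sketch below does not implement either algorithm; it only tests whether a given community is connected, using a hypothetical graph in which node n0 is the sole link between two halves of its community.

```python
from collections import deque

def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def is_connected_community(adj, members):
    """True if the members can all reach each other using only member nodes."""
    members = set(members)
    if not members:
        return True
    start = next(iter(members))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in members and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == members

edges = [
    ("n1", "n2"), ("n0", "n1"), ("n0", "n2"),   # first half of community X
    ("n3", "n4"), ("n0", "n3"), ("n0", "n4"),   # second half, joined via n0
    ("m1", "m2"), ("m1", "m3"), ("m2", "m3"),   # community Y
    ("n0", "m1"), ("n0", "m2"), ("n0", "m3"),   # n0 is also tied to Y
]
adj = build_adjacency(edges)

x_before = {"n0", "n1", "n2", "n3", "n4"}
x_after = x_before - {"n0"}        # n0 reassigned to community Y
connected_before = is_connected_community(adj, x_before)
connected_after = is_connected_community(adj, x_after)
print(connected_before, connected_after)
```

After n0 leaves, the remaining members of X split into two pieces with no path between them, which is the kind of badly formed community that Leiden avoids.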
Conclusion:
Leiden improves on Louvain by ensuring connected, stable, and high-quality
communities. In practical tasks like detecting social groups, customer segments,
or biological clusters, this leads to more accurate and meaningful insights,
making Leiden the preferred choice in modern network analysis.
62 Explain the conceptual foundations of balance theory and provide a relevant case CO3 10
to highlight its applications.
Balance Theory is a social psychology concept introduced by Fritz Heider,
which explains how individuals strive for harmony in their relationships and
attitudes. The theory focuses on triadic relationships—involving three elements
(usually a person and two other entities)—and suggests that people prefer
balanced states where their likes and dislikes are logically consistent.
Conceptual Foundations:
A triad includes Person (P), Other (O), and Object or Issue (X).
A triad is balanced if the product of the signs of the relationships is
positive (e.g., P likes O, O likes X, and P likes X).
An imbalanced state causes psychological discomfort, leading
individuals to adjust their attitudes to restore balance.
Real-World Application:
Case: Brand Endorsement in Marketing
If a consumer (P) likes a celebrity (O), and the celebrity endorses a product (X),
then according to balance theory, the consumer is more likely to develop a
positive attitude toward the product to maintain psychological balance.
If the consumer dislikes the product but likes the celebrity, the imbalance
may lead to either:
o Changing their opinion about the product (now liking it), or
o Reevaluating their opinion of the celebrity.
Conclusion:
Balance theory helps explain attitude change, peer influence, and marketing
strategies, making it a valuable tool in understanding and predicting human
behavior in social and commercial contexts.
63 A 20-person team is working on an interdisciplinary project. They are divided CO3 10
into four sub-teams (5 members each), but due to management rules:
Each person must have at least one tie within their sub-team.
Each person must have at least 3 external ties to members outside their team.
Some senior members act as bridges, forming additional long-range external
connections.
Construct this network. Compute E - I. Identify the top 3 most connected
individuals and their roles.
Let's break down and construct the network step-by-step according to the
requirements, compute the E–I index, and identify the top 3 most connected
individuals.
E–I Index = (E - I) / (E + I) = (38 - 20) / (38 + 20) = 18 / 58 ≈ 0.31
Where:
E = number of external edges = 38
I = number of internal edges = 20
Conclusion:
E–I Index ≈ 0.31 → moderately externally connected
Top 3 connected individuals: A1, B1, C1
Roles: These are senior members acting as bridges, enhancing cross-
team collaboration.
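One way to build a network that satisfies the rules and reproduces the edge counts above is sketched below. The wiring is an assumption made for illustration (a ring inside each sub-team, ties between same-numbered members across teams, and eight extra bridge ties for three senior members); many other constructions are possible. The index is computed as (E - I) / (E + I).

```python
from itertools import combinations

TEAMS = "ABCD"
SIZE = 5
edges = set()

def add_edge(u, v):
    edges.add(frozenset((u, v)))   # undirected, duplicates ignored

# Internal ties: a ring inside each sub-team (5 edges per team).
for t in TEAMS:
    for i in range(1, SIZE + 1):
        add_edge(f"{t}{i}", f"{t}{i % SIZE + 1}")

# External ties: each person is tied to the same-numbered member
# of every other team, giving exactly 3 external ties per person.
for s, t in combinations(TEAMS, 2):
    for i in range(1, SIZE + 1):
        add_edge(f"{s}{i}", f"{t}{i}")

# Extra long-range ties for the senior members (assumed to be A1, B1, C1).
for u, v in [("A1", "B2"), ("A1", "C2"), ("A1", "D2"),
             ("B1", "C3"), ("B1", "D3"), ("B1", "A3"),
             ("C1", "D4"), ("C1", "A4")]:
    add_edge(u, v)

# Team is the first character of the name, so an edge is internal
# when both endpoints share that character.
internal = sum(1 for e in edges if len({p[0] for p in e}) == 1)
external = len(edges) - internal
ei_index = (external - internal) / (external + internal)

degree = {}
for e in edges:
    for p in e:
        degree[p] = degree.get(p, 0) + 1
top3 = sorted(degree, key=lambda p: (-degree[p], p))[:3]

print(internal, external, round(ei_index, 2), top3)
```

The index ranges from -1 (all ties internal) to +1 (all ties external), so a value near 0.31 indicates that cross-team ties outnumber within-team ties.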
64 Analyze the structural role of cliques and bridges within graph networks by CO4 2
providing real-world scenarios where they significantly impact connectivity and
influence.
Cliques are tightly-knit groups where every member is directly connected to
every other. Bridges are connections or nodes that link separate groups or
communities.
Real-world example of cliques: In a corporate team, a clique of
developers can foster fast collaboration but may resist external input,
limiting innovation.
Real-world example of bridges: A project manager connecting the
marketing and engineering departments acts as a bridge, enabling
information flow and cross-functional coordination.
Impact: Cliques strengthen internal cohesion, while bridges enhance overall
network connectivity and influence by linking diverse groups.
65 Evaluate the effectiveness of cliques in diverse fields, such as social media, CO4 2
biology, and cybersecurity, illustrating their impact with domain-specific
applications.
Cliques are highly interconnected subgroups within networks, and their
effectiveness varies across domains:
Social Media: Cliques represent close friend groups or interest
communities. They enhance engagement but may lead to echo chambers,
limiting exposure to diverse viewpoints.
Biology: In protein interaction networks, cliques often represent protein
complexes, helping identify functional units crucial for understanding
cellular processes.
Cybersecurity: Cliques in network traffic may indicate coordinated
botnet behavior. Detecting such patterns helps in identifying and
neutralizing threats.
Conclusion: Cliques enhance internal efficiency and cohesion but can either
support or hinder broader system goals, depending on the context.
66 Provide a comprehensive explanation of graph partitioning, and its types CO4 2
employed for segmenting complex networks.
Graph partitioning is the process of dividing a graph into smaller, meaningful
subgraphs or clusters, such that nodes within each partition are more densely
connected to each other than to nodes in other partitions. It simplifies analysis,
enhances computation efficiency, and reveals hidden structures in complex
networks.
Types of Graph Partitioning:
1. Community Detection:
Identifies groups (communities) where nodes are densely connected
internally (e.g., social groups on Facebook).
2. Spectral Partitioning:
Uses eigenvectors of the graph's Laplacian matrix to divide the graph
while minimizing edge cuts between partitions.
3. Modularity-Based Partitioning:
Optimizes a modularity score to detect strong community structures (e.g.,
Louvain or Leiden methods).
4. Minimum Cut Partitioning:
Divides the graph by cutting the fewest edges possible, often used in
parallel computing and circuit design.
Conclusion: Graph partitioning is essential for uncovering structure, optimizing
performance, and enabling scalable analysis of large and complex networks.
67 Compare and contrast various graph kernels and their suitability for different CO4 2
network analysis tasks.
Graph kernels are functions that measure similarity between graphs and are
widely used in tasks like classification, clustering, and link prediction. Here's a
comparison of common graph kernels:
2. Shortest-Path Kernel
Strengths: Compares all shortest paths between nodes across graphs.
Use Case: Suitable for structural similarity analysis.
Limitations: Computationally intensive on large graphs.
3. Graphlet Kernel
Strengths: Measures similarity based on small subgraph patterns
(motifs).
Use Case: Works well in biological networks (e.g., protein interactions).
Limitations: High complexity for large graphs.
Conclusion:
WL Kernel is fast and scalable for structured graphs.
Graphlet and Random Walk Kernels excel in biological or richly
connected networks.
Shortest-Path Kernel is ideal when exact structural comparison is
critical.
Choosing a kernel depends on graph size, task, and the nature of
structural patterns to be captured.
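As an illustration of how a kernel turns structure into a similarity score, here is a minimal sketch of a simplified shortest-path kernel. It assumes unlabeled, unweighted graphs stored as adjacency dictionaries and counts ordered node pairs; the function names are hypothetical, and library implementations usually also compare node labels.

```python
from collections import Counter, deque


def shortest_path_lengths(graph):
    """Histogram of shortest-path lengths over all ordered node pairs (BFS, unweighted)."""
    hist = Counter()
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        hist.update(d for d in dist.values() if d > 0)
    return hist


def shortest_path_kernel(g1, g2):
    """Count pairs of shortest paths (one from each graph) that have equal length."""
    h1, h2 = shortest_path_lengths(g1), shortest_path_lengths(g2)
    return sum(h1[length] * h2[length] for length in h1)


path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
triangle = {"x": {"y", "z"}, "y": {"x", "z"}, "z": {"x", "y"}}

print(shortest_path_kernel(path, path))          # 20
print(shortest_path_kernel(path, triangle))      # 24
print(shortest_path_kernel(triangle, triangle))  # 36
```

The raw values are unnormalized, so in practice they are often divided by the square root of the product of the two self-similarities before being compared.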
68 Differentiate types of graph partitioning techniques based on computational CO4 2
complexity and efficiency in handling real-world networks.
Graph partitioning techniques vary in computational complexity and
efficiency, especially when applied to large, real-world networks:
1. Spectral Partitioning
Complexity: High (involves eigenvalue decomposition, typically $O(n^3)$)
Efficiency: Accurate but computationally expensive for large graphs
Use Case: Suitable for medium-sized networks where precision matters
Conclusion:
Louvain and Leiden are preferred for large-scale networks due to their
speed and quality.
Spectral and Min-Cut methods offer higher precision but are limited by
computational cost.
69 Discuss importance of social networks in modern society by analyzing their CO4 2
influence on information dissemination, relationships, and business strategies.
Social networks play a crucial role in modern society by transforming how
people communicate, connect, and conduct business.
Information Dissemination: Platforms like Twitter and Facebook allow
rapid sharing of news, ideas, and trends, enabling real-time updates and
viral content spread.
Relationships: Social networks foster personal and professional
connections across geographical boundaries, strengthening communities
and enabling support networks.
Business Strategies: Companies leverage social media for targeted
marketing, customer engagement, and brand building. Influencer
marketing and data-driven insights help businesses reach specific
audiences effectively.
Conclusion: Social networks significantly shape communication, relationships,
and commerce, making them indispensable to social and economic dynamics
today.
70 Discuss social networks as graphs, demonstrating how nodes and edges represent CO4 2
relationships with a relevant example.
Social networks as graphs use nodes to represent individuals (or entities) and
edges to represent relationships or interactions between them.
Structure:
Nodes (Vertices): Represent people, accounts, or organizations.
Edges (Links): Represent connections such as friendships, follows, or
message exchanges.
Example:
In a Facebook network:
Nodes: Users A, B, and C
Edges: A is friends with B and C; B is friends with C
This forms a triangle graph where each node is connected to the others,
representing a tightly-knit group or clique.
Conclusion: Modeling social networks as graphs helps in analyzing influence,
community formation, and information flow efficiently.
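The Facebook example can be written down directly as an adjacency structure. The sketch below is plain Python with a hypothetical helper name; it stores each user's friends in a set and checks that the three users form a clique.

```python
from itertools import combinations

# Undirected friendship graph from the example: A-B, A-C, B-C.
friends = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B"},
}


def is_clique(graph, members):
    """True if every pair of members is directly connected."""
    return all(v in graph[u] for u, v in combinations(members, 2))


# Each edge appears in two adjacency sets, so halve the total.
edge_count = sum(len(nbrs) for nbrs in friends.values()) // 2
print(edge_count)                           # 3
print(is_clique(friends, ["A", "B", "C"]))  # True
```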
71 Provide a comprehensive explanation of graph kernels, elaborating on their role CO4 5
in measuring similarity between graphs. Illustrate the concept with relevant
examples from real-world applications.
Graph kernels are mathematical functions that measure the similarity between
graphs by comparing their structures, labels, or subcomponents. They enable the
use of machine learning algorithms (like SVMs) on graph-structured data by
transforming graphs into feature spaces where similarity can be computed
efficiently.
Conclusion:
Graph kernels are powerful tools that bridge graph theory and machine learning.
By quantifying structural similarity, they support pattern recognition in complex
networks across chemistry, biology, social science, and cybersecurity.
72 Assess the role of a social media analyst from both positive and negative CO4 5
perspectives, substantiating the discussion with real-world examples.
A Social Media Analyst plays a critical role in shaping digital strategies by
monitoring, interpreting, and optimizing social media data for businesses,
governments, and organizations.
Positive Perspectives:
1. Data-Driven Insights:
Analysts track engagement, reach, and sentiment to guide marketing
decisions.
Example: During the 2022 FIFA World Cup, brands used social media
analysts to track fan engagement and optimize live ad campaigns in real-
time.
2. Crisis Management:
Analysts identify negative trends early, helping manage reputational
risks.
Example: Airlines like Delta use social media monitoring to respond
quickly to customer complaints or delays.
3. Audience Understanding:
They help tailor content strategies by analyzing demographics and
behavior patterns.
Example: Netflix uses social media analysts to understand audience
reactions to new shows and adjust promotion strategies accordingly.
Negative Perspectives:
1. Privacy Concerns:
Deep analysis of user data can lead to surveillance or manipulation
concerns.
Example: The Cambridge Analytica scandal highlighted how social
media data can be misused for political influence.
2. Data Misinterpretation:
Relying solely on metrics like likes or shares can lead to misleading
conclusions.
Example: A viral post might be controversial rather than positively
engaging, skewing brand sentiment analysis.
Conclusion:
While social media analysts add value through insights and strategy
optimization, ethical handling of data and contextual analysis are essential to
avoid potential misuse or misinterpretation.
73 Examine the role of social networks in crisis communication, analyzing their CO4 5
effectiveness in disseminating real-time information while addressing risks
related to misinformation and public panic.
Social networks play a pivotal role in crisis communication by enabling the
rapid dissemination of real-time information to large audiences. Their
immediacy and reach make them powerful tools during emergencies such as
natural disasters, pandemics, or terrorist attacks.
Conclusion:
Social networks are invaluable for crisis communication due to their speed and
accessibility, but their effectiveness depends on responsible use, fact-checking
mechanisms, and coordinated communication to prevent misinformation and
manage public response.
74 Analyze the impact of viral content on social media engagement, assessing its CO4 5
potential for brand promotion as well as the risks associated with misinformation
and sensationalism.
Viral content on social media significantly impacts engagement, offering both
opportunities for brand promotion and risks related to misinformation and
sensationalism.
Conclusion:
While viral content can significantly enhance brand visibility and engagement,
it also comes with the risk of misinformation and sensationalism, making it
crucial for brands to maintain control, verify facts, and approach virality
responsibly to mitigate potential damage.
75 Design a workflow for applying graph mining techniques to a dataset of social CO4 5
media interactions to uncover hidden communities.
Workflow for Applying Graph Mining Techniques to Uncover Hidden
Communities in Social Media Interactions
1. Data Collection and Preprocessing:
o Extract data: Gather social media interaction data such as user
posts, comments, likes, shares, and follows.
o Construct the graph: Represent the social media network as a
graph where:
Nodes represent users.
Edges represent interactions (e.g., follows, likes,
comments).
o Data cleaning: Remove noise such as bots, incomplete
interactions, or irrelevant posts.
2. Graph Representation and Transformation:
o Adjacency matrix: Convert the interaction data into a sparse
adjacency matrix or edge list.
o Edge weighting: Optionally, assign weights to edges based on
interaction frequency, sentiment, or engagement level (e.g.,
heavier weight for direct messages or shares).
3. Community Detection Algorithms:
o Apply community detection techniques:
Louvain or Leiden method: For modularity-based
community detection, searching for the partition that
maximizes modularity (dense ties inside communities).
Spectral clustering: Use eigenvectors of the graph’s
Laplacian matrix to partition the graph.
Label propagation: A fast, scalable algorithm to detect
communities based on label diffusion.
o Parameter tuning: Adjust parameters such as modularity
resolution or clustering thresholds to improve the accuracy of
community detection.
4. Analysis and Evaluation:
o Evaluate community quality: Measure the quality of detected
communities using metrics like modularity, conductance, or
silhouette scores.
o Visualize communities: Use network visualization tools (e.g.,
Gephi, NetworkX) to visually inspect community structures and
their interaction patterns.
5. Interpretation and Application:
o Profile communities: Analyze the demographic, behavioral, or
thematic characteristics of users in each community.
o Insights for engagement: Use the detected communities for
targeted marketing, content recommendations, or influencer
identification.
o Monitor trends: Track community evolution over time to
identify emerging topics or shifts in user behavior.
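Steps 1 to 3 of the workflow can be sketched in plain Python. Everything concrete here is an assumption made for illustration: the interaction log, the per-type weights, the blocklist, and the function names. Label propagation is used because it is short to write; Louvain or Leiden would normally come from a library.

```python
from collections import Counter, defaultdict

# Hypothetical interaction log: (source_user, target_user, interaction_type).
interactions = [
    ("ann", "bob", "like"), ("ann", "bob", "comment"), ("bob", "cat", "share"),
    ("ann", "cat", "like"), ("dan", "eve", "comment"), ("eve", "fay", "like"),
    ("dan", "fay", "share"), ("cat", "dan", "like"), ("spam_bot", "ann", "like"),
]
TYPE_WEIGHT = {"like": 1.0, "comment": 2.0, "share": 3.0}  # assumed weights
BLOCKLIST = {"spam_bot"}                                    # assumed bot list


def build_weighted_graph(rows):
    """Steps 1-2: drop blocklisted accounts, merge repeated interactions into edge weights."""
    weights = Counter()
    for src, dst, kind in rows:
        if src in BLOCKLIST or dst in BLOCKLIST or src == dst:
            continue
        weights[frozenset((src, dst))] += TYPE_WEIGHT.get(kind, 1.0)
    graph = defaultdict(dict)
    for pair, w in weights.items():
        u, v = tuple(pair)
        graph[u][v] = w
        graph[v][u] = w
    return dict(graph)


def label_propagation(graph, max_rounds=20):
    """Step 3: each node adopts the label carrying the most edge weight among its
    neighbours; ties break toward the smallest label so the result is deterministic."""
    labels = {node: node for node in graph}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(graph):
            score = Counter()
            for nbr, w in graph[node].items():
                score[labels[nbr]] += w
            best = min(score, key=lambda lab: (-score[lab], lab))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels


graph = build_weighted_graph(interactions)
labels = label_propagation(graph)

communities = defaultdict(set)
for node, lab in labels.items():
    communities[lab].add(node)
print(sorted(sorted(c) for c in communities.values()))
# [['ann', 'bob', 'cat'], ['dan', 'eve', 'fay']]
```

The two groups are separated even though cat and dan interact, because that single light tie is outweighed by the heavier ties inside each group.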
2. Bridges:
o Definition: A bridge (or cut-edge) is an edge whose removal
would increase the number of components in the graph, i.e., it
disconnects the graph.
o Effect on Connectivity: Bridges are critical in maintaining
connectivity. The removal of a bridge can split a connected
network into two disconnected components, making the network
more vulnerable to fragmentation.
Example:
o In a communication network (like a telephone or internet
network), a bridge can represent a vital link between two regions.
If the bridge fails (e.g., a cable is cut), the network is split into
two disconnected parts, disrupting communication between those
areas.
Practical Illustration:
Internet Backbone: A node represents a data center, and edges represent
high-speed connections. A bridge between two data centers ensures that
they are part of the same global network. If this bridge fails, it can
disrupt the entire flow of data between regions, dividing the internet into
smaller isolated networks.
Social Networks: If a group of friends is highly connected within a
network, but the only way they communicate with another group is
through a single member (a bridging node, or cut vertex), losing this
member would cut the two groups off from each other.
Conclusion:
Components indicate the overall fragmentation of a network, while
bridges are vital in maintaining the connectivity between different parts
of the network. Both are critical for understanding and optimizing
network structure and resilience.
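The definition of a bridge translates almost literally into code: remove an edge, recount the components, and see whether the count went up. The sketch below is a brute-force version with hypothetical function names, adequate for small graphs; a linear-time depth-first-search method exists for large ones.

```python
def count_components(graph):
    """Number of connected components, via iterative depth-first search."""
    seen, components = set(), 0
    for start in graph:
        if start in seen:
            continue
        components += 1
        stack = [start]
        seen.add(start)
        while stack:
            u = stack.pop()
            for v in graph[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
    return components


def find_bridges(graph):
    """Edges whose removal increases the component count."""
    base = count_components(graph)
    edges = {tuple(sorted((u, v))) for u in graph for v in graph[u]}
    bridges = []
    for u, v in sorted(edges):
        # Temporarily remove the edge, test connectivity, then restore it.
        graph[u].remove(v)
        graph[v].remove(u)
        if count_components(graph) > base:
            bridges.append((u, v))
        graph[u].add(v)
        graph[v].add(u)
    return bridges


# Two friend groups (triangles) joined by the single tie c-d.
network = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
print(count_components(network))  # 1
print(find_bridges(network))      # [('c', 'd')]
```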
77 Discuss the utilization of graph partitioning algorithms as a method for CO4 5
segmenting large-scale networks into smaller, computationally efficient
substructures, enhancing their manageability and analytical processing.
Graph partitioning algorithms are essential tools for dividing large-scale
networks into smaller, more manageable subgraphs, improving computational
efficiency and enabling more effective analysis.
Role of Graph Partitioning:
Segmentation of Complex Networks: Large networks, like social media
platforms or biological networks, often consist of millions of nodes and
edges, making direct analysis computationally prohibitive. Partitioning
breaks the graph into smaller subgraphs, making them easier to handle.
Improved Scalability: By partitioning a graph, we can process each
subgraph independently in parallel, significantly speeding up
computations, especially in large-scale data analytics and machine
learning tasks.
Types of Graph Partitioning Algorithms:
1. Spectral Partitioning:
o Method: Uses the eigenvalues and eigenvectors of the graph’s
Laplacian matrix to find optimal partitions.
o Effect: Helps to balance partition sizes while minimizing the
number of edges between them, ensuring efficient
communication between subgraphs.
o Use Case: Suitable for clustering in social network analysis and
community detection.
2. Modularity-Based Partitioning (e.g., Louvain, Leiden):
o Method: Maximizes modularity, a measure that quantifies the
density of edges within partitions relative to random graphs.
o Effect: Produces high-quality partitions by ensuring that most
connections remain within subgraphs.
o Use Case: Community detection in social networks or
biological networks (e.g., protein interactions).
3. Kernighan-Lin Algorithm (Min-Cut):
o Method: Minimizes the number of edges between subgraphs (cut
edges), aiming for balanced partitions.
o Effect: Produces balanced partitions but can be computationally
expensive for large graphs.
o Use Case: Used in parallel computing and circuit design.
Benefits:
Enhanced Computational Efficiency: Partitioned networks allow for
parallel processing and reduce the computational load per subgraph.
Improved Analytical Processing: Smaller subgraphs enable more
focused analysis, such as detecting communities or clusters and
identifying key nodes or patterns.
Scalability: Graph partitioning enables the analysis of massive
networks that would otherwise be too large to process effectively.
Conclusion:
Graph partitioning algorithms enhance the manageability and efficiency of
analyzing large networks by breaking them into smaller, more tractable
substructures, thereby facilitating better performance, scalability, and insightful
data analysis.
78 Compare different graph partitioning algorithms, evaluating their advantages, CO4 10
limitations, and effectiveness in segmenting large-scale networks
Comparison of Graph Partitioning Algorithms
Graph partitioning algorithms are essential for dividing large-scale networks into
smaller, more manageable subgraphs. Each algorithm has specific advantages,
limitations, and suitability based on the network characteristics and analysis
requirements. Below is a comparison of popular graph partitioning algorithms.
1. Spectral Partitioning
Overview:
Spectral partitioning involves using the eigenvalues and eigenvectors of the
graph’s Laplacian matrix to find an optimal partition that minimizes the edge
cuts between subgraphs.
Advantages:
Theoretical foundation: Based on spectral graph theory, which gives a
principled relaxation of the minimum-cut problem (a good approximation
rather than a guaranteed optimal cut).
Quality of partitions: Tends to produce balanced and meaningful
partitions that minimize inter-partition edges.
Effective for community detection: Excellent at detecting clusters or
communities within the graph.
Limitations:
Computational complexity: Eigenvalue decomposition is
computationally expensive, especially for large-scale graphs (e.g.,
$O(n^3)$).
Scalability: Not suitable for very large graphs unless approximations
(e.g., Lanczos method) are used.
Use Case:
Used in applications like community detection in social networks and graph
clustering.
Comparison Summary
Algorithm             | Advantages                                      | Limitations                                | Use Case
----------------------|-------------------------------------------------|--------------------------------------------|---------------------------------------------
Spectral Partitioning | Theoretically grounded, high-quality partitions | Computationally expensive for large graphs | Community detection, graph clustering
Louvain               | Fast, scalable, hierarchical partitions         | Resolution limit, heuristic nature         | Social network analysis, biological networks
Leiden                | Improved accuracy, better stability             | Still heuristic, memory-intensive          | Large-scale community detection
Kernighan-Lin         | Balanced partitions, simple concept             | Computationally expensive, local optima    | Parallel computing, circuit design
Metis                 | Scalable, works well for sparse graphs          | Resolution limit, complex implementation   | Scientific computing, load balancing
Conclusion:
Louvain and Leiden are the most popular for community detection in
social and biological networks due to their scalability and ease of use,
with Leiden being a more refined version.
Spectral partitioning is highly accurate but computationally intensive,
suitable for medium-sized graphs.
Kernighan-Lin and Metis are best for applications requiring balanced
partitions, such as parallel computing and mesh generation, though
Kernighan-Lin in particular becomes expensive on very large graphs.
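To make "minimizing cut edges under a balance constraint" concrete, here is a minimal sketch of a greedy, Kernighan-Lin-inspired refinement. It is a simplification and not the full algorithm: the real Kernighan-Lin pass also accepts temporarily worsening swaps and then keeps the best prefix. Function names and the toy network are assumptions.

```python
from itertools import product


def cut_size(graph, part_a):
    """Number of edges with exactly one endpoint in part_a."""
    return sum(1 for u in part_a for v in graph[u] if v not in part_a)


def refine_by_swaps(graph, part_a):
    """Repeatedly apply the single swap (one node from each side) that lowers
    the cut the most. Swapping keeps both parts the same size."""
    part_a = set(part_a)
    part_b = set(graph) - part_a
    while True:
        best_cut, best_swap = cut_size(graph, part_a), None
        for a, b in product(sorted(part_a), sorted(part_b)):
            candidate = (part_a - {a}) | {b}
            c = cut_size(graph, candidate)
            if c < best_cut:
                best_cut, best_swap = c, (a, b)
        if best_swap is None:
            return part_a, part_b  # local optimum: no single swap helps
        a, b = best_swap
        part_a = (part_a - {a}) | {b}
        part_b = (part_b - {b}) | {a}


# Two triangles joined by the edge c-d, with a deliberately poor starting split.
network = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
start = {"a", "b", "d"}
print(cut_size(network, start))  # 5

part_a, part_b = refine_by_swaps(network, start)
print(sorted(part_a), sorted(part_b), cut_size(network, part_a))
# ['a', 'b', 'c'] ['d', 'e', 'f'] 1
```

Each round re-evaluates every candidate pair, which is why this family of methods is listed as computationally expensive and prone to local optima.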
79 Solve the following: CO4 10
A technology startup consists of 18 employees categorized into three teams
(Software Development, Marketing, and Operations). The work collaboration
network follows these principles:
- Each employee must collaborate with at least three members within their own
team.
- Every employee must maintain at least two inter-team collaborations.
- The CEO and senior managers act as key connectors, forming additional links
across all teams.
Construct the team collaboration network. Compute E-I.
Conclusion:
E - I = -0.67 indicates that there are more internal collaborations (within
teams) than external collaborations (between teams), highlighting a
strong intra-team focus and a weaker inter-team connectivity despite the
CEO and senior managers' roles as connectors.
80 Solve the following: CO4 10
A research collaboration network consists of 25 researchers working across five
different disciplines. The collaboration rules are as follows:
- Each researcher must have at least two connections within their primary
discipline.
- Every researcher is required to establish at least three inter-disciplinary
collaborations.
- Senior researchers act as knowledge hubs, forming cross-disciplinary
connections beyond the minimum required ties.
Construct the network graph. Compute E-I.
81 Analyze the Random Graph Kernel, analyzing its mathematical formulation and CO4 10
demonstrating its application in network similarity measurement.
Random Graph Kernel: Analysis and Application (10 Marks)
5. Advantages
Captures deep structural similarity between graphs.
Can be computed efficiently using matrix operations with the
Kronecker product.
6. Limitations
Computational complexity: Kronecker products and matrix
exponentiation can be expensive for large graphs.
Sensitive to noise in graph structure (e.g., spurious edges).
8. Conclusion
The Random Graph Kernel (more commonly called the random walk kernel) is a
powerful tool for quantifying graph similarity
based on the structural behavior of random walks. Despite its computational
cost, it is widely used in domains such as bioinformatics, social network
analysis, and computer vision for its robust comparison of intricate graph
structures.
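A minimal sketch of the idea follows. It counts walks that the two graphs have in common by walking on their direct product graph, whose adjacency matrix is the Kronecker product of the two adjacency matrices. The truncation at `max_length`, the uniform start and stop weights, the `decay` value, and the function name are all assumptions made for illustration.

```python
def random_walk_kernel(g1, g2, decay=0.1, max_length=4):
    """Truncated random walk kernel: sum over k = 0..max_length of
    decay**k * (number of length-k walks in the direct product graph)."""
    # Product node (u, v) is adjacent to (u2, v2) when u-u2 is an edge of g1
    # and v-v2 is an edge of g2.
    product = {
        (u, v): [(u2, v2) for u2 in g1[u] for v2 in g2[v]]
        for u in g1 for v in g2
    }
    walks_from = {node: 1.0 for node in product}  # walks of length 0
    total, weight = 0.0, 1.0
    for _ in range(max_length + 1):
        total += weight * sum(walks_from.values())
        # Walks of length k+1 from a node = sum of length-k walks from its neighbours.
        walks_from = {
            node: sum(walks_from[n] for n in nbrs) for node, nbrs in product.items()
        }
        weight *= decay
    return total


triangle = {"x": {"y", "z"}, "y": {"x", "z"}, "z": {"x", "y"}}
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}

# 9 product nodes (length 0) plus 0.5 * 36 walks of length 1.
print(random_walk_kernel(triangle, triangle, decay=0.5, max_length=1))  # 27.0
```

The product graph has |V1| × |V2| nodes, which is the source of the computational cost noted under Limitations.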
5. Strengths
Linear time complexity in graph size and number of iterations.
Effective in capturing local graph structures.
Works with both labeled and unlabeled graphs.
Extensible: Variants like WL-Subtree, WL-Edge, and WL-ShortestPath
improve specificity.
6. Limitations
May fail to distinguish structurally different graphs that are WL-
equivalent (i.e., indistinguishable by WL test).
Focused on local structure, so may miss global topological patterns.
Requires manual tuning of the number of iterations H.
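The strengths and limitations above can be seen in a small sketch of the WL subtree kernel. Two simplifications are assumed: node degrees serve as initial labels (the graphs here are unlabeled), and refined labels are kept as nested tuples instead of being compressed to integers, which is only practical for a few iterations. Function names are hypothetical.

```python
from collections import Counter


def wl_features(graph, iterations):
    """Histogram of (round, label) pairs accumulated over WL refinement rounds."""
    labels = {node: len(nbrs) for node, nbrs in graph.items()}
    features = Counter((0, lab) for lab in labels.values())
    for h in range(1, iterations + 1):
        # New label = own label plus the sorted multiset of neighbour labels.
        labels = {
            node: (labels[node], tuple(sorted(labels[n] for n in nbrs)))
            for node, nbrs in graph.items()
        }
        features.update((h, lab) for lab in labels.values())
    return features


def wl_subtree_kernel(g1, g2, iterations=2):
    """Dot product of the two WL label histograms."""
    f1, f2 = wl_features(g1, iterations), wl_features(g2, iterations)
    return sum(count * f2[key] for key, count in f1.items())


triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
path = {0: {1}, 1: {0, 2}, 2: {1}}
print(wl_subtree_kernel(triangle, path, iterations=1))      # 3
print(wl_subtree_kernel(triangle, triangle, iterations=2))  # 27

# Limitation: a 6-cycle and two disjoint triangles are WL-equivalent.
cycle6 = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
two_triangles = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4, 5}, 4: {3, 5}, 5: {3, 4}}
print(wl_features(cycle6, 3) == wl_features(two_triangles, 3))  # True
```

The last line shows two structurally different graphs producing identical features, since every node has degree 2 in both and the refinement never tells them apart.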
Conclusion
By combining centrality-based influence detection with content sentiment
analysis and targeted engagement strategies, a social media analyst can
systematically identify and activate brand advocates to drive organic and
authentic brand promotion.
85 Differentiate between directed and undirected graphs within the context of CO5 2
Gephi, highlighting their structural distinctions and implications for network
visualization and analysis.
Directed vs. Undirected Graphs in Gephi (Short Answer – 2 Marks)
In Gephi, a directed graph consists of edges with a specific direction (from
source to target), representing asymmetric relationships such as "follows",
"retweets", or "cites". An undirected graph, on the other hand, features edges
without direction, implying mutual or bidirectional relationships, like
"friends", "co-authorship", or "collaborations".
Structural Distinctions:
Directed edges are visualized with arrows in Gephi, showing flow.
Undirected edges are simple lines, reflecting equality in connection.
Implications:
Directed graphs are used to analyze influence, information flow, and
hierarchy.
Undirected graphs are suitable for studying cohesion, clustering, and
mutual connectivity.
Thus, the choice of graph type in Gephi depends on the nature of the
relationship being modeled and significantly affects centrality metrics and
community detection.
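The effect of the graph type on degree-based metrics can be shown without Gephi. The sketch below is plain Python on a hypothetical "follows" edge list; it is not Gephi's API, only an illustration of what changes when the same data is read as directed or undirected.

```python
from collections import Counter

# Hypothetical "follows" edges: (source, target).
follows = [("ann", "bob"), ("cat", "bob"), ("dan", "bob"), ("bob", "ann")]

# Directed reading: direction matters, so in-degree and out-degree differ.
in_degree = Counter(target for _, target in follows)
out_degree = Counter(source for source, _ in follows)

# Undirected reading: ann->bob and bob->ann collapse into one mutual tie.
undirected = {frozenset(edge) for edge in follows}
degree = Counter(node for edge in undirected for node in edge)

print(in_degree["bob"], out_degree["bob"])  # 3 1
print(len(follows), len(undirected))        # 4 3
print(degree["bob"])                        # 3
```

In the directed reading bob is followed by three accounts but follows only one, a distinction that disappears in the undirected reading.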
86 Enumerate three significant applications of Gephi, illustrating its utility in CO5 2
diverse fields such as social network analysis, biological networks, and
information dissemination modeling.
Three Significant Applications of Gephi (Short Answer – 2 Marks)
1. Social Network Analysis:
Gephi helps visualize and analyze relationships on platforms like Twitter
or Facebook, identifying key influencers, community clusters, and
engagement patterns (e.g., during elections or viral campaigns).
2. Biological Networks:
In systems biology, Gephi is used to model protein-protein interaction
networks or gene regulation pathways, revealing critical nodes and
functional modules in complex biological systems.
3. Information Dissemination Modeling:
Gephi supports the simulation of information spread in communication
networks, allowing researchers to understand diffusion patterns, detect
bottlenecks, and design effective outreach strategies (e.g., during public
health campaigns).
These use cases show Gephi’s versatility across disciplines for both structural
insight and decision-making.
87 Identify and explain three key advantages of using Gephi as a network analysis CO5 2
tool compared to alternative software.
1. User-Friendly Interface: Gephi offers an intuitive graphical interface
that makes it easy to visualize and manipulate network graphs without
extensive coding knowledge.
2. Real-Time Visualization: It allows dynamic, real-time interaction with
large networks, making it ideal for exploratory data analysis.
3. Extensive Plugins and Layouts: Gephi supports various built-in layouts
and plugins for advanced network analysis and customization.
88 Examine the limitations of Gephi in handling large, complex network structures. CO5 2
1. Performance Issues: Gephi can struggle with very large or highly
complex networks, leading to slow performance or crashes due to high
memory usage.
2. Limited Scripting and Automation: Unlike tools like NetworkX, Gephi
lacks strong support for scripting and automation, making repetitive or
large-scale analyses harder to manage.
89 List alternative tools used for network analysis apart from Gephi, explaining CO5 2
their comparative advantages.
1. Cytoscape: Better suited for biological network analysis with strong
support for molecular and genetic data, and a rich set of plugins.
2. NetworkX (Python Library): Ideal for complex and large-scale
network analysis with full scripting capabilities, offering greater
flexibility and automation than Gephi.
90 Justify the significance of network analysis tools in extracting insights from CO5 2
large-scale data.
Network analysis tools are essential for identifying patterns, relationships, and
key influencers within large-scale data, which traditional analysis methods may
miss. They help uncover hidden structures, such as communities or central
nodes, enabling better decision-making in fields like social media, biology, and
cybersecurity.
91 Explain node and edge structures in Gephi, and their significance in network CO5 2
visualization.
In Gephi, nodes represent entities (e.g., people, websites) and edges represent
relationships or interactions between them (e.g., friendships, links). Their
structure helps visualize the network’s topology, showing how entities are
connected, identifying central nodes, and revealing patterns like clusters or hubs.
92 Elaborate on the fundamental concepts of nodes and edges in network analysis, CO5 5
detailing their representation and visualization within Gephi.
In network analysis, nodes (also called vertices) represent individual entities
such as people, organizations, or web pages, while edges (or links) represent the
relationships or interactions between these entities, such as communication,
collaboration, or hyperlinks. In Gephi, nodes are visualized as points and edges
as lines connecting them. Gephi allows customization of node size, color, and
labels based on attributes like centrality or category, and edges can vary in
thickness or color to show weight or type of relationship. This visual
representation helps identify patterns such as clusters, key influencers, and
overall network structure.
93 Discuss how Gephi facilitates interactive network exploration and analysis, CO5 5
highlighting key interactive features that enhance data interpretation and
visualization.
Gephi facilitates interactive network exploration through a range of dynamic
features. Users can zoom, pan, and filter networks in real time, making it easy to
focus on specific nodes or communities. The layout algorithms (like
ForceAtlas2) allow users to rearrange nodes visually to reveal structures and
patterns. The data laboratory provides a spreadsheet-like view to edit node and
edge data directly. Dynamic filtering helps isolate parts of the network based on
attributes such as degree or centrality. Additionally, real-time metrics (e.g.,
modularity, betweenness) allow users to analyze structural properties
interactively, enhancing insight and interpretation.
94 Examine the management of node and edge attributes in Gephi, providing CO5 5
illustrative examples of how these attributes influence network analysis.
In Gephi, node and edge attributes are managed through the Data Laboratory,
where users can add, edit, and import data like labels, weights, colors, and
categories. These attributes influence how the network is analyzed and
visualized. For example, a node's degree (number of connections) can be used
to size nodes—larger nodes represent more influential entities. A node attribute
like 'group' can be used to color-code communities. For edges, a weight
attribute might represent interaction frequency, affecting edge thickness. These
visual cues help identify patterns such as central hubs, clusters, or strong
relationships.
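One common way to get such attributes into the Data Laboratory is to import a node table and an edge table as CSV files. The sketch below assumes Gephi's spreadsheet importer recognizes the column names `Id` and `Label` for nodes and `Source`, `Target`, and `Weight` for edges; the `group` column, the sample rows, and the helper name are made up for illustration.

```python
import csv
import io

# Hypothetical data: node attribute "group", edge attribute "Weight".
nodes = [("n1", "Ann", "marketing"), ("n2", "Bob", "engineering"), ("n3", "Cat", "engineering")]
edges = [("n1", "n2", 5), ("n2", "n3", 12)]


def to_csv(header, rows):
    """Render rows as CSV text (save it to a file before importing)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


nodes_csv = to_csv(["Id", "Label", "group"], nodes)
edges_csv = to_csv(["Source", "Target", "Weight"], edges)
print(nodes_csv)
print(edges_csv)
```

After import, `group` could drive node colour and `Weight` edge thickness, matching the examples in the answer.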
95 Differentiate between global and local network properties, offering examples of CO5 5
each and analyzing their relevance in understanding network structure and
dynamics.
Global network properties describe the overall structure and behavior of the
entire network, while local network properties focus on individual nodes or
small groups of nodes.
Global example: Average path length – shows the typical number of
steps between any two nodes, useful for understanding how quickly
information spreads.
Local example: Node degree – the number of connections a node has,
indicating its influence or centrality within its immediate surroundings.
These properties help in understanding both the macro-level structure (e.g.,
how interconnected the network is) and micro-level roles (e.g., key influencers
or bridges), aiding in tasks like optimization, targeting, or vulnerability analysis.
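Both example properties are easy to compute on a small graph. The sketch below uses a hypothetical five-node star network and illustrative function names: degree is read off one node's neighbour set, while average path length needs a breadth-first search from every node.

```python
from collections import deque


def degree(graph, node):
    """Local property: number of direct connections of one node."""
    return len(graph[node])


def average_path_length(graph):
    """Global property: mean shortest-path length over all reachable ordered pairs."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# Hypothetical star network: one hub connected to four leaves.
star = {"hub": {"a", "b", "c", "d"}, "a": {"hub"}, "b": {"hub"}, "c": {"hub"}, "d": {"hub"}}
print(degree(star, "hub"), degree(star, "a"))  # 4 1
print(average_path_length(star))               # 1.6
```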
96 Define the principles of assortativity and disassortativity in network analysis, CO5 5
explaining measurement techniques and the insights they provide regarding node
relationships.
Assortativity refers to the tendency of nodes in a network to connect with
similar nodes, while disassortativity indicates a preference for connecting with
dissimilar nodes. This is commonly measured using the assortativity
coefficient, which ranges from -1 to +1:
A positive value (e.g., +0.6) indicates assortativity—nodes connect with
others of similar degree (e.g., high-degree nodes with other high-degree
nodes).
A negative value (e.g., -0.4) shows disassortativity—high-degree nodes
connect with low-degree nodes.
These patterns offer insights into network structure: assortative networks (like
social networks) suggest strong community or peer-group formation, while
disassortative networks (like the internet) reveal hierarchical or hub-and-spoke
structures.
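A minimal sketch of the degree assortativity coefficient, computed as the Pearson correlation between the degrees found at the two ends of each edge. The function name and both toy graphs are illustrative assumptions.

```python
from math import sqrt


def degree_assortativity(graph):
    """Pearson correlation of end-point degrees over all edges.
    Each undirected edge is counted in both directions so the measure is symmetric."""
    xs, ys = [], []
    for u in graph:
        for v in graph[u]:
            xs.append(len(graph[u]))
            ys.append(len(graph[v]))
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("undefined when all nodes have the same degree")
    return cov / sqrt(var_x * var_y)


# Hub-and-spoke: the high-degree hub links only to low-degree leaves.
star = {"hub": {"a", "b", "c"}, "a": {"hub"}, "b": {"hub"}, "c": {"hub"}}
print(degree_assortativity(star))  # -1.0

# A 4-clique plus a separate pair: every node links only to nodes of equal degree.
cliques = {
    "p": {"q", "r", "s"}, "q": {"p", "r", "s"}, "r": {"p", "q", "s"}, "s": {"p", "q", "r"},
    "x": {"y"}, "y": {"x"},
}
print(round(degree_assortativity(cliques), 6))  # 1.0
```

The two graphs sit at the extremes of the range: the star is fully disassortative and the clique-plus-pair graph fully assortative.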
97 Analyze the concept of network density and its role in determining connectivity CO5 5
and clustering in graphs.
Network density measures the proportion of possible connections in a network
that are actual connections. It is calculated as the ratio of the number of edges
present to the number of edges possible, with values ranging from 0 (no
connections) to 1 (complete connectivity).
Density plays a key role in:
1. Connectivity: High density indicates a well-connected network, where
nodes are likely to be directly or indirectly reachable from one another.
Low density suggests sparse connectivity and isolated nodes.
2. Clustering: Dense networks tend to form tightly-knit clusters or
communities, as the higher number of edges increases the likelihood of
connections between neighboring nodes.
In practice, high density networks often represent cohesive groups (e.g., tightly
connected social groups), while low density networks may indicate decentralized
structures (e.g., the web or large-scale organizational networks).
98 Analyze the concept of network density in graph theory, explaining its CO5 5
correlation with the number of nodes and edges and its implications for network
connectivity.
In graph theory, network density is a measure of how many edges exist in a
network relative to the maximum possible number of edges. It is calculated
using the formula (for an undirected graph without self-loops):
\[ D = \frac{2E}{N(N-1)} \]
Where:
E is the number of edges,
N is the number of nodes.
As the number of nodes (N) increases, the number of possible edges grows
quadratically (N(N-1)/2 in an undirected graph), which means that a network
with many nodes needs a significantly higher number of edges to maintain the
same density. High density
implies that most nodes are connected, indicating strong connectivity and
potential for clustering. Low density, conversely, suggests a more sparse or
loosely connected network, with isolated nodes or clusters.
In practical terms:
High density networks often show tight-knit communities or efficient
communication pathways.
Low density networks may indicate decentralized structures, common in
large-scale systems or networks with many isolated or weakly connected
components.
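To make the relationship between nodes, edges, and density concrete, the sketch below tabulates how many edges an undirected network needs to hold a fixed density as it grows. The helper names and the target density of 0.1 are assumptions chosen for illustration.

```python
def max_edges(n):
    """Maximum number of edges in an undirected simple graph on n nodes."""
    return n * (n - 1) // 2

def edges_for_density(n, target_density):
    """Number of edges needed to reach target_density (may be fractional)."""
    return target_density * max_edges(n)

for n in (10, 100, 1000):
    print(n, max_edges(n), edges_for_density(n, 0.1))
```

Multiplying the number of nodes by 10 multiplies the required number of edges by roughly 100, which is why large real-world networks are almost always sparse.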
99 Explain modularity in network analysis and discuss its role in detecting CO5 10
community structures using Gephi.
Modularity in network analysis is a measure of the strength of division of a
network into modules or communities. It quantifies the degree to which nodes
within a community are more densely connected to each other than to nodes in
other communities. Modularity values lie between -0.5 and 1: values near or
below 0 mean the division is no better than a random one, and higher values
indicate stronger community structure.
Role in Detecting Community Structures in Gephi:
1. Community Detection: Gephi uses modularity-based algorithms, like
the Louvain method, to automatically detect communities by
maximizing modularity. Communities are groups of nodes that are more
interconnected internally than with the rest of the network.
2. Visualization: Communities are visually represented with distinct colors
and clustered nodes in Gephi, making it easier to interpret the structure
and identify natural groupings in the network.
3. Improved Analysis: Modularity helps reveal hidden patterns and
subgroups in complex networks (e.g., social networks, research
collaboration networks), which might be overlooked using simpler
metrics.
By focusing on modularity, Gephi enables deeper insights into how networks are
organized, aiding in tasks like identifying key groups or understanding network
flow and resilience.
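The score itself can be computed by hand for a small graph. The sketch below evaluates the standard modularity formula for a given partition of an undirected, unweighted graph; it is not Gephi's code, and the function name and the toy graph (two triangles joined by one edge) are assumptions for this example.

```python
def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities c of  L_c / m  -  (d_c / (2m))^2
    m   : total number of edges
    L_c : number of edges with both ends inside community c
    d_c : sum of the degrees of the nodes in community c
    """
    m = len(edges)
    internal = {}    # L_c per community
    degree_sum = {}  # d_c per community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())

# Hypothetical graph: triangle 0-1-2, triangle 3-4-5, bridge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_group = {node: "A" for node in range(6)}
print(modularity(edges, two_groups))
print(modularity(edges, one_group))
```

Splitting the graph into its two triangles scores about 0.357, while putting every node in one community scores 0, so the two-group division is the better description of the structure.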
100 Explain the concept of modularity in network analysis, discussing its CO5 10
significance in detecting community structures and its implementation in Gephi.
Modularity in network analysis is a measure that quantifies the strength of
division of a network into communities or modules. It calculates how well the
network can be divided into subgroups of nodes, where nodes within the same
community are more densely connected to each other than to nodes in other
communities. The modularity score ranges from -0.5 to 1, where higher values
(close to 1) indicate strong community structures.
Significance in Detecting Community Structures:
1. Community Identification: Modularity helps identify clusters or groups
of nodes that are highly interconnected, revealing the hidden structure of
a network.
2. Network Understanding: It highlights important groups within large
networks, such as tightly-knit social circles, research collaborations, or
functional groups in biological networks.
3. Dynamic Analysis: It can be used to detect evolving communities over
time in dynamic networks.
Implementation in Gephi:
Gephi uses modularity-based algorithms like the Louvain method for
community detection. This method works by iteratively optimizing the
modularity score: it repeatedly moves each node into the neighboring
community that gives the largest modularity gain, then merges every
community into a single node and repeats until the score stops improving.
Once communities are detected, Gephi visually represents them with
different colors, helping users quickly identify and analyze the structure
of the network.
Gephi’s modularity feature is critical in understanding the organization of
complex networks, offering insights into network flow, stability, and group
dynamics.
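The idea of iteratively optimizing modularity can be sketched as follows. This is a simplified illustration of the node-moving phase only, not Gephi's implementation: it omits the step that merges communities into super-nodes and recomputes the full score after every trial move, which is only practical for tiny graphs. All names and the toy graph are assumptions.

```python
def modularity(edges, community_of):
    """Modularity Q of a partition (undirected, unweighted graph)."""
    m = len(edges)
    internal, degree_sum = {}, {}
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())

def local_moving(edges):
    """Greedily move nodes to the neighboring community that raises Q most."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, []).append(v)
        neighbors.setdefault(v, []).append(u)
    community_of = {node: node for node in neighbors}  # start with singletons
    improved = True
    while improved:
        improved = False
        for node in neighbors:
            original = community_of[node]
            best_q = modularity(edges, community_of)
            best_c = original
            # Try every community that one of the node's neighbors belongs to.
            for candidate in {community_of[n] for n in neighbors[node]}:
                community_of[node] = candidate
                q = modularity(edges, community_of)
                if q > best_q + 1e-12:  # accept strict improvements only
                    best_q, best_c = q, candidate
            community_of[node] = best_c
            if best_c != original:
                improved = True
    return community_of

# Hypothetical graph: triangle 0-1-2, triangle 3-4-5, bridge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
found = local_moving(edges)
print(found)
```

On this graph the procedure groups each triangle into its own community. In Gephi, the resulting community label is stored as a node attribute that can then be used to color the nodes.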
101 Elaborate on Impact of Social Network Analysis (SNA) on Public Health Crisis CO5 10
Management.
Social Network Analysis (SNA) has a significant impact on Public Health
Crisis Management by offering insights into the relationships and interactions
between individuals, groups, or organizations during health crises. Its
applications include:
1. Tracking Disease Spread: SNA helps track how infectious diseases
spread within communities by identifying key individuals (e.g.,
"superspreaders") and the flow of information or infection through social
networks.
2. Identifying Vulnerable Populations: It highlights groups or individuals
with limited social connections, who may be more vulnerable to health
risks due to isolation, so that interventions can be targeted at them.
3. Optimizing Resource Distribution: By analyzing communication and
interaction networks, SNA helps optimize the allocation of resources
(e.g., vaccines, medical supplies) to areas or individuals most at risk.
4. Improving Communication and Information Flow: SNA identifies
key nodes (influencers) in the network, enabling more efficient
communication strategies during crises, ensuring accurate health
information reaches the right audiences.
5. Coordinating Responses: By analyzing the structure of health
organizations or response teams, SNA improves coordination, identifying
bottlenecks and streamlining collaboration across multiple agencies.
In public health crises such as epidemics, pandemics, or natural disasters, SNA
aids in understanding human behavior, predicting outcomes, and enhancing
response efforts.
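Points 1 and 2 above can be illustrated with a very small contact network. The sketch below ranks people by their number of contacts and lists those with none. Degree is only a crude proxy for superspreading risk, and the names and contact list are invented for this example.

```python
def contact_summary(people, contacts):
    """Return (people ranked by number of contacts, people with no contacts)."""
    degree = {person: 0 for person in people}
    for a, b in contacts:
        degree[a] += 1
        degree[b] += 1
    ranked = sorted(people, key=lambda person: degree[person], reverse=True)
    isolated = [person for person in people if degree[person] == 0]
    return ranked, isolated

# Hypothetical contact-tracing data.
people = ["ana", "ben", "cal", "dee", "eli"]
contacts = [("ana", "ben"), ("ana", "cal"), ("ana", "dee")]
ranked, isolated = contact_summary(people, contacts)
print(ranked[0])   # most-connected person
print(isolated)    # people with no recorded contacts
```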
102 Elaborate on Impact of Social Network Analysis (SNA) in Political Campaigns. CO5 10
Social Network Analysis (SNA) plays a crucial role in Political Campaigns by
providing insights into the structure and dynamics of voter networks, influencing
political strategies and decision-making. Here are key ways SNA impacts
political campaigns:
1. Voter Segmentation and Targeting: SNA helps identify key
influencers, groups, and communities within the electorate. By analyzing
relationships between voters, campaigns can tailor messages to specific
segments, ensuring a more personalized and effective approach.
2. Identifying Opinion Leaders: In political campaigns, certain
individuals within a network, often referred to as opinion leaders or
influencers, have a greater impact on shaping public opinion. SNA can
pinpoint these individuals, allowing campaigns to focus on them for
spreading messages and encouraging grassroots support.
3. Tracking Information Flow: SNA helps track how information flows
within a population, revealing potential misinformation spread or
identifying networks through which political messages reach voters. This
allows campaigns to leverage efficient channels for communication,
whether traditional or digital.
4. Analyzing Voter Behavior: By examining social connections and
interactions among voters, SNA uncovers patterns in how people form
opinions, shift allegiances, or react to political events. This analysis helps
predict voting behavior, enabling campaigns to adapt strategies and
tactics in real-time.
5. Optimizing Campaign Resources: SNA helps determine which
geographic areas or voter groups have the most interconnected and active
networks. Campaigns can then concentrate resources, such as advertising
and volunteer efforts, in these regions or demographics, improving
outreach efficiency.
6. Social Media Strategy: With the rise of social media, SNA is vital in
political campaigns for analyzing online interactions, identifying key
social media influencers, and understanding how viral content spreads.
This helps campaigns craft targeted digital marketing strategies to engage
voters and amplify messages.
7. Crisis Management: During political campaigns, negative publicity or
scandals can spread quickly. SNA enables campaigns to monitor public
sentiment and the spread of information across networks, allowing for
swift response and mitigation of damage. Understanding the flow of
information can guide the campaign in countering false narratives or
shaping public opinion.
8. Coalition Building: Political campaigns often seek to build broad-based
coalitions across different demographic or interest groups. SNA can map
out existing relationships and find potential allies or groups with shared
interests, helping strategists to form effective alliances and partnerships.
9. Debate and Discourse Analysis: SNA can be used to analyze political
debates, speeches, and public discourse to assess how opinions are
shaped and identify trends in voter concerns. This data helps refine
campaign messaging to address pressing issues directly.
10. Voter Mobilization and Engagement: By analyzing how voters are
connected and mobilized within social networks, campaigns can design
strategies to increase voter turnout. Targeting highly influential nodes or
key groups within the network can encourage more people to vote,
particularly in swing states or undecided voter blocs.
In summary, Social Network Analysis in political campaigns enhances voter
targeting, resource optimization, and the ability to understand and influence
public opinion. Its power lies in its ability to map and analyze relationships and
information flows, making political strategies more data-driven and effective.
103 Examine the role of social network analysis in cybersecurity and fraud detection, CO5 10
proposing effective methodologies.
Social Network Analysis (SNA) plays a crucial role in cybersecurity and
fraud detection by analyzing relationships and patterns of interactions in digital
systems to identify suspicious activities or vulnerabilities. Below are key
methodologies and their roles in these domains:
1. Anomaly Detection: SNA helps detect unusual patterns of behavior
within a network. In cybersecurity, anomalies such as sudden, unusual
communication between seemingly unrelated entities may indicate a
cyber attack (e.g., DDoS, insider threats). For fraud detection, irregular
transaction flows or new connections in financial networks can point to
money laundering or fraudulent activity.
2. Identifying Fraudulent Networks: In financial systems, SNA helps
identify fraudulent actors by mapping transaction patterns. Money mules
or colluding fraudsters often form tightly-knit subgroups, detectable
through community detection algorithms. These groups exhibit
abnormal levels of internal connectivity compared to the rest of the
network.
3. Key Node Detection: SNA can identify central or influential nodes
within a network (e.g., brokers in a fraud ring or the command center in a
cyberattack). Identifying these key nodes enables targeted intervention,
such as blocking accounts or isolating compromised systems, preventing
further damage.
4. Link Prediction and Risk Assessment: SNA methodologies like link
prediction can anticipate potential future connections between entities
(users, systems) based on existing patterns. In cybersecurity, this can
help predict and preemptively block potential attack paths. In fraud
detection, it can predict the likelihood of new fraudulent transactions
between accounts, allowing for proactive measures.
5. Behavioral Profiling: By analyzing communication and transaction
patterns, SNA can help create behavioral profiles for normal and
malicious actors. Any deviation from established profiles can raise alerts,
helping cybersecurity systems spot phishing attacks, account
takeovers, or unauthorized access attempts.
6. Collaborative Detection: SNA can be used to track cross-organizational
cyber threats or fraud networks. Cross-institutional collaboration can
enhance detection by sharing insights from different organizations’
networks, making it easier to identify patterns of fraud or coordinated
cyber attacks across multiple targets.
7. Network Visualization: SNA’s ability to visualize connections makes it
easier to identify hidden relationships and suspicious activity. In fraud
detection, visualizing financial transactions or communication flows can
highlight clusters of fraudulent activity, while in cybersecurity,
visualizing connections between compromised systems can help pinpoint
the source of an attack.
8. Decentralized Threats: In cybersecurity, decentralized attacks (e.g.,
botnets, peer-to-peer attacks) are common. SNA can identify patterns of
distributed coordination among botnets or other decentralized threats,
helping to dismantle them by isolating compromised nodes or identifying
the “command and control” centers.
9. Social Engineering Detection: Fraudsters often use social engineering
tactics to manipulate individuals into revealing sensitive information.
SNA can analyze patterns in social media or email interactions,
identifying networks of individuals who may be targeted by phishing
attacks or social manipulation.
10. Real-Time Monitoring: Using SNA for real-time analysis of data from
logs, transactions, or communication, cybersecurity teams can identify
malicious activities as they occur, responding quickly to mitigate
potential threats. In fraud detection, real-time analysis of transactional
behavior allows rapid identification and prevention of fraudulent
transactions.
Effective Methodologies:
Community Detection Algorithms: To find subgroups within larger
networks that may indicate fraud rings or internal collusion.
Graph Theory Metrics: Such as centrality (to find important nodes)
and clustering coefficient (to find tightly-knit groups), used to detect
suspicious activities or vulnerabilities.
Dynamic Network Analysis: For continuously monitoring evolving
network structures in real time, spotting sudden changes or emerging
threats.
In conclusion, SNA is an effective tool in cybersecurity and fraud detection,
enabling the identification of hidden relationships, anomalous behavior, and
central figures within a network, providing actionable insights for timely
interventions.
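Two of the graph theory metrics named above, degree centrality and the local clustering coefficient, can be sketched as follows. The account names and transfer list are hypothetical, and the code assumes an undirected, unweighted graph.

```python
from itertools import combinations

def build_adjacency(edges):
    """Map each node to the set of its neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def degree_centrality(adj):
    """Degree divided by the maximum possible degree (n - 1)."""
    n = len(adj)
    if n < 2:
        return {node: 0.0 for node in adj}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}

def clustering_coefficient(adj, node):
    """Fraction of pairs of a node's neighbors that are themselves linked."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)

# Hypothetical transfers: r1, r2, r3 all pay each other; h pays many accounts.
transfers = [("r1", "r2"), ("r2", "r3"), ("r1", "r3"),
             ("h", "r1"), ("h", "c1"), ("h", "c2"), ("h", "c3")]
adj = build_adjacency(transfers)
centrality = degree_centrality(adj)
print(max(centrality, key=centrality.get))   # most central account
print(clustering_coefficient(adj, "r2"))     # member of the tight group
print(clustering_coefficient(adj, "h"))      # hub whose contacts are unlinked
```

Here the hub has the highest centrality but a clustering coefficient of 0, while the three mutually connected accounts have high clustering. Neither pattern proves fraud. Both are signals that an analyst would review further.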
104 Elaborate on Social Network Analysis in Cybersecurity and Fraud Detection CO5 10
Social Network Analysis (SNA) is a powerful tool in cybersecurity and fraud
detection as it helps identify patterns of relationships, behaviors, and
interactions within a network. Here’s how it plays a crucial role:
1. Identifying Suspicious Connections: In cybersecurity, SNA helps
detect unusual patterns of connections between users, devices, or
systems, often identifying malicious activity like botnets or insider
threats. For fraud detection, it reveals hidden links between accounts or
transactions, potentially pointing to fraudulent networks.
2. Anomaly Detection: SNA helps identify anomalies in network
interactions. Unusual connections or abnormal communication patterns
(such as a surge in transactions or interactions between previously
unconnected entities) can be flagged for further investigation, potentially
indicating cyber-attacks or fraudulent activities.
3. Fraud Ring Detection: By analyzing financial transactions or social
interactions, SNA can identify clusters or tightly-knit groups of
individuals who may be engaged in coordinated fraudulent activities,
such as money laundering or insurance fraud. Detecting these groups
helps dismantle fraud operations.
4. Key Node Identification: SNA identifies influential or central nodes in a
network. In the context of cybersecurity, these nodes could represent
critical infrastructure or attack points, while in fraud detection, they
could point to individuals or accounts coordinating fraudulent schemes.
Isolating or monitoring these key nodes can mitigate damage.
5. Behavioral Profiling: SNA allows the creation of normal behavior
profiles based on network interactions. Any deviation from these patterns
(e.g., sudden changes in user activity or transaction volume) can be
flagged as suspicious, helping to detect phishing attacks, unauthorized
access, or identity theft in real-time.
6. Tracking Information Flow: In cybersecurity, SNA is used to track
how malware spreads or how an attack propagates through a network,
helping to contain and neutralize the threat. For fraud, it helps identify
how fraudulent information or transactions spread through a network,
enabling swift intervention.
7. Social Engineering Detection: Fraudsters often manipulate individuals
through social engineering. SNA can be used to detect patterns of
communication that are typical in phishing scams, fake job offers, or
other manipulative tactics, allowing organizations to intercept fraud
attempts.
8. Collaborative Detection: SNA enhances collaboration between different
organizations or agencies to detect cross-entity fraud or cyber threats.
Sharing insights about relationships and activities in a network can lead
to faster identification of coordinated attacks or fraud rings.
9. Real-Time Analysis: SNA techniques can be applied in real-time to
monitor network traffic or financial transactions. This enables immediate
detection of suspicious activities, allowing for rapid response and
prevention of cyber-attacks or fraudulent transactions.
10. Visualizing Networks: SNA provides visualizations of network
connections, making it easier to spot patterns such as isolated nodes,
centralized hubs, or tight clusters of activity. These visual representations
are essential for understanding the structure of cyber threats or fraud
networks and for developing targeted defense strategies.
In summary, Social Network Analysis provides valuable tools for both
cybersecurity and fraud detection by analyzing the relationships, behaviors,
and structures within networks. It enhances the ability to detect anomalies,
identify key players, and understand the flow of malicious activity, improving
the overall effectiveness of security measures and fraud prevention strategies.
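Point 6 (tracking how an attack propagates) can be illustrated with a breadth-first search. The sketch below computes how many hops each machine is from a compromised one; the host names and connection map are hypothetical, and real propagation also depends on factors such as permissions and patch levels that this sketch ignores.

```python
from collections import deque

def hops_from(adj, source):
    """Breadth-first search: minimum hop count from source to each reachable node."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for neighbor in adj.get(current, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[current] + 1
                queue.append(neighbor)
    return distance

# Hypothetical internal network (who can connect to whom).
connections = {
    "laptop": ["fileserver"],
    "fileserver": ["laptop", "db", "backup"],
    "db": ["fileserver"],
    "backup": ["fileserver"],
    "printer": [],
}
exposure = hops_from(connections, "laptop")
print(exposure)
```

Machines missing from the result (here the printer) are unreachable from the compromised host, and machines with small hop counts are the first candidates for isolation.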
105 Define the concept of social identity, explaining its role in network structures. CO5 10
Social identity refers to the way individuals define themselves based on their
membership in social groups, such as family, culture, religion, profession, or
community. It is shaped by shared values, norms, and experiences, and it
influences how individuals interact with others within and outside their group.
In network structures, social identity plays a crucial role in determining the
connections and dynamics between individuals or nodes. Here's how it impacts
network structures:
1. Group Formation: Social identity helps individuals form tight-knit
communities or subgroups within larger networks, based on shared
values, interests, or goals. This leads to the emergence of clusters or
communities in networks, often with strong internal ties and weaker
external connections.
2. Social Influence: The social identity of individuals affects their
susceptibility to influence and their role in spreading information within
networks. People tend to trust and follow those who share similar social
identities, which shapes the flow of ideas and behaviors through a
network.
3. Segmentation: Social identity can lead to network segmentation, where
different groups within the network form separate clusters. These
divisions may lead to stronger bonding within groups and weaker
connections between groups, impacting collaboration or information
exchange across the network.
4. Conflict or Cooperation: Social identity can foster either conflict or
cooperation in network structures. Groups with opposing identities may
compete for resources or influence, while groups with shared identities
tend to collaborate more effectively, leading to stronger, more cohesive
subgroups.
5. Community Detection: In network analysis, social identity can be a key
factor in community detection algorithms, which identify clusters of
nodes that are more connected to each other than to the rest of the
network. Social identity influences the formation of these communities,
guiding the detection of group-based structures.
6. Identity and Centrality: Nodes with strong social identity ties may
become more central in the network, playing roles as influencers or
leaders who help connect disparate groups or act as bridges between
clusters. These central nodes may hold disproportionate power or
influence within the network.
7. Cultural Exchange and Innovation: Social identity can impact how
information, culture, or innovation spreads through a network. People
with shared identities are more likely to share ideas, collaborate, or
innovate within their group, leading to rapid cultural exchange and the
creation of novel solutions within the network.
8. Network Stability: Social identity contributes to the stability of
networks. Groups with strong social identities may have more resilient
connections, as individuals are motivated to stay engaged and loyal to the
group, which can help stabilize the network against disruptions or
external challenges.
In summary, social identity is a fundamental concept in network analysis,
influencing how groups form, communicate, and interact within a larger network
structure. It shapes network dynamics, fosters cooperation or conflict, and plays
a central role in information flow, community detection, and the overall
connectivity of the network.
106 What is Social Identity? Provide a brief definition. CO6 2
Social identity is the part of an individual’s self-concept that is derived from
their membership in social groups, such as family, culture, religion, or
profession. It shapes how individuals perceive themselves and interact with
others based on shared values, norms, and experiences within those groups.
107 What is Social Affiliation? Explain its meaning. CO6 2
Social affiliation refers to the connection or association an individual has with a
particular group, community, or social network. It represents the sense of
belonging or alignment with others who share common interests, values, or
identities, influencing interactions, behaviors, and relationships within that
group.
108 List some key applications of Social Media Mining. CO6 2
Key applications of Social Media Mining include:
1. Sentiment Analysis: Analyzing public opinions, emotions, or attitudes
towards brands, products, or events from social media content.
2. Trend Detection: Identifying emerging trends, topics, or viral content to
guide marketing strategies or public relations efforts.
3. Influencer Identification: Finding key influencers in a specific domain
or industry to enhance marketing campaigns or brand outreach.
4. Social Network Analysis: Understanding relationships and interactions
between users to improve customer engagement or detect communities.
5. Customer Feedback and Service: Mining social media data to gather
insights on customer satisfaction and improve products or services.
109 Analyze the significance of graph coloring in network analysis and optimization CO6 2
problems.
Graph coloring is significant in network analysis and optimization problems as
it involves assigning colors to the vertices of a graph in such a way that no two
adjacent vertices share the same color. Key applications include:
1. Conflict Resolution: In scheduling problems, tasks that conflict are
joined by an edge, so a valid coloring assigns them different resources
or timeslots.
2. Network Optimization: It is used to optimize resource allocation in
networks, such as minimizing interference in wireless communication or
optimizing frequency assignments.
Overall, graph coloring aids in efficiently solving problems related to resource
allocation, scheduling, and network optimization.
110 Explain the concept of Graph Coloring in network analysis. CO6 2
Graph coloring in network analysis refers to the assignment of labels (or
"colors") to the vertices of a graph such that no two adjacent vertices share the
same color. The objective is to minimize the number of colors used while
ensuring that adjacent nodes have distinct colors. This concept is widely used in
problems like:
1. Scheduling: Assigning resources or time slots in such a way that no
conflicts occur.
2. Frequency Assignment: Ensuring that adjacent communication
channels do not interfere with each other by assigning different
frequencies.
Graph coloring helps in solving optimization problems by reducing resource
usage and avoiding conflicts.
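The objective of using as few colors as possible can be illustrated on very small graphs by trying every assignment. The sketch below is an exhaustive search, so it is only usable for a handful of nodes; the function names and sample graphs are assumptions, and the graphs are assumed to have no self-loops.

```python
from itertools import product

def is_proper(edges, color_of):
    """True if no edge joins two nodes of the same color."""
    return all(color_of[u] != color_of[v] for u, v in edges)

def chromatic_number(nodes, edges):
    """Smallest k for which a proper k-coloring exists (exhaustive search)."""
    nodes = list(nodes)
    for k in range(1, len(nodes) + 1):
        for assignment in product(range(k), repeat=len(nodes)):
            if is_proper(edges, dict(zip(nodes, assignment))):
                return k
    return 0  # graph with no nodes

triangle = [("a", "b"), ("b", "c"), ("a", "c")]
square = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
print(chromatic_number("abc", triangle))   # three mutually adjacent nodes
print(chromatic_number("abcd", square))    # a 4-cycle can alternate two colors
```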
111 Define Information Diffusion and explain its significance in network science. CO6 2
Information diffusion refers to the process by which information spreads
through a network, typically from one node (individual, organization, etc.) to
others. It models how knowledge, ideas, or behaviors propagate across
connected entities in a social or communication network.
Significance in network science:
1. Understanding Spread: Information diffusion helps understand how
trends, innovations, or diseases spread in social networks, aiding in
targeted interventions or marketing strategies.
2. Optimizing Influence: It is crucial for identifying key influencers in a
network to optimize information dissemination and influence decisions
effectively.
112 What challenges arise when analyzing large-scale dynamic social networks? CO6 2
When analyzing large-scale dynamic social networks, several challenges arise:
1. Scalability: Handling massive amounts of data and maintaining
performance while processing networks with millions of nodes and edges
over time can be computationally intensive and complex.
2. Data Continuity and Change: In dynamic networks, relationships and
nodes continuously evolve, making it difficult to track changes, ensure
data consistency, and analyze the network's real-time dynamics
effectively.
113 Describe the Graph Coloring Problem and its significance in network analysis. CO6 5
The Graph Coloring Problem involves assigning colors to the vertices of a
graph such that no two adjacent vertices share the same color, with the goal of
using the fewest number of colors possible. This problem is significant in
network analysis for several reasons:
1. Conflict Minimization: In applications like scheduling or resource
allocation, graph coloring ensures that adjacent tasks or resources do not
conflict. For example, in wireless networks, assigning frequencies
(colors) to transmitters ensures there’s no interference between adjacent
nodes.
2. Optimization: By minimizing the number of colors, the graph coloring
problem helps optimize resource usage, reducing costs and improving
efficiency in systems like frequency assignments or task scheduling.
3. Network Design: In network topology, graph coloring helps in designing
efficient networks by ensuring that neighboring nodes (e.g., routers or
switches) do not interfere with each other, improving performance and
reducing operational issues.
4. Real-World Applications: Graph coloring is used in various fields,
including map coloring, job scheduling, register allocation in compilers,
and even in game theory to minimize conflicts in competitive situations.
5. Computational Complexity: Finding the minimum number of colors is
NP-hard, meaning that no efficient exact algorithm is known for large
graphs. This highlights the importance of heuristics and approximation
algorithms for practical solutions in network analysis.
In summary, the graph coloring problem is essential in optimizing network
resources, minimizing conflicts, and ensuring efficient operations in dynamic
and complex networks.
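One common heuristic is greedy coloring, sketched below with a largest-degree-first ordering. It always produces a valid coloring but is not guaranteed to use the minimum number of colors. The transmitter names and interference graph are hypothetical.

```python
def greedy_coloring(adj):
    """Give each node the smallest color not used by its colored neighbors."""
    # Visit high-degree nodes first; they are the hardest to place.
    order = sorted(adj, key=lambda node: len(adj[node]), reverse=True)
    color_of = {}
    for node in order:
        used = {color_of[n] for n in adj[node] if n in color_of}
        color = 0
        while color in used:
            color += 1
        color_of[node] = color
    return color_of

# Hypothetical interference graph: an edge means two transmitters overlap.
interference = {
    "t1": {"t2", "t3"},
    "t2": {"t1", "t3"},
    "t3": {"t1", "t2", "t4"},
    "t4": {"t3"},
}
frequencies = greedy_coloring(interference)
print(frequencies)
print(len(set(frequencies.values())))  # number of distinct frequencies used
```

In this example three frequencies are used, which is also the minimum because t1, t2, and t3 all interfere with one another.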
114 Explain the Diffusion Process in Social Networks and its role in information CO6 5
spread.
The diffusion process in social networks refers to the spread of information,
behaviors, or influence from one individual (node) to others through the
network's connections (edges). It is a fundamental concept for understanding
how ideas, trends, innovations, or even diseases propagate across social systems.
Here's an explanation of its role in information spread:
1. Propagation Mechanism: Information typically diffuses in social
networks when one individual shares it with their direct connections.
These connections, in turn, share it with their own contacts, creating a
chain reaction. The process can be influenced by various factors such as
the strength of ties (close friends vs. acquaintances) or the type of
information (viral content vs. word-of-mouth).
2. Types of Diffusion Models: Several models describe how information
spreads in social networks. For example:
o Independent Cascade Model: Each newly activated node gets a
single chance to activate each of its inactive neighbors,
succeeding with a given probability.
o Linear Threshold Model: A node becomes active once the
combined influence of its active neighbors reaches its threshold.
These models help in understanding how information spreads
under different circumstances.
3. Role of Network Structure: The structure of the network heavily
impacts how efficiently information diffuses. Highly central nodes
(hubs) can push information to many others quickly, and high clustering
(close-knit communities) reinforces adoption within a group. Bridges
(connections between otherwise separate groups) are also critical in
ensuring that information crosses between subgroups.
4. Viral Marketing: In business and marketing, understanding the
diffusion process allows companies to use social media and influencers
strategically to spread information about new products or services.
Targeting key influencers within the network can accelerate the spread of
marketing messages and generate viral campaigns.
5. Impact of Social Influence: Social influence plays a critical role in the
diffusion process. People are more likely to adopt new ideas, behaviors,
or products if they see others within their social network doing the same.
This creates a feedback loop where early adopters influence others,
which in turn leads to broader adoption.
In summary, the diffusion process in social networks describes how information
spreads across individuals and communities, influencing behaviors, trends, and
decision-making. By understanding this process, businesses, policymakers, and
researchers can design strategies to facilitate or control the flow of information
in various contexts, from marketing to public health campaigns.
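The two diffusion models named in point 2 can be sketched as small simulations. These are simplified versions under stated assumptions: the independent cascade uses one shared activation probability for every edge, and the linear threshold version weights all neighbors equally and gives every node the same threshold. The function names and the sample network are invented for illustration.

```python
import random

def independent_cascade(adj, seeds, prob, rng):
    """Each newly active node gets one chance to activate each inactive neighbor."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for node in frontier:
            for neighbor in adj.get(node, ()):
                if neighbor not in active and rng.random() < prob:
                    active.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return active

def linear_threshold(adj, seeds, threshold):
    """A node activates once the active fraction of its neighbors reaches threshold."""
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for node, nbrs in adj.items():
            if node in active or not nbrs:
                continue
            if sum(n in active for n in nbrs) / len(nbrs) >= threshold:
                active.add(node)
                changed = True
    return active

# Hypothetical friendship network (undirected, listed in both directions).
friends = {
    "a": ["b", "c"],
    "b": ["a", "c", "d"],
    "c": ["a", "b", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}
print(independent_cascade(friends, {"a"}, 0.5, random.Random(42)))
print(linear_threshold(friends, {"a"}, 0.5))
print(linear_threshold(friends, {"a", "b"}, 0.5))
```

With a threshold of 0.5, a single seed fails to spread because no neighbor sees half of its contacts active, whereas two adjacent seeds tip the whole network. The cascade result depends on the random draws, so it is usually averaged over many runs.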
115 Provide a detailed explanation of Information and Biological Networks, CO6 5
highlighting their key characteristics and applications.
Information and Biological Networks are two distinct types of networks, each
with unique characteristics and applications. Below is a detailed explanation of
both:
1. Information Networks:
Key Characteristics:
Nodes: Represent entities such as users, websites, documents, or
communication devices.
Edges: Represent the relationships or interactions between nodes, such
as hyperlinks between websites, email exchanges, or social connections.
Directed or Undirected: Information networks can be either directed
(e.g., Twitter follows) or undirected (e.g., co-authorship networks).
Dynamic Nature: Information networks constantly evolve, with nodes
and edges changing as new information is created, shared, or updated.
Data-Driven: The flow of information through these networks is based
on data interactions, which can be analyzed for patterns and trends.
Applications:
Social Media Networks: Analyzing the spread of information,
sentiments, or trends through platforms like Facebook, Twitter, and
Instagram.
Recommendation Systems: Online services (e.g., Netflix, Amazon) use
information networks to suggest products, movies, or music based on
user preferences and interactions.
Search Engines: Google and other search engines use information
networks (e.g., links between web pages) to rank search results based on
relevance.
Cybersecurity: Identifying malicious activities or vulnerabilities within
communication networks by monitoring traffic patterns and user
behavior.
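The difference between directed and undirected information networks can be seen in how connections are counted. The sketch below is a minimal illustration; the function name and the sample follow and co-authorship lists are assumptions.

```python
def degree_counts(edges, directed):
    """Return (in_degree, out_degree); for undirected edges the two are equal."""
    in_deg, out_deg = {}, {}
    for source, target in edges:
        for node in (source, target):
            in_deg.setdefault(node, 0)
            out_deg.setdefault(node, 0)
        out_deg[source] += 1
        in_deg[target] += 1
        if not directed:
            # An undirected tie counts in both directions.
            out_deg[target] += 1
            in_deg[source] += 1
    return in_deg, out_deg

# Hypothetical data: (follower, followed) pairs and co-author pairs.
follows = [("ann", "bob"), ("cat", "bob"), ("bob", "ann")]
coauthors = [("ann", "bob"), ("bob", "cat")]

followers, following = degree_counts(follows, directed=True)
papers_in, papers_out = degree_counts(coauthors, directed=False)
print(followers)   # how many accounts follow each user
print(following)   # how many accounts each user follows
print(papers_in)   # co-author count per person
```

In the directed case a user can be followed without following back, so the two counts differ. In the undirected case they always match.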
2. Biological Networks:
Key Characteristics:
Nodes: Represent biological entities such as genes, proteins,
metabolites, or cells.
Edges: Represent interactions or relationships between these biological
entities, such as gene regulation, protein-protein binding, or metabolic
reactions.
Complexity: Biological networks are highly complex, with multiple
interconnected layers (e.g., genetic networks, protein interaction
networks, ecological food webs).
Network Types: They can be metabolic networks, protein-protein
interaction (PPI) networks, or gene regulatory networks, each
representing a different biological process.
Dynamic Nature: Like information networks, biological networks are
dynamic, changing in response to environmental factors, disease states,
or genetic variations.
Applications:
Disease Modeling: Understanding how diseases (like cancer) spread at
the molecular level by analyzing the interactions between genes,
proteins, and other biological entities. This helps identify biomarkers and
therapeutic targets.
Drug Discovery: In pharmaceutical research, biological networks are
used to identify potential drug targets by understanding how proteins or
genes interact within the network.
Genomics: Analyzing gene expression data through networks to uncover
relationships between genes and their roles in development, disease, or
cellular processes.
Ecosystem Modeling: Studying ecological food webs or the interactions
between species to understand biodiversity, ecosystem dynamics, and
environmental impacts.
Conclusion:
Both information networks and biological networks are essential in
understanding complex systems. While information networks help with data-
driven insights, communication, and social interactions, biological networks are
crucial for understanding life at the molecular level, offering insights into health,
disease, and ecology. Their applications range from digital media analysis to
groundbreaking medical discoveries, showing their widespread impact across
multiple domains.
116 Describe Social Learning Networks (SLN) and discuss their fundamental CO6 5
characteristics.
Social Learning Networks (SLN) are systems where individuals learn from
each other through interactions, sharing knowledge, experiences, and resources
within a social network. These networks are based on the principles of social
learning theory, which emphasizes learning through observation, imitation, and
modeling behavior in a social context.
Fundamental Characteristics:
1. Knowledge Sharing: SLNs facilitate the exchange of information, ideas,
and expertise among members, fostering collaborative learning and
problem-solving.
2. Peer Influence: Learning occurs through social influence, where
individuals learn from observing the behaviors, experiences, or
knowledge of others in the network.
3. Collaboration: These networks encourage collaboration and collective
intelligence, enabling members to co-create solutions and enhance their
learning experiences through group dynamics.
4. Dynamism: SLNs evolve over time as individuals interact, contribute,
and learn from each other, adapting to new information or shifts in the
network structure.
5. Connectivity: The effectiveness of SLNs relies on the network's
structure, where central nodes (influencers or experts) play a significant
role in spreading knowledge and guiding learning.
In summary, Social Learning Networks are important for fostering collective
learning, innovation, and collaboration by leveraging social interactions and the
sharing of knowledge.
117 Discuss how the Graph Coloring Problem can be applied to optimize scheduling CO6 5
in universities.
The Graph Coloring Problem can be effectively applied to optimize
scheduling in universities, particularly for tasks like assigning courses to
timeslots, classrooms, and instructors. Here's how it works:
1. Course Scheduling: Each course is represented as a node in a graph, and
an edge is drawn between two nodes if the corresponding courses have
conflicting elements (e.g., the same instructor or shared students). By
coloring the graph, different colors represent distinct timeslots or
classrooms, ensuring that courses with conflicts are assigned different
slots or rooms.
2. Instructor Assignment: Instructors can be assigned to specific timeslots
based on the graph coloring. If two courses share the same instructor,
they must not be scheduled at the same time. The graph coloring
algorithm helps minimize scheduling conflicts by ensuring no two
courses requiring the same instructor are assigned the same color
(timeslot).
3. Classroom Allocation: By treating classrooms as colors, the graph
coloring problem helps optimize the use of available rooms. Here an edge
joins two courses that meet in the same timeslot, since they cannot
occupy the same room. The coloring algorithm then assigns different
rooms to courses that overlap in time.
4. Optimization: The goal is to minimize the number of colors (timeslots,
classrooms, or instructors) used, which leads to efficient resource
allocation, reducing the need for additional rooms or timeslots and
preventing scheduling conflicts.
5. Flexibility and Adaptation: Graph coloring can also accommodate
changes in course offerings or student enrollments, allowing universities
to quickly adjust schedules while maintaining optimal use of resources.
In summary, the Graph Coloring Problem is a powerful tool in university
scheduling, ensuring efficient allocation of time, space, and resources while
minimizing conflicts.
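The course-scheduling idea above can be sketched in a few lines. This is a minimal greedy coloring, assuming conflicts come only from shared students; the course names, student names, and helper functions are hypothetical, and greedy coloring does not guarantee the minimum number of timeslots.

```python
def build_conflict_graph(enrollments):
    """Connect two courses if at least one student is enrolled in both."""
    courses = list(enrollments)
    graph = {c: set() for c in courses}
    for i, a in enumerate(courses):
        for b in courses[i + 1:]:
            if enrollments[a] & enrollments[b]:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def greedy_timeslots(graph):
    """Give each course the lowest-numbered slot not used by a conflicting course."""
    slot = {}
    # Handle the most constrained (highest-degree) courses first.
    for course in sorted(graph, key=lambda c: (-len(graph[c]), c)):
        taken = {slot[n] for n in graph[course] if n in slot}
        s = 0
        while s in taken:
            s += 1
        slot[course] = s
    return slot


# Hypothetical enrollment data: course -> set of students.
enrollments = {
    "Math": {"ana", "ben"},
    "Physics": {"ben", "cara"},
    "History": {"cara", "dev"},
    "Art": {"eli"},
}
graph = build_conflict_graph(enrollments)
timeslots = greedy_timeslots(graph)
print(timeslots)
```

Each color is a timeslot index. Courses that share a student never receive the same index, and courses without conflicts (such as Art here) reuse an existing slot.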
118 Differentiate between static and dynamic networks, discussing their structural CO6 5
implications.
Aspect | Static Networks | Dynamic Networks
Definition | Networks with fixed nodes and edges over time. | Networks where nodes and edges change over time.
Structure | Remains unchanged; no new connections or nodes are added. | Continuously evolving with changing connections and nodes.
Time Dependency | No time dependency; analysis is based on a single snapshot. | Time-dependent; analyzed at multiple time intervals.
Complexity | Simpler to analyze due to a fixed structure. | More complex due to changing structure and dynamics.
Applications | Suitable for static systems like transport networks, organizational structures. | Used in social networks, communication systems, and biological networks where change is constant.
119 Discuss the key challenges involved in analyzing dynamic social networks as CO6 5
opposed to static networks.
Analyzing dynamic social networks presents several challenges compared to
static networks:
1. Time-Dependent Data: Dynamic networks evolve over time, with nodes
and edges changing frequently. Analyzing such data requires tracking
temporal changes and modeling the network's evolution, making it more
complex than analyzing a fixed structure.
2. Data Volume and Complexity: Dynamic networks generate large
volumes of data as interactions between nodes change over time. This
high-frequency data poses storage, processing, and computational
challenges.
3. Network Stability: Dynamic networks may experience rapid
fluctuations or instability in structure, making it difficult to identify long-
term patterns or trends and complicating predictive analysis.
4. Real-Time Analysis: Unlike static networks, dynamic networks require
real-time monitoring and analysis to capture ongoing changes, which
increases the need for advanced algorithms and tools.
5. Community Detection: In dynamic networks, communities can form,
dissolve, or shift over time, making it challenging to detect and track
communities, as compared to static networks where community
structures are more stable.
These challenges require specialized techniques and algorithms to handle time-
varying interactions, large-scale data, and evolving network structures.
120 Discuss Ethics in Social Network Analysis with example. CO6 10
Ethics in Social Network Analysis (SNA) is crucial for ensuring the
responsible and respectful use of data, particularly when it involves personal or
sensitive information. The ethical considerations in SNA revolve around
privacy, consent, data usage, and the potential impact of analysis on individuals
and communities. Here's a breakdown of key ethical aspects with examples:
1. Privacy and Confidentiality:
Concern: Social network analysis often involves collecting and
analyzing data from individuals' online interactions, which can include
personal information and behaviors.
Example: A researcher analyzing Twitter data must ensure that personal
identifiers are anonymized to protect users’ privacy, especially when
analyzing sensitive topics like mental health or political opinions.
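One common way to remove direct identifiers before analysis is keyed hashing. The sketch below assumes an edge list of user handles and a secret key held only by the research team; the key value, handles, and function names are hypothetical. Pseudonymization of this kind hides names but does not by itself guarantee anonymity, because network structure can still allow re-identification.

```python
import hashlib
import hmac


def pseudonymize(user_id, secret_key):
    """Replace an identifier with a keyed hash (stable, but not reversible without the key)."""
    digest = hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def pseudonymize_edges(edges, secret_key):
    """Apply the same mapping to both endpoints so the graph structure is preserved."""
    return [(pseudonymize(u, secret_key), pseudonymize(v, secret_key)) for u, v in edges]


# Hypothetical key: in practice, generate it randomly and store it apart from the data.
SECRET_KEY = b"replace-with-a-randomly-generated-key"
raw_edges = [("@alice", "@bob"), ("@bob", "@carol")]
safe_edges = pseudonymize_edges(raw_edges, SECRET_KEY)
print(safe_edges)
```

Because the same handle always maps to the same pseudonym, degree and centrality calculations on `safe_edges` give the same results as on the raw data.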
2. Informed Consent:
Concern: Participants in social network studies must be fully informed
about how their data will be used and must voluntarily consent to its
collection.
Example: In a study involving online communities, researchers should
obtain explicit consent from users before extracting their interaction data,
ensuring transparency about the purpose of the research and data sharing.
3. Data Security:
Concern: Ensuring the secure storage and handling of collected data is
critical to protect against data breaches or misuse.
Example: If an organization collects data from a social media platform
for analysis, they must implement strong security measures (e.g.,
encryption) to prevent unauthorized access to sensitive information.
4. Impact on Participants:
Concern: Social network analysis can lead to unintended consequences
for individuals, such as reputation damage, social exclusion, or
stigmatization.
Example: Analyzing online social networks to identify "influencers"
could lead to the unintentional exposure of users who may not want to be
highlighted, affecting their privacy or personal life.
5. Bias and Fairness:
Concern: Social network analysis models can unintentionally perpetuate
bias if the data used is skewed or does not represent all groups fairly.
Example: If an SNA model is used for hiring recommendations based on
professional networks, it may unintentionally favor individuals from
certain social or demographic groups, leading to discriminatory
outcomes.
6. Use of Data for Manipulative Purposes:
Concern: Social network analysis can be used for manipulative or
harmful purposes, such as targeting vulnerable individuals with
misleading information or exploiting social behaviors.
Example: Political campaigns or marketers may misuse social network
analysis to target individuals with personalized content, exploiting their
psychological vulnerabilities (e.g., micro-targeting with misleading ads).
7. Transparency and Accountability:
Concern: Ethical SNA requires transparency in methodology, ensuring
that research processes and data sources are clear to the public or
participants.
Example: A researcher publishing an SNA study on online
misinformation should provide clear information on how the data was
collected, analyzed, and the ethical guidelines followed.
Conclusion:
Ethics in Social Network Analysis is essential to ensure that the collection, use,
and interpretation of data do not harm individuals or communities. Researchers
and organizations must adhere to privacy standards, seek informed consent,
ensure transparency, and strive for fairness to avoid exploiting or causing
negative consequences for participants. Addressing these ethical concerns
ensures that social network analysis contributes positively to society without
infringing on individual rights.
121 Discuss privacy in online social networks. CO6 10
Privacy in online social networks is a critical concern as users share vast
amounts of personal information through platforms like Facebook, Twitter,
Instagram, and LinkedIn. These networks can expose sensitive data to a wider
audience, creating both opportunities and risks. Here’s a detailed discussion on
privacy issues in online social networks:
1. Personal Information Exposure:
Concern: Users often knowingly or unknowingly share personal
details such as their location, relationship status, interests, and even daily
activities.
Example: A user posts about a vacation, revealing their absence from
home, which can be exploited by malicious actors.
2. Data Mining and Profiling:
Concern: Social media companies often mine user data to build detailed
profiles for advertising and other commercial purposes, which may
infringe on privacy.
Example: Ads are targeted based on users’ likes, shares, and
interactions, sometimes even before they realize the data is being
collected.
3. Third-Party Access:
Concern: Many social media platforms share user data with third parties,
such as advertisers, marketers, and other businesses, often without
explicit consent from users.
Example: The Facebook-Cambridge Analytica scandal revealed how
personal data from millions of users was exploited for political targeting
without consent.
4. Informed Consent and Control:
Concern: Users may not be fully informed about the extent of the data
being collected or the implications of sharing their data on social
platforms.
Example: Social media privacy settings are often complex and not
always user-friendly, leading many users to unknowingly expose
personal data.
5. Cybersecurity Risks:
Concern: Online social networks are prime targets for cyberattacks and
data breaches, which can lead to the exposure of users' private
information.
Example: High-profile data breaches like those on LinkedIn or Twitter
compromise users' personal data, including email addresses, phone
numbers, and even passwords.
6. Privacy Violations by Apps:
Concern: Many third-party apps connected to social networks collect
user data without adequate protection or transparency, leading to privacy
risks.
Example: Some apps may access users' contacts, photos, or location
without their informed consent, exploiting this data for commercial
purposes.
7. Anonymity and Pseudonymity:
Concern: Users may believe they can remain anonymous online, but
often their data can still be traced back to them through sophisticated
tracking methods.
Example: Even using a pseudonym on platforms like Twitter may not
guarantee privacy, as data analytics can still uncover real identities
through interactions or cross-referencing.
8. Privacy Regulations:
Concern: The lack of uniform privacy regulations across regions leaves
users vulnerable to privacy violations.
Example: The European Union’s GDPR (General Data Protection
Regulation) provides strong privacy protections, but users in regions
without similar regulations may lack such safeguards.
9. User Control and Permissions:
Concern: Users often have limited control over how their data is used,
shared, or stored by social media platforms, which may impact their
privacy.
Example: Even with privacy settings, platforms like Facebook
sometimes change their policies or settings, leading users to
unknowingly share information they previously kept private.
10. Impact on Mental Health:
Concern: Privacy violations and the pressure of managing one’s online
persona can negatively affect users' mental health, particularly when their
data is misused or exploited.
Example: Instances of cyberbullying, online harassment, or unwanted
exposure of personal information can cause significant emotional
distress.
Conclusion:
Privacy in online social networks is a complex issue that requires constant
attention from both users and platforms. Users must be aware of how their data
is being used, and social networks should prioritize transparency, user control,
and data protection to safeguard privacy. Robust privacy policies and the
implementation of stringent security measures are essential in building trust and
ensuring that personal information is protected in an increasingly interconnected
digital world.
122 Suppose you are studying the evolution of online friendships in a social CO6 10
networking site over a year. Design a methodology to capture and analyze
temporal changes in the network structure.
To study the evolution of online friendships on a social networking site over a
year, the methodology should focus on capturing temporal changes in the
network structure, including the dynamics of friendships, interactions, and
structural shifts. The proposed methodology has seven steps:
1. Data Collection:
Timeline: Gather data at multiple time intervals (e.g., monthly,
quarterly) to track changes over the year.
Data Points: Capture key data such as user ID, friendship relationships
(edges), timestamps of friend requests, acceptance, and interaction data
(messages, likes, comments).
Platform API: Utilize the platform's API (e.g., Twitter API, Facebook
Graph API) to extract data on user relationships, interactions, and
demographic details.
Metadata: Collect metadata such as user activity, frequency of
interactions, and changes in profiles (e.g., new interests, location
updates).
2. Data Cleaning and Preprocessing:
Handling Missing Data: Address any missing information regarding
friendship status or interactions by using interpolation or imputation
methods if applicable.
Normalization: Ensure consistency in data formats (e.g., timestamps,
user identifiers) and eliminate irrelevant data (spam accounts or
inactive users).
3. Network Representation:
Graph Construction: Represent the network as an undirected graph
where nodes represent users, and edges represent friendships or
interactions.
Dynamic Graphs: Construct temporal graphs at each time interval,
updating the edges (friendships) and adding new nodes or edges based on
changes in relationships.
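A minimal sketch of the snapshot construction described above, assuming the collected data is a log of timestamped friendship events. The event format `(day, action, user_a, user_b)` and the sample events are hypothetical; months with no events are simply not emitted in this version.

```python
from collections import defaultdict
from datetime import date


def monthly_snapshots(events):
    """Replay add/remove events and record the friendship set at the end of each month."""
    by_month = defaultdict(list)
    for day, action, u, v in sorted(events, key=lambda e: e[0]):
        # frozenset makes the edge undirected: (u, v) equals (v, u).
        by_month[(day.year, day.month)].append((action, frozenset((u, v))))

    snapshots = {}
    current = set()
    for month in sorted(by_month):
        for action, edge in by_month[month]:
            if action == "add":
                current.add(edge)
            else:
                current.discard(edge)
        snapshots[month] = set(current)  # copy, so later months do not alter it
    return snapshots


# Hypothetical event log.
events = [
    (date(2024, 1, 5), "add", "u1", "u2"),
    (date(2024, 1, 20), "add", "u2", "u3"),
    (date(2024, 2, 3), "add", "u3", "u4"),
    (date(2024, 2, 14), "remove", "u1", "u2"),
]
snapshots = monthly_snapshots(events)
for month, edges in snapshots.items():
    print(month, sorted(tuple(sorted(e)) for e in edges))
```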
4. Temporal Analysis:
Edge Evolution: Track the creation, deletion, or modification of edges
over time (e.g., friendships forming or dissolving).
Graph Metrics: Calculate key network metrics at each time interval,
such as:
o Degree Distribution: Changes in the number of connections each
user has.
o Clustering Coefficient: Measure of how users are clustered
within the network over time.
o Network Density: Changes in overall connectivity within the
network.
Community Detection: Use algorithms like Louvain or Girvan-Newman
to identify evolving communities or subgroups within the network.
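The graph metrics listed above can be computed per snapshot and then compared across time. The sketch below is a plain-Python version under the assumption that each snapshot is an undirected edge list; the two sample snapshots are hypothetical, and nodes without any edge are not counted because they never appear in the edge list.

```python
from itertools import combinations


def adjacency(edges):
    """Build an undirected adjacency map from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def density(adj):
    """Fraction of possible edges that are present."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    return 0.0 if n < 2 else 2.0 * m / (n * (n - 1))


def average_clustering(adj):
    """Mean local clustering coefficient (nodes with degree < 2 count as 0)."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj) if adj else 0.0


def degree_distribution(adj):
    """Map degree -> number of nodes with that degree."""
    dist = {}
    for nbrs in adj.values():
        dist[len(nbrs)] = dist.get(len(nbrs), 0) + 1
    return dist


# Hypothetical snapshots: a path a-b-c that later closes into a triangle.
snapshot_t1 = [("a", "b"), ("b", "c")]
snapshot_t2 = [("a", "b"), ("b", "c"), ("a", "c")]
for label, edges in (("t1", snapshot_t1), ("t2", snapshot_t2)):
    adj = adjacency(edges)
    print(label, round(density(adj), 3), average_clustering(adj), degree_distribution(adj))
```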
5. Statistical Analysis:
Trend Analysis: Perform statistical tests (e.g., Pearson’s correlation,
ANOVA) to detect significant trends in friendship formation, changes in
network density, or interaction frequency over time.
Growth Modeling: Use growth models (e.g., exponential or logistic
growth) to analyze how the network size and density change over the
year.
6. Visualization:
Temporal Visualizations: Use tools like Gephi, or NetworkX together with
a plotting library such as Matplotlib, to create animated visualizations
showing how the network evolves over time, highlighting key moments like
the formation of new communities or the dissolution of friendships.
Heatmaps: Generate heatmaps to visualize user activity and interaction
patterns at different times.
7. Interpretation:
Behavioral Insights: Analyze how users’ behaviors (e.g., post
frequency, interaction types) influence friendship dynamics.
Community Evolution: Explore how communities form and evolve,
including the emergence of new subgroups or changes in existing ones.
Social Influence: Study how external factors (e.g., events, trends)
influence friendship dynamics and network structure.
Conclusion:
By capturing temporal data at regular intervals and applying network analysis
techniques, this methodology provides insights into the evolution of online
friendships, allowing for the study of dynamic social interactions, community
growth, and changes in network structures over time.
123 Imagine you are tasked with analyzing the spread of a viral marketing campaign CO6 10
in a dynamic social network. Describe your approach, including data collection,
analysis techniques, and key metrics to track.
To analyze the spread of a viral marketing campaign in a dynamic social
network, the approach must capture the temporal flow of information, identify
influential users, and measure campaign impact. Here's a structured
methodology:
1. Data Collection
Platform APIs: Use social media APIs (e.g., Twitter, Instagram) to
collect data on shares, likes, retweets, comments, mentions, and hashtags
related to the campaign.
Timestamps: Record when users interacted with campaign content to
track diffusion over time.
User Metadata: Collect user profile data (followers, interests, location)
to understand audience reach.
Network Structure: Capture friend/follow relationships to construct the
social graph.
2. Network Construction
Dynamic Graphs: Represent the network as a time-evolving graph
where:
o Nodes = users.
o Edges = interactions (e.g., retweets, mentions).
o Temporal Layers = snapshots of the network at different time
intervals (e.g., daily, hourly).
3. Analysis Techniques
Diffusion Modeling: Apply models like SIR (Susceptible-Infected-
Recovered) or IC (Independent Cascade) to simulate and analyze how
the message spreads.
Community Detection: Use algorithms like Louvain to detect
communities and analyze campaign spread within and across them.
Influencer Identification: Use centrality measures (degree,
betweenness, eigenvector) to identify key users driving the spread.
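To make the diffusion-modeling step concrete, here is a minimal Independent Cascade simulation. It assumes a single activation probability `p` for every edge and a follower map in which content flows from a user to that user's followers; the sample network, function names, and parameter values are hypothetical.

```python
import random


def independent_cascade(followers, seeds, p, rng):
    """Simulate one cascade; returns the set of newly activated users per round."""
    active = set(seeds)
    frontier = sorted(active)
    rounds = [set(active)]
    while frontier:
        newly_active = set()
        for user in frontier:
            # Each newly active user gets one chance to activate each follower.
            for follower in followers.get(user, []):
                if follower in active or follower in newly_active:
                    continue
                if rng.random() < p:
                    newly_active.add(follower)
        if newly_active:
            rounds.append(newly_active)
        active |= newly_active
        frontier = sorted(newly_active)
    return rounds


def expected_reach(followers, seeds, p, runs=1000, seed=0):
    """Monte Carlo estimate of the average number of users reached."""
    rng = random.Random(seed)
    total = 0
    for _ in range(runs):
        total += sum(len(r) for r in independent_cascade(followers, seeds, p, rng))
    return total / runs


# Hypothetical follower map: user -> users who see that user's posts.
followers = {
    "brand": ["u1", "u2"],
    "u1": ["u3"],
    "u2": ["u3", "u4"],
    "u3": [],
    "u4": ["u5"],
}
print(independent_cascade(followers, ["brand"], 1.0, random.Random(0)))
print(expected_reach(followers, ["brand"], 0.5))
```

Comparing `expected_reach` for different seed users is one simple way to rank candidate influencers under this model.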
5. Visualization
Temporal Network Animation: Show spread of the campaign over
time.
Heatmaps: Visualize engagement levels across regions or demographics.
Cascade Trees: Illustrate how the message propagated through different
user paths.
6. Interpretation & Reporting
Identify trends in user behavior (e.g., peak sharing times, top
influencers).
Evaluate campaign effectiveness by comparing predicted vs actual
spread.
Recommend improvements for targeting, timing, and content based on
network response.
Conclusion
This approach combines dynamic network modeling, diffusion analysis, and
strategic metric tracking to offer a comprehensive understanding of how a viral
marketing campaign propagates and what drives its success.
124 Propose a methodology for analyzing the evolution of online communities in a CO6 10
dynamic social network. Outline the steps involved, including data collection,
preprocessing, analysis techniques, and interpretation of results.
Methodology for Analyzing the Evolution of Online Communities in a
Dynamic Social Network
(Short Answer – 10 Marks)
To analyze the evolution of online communities, the methodology must capture
changes in community structure over time. Below is a step-by-step approach:
1. Data Collection
Source: Use APIs from platforms like Reddit, Twitter, or Facebook to
collect user interaction data (e.g., posts, comments, retweets, mentions).
Data Types: Capture user IDs, interaction timestamps, content metadata,
and relationship data (followers/friends).
Time Windowing: Organize data into discrete time intervals (e.g.,
weekly or monthly snapshots) to observe temporal changes.
2. Data Preprocessing
Cleaning: Remove bots, spam accounts, and irrelevant interactions.
Normalization: Standardize user IDs, timestamps, and interaction types.
Edge Creation: Convert interactions into edges (e.g., comment/reply →
directed edge).
Snapshot Construction: Create dynamic graphs for each time interval.
3. Network Construction
Nodes: Represent users.
Edges: Represent interactions (weighted if needed).
Dynamic Graph: Combine all time-based snapshots to form a time-
evolving network.
4. Community Detection
Algorithms: Apply community detection methods (e.g., modularity-based
Louvain, or Label Propagation) on each time-snapshot.
Tracking Evolution: Use techniques like community matching or
evolution graphs to track merges, splits, births, and deaths of
communities.
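Community matching can be sketched with Jaccard similarity between the member sets of consecutive snapshots. The version below is one simple heuristic, not a standard library routine; the threshold of 0.3, the community labels, and the member IDs are assumptions chosen for illustration.

```python
def jaccard(a, b):
    """Overlap between two member sets, from 0 (disjoint) to 1 (identical)."""
    return len(a & b) / len(a | b) if (a or b) else 0.0


def track_events(prev, curr, threshold=0.3):
    """Label births, deaths, splits, and merges between two snapshots."""
    links = [(p, c) for p in prev for c in curr
             if jaccard(prev[p], curr[c]) >= threshold]
    events = []
    for p in prev:
        successors = sorted(c for q, c in links if q == p)
        if not successors:
            events.append(("death", p))
        elif len(successors) > 1:
            events.append(("split", p, tuple(successors)))
    for c in curr:
        predecessors = sorted(p for p, d in links if d == c)
        if not predecessors:
            events.append(("birth", c))
        elif len(predecessors) > 1:
            events.append(("merge", tuple(predecessors), c))
    return events


# Hypothetical communities detected in two consecutive weekly snapshots.
week1 = {"A": {1, 2, 3, 4, 5, 6}, "B": {7, 8, 9}}
week2 = {"A1": {1, 2, 3}, "A2": {4, 5, 6}, "C": {10, 11}}
events = track_events(week1, week2)
print(events)
```

A community with exactly one sufficiently similar predecessor and successor is treated as continuing and produces no event.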
5. Analysis Techniques
Community Metrics: Size, density, cohesion, and modularity over time.
User Roles: Identify core users, bridges, and influencers within
communities.
Churn Analysis: Track users entering and leaving communities.
Stability: Evaluate persistence of communities across time intervals.
6. Visualization
Use tools like Gephi, Cytoscape, or NetworkX to:
o Create dynamic community maps.
o Animate changes in structure.
o Highlight community interactions and overlaps.
7. Interpretation of Results
Community Growth Patterns: Identify when and why communities
expand or shrink.
Trigger Events: Correlate structural changes with real-world or online
events (e.g., trending topics).
User Influence: Understand how certain users affect community
formation or disruption.
Health of Communities: Measure engagement, longevity, and
fragmentation.
Conclusion
This methodology enables a detailed understanding of how online communities
form, evolve, and dissolve over time, providing valuable insights into user
behavior, group dynamics, and the impact of events on social cohesion.
125 Discuss different algorithms used to solve the Graph Coloring Problem and their CO6 10
real-world applications.
Graph Coloring Algorithms and Real-World Applications
(Short Answer – 10 Marks)
The Graph Coloring Problem involves assigning colors to the vertices of a
graph such that no two adjacent vertices share the same color. This is a
fundamental problem in computer science with numerous practical applications.
Below are key algorithms and their real-world uses:
2. Backtracking Algorithm
Approach: Tries all possible color combinations recursively and
backtracks upon conflict.
Pros: Finds optimal solutions.
Limitation: Time-consuming for large graphs.
Application: Timetable and exam scheduling in universities.
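A minimal backtracking sketch, assuming the graph is given as an adjacency map. It tries k = 1, 2, ... colors until a valid coloring exists, so the first success uses the minimum number of colors; the function names and the 5-cycle example are illustrative, and the running time grows exponentially with graph size.

```python
def color_with_k(graph, k):
    """Return a valid coloring using at most k colors, or None if impossible."""
    nodes = sorted(graph, key=lambda v: -len(graph[v]))  # most constrained first
    coloring = {}

    def assign(i):
        if i == len(nodes):
            return True
        v = nodes[i]
        for c in range(k):
            if all(coloring.get(n) != c for n in graph[v]):
                coloring[v] = c
                if assign(i + 1):
                    return True
                del coloring[v]  # backtrack on conflict further down
        return False

    return dict(coloring) if assign(0) else None


def chromatic_number(graph):
    """Smallest k for which a valid coloring exists, with one such coloring."""
    if not graph:
        return 0, {}
    for k in range(1, len(graph) + 1):
        result = color_with_k(graph, k)
        if result is not None:
            return k, result


# Example: a cycle of five exams, where neighbors share students.
cycle5 = {0: {1, 4}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 0}}
k, coloring = chromatic_number(cycle5)
print(k, coloring)
```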
4. Welsh-Powell Algorithm
Approach: Sorts vertices by decreasing degree and colors them
sequentially.
Pros: Efficient and often requires fewer colors than greedy coloring with an arbitrary vertex order.
Application: Task scheduling in parallel processing.
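The Welsh-Powell procedure can be sketched as follows, assuming an adjacency map of tasks where an edge means two tasks cannot run at the same time. The task names and the tie-breaking rule (alphabetical among equal degrees) are assumptions for illustration.

```python
def welsh_powell(graph):
    """Color vertices in decreasing-degree order, filling one color class at a time."""
    order = sorted(graph, key=lambda v: (-len(graph[v]), str(v)))
    color = {}
    current = 0
    for v in order:
        if v in color:
            continue
        color[v] = current
        members = {v}
        # Add every remaining vertex that has no neighbor in this color class.
        for u in order:
            if u not in color and not (graph[u] & members):
                color[u] = current
                members.add(u)
        current += 1
    return color


# Hypothetical task-conflict graph: T1, T2, T3 conflict pairwise; T4 conflicts with T3.
tasks = {
    "T1": {"T2", "T3"},
    "T2": {"T1", "T3"},
    "T3": {"T1", "T2", "T4"},
    "T4": {"T3"},
}
rounds = welsh_powell(tasks)
print(rounds)
```

Tasks with the same color can run in the same parallel round. Here T4 shares a round with T1, so three rounds cover all four tasks.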
5. Genetic Algorithms
Approach: Uses evolutionary techniques like mutation and crossover to
evolve colorings.
Pros: Handles large and complex graphs well.
Application: Optimization problems in transportation and logistics.
6. Tabu Search
Approach: Iterative local search using memory structures to avoid
cycles.
Pros: Efficient for large-scale graphs.
Application: Course scheduling and project resource allocation.
7. Simulated Annealing
Approach: Probabilistic technique that explores solutions and accepts
worse ones to escape local optima.
Pros: Good balance between solution quality and performance.
Application: VLSI design and map coloring.
Conclusion
Different graph coloring algorithms offer trade-offs between accuracy and
efficiency. They are crucial in real-world applications such as scheduling,
register allocation, frequency assignment, and resource optimization,
making them indispensable tools in solving complex combinatorial problems.
126 Discuss the computational challenges in processing large-scale dynamic social CO6 10
network data.
Computational Challenges in Processing Large-Scale Dynamic Social
Network Data
(Short Answer – 10 Marks)
Analyzing large-scale dynamic social networks involves complex and resource-
intensive tasks. Key computational challenges include:
1. Scalability
Issue: Social networks consist of millions of nodes and edges.
Challenge: Algorithms must handle high memory and processing
demands efficiently.
Example: Running community detection or shortest path algorithms on
Twitter-sized datasets.
2. Temporal Complexity
Issue: Dynamic networks change over time (nodes/edges appear or
disappear).
Challenge: Need for time-aware models and maintaining historical
states.
Example: Tracking influence propagation or community evolution over
months or years.
3. Real-Time Processing
Issue: Applications like fraud detection and recommendation systems
require immediate insights.
Challenge: Continuous ingestion and real-time computation are difficult
at scale.
Example: Detecting misinformation spread in real time on social media.
4. Data Heterogeneity
Issue: Social data includes diverse formats—text, images, interactions,
location.
Challenge: Integrating and analyzing multi-modal data increases
complexity.
Example: Combining tweet content with retweet patterns and user
location data.
6. Algorithm Adaptability
Issue: Many traditional graph algorithms are not designed for dynamic
data.
Challenge: Need for incremental or streaming versions of algorithms.
Example: Incremental PageRank or modularity updates instead of
recomputing from scratch.
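The incremental idea can be illustrated with statistics that are simpler than PageRank or modularity. The sketch below maintains edge count, density, and triangle count as edges arrive or disappear, touching only the two endpoints of each change; the class name and the sample stream are hypothetical, and nodes stay in the graph after losing all their edges.

```python
class StreamingGraphStats:
    """Keep simple graph statistics up to date under edge insertions and deletions."""

    def __init__(self):
        self.adj = {}
        self.edge_count = 0
        self.triangles = 0

    def add_edge(self, u, v):
        if u == v or v in self.adj.get(u, ()):
            return  # ignore self-loops and duplicate edges
        nu = self.adj.setdefault(u, set())
        nv = self.adj.setdefault(v, set())
        # Every common neighbor of u and v closes one new triangle.
        self.triangles += len(nu & nv)
        nu.add(v)
        nv.add(u)
        self.edge_count += 1

    def remove_edge(self, u, v):
        if v not in self.adj.get(u, ()):
            return
        self.adj[u].discard(v)
        self.adj[v].discard(u)
        # Every common neighbor was part of a triangle that no longer exists.
        self.triangles -= len(self.adj[u] & self.adj[v])
        self.edge_count -= 1

    def density(self):
        n = len(self.adj)
        return 0.0 if n < 2 else 2.0 * self.edge_count / (n * (n - 1))


# Hypothetical edge stream.
stats = StreamingGraphStats()
for u, v in [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]:
    stats.add_edge(u, v)
print(stats.edge_count, stats.triangles, round(stats.density(), 3))
stats.remove_edge("a", "b")
print(stats.edge_count, stats.triangles, round(stats.density(), 3))
```

Each update costs time proportional to the degrees of the two endpoints, rather than a full pass over the graph.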
Conclusion
Processing large-scale dynamic social network data is computationally intensive
due to its volume, velocity, and complexity. Scalable, adaptive, and privacy-
aware algorithms and systems are essential for effective real-world analysis.