SNA Module 1-6

Uploaded by

Shivam Thakur

1 What is SNA, or social network analysis? In what ways does it facilitate the
comprehension of relationships within a network? Give an example from the real
world. CO1 2
Social Network Analysis (SNA) is a method used to study relationships and
interactions within a network of individuals, groups, or organizations. It focuses
on mapping and measuring the connections to understand how information,
influence, or resources flow.
It helps identify key actors, detect communities, and analyze the structure and
strength of relationships.
Example: In marketing, SNA is used on social media platforms to identify
influencers who can effectively spread brand messages.
2 Describe the Semantic Web concept and explain how important it is to
contemporary information retrieval. Who is recognized as the originator of this
idea? CO1 2
The Semantic Web is an extension of the current web that enables data to be
shared and reused across applications by giving it well-defined meaning, making
it understandable by machines.
It enhances information retrieval by improving search accuracy and relevance
through better understanding of content context and relationships.
Originator: The concept was proposed by Tim Berners-Lee, the inventor of
the World Wide Web.
3 Describe the structure and purpose of the Resource Description Framework
(RDF). How does RDF facilitate data interoperability on the web? Illustrate with
a diagrammatic example. CO1 2
The Resource Description Framework (RDF) is a standard model for data
interchange on the web. Its structure is based on triples: subject–predicate–
object, which represent relationships between resources.
Purpose: RDF enables data from different sources to be linked and understood
by machines, promoting data interoperability across the web.
Facilitation: RDF allows systems to integrate and process data without needing
a common database schema.
Example Diagram:

[Book1] --hasAuthor--> [Author1]

This triple means "Book1 has Author Author1".
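A triple maps naturally onto a three-element tuple. The sketch below is a minimal illustration in plain Python, not an RDF library. The `ex:` prefix, the second book, and the title value are hypothetical additions for illustration only.

```python
# Each RDF statement is a (subject, predicate, object) triple.
# The "ex:" names are hypothetical identifiers standing in for full URIs.
triples = [
    ("ex:Book1", "ex:hasAuthor", "ex:Author1"),
    ("ex:Book2", "ex:hasAuthor", "ex:Author1"),   # assumed extra data
    ("ex:Book1", "ex:hasTitle", "Network Basics"),  # assumed extra data
]

def objects(subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]

def subjects(predicate, obj):
    """Return every subject linked to `obj` by `predicate`."""
    return [s for s, p, o in triples if p == predicate and o == obj]

print(objects("ex:Book1", "ex:hasAuthor"))
print(subjects("ex:hasAuthor", "ex:Author1"))
```

Because every source uses the same three-part shape, triples from different datasets can be appended to one list and queried together, which is the interoperability point made above.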


4 What are Blogs? How have they evolved as a medium for knowledge sharing?
Discuss their impact on digital learning. CO1 2
Blogs are online journals or informational websites where individuals or groups
regularly publish content, often in a conversational or personal style.
They have evolved from personal diaries to powerful platforms for knowledge
sharing, enabling experts, educators, and communities to share insights,
tutorials, and resources widely.
Impact on Digital Learning: Blogs support self-paced learning, encourage
critical thinking through comments and discussions, and provide diverse
perspectives, making learning more accessible and engaging.
5 What are communities in social networks, and why are they essential for
information flow and group interactions? CO1 2
Communities in social networks are groups of users connected by common
interests, goals, or interactions.
They are essential for information flow as they facilitate faster and targeted
sharing of content, and they support group interactions by fostering
collaboration, trust, and collective decision-making within the network.
6 Explain the concepts of nodes and edges in the context of network theory. How
do they represent relationships in different types of networks? Provide examples. CO1 2
In network theory, a node (or vertex) represents an individual entity (such as a
person, device, or webpage), while an edge (or link) represents a relationship or
connection between two nodes.
They model relationships in various networks:
• In a social network, nodes are people, and edges represent friendships or
interactions.
• In a computer network, nodes are devices, and edges are
communication links.
• In the web, nodes are webpages, and edges are hyperlinks.
Example: In Facebook, each user is a node, and a friendship between users is an
edge.
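The node and edge idea can be sketched as an adjacency map. This is a minimal illustration; the user names, page names, and the `build_adjacency` helper are hypothetical and not part of any particular library.

```python
def build_adjacency(edges, directed=False):
    """Map each node to the set of nodes it links to."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set())  # register v even if it has no outgoing edge
        if not directed:
            adjacency[v].add(u)  # a friendship holds in both directions
    return adjacency

# Hypothetical data: undirected friendships and directed hyperlinks.
friendships = [("alice", "bob"), ("bob", "carol")]
hyperlinks = [("home.html", "about.html"), ("about.html", "contact.html")]

friends = build_adjacency(friendships)
links = build_adjacency(hyperlinks, directed=True)

print(friends["bob"])
print(links["about.html"])
```

The same structure serves both cases; only the `directed` flag changes whether an edge is stored once or in both directions.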
7 Examine the differences between the Leiden and Louvain community discovery
strategies. What are the differences between their methods for managing
network architectures and optimizing modularity? CO1 2
The Louvain and Leiden algorithms are both community detection methods that
optimize modularity in networks, but they differ in the guarantees they give and
in efficiency.
• Louvain repeatedly moves single nodes to the neighboring community that
gives the largest modularity gain, then merges each community into one node
and repeats. It may produce badly connected or even internally disconnected
communities.
• Leiden improves on Louvain by adding a refinement phase between the
node-moving and merging steps, which guarantees well-connected communities
and leads to more reliable results.
Key Differences:
• Modularity Optimization: Both are greedy heuristics. Louvain may get stuck
in a poor local optimum, while Leiden's refinement step lets it split and
re-merge communities, so it typically reaches partitions of equal or higher
modularity.
• Network Architecture: Leiden ensures each community is internally
connected, improving stability and interpretability. Louvain gives no such
guarantee.
In summary, Leiden generally offers more reliable and scalable community
detection than Louvain.
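Both algorithms search for a partition with high modularity. The sketch below only evaluates that objective for a given partition; it implements neither Louvain nor Leiden. The six-node graph (two triangles joined by one edge) and the community labels are hypothetical.

```python
def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities c of  L_c / m - (d_c / (2 * m)) ** 2
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the total degree of the nodes in c.
    """
    m = len(edges)
    internal = {}  # L_c per community
    degree = {}    # d_c per community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())

# Hypothetical graph: triangle 1-2-3, triangle 4-5-6, bridge 3-4.
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
two_groups = {1: "x", 2: "x", 3: "x", 4: "y", 5: "y", 6: "y"}
one_group = {n: "all" for n in range(1, 7)}

print(modularity(edges, two_groups))  # 5/14, about 0.357
print(modularity(edges, one_group))   # 0.0
```

Splitting along the bridge scores higher than lumping everything together, which is the kind of improvement both algorithms look for at each step.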
8 Provide a brief overview of the Semantic Web and discuss the core technologies
that enable its functionality. CO1 5
The Semantic Web is an extension of the current web that aims to make data
machine-readable and enable automated processes by adding meaning to
information. It allows data from diverse sources to be linked and shared across
applications, improving data interoperability and accessibility.
Core Technologies:
1. Resource Description Framework (RDF): A framework for
representing data as triples (subject-predicate-object), enabling data
interchange and integration across diverse systems.
2. Web Ontology Language (OWL): A language for defining complex
relationships and classifications of data, allowing machines to reason
about the meaning of data.
3. SPARQL: A query language for retrieving and manipulating data stored
in RDF format, enabling sophisticated searches across the Semantic
Web.
4. RDF Schema (RDFS): A vocabulary for defining relationships and
structures within RDF data, supporting reasoning and inferencing.
5. Uniform Resource Identifier (URI): Identifiers for resources on the
web, ensuring that data can be uniquely referenced and linked.
These technologies work together to create a web of data that is more intelligent,
interconnected, and capable of reasoning.
9 List and describe the five main benefits and five drawbacks of social network
analysis (SNA) in real-world settings. CO1 5
Benefits of Social Network Analysis (SNA):
1. Identifying Key Actors: SNA helps identify influential individuals or
"hubs" in a network, enabling targeted marketing, strategic partnerships,
or leadership development.
2. Improved Decision-Making: By understanding network relationships,
organizations can make more informed decisions about resource
allocation and collaboration.
3. Enhanced Communication Flow: SNA highlights communication
bottlenecks and gaps, allowing for improved information flow and
collaboration within teams.
4. Community Detection: SNA helps uncover hidden groups or
communities within networks, providing insights into team dynamics,
customer behavior, or market segmentation.
5. Predictive Power: By analyzing existing relationships, SNA can predict
future interactions or outcomes, aiding in proactive planning and risk
management.
Drawbacks of Social Network Analysis (SNA):
1. Data Privacy Concerns: SNA requires detailed personal or
organizational data, raising concerns about privacy and ethical
considerations.
2. Complexity of Analysis: Large networks with numerous variables can
make SNA difficult to interpret, requiring advanced skills and tools.
3. Resource Intensive: Collecting and maintaining network data can be
time-consuming and costly, particularly for large-scale networks.
4. Incomplete Data: If the network data is incomplete or inaccurate, it can
lead to misleading conclusions or flawed strategies.
5. Overemphasis on Connections: SNA focuses on relationships, which
may overlook individual performance, context, or other factors that
influence outcomes.
10 Discuss the significance of centrality in network analysis. Explain its different
parameters with suitable examples. CO1 5
Centrality in network analysis measures the importance or influence of a node
within a network. It identifies key actors whose positions affect the flow of
information, resources, or influence. Centrality helps understand the structure
and dynamics of networks.
Different Parameters of Centrality:
1. Degree Centrality: Measures the number of direct connections a node
has.
o Example: In a social network, a user with many friends has high
degree centrality.
2. Betweenness Centrality: Measures the extent to which a node lies on
the shortest path between other nodes, indicating its role in connecting
different parts of the network.
o Example: A person in a company who connects different
departments has high betweenness centrality.
3. Closeness Centrality: Measures how close a node is to all other nodes in
the network, based on the average shortest path length.
o Example: A central figure in a supply chain who can quickly
access all suppliers would have high closeness centrality.
4. Eigenvector Centrality: Measures the influence of a node based on the
connections of its neighbors, not just the number of connections.
o Example: In a citation network, a paper cited by influential
papers has high eigenvector centrality.
5. Katz Centrality: Similar to eigenvector centrality but accounts for both
direct and indirect connections with diminishing returns for longer paths.
o Example: A social media influencer who is connected to both
popular and less-known individuals has high Katz centrality.
Each centrality type provides different insights into the role and influence of
nodes in a network.
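The difference between degree and eigenvector centrality can be seen in a small sketch. The five-node undirected graph below is hypothetical, and the power iteration is a simplified illustration that assumes a connected, non-bipartite graph rather than a production implementation.

```python
def eigenvector_centrality(adjacency, iterations=200):
    """Approximate eigenvector centrality by power iteration."""
    score = {n: 1.0 for n in adjacency}
    for _ in range(iterations):
        # A node's new score is the sum of its neighbors' current scores.
        new = {n: sum(score[nb] for nb in adjacency[n]) for n in adjacency}
        norm = sum(v * v for v in new.values()) ** 0.5
        score = {n: v / norm for n, v in new.items()}
    return score

# Hypothetical graph: a, b, c form a triangle; d hangs off a; e hangs off d.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("a", "d"), ("d", "e")]
adjacency = {}
for u, v in edges:
    adjacency.setdefault(u, set()).add(v)
    adjacency.setdefault(v, set()).add(u)

degree = {n: len(nbrs) for n, nbrs in adjacency.items()}
eig = eigenvector_centrality(adjacency)

for n in sorted(adjacency):
    print(n, degree[n], round(eig[n], 3))
```

Nodes b and d both have degree 2, yet b scores higher on eigenvector centrality because both of its neighbors are well connected, while one of d's neighbors is the peripheral node e.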
11 Describe the edges in a network and explain the meanings of edge weight and
edge direction. Give instances to demonstrate how they affect network analysis. CO1 5
Edges in a network represent the connections or relationships between nodes
(entities) and are fundamental in understanding the structure of the network.
Edge Types:
1. Edge Weight: Refers to the value or strength assigned to an edge, which
can represent various attributes such as cost, distance, or intensity of the
relationship.
o Example: In a transportation network, the edge weight could
represent the travel time or distance between two cities. A lower
weight would indicate a quicker or shorter path.
2. Edge Direction: Describes whether the relationship between two nodes
is one-way (directed edge) or two-way (undirected edge).
o Example: In a social network, a directed edge could represent
one user following another on Twitter, while an undirected edge
could represent a mutual friendship on Facebook.
Impact on Network Analysis:
• Edge Weight: Affects the shortest path algorithms. For example, in a
road network, finding the shortest path would depend on minimizing
edge weights (e.g., travel time), not just the number of connections.
• Edge Direction: Influences the analysis of directed flows. In a supply
chain, the direction of edges can indicate the flow of goods from supplier
to retailer, and reverse directionality could show returns or feedback
loops.
Thus, both edge weight and direction provide crucial information for
understanding how nodes interact, how resources flow, and how optimal paths or
connections are determined in various types of networks.
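A minimal sketch of how weight and direction change the answer, using Dijkstra's algorithm on a small directed road network. The city names and travel times are hypothetical.

```python
import heapq

def shortest_times(graph, source):
    """Dijkstra's algorithm: minimum total edge weight from source to each reachable node."""
    best = {source: 0}
    heap = [(0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        for neighbor, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(heap, (new_cost, neighbor))
    return best

# Hypothetical one-way roads with travel times in minutes.
roads = {
    "CityA": {"CityB": 10, "CityC": 2},
    "CityC": {"CityB": 3},
    "CityB": {},
}

print(shortest_times(roads, "CityA"))
print(shortest_times(roads, "CityB"))
```

From CityA, the two-edge route through CityC (2 + 3 = 5) beats the direct edge (10), so weight matters more than hop count. From CityB nothing else is reachable, because every edge points away from the other cities toward it.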
12 Examine the insights that can be obtained by using social network analysis on
online social media sites such as Facebook and Twitter. CO1 5
Social Network Analysis (SNA) on online social media sites like Facebook and
Twitter provides valuable insights into user behavior, interactions, and content
dissemination. Here’s a breakdown of the key insights:
1. Identifying Influencers: SNA helps identify key influencers or central
users within the network based on degree centrality, which can inform
marketing strategies, brand promotion, or political campaigns.
o Example: A user with many followers on Twitter who frequently
shares content may have significant influence over public
opinion.
2. Community Detection: SNA can reveal groups or communities of users
with shared interests or behaviors, often through community detection
algorithms.
o Example: Facebook groups with common topics (e.g., fitness,
gaming) can be detected, helping businesses target specific
audiences.
3. Information Flow Analysis: By analyzing the flow of information
through the network, SNA helps understand how content spreads
(virality) and identify potential bottlenecks or information hubs.
o Example: A viral tweet on Twitter may show how information
spreads quickly, identifying key users who amplify the message.
4. Sentiment Analysis: SNA can be combined with sentiment analysis to
assess public opinion, monitor brand reputation, or track sentiment trends
in real-time.
o Example: Monitoring the sentiment of posts related to a product
launch on Twitter can provide insights into customer reactions.
5. Social Influence and Behavior Patterns: SNA uncovers how
individuals influence each other’s behaviors and decisions, such as
recommendations, follower behavior, or engagement patterns.
o Example: Analyzing which types of posts receive the most likes
or shares on Facebook can provide insights into content
preferences.
In summary, SNA on social media sites enables organizations to optimize
marketing strategies, improve user engagement, monitor trends, and enhance
decision-making based on network structure and interaction patterns.
13 How can SNA assist in locating important personalities and influencers within a
company network? CO1 5
Social Network Analysis (SNA) can assist in locating important personalities
and influencers within a company network by analyzing the structure of
relationships and interactions among employees. Here’s how:
1. Degree Centrality: Identifies employees with the most direct
connections (e.g., colleagues, departments) in the company. High degree
centrality indicates influence and the ability to connect with others across
the network.
o Example: A manager with many direct reports or colleagues
frequently consulted for advice.
2. Betweenness Centrality: Highlights individuals who act as bridges
between different groups or departments. These employees control the
flow of information and often play critical roles in decision-making.
o Example: A team leader who connects marketing and sales teams
is central to interdepartmental communication.
3. Closeness Centrality: Identifies employees who are closest to all other
nodes (employees) in the network, meaning they can efficiently spread
information or influence others within the organization.
o Example: A key senior executive who interacts with all
departments regularly.
4. Influence on Knowledge Sharing: Employees who are central in
knowledge-sharing networks can be identified, as they often disseminate
critical information or expertise across teams.
o Example: An employee known for mentoring others or
frequently sharing insights at company-wide meetings.
5. Social Capital: SNA reveals individuals who hold social capital within
the organization, recognized for their expertise, networks, or ability to
mobilize resources and people.
o Example: A senior employee who is often sought out for advice,
collaboration, or informal leadership.
In essence, SNA uncovers individuals who play key roles in communication,
decision-making, and information flow, which are crucial for identifying
influencers within a company network.
14 A research citation network consists of five papers {P1, P2, P3, P4, P5} with
known citations forming a directed graph. Calculate the betweenness centrality
of P3 if it acts as a bridge between multiple paper citations. How does this
influence knowledge flow? CO1 5
To calculate the betweenness centrality of P3 in the research citation network,
we follow these general steps:
1. Identify All Shortest Paths: For each ordered pair of papers (s, t) in the
network, find the shortest path(s) from s to t (i.e., the paths that use the
minimum number of citation edges).
2. Count the Paths that Pass Through P3: For each pair (s, t) in which neither
paper is P3, count how many of those shortest paths pass through P3.
3. Calculate Betweenness Centrality: Betweenness centrality for P3 is
calculated using the formula:

\[ C_B(P3) = \sum_{s \neq P3 \neq t} \frac{\sigma_{st}(P3)}{\sigma_{st}} \]

where \sigma_{st} is the number of shortest paths from s to t and
\sigma_{st}(P3) is the number of those paths that pass through P3.

Influence on Knowledge Flow:


Betweenness centrality measures the extent to which P3 lies on the shortest
paths between other papers, and if P3 has high betweenness centrality, it means
that P3 is acting as a bridge for knowledge flow between other papers. Papers
that are not directly connected but rely on P3 as an intermediary will depend on
it for knowledge transfer.
For example, if P1 cites P3, and P3 cites P5, but P1 does not directly cite P5,
the only citation path from P1 to P5 runs through P3, so the ideas of P5 reach
P1 by way of P3. High betweenness centrality
indicates that P3 has significant control over the flow of information, potentially
making it an influential paper in the network.
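As a worked illustration, assume a hypothetical citation pattern (the question does not list the actual edges): P1 → P3, P2 → P3, P3 → P4, and P3 → P5, where X → Y means X cites Y. The only ordered pairs joined through an intermediate paper are (P1, P4), (P1, P5), (P2, P4), and (P2, P5). Each has exactly one shortest path, and it passes through P3:

```latex
C_B(P3) = \sum_{s \neq P3 \neq t} \frac{\sigma_{st}(P3)}{\sigma_{st}}
        = \frac{1}{1} + \frac{1}{1} + \frac{1}{1} + \frac{1}{1} = 4,
\qquad
C_B'(P3) = \frac{C_B(P3)}{(n-1)(n-2)} = \frac{4}{4 \cdot 3} = \frac{1}{3}
```

Here n = 5 and (n-1)(n-2) is the usual normalization for directed graphs. Every other paper scores 0 under these assumed edges, so P3 is the sole bridge between the citing papers (P1, P2) and the cited papers (P4, P5).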
15 Examine using examples how the Semantic Web improves knowledge
representation and data interoperability. CO1 10
The Semantic Web enhances knowledge representation and data
interoperability by structuring web data in a machine-readable format using
shared standards and vocabularies. It enables computers to understand, integrate,
and reason over distributed information, facilitating intelligent applications.
1. Improved Knowledge Representation:
Semantic Web technologies represent complex relationships between data using
ontologies and formal logic.
• Example: In healthcare, the Semantic Web can represent patient data
like:
o Patient123 hasDiagnosis Diabetes
o Diabetes isA ChronicDisease
Using OWL (Web Ontology Language), systems can infer that Patient123
hasChronicDisease, even if that wasn’t explicitly stated. This allows machines to
"reason" over data, supporting smarter decision-making.
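The inference step can be sketched with one hand-written rule over the two facts above. This is plain Python, not an OWL reasoner, and storing the inferred fact as (patient, hasChronicDisease, disease) is an assumption made for illustration.

```python
# Facts taken from the example above, stored as (subject, predicate, object).
facts = {
    ("Patient123", "hasDiagnosis", "Diabetes"),
    ("Diabetes", "isA", "ChronicDisease"),
}

def infer_chronic(known):
    """Rule: X hasDiagnosis D and D isA ChronicDisease => X hasChronicDisease D.

    The shape of the inferred triple is an assumption for this sketch.
    """
    inferred = set()
    for subject, predicate, obj in known:
        if predicate == "hasDiagnosis" and (obj, "isA", "ChronicDisease") in known:
            inferred.add((subject, "hasChronicDisease", obj))
    return inferred

print(infer_chronic(facts))
```

A real ontology would state such rules declaratively, so that any compliant reasoner could derive the same conclusion without custom code.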
2. Enhanced Data Interoperability:
Semantic Web uses common formats like RDF (Resource Description
Framework) and URIs (Uniform Resource Identifiers), enabling different
systems to understand and link data even if they use different schemas.
• Example: In e-commerce, product information from two websites may
differ:
o Site A: Product → Name
o Site B: Item → Title
Through Semantic Web ontologies, both terms can be mapped to a common
concept like schema:name, allowing applications to merge and compare data
accurately.
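The effect of such a mapping can be sketched with a lookup table. In practice the mapping would be expressed in an ontology rather than in application code; the site keys, the `normalize` helper, and the product value below are hypothetical.

```python
# Hypothetical per-site mapping from local field names to a shared term.
FIELD_MAP = {
    "siteA": {"Name": "schema:name"},
    "siteB": {"Title": "schema:name"},
}

def normalize(site, record):
    """Rename a record's fields to the shared vocabulary; unknown fields pass through."""
    mapping = FIELD_MAP[site]
    return {mapping.get(field, field): value for field, value in record.items()}

from_a = normalize("siteA", {"Name": "USB-C Cable"})
from_b = normalize("siteB", {"Title": "USB-C Cable"})

print(from_a)
print(from_a == from_b)
```

Once both records use the same key, an application can compare or merge them without knowing which site they came from.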
3. Cross-Domain Data Integration:
Semantic Web allows combining data from diverse domains into unified views.
• Example: A travel app using data from:
o Weather databases (e.g., location → temperature)
o Transport services (e.g., station → time)
o Tourism websites (e.g., attraction → location)
With RDF and linked data, the app can relate concepts across domains (location,
time, services) and present integrated information to the user.
4. Intelligent Search and Retrieval:
Semantic Web improves search accuracy by understanding the context and
meaning of queries.
• Example: A search for “Apple” on a semantic web-enabled search
engine can distinguish between:
o Apple Inc. (technology company)
o Apple (fruit)
Based on user context or linked data, the system retrieves the most relevant
results.
5. Real-World Semantic Web Applications:
• DBpedia: Extracts structured content from Wikipedia and makes it
available as linked data.
• FOAF (Friend of a Friend): Enables representation of personal
information, relationships, and social networks in a machine-readable
way.
Conclusion:
The Semantic Web enables more precise, interoperable, and intelligent data
management. By using shared vocabularies and logic-based structures, it
improves how knowledge is represented and connected across the web,
empowering smarter applications and seamless integration across systems.
16 Discuss 10 applications of SNA with appropriate examples. CO1 10
Here are 10 applications of Social Network Analysis (SNA) with relevant
examples, highlighting its wide use across domains:

1. Marketing and Influencer Identification
• Application: Identifies key influencers who can promote products
effectively.
• Example: Brands use SNA on Instagram or Twitter to find users with
high centrality who influence large audiences.

2. Organizational Analysis
• Application: Analyzes employee interactions to improve communication
and collaboration.
• Example: A company uses SNA to detect isolated departments and
restructure teams to enhance knowledge sharing.

3. Epidemiology and Disease Tracking
• Application: Maps the spread of diseases through social contacts.
• Example: During COVID-19, SNA was used to trace contacts of
infected individuals and control transmission chains.

4. Criminal Network Detection
• Application: Uncovers hidden connections in criminal or terrorist
networks.
• Example: Law enforcement uses SNA to identify key operatives in drug
trafficking rings by analyzing call and message data.

5. Education and Student Engagement
• Application: Measures student interaction in collaborative learning
environments.
• Example: In online courses, SNA tracks discussion forum activity to
identify active learners and peer leaders.

6. Recommender Systems
• Application: Improves suggestions by analyzing user connections and
preferences.
• Example: LinkedIn recommends connections and job opportunities
based on your professional network using SNA.

7. Knowledge Management
• Application: Identifies experts and knowledge hubs within
organizations.
• Example: SNA is used in research institutions to find influential
scientists based on co-authorship and citation networks.

8. Political Analysis
• Application: Maps relationships between politicians, voters, and issues.
• Example: During elections, SNA analyzes Twitter conversations to
understand public opinion and influence networks.

9. Supply Chain Optimization
• Application: Identifies critical suppliers and risks in supply networks.
• Example: A manufacturer uses SNA to detect which suppliers are most
central and vulnerable in their global supply chain.

10. Online Community Analysis
• Application: Analyzes structure and behavior in online groups or
forums.
• Example: Reddit communities (subreddits) are analyzed to understand
topic influence, user engagement, and moderation patterns.

Conclusion:
SNA is a powerful tool for understanding relationships, identifying key actors,
optimizing processes, and enhancing decision-making across fields such as
business, healthcare, education, law enforcement, and social media.
17 Explain web-based networks' features and point out how they vary from
conventional social networks. Discuss how web-based networks affect
connectivity and information flow. CO1 10
Features of Web-Based Networks:
1. Global Accessibility: Users can connect and interact from any location
with internet access.
2. Asynchronous Communication: Interactions don’t require real-time
presence (e.g., emails, forum posts).
3. Hyperlinking: Connections are created through hyperlinks, enabling
non-linear navigation of content.
4. Scalability: Web networks can support millions of nodes and
interactions simultaneously.
5. Multimedia Integration: Support for text, images, videos, and
interactive content enhances communication richness.
6. Search and Discovery Tools: Efficient search engines and algorithms
help users navigate massive information structures.
7. User-Generated Content: Individuals can contribute content (blogs,
comments, posts), making the network dynamic and evolving.
8. Data Traceability: Interactions and information flow can be logged,
monitored, and analyzed.
9. Open Standards and APIs: Facilitate interoperability between
platforms and services (e.g., using social logins).
10. Algorithmic Mediation: Content visibility and connections are often
shaped by algorithms (e.g., news feed personalization).

Differences from Conventional Social Networks:


Aspect | Web-Based Networks | Conventional Social Networks
Connectivity | Broad, potentially anonymous and global | Localized, often personal and bounded
Structure | Loosely connected through hyperlinks or tags | Tightly knit, based on direct relationships
Interaction Type | Asynchronous and content-centric | Mostly synchronous and relationship-centric
Information Flow | Multi-directional and algorithm-driven | Linear or reciprocal (e.g., conversations)
Scalability | Highly scalable, not limited by geography | Often limited to physical/social proximity
Control of Content | User or platform controlled | Shared or informal control among members

Impact on Connectivity and Information Flow:


1. Enhanced Reach: Users can access diverse viewpoints and connect
beyond physical or social boundaries.
o Example: A developer in India collaborating with open-source
communities worldwide.
2. Rapid Information Dissemination: Viral content spreads quickly due to
the network’s scale and interconnectivity.
o Example: News, memes, or trends reaching global audiences in
minutes via social media.
3. Filter Bubbles: Algorithms may limit exposure to diverse content by
reinforcing existing preferences.
o Example: A user repeatedly shown content that aligns with their
past likes, reducing diversity in information.
4. Emergent Communities: Interest-based or issue-based groups form
organically (e.g., Reddit subreddits, Twitter hashtags).
5. Decentralized Influence: Unlike conventional networks with known
influencers (e.g., leaders), web-based networks can have transient,
content-driven influencers.

Conclusion:
Web-based networks expand the scope of human interaction and knowledge
sharing by enabling large-scale, content-driven, and asynchronous connections.
They differ from traditional social networks in structure and dynamics, leading
to new forms of connectivity and more complex, faster patterns of information
flow.
18 Apply the concepts of centrality and solve the following: CO1 10
To analyze the given directed graph using centrality concepts (Degree,
Betweenness, and Closeness), we first list all the connections between the nodes.

Graph Summary from the Image:


Edges (directed):
• A→B
• A→C
• A→E
• C→A
• C→D
• D→C
• E→A
• E→C

1. Degree Centrality (In-degree and Out-degree):


Node In-degree Out-degree
A 2 (from C, E) 3 (to B, C, E)
B 1 (from A) 0
C 3 (from A, E, D) 2 (to A, D)
D 1 (from C) 1 (to C)
E 1 (from A) 2 (to A, C)

2. Betweenness Centrality:

Betweenness centrality measures how often a node lies on the shortest path
between other nodes. For the edges listed above, every shortest path is unique,
so the score is a simple count:
• A lies on five shortest paths: C → B, C → E, D → B, D → E, and E → B
(for example, C → A → B), so its betweenness is 5.
• C lies on five shortest paths: A → D, E → D, D → A, D → B, and D → E
(for example, D → C → A → B), so its betweenness is 5.
• B, D, and E never lie between two other nodes on a shortest path, so their
betweenness is 0. B has no outgoing edges, D links only to C, and E offers
only detours (A already links directly to C).
Ranking (High to Low):
A = C > B = D = E

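3. Closeness Centrality (Access to Others):

Closeness measures how quickly a node can reach all other nodes. Here it is
computed from outgoing shortest-path distances as (number of reachable nodes) /
(sum of distances to them).
• A reaches B, C, and E in 1 step and D in 2 (via C): 4/5 = 0.80.
• C reaches A and D in 1 step and B and E in 2 (via A): 4/6 ≈ 0.67.
• E reaches A and C in 1 step and B and D in 2: 4/6 ≈ 0.67.
• D reaches C in 1 step, A in 2, and B and E in 3: 4/9 ≈ 0.44.
• B has no outgoing edges and reaches no other node: 0.
Ranking (High to Low):
A > C = E > D > B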

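Conclusion:
• Node A: Highest out-degree and closeness; tied with C for the highest
betweenness.
• Node C: Highest in-degree; tied with A for the highest betweenness, with
good closeness.
• Node B: Least central, with only one incoming link and no outgoing links.
• Node D: Peripheral; connected only to and from C, so it reaches the rest of
the graph slowly.
• Node E: Moderate degree and closeness (it links directly to A and C), but
zero betweenness.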
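The three measures for the edge list above can be checked with a short script. This is a minimal sketch using breadth-first search; closeness is computed from outgoing distances as reachable nodes divided by total distance, which is one of several conventions in use, and all function names are illustrative.

```python
from collections import deque

# Directed edges as listed in the graph summary.
EDGES = [("A", "B"), ("A", "C"), ("A", "E"), ("C", "A"),
         ("C", "D"), ("D", "C"), ("E", "A"), ("E", "C")]
NODES = sorted({n for edge in EDGES for n in edge})
SUCCESSORS = {n: [] for n in NODES}
for u, v in EDGES:
    SUCCESSORS[u].append(v)

in_degree = {n: sum(1 for _, v in EDGES if v == n) for n in NODES}
out_degree = {n: len(SUCCESSORS[n]) for n in NODES}

def bfs(source):
    """Return (dist, sigma): hop distances and shortest-path counts from source."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in SUCCESSORS[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma

PATHS = {n: bfs(n) for n in NODES}

def betweenness(v):
    """Sum over pairs (s, t) of the fraction of shortest s-t paths through v."""
    dist_v, sigma_v = PATHS[v]
    total = 0.0
    for s in NODES:
        dist_s, sigma_s = PATHS[s]
        if s == v or v not in dist_s:
            continue
        for t in NODES:
            if t in (s, v) or t not in dist_s or t not in dist_v:
                continue
            if dist_s[v] + dist_v[t] == dist_s[t]:
                total += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return total

def closeness(v):
    """Reachable nodes divided by the sum of outgoing distances (0 if none)."""
    dist, _ = PATHS[v]
    reachable = len(dist) - 1  # exclude v itself
    return reachable / sum(dist.values()) if reachable else 0.0

for n in NODES:
    print(n, in_degree[n], out_degree[n], betweenness(n), round(closeness(n), 2))
```

For this edge list the script reports a betweenness of 5 for A and C and 0 for B, D, and E, with A having the highest closeness and B the lowest.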
19 Give two specific instances of real-world applications of social network analysis.
Discuss the lessons learned from these applications and how they affect
problem-solving or decision-making. CO1 10
Here are two specific real-world applications of Social Network Analysis
(SNA), along with the lessons learned and their impact on decision-making:

1. Public Health – Epidemic Tracking (e.g., Ebola or COVID-19)


Application:
SNA was used to trace contact networks of infected individuals to understand
how the disease spreads within communities.
Example:
During the Ebola outbreak in West Africa, SNA helped identify "super-
spreaders" and central nodes in the infection network.
Lessons Learned:
• Individuals with high degree centrality (many contacts) significantly
influence transmission.
• Timely isolation of central nodes curbs the outbreak more effectively
than random testing.
Impact on Decision-Making:
• Authorities prioritized targeted interventions (e.g., isolating central
individuals, contact tracing).
• Improved allocation of limited resources like vaccines and quarantine
efforts.

2. Corporate Management – Organizational Communication


Application:
Companies use SNA to analyze internal communication patterns and identify
key knowledge holders or bottlenecks.
Example:
At IBM, SNA revealed that certain mid-level employees had high betweenness
centrality, acting as bridges between departments.
Lessons Learned:
• Not just executives, but often informal leaders facilitate knowledge flow.
• Hidden influencers can drive innovation and collaboration.
Impact on Decision-Making:
• Companies restructured teams to improve collaboration.
• Leadership programs were introduced for informal influencers, not just
top performers.

Conclusion:
SNA provides actionable insights in various fields—from public safety to
corporate efficiency. It reveals hidden patterns and supports strategic
decisions by identifying key players and optimizing information flow. The main
takeaway is that understanding relationships, not just individuals, is essential for
effective problem-solving.
20 What are the main benefits of employing Social Network Analysis (SNA) in
both practical and research settings? Discuss in depth two important
restrictions or difficulties related to its application. CO1 10
Main Benefits of Employing Social Network Analysis (SNA):
1. Reveals Hidden Patterns and Structures:
SNA uncovers invisible relationships and communication flows within a
group, organization, or system, which are not evident through traditional
analysis.
2. Identifies Key Actors and Influencers:
It helps locate central individuals (high degree or betweenness centrality)
who are vital for information dissemination, decision-making, or
controlling influence.
3. Enhances Decision-Making and Strategy:
Businesses and governments use SNA to guide actions—such as
targeting key opinion leaders in marketing, or restructuring teams for
better collaboration.
4. Improves Resource Allocation:
SNA allows for targeted interventions (e.g., in healthcare or security) by
focusing on the most influential or connected nodes, saving time and
costs.
5. Supports Predictive Analysis:
By understanding how information or behavior spreads, organizations
can anticipate outcomes like disease outbreaks, product adoption, or even
organizational failure.

Two Important Restrictions or Difficulties:


1. Data Privacy and Ethical Concerns:
• Issue: SNA often relies on personal or sensitive data (e.g., emails, social
media, call logs).
• Challenge: Collecting and analyzing such data may violate privacy
rights, leading to ethical and legal consequences.
• Example: Analyzing employee emails for organizational SNA without
consent can cause trust issues and legal risk.
2. Complexity and Scalability of Large Networks:
• Issue: Real-world networks (e.g., Twitter, biological systems) can be
extremely large and dynamic.
• Challenge: Processing, visualizing, and interpreting such networks
requires significant computational power, advanced algorithms, and
expertise.
• Example: In social media analysis, changes in relationships over time
make static SNA models less effective.
Conclusion:
SNA offers powerful insights for both practical operations and academic
research, enabling smarter strategies and deeper understanding of networks.
However, its effectiveness depends on ethical data use and the technical
capacity to handle complex, evolving datasets. Balancing these factors is
essential for its successful implementation.
21 Describe how social network analysis has changed over time, highlighting CO1 10
significant turning points and significant personalities. What is the field's current
importance in the study of social systems, and how has it evolved over time?
Evolution of Social Network Analysis (SNA):
1. Early Foundations (1930s–1950s):
 Key Figures: Jacob Moreno (sociometry), Kurt Lewin (field theory).
 Turning Point: Moreno introduced sociograms to visualize human
relationships, marking the beginning of formal network mapping.
 Focus: Small group interactions, classroom behavior, and community
structures.
2. Formalization and Mathematical Models (1960s–1970s):
 Advancements: Integration of graph theory and matrix algebra.
 Key Figures: Harrison White, Stanley Wasserman.
 Turning Point: Academic shift from qualitative observation to
quantitative analysis using adjacency matrices and centrality measures.
3. Computational Expansion (1980s–1990s):
 Technology Impact: Emergence of computer software (e.g., UCINET,
Pajek) allowed the handling of larger datasets.
 Application: Expanded into organizational studies, anthropology, and
epidemiology.
4. Digital and Big Data Era (2000s–Present):
 Turning Point: Rise of social media platforms like Facebook, Twitter,
and LinkedIn generated massive, real-time network data.
 Key Contributions: Algorithms for community detection, influencer
ranking, and information diffusion models.
 Current Use: Widely applied in marketing, public health, cybersecurity,
political science, and AI.

Current Importance in Studying Social Systems:


 Behavioral Insight: Reveals how people interact, form groups, and
influence one another.
 Crisis Response: Used in pandemic tracking, disaster response, and
misinformation control.
 Policy Making: Informs decisions in urban planning, education, and
resource distribution.
 Interdisciplinary Reach: Extends beyond sociology into data science,
biology, economics, and computer science.

Conclusion:
SNA has evolved from hand-drawn sociograms to AI-powered network
algorithms analyzing millions of connections in real time. With key figures like
Moreno and White shaping its foundation, the field now plays a critical role in
understanding and navigating complex social systems in a hyper-connected
world.
22 Why are communities essential in today’s interconnected world? Discuss their CO2 2
role in fostering collaboration, knowledge exchange, and social cohesion.
Communities are essential in today’s interconnected world because they foster
collaboration, enable knowledge exchange, and build social cohesion among
individuals with shared interests or goals. They provide a platform for collective
problem-solving, mutual support, and the spread of innovation, strengthening
both online and offline social ties in a rapidly globalizing society.
23 Differentiate between Data Mining and Social Media Mining, highlighting their CO2 2
objectives, methodologies, and applications.
Data Mining is the process of discovering patterns, trends, and relationships in
large datasets using statistical, machine learning, and database techniques.
Social Media Mining is a specialized branch of data mining focused on
extracting insights specifically from social media platforms (e.g., Twitter,
Facebook), incorporating social context and network structure.
Aspect | Data Mining | Social Media Mining
Objective | Extract general patterns from any data source | Understand user behavior, influence, and trends
Methodology | Statistical analysis, clustering, classification | NLP, sentiment analysis, social network analysis
Applications | Fraud detection, market analysis, healthcare | Brand monitoring, political trend tracking, user profiling
24 Explain Graph Modularity. CO2 2
Graph Modularity is a measure used in network analysis to quantify the
strength of division of a network into communities or modules. It compares the
density of edges inside communities with the density of edges between
communities.
 High modularity indicates that nodes within the same community are
highly connected, while connections between communities are sparse.
 It's often used to evaluate the quality of community detection
algorithms.
Example: In a social network, if friends mostly interact within their friend group
and rarely with outsiders, the network has high modularity.
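For reference, the standard Newman-Girvan definition of modularity for an undirected, unweighted graph can be written as below. The symbols are not defined in the answer above and are introduced here for illustration: $A_{ij}$ is 1 if nodes $i$ and $j$ are connected and 0 otherwise, $k_i$ is the degree of node $i$, $m$ is the number of edges, $c_i$ is the community of node $i$, and $\delta$ is 1 when its two arguments are equal and 0 otherwise.

```latex
Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)
```

The term $\frac{k_i k_j}{2m}$ is the number of edges expected between $i$ and $j$ if edges were placed at random while preserving degrees, so $Q$ is positive only when communities contain more internal edges than chance would predict.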
25 Name any 5 communities popular in their fields today. CO2 2
Here are five popular communities across various fields today:
1. GitHub (Software Development) – A leading platform for open-source
software development, where developers collaborate on projects.
2. Stack Overflow (Programming/Technology) – A community of
developers asking questions and providing answers on programming-
related topics.
3. Reddit (General/Multiple Fields) – A vast online community with
various subreddits dedicated to almost every field of interest.
4. Kaggle (Data Science/AI) – A platform for data science competitions,
where professionals and enthusiasts collaborate on machine learning and
data analysis projects.
5. Dev.to (Software Development) – A community-driven platform for
software developers to share articles, tutorials, and experiences.
26 Define Giant Component with an example. CO2 2
A Giant Component in graph theory refers to a connected subgraph that
contains a significant portion of the vertices in a large graph. In a random graph,
as the graph grows in size, the giant component is the largest connected
component that spans a large fraction of the entire graph's vertices, especially in
the case of sparse graphs.
Example:
Consider a random graph where each edge is added with a certain probability.
As the number of vertices increases, a "giant component" will form when the
probability of edges is high enough, causing a large connected subgraph that
includes most of the vertices in the graph.
For instance, in the Erdős–Rényi random graph model G(n, p), a giant
component emerges once the edge probability p exceeds the threshold 1/n, that
is, once the average degree np exceeds 1. Above this threshold the largest
component contains a constant fraction of the n vertices. If p is large
enough, a single component can include almost all the nodes in the graph,
leaving only a few isolated ones.
This phenomenon occurs in many real-world networks, such as social networks
or the internet.
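The emergence of a giant component can be checked with a small simulation. The sketch below is illustrative only: the function name, the graph size n = 1000, the seed, and the chosen average degrees are all assumptions, not values from the text. It generates a G(n, p) graph with p = c/n and reports the fraction of nodes in the largest connected component.

```python
import random
from collections import deque


def largest_component_fraction(n, p, seed=0):
    """Build a G(n, p) random graph and return (largest component size) / n."""
    rng = random.Random(seed)  # fixed seed (assumption) for repeatable runs
    neighbors = [[] for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:  # each possible edge appears with probability p
                neighbors[u].append(v)
                neighbors[v].append(u)

    seen = [False] * n
    largest = 0
    for start in range(n):
        if seen[start]:
            continue
        # Breadth-first search to measure the component containing `start`.
        seen[start] = True
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for nxt in neighbors[node]:
                if not seen[nxt]:
                    seen[nxt] = True
                    queue.append(nxt)
        largest = max(largest, size)
    return largest / n


n = 1000  # hypothetical graph size
# c = n * p is the average degree; the threshold discussed above is c = 1.
fractions = {c: largest_component_fraction(n, c / n) for c in (0.5, 1.0, 3.0)}
for c, fraction in fractions.items():
    print(f"average degree {c}: largest component holds {fraction:.1%} of nodes")
```

Below the threshold the largest component stays a small fraction of the graph, while well above it most nodes end up in a single component.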
27 State the various V's of Big Data. CO2 2
The V's of Big Data refer to the key characteristics that define big data. The
most commonly referenced V's are:
1. Volume – Refers to the vast amount of data generated every second. This
includes data from various sources such as social media, sensors,
transactions, etc.
2. Velocity – Refers to the speed at which data is generated, processed, and
analyzed. Big data requires the ability to handle real-time or near-real-
time processing.
3. Variety – Refers to the different types of data, such as structured, semi-
structured, and unstructured data, coming from various sources like text,
images, videos, etc.
4. Veracity – Refers to the uncertainty or reliability of the data. With big
data, some data might be incomplete, inconsistent, or noisy, and it is
important to ensure data quality.
5. Value – Refers to the usefulness or business value that can be derived
from big data. It's about extracting meaningful insights that can inform
decisions.
These V's help in understanding the challenges and considerations involved in
handling big data.
28 Define Web mining with an application. CO2 2
Web Mining refers to the process of using data mining techniques to extract
useful information, patterns, and knowledge from web data. This can include
data from websites, web content, web structure, and web usage patterns. Web
mining is typically divided into three main categories: Web Content Mining,
Web Structure Mining, and Web Usage Mining.
Application:
An example of web mining is Personalized Recommendation Systems used by
e-commerce websites like Amazon. Web mining techniques analyze user
behavior, such as past purchases, browsing history, and click patterns, to
recommend products that are likely to interest the user, improving the customer
experience and driving sales.
29 Describe the shingling algorithm with an example in detail. CO2 5
The Shingling Algorithm is used in text mining and data comparison to create a
set of substrings, called shingles, from a given text or document. These shingles
can then be used to detect similarities or measure the similarity between
different documents. Shingling is particularly useful for detecting plagiarism or
finding near-duplicate content.
Steps of the Shingling Algorithm:
1. Tokenization: Break the document into tokens (e.g., words or
characters).
2. Sliding Window: Use a sliding window approach of size k (shingle
size) to create contiguous subsequences of tokens.
3. Hashing: Often, each shingle is hashed to reduce space complexity.
4. Shingle Set: The set of all shingles forms a representation of the
document.
Example:
Consider the document:
"The quick brown fox."
 Step 1 (Tokenization): Tokenize the document into words:
["The", "quick", "brown", "fox"]
 Step 2 (Sliding Window): For k = 2 (bigrams), extract the shingles:
o ("The", "quick")
o ("quick", "brown")
o ("brown", "fox")
 Step 3 (Hashing): Apply a hash function to each bigram to reduce
storage space:
o Hash("The", "quick") → Hash1
o Hash("quick", "brown") → Hash2
o Hash("brown", "fox") → Hash3
 Step 4 (Shingle Set): The shingle set for this document is {Hash1,
Hash2, Hash3}.
Application:
Shingling is useful in duplicate detection and similarity comparison. For
example, when comparing two documents, their shingles are compared, and if
the intersection of their shingle sets is large, the documents are considered
similar. This method is commonly used in plagiarism detection and finding near-
duplicate content on the web.
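The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names are hypothetical, and CRC32 is used as the hash only because it is stable across runs (the text does not name a hash function).

```python
import re
import zlib


def word_shingles(text, k=2):
    """Steps 1-2: tokenize into words, then slide a window of k tokens."""
    tokens = re.findall(r"\w+", text)  # drops punctuation, keeps case
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def hash_shingle(shingle):
    """Step 3: map a shingle to a 32-bit integer (CRC32 is an assumption)."""
    return zlib.crc32(" ".join(shingle).encode("utf-8"))


shingles = word_shingles("The quick brown fox.", k=2)
hashed = {hash_shingle(s) for s in shingles}  # Step 4: the shingle set

print(sorted(shingles))
# [('The', 'quick'), ('brown', 'fox'), ('quick', 'brown')]
print(len(hashed))  # one integer per distinct shingle
```

Storing fixed-size integers instead of the token tuples is what reduces memory when documents and shingles get long.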
30 Describe the Leiden Community detection algorithm with advantages and CO2 5
disadvantages in detail. Use suitable diagrams to support your example.
Leiden Community Detection Algorithm
The Leiden Algorithm is a community detection algorithm designed to find
clusters or communities in large networks. It improves upon the Louvain
algorithm by running faster and by guaranteeing that the communities it
returns are internally connected.
The Leiden algorithm focuses on optimizing modularity while refining the
community structure in a hierarchical way.
Steps of the Leiden Algorithm:
1. Initialization: Each node starts in its own community.
2. Local Moving Phase: Each node is moved to a neighboring community
to maximize modularity. This is done by evaluating the quality of
community structure using modularity optimization.
3. Refinement Phase: Within each community found by local moving, nodes
are regrouped into well-connected sub-communities, so a poorly connected
community may be split. This step is what distinguishes Leiden from
Louvain.
4. Aggregation Phase: A new network is constructed in which the refined
communities become nodes.
5. Hierarchical Process: Steps 2 to 4 are repeated iteratively, refining the
community structure at each level, until no further improvements in
modularity can be made.
Advantages of the Leiden Algorithm:
1. Higher Quality Communities: It provides more accurate and well-
defined community structures compared to the Louvain algorithm.
2. Efficiency: The Leiden algorithm is faster and more scalable than many
other community detection algorithms, especially for large networks.
3. Connectivity Guarantee: Every community the algorithm returns is
guaranteed to be connected, which avoids the badly connected or even
disconnected communities that Louvain can produce.
Disadvantages of the Leiden Algorithm:
1. Computational Complexity: While it is faster than Louvain, the
algorithm can still be computationally intensive for very large networks.
2. Memory Usage: The refinement phase may require considerable
memory when working with very large graphs.
3. Dependency on Modularity: Like other modularity-based methods, the
Leiden algorithm may sometimes produce communities that are not
optimal from a domain-specific perspective (e.g., if modularity doesn't
capture meaningful real-world communities).
Example (Visualization):
Consider a simple network of 6 nodes connected as shown below:
A -- B
| |
C -- D

E -- F
 Initially, each node is in its own community: {A}, {B}, {C}, {D}, {E},
{F}.
 During the local moving phase, nodes will move between communities
to increase modularity. For example, node B might move to the same
community as node A if it increases modularity.
 The refinement phase then checks that each community is internally well
connected (splitting it if not), after which the communities are aggregated
into a new network and the process repeats.
After applying the Leiden algorithm, the final result might show two
communities: {A, B, C, D} and {E, F}.
Diagram:
Here’s a simple diagram showing the steps of the Leiden algorithm:
1. Initial Network:
A -- B
| |
C -- D

E -- F
2. After Local Moving Phase (initial communities):
{A, B, C, D} {E, F}
3. After Refinement Phase (final communities):
{A, B, C, D} {E, F}
Conclusion:
The Leiden algorithm is an efficient and high-quality method for detecting
communities in networks, offering significant improvements over earlier
methods like Louvain. However, it may still face challenges with very large
datasets in terms of memory and computational requirements.
31 Describe the Louvain Community detection algorithm with advantages and CO2 5
disadvantages in detail. Use suitable diagrams to support your example.
Louvain Community Detection Algorithm
The Louvain Algorithm is one of the most popular community detection
methods used in large networks. It aims to detect communities by optimizing
modularity, a measure that quantifies the strength of division of a network into
communities. The algorithm is hierarchical and works in two main phases that
are repeated iteratively.
Steps of the Louvain Algorithm:
1. Initialization: Initially, each node is placed in its own community.
2. Phase 1 (Local Moving Phase): Each node is moved to the community
of its neighbor that maximizes the modularity. This is done for all nodes
in the network.
3. Phase 2 (Community Aggregation): After the local moves, the
algorithm constructs a new network where each community from Phase 1
is treated as a single node, and edges between these communities are
weighted based on the total weight of edges between nodes in the
original communities.
4. Repetition: Steps 2 and 3 are repeated iteratively until no further
improvement in modularity can be achieved.
Advantages of the Louvain Algorithm:
1. Efficiency: The Louvain algorithm is fast and scalable, making it
suitable for large networks.
2. Modularity Optimization: It is based on optimizing modularity, which
is a good indicator of the quality of community structures in many cases.
3. Hierarchical Nature: The hierarchical approach makes it possible to
detect communities at different levels of granularity.
4. Widely Used: The algorithm is well-known and widely adopted in many
research and real-world applications due to its simplicity and
effectiveness.
Disadvantages of the Louvain Algorithm:
1. Resolution Limit: The Louvain algorithm may have trouble detecting
small communities in large networks (this is a known issue with
modularity optimization).
2. Greedy Nature: Since it is based on local optimization (greedy
approach), the algorithm can get stuck in suboptimal solutions, leading to
less accurate community structures in some cases.
3. Dependency on Modularity: Like many modularity-based algorithms,
the quality of community detection depends on the modularity function,
which may not always reflect meaningful community structures in some
domains.
Example (Visualization):
Consider a simple network of 6 nodes with the following connections:
A -- B
| |
C -- D

E -- F
 Step 1 (Initialization): Initially, each node is its own community: {A},
{B}, {C}, {D}, {E}, {F}.
 Step 2 (Local Moving Phase): Nodes will move to communities of their
neighbors to increase modularity. For example:
o Node B might join community {A, C, D}, as this increases
modularity.
 Step 3 (Community Aggregation): After the local moving, the
algorithm aggregates the communities into a new network:
o The new network has communities {A, B, C, D} and {E, F}.
 Step 4 (Repeat): The algorithm repeats this process, but since no further
improvement in modularity can be made, it stops.
Final Communities:
 {A, B, C, D}
 {E, F}
Diagram:
1. Initial Network:
A -- B
| |
C -- D

E -- F
2. After Local Moving Phase (Communities Formed):
{A, B, C, D} {E, F}
3. Final Community Structure:
{A, B, C, D} {E, F}
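To see why the algorithm settles on these two communities, the modularity of the starting and final partitions can be compared directly. The sketch below assumes the five edges drawn in the diagram (A-B, A-C, B-D, C-D, E-F) and uses the per-community form of modularity, Q = sum over communities of (internal edges / m) - (total degree / 2m)^2; the function name is hypothetical.

```python
from collections import defaultdict


def modularity(edges, communities):
    """Modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    community_of = {
        node: index for index, group in enumerate(communities) for node in group
    }
    internal_edges = defaultdict(int)  # edges with both ends in the community
    degree_sum = defaultdict(int)      # sum of node degrees in the community
    for u, v in edges:
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal_edges[community_of[u]] += 1
    return sum(
        internal_edges[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )


# Edges read off the diagram above (an assumption about how it is drawn).
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("E", "F")]

singletons = [{node} for node in "ABCDEF"]          # initialization
final = [{"A", "B", "C", "D"}, {"E", "F"}]          # reported result

q_start = modularity(edges, singletons)
q_final = modularity(edges, final)
print(round(q_start, 2))  # -0.18
print(round(q_final, 2))  # 0.32
```

The increase from a negative value to a clearly positive one is what the local moving and aggregation phases are greedily searching for.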
Conclusion:
The Louvain algorithm is efficient and widely used for community detection in
networks, offering a good balance between performance and scalability.
However, it may struggle with detecting smaller communities and can get stuck
in local optima due to its greedy approach. Despite these limitations, it remains
one of the most popular methods for large-scale community detection.
32 What is the visualization of social networks? Give examples to illustrate. CO2 5
Visualization of Social Networks
Social network visualization refers to the graphical representation of
relationships, interactions, and connections within a social network. It uses
nodes (representing individuals or entities) and edges (representing relationships
or interactions) to illustrate the structure and dynamics of the network.
Visualization helps to understand the network's topology, identify key players,
community structures, and analyze the flow of information.
Key Components in Visualization:
1. Nodes (Vertices): Represent individuals, organizations, or entities.
2. Edges (Links): Represent relationships, interactions, or connections
between the nodes.
3. Community Clusters: Groups of nodes that are densely connected
internally, representing subgroups within the network.
Examples:
1. Facebook Friendship Network:
o Example: In a Facebook social network, each user is represented
as a node, and the friendships between users are represented as
edges. A visualization can show how individuals are connected,
who are the central figures (e.g., people with many friends), and
how people are grouped together in communities of friends.
Visualization:
o Nodes (users) are connected by edges (friendships), and clusters
of tightly-knit friends can be seen.
2. Twitter Follower Network:
o Example: A Twitter follower network visualizes users (nodes)
and the follower-following relationships (edges). This can show
which users are central (e.g., influencers with many followers), as
well as identify groups of users who are interacting frequently.
Visualization:
o A network with nodes (users) and edges (follower-following
relationships) can highlight influencers, communities, and
connections.
3. Co-authorship Network in Academia:
o Example: In academic research, authors (nodes) are connected by
edges if they have co-authored a paper together. This
visualization helps in identifying research groups, influential
authors, and collaborations.
Visualization:
o A graph with nodes (authors) and edges (co-authorships) shows
collaboration patterns and community structures within academic
fields.
Advantages of Social Network Visualization:
 Identifying Key Players: Visualizations help to identify influential
nodes (e.g., celebrities, influencers).
 Community Detection: Helps to detect communities or clusters within
the network.
 Understanding Network Dynamics: Provides insights into the flow of
information or influence within the network.
Conclusion:
Social network visualization is a powerful tool for analyzing the structure,
interactions, and dynamics within social systems. It helps to visualize complex
relationships in a clear and intuitive way, making it easier to analyze large-scale
social interactions.
33 Solve the graph using Page ranking: Nodes: {A, B, C, D} CO2 5
Edges:
A→C
B→A
B→D
C→D
34 Discuss the concept of PageRank and its significance in ranking web pages CO2 5
within a network. How does the algorithm utilize link structures to determine the
importance of a webpage? Illustrate the working of PageRank with a structured
example, including calculations and an interpretation of results.
Concept of PageRank
PageRank is an algorithm developed by Larry Page and Sergey Brin, the
founders of Google, to rank web pages based on their importance. It is based on
the idea that a page is important if it is linked to by many other important pages.
The algorithm uses the link structure of the web to assign a rank to each
webpage, effectively measuring the webpage’s "importance" in relation to
others.
Significance in Ranking Web Pages
PageRank is significant because it allows search engines like Google to rank
pages in search results by considering not just the content of the page, but also
the network of links pointing to it. This helps to identify authoritative or
reputable pages. A page with more high-quality inbound links from authoritative
pages is likely to be more relevant and useful.
How PageRank Uses Link Structures
The PageRank algorithm treats links as "votes" for a page. However, not all
votes are equal:
 A link from an important page (one with a high PageRank) is more
valuable than a link from an unimportant page.
 Each page distributes its rank to the pages it links to, with the rank being
divided equally among all outgoing links.
Example and Calculations
Consider a simple web with 3 pages: A, B, and C. The links between the pages
are:
 A→B
 B → A, C
 C→A
Initial PageRank (Assume equal distribution):
 PR(A) = PR(B) = PR(C) = 1 (initial value for all pages).
Damping Factor (d): Set to 0.85, commonly used in PageRank calculations.
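The calculation for this example can be sketched as follows. The text does not state which form of the update rule it uses, so this assumes the common non-normalized form, with $In(p)$ the set of pages linking to $p$ and $L(q)$ the number of outgoing links of $q$ (both symbols are introduced here for illustration). The first iteration, starting from a rank of 1 for every page, is:

```latex
\begin{align*}
PR(p) &= (1 - d) + d \sum_{q \in In(p)} \frac{PR(q)}{L(q)} \\
PR(A) &= 0.15 + 0.85\left(\tfrac{PR(B)}{2} + \tfrac{PR(C)}{1}\right) = 0.15 + 0.85\,(0.5 + 1) = 1.425 \\
PR(B) &= 0.15 + 0.85\,\tfrac{PR(A)}{1} = 0.15 + 0.85\,(1) = 1.000 \\
PR(C) &= 0.15 + 0.85\,\tfrac{PR(B)}{2} = 0.15 + 0.85\,(0.5) = 0.575
\end{align*}
```

Under that assumption, repeating the update until the values stop changing gives approximately PR(A) ≈ 1.19, PR(B) ≈ 1.16, and PR(C) ≈ 0.64, which sum to 3, the number of pages.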
Interpretation:
 Page A has the highest PageRank because it is linked to by both B and C.
 Page B ranks close behind A: its only inbound link comes from A, but A
passes its entire rank to B through its single outgoing link.
 Page C has the lowest rank because its only inbound link comes from B,
which splits its rank between two outgoing links (A and C).
This demonstrates how the PageRank algorithm works by considering both the
quantity and quality of incoming links when determining the importance of a
page.
35 Describe the Shingling algorithm and its role in detecting near-duplicate CO2 5
documents. How does it transform text into sets of overlapping substrings for
similarity computation? Illustrate the process with a step-by-step example,
including shingle generation and comparison.
Shingling Algorithm and its Role in Detecting Near-Duplicate Documents
The Shingling algorithm is a technique used to break down a document into
overlapping substrings (called "shingles") of fixed length. These shingles are
then used to compare documents for similarity and detect near-duplicate
content. By converting text into sets of shingles, the algorithm can identify
shared patterns between documents, helping to identify near-duplicates even if
they are not exactly the same.
How Shingling Works:
1. Text Transformation: A document is divided into overlapping
substrings (called shingles) of a fixed length (typically 3-5 characters).
2. Shingle Representation: Each shingle is treated as an element in a set.
3. Similarity Computation: By comparing the sets of shingles between
documents, we can calculate a Jaccard similarity score, which measures
the intersection of the shingles divided by their union.
Step-by-Step Example:
Consider the following text from two documents:
Document 1: "I love programming in Python."
Document 2: "I love coding in Python."
Let's assume the shingle length is 2 (bigrams), meaning we will create shingles
of 2 consecutive characters.
Step 1: Generate shingles
For Document 1:
 "I love programming in Python."
Bigrams (shingles), in order of appearance:
o "I ", " l", "lo", "ov", "ve", "e ", " p", "pr", "ro", "og", "gr", "ra",
"am", "mm", "mi", "in", "ng", "g ", " i", "in", "n ", " P", "Py", "yt",
"th", "ho", "on", "n."
For Document 2:
 "I love coding in Python."
Bigrams (shingles), in order of appearance:
o "I ", " l", "lo", "ov", "ve", "e ", " c", "co", "od", "di", "in", "ng",
"g ", " i", "in", "n ", " P", "Py", "yt", "th", "ho", "on", "n."
Step 2: Shingle Comparison
Now, we compare the sets of shingles from both documents. Because these are
sets, the repeated shingle "in" counts only once in each document.
 Shingles for Document 1 (27 unique):
o {"I ", " l", "lo", "ov", "ve", "e ", " p", "pr", "ro", "og", "gr", "ra",
"am", "mm", "mi", "in", "ng", "g ", " i", "n ", " P", "Py", "yt", "th",
"ho", "on", "n."}
 Shingles for Document 2 (22 unique):
o {"I ", " l", "lo", "ov", "ve", "e ", " c", "co", "od", "di", "in", "ng",
"g ", " i", "n ", " P", "Py", "yt", "th", "ho", "on", "n."}
Step 3: Calculate Jaccard Similarity
The Jaccard similarity is calculated as:
$J(D_1, D_2) = \frac{|S_1 \cap S_2|}{|S_1 \cup S_2|}$
where $S_1$ and $S_2$ are the shingle sets of Document 1 and Document 2.
Intersection of shingles:
 {"I ", " l", "lo", "ov", "ve", "e ", "in", "ng", "g ", " i", "n ", " P",
"Py", "yt", "th", "ho", "on", "n."}
There are 18 common shingles between the two documents.
Union of shingles:
 {"I ", " l", "lo", "ov", "ve", "e ", " p", "pr", "ro", "og", "gr", "ra", "am",
"mm", "mi", "in", "ng", "g ", " i", "n ", " P", "Py", "yt", "th", "ho", "on",
"n.", " c", "co", "od", "di"}
There are 31 unique shingles across both documents.
Jaccard Similarity:
$J(D_1, D_2) = \frac{18}{31} \approx 0.5806$
Step 4: Interpretation
The Jaccard similarity score of about 0.58 indicates that the documents share
58% of their distinct shingles. For two short sentences that differ in a single
word ("programming" versus "coding"), this is a substantial overlap and suggests
that the two documents are near-duplicates, even though they are not identical.
In longer documents a one-word change affects proportionally fewer shingles, so
the score would be closer to 1.
Conclusion
The Shingling algorithm is a powerful tool for detecting near-duplicate
documents. By converting text into sets of overlapping substrings (shingles) and
comparing these sets, we can measure the similarity between documents. This
method is widely used in tasks like plagiarism detection, web crawling, and
deduplication in large document collections.
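The same procedure can be sketched in Python for readers who want to check the set sizes themselves. The function names are hypothetical, and the sketch assumes case-sensitive character shingles with spaces and punctuation kept, as in the example.

```python
def char_shingles(text, k=2):
    """Return the set of all overlapping substrings of length k."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    return len(a & b) / len(a | b)


doc1 = "I love programming in Python."
doc2 = "I love coding in Python."

s1 = char_shingles(doc1, k=2)
s2 = char_shingles(doc2, k=2)

print("unique shingles:", len(s1), len(s2))
print("intersection:", len(s1 & s2))
print("union:", len(s1 | s2))
print("Jaccard similarity:", round(jaccard(s1, s2), 4))
```

Raising k makes the comparison stricter, because longer shingles are less likely to match by chance.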

36 Explain the page ranking algorithm and perform on following graph: CO2 10

Page Ranking Algorithm (PageRank)


PageRank is an algorithm used by search engines (originally by Google) to rank
web pages in search results. It measures the importance of website pages based
on the link structure. The basic idea is:
1. A page is important if many important pages link to it.
2. Each page's rank is distributed equally among all pages it links to.
3. The process is repeated iteratively until the ranks converge.

Given Graph Analysis


From the image, the directed graph has the following structure:
 A → B, C
 B→A
 C→B
 D→C
Initial rank (assuming equal importance):
Each of the 4 pages gets $\frac{1}{4} = 0.25$

Iteration 0 (Initial ranks)


A: 0.25
B: 0.25
C: 0.25
D: 0.25

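Iteration 1 Result
Each rank is updated as $PR(p) = (1 - d) + d \sum_{q \to p} \frac{PR(q)}{L(q)}$,
where $L(q)$ is the number of outgoing links of $q$. The values below
correspond to a damping factor $d = 0.85$:
A: 0.15 + 0.85 × (0.25) = 0.3625 (link from B)
B: 0.15 + 0.85 × (0.25/2 + 0.25) = 0.46875 (links from A and C)
C: 0.15 + 0.85 × (0.25/2 + 0.25) = 0.46875 (links from A and D)
D: 0.15 + 0.85 × 0 = 0.15 (no incoming links)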

You would continue this process for several more iterations until the values
stabilize (converge).
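The remaining iterations are tedious by hand, so a short script helps. The sketch below assumes the non-normalized update PR(p) = (1 - d) + d * sum(PR(q) / L(q)) with d = 0.85, which reproduces the Iteration 1 values listed above; the function names and the convergence tolerance are illustrative choices, and pages without outgoing links are not handled.

```python
def pagerank_step(out_links, rank, d=0.85):
    """One synchronous update of every page's rank."""
    new_rank = {}
    for page in out_links:
        # Each page q that links to `page` contributes rank[q] / L(q).
        incoming = sum(
            rank[src] / len(targets)
            for src, targets in out_links.items()
            if page in targets
        )
        new_rank[page] = (1 - d) + d * incoming
    return new_rank


def pagerank(out_links, rank, d=0.85, tol=1e-9, max_iter=1000):
    """Repeat the update until no rank changes by more than tol (assumed)."""
    for _ in range(max_iter):
        new_rank = pagerank_step(out_links, rank, d)
        if max(abs(new_rank[p] - rank[p]) for p in rank) < tol:
            return new_rank
        rank = new_rank
    return rank


# Graph from the question: A -> B, C; B -> A; C -> B; D -> C.
graph = {"A": ["B", "C"], "B": ["A"], "C": ["B"], "D": ["C"]}
start = {page: 0.25 for page in graph}

first = pagerank_step(graph, start)
final = pagerank(graph, start)

print({p: round(v, 5) for p, v in first.items()})
# {'A': 0.3625, 'B': 0.46875, 'C': 0.46875, 'D': 0.15}
print({p: round(v, 4) for p, v in final.items()})
```

With this form of the formula the converged ranks sum to the number of pages rather than to 1; dividing each value by 4 gives a probability distribution without changing the ordering.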
37 Explain the shingling algorithm. Perform the same on: CO2 10
Document A - As the sun dipped below the horizon, casting hues of orange and
pink across the sky, the weary traveler found solace in the warmth of a crackling
fire under a blanket of stars.
Document B - Solace was found by the weary traveler in the warmth of a
crackling fire under a blanket of stars as hues of orange and pink were cast
across the sky by the setting sun.
Shingling Algorithm (Short Answer - 10 Marks)
The Shingling Algorithm is a method used to compare the similarity between
documents. It involves:
1. Preprocessing: Normalize text (remove punctuation, lowercase, etc.).
2. Tokenization: Split the text into overlapping substrings (called k-
shingles) of a chosen size (e.g., 3 words).
3. Shingle Set Creation: Convert each document into a set of shingles.
4. Similarity Measurement: Use Jaccard Similarity:
$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$
where A and B are the sets of shingles from the two documents.

Performing Shingling (k=3 word shingles)


Document A (after cleaning and tokenizing):
as the sun dipped below the horizon casting hues of orange and pink across the
sky the weary traveler found solace in the warmth of a crackling fire under a
blanket of stars
Shingles (A):
 as the sun
 the sun dipped
 sun dipped below
 ...
 blanket of stars
Document B (cleaned):
solace was found by the weary traveler in the warmth of a crackling fire under a
blanket of stars as hues of orange and pink were cast across the sky by the
setting sun
Shingles (B):
 solace was found
 was found by
 ...
 the setting sun
Intersection (common shingles):
 the weary traveler
 in the warmth
 the warmth of
 warmth of a
 of a crackling
 a crackling fire
 crackling fire under
 fire under a
 under a blanket
 a blanket of
 blanket of stars
 hues of orange
 of orange and
 orange and pink
 across the sky
Calculate Jaccard Similarity:
 |A| = 31 shingles (33 words)
 |B| = 32 shingles (34 words)
 |A ∩ B| = 15
 |A ∪ B| = 31 + 32 - 15 = 48
$J(A, B) = \frac{15}{48} \approx 0.31$
Conclusion
The documents share several phrases but differ in word order. The Jaccard
Similarity shows they are moderately similar, with a similarity score of ~0.31
using 3-word shingles.

38 Explain any 10 applications of Social Media Data in Business. CO2 10


10 Applications of Social Media Data in Business (Short Answer – 10
Marks)
1. Brand Monitoring: Track brand mentions to understand public
perception and reputation.
2. Customer Sentiment Analysis: Analyze user emotions (positive,
negative, neutral) to improve products or services.
3. Targeted Advertising: Use user behavior and interests to create highly
targeted ad campaigns.
4. Market Research: Gain insights into customer needs, preferences, and
emerging trends.
5. Competitor Analysis: Monitor competitors’ social activity, campaigns,
and customer engagement.
6. Customer Service Improvement: Identify and respond to customer
complaints or queries in real time.
7. Product Development: Use feedback and discussions to guide product
enhancements or innovations.
8. Influencer Identification: Find key influencers to promote products to a
wider audience effectively.
9. Sales Forecasting: Predict demand trends based on consumer chatter and
buying intentions.
10. Crisis Management: Detect early signs of PR crises and act quickly to
control the narrative.
39 Find the resemblance of the following texts: CO2 10
Document A - In the heart of the dense jungle, where emerald leaves were
shimmered under the golden rays of the sun, the ruins of an ancient civilization
were stumbled upon by a curious explorer, hidden from the modern world,
revealing secrets of a forgotten era buried beneath layers of time and
overgrowth.
Document B - A curious explorer stumbled upon the ruins of an ancient
civilization hidden in the heart of the dense jungle, where emerald leaves
shimmered under the golden rays of the sun, revealing secrets of a forgotten era
buried beneath layers of time and overgrowth.
Text Resemblance Analysis (Short Answer – 10 Marks)
To determine the resemblance between Document A and Document B, we can
use text similarity techniques like:

1. Shingling + Jaccard Similarity (k=3 words)


 Both documents describe the same scene using similar phrases with
slightly different word orders.
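 After lowercasing, removing punctuation, and converting both texts into
3-word shingles, many overlapping shingles appear (e.g., “a curious
explorer”, “ruins of an”, “dense jungle where”): 30 of the 59 distinct
shingles are shared.
✅ Substantial overlap = moderate Jaccard Similarity (30/59 ≈ 0.51). The
passive constructions and the extra phrase “hidden from the modern world” in
Document A break up the remaining shingles.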

2. Cosine Similarity (Vector-based)


 Convert words into vectors using term frequency.
 Since both documents use nearly the same vocabulary and structure, the
cosine similarity score will be very high (close to 1).
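The cosine score for these two documents can be checked with a small term-frequency sketch. The function names are hypothetical, and the tokenization (lowercase, letters only, no stop-word removal or weighting) is an assumption; other preprocessing choices would change the number slightly.

```python
import math
import re
from collections import Counter


def term_frequencies(text):
    """Count how often each lowercase word occurs."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(tf_a, tf_b):
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(tf_a[word] * tf_b[word] for word in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(count * count for count in tf_a.values()))
    norm_b = math.sqrt(sum(count * count for count in tf_b.values()))
    return dot / (norm_a * norm_b)


doc_a = (
    "In the heart of the dense jungle, where emerald leaves were shimmered "
    "under the golden rays of the sun, the ruins of an ancient civilization "
    "were stumbled upon by a curious explorer, hidden from the modern world, "
    "revealing secrets of a forgotten era buried beneath layers of time and "
    "overgrowth."
)
doc_b = (
    "A curious explorer stumbled upon the ruins of an ancient civilization "
    "hidden in the heart of the dense jungle, where emerald leaves shimmered "
    "under the golden rays of the sun, revealing secrets of a forgotten era "
    "buried beneath layers of time and overgrowth."
)

score = cosine_similarity(term_frequencies(doc_a), term_frequencies(doc_b))
print(round(score, 3))  # 0.957
```

Because term-frequency vectors ignore word order, reordering clauses barely affects this score, which is why it comes out higher than an order-sensitive shingle comparison.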

3. Semantic Similarity
 Though the sentence structures vary slightly, semantic meaning is
nearly identical.
 NLP models would rate this similarity as very high (above 0.9).

Conclusion
The texts are highly similar in both content and semantics, differing mainly in
word order.
✅ Resemblance Score: Very High (~90–95%)
40 Investigate how centrality measures can aid in the early detection of disease CO2 10
spread within a healthcare network. Given a network where nodes represent
individuals and edges represent direct physical interactions, compute the
Betweenness and Closeness centrality of individuals to identify potential super-
spreaders and optimal intervention points.
Centrality Measures in Disease Spread Detection (Short Answer – 10
Marks)
In a healthcare network where nodes = individuals and edges = physical
interactions, centrality measures can help identify key individuals for early
disease control.

1. Betweenness Centrality
 Definition: Measures how often a node lies on the shortest path between
other nodes.
 Interpretation: High betweenness nodes connect clusters and can
spread disease across groups.
 Use:
o Identify super-spreaders who act as bridges between
communities.
o Useful for quarantine or vaccination to break transmission
chains.

2. Closeness Centrality
 Definition: Measures how close a node is to all others in the network.
 Interpretation: High closeness nodes can reach others quickly.
 Use:
o Prioritize these individuals for monitoring or early
intervention.
o Effective for fast containment of outbreaks.

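Example Calculation (Simplified Network)
Assume an illustrative network of four individuals with contacts A–B, A–C,
B–C, and C–D.

Person   Distances to Others    Closeness Centrality   Betweenness Centrality
A        B = 1, C = 1, D = 2    3/4 = 0.75             0
B        A = 1, C = 1, D = 2    3/4 = 0.75             0
C        A = 1, B = 1, D = 1    3/3 = 1.00             2 (paths A–D and B–D)
D        C = 1, A = 2, B = 2    3/5 = 0.60             0

Closeness = (n − 1) / (sum of shortest-path distances to all other nodes).
Betweenness = number of shortest paths between other pairs that pass through the node.

Conclusion
 Highest Betweenness (C) → Likely super-spreader, since every path to D
runs through C.
 High Closeness (C, then A and B) → Ideal for early monitoring or
intervention.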
✅ Centrality measures help prioritize testing, isolation, and vaccination to
efficiently control disease spread.
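Both measures can be computed with breadth-first search. The sketch below is illustrative only: the four-person contact network (edges A–B, A–C, B–C, C–D) is an assumed example, closeness is normalized as (n − 1) / (sum of distances) and assumes a connected graph, and betweenness uses Brandes' algorithm for unweighted, undirected graphs without normalization.

```python
from collections import deque

# Hypothetical contact network: an assumption made for illustration.
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]


def build_graph(edges):
    """Build an undirected adjacency map from an edge list."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def bfs_distances(graph, source):
    """Shortest-path distance (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def closeness(graph):
    """Closeness = (n - 1) / sum of distances; assumes a connected graph."""
    n = len(graph)
    result = {}
    for node in graph:
        total = sum(bfs_distances(graph, node).values())
        result[node] = (n - 1) / total if total else 0.0
    return result


def betweenness(graph):
    """Unnormalized betweenness via Brandes' algorithm (unweighted graph)."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        order = []                      # nodes in order of discovery
        preds = {v: [] for v in graph}  # predecessors on shortest paths
        sigma = {v: 0 for v in graph}   # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):       # accumulate from the farthest nodes back
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: value / 2 for v, value in score.items()}


graph = build_graph(EDGES)
closeness_scores = closeness(graph)
betweenness_scores = betweenness(graph)
for person in sorted(graph):
    print(person, round(closeness_scores[person], 2), betweenness_scores[person])
```

For real contact-tracing data, a graph library would normally be used instead; the hand-written version only shows what the two measures count.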
41 Solve the following: CO2 10
Document A:
"The global economy is experiencing rapid shifts due to technological
advancements, market fluctuations, and policy changes affecting businesses
worldwide significantly."
Document B:
"Rapid shifts in the global economy are influenced by technological progress,
market instability, and new government policies impacting companies globally."
Text Similarity Analysis (Short Answer – 10 Marks)
To evaluate the similarity between Document A and Document B, we use
semantic and structural analysis.

1. Lexical and Structural Comparison


 Similar Words/Phrases:
o “global economy”
o “rapid shifts” / “shifts in the global economy”
o “technological advancements” ≈ “technological progress”
o “market fluctuations” ≈ “market instability”
o “policy changes” ≈ “new government policies”
o “affecting businesses” ≈ “impacting companies”
✅ Paraphrased structure, but core ideas match.

2. Semantic Similarity
 Though the wording differs, semantic meaning is almost identical.
 Embedding-based NLP models such as BERT would be expected to return a
high semantic similarity score for this pair.

3. Jaccard Similarity (Word Set)


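 Each document has 20 distinct words, and 8 are shared (the, global,
economy, rapid, shifts, technological, market, and), so
J(A, B) = 8 / (20 + 20 − 8) = 8/32 = 0.25.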

Conclusion
 Lexical similarity: Low to medium (word-set Jaccard = 0.25)
 Semantic similarity: High
 Overall Resemblance: Strong; B is a paraphrased version of the same
idea
✅ Most content words are replaced by synonyms, which lowers lexical overlap
but preserves structure and meaning.
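The word-set Jaccard comparison can be reproduced with a few lines. This is a minimal sketch, assuming lowercase alphabetic tokens with punctuation ignored; the helper name `word_set` is illustrative.

```python
import re

DOC_A = (
    "The global economy is experiencing rapid shifts due to technological "
    "advancements, market fluctuations, and policy changes affecting "
    "businesses worldwide significantly."
)
DOC_B = (
    "Rapid shifts in the global economy are influenced by technological "
    "progress, market instability, and new government policies impacting "
    "companies globally."
)


def word_set(text):
    """Return the set of distinct lowercase words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


words_a, words_b = word_set(DOC_A), word_set(DOC_B)
shared = words_a & words_b
word_jaccard = len(shared) / len(words_a | words_b)

print("Shared words:", sorted(shared))
print(f"Word-set Jaccard: {word_jaccard:.2f}")
```

Synonym pairs such as “advancements” and “progress” count as different words here, which is why a lexical measure understates the similarity of paraphrases.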

42 Solve using Shingling: w=3 CO2 10


Document A: The company launched its new product in the market with an
extensive marketing campaign.
Document B: The new product was launched in the market by the company with
an extensive marketing campaign.
Shingling Solution (w = 3 words)
Task: Use 3-word shingling to compare Document A and B.

Step 1: Preprocess the Texts


Lowercase, remove punctuation (if any), and tokenize.
Document A:
the company launched its new product in the market with an extensive
marketing campaign
Document B:
the new product was launched in the market by the company with an extensive
marketing campaign

Step 2: Generate 3-Word Shingles


Shingles for Document A:
1. the company launched
2. company launched its
3. launched its new
4. its new product
5. new product in
6. product in the
7. in the market
8. the market with
9. market with an
10. with an extensive
11. an extensive marketing
12. extensive marketing campaign
Shingles for Document B:
1. the new product
2. new product was
3. product was launched
4. was launched in
5. launched in the
6. in the market
7. the market by
8. market by the
9. by the company
10. the company with
11. company with an
12. with an extensive
13. an extensive marketing
14. extensive marketing campaign

Step 3: Compute Jaccard Similarity


 Intersection (common shingles):
o in the market
o with an extensive
o an extensive marketing
o extensive marketing campaign
✅ Total Common Shingles = 4
✅ Total Unique Shingles = 22 (12 from A + 14 from B − 4 overlapping)

Conclusion
 Jaccard Similarity = |A ∩ B| / |A ∪ B| = 4/22 ≈ 0.18 using 3-word shingles.
 Despite semantic similarity, word order differences reduce shingle
overlap.
 Interpretation: Low textual similarity under 3-word shingling, though
conceptually the documents are quite close.
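The three steps above can be sketched in code. This is an illustrative implementation, assuming lowercasing and punctuation removal as in Step 1; the function name `word_shingles` is not from any library.

```python
import re

DOC_A = ("The company launched its new product in the market with an "
         "extensive marketing campaign.")
DOC_B = ("The new product was launched in the market by the company with "
         "an extensive marketing campaign.")


def word_shingles(text, w):
    """Return the set of w-word shingles of the lowercased, tokenized text."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {" ".join(tokens[i:i + w]) for i in range(len(tokens) - w + 1)}


shingles_a = word_shingles(DOC_A, 3)
shingles_b = word_shingles(DOC_B, 3)
common = shingles_a & shingles_b
similarity = len(common) / len(shingles_a | shingles_b)

for shingle in sorted(common):
    print(shingle)
print(f"Jaccard similarity (w = 3): {similarity:.2f}")
```

Passing a different `w` to `word_shingles` shows how the shingle size changes the score: smaller shingles tolerate reordering better, larger ones are stricter.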
43 Examine the idea of homophily in relation to social networks. Give a practical CO3 2
example to demonstrate its significance.
Homophily in social networks refers to the tendency of individuals to form
connections with others who are similar to themselves in terms of attributes such
as age, race, gender, social status, or interests. This phenomenon plays a
significant role in shaping the structure and dynamics of social networks.
Practical Example:
On Facebook, users are more likely to form connections with people who share
similar interests, educational background, or location. For instance, college
students are likely to connect with other students from their university, creating
tightly-knit communities. This homophily helps reinforce group identities,
fosters stronger social ties, and influences how information spreads within
networks.
44 How are important nodes identified using Degree, Betweenness, and Closeness CO3 2
centrality? Give an example to help clarify.
Important nodes in a network can be identified using different centrality
measures:
1. Degree Centrality: Measures the number of direct connections (edges) a
node has. Nodes with higher degree centrality are considered important
because they have more immediate connections.
o Example: In a social network, a user with many friends (high
degree) is influential and well-connected.
2. Betweenness Centrality: Measures how often a node lies on the shortest
path between other nodes. Nodes with high betweenness centrality
control the flow of information and act as bridges between different parts
of the network.
o Example: In a communication network, a node that connects two
distinct groups is crucial for information transfer.
3. Closeness Centrality: Measures how near a node is to all other nodes,
computed as the inverse of its average shortest-path distance to them.
Nodes with high closeness centrality can reach other nodes more quickly
and are considered influential in spreading information.
o Example: A person in an organization who can quickly
communicate with everyone, regardless of their position in the
hierarchy, has high closeness centrality.
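Degree centrality is the simplest of the three to compute. The following is a minimal sketch on an assumed friendship network (the names and edges are hypothetical), using the common normalization degree / (n − 1).

```python
# Hypothetical friendship network: an assumption made for illustration.
FRIENDSHIPS = [("Ana", "Ben"), ("Ana", "Cy"), ("Ana", "Dee"), ("Dee", "Eli")]


def degree_centrality(edges):
    """Return degree / (n - 1) for every node of an undirected edge list."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adjacent) / (n - 1) for node, adjacent in neighbors.items()}


centrality = degree_centrality(FRIENDSHIPS)
most_connected = max(centrality, key=centrality.get)

for person, value in sorted(centrality.items()):
    print(person, value)
print("Most connected:", most_connected)
```

Here Ana has the most direct friends, while Dee is the only link to Eli, so a betweenness calculation would also give Dee some weight even though her degree is lower.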
45 Explain the meaning of triadic closure and discuss its significance for network CO3 2
evolution. Give an example.
Triadic Closure refers to the concept where if two individuals (A and B) are
both connected to a third individual (C), they are likely to form a direct
connection (A and B will connect) over time. This process creates a triangle of
relationships in a social network.
Significance for Network Evolution:
Triadic closure is significant because it fosters the growth of social networks by
increasing the likelihood of new connections, enhancing network stability, and
strengthening ties between individuals. It also facilitates the emergence of
communities or clusters within a network.
Example:
In a social network, if person A is friends with both B and C, triadic closure
suggests that B and C are more likely to become friends themselves. This
increases the density of the network, leading to stronger and more cohesive
groups.
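Triadic closure is the idea behind “people you may know” suggestions. The sketch below, on an assumed friendship graph, lists every open triad: a pair of people who are not yet connected but share at least one friend. The graph and the function names are hypothetical.

```python
from itertools import combinations

# Hypothetical friendship graph: an assumption made for illustration.
FRIENDSHIPS = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]


def build_graph(edges):
    """Build an undirected adjacency map from an edge list."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def open_triads(graph):
    """Map each unconnected pair to the set of friends the pair shares."""
    candidates = {}
    for middle, friends in graph.items():
        for u, v in combinations(sorted(friends), 2):
            if v not in graph[u]:
                candidates.setdefault((u, v), set()).add(middle)
    return candidates


suggestions = open_triads(build_graph(FRIENDSHIPS))
for (u, v), mutual in sorted(suggestions.items()):
    print(f"{u} and {v} may connect via {sorted(mutual)}")
```

Pairs with more mutual friends are usually treated as more likely to close; ranking the suggestions by `len(mutual)` would reflect that.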
46 Describe the idea of a social network's strong and weak relationships. What CO3 2
effects do they have on structural cohesiveness and information flow? Talk
about it using examples from actual life.
In a social network, strong relationships refer to close, frequent interactions,
typically between friends or family members, while weak relationships are
more distant, occasional connections, such as acquaintances or colleagues.
Effects on Structural Cohesiveness:
 Strong relationships contribute to cohesiveness within a small group,
creating tight-knit communities where trust and support are high.
 Weak relationships enhance network reach, connecting different
groups and facilitating broader social connections, which helps in
spreading information across larger, less cohesive networks.
Effects on Information Flow:
 Strong relationships result in fast, reliable information flow within
small groups, but may limit the diversity of information.
 Weak relationships act as bridges between distinct groups, enabling
diverse information flow across the entire network.
Example:
In a workplace, your strong relationships (close colleagues or friends) ensure
smooth communication within your team. However, your weak relationships
(acquaintances from different departments) allow you to access and share
information across the entire organization, enhancing collaboration and
innovation.
47 Explain the Data Wrangling process and its importance in the preparation of CO3 2
data. Give instances of typical data wrangling strategies.
Data Wrangling is the process of cleaning, transforming, and organizing raw
data into a structured and usable format for analysis. It is an essential step in data
preparation, ensuring that data is accurate, consistent, and ready for further
analysis or modeling.
Importance:
 Improves Data Quality: Ensures data is accurate, complete, and
consistent.
 Prepares for Analysis: Allows for easier and more effective analysis by
structuring data correctly.
 Reduces Errors: Helps in identifying and fixing errors or
inconsistencies early in the data pipeline.
Typical Data Wrangling Strategies:
1. Handling Missing Values: Filling in missing data with averages,
medians, or removing incomplete rows/columns.
2. Data Transformation: Converting data into a consistent format (e.g.,
converting text dates into date-time objects).
3. Filtering and Removing Outliers: Identifying and removing data points
that are significantly different from others.
4. Normalization: Scaling numerical values to a standard range (e.g., 0 to
1) to ensure comparability.
5. Data Aggregation: Summarizing data by grouping and aggregating (e.g.,
calculating average sales per region).
Example:
For a sales dataset, data wrangling might involve removing incomplete entries,
converting all date columns to a consistent format, and filling in missing values
for sales figures based on averages for that region.
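The sales example can be sketched with the standard library alone. The records, field names, and the two accepted date formats below are assumptions made for illustration; in practice a data-frame library would typically handle these steps.

```python
from datetime import datetime
from statistics import mean

# Hypothetical raw records: an assumption made for illustration.
RAW_SALES = [
    {"region": "North", "date": "2024-01-05", "sales": 100.0},
    {"region": "North", "date": "05/02/2024", "sales": None},
    {"region": "North", "date": "2024-03-05", "sales": 200.0},
    {"region": "South", "date": "2024-01-07", "sales": 50.0},
    {"region": "South", "date": "07/02/2024", "sales": 150.0},
    {"region": None, "date": "2024-01-09", "sales": 75.0},
]


def parse_date(text):
    """Convert a text date in either assumed format into a date object."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            pass
    raise ValueError(f"unrecognized date format: {text!r}")


def region_means(rows):
    """Average the known sales figures per region (aggregation)."""
    grouped = {}
    for row in rows:
        if row["sales"] is not None:
            grouped.setdefault(row["region"], []).append(row["sales"])
    return {region: mean(values) for region, values in grouped.items()}


def wrangle(rows):
    """Drop incomplete rows, unify dates, fill gaps, and scale sales to 0..1."""
    rows = [dict(row) for row in rows if row["region"] is not None]
    for row in rows:
        row["date"] = parse_date(row["date"])
    fill = region_means(rows)  # assumes each region has at least one known value
    for row in rows:
        if row["sales"] is None:
            row["sales"] = fill[row["region"]]
    low = min(row["sales"] for row in rows)
    high = max(row["sales"] for row in rows)
    for row in rows:
        row["sales_scaled"] = (row["sales"] - low) / (high - low) if high > low else 0.0
    return rows


clean = wrangle(RAW_SALES)
avg_by_region = region_means(clean)
print(avg_by_region)
```

The order of the steps matters: missing values are filled before normalization so that the minimum and maximum are computed over complete data.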
48 Discuss the role of Network Visualization in Social Network Analysis. CO3 2
Network Visualization plays a crucial role in Social Network Analysis (SNA)
by providing a graphical representation of relationships between individuals or
entities in a network. It helps to easily identify patterns, structures, and key
nodes within the network, which might be difficult to detect from raw data
alone.
Role:
1. Identifying Key Nodes: Visualization helps highlight influential nodes
(e.g., central individuals, hubs, or bridges) by visually representing their
connections and centrality within the network.
2. Understanding Network Structure: It reveals network patterns such as
clusters, communities, and groups of strong or weak ties, facilitating the
understanding of how information or influence flows through the
network.
Example:
In a social media network, visualization might show how a few central
influencers (high-degree nodes) are connected to a large number of users,
illustrating the spread of content and interactions.
49 State, with examples, the relationship tie whose key feature is: the relationship CO3 2
exists only within the specific context of the shared activity.
The relationship tie with the key feature of "existing only within the specific
context of the shared activity" refers to weak ties that are formed when
individuals connect through a common activity or purpose, but these connections
are limited to that context. These ties are not necessarily strong or long-term
outside the shared activity.
Example:
 Workplace Projects: Colleagues may form a relationship while working
on a specific project but may not interact much once the project ends.
 Online Communities: Users participating in a specific online forum or
game may interact frequently within that platform but may not maintain
contact outside of that activity.
50 Examine the various types of relationships found in social networks. Give CO3 5
instances.
In social networks, relationships between individuals or entities can vary in
nature based on the level of interaction, trust, and purpose. These relationships
can be broadly classified into several types:
1. Strong Ties
These are close, frequent, and deeply personal relationships, often found
between friends or family members. These ties are characterized by high trust,
emotional support, and frequent communication.
 Example: A best friend or a close family member. They provide
emotional support, share personal information, and maintain regular
communication.
2. Weak Ties
Weak ties are more distant connections that do not involve frequent interactions
or deep emotional involvement. They can serve as bridges to new information,
groups, or networks.
 Example: An acquaintance from work or a distant friend on social
media. While not offering deep emotional support, they help in
connecting to different groups or offering new opportunities (e.g., a job
referral).
3. Directed Ties
These relationships have a clear direction, where one individual is the source of
influence or communication, and the other is the recipient. They are commonly
seen in hierarchical or asymmetric relationships.
 Example: A manager and an employee. The manager influences the
employee's tasks, but the employee's influence on the manager may be
limited.
4. Reciprocal Ties
Reciprocal relationships are mutual, where both individuals benefit or interact
with each other on equal terms. These relationships are marked by mutual trust
and exchanges.
 Example: Two colleagues who help each other with tasks and
collaborate on projects. Their relationship is bidirectional, benefiting
both parties equally.
5. Structural Ties (Bridges)
These ties link different social groups or networks and are crucial for the flow of
information across the network. They are typically formed between individuals
who connect separate groups.
 Example: A person who works in both marketing and product
development, connecting these two different departments in an
organization. They act as a bridge for information sharing between the
groups.
Conclusion:
Different types of relationships in social networks—strong, weak, directed,
reciprocal, and structural ties—play various roles in shaping the flow of
information, influence, and support within a network. They are vital for
understanding the structure and dynamics of social connections.
51 Describe the causes of social network triadic closure. Give an example and an CO3 5
analytical viewpoint.
Triadic Closure occurs when two individuals who share a common connection
(a third person) are likely to form a direct connection themselves. This process
creates a "triangle" in the network. The causes of triadic closure are rooted in
psychological, social, and network dynamics.
Causes of Triadic Closure:
1. Social Psychological Factors:
o Similarity and Homophily: People are more likely to form
relationships with others who are similar to themselves in terms
of interests, values, or background, which increases the chance of
closure.
o Trust and Familiarity: Shared friends or common acquaintances
can create a sense of familiarity and trust, encouraging new
connections.
2. Network Structural Factors:
o Network Density: In dense networks, where many individuals
are interconnected, the likelihood of triadic closure increases. As
individuals are already embedded in interconnected clusters,
forming a new tie is more likely.
o Mutual Friendships: When two individuals are already
connected through a mutual friend, it provides a social context for
them to meet and interact more easily.
3. Social Influence:
o Peer Pressure and Social Influence: Social pressure from
mutual acquaintances can push individuals to form ties, especially
when social norms emphasize the importance of creating a
cohesive group.
4. Common Interests or Goals:
o Shared Activities: If individuals are involved in a common
activity, event, or organization, the existing link through the third
person can encourage interaction, leading to closure.
Example:
Consider a workplace scenario where Person A is friends with both Person B
and Person C. If Person B and Person C share similar interests (e.g., both work
in the same department or attend the same professional events), they are more
likely to connect through Person A, resulting in triadic closure.
Analytical Viewpoint:
Triadic closure strengthens the structural cohesiveness of a network by creating
more tightly-knit clusters or communities. It also facilitates information flow,
as individuals within a closed triad are more likely to share information quickly
and trust each other. Moreover, triadic closure helps in the stability of social
networks, as connections between individuals are reinforced by mutual
acquaintances, reducing the risk of ties being broken or becoming weak. It can
also enhance the resilience of a network by ensuring that even if one tie is lost,
the overall communication within the group remains intact.
Thus, triadic closure plays a crucial role in shaping the dynamics and robustness
of social networks.
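The degree of triadic closure around a person is often summarized by the local clustering coefficient: the fraction of pairs of that person's contacts who are themselves connected. A minimal sketch on an assumed four-person graph follows; nodes with fewer than two contacts are given 0.0 by convention.

```python
from itertools import combinations

# Hypothetical friendship graph: an assumption made for illustration.
EDGES = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]


def build_graph(edges):
    """Build an undirected adjacency map from an edge list."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def local_clustering(graph, node):
    """Fraction of the node's neighbor pairs that are directly connected."""
    friends = graph[node]
    k = len(friends)
    if k < 2:
        return 0.0
    closed = sum(1 for u, v in combinations(friends, 2) if v in graph[u])
    return closed / (k * (k - 1) / 2)


graph = build_graph(EDGES)
clustering = {node: local_clustering(graph, node) for node in graph}
average_clustering = sum(clustering.values()) / len(clustering)

for node in sorted(clustering):
    print(node, round(clustering[node], 2))
print("Average:", round(average_clustering, 2))
```

If B and D or C and D became friends, A's coefficient would rise, which is the numerical footprint of closure making a cluster more cohesive.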
52 Analyze the significance of network visualization for practical uses. Give CO3 5
examples to illustrate your points.
Network Visualization is a powerful tool for understanding and analyzing the
structure of complex relationships in a network. It converts abstract data into a
visual format, making it easier to identify patterns, key nodes, and the flow of
information or influence.
Significance of Network Visualization:
1. Identifying Key Nodes and Influencers:
o Example: In a social media network, network visualization can
highlight users with the most connections or influence (e.g., high-
degree centrality), helping marketers identify influencers or
opinion leaders for campaigns.
2. Understanding Network Structure:
o Network visualization reveals clusters, communities, and patterns
in relationships. It helps detect sub-networks where individuals
or nodes are more tightly connected.
o Example: In a corporate setting, visualization can uncover
departments or teams that are highly interconnected and those
that might need improved collaboration.
3. Analyzing Information Flow:
o Visualizing how information flows through a network helps to
identify bottlenecks or key bridges in the flow of communication.
o Example: In an emergency response system, network
visualization can show which areas are most connected, helping
to identify where resources and information should be directed.
4. Detecting Vulnerabilities and Risks:
o By visualizing relationships, weak ties or isolated nodes can be
identified. This helps in understanding potential vulnerabilities in
a network’s structure.
o Example: In cybersecurity, a network visualization can highlight
weak links in an organization’s internal communication systems
that might be vulnerable to attacks.
5. Decision-Making and Strategy:
o Visualizations assist in making informed decisions about network
interventions, such as fostering connections between isolated
groups or reinforcing communication channels.
o Example: In a supply chain network, visualization can show
dependencies between suppliers and manufacturers, allowing for
better risk management and resource allocation.
Conclusion:
Network visualization simplifies complex data, making it accessible for strategic
decision-making, identifying influential nodes, understanding structure, and
improving overall network efficiency and resilience. It is a critical tool in fields
ranging from business to healthcare to cybersecurity.
53 Give examples to illustrate the many benefits of network visualization. CO3 5
Network Visualization offers several benefits across various fields, enhancing
the understanding and analysis of complex systems. Below are examples that
illustrate its practical benefits:
1. Enhanced Understanding of Relationships:
 Example: In a social network, visualizing connections between
individuals helps identify clusters, communities, and central influencers.
For example, a social media platform can use network visualization to
show groups of users with common interests, making it easier for
marketers to target specific audiences.
2. Identifying Key Influencers:
 Example: In marketing, network visualization helps identify
influencers in a brand’s network who have the most connections (high-
degree centrality) or the ability to spread information across the network
(betweenness centrality). This helps in selecting the right influencers for
campaigns.
3. Optimizing Resource Allocation:
 Example: In supply chain management, visualizing the network of
suppliers, distributors, and customers helps identify critical links and
potential bottlenecks. For instance, if one supplier has many connections
but is geographically distant, resources may be allocated differently to
avoid delays.
4. Detecting Vulnerabilities and Risks:
 Example: In cybersecurity, network visualization can map out the
relationships between devices, networks, and users, helping to identify
potential vulnerabilities. A node with a high number of connections
(central node) might be a target for attacks, and visualization helps
prioritize protection.
5. Improving Collaboration and Communication:
 Example: In a corporate setting, network visualization can identify
isolated departments or teams with fewer interconnections. By
visualizing communication paths, managers can foster better
collaboration by encouraging connections between departments that are
less connected.
Conclusion:
Network visualization simplifies the analysis of complex systems, making it
easier to identify patterns, detect risks, optimize resources, and improve
decision-making in fields such as marketing, supply chain, cybersecurity, and
corporate management.
54 Explain Balance Theory concisely yet thoroughly, with an emphasis on its CO3 5
underlying ideas and applications to social network dynamics.
Balance Theory is a psychological theory that focuses on the relationships and
stability between elements within a network, particularly in social networks. It
was developed by Fritz Heider in 1946 and is based on the idea that people
prefer harmony and consistency in their relationships.
Underlying Ideas:
 Triadic Relationships: The theory examines triads (sets of three
individuals or entities) and the balance or imbalance of their
relationships. A triad is considered balanced when its three relationships
are consistent, i.e., when the product of their signs is positive (all three
positive, or one positive and two negative).
 Positive and Negative Relations: The relationships between entities are
either positive (liking, agreement) or negative (disliking, disagreement).
A balanced triad occurs when there are either:
o Three positive relations (all like each other),
o Two negative and one positive relation (two people who like each
other both dislike a third person).
Triads with exactly one negative relation, or with three negative relations,
are unbalanced.
Applications to Social Network Dynamics:
1. Social Cohesion: Balanced networks lead to social stability and
harmony. For example, if two friends disagree about a third person (one
likes that person and the other dislikes them), there is tension, and this
could lead to a shift in relationships to restore balance.
2. Conflict Resolution: In group dynamics or organizations, imbalance in
relationships (e.g., conflicts between colleagues) can lead to changes in
opinions or alliances to restore balance. This is especially useful for
understanding social conflicts and how they may resolve.
3. Predicting Social Behavior: Balance theory helps predict how
relationships evolve within groups. For instance, when one individual in
a triad switches their opinion (positive to negative or vice versa), the
entire network may shift to restore balance, influencing group decisions
or collective behavior.
Example:
In a workplace, consider three colleagues: A, B, and C. If A likes both B and C,
but B and C dislike each other, the situation is unbalanced. According to balance
theory, this imbalance may lead to B and C eventually reconciling (all three ties
positive) or to A siding with one of them against the other (one positive and two
negative ties), either of which restores balance.
Conclusion:
Balance theory provides insight into the dynamics of social relationships and
offers a framework for understanding how individuals adjust their attitudes and
interactions to maintain harmony in their social networks. Its applications extend
to conflict resolution, predicting relationship changes, and understanding group
behavior in social, organizational, and political settings.
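The balance rule reduces to a sign product: a triad is balanced when the product of its three tie signs is positive. The sketch below applies that rule to the workplace example; the dictionary layout and function names are assumptions made for illustration.

```python
from itertools import combinations


def is_balanced(sign_ab, sign_bc, sign_ac):
    """A triad is balanced when the product of its three signs is positive."""
    return sign_ab * sign_bc * sign_ac > 0


def unbalanced_triads(signs):
    """List every fully connected triad whose sign product is negative.

    `signs` maps frozenset({u, v}) to +1 (like) or -1 (dislike).
    """
    nodes = sorted({node for pair in signs for node in pair})
    result = []
    for a, b, c in combinations(nodes, 3):
        ties = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        if all(tie in signs for tie in ties):
            if not is_balanced(*(signs[tie] for tie in ties)):
                result.append((a, b, c))
    return result


# Workplace example: A likes B and C, but B and C dislike each other.
workplace = {
    frozenset(("A", "B")): +1,
    frozenset(("A", "C")): +1,
    frozenset(("B", "C")): -1,
}
tense = unbalanced_triads(workplace)

# Resolution 1: B and C reconcile. Resolution 2: A sides with B against C.
reconciled = dict(workplace)
reconciled[frozenset(("B", "C"))] = +1
sided = dict(workplace)
sided[frozenset(("A", "C"))] = -1

print("Unbalanced before:", tense)
print("After reconciling:", unbalanced_triads(reconciled))
print("After taking sides:", unbalanced_triads(sided))
```

Under this strict form of the rule, a triad of three mutual dislikes is also unbalanced, because its sign product is negative.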
55 Explain the example: Triadic relationship between influencer marketing and CO3 5
brand perception.
The triadic relationship between influencer marketing and brand perception
involves three key elements: the brand, the influencer, and the audience
(consumers). These three elements interact to shape consumer attitudes toward
the brand, forming a triadic network where the relationship dynamics can
influence brand perception.
Explanation of the Triadic Relationship:
1. Brand and Influencer:
o The brand partners with an influencer to promote products or
services. The influencer uses their credibility, trust, and influence
over their followers to endorse the brand. If the influencer is
aligned with the values or image of the brand, it strengthens the
association between them.
o Example: A skincare brand partners with a well-known beauty
influencer to showcase its products in a positive light. The
influencer's trustworthiness and expertise in beauty will influence
the perception of the brand as reliable and high-quality.
2. Influencer and Audience:
o The audience (consumers) typically trusts the influencer's
opinions and looks to them for guidance. The relationship
between the influencer and the audience is based on trust and
relatability. The influencer's endorsement directly impacts how
the audience views the product or brand.
o Example: A follower who trusts the influencer's reviews may
develop a positive opinion about the brand simply due to the
influencer's recommendation.
3. Brand and Audience:
o The brand's perception is influenced by how the audience
perceives the influencer and their endorsement. If the influencer
is highly regarded and their opinion resonates with the audience,
the brand’s image is positively impacted.
o Example: If an influencer promotes a product that aligns with
their values (e.g., eco-friendly or ethical), their audience, who
shares similar values, may perceive the brand as more trustworthy
and responsible.
Triadic Balance:
 If the relationship between the brand and influencer is positive, and the
audience also has a positive perception of both, a balanced triad is
formed, reinforcing the brand's reputation.
 If there is tension (e.g., the audience disagrees with the influencer's
endorsement or the influencer’s image doesn’t match the brand’s values),
this can lead to imbalanced perceptions that may harm brand
reputation.
Example in Action:
 A fitness brand partners with a fitness influencer to promote their
workout gear. The influencer shares authentic experiences, showing how
the gear improves performance. The audience, who follows the
influencer for workout tips, views the brand as reliable and high-
performing, leading to a positive brand perception. If the audience
trusts the influencer and aligns with their values, they are more likely to
perceive the brand positively and make purchases.
Conclusion:
The triadic relationship between influencer marketing, brand, and audience is
dynamic and mutually reinforcing. It helps shape brand perception through the
influencer’s credibility and the audience’s trust, ultimately affecting the success
of marketing campaigns. The balance within this triad is crucial in creating a
positive and sustainable brand image.
56 Using diagrammatic examples, differentiate between the Louvain and Leiden CO3 5
approaches.
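The Louvain and Leiden algorithms are two popular community detection
methods in network science. Both aim to optimize modularity, but they differ in
how they refine and improve community structure, especially in terms of
stability and speed. Louvain alternates between moving individual nodes to
neighboring communities and aggregating each community into a single node,
which can leave communities poorly connected or even internally disconnected.
Leiden adds a refinement step between these two phases, which guarantees
connected communities and typically converges faster.
[Diagram: Louvain vs. Leiden comparison]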
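Both algorithms search for a partition with high modularity, so it helps to see what that score measures. The following is a minimal sketch of the modularity formula Q = Σ_c [ L_c / m − (d_c / 2m)² ] for a simple, unweighted, undirected graph, where L_c is the number of edges inside community c, d_c is the total degree of its nodes, and m is the number of edges. The example graph (two triangles joined by one edge) and the community labels are assumptions made for illustration; this is the objective only, not an implementation of Louvain or Leiden.

```python
# Hypothetical graph: two triangles joined by the single edge (2, 3).
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]


def modularity(edges, community_of):
    """Modularity of a partition of a simple, unweighted, undirected graph."""
    m = len(edges)
    internal_edges = {}  # L_c: edges with both endpoints in community c
    degree_sum = {}      # d_c: total degree of the nodes in community c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal_edges[cu] = internal_edges.get(cu, 0) + 1
    return sum(
        internal_edges.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


two_groups = {0: "left", 1: "left", 2: "left", 3: "right", 4: "right", 5: "right"}
one_group = {node: "all" for node in range(6)}

q_split = modularity(EDGES, two_groups)
q_merged = modularity(EDGES, one_group)
print(f"Two triangles as separate communities: Q = {q_split:.3f}")
print(f"Everything in one community:           Q = {q_merged:.3f}")
```

Splitting the graph at the bridge scores higher than lumping all nodes together, which is the kind of improvement both algorithms look for when they move nodes between communities.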

57 Solve the following using shingling and w=2 CO3 10


Document A:
"Artificial Intelligence is revolutionizing industries by automating processes,
enhancing decision-making, and improving efficiency across various domains."
Document B:
"Industries are being transformed by Artificial Intelligence through automation,
data-driven decision-making, and increased operational efficiency in multiple
sectors."
Shingling is a technique used to create a set of substrings (called shingles) of a
fixed length from a text document. In this case, we use shingling with
w = 2, meaning the shingles are pairs of consecutive words.
Step 1: Generate the shingles for Document A
Document A:
"Artificial Intelligence is revolutionizing industries by automating processes,
enhancing decision-making, and improving efficiency across various domains."
The shingles for Document A with w = 2 are:
 ("Artificial", "Intelligence")
 ("Intelligence", "is")
 ("is", "revolutionizing")
 ("revolutionizing", "industries")
 ("industries", "by")
 ("by", "automating")
 ("automating", "processes")
 ("processes", "enhancing")
 ("enhancing", "decision-making")
 ("decision-making", "and")
 ("and", "improving")
 ("improving", "efficiency")
 ("efficiency", "across")
 ("across", "various")
 ("various", "domains")
Step 2: Generate the shingles for Document B
Document B:
"Industries are being transformed by Artificial Intelligence through automation,
data-driven decision-making, and increased operational efficiency in multiple
sectors."
The shingles for Document B with w = 2 are:
 ("Industries", "are")
 ("are", "being")
 ("being", "transformed")
 ("transformed", "by")
 ("by", "Artificial")
 ("Artificial", "Intelligence")
 ("Intelligence", "through")
 ("through", "automation")
 ("automation", "data-driven")
 ("data-driven", "decision-making")
 ("decision-making", "and")
 ("and", "increased")
 ("increased", "operational")
 ("operational", "efficiency")
 ("efficiency", "in")
 ("in", "multiple")
 ("multiple", "sectors")
Step 3: Find the common shingles
Now, we find the shingles that are common between Document A and
Document B:
 ("Artificial", "Intelligence")
 ("decision-making", "and")
These are the only shingles that appear in both documents. Pairs such as
("by", "automating") or ("improving", "efficiency") occur in Document A only.
Step 4: Calculate the Jaccard similarity
Jaccard similarity = |A ∩ B| / |A ∪ B|
From the above, we know:
 Common shingles = 2
 Shingles in Document A = 15
 Shingles in Document B = 17
 Shingles in the union = 15 + 17 − 2 = 30
Jaccard similarity = 2 / 30 ≈ 0.07
Conclusion
The Jaccard similarity between Document A and Document B using 2-shingles
is approximately 0.07. The documents are paraphrases, so few word pairs survive
unchanged even though the meaning is close.
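The calculation can be checked mechanically. The sketch below is illustrative: it assumes lowercasing, keeps hyphenated words such as “decision-making” as single tokens, and repeats the comparison for w = 1, 2, and 3 to show how the shingle size affects the score. The function names are not from any library.

```python
import re

DOC_A = (
    "Artificial Intelligence is revolutionizing industries by automating "
    "processes, enhancing decision-making, and improving efficiency across "
    "various domains."
)
DOC_B = (
    "Industries are being transformed by Artificial Intelligence through "
    "automation, data-driven decision-making, and increased operational "
    "efficiency in multiple sectors."
)


def tokenize(text):
    """Lowercase the text; keep hyphenated words as single tokens."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def shingle_set(text, w):
    """Return the set of w-word shingles (as tuples) of the text."""
    tokens = tokenize(text)
    return {tuple(tokens[i:i + w]) for i in range(len(tokens) - w + 1)}


def jaccard(set_a, set_b):
    """Jaccard similarity: size of the intersection over size of the union."""
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


scores = {}
for w in (1, 2, 3):
    a, b = shingle_set(DOC_A, w), shingle_set(DOC_B, w)
    scores[w] = jaccard(a, b)
    print(f"w = {w}: {len(a & b)} common, Jaccard = {scores[w]:.2f}")

common_pairs = shingle_set(DOC_A, 2) & shingle_set(DOC_B, 2)
print(sorted(common_pairs))
```

Single words overlap far more than word pairs do, so a hand count that mixes up shared words with shared shingles will overstate the w = 2 similarity.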

58 Solve the following using shingling and w=2 CO3 10


Document A:
"Climate change is a pressing global issue, with rising temperatures, extreme
weather events, and environmental degradation posing significant risks to
ecosystems and human societies."
Document B:
"The global challenge of climate change is marked by increasing temperatures,
frequent extreme weather phenomena, and environmental deterioration,
threatening both natural ecosystems and human communities."
Let's solve this using shingling with w = 2 for the given documents.
Step 1: Generate the shingles for Document A
Document A:
"Climate change is a pressing global issue, with rising temperatures, extreme
weather events, and environmental degradation posing significant risks to
ecosystems and human societies."
The shingles for Document A with w = 2 are:
 ("Climate", "change")
 ("change", "is")
 ("is", "a")
 ("a", "pressing")
 ("pressing", "global")
 ("global", "issue")
 ("issue", "with")
 ("with", "rising")
 ("rising", "temperatures")
 ("temperatures", "extreme")
 ("extreme", "weather")
 ("weather", "events")
 ("events", "and")
 ("and", "environmental")
 ("environmental", "degradation")
 ("degradation", "posing")
 ("posing", "significant")
 ("significant", "risks")
 ("risks", "to")
 ("to", "ecosystems")
 ("ecosystems", "and")
 ("and", "human")
 ("human", "societies")
Step 2: Generate the shingles for Document B
Document B:
"The global challenge of climate change is marked by increasing temperatures,
frequent extreme weather phenomena, and environmental deterioration,
threatening both natural ecosystems and human communities."
The shingles for Document B with w = 2 are:
 ("The", "global")
 ("global", "challenge")
 ("challenge", "of")
 ("of", "climate")
 ("climate", "change")
 ("change", "is")
 ("is", "marked")
 ("marked", "by")
 ("by", "increasing")
 ("increasing", "temperatures")
 ("temperatures", "frequent")
 ("frequent", "extreme")
 ("extreme", "weather")
 ("weather", "phenomena")
 ("phenomena", "and")
 ("and", "environmental")
 ("environmental", "deterioration")
 ("deterioration", "threatening")
 ("threatening", "both")
 ("both", "natural")
 ("natural", "ecosystems")
 ("ecosystems", "and")
 ("and", "human")
 ("human", "communities")
Step 3: Find the common shingles
Matching is case-insensitive, so ("Climate", "change") in Document A matches
("climate", "change") in Document B. The shingles common to both documents are:
 ("climate", "change")
 ("change", "is")
 ("extreme", "weather")
 ("and", "environmental")
 ("ecosystems", "and")
 ("and", "human")
Step 4: Calculate the Jaccard similarity
From the above, we know:
 Common shingles = 6
 Shingles in Document A = 23
 Shingles in Document B = 24
Thus, the Jaccard similarity is:
\[ J(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{6}{23 + 24 - 6} = \frac{6}{41} \approx 0.15 \]
Conclusion
The Jaccard similarity between Document A and Document B using 2-shingles
is approximately 0.15.
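The four steps above can also be checked mechanically. The following is a minimal sketch, assuming words are lower-cased and punctuation is stripped before shingling (hyphenated words are kept whole); the helper names `shingles` and `jaccard` are illustrative and do not come from any library.

```python
import re

def shingles(text, w=2):
    """Return the set of w-word shingles of `text` (lower-cased, punctuation dropped)."""
    # Keep hyphenated words such as "data-driven" as single tokens.
    words = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}

def jaccard(a, b):
    """Jaccard similarity |a ∩ b| / |a ∪ b| of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

doc_a = ("Climate change is a pressing global issue, with rising temperatures, "
         "extreme weather events, and environmental degradation posing "
         "significant risks to ecosystems and human societies.")
doc_b = ("The global challenge of climate change is marked by increasing "
         "temperatures, frequent extreme weather phenomena, and environmental "
         "deterioration, threatening both natural ecosystems and human communities.")

sh_a, sh_b = shingles(doc_a), shingles(doc_b)
common = sh_a & sh_b
print(len(sh_a), len(sh_b), sorted(common))
print(round(jaccard(sh_a, sh_b), 2))
```

Working with sets means a shingle that repeats inside one document is counted once, which is what the Jaccard formula expects.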

59 Explain what network visualization is and why it is so important to modern data CO3 10
analysis. Provide examples of its real-world uses to demonstrate its practical
significance.
Network visualization is the graphical representation of relationships or
connections within a network, where entities (such as people, systems, or
organizations) are represented as nodes, and their interactions or relationships as
edges (lines connecting nodes). It helps to visually map complex data to identify
patterns, structures, and connections.
Importance in modern data analysis:
1. Reveals Patterns: Visualizing networks helps uncover hidden
relationships and dependencies that may not be immediately obvious in
raw data.
2. Improves Understanding: It simplifies complex data, making it easier
to interpret and analyze.
3. Identifies Key Nodes: It highlights influential or central nodes in the
network, crucial for decision-making.
4. Enhanced Decision-Making: By illustrating connections and flows,
network visualization aids in strategic planning and forecasting.
Real-world uses:
1. Social Media Analysis: Identifying influencers, communities, and trends
(e.g., Twitter’s network of users and hashtags).
2. Cybersecurity: Detecting vulnerabilities and potential threats by
mapping connections between devices or users.
3. Supply Chain Management: Optimizing processes by visualizing
supplier relationships and logistics networks.
4. Biological Networks: Understanding gene interactions or protein
networks in healthcare research.
Network visualization is essential for extracting actionable insights from
complex datasets and driving informed decisions.
60 Give a methodical explanation of each step in the data wrangling process. CO3 10
Data wrangling is the process of cleaning, structuring, and enriching raw data
into a desired format for better decision-making in data analysis. Here is a step-
by-step explanation of the process:
1. Data Collection:
Gather data from various sources such as databases, CSV files, APIs, or
web scraping. This is the first step where all available raw data is
collected for processing.
2. Data Discovery & Assessment:
Explore and understand the structure, content, and quality of the data.
Identify inconsistencies, missing values, or outliers that need to be
addressed.
3. Data Cleaning:
Fix errors such as missing values, duplicates, typos, and inconsistencies.
This may involve removing nulls, correcting formats, or standardizing
data.
4. Data Structuring:
Reformat or reshape data into a usable structure (e.g., transforming
unstructured data into rows and columns or normalizing tables for
relational databases).
5. Data Enrichment:
Enhance the dataset by merging with additional data sources to provide
more context or fill in gaps (e.g., adding demographic data to customer
records).
6. Data Validation:
Ensure the data is accurate, consistent, and reliable. This includes
checking data types, verifying ranges, and testing business rules.
7. Data Storage:
Save the cleaned and structured data in a suitable format or database for
easy access during analysis or modeling.
These steps ensure the data is high-quality, making it ready for effective
analysis, visualization, or machine learning.
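As an illustration of the cleaning, structuring, and validation steps, here is a small sketch on a toy list of customer records. The field names (`name`, `email`, `age`), the 0 to 120 age range, and the function `wrangle` are assumptions made for this example only.

```python
def wrangle(records):
    """Clean, structure and validate a list of raw customer dicts (toy example)."""
    cleaned, seen = [], set()
    for row in records:
        # Cleaning: trim whitespace and normalise case.
        name = (row.get("name") or "").strip().title()
        email = (row.get("email") or "").strip().lower()
        if not name or not email:
            continue  # drop rows with missing required fields
        if email in seen:
            continue  # drop duplicates (same e-mail address)
        seen.add(email)
        # Structuring: convert age from text to int, None when unparseable.
        try:
            age = int(str(row.get("age", "")).strip())
        except ValueError:
            age = None
        # Validation: ages outside a plausible range are treated as missing.
        if age is not None and not 0 < age < 120:
            age = None
        cleaned.append({"name": name, "email": email, "age": age})
    return cleaned

raw = [
    {"name": " alice ", "email": "Alice@Example.com", "age": "34"},
    {"name": "ALICE", "email": "alice@example.com ", "age": "34"},   # duplicate
    {"name": "Bob", "email": "", "age": "29"},                       # missing e-mail
    {"name": "carol", "email": "carol@example.com", "age": "abc"},   # bad age
]
result = wrangle(raw)
print(result)
```

In a real pipeline the same checks would usually be expressed with a data-frame library, but the order of operations stays the same.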
61 Explain, using an example, why Leiden community detection is superior to CO3 10
Louvain.
Leiden community detection is considered superior to Louvain because it
addresses key limitations of Louvain, especially regarding community quality,
stability, and connectedness.

Key Advantages of Leiden over Louvain:


1. Guaranteed Connected Communities:
Louvain can produce disconnected subgroups within a single community,
which is unrealistic in many real-world scenarios. Leiden guarantees that
all communities are internally connected.
2. Faster and More Stable:
Leiden often converges faster and gives more consistent results across
multiple runs, while Louvain can get stuck in local optima.
3. Better Optimization of Modularity/Quality Function:
Leiden refines partitions further, achieving higher modularity or quality
scores.

Example: Social Network Analysis


Imagine analyzing a social network of students in a university, where nodes
represent students and edges represent friendship.
 Louvain Output: May place students in the same "community" even though,
inside that community, some of them are linked only through students
assigned elsewhere, so the group is internally disconnected and misleading.
 Leiden Output: Adds a refinement step that splits such groups, so every
community is internally connected and cohesive, better reflecting real
friendship circles.

Conclusion:
Leiden improves on Louvain by ensuring connected, stable, and high-quality
communities. In practical tasks like detecting social groups, customer segments,
or biological clusters, this leads to more accurate and meaningful insights,
making Leiden the preferred choice in modern network analysis.
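The connectedness property can be checked on any partition, whichever algorithm produced it. The sketch below does not run Louvain or Leiden; it only tests whether each community is internally connected. The graph, the partition, and the function name `is_connected_within` are hypothetical.

```python
from collections import deque

def is_connected_within(adj, community):
    """True if the nodes in `community` form a connected induced subgraph."""
    community = set(community)
    if not community:
        return True
    start = next(iter(community))
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            # Only follow edges that stay inside the community.
            if nb in community and nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return seen == community

# Hypothetical friendship graph: s3 reaches s1 and s2 only through s4.
adj = {
    "s1": {"s2", "s4"},
    "s2": {"s1"},
    "s3": {"s4"},
    "s4": {"s1", "s3", "s5"},
    "s5": {"s4"},
}
partition = {"X": ["s1", "s2", "s3"], "Y": ["s4", "s5"]}
report = {name: is_connected_within(adj, members)
          for name, members in partition.items()}
print(report)
```

Community X is the kind of grouping Leiden is designed to rule out: s3 belongs to it but has no path to s1 or s2 that stays inside X.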

62 Explain the conceptual foundations of balance theory and provide a relevant case CO3 10
to highlight its applications.
Balance Theory is a social psychology concept introduced by Fritz Heider,
which explains how individuals strive for harmony in their relationships and
attitudes. The theory focuses on triadic relationships—involving three elements
(usually a person and two other entities)—and suggests that people prefer
balanced states where their likes and dislikes are logically consistent.

Conceptual Foundations:
 A triad includes Person (P), Other (O), and Object or Issue (X).
 A triad is balanced if the product of the signs of the relationships is
positive (e.g., P likes O, O likes X, and P likes X).
 An imbalanced state causes psychological discomfort, leading
individuals to adjust their attitudes to restore balance.
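The sign-product rule can be written down directly. In this sketch each relationship is encoded as +1 (likes) or -1 (dislikes); the encoding and the function name `is_balanced` are assumptions for illustration.

```python
from itertools import product

def is_balanced(p_o, o_x, p_x):
    """Each argument is +1 (likes) or -1 (dislikes); balanced iff the product is positive."""
    return p_o * o_x * p_x > 0

# Enumerate all eight sign patterns of a P-O-X triad and keep the balanced ones.
balanced_triads = [signs for signs in product((+1, -1), repeat=3)
                   if is_balanced(*signs)]
print(balanced_triads)
```

Exactly half of the eight possible triads are balanced: the all-positive one and the three with two negative relationships.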

Real-World Application:
Case: Brand Endorsement in Marketing
If a consumer (P) likes a celebrity (O), and the celebrity endorses a product (X),
then according to balance theory, the consumer is more likely to develop a
positive attitude toward the product to maintain psychological balance.
 If the consumer dislikes the product but likes the celebrity, the imbalance
may lead to either:
o Changing their opinion about the product (now liking it), or
o Reevaluating their opinion of the celebrity.

Conclusion:
Balance theory helps explain attitude change, peer influence, and marketing
strategies, making it a valuable tool in understanding and predicting human
behavior in social and commercial contexts.
63 A 20-person team is working on an interdisciplinary project. They are divided CO3 10
into four sub-teams (5 members each), but due to management rules:
Each person must have at least one tie within their sub-team.
Each person must have at least 3 external ties to members outside their team.
Some senior members act as bridges, forming additional long-range external
connections.
Construct this network. Compute E - I. Identify the top 3 most connected
individuals and their roles.
Let's break down and construct the network step-by-step according to the
requirements, compute the E–I index, and identify the top 3 most connected
individuals.

Step 1: Structure of the Team


 Total people: 20
 4 sub-teams: A, B, C, D, each with 5 members.
 Each person must:
o Have at least 1 internal (intra-team) tie.
o Have at least 3 external (inter-team) ties.
 Some senior members form additional external (long-range) ties.
Let’s denote members as:
 A1–A5
 B1–B5
 C1–C5
 D1–D5

Step 2: Create Internal Ties (Intra-team)


Each member must have at least one connection within their own sub-team.
For simplicity, let’s create a ring-like structure in each sub-team so that:
 A1 connects to A2
 A2 to A3
 A3 to A4
 A4 to A5
 A5 to A1
Each member now has 2 internal ties, satisfying the condition.
Total internal ties across all teams:
Each team's ring has 5 edges, and there are 4 teams → 4 × 5 = 20 internal edges.
(Check: 20 people × 2 internal ties each = 40 tie ends, and 40 / 2 = 20 edges.)

Step 3: Add External Ties (Inter-team)


Each person needs at least 3 external ties, so total required external links:
 20 people × 3 = 60 external ties
 Each link connects 2 people, so total number of external edges = 60 / 2
= 30
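We assign external connections so that member i of each team is tied to
member i of each of the three other teams.
For example:
 A1 connects to B1, C1, D1
 A2 connects to B2, C2, D2
 ... and so on up to A5.
Applying the same rule to B, C, and D (for example, B1 connects to A1, C1, D1)
gives every person exactly 3 external ties and 5 × 6 = 30 external edges.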

Step 4: Add Senior Members with Extra External Ties


Let’s designate 4 senior members: A1, B1, C1, D1.
Each senior member adds 2 additional long-range ties.
 A1 connects to B3, C4
 B1 connects to C5, D3
 C1 connects to D4, A4
 D1 connects to A5, B4
Adds 4 × 2 = 8 extra links.

Step 5: Total Edges Calculation


 Internal edges: 20
 Required external edges: 30
 Extra external edges: 8
 Total edges in the network = 20 + 30 + 8 = 58

Step 6: Compute E–I Index


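The E–I index is defined as:
\[ \text{E–I index} = \frac{E - I}{E + I} \]
Where:
 E = number of external edges = 30 + 8 = 38
 I = number of internal edges = 20
\[ \text{E–I index} = \frac{38 - 20}{38 + 20} = \frac{18}{58} \approx 0.31 \]
A positive E–I index means the network is more externally connected,
indicating good interdisciplinary interaction.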

Step 7: Identify Top 3 Most Connected Individuals


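Let’s count degrees (connections):
 Senior members A1, B1, C1, D1:
o 2 internal + 3 standard external + 2 extra external = 7
connections
 Members who receive a senior member's extra tie (A4, A5, B3, B4, C4, C5, D3, D4):
o 2 internal + 3 external + 1 extra external = 6 connections
 Remaining members:
o 2 internal + 3 external = 5 connections
The four senior members tie for the highest degree, so any three of them can
be reported as the top 3, for example:
1. A1 (7 connections) – Senior bridge
2. B1 (7 connections) – Senior bridge
3. C1 (7 connections) – Senior bridge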

Conclusion:
 E–I Index ≈ 0.31 → moderately externally connected
 Top 3 connected individuals: A1, B1, C1 (D1 ties with them at 7 connections)
 Roles: These are senior members acting as bridges, enhancing cross-
team collaboration.
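The whole construction can be reproduced in a few lines to confirm the counts. This is a sketch under one assumption: the "and so on" pattern in Step 3 is read as member i of each team being tied to member i of every other team. Variable names are illustrative.

```python
from itertools import combinations

teams = "ABCD"
members = [f"{t}{i}" for t in teams for i in range(1, 6)]
edges = set()

def add(u, v):
    # Undirected edge; a frozenset ignores direction and duplicates.
    edges.add(frozenset((u, v)))

# Step 2: ring inside each sub-team (A1-A2, ..., A5-A1).
for t in teams:
    for i in range(1, 6):
        add(f"{t}{i}", f"{t}{i % 5 + 1}")

# Step 3 (assumed pattern): member i is tied to member i of the other teams.
for i in range(1, 6):
    for t1, t2 in combinations(teams, 2):
        add(f"{t1}{i}", f"{t2}{i}")

# Step 4: extra long-range ties of the four senior members.
for u, v in [("A1", "B3"), ("A1", "C4"), ("B1", "C5"), ("B1", "D3"),
             ("C1", "D4"), ("C1", "A4"), ("D1", "A5"), ("D1", "B4")]:
    add(u, v)

# An edge is internal when both endpoints share the same team letter.
internal = sum(1 for e in edges if len({n[0] for n in e}) == 1)
external = len(edges) - internal
ei_index = (external - internal) / (external + internal)

degree = {m: sum(1 for e in edges if m in e) for m in members}
ranking = sorted(members, key=lambda m: (-degree[m], m))
print(internal, external, round(ei_index, 2), ranking[:4])
```

Sorting by degree and then by name makes the tie among the four senior members explicit instead of hiding it behind an arbitrary top 3.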
64 Analyze the structural role of cliques and bridges within graph networks by CO4 2
providing real-world scenarios where they significantly impact connectivity and
influence.
Cliques are tightly-knit groups where every member is directly connected to
every other. Bridges are connections or nodes that link separate groups or
communities.
 Real-world example of cliques: In a corporate team, a clique of
developers can foster fast collaboration but may resist external input,
limiting innovation.
 Real-world example of bridges: A project manager connecting the
marketing and engineering departments acts as a bridge, enabling
information flow and cross-functional coordination.
Impact: Cliques strengthen internal cohesion, while bridges enhance overall
network connectivity and influence by linking diverse groups.
65 Evaluate the effectiveness of cliques in diverse fields, such as social media, CO4 2
biology, and cybersecurity, illustrating their impact with domain-specific
applications.
Cliques are highly interconnected subgroups within networks, and their
effectiveness varies across domains:
 Social Media: Cliques represent close friend groups or interest
communities. They enhance engagement but may lead to echo chambers,
limiting exposure to diverse viewpoints.
 Biology: In protein interaction networks, cliques often represent protein
complexes, helping identify functional units crucial for understanding
cellular processes.
 Cybersecurity: Cliques in network traffic may indicate coordinated
botnet behavior. Detecting such patterns helps in identifying and
neutralizing threats.
Conclusion: Cliques enhance internal efficiency and cohesion but can either
support or hinder broader system goals, depending on the context.
66 Provide a comprehensive explanation of graph partitioning, and its types CO4 2
employed for segmenting complex networks.
Graph partitioning is the process of dividing a graph into smaller, meaningful
subgraphs or clusters, such that nodes within each partition are more densely
connected to each other than to nodes in other partitions. It simplifies analysis,
enhances computation efficiency, and reveals hidden structures in complex
networks.
Types of Graph Partitioning:
1. Community Detection:
Identifies groups (communities) where nodes are densely connected
internally (e.g., social groups on Facebook).
2. Spectral Partitioning:
Uses eigenvectors of the graph's Laplacian matrix to divide the graph
while minimizing edge cuts between partitions.
3. Modularity-Based Partitioning:
Optimizes a modularity score to detect strong community structures (e.g.,
Louvain or Leiden methods).
4. Minimum Cut Partitioning:
Divides the graph by cutting the fewest edges possible, often used in
parallel computing and circuit design.
Conclusion: Graph partitioning is essential for uncovering structure, optimizing
performance, and enabling scalable analysis of large and complex networks.
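To make the modularity score used by modularity-based partitioning concrete, the sketch below evaluates it for a given partition of an undirected, unweighted graph, using Q = Σ_c [ L_c / m − (d_c / 2m)² ], where L_c is the number of edges inside community c, d_c the total degree of its nodes, and m the number of edges. It only scores a partition; it does not search for one. The function name `modularity` and the example graph are illustrative.

```python
def modularity(edges, community_of):
    """Newman modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    inside = {}   # community -> number of edges with both ends inside it
    degree = {}   # community -> sum of degrees of its nodes
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            inside[cu] = inside.get(cu, 0) + 1
    return sum(inside.get(c, 0) / m - (degree[c] / (2 * m)) ** 2 for c in degree)

# Two triangles joined by a single edge (3-4).
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
two_groups = {1: "a", 2: "a", 3: "a", 4: "b", 5: "b", 6: "b"}
one_group = {n: "all" for n in range(1, 7)}
print(modularity(edges, two_groups), modularity(edges, one_group))
```

Splitting the graph at the joining edge scores higher than keeping everything in one group, which is the behaviour a modularity optimizer exploits.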
67 Compare and contrast various graph kernels and their suitability for different CO4 2
network analysis tasks.
Graph kernels are functions that measure similarity between graphs and are
widely used in tasks like classification, clustering, and link prediction. Here's a
comparison of common graph kernels:

1. Weisfeiler-Lehman (WL) Kernel


 Strengths: Captures hierarchical structure through iterative node
labeling.
 Use Case: Effective for graph classification (e.g., chemical compound
similarity).
 Limitations: Cannot distinguish some non-isomorphic graphs, such as
regular graphs with the same size and degree.

2. Shortest-Path Kernel
 Strengths: Compares all shortest paths between nodes across graphs.
 Use Case: Suitable for structural similarity analysis.
 Limitations: Computationally intensive on large graphs.

3. Graphlet Kernel
 Strengths: Measures similarity based on small subgraph patterns
(motifs).
 Use Case: Works well in biological networks (e.g., protein interactions).
 Limitations: High complexity for large graphs.

4. Random Walk Kernel


 Strengths: Compares sequences of nodes by simulating walks.
 Use Case: Useful in link prediction and graph matching.
 Limitations: Prone to tottering (redundant walks) and expensive for
dense graphs.

Conclusion:
 WL Kernel is fast and scalable for structured graphs.
 Graphlet and Random Walk Kernels excel in biological or richly
connected networks.
 Shortest-Path Kernel is ideal when exact structural comparison is
critical.
Choosing a kernel depends on graph size, task, and the nature of
structural patterns to be captured.
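As a concrete instance of one of these kernels, here is a minimal sketch of a shortest-path kernel for unlabeled graphs. It assumes the simplest variant: two paths "match" when they have the same length, so the kernel is the dot product of the two path-length histograms. The function names are illustrative.

```python
from collections import Counter, deque

def shortest_path_histogram(adj):
    """Count unordered node pairs by shortest-path length (BFS from every node)."""
    hist = Counter()
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        for d in dist.values():
            if d > 0:
                hist[d] += 1
    # Every unordered pair was seen from both ends, so halve the counts.
    return Counter({d: c // 2 for d, c in hist.items()})

def sp_kernel(adj_g, adj_h):
    """Unnormalised shortest-path kernel: dot product of length histograms."""
    hg, hh = shortest_path_histogram(adj_g), shortest_path_histogram(adj_h)
    return sum(hg[d] * hh[d] for d in hg)

path = {0: {1}, 1: {0, 2}, 2: {1}}            # 0 - 1 - 2
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}  # complete graph on 3 nodes
print(sp_kernel(path, path), sp_kernel(path, triangle), sp_kernel(triangle, triangle))
```

The all-pairs BFS is what makes this kernel costly on large graphs, in line with the limitation noted above.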
68 Differentiate types of graph partitioning techniques based on computational CO4 2
complexity and efficiency in handling real-world networks.
Graph partitioning techniques vary in computational complexity and
efficiency, especially when applied to large, real-world networks:

1. Spectral Partitioning
 Complexity: High (involves eigenvalue decomposition, typically O(n^3))
 Efficiency: Accurate but computationally expensive for large graphs
 Use Case: Suitable for medium-sized networks where precision matters

2. Modularity-Based Partitioning (e.g., Louvain, Leiden)


 Complexity: Louvain runs in roughly O(n log n) time in practice; Leiden is
typically faster and more scalable
 Efficiency: Highly efficient and scalable for large, real-world networks
 Use Case: Ideal for community detection in social or biological networks

3. Minimum Cut / Kernighan-Lin Algorithm


 Complexity: O(n^2 log n)
 Efficiency: Effective for small graphs but not scalable to large ones
 Use Case: Circuit layout and load balancing in parallel computing

Conclusion:
 Louvain and Leiden are preferred for large-scale networks due to their
speed and quality.
 Spectral and Min-Cut methods offer higher precision but are limited by
computational cost.
69 Discuss the importance of social networks in modern society by analyzing their CO4 2
influence on information dissemination, relationships, and business strategies.
Social networks play a crucial role in modern society by transforming how
people communicate, connect, and conduct business.
 Information Dissemination: Platforms like Twitter and Facebook allow
rapid sharing of news, ideas, and trends, enabling real-time updates and
viral content spread.
 Relationships: Social networks foster personal and professional
connections across geographical boundaries, strengthening communities
and enabling support networks.
 Business Strategies: Companies leverage social media for targeted
marketing, customer engagement, and brand building. Influencer
marketing and data-driven insights help businesses reach specific
audiences effectively.
Conclusion: Social networks significantly shape communication, relationships,
and commerce, making them indispensable to social and economic dynamics
today.
70 Discuss social networks as graphs, demonstrating how nodes and edges represent CO4 2
relationships, with a relevant example.
Social networks as graphs use nodes to represent individuals (or entities) and
edges to represent relationships or interactions between them.
Structure:
 Nodes (Vertices): Represent people, accounts, or organizations.
 Edges (Links): Represent connections such as friendships, follows, or
message exchanges.
Example:
In a Facebook network:
 Nodes: Users A, B, and C
 Edges: A is friends with B and C; B is friends with C
This forms a triangle graph where each node is connected to the others,
representing a tightly-knit group or clique.
Conclusion: Modeling social networks as graphs helps in analyzing influence,
community formation, and information flow efficiently.
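The three-user example can be stored as an adjacency structure, which is one common way to hold a social graph in code. The dictionary layout and the helper `is_clique` are choices made for this sketch.

```python
from itertools import combinations

# Each user maps to the set of users they are friends with (undirected ties).
friends = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"}}

def is_clique(adj, nodes):
    """True if every pair of the given nodes is directly connected."""
    return all(v in adj[u] for u, v in combinations(nodes, 2))

degrees = {user: len(neighbours) for user, neighbours in friends.items()}
print(degrees, is_clique(friends, ["A", "B", "C"]))
```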
71 Provide a comprehensive explanation of graph kernels, elaborating on their role CO4 5
in measuring similarity between graphs. Illustrate the concept with relevant
examples from real-world applications.
Graph kernels are mathematical functions that measure the similarity between
graphs by comparing their structures, labels, or subcomponents. They enable the
use of machine learning algorithms (like SVMs) on graph-structured data by
transforming graphs into feature spaces where similarity can be computed
efficiently.

Role of Graph Kernels:


 Convert complex graphs into numerical representations.
 Allow comparison of graphs without needing exact matching.
 Capture both local (e.g., node neighborhoods) and global (e.g., paths or
subgraphs) structural information.
 Enable tasks like graph classification, clustering, and regression in
domains where data is naturally represented as graphs.

Types & Examples:


1. Weisfeiler-Lehman (WL) Kernel
o Compares graphs by iteratively relabeling nodes based on their
neighbors.
o Use case: Classifying molecular structures in chemoinformatics,
where molecules are graphs of atoms (nodes) and bonds (edges).
2. Shortest-Path Kernel
o Compares graphs by counting matching shortest paths between
node pairs.
o Use case: In transportation networks, it helps compare city
maps or metro systems.
3. Graphlet Kernel
o Measures similarity based on frequency of small subgraphs
(motifs).
o Use case: Detecting protein-protein interaction patterns in
bioinformatics.
4. Random Walk Kernel
o Compares sequences of nodes by simulating random walks over
the graphs.
o Use case: Identifying fraud patterns in financial transaction
networks.

Conclusion:
Graph kernels are powerful tools that bridge graph theory and machine learning.
By quantifying structural similarity, they support pattern recognition in complex
networks across chemistry, biology, social science, and cybersecurity.
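The relabeling idea behind the Weisfeiler-Lehman kernel can be shown on two tiny labeled graphs. This is a simplified sketch: labels are kept as readable strings instead of being compressed to integers, and the atom labels and helper names are illustrative.

```python
from collections import Counter

def wl_features(adj, labels, iterations=1):
    """Counts of node labels collected over `iterations` rounds of WL relabelling."""
    current = dict(labels)
    features = Counter(current.values())
    for _ in range(iterations):
        # New label = own label plus the sorted multiset of neighbour labels.
        current = {
            node: current[node] + "(" + ",".join(sorted(current[nb] for nb in adj[node])) + ")"
            for node in adj
        }
        features.update(current.values())
    return features

def wl_kernel(graph_g, graph_h, iterations=1):
    """Dot product of the two WL label-count vectors."""
    fg = wl_features(*graph_g, iterations=iterations)
    fh = wl_features(*graph_h, iterations=iterations)
    return sum(fg[label] * fh[label] for label in fg)

# Two three-atom chains with the same atoms in a different order.
chain_cco = ({1: {2}, 2: {1, 3}, 3: {2}}, {1: "C", 2: "C", 3: "O"})  # C - C - O
chain_coc = ({1: {2}, 2: {1, 3}, 3: {2}}, {1: "C", 2: "O", 3: "C"})  # C - O - C
print(wl_kernel(chain_cco, chain_cco),
      wl_kernel(chain_cco, chain_coc),
      wl_kernel(chain_coc, chain_coc))
```

Before relabeling the two chains look identical (two C, one O); after one round their neighbourhood labels differ, which is why the cross-graph score is lower than either self-similarity.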
72 Assess the role of a social media analyst from both positive and negative CO4 5
perspectives, substantiating the discussion with real-world examples.
A Social Media Analyst plays a critical role in shaping digital strategies by
monitoring, interpreting, and optimizing social media data for businesses,
governments, and organizations.

Positive Perspectives:
1. Data-Driven Insights:
Analysts track engagement, reach, and sentiment to guide marketing
decisions.
Example: During the 2022 FIFA World Cup, brands used social media
analysts to track fan engagement and optimize live ad campaigns in real-
time.
2. Crisis Management:
Analysts identify negative trends early, helping manage reputational
risks.
Example: Airlines like Delta use social media monitoring to respond
quickly to customer complaints or delays.
3. Audience Understanding:
They help tailor content strategies by analyzing demographics and
behavior patterns.
Example: Netflix uses social media analysts to understand audience
reactions to new shows and adjust promotion strategies accordingly.

Negative Perspectives:
1. Privacy Concerns:
Deep analysis of user data can lead to surveillance or manipulation
concerns.
Example: The Cambridge Analytica scandal highlighted how social
media data can be misused for political influence.
2. Data Misinterpretation:
Relying solely on metrics like likes or shares can lead to misleading
conclusions.
Example: A viral post might be controversial rather than positively
engaging, skewing brand sentiment analysis.

Conclusion:
While social media analysts add value through insights and strategy
optimization, ethical handling of data and contextual analysis are essential to
avoid potential misuse or misinterpretation.
73 Examine the role of social networks in crisis communication, analyzing their CO4 5
effectiveness in disseminating real-time information while addressing risks
related to misinformation and public panic.
Social networks play a pivotal role in crisis communication by enabling the
rapid dissemination of real-time information to large audiences. Their
immediacy and reach make them powerful tools during emergencies such as
natural disasters, pandemics, or terrorist attacks.

Effectiveness in Real-Time Information Sharing:


1. Speed and Reach:
Platforms like Twitter and Facebook allow authorities and news agencies
to quickly update the public.
Example: During the COVID-19 pandemic, WHO and health ministries
used social media to share updates, guidelines, and preventive measures
globally.
2. Direct Engagement:
Officials and organizations can address concerns and rumors instantly.
Example: During hurricanes in the U.S., FEMA used social media to
issue evacuation orders and safety tips in real time.

Risks and Challenges:


1. Misinformation Spread:
Unverified content can go viral, misleading the public.
Example: False cures and conspiracy theories during COVID-19 spread
widely on platforms like WhatsApp and YouTube, undermining public
health efforts.
2. Public Panic and Overreaction:
Sensational or emotionally charged posts can incite fear.
Example: In India, false news about fuel shortages during lockdowns led
to hoarding and panic buying.

Conclusion:
Social networks are invaluable for crisis communication due to their speed and
accessibility, but their effectiveness depends on responsible use, fact-checking
mechanisms, and coordinated communication to prevent misinformation and
manage public response.
74 Analyze the impact of viral content on social media engagement, assessing its CO4 5
potential for brand promotion as well as the risks associated with misinformation
and sensationalism.
Viral content on social media significantly impacts engagement, offering both
opportunities for brand promotion and risks related to misinformation and
sensationalism.

Positive Impact on Brand Promotion:


1. Increased Visibility:
Viral content can boost a brand’s reach, making it visible to millions in a
short time.
Example: The "Share a Coke" campaign by Coca-Cola went viral,
significantly increasing customer engagement and brand awareness
globally.
2. Enhanced Engagement:
Viral posts encourage interactions like shares, likes, and comments,
creating a buzz around the brand.
Example: A humorous or emotional ad can generate conversation,
leading to higher interaction rates, driving both engagement and sales.
3. Cost-Effective Marketing:
Brands can benefit from organic exposure, saving on advertising costs.
Example: The "Dove Real Beauty Sketches" campaign went viral with
minimal paid promotion, generating millions of views and widespread
recognition.

Risks of Misinformation and Sensationalism:


1. Spread of False Information:
Viral content can often include misleading claims or false narratives,
damaging credibility.
Example: During the 2020 U.S. elections, viral misinformation about
voting fraud circulated, influencing public opinion negatively.
2. Sensationalism and Manipulation:
Content may be sensationalized to attract attention, compromising
authenticity.
Example: In cases of viral news events, sensational headlines can create
unnecessary panic or spread rumors, as seen with viral claims about
health risks during COVID-19.
3. Brand Reputation Risks:
Brands associated with controversial or false viral content may face
backlash.
Example: If a viral campaign inadvertently supports a divisive topic or
controversy, it may harm the brand's public image.

Conclusion:
While viral content can significantly enhance brand visibility and engagement,
it also comes with the risk of misinformation and sensationalism, making it
crucial for brands to maintain control, verify facts, and approach virality
responsibly to mitigate potential damage.
75 Design a workflow for applying graph mining techniques to a dataset of social CO4 5
media interactions to uncover hidden communities.
Workflow for Applying Graph Mining Techniques to Uncover Hidden
Communities in Social Media Interactions
1. Data Collection and Preprocessing:
o Extract data: Gather social media interaction data such as user
posts, comments, likes, shares, and follows.
o Construct the graph: Represent the social media network as a
graph where:
 Nodes represent users.
 Edges represent interactions (e.g., follows, likes,
comments).
o Data cleaning: Remove noise such as bots, incomplete
interactions, or irrelevant posts.
2. Graph Representation and Transformation:
o Adjacency matrix: Convert the interaction data into a sparse
adjacency matrix or edge list.
o Edge weighting: Optionally, assign weights to edges based on
interaction frequency, sentiment, or engagement level (e.g.,
heavier weight for direct messages or shares).
3. Community Detection Algorithms:
o Apply community detection techniques:
 Louvain or Leiden method: For modularity-based
community detection, optimizing the partitioning of the
graph to maximize modularity, i.e., edge density within
communities relative to a random baseline.
 Spectral clustering: Use eigenvectors of the graph’s
Laplacian matrix to partition the graph.
 Label propagation: A fast, scalable algorithm to detect
communities based on label diffusion.
o Parameter tuning: Adjust parameters such as modularity
resolution or clustering thresholds to improve the accuracy of
community detection.
4. Analysis and Evaluation:
o Evaluate community quality: Measure the quality of detected
communities using metrics like modularity, conductance, or
silhouette scores.
o Visualize communities: Use network visualization tools (e.g.,
Gephi, NetworkX) to visually inspect community structures and
their interaction patterns.
5. Interpretation and Application:
o Profile communities: Analyze the demographic, behavioral, or
thematic characteristics of users in each community.
o Insights for engagement: Use the detected communities for
targeted marketing, content recommendations, or influencer
identification.
o Monitor trends: Track community evolution over time to
identify emerging topics or shifts in user behavior.

Conclusion: This workflow allows for the identification of hidden communities
in social media networks, providing valuable insights for personalized content,
marketing strategies, and trend analysis.
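Steps 1 and 2 of the workflow (cleaning, graph construction, edge weighting) can be sketched as follows. The interaction types, their weights, and the function name `build_weighted_graph` are hypothetical and would need tuning for a real dataset; the community detection step itself is not shown.

```python
from collections import defaultdict

# Hypothetical weights per interaction type; tune these for a real dataset.
WEIGHTS = {"like": 1.0, "comment": 2.0, "share": 3.0}

def build_weighted_graph(interactions, bots=()):
    """Aggregate (source, target, kind) records into undirected weighted edges."""
    weights = defaultdict(float)
    for source, target, kind in interactions:
        if source == target or source in bots or target in bots:
            continue  # cleaning: skip self-interactions and known bot accounts
        if kind not in WEIGHTS:
            continue  # cleaning: skip interaction types we do not model
        weights[frozenset((source, target))] += WEIGHTS[kind]
    return dict(weights)

interactions = [
    ("ann", "bob", "like"), ("bob", "ann", "comment"), ("ann", "cat", "share"),
    ("spam1", "ann", "like"), ("cat", "cat", "like"), ("bob", "cat", "view"),
]
graph = build_weighted_graph(interactions, bots={"spam1"})
print(graph)
```

The resulting edge-weight dictionary is the kind of input that a modularity-based or label-propagation routine would then consume.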
76 Explain the role of components and bridges in a network, providing insights into CO4 5
their effects on graph connectivity with practical illustrations.
Role of Components and Bridges in a Network
1. Components:
o Definition: A component (connected component) is a maximal
subgraph in which every pair of nodes is joined by a path, with no
edges to nodes outside it. Any undirected graph can be decomposed
into one or more such mutually disconnected components.
o Effect on Connectivity: The number of components indicates the
degree of fragmentation in a network. Fewer components
typically mean higher overall connectivity.
Example:
o In a social network, each connected group of friends or followers
is a component. If two distinct groups (components) are isolated
from each other, a bridge or edge between them would connect
them into one component.

2. Bridges:
o Definition: A bridge (or cut-edge) is an edge whose removal
would increase the number of components in the graph, i.e., it
disconnects the graph.
o Effect on Connectivity: Bridges are critical in maintaining
connectivity. The removal of a bridge can split a connected
network into two disconnected components, making the network
more vulnerable to fragmentation.
Example:
o In a communication network (like a telephone or internet
network), a bridge can represent a vital link between two regions.
If the bridge fails (e.g., a cable is cut), the network is split into
two disconnected parts, disrupting communication between those
areas.

Practical Illustration:
 Internet Backbone: A node represents a data center, and edges represent
high-speed connections. A bridge between two data centers ensures that
they are part of the same global network. If this bridge fails, it can
disrupt the entire flow of data between regions, dividing the internet into
smaller isolated networks.
 Social Networks: If a group of friends is highly connected internally
but communicates with another group only through a single member, that
member plays the bridging role (strictly, a cut vertex, the node
counterpart of a bridge edge); losing this member would cut the two
groups off from each other.

Conclusion:
 Components indicate the overall fragmentation of a network, while
bridges are vital in maintaining the connectivity between different parts
of the network. Both are critical for understanding and optimizing
network structure and resilience.
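The definitions above can be checked directly on a small graph. The following is a minimal sketch, assuming an undirected graph given as a node list and an edge list; the function names and the six-node example are hypothetical. Bridges are found by brute force (remove each edge and recount components), which mirrors the definition but is slower than the linear-time depth-first approach used in practice.

```python
from collections import deque

def connected_components(nodes, edges):
    """Return the components of an undirected graph as a list of sets."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        comp, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    comp.add(w)
                    queue.append(w)
        components.append(comp)
    return components

def bridges(nodes, edges):
    """Edges whose removal increases the number of components (brute force)."""
    base = len(connected_components(nodes, edges))
    found = []
    for i, e in enumerate(edges):
        rest = edges[:i] + edges[i + 1:]
        if len(connected_components(nodes, rest)) > base:
            found.append(e)
    return found

# Hypothetical example: two friend triangles joined by a single tie (2, 3).
nodes = [0, 1, 2, 3, 4, 5]
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
print(len(connected_components(nodes, edges)))  # 1
print(bridges(nodes, edges))                    # [(2, 3)]
```

Removing the tie (2, 3) leaves two components, one per triangle, which is the fragmentation effect described above.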
77 Discuss the utilization of graph partitioning algorithms as a method for CO4 5
segmenting large-scale networks into smaller, computationally efficient
substructures, enhancing their manageability and analytical processing.
Graph partitioning algorithms are essential tools for dividing large-scale
networks into smaller, more manageable subgraphs, improving computational
efficiency and enabling more effective analysis.
Role of Graph Partitioning:
 Segmentation of Complex Networks: Large networks, like social media
platforms or biological networks, often consist of millions of nodes and
edges, making direct analysis computationally prohibitive. Partitioning
breaks the graph into smaller subgraphs, making them easier to handle.
 Improved Scalability: By partitioning a graph, we can process each
subgraph independently in parallel, significantly speeding up
computations, especially in large-scale data analytics and machine
learning tasks.
Types of Graph Partitioning Algorithms:
1. Spectral Partitioning:
o Method: Uses the eigenvalues and eigenvectors of the graph’s
Laplacian matrix to find optimal partitions.
o Effect: Helps to balance partition sizes while minimizing the
number of edges between them, ensuring efficient
communication between subgraphs.
o Use Case: Suitable for clustering in social network analysis and
community detection.
2. Modularity-Based Partitioning (e.g., Louvain, Leiden):
o Method: Maximizes modularity, a measure that quantifies the
density of edges within partitions relative to random graphs.
o Effect: Produces high-quality partitions by ensuring that most
connections remain within subgraphs.
o Use Case: Community detection in social networks or
biological networks (e.g., protein interactions).
3. Kernighan-Lin Algorithm (Min-Cut):
o Method: Minimizes the number of edges between subgraphs (cut
edges), aiming for balanced partitions.
o Effect: Produces balanced partitions but can be computationally
expensive for large graphs.
o Use Case: Used in parallel computing and circuit design.
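As an illustration of the spectral method, here is a minimal sketch that splits a graph in two by the sign of the Fiedler vector (the Laplacian eigenvector of the second-smallest eigenvalue). It is an assumption-laden toy: it uses plain power iteration on a shifted Laplacian instead of a numerical library, assumes a connected graph with nodes numbered 0..n-1, and all function names are hypothetical.

```python
def fiedler_vector(n, edges, iterations=500):
    """Approximate the Laplacian eigenvector of the second-smallest eigenvalue."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    deg = [len(a) for a in adj]
    shift = 2 * max(deg)  # upper bound on the largest Laplacian eigenvalue
    x = [i - (n - 1) / 2 for i in range(n)]  # deterministic, mean-zero start
    for _ in range(iterations):
        # y = (shift*I - L) x, where (L x)_i = deg_i * x_i - sum of neighbours' x
        y = [shift * x[i] - (deg[i] * x[i] - sum(x[j] for j in adj[i]))
             for i in range(n)]
        mean = sum(y) / n  # project out the constant (eigenvalue 0) direction
        y = [val - mean for val in y]
        norm = sum(val * val for val in y) ** 0.5
        x = [val / norm for val in y]
    return x

def spectral_bisect(n, edges):
    """Split nodes by the sign of their Fiedler-vector entry."""
    x = fiedler_vector(n, edges)
    part_a = {i for i in range(n) if x[i] < 0}
    part_b = set(range(n)) - part_a
    return part_a, part_b

# Hypothetical example: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
part_a, part_b = spectral_bisect(6, edges)
print(part_a, part_b)
```

Splitting at zero does not force equal sizes; splitting at the median entry instead is a common way to obtain balanced parts.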
Benefits:
 Enhanced Computational Efficiency: Partitioned networks allow for
parallel processing and reduce the computational load per subgraph.
 Improved Analytical Processing: Smaller subgraphs enable more
focused analysis, such as detecting communities or clusters and
identifying key nodes or patterns.
 Scalability: Graph partitioning enables the analysis of massive
networks that would otherwise be too large to process effectively.
Conclusion:
Graph partitioning algorithms enhance the manageability and efficiency of
analyzing large networks by breaking them into smaller, more tractable
substructures, thereby facilitating better performance, scalability, and insightful
data analysis.
78 Compare different graph partitioning algorithms, evaluating their advantages, CO4 10
limitations, and effectiveness in segmenting large-scale networks
Comparison of Graph Partitioning Algorithms
Graph partitioning algorithms are essential for dividing large-scale networks into
smaller, more manageable subgraphs. Each algorithm has specific advantages,
limitations, and suitability based on the network characteristics and analysis
requirements. Below is a comparison of popular graph partitioning algorithms.

1. Spectral Partitioning
Overview:
Spectral partitioning involves using the eigenvalues and eigenvectors of the
graph’s Laplacian matrix to find an optimal partition that minimizes the edge
cuts between subgraphs.
Advantages:
 Theoretical foundation: Based on spectral graph theory; the Fiedler
vector solves a relaxed form of the minimum-cut problem, giving a
principled approximation rather than a guaranteed optimum.
 Quality of partitions: Tends to produce balanced and meaningful
partitions that minimize inter-partition edges.
 Effective for community detection: Excellent at detecting clusters or
communities within the graph.
Limitations:
 Computational complexity: Eigenvalue decomposition is
computationally expensive, especially for large-scale graphs (e.g.,
$O(n^3)$).
 Scalability: Not suitable for very large graphs unless approximations
(e.g., Lanczos method) are used.
Use Case:
Used in applications like community detection in social networks and graph
clustering.

2. Louvain Method (Modularity-based)


Overview:
The Louvain algorithm is a heuristic method that maximizes modularity to find
partitions that have more edges within the subgraphs than expected in random
graphs.
Advantages:
 Fast and scalable: Works efficiently on large networks, often producing
high-quality results.
 Hierarchical structure: It can create multi-level partitions, useful for
hierarchical analysis.
 Widely used: Popular in social network analysis, biological networks,
and recommendation systems.
Limitations:
 Resolution limit: Louvain can struggle to detect small communities in
large networks, often merging smaller communities into larger ones.
 Heuristic nature: It may not always produce the globally optimal
solution, as it is a greedy algorithm.
Use Case:
Used extensively in social network analysis, biological network partitioning,
and community detection.

3. Leiden Algorithm (Modularity-based)


Overview:
An improvement over Louvain, the Leiden algorithm adds a refinement phase to
the modularity optimization process, so the communities it returns are
guaranteed to be connected; Louvain can produce badly connected or even
disconnected communities.
Advantages:
 Improved accuracy: The refinement step can split communities that
Louvain would leave merged, giving finer, well-connected community
structures. The resolution limit of modularity itself still applies.
 Scalable and efficient: Faster convergence and improved handling of
large networks compared to Louvain.
 Better stability: More consistent results across multiple runs.
Limitations:
 Still heuristic: While improved, it’s still a heuristic method that may not
find the absolute optimal partition.
 Potential for high memory usage: Handling very large networks with
many communities can be memory-intensive.
Use Case:
Ideal for social media analysis, biological networks, and large-scale
community detection tasks.

4. Kernighan-Lin Algorithm (Min-Cut)


Overview:
This algorithm minimizes the number of edges that cross between partitions by
iteratively swapping nodes between partitions to reduce the "cut cost."
Advantages:
 Balanced partitions: Guarantees balanced partitions, which is critical
for certain applications like load balancing in parallel computing.
 Simple concept: Easy to understand and implement for small to
medium-sized graphs.
Limitations:
 Computationally expensive: $O(n^2 \log n)$, making it impractical for
large-scale networks.
 Local optima: It can get stuck in local optima and does not guarantee
global optimality.
 Limited scalability: Works well for small graphs but struggles with
large, real-world networks.
Use Case:
Used in parallel computing for load balancing and circuit partitioning in
hardware design.
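The swapping idea can be sketched in a few lines. This is a simplified greedy variant, not the full Kernighan-Lin procedure (which locks swapped nodes and accepts temporarily worse swaps within a pass); the function names and the example partition are hypothetical.

```python
def cut_size(edges, part_a):
    """Number of edges with exactly one endpoint in part_a."""
    return sum((u in part_a) != (v in part_a) for u, v in edges)

def swap_refine(edges, part_a, part_b):
    """Greedy pairwise-swap refinement in the spirit of Kernighan-Lin."""
    part_a, part_b = set(part_a), set(part_b)
    while True:
        current = cut_size(edges, part_a)
        best_gain, best_pair = 0, None
        for a in sorted(part_a):
            for b in sorted(part_b):
                trial = (part_a - {a}) | {b}  # part_a after swapping a and b
                gain = current - cut_size(edges, trial)
                if gain > best_gain:
                    best_gain, best_pair = gain, (a, b)
        if best_pair is None:  # no swap lowers the cut: local optimum
            return part_a, part_b
        a, b = best_pair
        part_a = (part_a - {a}) | {b}
        part_b = (part_b - {b}) | {a}

# Hypothetical example: two triangles joined by (2, 3), with a poor initial split.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
start_cut = cut_size(edges, {0, 1, 3})
part_a, part_b = swap_refine(edges, {0, 1, 3}, {2, 4, 5})
print(start_cut, cut_size(edges, part_a))  # 5 1
```

Because nodes are only ever exchanged in pairs, the two parts keep their original sizes, which is why this family of methods yields balanced partitions.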
5. Metis (Multilevel Recursive-bisection)
Overview:
Metis is a multilevel recursive-bisection algorithm that coarsens the graph to a
smaller size, partitions it, and then refines the partition as the graph is unfolded.
Advantages:
 Scalable: Extremely efficient for large-scale networks due to its
multilevel approach.
 Produces balanced partitions: Ensures nearly equal-sized partitions,
which is crucial for parallel applications.
 Well-suited for sparse graphs: Works particularly well on sparse
graphs with fewer edges.
Limitations:
 Coarse community resolution: It optimizes edge cut for a preset
number of balanced parts rather than community structure, so, like
Louvain, it may struggle with detecting fine-grained communities.
 Complex implementation: More difficult to implement and understand
compared to simpler algorithms like Louvain or Kernighan-Lin.
Use Case:
Widely used in scientific computing and parallel processing, such as finite
element methods and mesh generation.

Comparison Summary
Algorithm | Advantages | Limitations | Use Case
Spectral Partitioning | Theoretically grounded, high-quality partitions | Computationally expensive for large graphs | Community detection, graph clustering
Louvain | Fast, scalable, hierarchical partitions | Resolution limit, heuristic nature | Social network analysis, biological networks
Leiden | Improved accuracy, better stability | Still heuristic, memory-intensive | Large-scale community detection
Kernighan-Lin | Balanced partitions, simple concept | Computationally expensive, local optima | Parallel computing, circuit design
Metis | Scalable, works well for sparse graphs | Resolution limit, complex implementation | Scientific computing, load balancing

Conclusion:
 Louvain and Leiden are the most popular for community detection in
social and biological networks due to their scalability and ease of use,
with Leiden being a more refined version.
 Spectral partitioning is highly accurate but computationally intensive,
suitable for medium-sized graphs.
 Kernighan-Lin and Metis are best for applications requiring balanced
partitions, such as parallel computing and mesh generation; of the
two, only Metis scales to very large graphs.
79 Solve the following: CO4 10
A technology startup consists of 18 employees categorized into three teams
(Software Development, Marketing, and Operations). The work collaboration
network follows these principles:
- Each employee must collaborate with at least three members within their own
team.
- Every employee must maintain at least two inter-team collaborations.
- The CEO and senior managers act as key connectors, forming additional links
across all teams.
Construct the team collaboration network. Compute E-I.
Conclusion:
 E - I = -0.67 indicates that there are more internal collaborations (within
teams) than external collaborations (between teams), highlighting a
strong intra-team focus and a weaker inter-team connectivity despite the
CEO and senior managers' roles as connectors.
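For reference, the E-I index is (E - I) / (E + I), where E counts ties between groups and I counts ties within groups. The sketch below shows the calculation on a deliberately tiny, hypothetical two-team network; it is not the 18-employee graph from the question, and the names are illustrative only.

```python
def ei_index(edges, group):
    """Krackhardt-Stern E-I index: (E - I) / (E + I)."""
    external = sum(group[u] != group[v] for u, v in edges)
    internal = len(edges) - external
    return (external - internal) / (external + internal)

# Hypothetical toy network, not the 18-employee graph from the question.
group = {"d1": "Dev", "d2": "Dev", "d3": "Dev",
         "m1": "Mkt", "m2": "Mkt", "m3": "Mkt"}
edges = [("d1", "d2"), ("d1", "d3"), ("d2", "d3"),  # 3 internal Dev ties
         ("m1", "m2"), ("m2", "m3"),                # 2 internal Mkt ties
         ("d3", "m1")]                              # 1 external tie
print(round(ei_index(edges, group), 2))  # -0.67
```

A value of about -0.67 arises whenever internal ties outnumber external ties five to one, since (1 - 5) / (1 + 5) = -2/3.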
80 Solve the following: CO4 10
A research collaboration network consists of 25 researchers working across five
different disciplines. The collaboration rules are as follows:
- Each researcher must have at least two connections within their primary
discipline.
- Every researcher is required to establish at least three inter-disciplinary
collaborations.
- Senior researchers act as knowledge hubs, forming cross-disciplinary
connections beyond the minimum required ties.
Construct the network graph. Compute E-I.
81 Analyze the Random Graph Kernel, analyzing its mathematical formulation and CO4 10
demonstrating its application in network similarity measurement.
Random Graph Kernel: Analysis and Application (10 Marks)

1. Introduction to Graph Kernels


Graph kernels are functions that measure the similarity between graphs by
mapping them into a high-dimensional feature space and computing inner
products. The Random Graph Kernel (RGK), more commonly called the
random walk kernel, compares graphs by counting the walks that the two
graph structures have in common.

3. Intuition Behind the Kernel


 A walk in a graph is a sequence of connected nodes.
 The Random Walk Kernel identifies structural similarity by examining
how similarly nodes are connected across two graphs.
 The decay factor μ ensures convergence and penalizes long, less
meaningful walks.
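A common textbook form of this kernel, assumed here because the formulation is not reproduced in these notes, is K(G1, G2) = sum over k of μ^k · 1ᵀ (A1 ⊗ A2)^k 1, where A1 and A2 are adjacency matrices and ⊗ is the Kronecker product. The sketch below implements a truncated version of that sum for unlabeled graphs in plain Python; the function names and the parameters `mu` and `max_len` are illustrative choices.

```python
def kronecker(a, b):
    """Kronecker product of two square matrices given as lists of lists."""
    return [[a[i][j] * b[k][l] for j in range(len(a)) for l in range(len(b))]
            for i in range(len(a)) for k in range(len(b))]

def random_walk_kernel(a1, a2, mu=0.1, max_len=4):
    """Sum of mu**k times the number of common walks of length k, k = 0..max_len."""
    ax = kronecker(a1, a2)  # adjacency matrix of the direct product graph
    size = len(ax)
    walks = [1.0] * size    # walks[i]: length-k walks starting at product node i
    total, weight = 0.0, 1.0
    for _ in range(max_len + 1):
        total += weight * sum(walks)
        walks = [sum(ax[i][j] * walks[j] for j in range(size))
                 for i in range(size)]
        weight *= mu
    return total

# Hypothetical inputs: a single edge and a triangle, as adjacency matrices.
edge = [[0, 1], [1, 0]]
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
print(random_walk_kernel(edge, edge, mu=0.5, max_len=2))      # 7.0
print(random_walk_kernel(triangle, edge, mu=0.5, max_len=2))  # 18.0
```

A walk in the product graph corresponds to a pair of simultaneous walks, one in each input graph, so counting product-graph walks counts the walks the two graphs share.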

4. Application: Network Similarity Measurement


Example Use Case 1: Molecular Graph Comparison (Chemoinformatics)
 In drug discovery, molecules are represented as graphs (atoms as nodes,
bonds as edges).
 The Random Walk Kernel is used to compare molecules by matching
paths of bonded atoms.
 This helps predict if two compounds have similar biological activities.
Example Use Case 2: Social Network Analysis
 Nodes = users, Edges = interactions.
 To detect similar communities or user behavior across platforms, the
kernel compares interaction patterns in subgraphs.
 Helps in cross-platform identity resolution or community detection.

5. Advantages
 Captures deep structural similarity between graphs.
 Can be computed efficiently using matrix operations with the
Kronecker product.

6. Limitations
 Computational complexity: The Kronecker product of two n-node
graphs has n² × n² entries, so the matrix powers or inverse it requires
can be expensive for large graphs.
 Sensitive to noise in graph structure (e.g., spurious edges).

7. Improvements and Variants


To address computational challenges, variants such as:
 Truncated Random Walk Kernels (limit k to a max length),
 Labelled Random Walk Kernels (use node/edge labels),
 Lazy Walks (probability of staying at the same node),
have been developed.

8. Conclusion
The Random Graph Kernel is a powerful tool for quantifying graph similarity
based on the structural behavior of random walks. Despite its computational
cost, it is widely used in domains such as bioinformatics, social network
analysis, and computer vision for its robust comparison of intricate graph
structures.

82 Evaluate the Weisfeiler-Lehman Graph Kernel in terms of computational CO4 10
complexity and efficiency in large-scale graph classification.
Evaluation of the Weisfeiler-Lehman Graph Kernel (10 Marks)

1. Introduction to the Weisfeiler-Lehman (WL) Graph Kernel


The Weisfeiler-Lehman (WL) graph kernel is an efficient and scalable graph
kernel based on the Weisfeiler-Lehman test of graph isomorphism. It is
particularly effective for large-scale graph classification because it captures
hierarchical neighborhood structures while remaining computationally tractable.

2. Working Principle of the WL Kernel


The WL kernel uses iterative neighborhood aggregation to update node labels
and compare graphs based on these updated label sequences.
WL Algorithm Steps:
1. Initialize: Each node is assigned an initial label (e.g., node degree or
category).
2. Relabel: At each iteration h, concatenate the current label with the
multiset of neighboring labels.
3. Compress: Hash the resulting strings into new compact labels.
4. Feature Extraction: Count the occurrence of each label at every
iteration and form a feature vector.
5. Kernel Computation: Compute the inner product between feature
vectors of different graphs.
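The five steps above can be sketched as follows. This is a minimal illustration of the subtree variant, assuming graphs stored as adjacency dictionaries with string node labels; the function names are hypothetical, and a shared dictionary stands in for the hash function of the compression step.

```python
from collections import Counter

def wl_features(adj, labels, iterations, table):
    """Count every label seen at iterations 0..iterations (steps 1 to 4)."""
    counts = Counter(labels[v] for v in adj)
    current = dict(labels)
    for _ in range(iterations):
        updated = {}
        for v in adj:
            # Relabel: own label plus the sorted multiset of neighbour labels.
            signature = (current[v], tuple(sorted(current[u] for u in adj[v])))
            # Compress: identical signatures map to the same short label.
            updated[v] = table.setdefault(signature, f"#{len(table)}")
        current = updated
        counts.update(current.values())
    return counts

def wl_kernel(adj1, labels1, adj2, labels2, iterations=2):
    """Inner product of the two label-count vectors (step 5)."""
    table = {}  # shared so both graphs use the same compressed labels
    f1 = wl_features(adj1, labels1, iterations, table)
    f2 = wl_features(adj2, labels2, iterations, table)
    return sum(f1[label] * f2[label] for label in f1)

# Hypothetical example: a triangle and a 3-node path, all nodes labelled "a".
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
path = {0: [1], 1: [0, 2], 2: [1]}
same = {0: "a", 1: "a", 2: "a"}
print(wl_kernel(triangle, same, triangle, same, iterations=1))  # 18
print(wl_kernel(triangle, same, path, same, iterations=1))      # 12
```

Each iteration touches every edge a constant number of times (plus sorting of neighbour labels), which is the source of the kernel's near-linear cost per graph.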

4. Scalability in Large-Scale Graph Classification


The WL kernel is highly scalable for large datasets (thousands to millions of
graphs), and is widely used in domains like:
 Chemoinformatics: Predicting molecular properties.
 Social network analysis: Classifying user interaction graphs.
 Knowledge graphs: Entity and relation prediction.
Real-world Example:
In drug discovery, each molecule is represented as a graph of atoms. WL
kernels can classify compounds based on substructure similarities efficiently
across large chemical databases like PubChem or ChEMBL.

5. Strengths
 Time complexity linear in graph size and number of iterations: roughly
O(h·m) per graph for h iterations and m edges.
 Effective in capturing local graph structures.
 Works with both labeled and unlabeled graphs.
 Extensible: Variants like WL-Subtree, WL-Edge, and WL-ShortestPath
improve specificity.

6. Limitations
 May fail to distinguish structurally different graphs that are WL-
equivalent (i.e., indistinguishable by WL test).
 Focused on local structure, so may miss global topological patterns.
 Requires manual tuning of the number of iterations H.

7. Comparison with Other Kernels


8. Conclusion
The Weisfeiler-Lehman graph kernel combines excellent computational
efficiency with strong classification performance on large-scale graphs. Its
linear scalability, along with its ability to capture rich neighborhood
information, makes it one of the most effective tools for modern graph
classification tasks.
83 Assess the impact of social network analysis on political campaigns, exploring CO4 10
how network structures influence voter behavior while addressing concerns
regarding misinformation and manipulation.
Impact of Social Network Analysis on Political Campaigns (Short Answer –
10 Marks)
1. Influence of Network Structures on Voter Behavior
Social Network Analysis (SNA) helps political campaigns identify key
influencers, opinion leaders, and tightly connected communities. By mapping
relationships among users, campaigns can:
 Target swing voters using micro-targeted messages.
 Activate influencers within communities to amplify campaign
messages.
 Track sentiment trends and adjust strategies in real-time.
For example, Barack Obama’s 2012 campaign used SNA to segment voters and
mobilize support through peer-to-peer influence.
2. Mobilization and Engagement
SNA reveals how information spreads through networks, helping campaigns
plan viral outreach and grassroots mobilization. It supports:
 Identification of high-centrality nodes (e.g., users with many
connections),
 Use of social contagion effects to maximize engagement,
 Coordinated digital activism across platforms.
3. Data-Driven Decision Making
Through platforms like Facebook and Twitter, campaigns analyze interaction
patterns to optimize messaging, timing, and platform-specific content. Graphs
showing interaction frequency help determine which demographics engage
with which issues.
4. Risks: Misinformation and Manipulation
The same SNA tools can also be exploited to spread fake news or manipulate
opinions through:
 Echo chambers that reinforce biases,
 Bot networks that simulate popularity,
 Cambridge Analytica-type targeting to psychologically profile voters.
These risks raise ethical and legal concerns about transparency, consent, and
electoral integrity.
5. Conclusion
While SNA offers powerful tools for enhancing political outreach and
understanding voter dynamics, it must be regulated responsibly to prevent
misuse. Ensuring transparency and ethical data use is crucial to maintaining
democratic processes.
84 Assume the role of a social media analyst responsible for detecting potential CO4 10
brand advocates within a social network. Develop a systematic approach that
integrates appropriate centrality measures and engagement strategies to
effectively identify and leverage these advocates for brand promotion.
Systematic Approach to Identifying and Leveraging Brand Advocates
(Role: Social Media Analyst – Short Answer, 10 Marks)

1. Define Objectives and Data Collection


 Goal: Identify users with high potential to become brand advocates.
 Data Sources: Collect interaction data from social media platforms
(likes, shares, comments, mentions, follower networks).
 Data Points: Focus on user activity, content relevance, sentiment, and
influence.

2. Build the Social Network Graph


 Nodes = Users
 Edges = Interactions (mentions, retweets, replies, etc.)
 Create a directed, weighted graph to reflect interaction frequency and
direction.

3. Apply Centrality Measures to Identify Influential Users


Centrality Measure | Purpose | Interpretation
Degree Centrality | Detect highly active users | Frequent interactions suggest engagement
Betweenness Centrality | Identify connectors across communities | Ideal for spreading messages across clusters
Eigenvector Centrality | Find users with influence over other influential users | Reflects global influence
Closeness Centrality | Select users with fast information spread potential | Good for viral marketing
Focus on users who rank high in multiple centralities and consistently engage
with brand-related content.
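The "high in multiple centralities" filter can be sketched as below. To stay short, this hypothetical example treats the interaction graph as undirected and unweighted and uses only degree and closeness centrality; the user names, the `top_k` helper, and the cutoff of two are illustrative assumptions.

```python
from collections import deque

def degree_centrality(adj):
    """Fraction of the other users each user is directly connected to."""
    n = len(adj)
    return {v: len(adj[v]) / (n - 1) for v in adj}

def closeness_centrality(adj):
    """Reachable-node count divided by total BFS distance, per user."""
    scores = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total else 0.0
    return scores

def top_k(scores, k):
    """The k highest-scoring users."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])

# Hypothetical interaction graph (undirected, unweighted for simplicity).
adj = {
    "ana": ["ben", "cy", "dee"],
    "ben": ["ana"],
    "cy": ["ana", "dee"],
    "dee": ["ana", "cy", "eli"],
    "eli": ["dee", "fay"],
    "fay": ["eli"],
}
degree = degree_centrality(adj)
closeness = closeness_centrality(adj)
candidates = top_k(degree, 2) & top_k(closeness, 2)
print(sorted(candidates))  # ['ana', 'dee']
```

In a real pipeline the candidate set would then be filtered by the sentiment and content checks of the next step.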

4. Sentiment and Content Alignment Analysis


 Use NLP tools to assess:
o Sentiment of posts related to the brand (positive tone preferred),
o Topical alignment with brand themes (e.g., product reviews,
niche interests).
 Filter out users with inconsistent or negative sentiments despite high
influence.

5. Engagement Strategy for Brand Advocacy


 Outreach & Personalization: Engage top candidates through
personalized messages and exclusive content offers.
 Incentivization: Offer early product access, referral bonuses, or
ambassador programs.
 Amplification: Share their content on official pages, boosting visibility.
 Monitor Performance: Track how their engagement converts into
impressions, traffic, or sales using UTM links and social analytics.
6. Iterative Feedback and Refinement
 Continuously monitor advocate performance.
 Refine criteria based on conversion rates, engagement quality, and
network position shifts.

Conclusion
By combining centrality-based influence detection with content sentiment
analysis and targeted engagement strategies, a social media analyst can
systematically identify and activate brand advocates to drive organic and
authentic brand promotion.
85 Differentiate between directed and undirected graphs within the context of CO5 2
Gephi, highlighting their structural distinctions and implications for network
visualization and analysis.
Directed vs. Undirected Graphs in Gephi (Short Answer – 2 Marks)
In Gephi, a directed graph consists of edges with a specific direction (from
source to target), representing asymmetric relationships such as "follows",
"retweets", or "cites". An undirected graph, on the other hand, features edges
without direction, implying mutual or bidirectional relationships, like
"friends", "co-authorship", or "collaborations".
Structural Distinctions:
 Directed edges are visualized with arrows in Gephi, showing flow.
 Undirected edges are simple lines, reflecting equality in connection.
Implications:
 Directed graphs are used to analyze influence, information flow, and
hierarchy.
 Undirected graphs are suitable for studying cohesion, clustering, and
mutual connectivity.
Thus, the choice of graph type in Gephi depends on the nature of the
relationship being modeled and significantly affects centrality metrics and
community detection.
86 Enumerate three significant applications of Gephi, illustrating its utility in CO5 2
diverse fields such as social network analysis, biological networks, and
information dissemination modeling.
Three Significant Applications of Gephi (Short Answer – 2 Marks)
1. Social Network Analysis:
Gephi helps visualize and analyze relationships on platforms like Twitter
or Facebook, identifying key influencers, community clusters, and
engagement patterns (e.g., during elections or viral campaigns).
2. Biological Networks:
In systems biology, Gephi is used to model protein-protein interaction
networks or gene regulation pathways, revealing critical nodes and
functional modules in complex biological systems.
3. Information Dissemination Modeling:
Gephi supports the simulation of information spread in communication
networks, allowing researchers to understand diffusion patterns, detect
bottlenecks, and design effective outreach strategies (e.g., during public
health campaigns).
These use cases show Gephi’s versatility across disciplines for both structural
insight and decision-making.
87 Identify and explain three key advantages of using Gephi as a network analysis CO5 2
tool compared to alternative software.
1. User-Friendly Interface: Gephi offers an intuitive graphical interface
that makes it easy to visualize and manipulate network graphs without
extensive coding knowledge.
2. Real-Time Visualization: It allows dynamic, real-time interaction with
large networks, making it ideal for exploratory data analysis.
3. Extensive Plugins and Layouts: Gephi supports various built-in layouts
and plugins for advanced network analysis and customization.
88 Examine the limitations of Gephi in handling large, complex network structures. CO5 2
1. Performance Issues: Gephi can struggle with very large or highly
complex networks, leading to slow performance or crashes due to high
memory usage.
2. Limited Scripting and Automation: Unlike tools like NetworkX, Gephi
lacks strong support for scripting and automation, making repetitive or
large-scale analyses harder to manage.
89 List alternative tools used for network analysis apart from Gephi, explaining CO5 2
their comparative advantages.
1. Cytoscape: Better suited for biological network analysis with strong
support for molecular and genetic data, and a rich set of plugins.
2. NetworkX (Python Library): Ideal for complex and large-scale
network analysis with full scripting capabilities, offering greater
flexibility and automation than Gephi.
90 Justify the significance of network analysis tools in extracting insights from CO5 2
large-scale data.
Network analysis tools are essential for identifying patterns, relationships, and
key influencers within large-scale data, which traditional analysis methods may
miss. They help uncover hidden structures, such as communities or central
nodes, enabling better decision-making in fields like social media, biology, and
cybersecurity.
91 Explain node and edge structures in Gephi, and their significance in network CO5 2
visualization.
In Gephi, nodes represent entities (e.g., people, websites) and edges represent
relationships or interactions between them (e.g., friendships, links). Their
structure helps visualize the network’s topology, showing how entities are
connected, identifying central nodes, and revealing patterns like clusters or hubs.
92 Elaborate on the fundamental concepts of nodes and edges in network analysis, CO5 5
detailing their representation and visualization within Gephi.
In network analysis, nodes (also called vertices) represent individual entities
such as people, organizations, or web pages, while edges (or links) represent the
relationships or interactions between these entities, such as communication,
collaboration, or hyperlinks. In Gephi, nodes are visualized as points and edges
as lines connecting them. Gephi allows customization of node size, color, and
labels based on attributes like centrality or category, and edges can vary in
thickness or color to show weight or type of relationship. This visual
representation helps identify patterns such as clusters, key influencers, and
overall network structure.
93 Discuss how Gephi facilitates interactive network exploration and analysis, CO5 5
highlighting key interactive features that enhance data interpretation and
visualization.
Gephi facilitates interactive network exploration through a range of dynamic
features. Users can zoom, pan, and filter networks in real time, making it easy to
focus on specific nodes or communities. The layout algorithms (like
ForceAtlas2) allow users to rearrange nodes visually to reveal structures and
patterns. The data laboratory provides a spreadsheet-like view to edit node and
edge data directly. Dynamic filtering helps isolate parts of the network based on
attributes such as degree or centrality. Additionally, the Statistics panel
computes metrics (e.g., modularity, betweenness) on demand, letting users
analyze structural properties interactively, enhancing insight and interpretation.
94 Examine the management of node and edge attributes in Gephi, providing CO5 5
illustrative examples of how these attributes influence network analysis.
In Gephi, node and edge attributes are managed through the Data Laboratory,
where users can add, edit, and import data like labels, weights, colors, and
categories. These attributes influence how the network is analyzed and
visualized. For example, a node's degree (number of connections) can be used
to size nodes—larger nodes represent more influential entities. A node attribute
like 'group' can be used to color-code communities. For edges, a weight
attribute might represent interaction frequency, affecting edge thickness. These
visual cues help identify patterns such as central hubs, clusters, or strong
relationships.
95 Differentiate between global and local network properties, offering examples of CO5 5
each and analyzing their relevance in understanding network structure and
dynamics.
Global network properties describe the overall structure and behavior of the
entire network, while local network properties focus on individual nodes or
small groups of nodes.
 Global example: Average path length – shows the typical number of
steps between any two nodes, useful for understanding how quickly
information spreads.
 Local example: Node degree – the number of connections a node has,
indicating its influence or centrality within its immediate surroundings.
These properties help in understanding both the macro-level structure (e.g.,
how interconnected the network is) and micro-level roles (e.g., key influencers
or bridges), aiding in tasks like optimization, targeting, or vulnerability analysis.
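To make the contrast concrete, the sketch below computes one global property (average path length) and one local property (node degree) for the same graph. It assumes a connected, undirected graph stored as an adjacency dictionary; the function name and the four-node path are hypothetical.

```python
from collections import deque

def average_path_length(adj):
    """Mean shortest-path distance over all ordered pairs of distinct nodes."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

# Hypothetical example: a path 0 - 1 - 2 - 3.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
apl = average_path_length(adj)              # global: one number for the graph
degrees = {v: len(adj[v]) for v in adj}     # local: one number per node
print(round(apl, 3), degrees)  # 1.667 {0: 1, 1: 2, 2: 2, 3: 1}
```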
96 Define the principles of assortativity and disassortativity in network analysis, CO5 5
explaining measurement techniques and the insights they provide regarding node
relationships.
Assortativity refers to the tendency of nodes in a network to connect with
similar nodes, while disassortativity indicates a preference for connecting with
dissimilar nodes. This is commonly measured using the assortativity
coefficient, which ranges from -1 to +1:
 A positive value (e.g., +0.6) indicates assortativity—nodes connect with
others of similar degree (e.g., high-degree nodes with other high-degree
nodes).
 A negative value (e.g., -0.4) shows disassortativity—high-degree nodes
connect with low-degree nodes.
These patterns offer insights into network structure: assortative networks (like
social networks) suggest strong community or peer-group formation, while
disassortative networks (like the internet) reveal hierarchical or hub-and-spoke
structures.
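One common way to compute the coefficient for degree is as the Pearson correlation between the degrees at the two ends of each edge. The sketch below follows that definition for an undirected, unweighted graph; the function name and both example graphs are hypothetical, and the value is undefined (returned as NaN here) when every node has the same degree.

```python
def degree_assortativity(edges):
    """Pearson correlation of the degrees found at the two ends of each edge."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Each undirected edge is counted in both orientations for symmetry.
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    n = len(xs)
    mean = sum(xs) / n  # xs and ys hold the same values, so one mean suffices
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return cov / var if var else float("nan")

# Hypothetical disassortative case: a star (hub "c" tied to three leaves).
star = [("c", "a"), ("c", "b"), ("c", "d")]
# Hypothetical assortative case: a 4-clique plus a separate pair of nodes.
clique_and_pair = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (4, 5)]
print(degree_assortativity(star))             # -1.0
print(degree_assortativity(clique_and_pair))  # 1.0
```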
97 Analyze the concept of network density and its role in determining connectivity CO5 5
and clustering in graphs.
Network density measures the proportion of possible connections in a network
that are actual connections. It is calculated as the ratio of the number of edges
present to the number of edges possible, with values ranging from 0 (no
connections) to 1 (complete connectivity).
Density plays a key role in:
1. Connectivity: High density indicates a well-connected network, where
nodes are likely to be directly or indirectly reachable from one another.
Low density suggests sparse connectivity and isolated nodes.
2. Clustering: Dense networks tend to form tightly-knit clusters or
communities, as the higher number of edges increases the likelihood of
connections between neighboring nodes.
In practice, high density networks often represent cohesive groups (e.g., tightly
connected social groups), while low density networks may indicate decentralized
structures (e.g., the web or large-scale organizational networks).
98 Analyze the concept of network density in graph theory, explaining its CO5 5
correlation with the number of nodes and edges and its implications for network
connectivity.
In graph theory, network density is a measure of how many edges exist in a
network relative to the maximum possible number of edges. It is calculated
using the formula (for an undirected graph without self-loops):
\[ D = \frac{2E}{N(N-1)} \]
Where:
 E is the number of edges,
 N is the number of nodes.
As the number of nodes (N) increases, the number of possible edges grows
quadratically (N(N − 1)/2 for an undirected graph), which means that a
network with many nodes needs a significantly higher number of edges to
maintain the same density. High density
implies that most nodes are connected, indicating strong connectivity and
potential for clustering. Low density, conversely, suggests a more sparse or
loosely connected network, with isolated nodes or clusters.
In practical terms:
 High density networks often show tight-knit communities or efficient
communication pathways.
 Low density networks may indicate decentralized structures, common in
large-scale systems or networks with many isolated or weakly connected
components.
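A short numerical sketch of the density calculation, assuming a simple graph without self-loops, where the maximum number of edges is N(N − 1)/2 for undirected graphs and N(N − 1) for directed ones. The function names and the sample sizes are illustrative.

```python
def density(num_nodes, num_edges, directed=False):
    """Edges present divided by edges possible (no self-loops)."""
    possible = num_nodes * (num_nodes - 1)
    if not directed:
        possible //= 2
    return num_edges / possible

def edges_needed(num_nodes, target_density):
    """Undirected edges required to reach a given density."""
    return target_density * num_nodes * (num_nodes - 1) / 2

print(density(4, 6))           # 1.0  (complete graph on 4 nodes)
print(density(6, 7))           # 7 of 15 possible edges
print(edges_needed(10, 0.5))   # 22.5
print(edges_needed(100, 0.5))  # 2475.0
```

Going from 10 to 100 nodes multiplies the edges needed for the same density by about 110, which shows how quickly the number of possible edges outgrows the number of nodes.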
99 Explain modularity in network analysis and discuss its role in detecting CO5 10
community structures using Gephi.
Modularity in network analysis is a measure of the strength of division of a
network into modules or communities. It quantifies the degree to which nodes
within a community are more densely connected to each other than to nodes in
other communities. Modularity values range from -0.5 to 1: values near 0 mean
the division is no better than random, and higher values indicate stronger
community structure.
Role in Detecting Community Structures in Gephi:
1. Community Detection: Gephi uses modularity-based algorithms, like
the Louvain method, to automatically detect communities by
maximizing modularity. Communities are groups of nodes that are more
interconnected internally than with the rest of the network.
2. Visualization: Communities are visually represented with distinct colors
and clustered nodes in Gephi, making it easier to interpret the structure
and identify natural groupings in the network.
3. Improved Analysis: Modularity helps reveal hidden patterns and
subgroups in complex networks (e.g., social networks, research
collaboration networks), which might be overlooked using simpler
metrics.
By focusing on modularity, Gephi enables deeper insights into how networks are
organized, aiding in tasks like identifying key groups or understanding network
flow and resilience.
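To make the score concrete, the sketch below evaluates modularity for a given partition of a small undirected, unweighted graph, using the standard form Q = Σ_c [L_c/m − (d_c/2m)²], where L_c is the number of edges inside community c, d_c is the total degree of its nodes, and m is the total number of edges. It only scores a partition that is handed to it; it is not Gephi's implementation, and the function name and toy graph are assumptions for illustration.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity of a partition of an undirected, unweighted graph.

    edges        -- list of (u, v) pairs, each undirected edge listed once
    community_of -- dict mapping every node to a community label
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)    # L_c: edges with both ends in community c
    degree_sum = defaultdict(int)  # d_c: summed degree of nodes in community c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Two triangles joined by a single bridge edge (2, 3).
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_group = {node: "A" for node in range(6)}

print(modularity(toy_edges, two_groups))  # splitting at the bridge scores higher
print(modularity(toy_edges, one_group))   # a single community scores 0
```

A community detection method such as Louvain searches for the partition that makes this score as large as it can.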
100 Explain the concept of modularity in network analysis, discussing its CO5 10
significance in detecting community structures and its implementation in Gephi.
Modularity in network analysis is a measure that quantifies the strength of
division of a network into communities or modules. It calculates how well the
network can be divided into subgroups of nodes, where nodes within the same
community are more densely connected to each other than to nodes in other
communities. The modularity score ranges from -0.5 to 1, where values near 0
indicate a division no better than random and higher values (close to 1)
indicate strong community structures.
Significance in Detecting Community Structures:
1. Community Identification: Modularity helps identify clusters or groups
of nodes that are highly interconnected, revealing the hidden structure of
a network.
2. Network Understanding: It highlights important groups within large
networks, such as tightly-knit social circles, research collaborations, or
functional groups in biological networks.
3. Dynamic Analysis: It can be used to detect evolving communities over
time in dynamic networks.
Implementation in Gephi:
 Gephi uses modularity-based algorithms like the Louvain method for
community detection. This method repeatedly moves each node into the
neighboring community that gives the largest gain in modularity, then
merges each community into a single node and repeats the process until
the modularity score stops improving.
 Once communities are detected, Gephi visually represents them with
different colors, helping users quickly identify and analyze the structure
of the network.
Gephi’s modularity feature is critical in understanding the organization of
complex networks, offering insights into network flow, stability, and group
dynamics.
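For reference, the score being optimized is usually written as follows for an undirected graph. The symbols are the conventional ones and are not defined elsewhere in these notes: $A_{ij}$ is the adjacency matrix entry for nodes $i$ and $j$, $k_i$ is the degree of node $i$, $m$ is the total number of edges, and $\delta(c_i, c_j)$ equals 1 when both nodes are in the same community and 0 otherwise.

$$Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)$$

The term $\frac{k_i k_j}{2m}$ is the expected number of edges between $i$ and $j$ if edges were placed at random while preserving node degrees, so $Q$ rewards communities that hold more internal edges than chance would predict.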
101 Elaborate on the Impact of Social Network Analysis (SNA) on Public Health Crisis CO5 10
Management.
Social Network Analysis (SNA) has a significant impact on Public Health
Crisis Management by offering insights into the relationships and interactions
between individuals, groups, or organizations during health crises. Its
applications include:
1. Tracking Disease Spread: SNA helps track how infectious diseases
spread within communities by identifying key individuals (e.g.,
"superspreaders") and the flow of information or infection through social
networks.
2. Identifying Vulnerable Populations: It highlights groups or individuals
with limited social connections, who may be more vulnerable to health
risks due to isolation, so that interventions can be targeted at them.
3. Optimizing Resource Distribution: By analyzing communication and
interaction networks, SNA helps optimize the allocation of resources
(e.g., vaccines, medical supplies) to areas or individuals most at risk.
4. Improving Communication and Information Flow: SNA identifies
key nodes (influencers) in the network, enabling more efficient
communication strategies during crises, ensuring accurate health
information reaches the right audiences.
5. Coordinating Responses: By analyzing the structure of health
organizations or response teams, SNA improves coordination, identifying
bottlenecks and streamlining collaboration across multiple agencies.
In public health crises such as epidemics, pandemics, or natural disasters, SNA
aids in understanding human behavior, predicting outcomes, and enhancing
response efforts.
102 Elaborate on the Impact of Social Network Analysis (SNA) in Political Campaigns. CO5 10
Social Network Analysis (SNA) plays a crucial role in Political Campaigns by
providing insights into the structure and dynamics of voter networks, influencing
political strategies and decision-making. Here are key ways SNA impacts
political campaigns:
1. Voter Segmentation and Targeting: SNA helps identify key
influencers, groups, and communities within the electorate. By analyzing
relationships between voters, campaigns can tailor messages to specific
segments, ensuring a more personalized and effective approach.
2. Identifying Opinion Leaders: In political campaigns, certain
individuals within a network, often referred to as opinion leaders or
influencers, have a greater impact on shaping public opinion. SNA can
pinpoint these individuals, allowing campaigns to focus on them for
spreading messages and encouraging grassroots support.
3. Tracking Information Flow: SNA helps track how information flows
within a population, revealing potential misinformation spread or
identifying networks through which political messages reach voters. This
allows campaigns to leverage efficient channels for communication,
whether traditional or digital.
4. Analyzing Voter Behavior: By examining social connections and
interactions among voters, SNA uncovers patterns in how people form
opinions, shift allegiances, or react to political events. This analysis helps
predict voting behavior, enabling campaigns to adapt strategies and
tactics in real-time.
5. Optimizing Campaign Resources: SNA helps determine which
geographic areas or voter groups have the most interconnected and active
networks. Campaigns can then concentrate resources, such as advertising
and volunteer efforts, in these regions or demographics, improving
outreach efficiency.
6. Social Media Strategy: With the rise of social media, SNA is vital in
political campaigns for analyzing online interactions, identifying key
social media influencers, and understanding how viral content spreads.
This helps campaigns craft targeted digital marketing strategies to engage
voters and amplify messages.
7. Crisis Management: During political campaigns, negative publicity or
scandals can spread quickly. SNA enables campaigns to monitor public
sentiment and the spread of information across networks, allowing for
swift response and mitigation of damage. Understanding the flow of
information can guide the campaign in countering false narratives or
shaping public opinion.
8. Coalition Building: Political campaigns often seek to build broad-based
coalitions across different demographic or interest groups. SNA can map
out existing relationships and find potential allies or groups with shared
interests, helping strategists to form effective alliances and partnerships.
9. Debate and Discourse Analysis: SNA can be used to analyze political
debates, speeches, and public discourse to assess how opinions are
shaped and identify trends in voter concerns. This data helps refine
campaign messaging to address pressing issues directly.
10. Voter Mobilization and Engagement: By analyzing how voters are
connected and mobilized within social networks, campaigns can design
strategies to increase voter turnout. Targeting highly influential nodes or
key groups within the network can encourage more people to vote,
particularly in swing states or undecided voter blocs.
In summary, Social Network Analysis in political campaigns enhances voter
targeting, resource optimization, and the ability to understand and influence
public opinion. Its power lies in its ability to map and analyze relationships and
information flows, making political strategies more data-driven and effective.
103 Examine the role of social network analysis in cybersecurity and fraud detection, CO5 10
proposing effective methodologies.
Social Network Analysis (SNA) plays a crucial role in cybersecurity and
fraud detection by analyzing relationships and patterns of interactions in digital
systems to identify suspicious activities or vulnerabilities. Below are key
methodologies and their roles in these domains:
1. Anomaly Detection: SNA helps detect unusual patterns of behavior
within a network. In cybersecurity, anomalies such as sudden, unusual
communication between seemingly unrelated entities may indicate a
cyber attack (e.g., DDoS, insider threats). For fraud detection, irregular
transaction flows or new connections in financial networks can point to
money laundering or fraudulent activity.
2. Identifying Fraudulent Networks: In financial systems, SNA helps
identify fraudulent actors by mapping transaction patterns. Money mules
or colluding fraudsters often form tightly-knit subgroups, detectable
through community detection algorithms. These groups exhibit
abnormal levels of internal connectivity compared to the rest of the
network.
3. Key Node Detection: SNA can identify central or influential nodes
within a network (e.g., brokers in a fraud ring or the command center in a
cyberattack). Identifying these key nodes enables targeted intervention,
such as blocking accounts or isolating compromised systems, preventing
further damage.
4. Link Prediction and Risk Assessment: SNA methodologies like link
prediction can anticipate potential future connections between entities
(users, systems) based on existing patterns. In cybersecurity, this can
help predict and preemptively block potential attack paths. In fraud
detection, it can predict the likelihood of new fraudulent transactions
between accounts, allowing for proactive measures.
5. Behavioral Profiling: By analyzing communication and transaction
patterns, SNA can help create behavioral profiles for normal and
malicious actors. Any deviation from established profiles can raise alerts,
helping cybersecurity systems spot phishing attacks, account
takeovers, or unauthorized access attempts.
6. Collaborative Detection: SNA can be used to track cross-organizational
cyber threats or fraud networks. Cross-institutional collaboration can
enhance detection by sharing insights from different organizations’
networks, making it easier to identify patterns of fraud or coordinated
cyber attacks across multiple targets.
7. Network Visualization: SNA’s ability to visualize connections makes it
easier to identify hidden relationships and suspicious activity. In fraud
detection, visualizing financial transactions or communication flows can
highlight clusters of fraudulent activity, while in cybersecurity,
visualizing connections between compromised systems can help pinpoint
the source of an attack.
8. Decentralized Threats: In cybersecurity, decentralized attacks (e.g.,
botnets, peer-to-peer attacks) are common. SNA can identify patterns of
distributed coordination among botnets or other decentralized threats,
helping to dismantle them by isolating compromised nodes or identifying
the “command and control” centers.
9. Social Engineering Detection: Fraudsters often use social engineering
tactics to manipulate individuals into revealing sensitive information.
SNA can analyze patterns in social media or email interactions,
identifying networks of individuals who may be targeted by phishing
attacks or social manipulation.
10. Real-Time Monitoring: Using SNA for real-time analysis of data from
logs, transactions, or communication, cybersecurity teams can identify
malicious activities as they occur, responding quickly to mitigate
potential threats. In fraud detection, real-time analysis of transactional
behavior allows rapid identification and prevention of fraudulent
transactions.
Effective Methodologies:
 Community Detection Algorithms: To find subgroups within larger
networks that may indicate fraud rings or internal collusion.
 Graph Theory Metrics: Such as centrality (to find important nodes)
and clustering coefficient (to find tightly-knit groups), used to detect
suspicious activities or vulnerabilities.
 Dynamic Network Analysis: For continuously monitoring evolving
network structures in real time, spotting sudden changes or emerging
threats.
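Two of the graph theory metrics named above, degree centrality and the local clustering coefficient, can be sketched in plain Python. The toy account graph and all function names are hypothetical and chosen only for illustration; a real system would work on far larger transaction data and would likely use a graph library.

```python
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_centrality(adj):
    """Fraction of the other nodes each node is directly connected to."""
    n = len(adj)
    if n < 2:
        return {node: 0.0 for node in adj}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}


def clustering_coefficient(adj, node):
    """Share of a node's neighbor pairs that are themselves connected."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2 * links / (k * (k - 1))


# Hypothetical accounts: a, b, c form a tight ring; h is a hub touching many.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("a", "h"), ("h", "x"), ("h", "y"), ("h", "z")]
adj = build_adjacency(toy_edges)
centrality = degree_centrality(adj)

print(max(centrality, key=centrality.get))  # most connected account
print(clustering_coefficient(adj, "b"))     # tightly knit neighborhood
print(clustering_coefficient(adj, "h"))     # hub with unconnected neighbors
```

High centrality points to candidate brokers, while a high clustering coefficient points to tightly knit groups that may deserve a closer look. Neither metric proves fraud on its own.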
In conclusion, SNA is an effective tool in cybersecurity and fraud detection,
enabling the identification of hidden relationships, anomalous behavior, and
central figures within a network, providing actionable insights for timely
interventions.
104 Elaborate on Social Network Analysis in Cybersecurity and Fraud Detection CO5 10
Social Network Analysis (SNA) is a powerful tool in cybersecurity and fraud
detection as it helps identify patterns of relationships, behaviors, and
interactions within a network. Here’s how it plays a crucial role:
1. Identifying Suspicious Connections: In cybersecurity, SNA helps
detect unusual patterns of connections between users, devices, or
systems, often identifying malicious activity like botnets or insider
threats. For fraud detection, it reveals hidden links between accounts or
transactions, potentially pointing to fraudulent networks.
2. Anomaly Detection: SNA helps identify anomalies in network
interactions. Unusual connections or abnormal communication patterns
(such as a surge in transactions or interactions between previously
unconnected entities) can be flagged for further investigation, potentially
indicating cyber-attacks or fraudulent activities.
3. Fraud Ring Detection: By analyzing financial transactions or social
interactions, SNA can identify clusters or tightly-knit groups of
individuals who may be engaged in coordinated fraudulent activities,
such as money laundering or insurance fraud. Detecting these groups
helps dismantle fraud operations.
4. Key Node Identification: SNA identifies influential or central nodes in a
network. In the context of cybersecurity, these nodes could represent
critical infrastructure or attack points, while in fraud detection, they
could point to individuals or accounts coordinating fraudulent schemes.
Isolating or monitoring these key nodes can mitigate damage.
5. Behavioral Profiling: SNA allows the creation of normal behavior
profiles based on network interactions. Any deviation from these patterns
(e.g., sudden changes in user activity or transaction volume) can be
flagged as suspicious, helping to detect phishing attacks, unauthorized
access, or identity theft in real-time.
6. Tracking Information Flow: In cybersecurity, SNA is used to track
how malware spreads or how an attack propagates through a network,
helping to contain and neutralize the threat. For fraud, it helps identify
how fraudulent information or transactions spread through a network,
enabling swift intervention.
7. Social Engineering Detection: Fraudsters often manipulate individuals
through social engineering. SNA can be used to detect patterns of
communication that are typical in phishing scams, fake job offers, or
other manipulative tactics, allowing organizations to intercept fraud
attempts.
8. Collaborative Detection: SNA enhances collaboration between different
organizations or agencies to detect cross-entity fraud or cyber threats.
Sharing insights about relationships and activities in a network can lead
to faster identification of coordinated attacks or fraud rings.
9. Real-Time Analysis: SNA techniques can be applied in real-time to
monitor network traffic or financial transactions. This enables immediate
detection of suspicious activities, allowing for rapid response and
prevention of cyber-attacks or fraudulent transactions.
10. Visualizing Networks: SNA provides visualizations of network
connections, making it easier to spot patterns such as isolated nodes,
centralized hubs, or tight clusters of activity. These visual representations
are essential for understanding the structure of cyber threats or fraud
networks and for developing targeted defense strategies.
In summary, Social Network Analysis provides valuable tools for both
cybersecurity and fraud detection by analyzing the relationships, behaviors,
and structures within networks. It enhances the ability to detect anomalies,
identify key players, and understand the flow of malicious activity, improving
the overall effectiveness of security measures and fraud prevention strategies.
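As one illustration of tracking how a threat propagates (point 6), a breadth-first search from a compromised node gives the minimum number of hops needed to reach every other reachable node. This is a sketch under the assumption that the possible spread paths are already known as a directed graph; the host names and the function name `hops_from_source` are made up for the example.

```python
from collections import deque


def hops_from_source(adj, source):
    """Minimum hop count from source to every node reachable from it.

    adj -- dict mapping each node to an iterable of nodes it can reach directly
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj.get(node, ()):
            if nbr not in dist:  # first visit is the shortest path in BFS
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


# Hypothetical hosts and the connections malware could travel along.
connections = {
    "pc1": ["pc2", "pc3"],
    "pc2": ["pc4"],
    "pc3": ["pc4"],
    "pc4": [],
    "pc5": ["pc1"],  # pc5 can reach pc1, but not the other way round
}

exposure = hops_from_source(connections, "pc1")
print(exposure)  # hosts missing from the result are not reachable from pc1
```

Hosts at hop 1 are the first candidates for isolation, and hosts absent from the result cannot be reached from the starting point along the known connections.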
105 Define the concept of social identity, explaining its role in network structures. CO5 10
Social identity refers to the way individuals define themselves based on their
membership in social groups, such as family, culture, religion, profession, or
community. It is shaped by shared values, norms, and experiences, and it
influences how individuals interact with others within and outside their group.
In network structures, social identity plays a crucial role in determining the
connections and dynamics between individuals or nodes. Here's how it impacts
network structures:
1. Group Formation: Social identity helps individuals form tight-knit
communities or subgroups within larger networks, based on shared
values, interests, or goals. This leads to the emergence of clusters or
communities in networks, often with strong internal ties and weaker
external connections.
2. Social Influence: The social identity of individuals affects their
susceptibility to influence and their role in spreading information within
networks. People tend to trust and follow those who share similar social
identities, which shapes the flow of ideas and behaviors through a
network.
3. Segmentation: Social identity can lead to network segmentation, where
different groups within the network form separate clusters. These
divisions may lead to stronger bonding within groups and weaker
connections between groups, impacting collaboration or information
exchange across the network.
4. Conflict or Cooperation: Social identity can foster either conflict or
cooperation in network structures. Groups with opposing identities may
compete for resources or influence, while groups with shared identities
tend to collaborate more effectively, leading to stronger, more cohesive
subgroups.
5. Community Detection: In network analysis, social identity can be a key
factor in community detection algorithms, which identify clusters of
nodes that are more connected to each other than to the rest of the
network. Social identity influences the formation of these communities,
guiding the detection of group-based structures.
6. Identity and Centrality: Nodes with strong social identity ties may
become more central in the network, playing roles as influencers or
leaders who help connect disparate groups or act as bridges between
clusters. These central nodes may hold disproportionate power or
influence within the network.
7. Cultural Exchange and Innovation: Social identity can impact how
information, culture, or innovation spreads through a network. People
with shared identities are more likely to share ideas, collaborate, or
innovate within their group, leading to rapid cultural exchange and the
creation of novel solutions within the network.
8. Network Stability: Social identity contributes to the stability of
networks. Groups with strong social identities may have more resilient
connections, as individuals are motivated to stay engaged and loyal to the
group, which can help stabilize the network against disruptions or
external challenges.
In summary, social identity is a fundamental concept in network analysis,
influencing how groups form, communicate, and interact within a larger network
structure. It shapes network dynamics, fosters cooperation or conflict, and plays
a central role in information flow, community detection, and the overall
connectivity of the network.
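The segmentation effect described above (strong ties inside a group, weaker ties across groups) can be quantified with the E-I index, defined as (E − I) / (E + I), where E is the number of ties between different groups and I is the number of ties inside groups. It runs from −1 (all ties internal) to +1 (all ties external). The sketch below assumes each person carries a single group label; the names and data are illustrative only.

```python
def ei_index(edges, group_of):
    """E-I index: (external ties - internal ties) / total ties."""
    if not edges:
        raise ValueError("E-I index is undefined for a network with no ties")
    internal = sum(1 for u, v in edges if group_of[u] == group_of[v])
    external = len(edges) - internal
    return (external - internal) / len(edges)


# Hypothetical friendship ties among members of two identity groups.
ties = [("ana", "ben"), ("ben", "cy"), ("cy", "dee")]
groups = {"ana": "A", "ben": "A", "cy": "B", "dee": "B"}

print(ei_index(ties, groups))  # negative: most ties stay inside a group
```

A strongly negative value suggests that identity groups are largely closed off from one another, which is the pattern associated with segmentation.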
106 What is Social Identity? Provide a brief definition. CO6 2
Social identity is the part of an individual’s self-concept that is derived from
their membership in social groups, such as family, culture, religion, or
profession. It shapes how individuals perceive themselves and interact with
others based on shared values, norms, and experiences within those groups.
107 What is Social Affiliation? Explain its meaning. CO6 2
Social affiliation refers to the connection or association an individual has with a
particular group, community, or social network. It represents the sense of
belonging or alignment with others who share common interests, values, or
identities, influencing interactions, behaviors, and relationships within that
group.
108 List some key applications of Social Media Mining. CO6 2
Key applications of Social Media Mining include:
1. Sentiment Analysis: Analyzing public opinions, emotions, or attitudes
towards brands, products, or events from social media content.
2. Trend Detection: Identifying emerging trends, topics, or viral content to
guide marketing strategies or public relations efforts.
3. Influencer Identification: Finding key influencers in a specific domain
or industry to enhance marketing campaigns or brand outreach.
4. Social Network Analysis: Understanding relationships and interactions
between users to improve customer engagement or detect communities.
5. Customer Feedback and Service: Mining social media data to gather
insights on customer satisfaction and improve products or services.
109 Analyze the significance of graph coloring in network analysis and optimization CO6 2
problems.
Graph coloring assigns colors to the vertices of a graph in such a way that
no two adjacent vertices share the same color. It is significant in network
analysis and optimization because many conflict-avoidance problems can be
modeled this way. Key applications include:
1. Conflict Resolution: In scheduling problems, graph coloring helps avoid
conflicts (e.g., assigning resources or timeslots to tasks where conflicts
must be minimized).
2. Network Optimization: It is used to optimize resource allocation in
networks, such as minimizing interference in wireless communication or
optimizing frequency assignments.
Overall, graph coloring aids in efficiently solving problems related to resource
allocation, scheduling, and network optimization.
110 Explain the concept of Graph Coloring in network analysis. CO6 2
Graph coloring in network analysis refers to the assignment of labels (or
"colors") to the vertices of a graph such that no two adjacent vertices share the
same color. The objective is to minimize the number of colors used while
ensuring that adjacent nodes have distinct colors. This concept is widely used in
problems like:
1. Scheduling: Assigning resources or time slots in such a way that no
conflicts occur.
2. Frequency Assignment: Ensuring that adjacent communication
channels do not interfere with each other by assigning different
frequencies.
Graph coloring helps in solving optimization problems by reducing resource
usage and avoiding conflicts.
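A simple way to produce a valid coloring is the greedy heuristic: visit the vertices in some order and give each one the smallest color not already used by its neighbors. The sketch below visits higher-degree vertices first, which is a common ordering choice. It always yields a valid coloring but is not guaranteed to use the minimum number of colors. The task names are hypothetical.

```python
def greedy_coloring(adj):
    """Assign each vertex the smallest color index unused by its neighbors.

    adj -- dict mapping each vertex to the set of vertices it conflicts with
    """
    colors = {}
    # Visit vertices with more conflicts first (a heuristic, not a requirement).
    for node in sorted(adj, key=lambda n: len(adj[n]), reverse=True):
        used = {colors[nbr] for nbr in adj[node] if nbr in colors}
        color = 0
        while color in used:
            color += 1
        colors[node] = color
    return colors


# Hypothetical tasks; an edge means the two tasks cannot share a time slot.
conflicts = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}

slots = greedy_coloring(conflicts)
print(slots)                     # color index = time slot number
print(len(set(slots.values())))  # number of slots used
```

Here each color index can be read as a time slot or a frequency, depending on the application.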
111 Define Information Diffusion and explain its significance in network science. CO6 2
Information diffusion refers to the process by which information spreads
through a network, typically from one node (individual, organization, etc.) to
others. It models how knowledge, ideas, or behaviors propagate across
connected entities in a social or communication network.
Significance in network science:
1. Understanding Spread: Information diffusion helps understand how
trends, innovations, or diseases spread in social networks, aiding in
targeted interventions or marketing strategies.
2. Optimizing Influence: It is crucial for identifying key influencers in a
network to optimize information dissemination and influence decisions
effectively.
112 What challenges arise when analyzing large-scale dynamic social networks? CO6 2
When analyzing large-scale dynamic social networks, several challenges arise:
1. Scalability: Handling massive amounts of data and maintaining
performance while processing networks with millions of nodes and edges
over time can be computationally intensive and complex.
2. Data Continuity and Change: In dynamic networks, relationships and
nodes continuously evolve, making it difficult to track changes, ensure
data consistency, and analyze the network's real-time dynamics
effectively.
113 Describe the Graph Coloring Problem and its significance in network analysis. CO6 5
The Graph Coloring Problem involves assigning colors to the vertices of a
graph such that no two adjacent vertices share the same color, with the goal of
using the fewest number of colors possible. This problem is significant in
network analysis for several reasons:
1. Conflict Minimization: In applications like scheduling or resource
allocation, graph coloring ensures that adjacent tasks or resources do not
conflict. For example, in wireless networks, assigning frequencies
(colors) to transmitters ensures there’s no interference between adjacent
nodes.
2. Optimization: By minimizing the number of colors, the graph coloring
problem helps optimize resource usage, reducing costs and improving
efficiency in systems like frequency assignments or task scheduling.
3. Network Design: In network topology, graph coloring helps in designing
efficient networks by ensuring that neighboring nodes (e.g., routers or
switches) do not interfere with each other, improving performance and
reducing operational issues.
4. Real-World Applications: Graph coloring is used in various fields,
including map coloring, job scheduling, register allocation in compilers,
and even in game theory to minimize conflicts in competitive situations.
5. Computational Complexity: Finding the minimum number of colors is
NP-hard, so no efficient exact algorithm is known for large graphs. This
highlights the importance of heuristics and approximation algorithms for
practical solutions in network analysis.
In summary, the graph coloring problem is essential in optimizing network
resources, minimizing conflicts, and ensuring efficient operations in dynamic
and complex networks.
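The computational difficulty mentioned in point 5 can be seen in an exact solver. The sketch below finds the minimum number of colors by backtracking: it tries 1 color, then 2, and so on, exploring color assignments until one works. Its running time grows very quickly with graph size, so it is only practical for small graphs. The function names are illustrative.

```python
def can_color(adj, k):
    """Return True if the graph can be properly colored with k colors."""
    nodes = list(adj)
    colors = {}

    def assign(index):
        if index == len(nodes):
            return True  # every vertex has a color
        node = nodes[index]
        for color in range(k):
            # A color is allowed if no already-colored neighbor uses it.
            if all(colors.get(nbr) != color for nbr in adj[node]):
                colors[node] = color
                if assign(index + 1):
                    return True
                del colors[node]  # undo and try the next color
        return False

    return assign(0)


def chromatic_number(adj):
    """Smallest k for which a proper k-coloring exists (exact, slow)."""
    if not adj:
        return 0
    k = 1
    while not can_color(adj, k):
        k += 1
    return k


# A ring of five nodes: two colors are not enough, three are.
ring5 = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
print(chromatic_number(ring5))
```

For networks with thousands of nodes this exhaustive search is not feasible, which is why heuristics are used in practice.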
114 Explain the Diffusion Process in Social Networks and its role in information CO6 5
spread.
The diffusion process in social networks refers to the spread of information,
behaviors, or influence from one individual (node) to others through the
network's connections (edges). It is a fundamental concept for understanding
how ideas, trends, innovations, or even diseases propagate across social systems.
Here's an explanation of its role in information spread:
1. Propagation Mechanism: Information typically diffuses in social
networks when one individual shares it with their direct connections.
These connections, in turn, share it with their own contacts, creating a
chain reaction. The process can be influenced by various factors such as
the strength of ties (close friends vs. acquaintances) or the type of
information (viral content vs. word-of-mouth).
2. Types of Diffusion Models: Several models describe how information
spreads in social networks. For example:
o Independent Cascade Model: Each newly activated node gets one
chance to activate each of its inactive neighbors, succeeding
with a given probability.
o Linear Threshold Model: A node is activated when the combined
influence of its active neighbors reaches its threshold.
These models help in understanding how information spreads
under different circumstances.
3. Role of Network Structure: The structure of the network heavily
impacts how efficiently information diffuses. Highly central nodes
(hubs) can push information to many others quickly, and dense clustering
(close-knit communities) can reinforce adoption inside a group. Bridges
(connections between otherwise separate groups) are also critical in
ensuring that information crosses between subgroups.
4. Viral Marketing: In business and marketing, understanding the
diffusion process allows companies to use social media and influencers
strategically to spread information about new products or services.
Targeting key influencers within the network can accelerate the spread of
marketing messages and generate viral campaigns.
5. Impact of Social Influence: Social influence plays a critical role in the
diffusion process. People are more likely to adopt new ideas, behaviors,
or products if they see others within their social network doing the same.
This creates a feedback loop where early adopters influence others,
which in turn leads to broader adoption.
In summary, the diffusion process in social networks describes how information
spreads across individuals and communities, influencing behaviors, trends, and
decision-making. By understanding this process, businesses, policymakers, and
researchers can design strategies to facilitate or control the flow of information
in various contexts, from marketing to public health campaigns.
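The two diffusion models named above can be simulated in a few lines. This is a simplified sketch under stated assumptions: the independent cascade uses one activation probability for every edge, and the linear threshold version gives all neighbors equal weight and all nodes the same threshold (the general model allows per-edge weights and per-node thresholds). Function names and the toy network are illustrative.

```python
import random


def independent_cascade(adj, seeds, prob, rng=None):
    """Each newly active node gets one try at activating each neighbor."""
    rng = rng or random.Random()
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for node in frontier:
            for nbr in adj.get(node, ()):
                if nbr not in active and rng.random() < prob:
                    active.add(nbr)
                    next_frontier.append(nbr)
        frontier = next_frontier
    return active


def linear_threshold(adj, seeds, threshold):
    """Activate a node once the share of its active neighbors reaches threshold."""
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for node, nbrs in adj.items():
            if node in active or not nbrs:
                continue
            share = sum(1 for n in nbrs if n in active) / len(nbrs)
            if share >= threshold:
                active.add(node)
                changed = True
    return active


# A short chain of acquaintances: a - b - c - d.
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}

print(sorted(independent_cascade(chain, ["a"], prob=0.5, rng=random.Random(1))))
print(sorted(linear_threshold(chain, ["a"], threshold=0.5)))
```

Running the cascade many times with different random seeds and averaging the final size gives an estimate of the expected reach of a chosen set of seed nodes.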
115 Provide a detailed explanation of Information and Biological Networks, CO6 5
highlighting their key characteristics and applications.
Information and Biological Networks are two distinct types of networks, each
with unique characteristics and applications. Below is a detailed explanation of
both:
1. Information Networks:
Key Characteristics:
 Nodes: Represent entities such as users, websites, documents, or
communication devices.
 Edges: Represent the relationships or interactions between nodes, such
as hyperlinks between websites, email exchanges, or social connections.
 Directed or Undirected: Information networks can be either directed
(e.g., Twitter follows) or undirected (e.g., co-authorship networks).
 Dynamic Nature: Information networks constantly evolve, with nodes
and edges changing as new information is created, shared, or updated.
 Data-Driven: The flow of information through these networks is based
on data interactions, which can be analyzed for patterns and trends.
Applications:
 Social Media Networks: Analyzing the spread of information,
sentiments, or trends through platforms like Facebook, Twitter, and
Instagram.
 Recommendation Systems: Online services (e.g., Netflix, Amazon) use
information networks to suggest products, movies, or music based on
user preferences and interactions.
 Search Engines: Google and other search engines use information
networks (e.g., links between web pages) to rank search results based on
relevance.
 Cybersecurity: Identifying malicious activities or vulnerabilities within
communication networks by monitoring traffic patterns and user
behavior.
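Link-based ranking, as mentioned under search engines, can be illustrated with PageRank, a well-known algorithm that scores a page by the scores of the pages linking to it. The sketch below is a basic power-iteration version with the commonly used damping factor of 0.85. Real search engines combine many more signals, and the page names here are made up.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Basic PageRank by power iteration.

    links -- dict mapping each page to a list of distinct pages it links to
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Pages with no outgoing links spread their rank evenly over all pages.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        new_rank = {node: (1 - damping) / n + damping * dangling / n
                    for node in nodes}
        for node, targets in links.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical pages: both a and b link to c, and c links back to a.
web = {"a": ["c"], "b": ["c"], "c": ["a"]}
scores = pagerank(web)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

Page c ranks highest because two pages link to it, and page a benefits from being linked by the top-ranked page, which shows how importance flows along links.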
2. Biological Networks:
Key Characteristics:
 Nodes: Represent biological entities such as genes, proteins, cells, or
metabolites.
 Edges: Represent interactions or relationships between these biological
entities, such as gene-protein interactions or metabolic reactions.
 Complexity: Biological networks are highly complex, with multiple
interconnected layers (e.g., genetic networks, protein interaction
networks, ecological food webs).
 Network Types: They can be metabolic networks, protein-protein
interaction (PPI) networks, or gene regulatory networks, each
representing a different biological process.
 Dynamic Nature: Like information networks, biological networks are
dynamic, changing in response to environmental factors, disease states,
or genetic variations.
Applications:
 Disease Modeling: Understanding how diseases (like cancer) spread at
the molecular level by analyzing the interactions between genes,
proteins, and other biological entities. This helps identify biomarkers and
therapeutic targets.
 Drug Discovery: In pharmaceutical research, biological networks are
used to identify potential drug targets by understanding how proteins or
genes interact within the network.
 Genomics: Analyzing gene expression data through networks to uncover
relationships between genes and their roles in development, disease, or
cellular processes.
 Ecosystem Modeling: Studying ecological food webs or the interactions
between species to understand biodiversity, ecosystem dynamics, and
environmental impacts.
Conclusion:
Both information networks and biological networks are essential in
understanding complex systems. While information networks help with data-
driven insights, communication, and social interactions, biological networks are
crucial for understanding life at the molecular level, offering insights into health,
disease, and ecology. Their applications range from digital media analysis to
groundbreaking medical discoveries, showing their widespread impact across
multiple domains.
116 Describe Social Learning Networks (SLN) and discuss their fundamental
characteristics. CO6 5
Social Learning Networks (SLN) are systems where individuals learn from
each other through interactions, sharing knowledge, experiences, and resources
within a social network. These networks are based on the principles of social
learning theory, which emphasizes learning through observation, imitation, and
modeling behavior in a social context.
Fundamental Characteristics:
1. Knowledge Sharing: SLNs facilitate the exchange of information, ideas,
and expertise among members, fostering collaborative learning and
problem-solving.
2. Peer Influence: Learning occurs through social influence, where
individuals learn from observing the behaviors, experiences, or
knowledge of others in the network.
3. Collaboration: These networks encourage collaboration and collective
intelligence, enabling members to co-create solutions and enhance their
learning experiences through group dynamics.
4. Dynamism: SLNs evolve over time as individuals interact, contribute,
and learn from each other, adapting to new information or shifts in the
network structure.
5. Connectivity: The effectiveness of SLNs relies on the network's
structure, where central nodes (influencers or experts) play a significant
role in spreading knowledge and guiding learning.
In summary, Social Learning Networks are important for fostering collective
learning, innovation, and collaboration by leveraging social interactions and the
sharing of knowledge.
117 Discuss how the Graph Coloring Problem can be applied to optimize scheduling
in universities. CO6 5
The Graph Coloring Problem can be effectively applied to optimize
scheduling in universities, particularly for tasks like assigning courses to
timeslots, classrooms, and instructors. Here's how it works:
1. Course Scheduling: Each course is represented as a node in a graph, and
an edge is drawn between two nodes if the corresponding courses have
conflicting elements (e.g., the same instructor or shared students). Each
color then represents a distinct timeslot, and a valid coloring ensures
that courses with conflicts are assigned different slots.
2. Instructor Assignment: Instructors can be assigned to specific timeslots
based on the graph coloring. If two courses share the same instructor,
they must not be scheduled at the same time. The graph coloring
algorithm helps minimize scheduling conflicts by ensuring no two
courses requiring the same instructor are assigned the same color
(timeslot).
3. Classroom Allocation: Rooms can be handled as a second coloring
problem within each timeslot. Courses scheduled in the same timeslot
cannot share a room, so each must receive a different room (color), and
the number of rooms available caps how many courses fit in one slot.
4. Optimization: The goal is to minimize the number of colors (timeslots
or classrooms) used, which leads to efficient resource allocation and
reduces the need for additional rooms or timeslots. The minimum
number of colors is the graph's chromatic number; finding it is NP-hard
in general, so large timetables usually rely on heuristics.
5. Flexibility and Adaptation: Graph coloring can also accommodate
changes in course offerings or student enrollments, allowing universities
to quickly adjust schedules while maintaining optimal use of resources.
In summary, the Graph Coloring Problem is a powerful tool in university
scheduling, ensuring efficient allocation of time, space, and resources while
minimizing conflicts.
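The course-scheduling idea above can be sketched in a few lines of Python. This is only an illustrative greedy heuristic, not an optimal timetabler: the course names, the people, and the helper names build_conflict_graph and greedy_timeslots are all hypothetical, and each color is read as a timeslot index.

```python
from collections import defaultdict

def build_conflict_graph(commitments):
    """Courses are nodes; two courses conflict if one person is tied to both."""
    graph = defaultdict(set)
    for courses in commitments.values():
        for course in courses:
            graph[course]  # touch the key so courses without conflicts still appear
        for i, a in enumerate(courses):
            for b in courses[i + 1:]:
                graph[a].add(b)
                graph[b].add(a)
    return graph

def greedy_timeslots(graph):
    """Give each course the smallest timeslot not used by a conflicting course."""
    slots = {}
    # Heuristic order: most-constrained courses first (not guaranteed optimal).
    for course in sorted(graph, key=lambda c: (-len(graph[c]), c)):
        used = {slots[n] for n in graph[course] if n in slots}
        slot = 0
        while slot in used:
            slot += 1
        slots[course] = slot
    return slots

# Hypothetical data: person (student or instructor) -> courses they attend or teach.
commitments = {
    "alice": ["Math", "Physics"],
    "bob": ["Physics", "Chemistry"],
    "prof_x": ["Math", "Chemistry"],
    "carol": ["History"],
}
graph = build_conflict_graph(commitments)
slots = greedy_timeslots(graph)
for course in sorted(slots):
    print(course, "-> timeslot", slots[course])
```

Math, Physics, and Chemistry conflict pairwise here, so they need three different timeslots, while History can reuse the first one.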
118 Differentiate between static and dynamic networks, discussing their structural
implications. CO6 5
Aspect | Static Networks | Dynamic Networks
Definition | Networks with fixed nodes and edges over time. | Networks where nodes and edges change over time.
Structure | Remains unchanged; no new connections or nodes are added. | Continuously evolving with changing connections and nodes.
Time Dependency | No time dependency; analysis is based on a single snapshot. | Time-dependent; analyzed at multiple time intervals.
Complexity | Simpler to analyze due to a fixed structure. | More complex due to changing structure and dynamics.
Applications | Suitable for static systems like transport networks, organizational structures. | Used in social networks, communication systems, and biological networks where change is constant.
119 Discuss the key challenges involved in analyzing dynamic social networks as
opposed to static networks. CO6 5
Analyzing dynamic social networks presents several challenges compared to
static networks:
1. Time-Dependent Data: Dynamic networks evolve over time, with nodes
and edges changing frequently. Analyzing such data requires tracking
temporal changes and modeling the network's evolution, making it more
complex than analyzing a fixed structure.
2. Data Volume and Complexity: Dynamic networks generate large
volumes of data as interactions between nodes change over time. This
high-frequency data poses storage, processing, and computational
challenges.
3. Network Stability: Dynamic networks may experience rapid
fluctuations or instability in structure, making it difficult to identify long-
term patterns or trends and complicating predictive analysis.
4. Real-Time Analysis: Unlike static networks, dynamic networks require
real-time monitoring and analysis to capture ongoing changes, which
increases the need for advanced algorithms and tools.
5. Community Detection: In dynamic networks, communities can form,
dissolve, or shift over time, making it challenging to detect and track
communities, as compared to static networks where community
structures are more stable.
These challenges require specialized techniques and algorithms to handle time-
varying interactions, large-scale data, and evolving network structures.
120 Discuss Ethics in Social Network Analysis with example. CO6 10
Ethics in Social Network Analysis (SNA) is crucial for ensuring the
responsible and respectful use of data, particularly when it involves personal or
sensitive information. The ethical considerations in SNA revolve around
privacy, consent, data usage, and the potential impact of analysis on individuals
and communities. Here's a breakdown of key ethical aspects with examples:
1. Privacy and Confidentiality:
 Concern: Social network analysis often involves collecting and
analyzing data from individuals' online interactions, which can include
personal information and behaviors.
 Example: A researcher analyzing Twitter data must ensure that personal
identifiers are anonymized to protect users’ privacy, especially when
analyzing sensitive topics like mental health or political opinions.
2. Informed Consent:
 Concern: Participants in social network studies must be fully informed
about how their data will be used and must voluntarily consent to its
collection.
 Example: In a study involving online communities, researchers should
obtain explicit consent from users before extracting their interaction data,
ensuring transparency about the purpose of the research and data sharing.
3. Data Security:
 Concern: Ensuring the secure storage and handling of collected data is
critical to protect against data breaches or misuse.
 Example: If an organization collects data from a social media platform
for analysis, they must implement strong security measures (e.g.,
encryption) to prevent unauthorized access to sensitive information.
4. Impact on Participants:
 Concern: Social network analysis can lead to unintended consequences
for individuals, such as reputation damage, social exclusion, or
stigmatization.
 Example: Analyzing online social networks to identify "influencers"
could lead to the unintentional exposure of users who may not want to be
highlighted, affecting their privacy or personal life.
5. Bias and Fairness:
 Concern: Social network analysis models can unintentionally perpetuate
bias if the data used is skewed or does not represent all groups fairly.
 Example: If an SNA model is used for hiring recommendations based on
professional networks, it may unintentionally favor individuals from
certain social or demographic groups, leading to discriminatory
outcomes.
6. Use of Data for Manipulative Purposes:
 Concern: Social network analysis can be used for manipulative or
harmful purposes, such as targeting vulnerable individuals with
misleading information or exploiting social behaviors.
 Example: Political campaigns or marketers may misuse social network
analysis to target individuals with personalized content, exploiting their
psychological vulnerabilities (e.g., micro-targeting with misleading ads).
7. Transparency and Accountability:
 Concern: Ethical SNA requires transparency in methodology, ensuring
that research processes and data sources are clear to the public or
participants.
 Example: A researcher publishing an SNA study on online
misinformation should provide clear information on how the data was
collected, analyzed, and the ethical guidelines followed.
Conclusion:
Ethics in Social Network Analysis is essential to ensure that the collection, use,
and interpretation of data do not harm individuals or communities. Researchers
and organizations must adhere to privacy standards, seek informed consent,
ensure transparency, and strive for fairness to avoid exploiting or causing
negative consequences for participants. Addressing these ethical concerns
ensures that social network analysis contributes positively to society without
infringing on individual rights.
121 Discuss privacy in online social networks. CO6 10
Privacy in online social networks is a critical concern as users share vast
amounts of personal information through platforms like Facebook, Twitter,
Instagram, and LinkedIn. These networks can expose sensitive data to a wider
audience, creating both opportunities and risks. Here’s a detailed discussion on
privacy issues in online social networks:
1. Personal Information Exposure:
 Concern: Users often knowingly or unknowingly share personal
details such as their location, relationship status, interests, and even daily
activities.
 Example: A user posts about a vacation, revealing their absence from
home, which can be exploited by malicious actors.
2. Data Mining and Profiling:
 Concern: Social media companies often mine user data to build detailed
profiles for advertising and other commercial purposes, which may
infringe on privacy.
 Example: Ads are targeted based on users’ likes, shares, and
interactions, sometimes even before they realize the data is being
collected.
3. Third-Party Access:
 Concern: Many social media platforms share user data with third parties,
such as advertisers, marketers, and other businesses, often without
explicit consent from users.
 Example: The Facebook-Cambridge Analytica scandal revealed how
personal data from millions of users was exploited for political targeting
without consent.
4. Informed Consent and Control:
 Concern: Users may not be fully informed about the extent of the data
being collected or the implications of sharing their data on social
platforms.
 Example: Social media privacy settings are often complex and not
always user-friendly, leading many users to unknowingly expose
personal data.
5. Cybersecurity Risks:
 Concern: Online social networks are prime targets for cyberattacks and
data breaches, which can lead to the exposure of users' private
information.
 Example: High-profile data breaches, such as those reported at LinkedIn
and Twitter, have exposed users' personal data, including email
addresses, phone numbers, and even passwords.
6. Privacy Violations by Apps:
 Concern: Many third-party apps connected to social networks collect
user data without adequate protection or transparency, leading to privacy
risks.
 Example: Some apps may access users' contacts, photos, or location
without their informed consent, exploiting this data for commercial
purposes.
7. Anonymity and Pseudonymity:
 Concern: Users may believe they can remain anonymous online, but
often their data can still be traced back to them through sophisticated
tracking methods.
 Example: Even using a pseudonym on platforms like Twitter may not
guarantee privacy, as data analytics can still uncover real identities
through interactions or cross-referencing.
8. Privacy Regulations:
 Concern: The lack of uniform privacy regulations across regions leaves
users vulnerable to privacy violations.
 Example: The European Union’s GDPR (General Data Protection
Regulation) provides strong privacy protections, but users in regions
without similar regulations may lack such safeguards.
9. User Control and Permissions:
 Concern: Users often have limited control over how their data is used,
shared, or stored by social media platforms, which may impact their
privacy.
 Example: Even with privacy settings, platforms like Facebook
sometimes change their policies or settings, leading users to
unknowingly share information they previously kept private.
10. Impact on Mental Health:
 Concern: Privacy violations and the pressure of managing one’s online
persona can negatively affect users' mental health, particularly when their
data is misused or exploited.
 Example: Instances of cyberbullying, online harassment, or unwanted
exposure to personal information can cause significant emotional
distress.
Conclusion:
Privacy in online social networks is a complex issue that requires constant
attention from both users and platforms. Users must be aware of how their data
is being used, and social networks should prioritize transparency, user control,
and data protection to safeguard privacy. Robust privacy policies and the
implementation of stringent security measures are essential in building trust and
ensuring that personal information is protected in an increasingly interconnected
digital world.
122 Suppose you are studying the evolution of online friendships in a social
networking site over a year. Design a methodology to capture and analyze
temporal changes in the network structure. CO6 10
To study the evolution of online friendships on a social networking site over a
year, the methodology should focus on capturing temporal changes in the
network structure, including the dynamics of friendships, interactions, and
structural shifts. Here's a designed methodology:
1. Data Collection:
 Timeline: Gather data at multiple time intervals (e.g., monthly,
quarterly) to track changes over the year.
 Data Points: Capture key data such as user ID, friendship relationships
(edges), timestamps of friend requests, acceptance, and interaction data
(messages, likes, comments).
 Platform API: Utilize the platform's API (e.g., Twitter API, Facebook
Graph API) to extract data on user relationships, interactions, and
demographic details.
 Metadata: Collect metadata such as user activity, frequency of
interactions, and changes in profiles (e.g., new interests, location
updates).
2. Data Cleaning and Preprocessing:
 Handling Missing Data: Address any missing information regarding
friendship status or interactions by using interpolation or imputation
methods if applicable.
 Normalization: Ensure consistency in data formats (e.g., timestamps,
user identifiers) and eliminate irrelevant data (spam accounts or non-
active users).
3. Network Representation:
 Graph Construction: Represent the network as an undirected graph
where nodes represent users, and edges represent friendships or
interactions.
 Dynamic Graphs: Construct temporal graphs at each time interval,
updating the edges (friendships) and adding new nodes or edges based on
changes in relationships.
4. Temporal Analysis:
 Edge Evolution: Track the creation, deletion, or modification of edges
over time (e.g., friendships forming or dissolving).
 Graph Metrics: Calculate key network metrics at each time interval,
such as:
o Degree Distribution: Changes in the number of connections each
user has.
o Clustering Coefficient: Measure of how users are clustered
within the network over time.
o Network Density: Changes in overall connectivity within the
network.
 Community Detection: Use algorithms like Louvain or Girvan-Newman
to identify evolving communities or subgroups within the network.
5. Statistical Analysis:
 Trend Analysis: Perform statistical tests (e.g., Pearson’s correlation,
ANOVA) to detect significant trends in friendship formation, changes in
network density, or interaction frequency over time.
 Growth Modeling: Use growth models (e.g., exponential or logistic
growth) to analyze how the network size and density change over the
year.
6. Visualization:
 Temporal Visualizations: Use tools like Gephi, or NetworkX together
with a plotting library such as Matplotlib, to create animated
visualizations showing how the network evolves over time, highlighting
key moments like the formation of new communities or the dissolution
of friendships.
 Heatmaps: Generate heatmaps to visualize user activity and interaction
patterns at different times.
7. Interpretation:
 Behavioral Insights: Analyze how users’ behaviors (e.g., post
frequency, interaction types) influence friendship dynamics.
 Community Evolution: Explore how communities form and evolve,
including the emergence of new subgroups or changes in existing ones.
 Social Influence: Study how external factors (e.g., events, trends)
influence friendship dynamics and network structure.
Conclusion:
By capturing temporal data at regular intervals and applying network analysis
techniques, this methodology provides insights into the evolution of online
friendships, allowing for the study of dynamic social interactions, community
growth, and changes in network structures over time.
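The snapshot-based steps above (graph construction, edge evolution, network density) can be sketched in plain Python. This is a minimal illustration under stated assumptions: the monthly friendship lists are made-up data, the helper names are hypothetical, and only users with at least one friendship are counted as nodes.

```python
def snapshot(pairs):
    """Build (nodes, edges) for one interval from undirected friendship pairs."""
    edges = {frozenset(pair) for pair in pairs}
    nodes = set().union(*edges)  # users with no friendships are not counted
    return nodes, edges

def density(nodes, edges):
    """Fraction of possible undirected edges that are present."""
    n = len(nodes)
    return 0.0 if n < 2 else 2 * len(edges) / (n * (n - 1))

def edge_changes(prev_edges, curr_edges):
    """Return (formed, dissolved) friendships between two snapshots."""
    return curr_edges - prev_edges, prev_edges - curr_edges

# Hypothetical monthly snapshots of friendship pairs.
monthly = {
    "2024-01": [("ann", "bo"), ("bo", "cy")],
    "2024-02": [("ann", "bo"), ("bo", "cy"), ("ann", "cy"), ("cy", "di")],
    "2024-03": [("ann", "bo"), ("ann", "cy"), ("cy", "di")],
}

report = []
prev_edges = None
for month, pairs in monthly.items():
    nodes, edges = snapshot(pairs)
    if prev_edges is None:
        formed, dissolved = edges, set()  # first snapshot: every edge is new
    else:
        formed, dissolved = edge_changes(prev_edges, edges)
    report.append((month, len(nodes), round(density(nodes, edges), 3),
                   len(formed), len(dissolved)))
    prev_edges = edges

for row in report:
    print(row)  # (month, users, density, friendships formed, friendships dissolved)
```

Keeping each interval as its own edge set makes formation and dissolution simple set differences; a long-running study would likely store timestamped edges instead of full copies.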
123 Imagine you are tasked with analyzing the spread of a viral marketing campaign
in a dynamic social network. Describe your approach, including data collection,
analysis techniques, and key metrics to track. CO6 10
To analyze the spread of a viral marketing campaign in a dynamic social
network, the approach must capture the temporal flow of information, identify
influential users, and measure campaign impact. Here's a structured
methodology:

1. Data Collection
 Platform APIs: Use social media APIs (e.g., Twitter, Instagram) to
collect data on shares, likes, retweets, comments, mentions, hashtags
related to the campaign.
 Timestamps: Record when users interacted with campaign content to
track diffusion over time.
 User Metadata: Collect user profile data (followers, interests, location)
to understand audience reach.
 Network Structure: Capture friend/follow relationships to construct the
social graph.

2. Network Construction
 Dynamic Graphs: Represent the network as a time-evolving graph
where:
o Nodes = users.
o Edges = interactions (e.g., retweets, mentions).
o Temporal Layers = snapshots of the network at different time
intervals (e.g., daily, hourly).

3. Analysis Techniques
 Diffusion Modeling: Apply models like SIR (Susceptible-Infected-
Recovered) or IC (Independent Cascade) to simulate and analyze how
the message spreads.
 Community Detection: Use algorithms like Louvain to detect
communities and analyze campaign spread within and across them.
 Influencer Identification: Use centrality measures (degree,
betweenness, eigenvector) to identify key users driving the spread.
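As a rough illustration of the Independent Cascade (IC) model named above, the sketch below gives each newly activated user one chance to activate each inactive follower with probability p. The follower graph, the seed user, and the values of p are hypothetical; a real study would estimate p from observed reshare data.

```python
import random

def independent_cascade(graph, seeds, p, rng):
    """Simulate one IC run; return the set of users who end up activated."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for user in frontier:
            for follower in graph.get(user, ()):
                # Each newly active user gets a single activation attempt per follower.
                if follower not in active and rng.random() < p:
                    active.add(follower)
                    next_frontier.append(follower)
        frontier = next_frontier
    return active

# Hypothetical follower graph: user -> users who see that user's posts.
followers = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": [],
    "d": [],
    "e": ["a"],
}

rng = random.Random(42)  # fixed seed so a run is repeatable
everyone_shares = independent_cascade(followers, ["a"], 1.0, rng)
nobody_shares = independent_cascade(followers, ["a"], 0.0, rng)
one_random_run = independent_cascade(followers, ["a"], 0.3, rng)
print(sorted(everyone_shares), sorted(nobody_shares), sorted(one_random_run))
```

Averaging the size of the activated set over many runs gives an estimate of expected spread for a given seed set, which is one way to compare candidate influencers.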

4. Key Metrics to Track


 Reach: Number of unique users who saw the campaign.
 Engagement Rate: Likes, shares, comments per user.
 Spread Speed: Time taken for the campaign to reach specific user
counts or network regions.
 Viral Coefficient: Average number of new users each participant brings
in.
 Adoption Curve: Visualize how fast users engaged over time (early
adopters vs late adopters).
 Cascade Size and Depth: Size = total number of reshares; Depth =
longest chain of shares.
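Several of these metrics can be read directly off the reshare records. The sketch below assumes a single cascade in which each user reshares at most once, so the records form a tree rooted at the original poster; the user names and the cascade_stats helper are hypothetical, and the viral coefficient is computed as reshares divided by participants, following the definition above.

```python
from collections import defaultdict

def cascade_stats(seed, reshares):
    """Return (size, depth, viral_coefficient) for one reshare tree.

    reshares is a list of (user, source) pairs: user reshared from source.
    """
    children = defaultdict(list)
    for user, source in reshares:
        children[source].append(user)

    participants = 0
    depth = 0
    stack = [(seed, 0)]  # depth-first walk from the original poster
    while stack:
        user, level = stack.pop()
        participants += 1
        depth = max(depth, level)
        for child in children[user]:
            stack.append((child, level + 1))

    size = participants - 1  # total reshares, excluding the original post
    viral_coefficient = size / participants
    return size, depth, viral_coefficient

# Hypothetical cascade: u0 posts, u1 and u2 reshare from u0, and so on.
reshares = [("u1", "u0"), ("u2", "u0"), ("u3", "u1"), ("u4", "u3")]
size, depth, viral_coefficient = cascade_stats("u0", reshares)
print(size, depth, viral_coefficient)
```

Here the longest chain is u0, u1, u3, u4, so the depth is 3 with 4 reshares in total.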

5. Visualization
 Temporal Network Animation: Show spread of the campaign over
time.
 Heatmaps: Visualize engagement levels across regions or demographics.
 Cascade Trees: Illustrate how the message propagated through different
user paths.
6. Interpretation & Reporting
 Identify trends in user behavior (e.g., peak sharing times, top
influencers).
 Evaluate campaign effectiveness by comparing predicted vs actual
spread.
 Recommend improvements for targeting, timing, and content based on
network response.

Conclusion
This approach combines dynamic network modeling, diffusion analysis, and
strategic metric tracking to offer a comprehensive understanding of how a viral
marketing campaign propagates and what drives its success.
124 Propose a methodology for analyzing the evolution of online communities in a
dynamic social network. Outline the steps involved, including data collection,
preprocessing, analysis techniques, and interpretation of results. CO6 10
Methodology for Analyzing the Evolution of Online Communities in a
Dynamic Social Network
(Short Answer – 10 Marks)
To analyze the evolution of online communities, the methodology must capture
changes in community structure over time. Below is a step-by-step approach:

1. Data Collection
 Source: Use APIs from platforms like Reddit, Twitter, or Facebook to
collect user interaction data (e.g., posts, comments, retweets, mentions).
 Data Types: Capture user IDs, interaction timestamps, content metadata,
and relationship data (followers/friends).
 Time Windowing: Organize data into discrete time intervals (e.g.,
weekly or monthly snapshots) to observe temporal changes.

2. Data Preprocessing
 Cleaning: Remove bots, spam accounts, and irrelevant interactions.
 Normalization: Standardize user IDs, timestamps, and interaction types.
 Edge Creation: Convert interactions into edges (e.g., comment/reply →
directed edge).
 Snapshot Construction: Create one graph snapshot for each time interval.

3. Network Construction
 Nodes: Represent users.
 Edges: Represent interactions (weighted if needed).
 Dynamic Graph: Combine all time-based snapshots to form a time-
evolving network.

4. Community Detection
 Algorithms: Apply community detection methods (e.g., modularity-based
Louvain, or Label Propagation) on each time-snapshot.
 Tracking Evolution: Use techniques like community matching or
evolution graphs to track merges, splits, births, and deaths of
communities.

5. Analysis Techniques
 Community Metrics: Size, density, cohesion, and modularity over time.
 User Roles: Identify core users, bridges, and influencers within
communities.
 Churn Analysis: Track users entering and leaving communities.
 Stability: Evaluate persistence of communities across time intervals.

6. Visualization
 Use tools like Gephi, Cytoscape, or NetworkX to:
o Create dynamic community maps.
o Animate changes in structure.
o Highlight community interactions and overlaps.

7. Interpretation of Results
 Community Growth Patterns: Identify when and why communities
expand or shrink.
 Trigger Events: Correlate structural changes with real-world or online
events (e.g., trending topics).
 User Influence: Understand how certain users affect community
formation or disruption.
 Health of Communities: Measure engagement, longevity, and
fragmentation.

Conclusion
This methodology enables a detailed understanding of how online communities
form, evolve, and dissolve over time, providing valuable insights into user
behavior, group dynamics, and the impact of events on social cohesion.
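The community-matching step can be sketched with Jaccard similarity between member sets in consecutive snapshots. This is one simple matching rule among several possible ones; the 0.3 threshold, the community labels, and the track_events helper are assumptions chosen for illustration.

```python
def jaccard(a, b):
    """Overlap of two member sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def track_events(prev, curr, threshold=0.3):
    """Label births, deaths, merges, and splits between two snapshots."""
    events = []
    successors = {name: [] for name in prev}
    for c_name, c_members in curr.items():
        preds = [p for p, members in prev.items()
                 if jaccard(members, c_members) >= threshold]
        for p in preds:
            successors[p].append(c_name)
        if not preds:
            events.append(("birth", c_name))
        elif len(preds) > 1:
            events.append(("merge", tuple(sorted(preds)), c_name))
    for p_name, succ in successors.items():
        if not succ:
            events.append(("death", p_name))
        elif len(succ) > 1:
            events.append(("split", p_name, tuple(sorted(succ))))
    return events

# Hypothetical communities (label -> member user IDs) in two consecutive snapshots.
week_1 = {"A": {1, 2, 3, 4}, "B": {5, 6, 7}, "C": {8, 9}}
week_2 = {"X": {1, 2, 3, 4, 5, 6, 7}, "Y": {10, 11}}

events = track_events(week_1, week_2)
for event in events:
    print(event)
```

In this toy data, A and B merge into X, Y appears with entirely new members, and C has no successor. A community with exactly one predecessor and one successor is treated as continuing and produces no event.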
125 Discuss different algorithms used to solve the Graph Coloring Problem and their
real-world applications. CO6 10
Graph Coloring Algorithms and Real-World Applications
(Short Answer – 10 Marks)
The Graph Coloring Problem involves assigning colors to the vertices of a
graph such that no two adjacent vertices share the same color. This is a
fundamental problem in computer science with numerous practical applications.
Below are key algorithms and their real-world uses:

1. Greedy Coloring Algorithm


 Approach: Assigns the lowest possible color to each vertex in a specific
order.
 Pros: Simple and fast for small or sparse graphs.
 Limitation: May not yield the minimum number of colors (non-optimal).
 Application: Register allocation in compilers.

2. Backtracking Algorithm
 Approach: Tries all possible color combinations recursively and
backtracks upon conflict.
 Pros: Finds optimal solutions.
 Limitation: Exponential worst-case running time, so impractical for large
graphs.
 Application: Timetable and exam scheduling in universities.

3. DSATUR (Degree of Saturation) Algorithm


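 Approach: At each step, chooses the uncolored vertex with the highest
number of differently colored neighbors (its saturation degree) and gives
it the lowest available color.
 Pros: Often uses fewer colors than fixed-order greedy coloring; exact on
bipartite graphs.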
 Application: Frequency assignment in mobile networks.

4. Welsh-Powell Algorithm
 Approach: Sorts vertices by decreasing degree and colors them
sequentially.
 Pros: Efficient and often requires fewer colors than greedy coloring with
an arbitrary vertex order.
 Application: Task scheduling in parallel processing.

5. Genetic Algorithms
 Approach: Uses evolutionary techniques like mutation and crossover to
evolve colorings.
 Pros: Handles large and complex graphs well.
 Application: Optimization problems in transportation and logistics.

6. Tabu Search
 Approach: Iterative local search using memory structures to avoid
cycles.
 Pros: Efficient for large-scale graphs.
 Application: Course scheduling and project resource allocation.

7. Simulated Annealing
 Approach: Probabilistic technique that explores solutions and accepts
worse ones to escape local optima.
 Pros: Good balance between solution quality and performance.
 Application: VLSI design and map coloring.

Conclusion
Different graph coloring algorithms offer trade-offs between accuracy and
efficiency. They are crucial in real-world applications such as scheduling,
register allocation, frequency assignment, and resource optimization,
making them indispensable tools in solving complex combinatorial problems.
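The trade-off between the greedy and backtracking approaches can be seen on a tiny example. The sketch below is illustrative only: the four-vertex path graph and the deliberately poor vertex order are chosen by hand, and the function names are hypothetical.

```python
def greedy_coloring(graph, order):
    """Color vertices in the given order with the lowest color not used by neighbors."""
    colors = {}
    for v in order:
        used = {colors[n] for n in graph[v] if n in colors}
        colors[v] = next(c for c in range(len(graph)) if c not in used)
    return colors

def k_coloring(graph, k):
    """Backtracking search: return a valid coloring with at most k colors, or None."""
    vertices = list(graph)
    colors = {}

    def assign(i):
        if i == len(vertices):
            return True
        v = vertices[i]
        for c in range(k):
            if all(colors.get(n) != c for n in graph[v]):
                colors[v] = c
                if assign(i + 1):
                    return True
                del colors[v]  # undo and try the next color
        return False

    return dict(colors) if assign(0) else None

def chromatic_number(graph):
    """Smallest k for which a valid coloring exists (exponential in the worst case)."""
    for k in range(1, len(graph) + 1):
        coloring = k_coloring(graph, k)
        if coloring is not None:
            return k, coloring

# Path graph a - b - c - d, stored as adjacency lists.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}

greedy = greedy_coloring(path, ["a", "d", "b", "c"])  # a poor order on purpose
k, exact = chromatic_number(path)
print(len(set(greedy.values())), k)
```

With this vertex order the greedy pass uses three colors, while the exhaustive search confirms that two are enough. On graphs with more than a few dozen vertices the exhaustive search quickly becomes impractical, which is why the heuristics above are used.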
126 Discuss the computational challenges in processing large-scale dynamic social
network data. CO6 10
Computational Challenges in Processing Large-Scale Dynamic Social
Network Data
(Short Answer – 10 Marks)
Analyzing large-scale dynamic social networks involves complex and resource-
intensive tasks. Key computational challenges include:

1. Scalability
 Issue: Social networks consist of millions of nodes and edges.
 Challenge: Algorithms must handle high memory and processing
demands efficiently.
 Example: Running community detection or shortest path algorithms on
Twitter-sized datasets.

2. Temporal Complexity
 Issue: Dynamic networks change over time (nodes/edges appear or
disappear).
 Challenge: Need for time-aware models and maintaining historical
states.
 Example: Tracking influence propagation or community evolution over
months or years.

3. Real-Time Processing
 Issue: Applications like fraud detection and recommendation systems
require immediate insights.
 Challenge: Continuous ingestion and real-time computation are difficult
at scale.
 Example: Detecting misinformation spread in real time on social media.

4. Data Heterogeneity
 Issue: Social data includes diverse formats: text, images, interactions,
location.
 Challenge: Integrating and analyzing multi-modal data increases
complexity.
 Example: Combining tweet content with retweet patterns and user
location data.

5. Storage and Management


 Issue: Massive volumes of structured and unstructured data.
 Challenge: Efficient storage, retrieval, and versioning of dynamic
snapshots.
 Example: Storing network states at daily intervals for a year.

6. Algorithm Adaptability
 Issue: Many traditional graph algorithms are not designed for dynamic
data.
 Challenge: Need for incremental or streaming versions of algorithms.
 Example: Incremental PageRank or modularity updates instead of
recomputing from scratch.
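The incremental idea can be illustrated with a simpler statistic than PageRank or modularity. The sketch below maintains edge and triangle counts as friendships are added or removed, touching only the two endpoints' neighborhoods instead of rescanning the whole graph. The StreamingGraph class and the sample updates are hypothetical.

```python
from collections import defaultdict

class StreamingGraph:
    """Undirected graph that keeps edge and triangle counts up to date per update."""

    def __init__(self):
        self.adj = defaultdict(set)
        self.edges = 0
        self.triangles = 0

    def add_edge(self, u, v):
        if u == v or v in self.adj[u]:
            return  # ignore self-loops and duplicate edges
        # Every common neighbor of u and v closes one new triangle.
        self.triangles += len(self.adj[u] & self.adj[v])
        self.adj[u].add(v)
        self.adj[v].add(u)
        self.edges += 1

    def remove_edge(self, u, v):
        if v not in self.adj[u]:
            return
        self.adj[u].discard(v)
        self.adj[v].discard(u)
        self.edges -= 1
        # Every common neighbor of u and v loses the triangle it formed with (u, v).
        self.triangles -= len(self.adj[u] & self.adj[v])

g = StreamingGraph()
for u, v in [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("a", "d")]:
    g.add_edge(u, v)
triangles_before = g.triangles  # triangles a-b-c and a-c-d

g.remove_edge("a", "c")  # the removed edge was part of both triangles
triangles_after = g.triangles
print(triangles_before, triangles_after, g.edges)
```

Each update costs time proportional to the smaller neighborhoods involved rather than the size of the network, which is the property incremental algorithms aim for.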

7. Noise and Uncertainty


 Issue: Social data often contains bots, fake accounts, or missing links.
 Challenge: Ensuring robustness and reliability of analysis.
 Example: Filtering out bot-generated interactions from true user
behavior.

8. Privacy and Security


 Issue: Handling sensitive user data in analysis.
 Challenge: Balancing analytical depth with data protection laws and
ethics.
 Example: Anonymizing user identities while maintaining network
structure for study.
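One common first step toward the anonymization example above is to replace user IDs with keyed hashes, so the same user always maps to the same pseudonym and the edge structure is preserved. The sketch below uses Python's standard hmac and hashlib modules; the key, the user names, and the 16-character truncation are assumptions. Pseudonymized IDs alone do not rule out re-identification from the network structure, so this should be read as a partial measure.

```python
import hashlib
import hmac

def pseudonymize(user_id, key):
    """Map a user ID to a stable pseudonym using a keyed SHA-256 hash."""
    digest = hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # truncated for readability (an assumption, not a standard)

def pseudonymize_edges(edges, key):
    """Replace both endpoints of every edge; the graph structure is unchanged."""
    return [(pseudonymize(u, key), pseudonymize(v, key)) for u, v in edges]

# Hypothetical secret key and edge list; a real key must be stored securely.
key = b"demo-secret-key"
edges = [("alice", "bob"), ("bob", "carol")]

anon = pseudonymize_edges(edges, key)
for u, v in anon:
    print(u, v)
```

Because the mapping is deterministic for a given key, degree, paths, and communities computed on the pseudonymized edge list match those of the original network.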

Conclusion
Processing large-scale dynamic social network data is computationally intensive
due to its volume, velocity, and complexity. Scalable, adaptive, and privacy-
aware algorithms and systems are essential for effective real-world analysis.
