York University EECS 4101/5101
Homework Assignment #7
November 1, 2024
Answers
Answer 1: What is Stored in the RBT
Each node of the Red-Black Tree (RBT) stores:
• Left child pointer
• Right child pointer
• Parent pointer
• Color of the node
• Data, which in this case is a pair (x, y)
Answer 2: Key for Sorting the RBT
Here, we use x as the key for sorting the RBT. Using x as the key allows us to
efficiently find values with x < a, which is necessary for calculating functions like
the mean and standard deviation of y-values with x-values less than a threshold.
Answer 3: Fields Added to Nodes (Augmented Fields)
We add three augmented fields to each node:
• Subtree size: The total number of nodes in the subtree rooted at the node.
• Sum of y-values: The sum of all y-values in the subtree rooted at the node.
• Sum of squared y-values: The sum of the squares of all y-values in the
subtree rooted at the node.
These augmented fields enable efficient calculation of the mean and standard
deviation of the y-values over all nodes with x < a.
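To see why these three fields are enough, the standard deviation can be rewritten in terms of a count, a sum, and a sum of squares. The derivation below is a sketch in assumed notation that does not appear in the original answer: S is the set of nodes with x < a, n = |S|, S_1 is the sum of y over S, and S_2 is the sum of y^2 over S. It uses the population variance (division by n), which matches the formula used in SD(a).

```latex
% Assumed notation: S = nodes with x < a, n = |S|,
% S_1 = sum of y over S, S_2 = sum of y^2 over S.
\begin{align*}
\mu &= \frac{S_1}{n}, \\
\sigma^2 &= \frac{1}{n}\sum_{i \in S} (y_i - \mu)^2
  = \frac{1}{n}\sum_{i \in S} y_i^2
    - 2\mu \cdot \frac{1}{n}\sum_{i \in S} y_i + \mu^2 \\
  &= \frac{S_2}{n} - 2\mu^2 + \mu^2
  = \frac{S_2}{n} - \mu^2 .
\end{align*}
```

So n, S_1, and S_2 together determine both the mean and the standard deviation, and each of them is a sum that can be maintained per subtree.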
Answer 4: Computing the Additional Fields Using Node Information
Each node’s augmented fields are computed using its own value and the values
from its left and right children. Here’s how each field is computed:
1. Subtree Size
The subtree size of a node is 1 (itself) plus the sizes of its left and right
subtrees:
node.size = 1 + (node.left.size if node.left else 0) + (node.right.size if node.right else 0)
2. Sum of y-values
The sum of y-values in the subtree is the y-value of the node plus the sum
of y-values in its left and right subtrees:
node.sum_y = node.y + (node.left.sum_y if node.left else 0) + (node.right.sum_y if node.right else 0)
3. Sum of Squared y-values
The sum of squared y-values is the square of the node’s y-value plus the
sums of squared y-values in its left and right subtrees:
node.sum_y_squared = node.y^2 + (node.left.sum_y_squared if node.left else 0) + (node.right.sum_y_squared if node.right else 0)
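The three rules above can be collected into one recomputation routine. The following is a minimal Python sketch, not the assignment's required implementation: the `Node` dataclass and the `update` helper are illustrative names, and the parent pointer and color are left out because they play no role in computing the fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    x: float                      # key
    y: float                      # value
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    size: int = 1                 # number of nodes in this subtree
    sum_y: float = 0.0            # sum of y over this subtree
    sum_y_squared: float = 0.0    # sum of y*y over this subtree


def update(node: Node) -> None:
    """Recompute node's augmented fields from its own y and its children."""
    node.size = 1
    node.sum_y = node.y
    node.sum_y_squared = node.y * node.y
    for child in (node.left, node.right):
        if child is not None:
            node.size += child.size
            node.sum_y += child.sum_y
            node.sum_y_squared += child.sum_y_squared


# Build a three-node tree bottom-up: children first, then the parent.
left, right = Node(1, 2.0), Node(3, 4.0)
update(left)
update(right)
root = Node(2, 3.0, left, right)
update(root)
```

Because `update` reads only the node and its two children, it runs in constant time, which is what makes the fields cheap to maintain.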
Whenever a node is inserted or deleted, these fields are recomputed for every
node on the path from the point of change up to the root. This takes O(log n)
time because the height of a Red-Black Tree is O(log n). A rotation changes the
subtrees of only the two nodes it involves, so their fields are recomputed in
O(1) time per rotation, lower node first.
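How a rotation keeps the fields consistent can be illustrated with a left rotation. This is a simplified sketch under stated assumptions: parent pointers and colors are omitted, and `Node`, `update`, and `rotate_left` are hypothetical helper names rather than part of the original answer.

```python
class Node:
    def __init__(self, x, y, left=None, right=None):
        self.x, self.y = x, y
        self.left, self.right = left, right
        update(self)


def update(node):
    """Recompute size, sum_y and sum_y_squared from the children."""
    kids = [c for c in (node.left, node.right) if c is not None]
    node.size = 1 + sum(c.size for c in kids)
    node.sum_y = node.y + sum(c.sum_y for c in kids)
    node.sum_y_squared = node.y ** 2 + sum(c.sum_y_squared for c in kids)


def rotate_left(p):
    """Left-rotate around p and return the new subtree root."""
    q = p.right
    p.right = q.left
    q.left = p
    update(p)   # p is now the lower node, so refresh it first
    update(q)   # q's fields depend on p's refreshed fields
    return q


# p = (1, 10) with right child q = (3, 30); q has left child (2, 20).
p = Node(1, 10, right=Node(3, 30, left=Node(2, 20)))
before = (p.size, p.sum_y, p.sum_y_squared)
q = rotate_left(p)
after = (q.size, q.sum_y, q.sum_y_squared)
```

The totals at the top of the rotated subtree are unchanged, so no ancestor needs to be touched because of the rotation itself.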
Answer 5: Implementing the Operations
Insert(x, y): Insert the node (x, y) into the RBT following standard insertion
rules. Update size, sum_y, and sum_y_squared on every node from the
inserted node up to the root, and recompute them for the nodes involved in
any rotation.
Time Complexity: O(log n)
Delete(x, y): Find and delete the node with key x and value y using standard
RBT deletion rules. Update size, sum_y, and sum_y_squared from the
parent of the node that is physically removed up to the root, and recompute
them for the nodes involved in any rotation.
Time Complexity: O(log n)
Search(x, y): Traverse the RBT using x as the key to find the node and check
if y matches the searched value.
Time Complexity: O(log n)
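The path update performed by Insert can be sketched as follows. This example deliberately uses a plain binary search tree insertion with no recoloring or rotations, which is a simplification and not the full RBT procedure; `Node` and `insert` are illustrative names. It shows only that the nodes on the search path are the ones whose fields change.

```python
class Node:
    def __init__(self, x, y):
        self.x, self.y = x, y
        self.left = self.right = None
        self.size, self.sum_y, self.sum_y_squared = 1, y, y * y


def insert(node, x, y):
    """Insert (x, y) below node and return the subtree root.

    Only the nodes on the search path are modified.
    """
    if node is None:
        return Node(x, y)
    if x < node.x:
        node.left = insert(node.left, x, y)
    else:
        node.right = insert(node.right, x, y)
    # Every ancestor of the new node gains exactly one descendant.
    node.size += 1
    node.sum_y += y
    node.sum_y_squared += y * y
    return node


root = None
for x, y in [(5, 1.0), (2, 2.0), (8, 3.0), (1, 4.0)]:
    root = insert(root, x, y)
```

Deletion is symmetric: each ancestor of the removed node loses one descendant, so the same three quantities are subtracted along the path.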
Mean(a)
function Mean(a):
    total_y = 0
    total_count = 0
    node = root
    while node is not NULL:
        if node.x < a:
            # node and its whole left subtree have x < a
            if node.left is not NULL:
                total_y += node.left.sum_y
                total_count += node.left.size
            total_y += node.y
            total_count += 1
            node = node.right
        else:
            # node and its whole right subtree have x >= a
            node = node.left
    if total_count == 0:
        return 0
    return total_y / total_count
Explanation: Traverse the tree, summing y-values and counting nodes with
x < a. For nodes where x < a, add the left subtree’s sum_y and size (if it exists)
to total_y and total_count, then add the current node’s y-value, increment the
count, and move to the right child; otherwise move to the left child. Return the
mean y-value as total_y / total_count, or 0 if no nodes meet the condition.
Time Complexity: O(log n)
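A runnable Python version of the Mean(a) walk is sketched below. It makes one assumption for the sake of a short example: the tree is built as a balanced binary search tree from sorted points by the hypothetical `build` helper instead of by RBT insertions. The query itself depends only on the search-tree ordering and the augmented fields, not on node colors.

```python
class Node:
    def __init__(self, x, y, left=None, right=None):
        self.x, self.y, self.left, self.right = x, y, left, right
        kids = [c for c in (left, right) if c is not None]
        self.size = 1 + sum(c.size for c in kids)
        self.sum_y = y + sum(c.sum_y for c in kids)


def build(points):
    """Build a balanced BST from points sorted by x (stand-in for an RBT)."""
    if not points:
        return None
    mid = len(points) // 2
    x, y = points[mid]
    return Node(x, y, build(points[:mid]), build(points[mid + 1:]))


def mean(root, a):
    """Mean of y over all nodes with x < a, or 0.0 if there are none."""
    total_y, total_count, node = 0.0, 0, root
    while node is not None:
        if node.x < a:
            if node.left is not None:      # whole left subtree has x < a
                total_y += node.left.sum_y
                total_count += node.left.size
            total_y += node.y
            total_count += 1
            node = node.right
        else:
            node = node.left
    return total_y / total_count if total_count else 0.0


points = [(x, float(x * x)) for x in range(1, 8)]   # x = 1..7, y = x^2
root = build(points)
```

For example, `mean(root, 4)` averages the y-values of x = 1, 2, 3, and `mean(root, 1)` falls back to 0.0 because no node has x < 1.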
SD(a)
function SD(a):
    total_y = 0
    total_y_squared = 0
    total_count = 0
    node = root
    while node is not NULL:
        if node.x < a:
            # node and its whole left subtree have x < a
            if node.left is not NULL:
                total_y += node.left.sum_y
                total_y_squared += node.left.sum_y_squared
                total_count += node.left.size
            total_y += node.y
            total_y_squared += node.y * node.y
            total_count += 1
            node = node.right
        else:
            node = node.left
    if total_count == 0:
        return 0
    mean_y = total_y / total_count
    variance = (total_y_squared / total_count) - (mean_y * mean_y)
    variance = max(variance, 0)   # guard against rounding slightly below zero
    return sqrt(variance)
Explanation: This function traverses the tree, summing y-values, squared
y-values, and counting nodes with x < a. For each node with x < a and its left
subtree, add the y and y^2 values (sum_y and sum_y_squared for the subtree) to
total_y and total_y_squared, respectively. Calculate the mean y-value as
mean_y = total_y / total_count and the variance as
total_y_squared / total_count - (mean_y)^2. Return the square root of the
variance as the standard deviation.
Time Complexity: O(log n)
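The SD(a) walk can be checked against a direct computation. The sketch below assumes the population standard deviation (division by the count), which is what the variance formula above computes, and compares the result with `statistics.pstdev` from the Python standard library. As a simplification, the hypothetical `build` helper constructs a balanced binary search tree from sorted points instead of performing RBT insertions.

```python
import math
import statistics


class Node:
    def __init__(self, x, y, left=None, right=None):
        self.x, self.y, self.left, self.right = x, y, left, right
        kids = [c for c in (left, right) if c is not None]
        self.size = 1 + sum(c.size for c in kids)
        self.sum_y = y + sum(c.sum_y for c in kids)
        self.sum_y_squared = y * y + sum(c.sum_y_squared for c in kids)


def build(points):
    """Build a balanced BST from points sorted by x (stand-in for an RBT)."""
    if not points:
        return None
    mid = len(points) // 2
    x, y = points[mid]
    return Node(x, y, build(points[:mid]), build(points[mid + 1:]))


def sd(root, a):
    """Population standard deviation of y over nodes with x < a (0.0 if none)."""
    count, total_y, total_y_squared, node = 0, 0.0, 0.0, root
    while node is not None:
        if node.x < a:
            if node.left is not None:      # whole left subtree has x < a
                count += node.left.size
                total_y += node.left.sum_y
                total_y_squared += node.left.sum_y_squared
            count += 1
            total_y += node.y
            total_y_squared += node.y * node.y
            node = node.right
        else:
            node = node.left
    if count == 0:
        return 0.0
    mean_y = total_y / count
    variance = total_y_squared / count - mean_y * mean_y
    return math.sqrt(max(variance, 0.0))   # clamp tiny negative rounding error


points = [(1, 2.0), (2, 4.0), (3, 4.0), (4, 4.0),
          (5, 5.0), (6, 5.0), (7, 7.0), (8, 9.0)]
root = build(points)
```

If the sample standard deviation (division by count - 1) were wanted instead, only the final formula would change; the traversal and the augmented fields stay the same.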
Summary of Worst-Case Time Complexities
• Insert: O(log n)
• Delete: O(log n)
• Search: O(log n)
• Mean: O(log n)
• SD: O(log n)
This implementation leverages the augmented fields in each node, ensuring
efficient computation of Mean(a) and SD(a) by traversing a single path in the
tree.