AmberFlux Interns Screening Test 6
Please read this first:
1. This test has 7 sections and a total of 20 questions. You are NOT required
to answer all of them; answer any 12 questions.
2. You can refer to any source (internet, any GenAI tool, published material, etc.) for
answering the questions. However, you should not ask any other person (friend,
relative, etc.) to answer the questions for you.
3. Allocate 3 hours for the test and try to complete it within that time. We expect you to
be honest about the time you have taken, as we are not monitoring how much time
you take.
4. Remember, we expect our interns to be honest, transparent, and truthful about their
work, and we trust people. Should you get selected and your quality of work does not
match how you have performed in the test and interview, your internship may
be terminated. Therefore, we highly recommend that you be sincere in your
approach.
Part 1 – Python Coding
Q1. Explain the differences between __getattribute__ and __getattr__ methods in Python
classes.
__getattribute__ vs __getattr__
__getattribute__
When Called: Every time any attribute is accessed.
Purpose: Overrides all attribute access; used for access control, logging, or restricting access.
Risk: Recursive calls can lead to infinite loops; use super().__getattribute__(name) for
safe access.
__getattr__
When Called: Only when the attribute is missing.
Purpose: Provides dynamic or default values for undefined attributes.
Risk: Lower risk of recursion; triggered only if attribute doesn’t exist.
Comparison at a Glance
Feature | __getattribute__ | __getattr__
When Called | Every time any attribute is accessed | Only when the attribute is missing
Usage Scope | General access control, logging, debugging | Dynamic, default, or computed attributes
Risk of Infinite Loop | Higher (especially if self.attr is used within the method) | Lower; only triggered when the attribute is missing
Typical Use Cases | Logging all access, restricting attribute access | Handling undefined attributes gracefully
Invocation Sequence | Runs before __getattr__ in the lookup order | Acts as a fallback when the attribute is not found
Example: Using Both in a Class
When both __getattribute__ and __getattr__ are defined, Python first calls __getattribute__. If
__getattribute__ does not find the attribute and raises an AttributeError, then __getattr__ is
called.
class Product:
    def __getattribute__(self, name):
        print(f"__getattribute__ checking '{name}'")
        try:
            return super().__getattribute__(name)
        except AttributeError:
            print(f"'{name}' not found in __getattribute__, delegating to __getattr__")
            raise

    def __getattr__(self, name):
        print(f"'{name}' handled by __getattr__ as a fallback")
        return f"Default value for '{name}'"

item = Product()
item.name = "Laptop"
print(item.name)   # Accesses existing attribute via __getattribute__
print(item.price)  # Falls back to __getattr__ since 'price' isn't defined
In Summary:
Use __getattribute__ for overarching access control and attribute behavior.
Use __getattr__ as a graceful fallback for handling or generating non-existent
attributes.
These methods give Python classes powerful, flexible control over attribute access, allowing
dynamic behaviors and improved debugging or logging capabilities.
Q2. Implement a context manager to handle database transactions (e.g., commit, rollback)
using Python's contextlib module.
Here's an example implementation of a context manager using Python's contextlib module
that handles database transactions:
import sqlite3
from contextlib import contextmanager

@contextmanager
def transaction(dbname):
    conn = sqlite3.connect(dbname)
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()
In this implementation:
- `dbname` is the name of the SQLite database file.
- `conn = sqlite3.connect(dbname)` creates a connection to the database.
- The `yield` statement hands the connection object to the body of the `with` block.
- If the `with` block completes successfully, the transaction is committed immediately after the `yield`.
- If the `with` block raises an exception, the `except` block rolls back the transaction and re-raises the exception.
- The `finally` block always closes the connection, whether the transaction was committed or rolled back.
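A brief usage sketch (assuming a `users` table already exists in `example.db`):

with transaction("example.db") as conn:
    conn.execute("INSERT INTO users (name) VALUES (?)", ("Alice",))
# Commits automatically on success; rolls back if the INSERT raises an exception.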
Q3. Write a function to compute the least common multiple (LCM) of two numbers using the
Greatest Common Divisor (GCD) method.
import math

def lcm(a, b):
    # Calculate GCD using math.gcd
    gcd_value = math.gcd(a, b)
    # Calculate LCM using the relationship LCM(a, b) = |a * b| / GCD(a, b)
    lcm_value = abs(a * b) // gcd_value
    return lcm_value

# Example usage
num1 = 12
num2 = 15
result = lcm(num1, num2)
print(f"The LCM of {num1} and {num2} is {result}")
Part 2 – MongoDB
Q4. Implement a MongoDB aggregation pipeline to calculate the top 10 products by sales
revenue, considering product variants and discounts.
db.products.aggregate([
  // Step 1: Break down the 'variants' array, creating a separate document for each variant
  { $unwind: "$variants" },

  // Step 2: Select fields and calculate revenue for each variant with the discount applied
  {
    $project: {
      product: 1,
      revenue: {
        $multiply: [
          "$variants.price",                        // Base price of the variant
          "$variants.soldQuantity",                 // Quantity sold of this variant
          { $subtract: [1, "$variants.discount"] }  // Apply discount (1 - discount rate)
        ]
      }
    }
  },

  // Step 3: Sum up the total revenue for each product across its variants
  {
    $group: {
      _id: "$product",
      totalRevenue: { $sum: "$revenue" }
    }
  },

  // Step 4: Sort products by total revenue in descending order
  { $sort: { totalRevenue: -1 } },

  // Step 5: Limit the result to the top 10 products by sales revenue
  { $limit: 10 }
]);
Explanation of Steps:
1. $unwind: Splits the variants array so each variant becomes a separate document,
making it easier to calculate individual revenue.
2. $project: Calculates revenue for each variant using the formula price * soldQuantity
* (1 - discount).
3. $group: Groups documents by product to calculate totalRevenue by summing up the
revenue of each variant.
4. $sort: Sorts the results in descending order by totalRevenue to list the products with
the highest revenue first.
5. $limit: Restricts the results to only the top 10 products based on totalRevenue.
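For reference, the pipeline assumes product documents shaped roughly like this (field names as used above; the discount is stored as a fractional rate):

{
  "product": "Laptop",
  "variants": [
    { "price": 1000, "soldQuantity": 5, "discount": 0.10 },
    { "price": 1200, "soldQuantity": 2, "discount": 0.05 }
  ]
}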
Q5. Describe the differences between MongoDB's WiredTiger and MMAPv1 storage engines.
How would you choose between them for a high-traffic application?
WiredTiger vs. MMAPv1: A Comparative Analysis
MongoDB offers two primary storage engines: WiredTiger and MMAPv1. Each engine has its
own strengths and weaknesses, making the choice dependent on specific application
requirements.
MMAPv1
Simple: MMAPv1 is a simpler storage engine, easier to implement and maintain.
Read-heavy workloads: It's well-suited for read-heavy workloads, especially when
data is accessed sequentially.
Lower memory footprint: MMAPv1 typically requires less memory overhead.
WiredTiger
High performance: WiredTiger is generally faster, especially for write-heavy and
mixed workloads.
Advanced features: It offers features like compression, encryption, and journaling,
enhancing data integrity and security.
Scalability: WiredTiger is better suited for large-scale deployments and high-traffic
applications.
Flexible schema: It allows for dynamic schema changes, making it more adaptable
to evolving data models.
Choosing the Right Storage Engine
When choosing between WiredTiger and MMAPv1 for a high-traffic application, consider the
following factors:
1. Workload:
o Read-heavy: MMAPv1 can be a good choice due to its simplicity and
performance for sequential reads.
o Write-heavy or mixed: WiredTiger is generally the better choice, especially
for high-write workloads and large datasets.
2. Data Volume:
o Small to medium datasets: MMAPv1 might be sufficient.
o Large datasets: WiredTiger's advanced features and scalability make it more
suitable.
3. Data Integrity and Security:
o High data integrity requirements: WiredTiger's journaling and
checkpointing mechanisms provide stronger data durability.
o Encryption: WiredTiger supports encryption, which is essential for sensitive
data.
4. Performance:
o High-performance requirements: WiredTiger's optimized algorithms and
data structures often deliver superior performance.
5. Flexibility:
o Evolving data model: WiredTiger's flexible schema allows for dynamic
changes, making it more adaptable.
In conclusion, for most high-traffic applications, WiredTiger is the recommended choice
due to its superior performance, advanced features, and scalability. MMAPv1 may still be seen in
legacy deployments with read-heavy workloads and simpler data models, but note that MMAPv1 was
deprecated in MongoDB 4.0 and removed in 4.2, so WiredTiger (the default engine since MongoDB 3.2)
is the only option on current versions. It's crucial to carefully evaluate your application's
specific needs and MongoDB version to make the best decision.
Part 3 – REST API
Q6. Implementing Authentication and Authorization in a REST API Using JSON Web
Tokens (JWT) and Role-Based Access Control (RBAC)
To implement secure authentication and authorization in a REST API, JSON Web Tokens (JWT)
are often used for stateless authentication. A JWT is a digitally signed token that carries
encoded JSON claims, typically including user identity information and access roles. Here's a
comprehensive approach:
1. JWT Authentication Flow:
o User Login: Upon login, the API verifies user credentials and, if valid,
generates a JWT containing claims like user ID, roles, and expiration time.
o Token Generation: Use a secret key (or a private key for asymmetric
signing) to sign the JWT. The token structure includes a header, payload,
and signature.
o Token Response: The JWT is returned to the client, usually in the
Authorization header for subsequent requests.
2. Authorization with RBAC:
o Middleware: A middleware function intercepts each request, decodes the
JWT, and validates it. After decoding, user roles are checked against required
permissions for the endpoint.
o Role-Based Access Control: Define roles like admin, user, or editor.
Permissions are assigned at the endpoint level, allowing role-based
authorization to restrict access based on user roles.
3. Example Code for Middleware
const jwt = require('jsonwebtoken');

function authenticateToken(req, res, next) {
  // Expect the header in the form "Bearer <token>"
  const authHeader = req.headers['authorization'];
  const token = authHeader && authHeader.split(' ')[1];
  if (!token) return res.sendStatus(401);
  jwt.verify(token, process.env.SECRET_KEY, (err, user) => {
    if (err) return res.sendStatus(403);
    req.user = user;
    next();
  });
}

function authorizeRoles(...allowedRoles) {
  return (req, res, next) => {
    if (!allowedRoles.includes(req.user.role)) {
      return res.sendStatus(403);
    }
    next();
  };
}
4. Securing Routes:
app.get('/admin', authenticateToken, authorizeRoles('admin'), (req, res) => {
  res.send('Welcome, Admin!');
});
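For completeness, a minimal sketch of the token-issuance step from the flow above (assuming an Express app, the jsonwebtoken package, and a hypothetical validateCredentials helper):

const express = require('express');
const jwt = require('jsonwebtoken');
const app = express();
app.use(express.json());

app.post('/login', async (req, res) => {
  const { username, password } = req.body;
  const user = await validateCredentials(username, password); // placeholder credential check
  if (!user) return res.sendStatus(401);
  // Sign a short-lived token carrying the user's identity and role
  const token = jwt.sign({ sub: user.id, role: user.role }, process.env.SECRET_KEY, { expiresIn: '1h' });
  res.json({ token });
});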
Q7. RESTful API Design Principles: HATEOAS and Richardson Maturity Model
1. HATEOAS (Hypermedia As The Engine Of Application State): HATEOAS requires that
each response from the API should include links to related resources, guiding the
client on possible actions. This decouples the client from hardcoded URIs and enables
more flexible interactions.
Example: For an e-commerce API, a GET /orders response could include a link to update the
order status:
"order_id": 123,
"status": "pending",
"links": [
{ "rel": "update", "href": "/orders/123/status", "method": "PUT" }
2. Richardson Maturity Model: This model measures REST maturity in four levels:
o Level 0: Single URI and non-REST, like a single endpoint handling all actions.
o Level 1: Introduces resources, where each resource has a unique URI.
o Level 2: Utilizes HTTP verbs (GET, POST, PUT, DELETE) for CRUD operations.
o Level 3: Implements HATEOAS, improving discoverability and reducing client-
server coupling.
3. Scalability Impact: By adhering to REST principles, APIs can scale better, support
multiple clients with minimal changes, and reduce network load by guiding clients
dynamically.
Part 4 – Docker & Kubernetes
Q8. Implement a Dockerfile to build a Node.js application with a non-root user
To build a Docker image for a Node.js application while ensuring security by using a non-root
user, follow these steps in the Dockerfile:
1. Use a Base Image: Start with a Node.js official image that is minimal for security
and performance.
2. Create a Non-root User: Add a non-root user and give them permissions to access
necessary files.
3. Copy the Application Files: Copy the Node.js application into the container.
4. Install Dependencies: Use the Node package manager (npm) to install the
application dependencies.
5. Set Permissions: Ensure that the non-root user has appropriate permissions to
access the files and run the application.
6. Expose Ports: Expose the port that the Node.js application listens on.
7. Run the Application: Use the non-root user to start the Node.js application.
Here’s an example Dockerfile for building a Node.js application with a non-root user:
# Step 1: Use an official Node.js image from Docker Hub
FROM node:16-alpine
# Step 2: Create a non-root user and group
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
# Step 3: Set the working directory for the application
WORKDIR /app
# Step 4: Copy package.json and package-lock.json first (to leverage Docker caching)
COPY package*.json ./
# Step 5: Install dependencies
RUN npm install
# Step 6: Copy the rest of the application files
COPY . .
# Step 7: Change ownership of the files to the non-root user
RUN chown -R appuser:appgroup /app
# Step 8: Switch to the non-root user
USER appuser
# Step 9: Expose the port the app will run on
EXPOSE 3000
# Step 10: Run the application
CMD ["npm", "start"]
Explanation:
FROM node:16-alpine: Starts from an official Node.js image based on Alpine Linux,
which is lightweight and optimized.
addgroup and adduser: Create a new non-root user and group for security.
WORKDIR: Sets the working directory for running commands in the container.
COPY: Copies the necessary application files into the container.
RUN npm install: Installs the application dependencies.
chown: Ensures the non-root user has ownership of all the application files.
USER appuser: Switches to the non-root user for running the application.
EXPOSE: Exposes port 3000 for the Node.js application.
CMD: Defines the command to run the application.
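To build and run the image locally (the tag my-node-app is just an example):
docker build -t my-node-app .
docker run -p 3000:3000 my-node-app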
Q9. Implement a Kubernetes Service for load balancing and exposing a Pod
To expose your application and provide load balancing, you can create a Kubernetes Service
of type LoadBalancer or ClusterIP. Here, we'll focus on a basic example using a
LoadBalancer Service for external access and automatic load balancing across Pods.
1. Define a Deployment: First, create a Deployment that runs your Pods.
2. Define a Service: Then, create a Kubernetes Service that exposes your Pods.
Here’s an example of a Kubernetes YAML configuration:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app-container
          image: my-app-image:latest
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: my-app-service
spec:
  selector:
    app: my-app
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
  type: LoadBalancer
Explanation:
Deployment: Defines a deployment with 3 replicas of the Pod. The Pod runs the
container based on the my-app-image:latest image and listens on port 80.
Service: The Service is of type LoadBalancer, which means it will provision a load
balancer in supported environments (e.g., AWS, GCP). It will automatically route
traffic to the Pods using the selector app: my-app.
o port: 80: The port the service exposes.
o targetPort: 80: The port inside the container where the app is running.
o type: LoadBalancer: Exposes the service to external traffic and enables load
balancing.
You can apply this configuration using kubectl apply -f my-app-deployment.yaml.
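Once the Service is created, the external IP assigned by the cloud provider can be checked with kubectl get service my-app-service.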
Q10. Write a Docker Compose file to deploy a multi-service application (e.g., web,
database, caching)
Docker Compose is used to define and run multi-container Docker applications. Below is an
example docker-compose.yml file for deploying a simple multi-service application with a web
server, a PostgreSQL database, and a caching service (Redis).
version: '3'

services:
  web:
    image: my-web-app:latest
    container_name: web
    ports:
      - "8080:80"
    environment:
      - DATABASE_URL=postgres://user:password@db:5432/mydatabase
      - REDIS_URL=redis://cache:6379
    depends_on:
      - db
      - cache

  db:
    image: postgres:alpine
    container_name: db
    environment:
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD=password
      - POSTGRES_DB=mydatabase
    volumes:
      - db-data:/var/lib/postgresql/data
    ports:
      - "5432:5432"

  cache:
    image: redis:alpine
    container_name: cache
    ports:
      - "6379:6379"

volumes:
  db-data:
Explanation:
web service: Runs the web application container (my-web-app:latest), maps host port
8080 to container port 80, and defines environment variables for the database and Redis
connection strings. The depends_on section only controls startup order: the db and cache
containers are started before web, but Compose does not wait for them to be ready to accept connections.
db service: Runs a PostgreSQL container, initializes it with a user, password, and
database, and mounts a volume (db-data) to persist database data across container
restarts.
cache service: Runs a Redis container, which is exposed on port 6379.
volumes: Defines a named volume (db-data) to persist PostgreSQL data.
To start the application, use:
docker-compose up -d
Q11. How would you handle Docker container logging and monitoring using tools
like ELK Stack or Fluentd?
Handling Docker container logging and monitoring effectively is critical for managing
production environments. Here's how you can use tools like ELK Stack (Elasticsearch,
Logstash, and Kibana) or Fluentd for logging and monitoring:
Logging with ELK Stack:
1. Elasticsearch: A distributed search and analytics engine where logs are stored and
indexed.
2. Logstash: Collects, parses, and ships logs from Docker containers to Elasticsearch.
3. Kibana: A visualization tool to display and query logs stored in Elasticsearch.
Steps to set up ELK Stack with Docker:
1. Create a Docker Network: You should create a Docker network to allow
communication between containers.
docker network create elk
2. Docker Compose for ELK: Use Docker Compose to define the ELK stack.
version: '3'

services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.10.0
    container_name: elasticsearch
    environment:
      - discovery.type=single-node
    ports:
      - "9200:9200"
    networks:
      - elk

  logstash:
    image: docker.elastic.co/logstash/logstash:7.10.0
    container_name: logstash
    ports:
      - "5044:5044"
    volumes:
      - ./logstash.conf:/usr/share/logstash/pipeline/logstash.conf
    networks:
      - elk

  kibana:
    image: docker.elastic.co/kibana/kibana:7.10.0
    container_name: kibana
    ports:
      - "5601:5601"
    networks:
      - elk

networks:
  elk:
    external: true
3. Logstash Configuration (logstash.conf):
input {
  beats {
    port => 5044
  }
}

output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    index => "docker-logs-%{+YYYY.MM.dd}"
  }
}
4. Configure Fluentd or Filebeat (Client-side log collector): Use Fluentd or Filebeat to
collect logs from Docker containers and forward them to Logstash.
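As a sketch of the Fluentd route (assuming a Fluentd agent is listening on localhost:24224), Docker's built-in fluentd logging driver can forward a container's stdout/stderr directly:
docker run --log-driver=fluentd --log-opt fluentd-address=localhost:24224 my-web-app:latest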
Monitoring with ELK Stack: In addition to logging, you can also use Elastic APM for
monitoring. It provides real-time insights into the performance of your Docker containers and
applications.
Setup Steps:
1. Install Elastic APM Agent: Install APM agents in your application containers to
track metrics.
2. Configure the Agent: Set up the agent to send data to the Elastic APM server.
3. Monitor in Kibana: Use Kibana to visualize application performance metrics and
logs in real-time.
By integrating Fluentd or Filebeat, logs are sent from containers to a centralized location
(Elasticsearch), and you can visualize the logs using Kibana.
Part 5 – GIT
Q12. What is the purpose of git stash? How does it differ from git reset --hard?
Git Stash: The purpose of git stash is to temporarily save uncommitted changes
(both staged and unstaged) so that you can work on something else or switch
branches without committing the changes. Afterward, you can apply the stashed
changes back into your working directory.
o Syntax: git stash
o You can apply the changes later using git stash apply or git stash pop.
Git Reset --hard: The command git reset --hard is used to reset your working
directory and staging area to a specific commit. Any changes to tracked files are lost,
and your working directory is reset to the state of the commit you specify.
o Syntax: git reset --hard <commit>
o Difference: git stash temporarily saves uncommitted changes and allows you
to restore them later, while git reset --hard discards all uncommitted changes
permanently.
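For illustration, a typical stash workflow (the branch name is hypothetical):
git stash                   # shelve the uncommitted changes
git checkout hotfix-branch  # switch branches and do other work
git checkout -              # return to the original branch
git stash pop               # reapply the stashed changes and drop them from the stash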
Q13. Explain the concept of GIT's "reflog". How is it useful?
Git Reflog is a reference log that records updates to the tip of branches in a Git
repository. Every time you perform an operation that changes the state of a branch
(e.g., commit, merge, rebase, reset), Git records it in the reflog.
Usefulness:
o It allows you to recover lost commits that may not appear in the branch
history due to actions like git reset or git checkout.
o You can view the history of changes in your repository, including operations
that aren’t visible in the regular commit history (such as rebases or resets).
o Command: git reflog
o Example use: If you accidentally reset to an earlier commit, you can use git
reflog to find the commit reference and restore it.
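For example, recovering from an accidental hard reset might look like this (HEAD@{2} is illustrative; use the entry shown by your reflog):
git reflog                 # locate the entry pointing at the lost commit, e.g. HEAD@{2}
git reset --hard HEAD@{2}  # move the branch back to that state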
Q14. Write a GIT command to create a patch file for a specific commit.
Command: git format-patch <commit_hash>
o This command generates a patch file for the given commit. You can specify a
commit by its hash or use other range selectors like HEAD~1 for the previous
commit.
Example: git format-patch -1 <commit_hash> will create a patch file for a single commit.
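The generated file (named like 0001-<subject>.patch) can then be applied elsewhere with git am <patch-file> to recreate the commit, or with git apply <patch-file> to apply only the changes without the commit metadata.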
Part 6 – Azure
Q15. Explain the differences between Azure's Hub-and-Spoke, Microservices, and
Event-Driven architectures.
Hub-and-Spoke Architecture:
o In Azure, this architecture involves a central hub (usually a Virtual Network)
that connects to multiple spokes (other VNETs or services).
o It simplifies network management and isolates workloads for security.
o Often used for managing internal services and resources in large-scale
organizations.
Microservices Architecture:
o A way of building applications as a collection of loosely coupled,
independently deployable services.
o Each microservice is focused on a specific business function and
communicates with others via APIs.
o Azure Kubernetes Service (AKS) is commonly used for deploying microservices
in Azure.
Event-Driven Architecture:
o Based on the production, detection, and consumption of events, enabling
decoupled components in an application.
o Services react to events (such as state changes) and are triggered
automatically when a relevant event occurs.
o Azure services like Azure Event Grid and Azure Functions are commonly used
to implement event-driven architectures.
Q16. How would you optimize Azure VM performance for compute-intensive
workloads?
Ways to optimize VM performance:
o Choose the right VM size: Select a VM with sufficient CPU, memory, and
storage to handle the workload.
o Use Premium Storage: Opt for Azure Premium SSDs for faster read/write
operations.
o Enable Azure Accelerated Networking: This reduces latency and improves
throughput for high-performance networking.
o Use Virtual Machine Scale Sets (VMSS): To scale out VMs automatically
based on demand and optimize resource allocation.
o Configure Load Balancers: Distribute traffic evenly across VMs to prevent
overloading any single instance.
o Use Managed Disks and SSD Storage: For faster disk operations.
o Adjust VM settings: Ensure that VM's settings such as CPU affinity, power
plans, and OS optimizations are configured for performance.
Q17. How would you implement Azure auto scale for a headless integration?
Auto-Scaling for Headless Integration:
o Azure Application Gateway: Automatically scale web applications behind
the Application Gateway.
o Azure VM Scale Sets: Set up Auto Scaling for VMs. You can define a scaling
rule based on CPU usage, memory, or custom metrics.
o Azure Functions with Consumption Plan: Auto-scale based on the number
of incoming events, ideal for serverless scenarios.
o Azure Logic Apps: Can scale workflows based on triggers, ideal for headless
APIs or integration with external services.
Part 7 – LLM Fine-Tuning Questions
Q18. Describe the differences between LLaMA-2 and LLaMA-3 architectures.
LLaMA-2: The second iteration of LLaMA (Large Language Model Meta AI) focuses on
improving the performance and efficiency of the architecture. It was designed to
perform well across a wide variety of language tasks with improvements in
generalization and efficiency.
LLaMA-3: Released in 2024, LLaMA-3 keeps the same decoder-only transformer design but
refines it. The initial release shipped 8B and 70B parameter models with a longer default
context window, a much larger tokenizer vocabulary, and pretraining on a substantially larger
and more carefully filtered corpus.
Key Differences:
o Tokenizer: LLaMA-3 uses a ~128K-token vocabulary versus LLaMA-2's 32K, improving
encoding efficiency, particularly for code and multilingual text.
o Attention: Grouped-Query Attention (GQA) is used in all released LLaMA-3 sizes,
whereas LLaMA-2 applied it only to its largest variant, improving inference efficiency.
o Context and data: The default context window doubles from 4K to 8K tokens, and the
pretraining corpus grows from roughly 2T to over 15T tokens, yielding stronger
performance at comparable parameter counts.
Q19. Compare the effects of different regularization techniques on LLaMA fine-
tuning.
Regularization Techniques:
o Dropout: Randomly sets some neurons' outputs to zero during training. This
helps prevent overfitting by forcing the network to rely on a broader set of
features.
o L2 Regularization (Weight Decay): Penalizes large weights, encouraging
the model to find simpler solutions. It helps reduce overfitting by adding a
term to the loss function that penalizes large weights.
o Data Augmentation: Creating new training samples by altering the original
data (e.g., rotation, cropping, noise). In LLaMA fine-tuning, this helps increase
the diversity of the training data.
o Early Stopping: Monitoring validation loss during training and stopping if the
loss stops improving, preventing the model from overfitting.
o Effectiveness: Regularization techniques like dropout and weight decay are
critical for fine-tuning LLaMA models, especially when training data is limited
or prone to overfitting.
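A minimal PyTorch sketch of how dropout, weight decay, and early stopping might be wired together during fine-tuning; the tiny stand-in model and the placeholder validation function are illustrative only, not an actual LLaMA setup:

import torch
import torch.nn as nn

# Stand-in model; in practice this would be the LLaMA checkpoint being fine-tuned.
model = nn.Sequential(
    nn.Linear(512, 512),
    nn.Dropout(p=0.1),   # dropout regularization
    nn.Linear(512, 512),
)

# L2 regularization (weight decay) applied through the optimizer
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5, weight_decay=0.01)

def validation_loss(model):
    # Placeholder: replace with a real pass over a held-out validation set.
    return torch.rand(1).item()

# Early stopping: halt when validation loss has not improved for `patience` epochs
best_loss, patience, bad_epochs = float("inf"), 3, 0
for epoch in range(100):
    # ... one epoch of fine-tuning would run here ...
    val_loss = validation_loss(model)
    if val_loss < best_loss:
        best_loss, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break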
Q20. Implement a simple learning rate scheduler for LLaMA fine-tuning using
PyTorch.
import torch
from torch.optim.lr_scheduler import StepLR

# Example of using a learning rate scheduler with LLaMA fine-tuning
# Assume model, train_loader, and compute_loss are already defined
model = ...  # LLaMA model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)

# Define the learning rate scheduler
scheduler = StepLR(optimizer, step_size=10, gamma=0.1)

# Training loop
for epoch in range(100):
    model.train()
    # Training logic here
    for batch in train_loader:
        optimizer.zero_grad()
        outputs = model(batch)
        loss = compute_loss(outputs, batch)
        loss.backward()
        optimizer.step()
    # Step the scheduler every epoch
    scheduler.step()
    print(f'Epoch {epoch}, Learning Rate: {scheduler.get_last_lr()[0]}')
This code implements a simple learning rate scheduler using StepLR, which reduces
the learning rate by a factor of gamma every step_size epochs.
----****----