BDA Assignment 5

Theem College of Engineering, Boisar

Student Name: Student Year: BE COMPS (2025-26)

Student Roll: Student Subject: BDA

All assignment descriptions can be found at:

https://drive.google.com/drive/folders/1ecIVjsOW83f2LmMACHk5uMpEjpSPJ8WC?usp=sharing
Assignment #1
Accessing and Analyzing Dataset using Amazon S3

2). To connect to the AWS Management Console, choose the AWS link in the upper-left
corner.

 A new browser tab opens and connects you to the console.

Tip: If a new browser tab does not open, a banner or icon is usually at the top of your
browser with the message that your browser is preventing the site from opening pop-up
windows. Choose the banner or icon, and then choose Allow pop-ups.

Task 1: Using an AWS Glue crawler with the GHCN-D dataset

3). Configure and create the AWS Glue crawler.


 In the AWS Management Console, in the search box next to Services, search for and choose AWS Glue to open the AWS Glue console.
 In the navigation pane, under Databases, choose Tables.
 Choose Add tables using crawler.
 For Name, enter Weather
 Expand the Tags (optional) section.

Notice that this is where you could add tags or extra security configurations. Keep the default settings.

 Choose Next at the bottom of the page.
 Choose Add a data source and configure the following:
o Data source: Choose S3.
o Location of S3 data: Choose In a different account.
o S3 path: Enter the following S3 bucket location for the publicly available dataset:

s3://noaa-ghcn-pds/csv/by_year/

o Subsequent crawler runs: Choose Crawl all sub-folders.
 Choose Add an S3 data source.
 Choose Next.
 For Existing IAM role, choose gluelab.

This role was provided in the lab environment for you. For reference, see the lab's CloudFormation template. The following is the YAML snippet for this role:

GlueLab:
  Type: AWS::IAM::Role
  Properties:
    RoleName: "gluelab"
    Path: "/"
    AssumeRolePolicyDocument:
      Version: 2012-10-17
      Statement:
        - Effect: Allow
          Principal:
            Service:
              - glue.amazonaws.com
          Action:
            - sts:AssumeRole
    ManagedPolicyArns:
      - arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole
      - arn:aws:iam::aws:policy/AmazonS3FullAccess

 Choose Next.
 In the Output configuration section, choose Add database.

A new browser tab opens.

 For Name, enter weatherdata
 Choose Create database.
 Return to the browser tab that is open to the Set output and scheduling page in the AWS Glue console.
 For Target database, choose the weatherdata database that you just created.

Tip: To refresh the list of available databases, choose the refresh icon to the right of the dropdown list.

 In the Crawler schedule section, for Frequency, keep the default On demand.
 Choose Next.
 Confirm your crawler configuration is similar to the following.

 Choose Create crawler.
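For reference, the same crawler could also be created from a terminal with the AWS CLI instead of the console wizard. The following is a minimal sketch, assuming the gluelab role and the weatherdata database created in this task; the console steps above remain the method used in this lab:

# Hypothetical CLI equivalent of the crawler configured in the console (sketch).
aws glue create-crawler \
  --name Weather \
  --role gluelab \
  --database-name weatherdata \
  --targets '{"S3Targets":[{"Path":"s3://noaa-ghcn-pds/csv/by_year/"}]}'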


To perform the extract and load steps of the ETL process, you will now run the crawler.
You can create AWS Glue crawlers to either run on demand or on a set schedule.
Because you created your crawler to run on demand, you must run the crawler to build
the database and generate the metadata.

4. Run the crawler.



o On the Crawlers page, select the Weather crawler that you just created.
o Choose Run.

The crawler state changes to Running.

Important: Wait for the status to change to Ready before moving to the next step. This
will take about 3 minutes.
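If you prefer to watch the crawler from a terminal instead of the console, the state can be polled with the AWS CLI. A minimal sketch, assuming the CLI is configured with credentials for this account and the crawler is named Weather as above:

# Poll the crawler until it returns to the READY state (sketch).
while [ "$(aws glue get-crawler --name Weather --query 'Crawler.State' --output text)" != "READY" ]; do
  sleep 30
done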

AWS Glue creates a table to store metadata about the GHCN-D dataset. Next, you will
inspect the data that AWS Glue captured about the data source.

5. Review the metadata that AWS Glue created.



o In the navigation pane, choose Databases.


o Choose the link for the weatherdata database.
o In the Tables section, choose the by_year link.

Review the metadata that the weather crawler captured, as shown in the following
screenshot. The schema lists the columns that the crawler discovered in the imported
dataset.

Now you will edit the schema of the database, which is part of transforming data in the
ETL process.

Edit the schema.

 From the Actions menu in the upper-right corner of the page, choose Edit schema.
 Change the column names according to the following table.

To change a column name, select the check box for the item that you want to modify,
and then choose Edit.

In the window that opens, change the value for the Name, and then choose Edit. Repeat these steps for each column name.

Note: AWS Glue supports column names in lowercase only.

Previous Name    New Name
id               station
date             date
element          type
data_value       observation
m_flag           mflag
q_flag           qflag
s_flag           sflag
obs_time         time

 Choose Update schema.

The schema for the table now looks like the following screenshot.
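As an optional check, the renamed columns can also be listed from the AWS CLI. A minimal sketch, assuming the weatherdata database and by_year table names used in this task:

# List the column names recorded in the Data Catalog for the by_year table (sketch).
aws glue get-table \
  --database-name weatherdata \
  --name by_year \
  --query 'Table.StorageDescriptor.Columns[].Name'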

Task 2: Querying a table by using Athena

Configure an S3 bucket to store Athena query results.

 In the navigation pane, under Databases, choose Tables.


 Choose the link for the by_year table.
 Choose Actions > View data.
 When the pop-up appears to warn you that you will be taken to the Athena console,
choose Proceed.

The Athena console opens. Notice the error message that indicates that an output location was not provided. Before you run a query in Athena, you need to specify an S3 bucket to hold query results.

 Choose the Settings tab.


 Choose Manage.
 To the right of Location of query result, choose Browse S3.
 Choose the bucket name that is similar to the following: data-science-bucket-XXXXXX

Important: Don't choose the bucket name that contains glue-1950-bucket.

 Select Choose.
 Keep the default settings for the other options, and choose Save.
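The same query result location could also be set from the AWS CLI instead of the console. A minimal sketch, assuming your queries run in the default primary workgroup and substituting your actual bucket name:

# Point the primary Athena workgroup at the query results bucket (sketch).
aws athena update-work-group \
  --work-group primary \
  --configuration-updates 'ResultConfigurationUpdates={OutputLocation=s3://data-science-bucket-XXXXXX/}'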

7. Preview a table in Athena.


o Choose the Editor tab.
o In the Data panel on the left, notice that the Data source is AwsDataCatalog.
o For Database, choose weatherdata.
o In the Tables section, choose the ellipsis (three dot) icon for the by_year table, and
then choose Preview Table.

Tip: To view the column names and their data types in this table, choose the icon to the
left of the table name.

The first 10 records from the weatherdata table display, similar to the following screenshot:

Notice the run time and amount of data that was scanned for the query. As you develop
more complex applications, it is important to minimize resource consumption to optimize
costs. You will see examples of how to optimize cost for Athena queries later in this
task.
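If you want to check the amount of data scanned without the console, the same preview can be run through the AWS CLI. A minimal sketch, assuming the query result location configured earlier in this task; the statistics are final once the query reaches the SUCCEEDED state:

# Start the preview query and capture its execution ID (sketch).
QUERY_ID=$(aws athena start-query-execution \
  --query-string 'SELECT * FROM "weatherdata"."by_year" LIMIT 10;' \
  --query-execution-context Database=weatherdata \
  --query 'QueryExecutionId' --output text)

# Report how many bytes the query scanned.
aws athena get-query-execution \
  --query-execution-id "$QUERY_ID" \
  --query 'QueryExecution.Statistics.DataScannedInBytes'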

9). Create a table for data after 1950.

First, you need to retrieve the name of the bucket that was created for you to store this
data.

 In the search box next to Services, search for and choose S3.
 In the Buckets list, copy the bucket name that contains glue-1950-bucket to a text editor of your choice.
 Return to the Athena query editor.
 Copy and paste the following query into a query tab in the editor. Replace <glue-1950-bucket> with the name of the bucket that you recorded:

CREATE TABLE weatherdata.late20th
WITH (
  format = 'PARQUET',
  external_location = 's3://<glue-1950-bucket>/lab3'
) AS
SELECT date, type, observation
FROM by_year
WHERE date/10000 BETWEEN 1950 AND 2015;

 Choose Run.

After the query runs, the run time and data scanned values are similar to the following:

Time in queue: 128 ms


Run time: 1 min 8.324 sec
Data scanned: 98.44 GB

10). Run a query on the new table.

First, create a view that only includes the maximum temperature reading, or TMAX, value.

 Run the following query in a new query tab:

CREATE VIEW TMAX AS
SELECT date, observation, type
FROM late20th
WHERE type = 'TMAX'

 To preview the results, in the Views section, to the right of the tmax view, choose
the ellipsis icon, and then choose Preview View.

11). Run the following query in a new query tab.

SELECT date/10000 AS Year, avg(observation)/10 AS Max
FROM tmax
GROUP BY date/10000
ORDER BY date/10000;

The purpose of this query is to calculate the average maximum temperature for each year in the dataset. Because dates are stored as integers in YYYYMMDD form, integer division by 10000 yields the year (for example, 20151231/10000 = 2015), and because GHCN-D temperature observations are recorded in tenths of a degree Celsius, dividing by 10 converts them to degrees Celsius.

After the query runs, the run time and data scanned values are similar to the following:

Time in queue: 0.211 sec


Run time: 25.109 sec
Data scanned: 2.45 GB

Task 3: Creating a CloudFormation template for an AWS Glue crawler

12. Find the Amazon Resource Name (ARN) for the gluelab IAM role. You need this ARN to deploy the CloudFormation template.
o In the search box next to Services, search for and choose IAM to open the IAM console.
o In the navigation pane, choose Roles.
o Choose the link for the gluelab role.

Tip: You can search for the role if needed.

The ARN is displayed on the page in the Summary section.

o Copy the ARN to a text editor to use in the next step.
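The same ARN can also be retrieved from a terminal with the AWS CLI, which avoids copy-and-paste errors. For example:

# Print the ARN of the gluelab role.
aws iam get-role --role-name gluelab --query 'Role.Arn' --output text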



13. Navigate to the AWS Cloud9 integrated development environment (IDE).


o In the search box next to Services, search for and choose Cloud9 to open the AWS
Cloud9 console.

AWS Cloud9 environments are listed.

o For the environment named Cloud9 Instance, choose Open IDE.

A new browser tab opens and displays the AWS Cloud9 IDE.

14. Create a new CloudFormation template.

 In the AWS Cloud9 IDE, choose File > New File.


 Save the empty file as gluecrawler.cf.yml but keep it open.
 Copy and paste the following code into the file:

AWSTemplateFormatVersion: '2010-09-09'
Parameters:
  # The name of the crawler to be created
  CFNCrawlerName:
    Type: String
    Default: cfn-crawler-weather
  CFNDatabaseName:
    Type: String
    Default: cfn-database-weather
  CFNTablePrefixName:
    Type: String
    Default: cfn_sample_1-weather
# Resources section defines metadata for the Data Catalog
Resources:
  # Create a database to contain tables created by the crawler
  CFNDatabaseWeather:
    Type: AWS::Glue::Database
    Properties:
      CatalogId: !Ref AWS::AccountId
      DatabaseInput:
        Name: !Ref CFNDatabaseName
        Description: "AWS Glue container to hold metadata tables for the weather crawler"
  # Create a crawler to crawl the weather data on a public S3 bucket
  CFNCrawlerWeather:
    Type: AWS::Glue::Crawler
    Properties:
      Name: !Ref CFNCrawlerName
      Role: <GLUELAB-ROLE-ARN>
      # Classifiers: none, use the default classifier
      Description: AWS Glue crawler to crawl weather data
      # Schedule: none, use default run-on-demand
      DatabaseName: !Ref CFNDatabaseName
      Targets:
        S3Targets:
          # Public S3 bucket with the weather data
          - Path: "s3://noaa-ghcn-pds/csv/by_year/"
      TablePrefix: !Ref CFNTablePrefixName
      SchemaChangePolicy:
        UpdateBehavior: "UPDATE_IN_DATABASE"
        DeleteBehavior: "LOG"
      Configuration: "{\"Version\":1.0,\"CrawlerOutput\":{\"Partitions\":{\"AddOrUpdateBehavior\":\"InheritFromTable\"},\"Tables\":{\"AddOrUpdateBehavior\":\"MergeNewColumns\"}}}"

15. To validate the CloudFormation template, run the following command in the AWS
Cloud9 terminal:

aws cloudformation validate-template --template-body file://gluecrawler.cf.yml

Note: If you receive an error that says YAML not well-formed, check the value for the name of the gluelab role. Also check the tabs and spacing for each line. YAML documents require exact spacing, and the parser will encounter errors if the spacing doesn't match.

If the template is validated, the following output displays:

{
    "Parameters": [
        {
            "ParameterKey": "CFNCrawlerName",
            "DefaultValue": "cfn-crawler-weather",
            "NoEcho": false
        },
        {
            "ParameterKey": "CFNTablePrefixName",
            "DefaultValue": "cfn_sample_1-weather",
            "NoEcho": false
        },
        {
            "ParameterKey": "CFNDatabaseName",
            "DefaultValue": "cfn-database-weather",
            "NoEcho": false
        }
    ]
}

Important: Don't go to the next step until the template is validated.



16. To create the CloudFormation stack, run the following command:

aws cloudformation create-stack --stack-name gluecrawler --template-body file://gluecrawler.cf.yml --capabilities CAPABILITY_NAMED_IAM

Note: The command includes the --capabilities parameter with the CAPABILITY_NAMED_IAM capability. This is because you are creating the following resources with custom names, which affect permissions:

o An AWS Glue crawler named cfn-crawler-weather


o An AWS Glue database named cfn-database-weather
o A table named cfn_sample_1-weather within the AWS Glue database

If the stack is validated, the CloudFormation ARN displays in the output, similar to the
following:

{
    "StackId": "arn:aws:cloudformation:us-east-1:338778555682:stack/gluecrawler/2d8cec90-5c42-11ec-8fbf-12034b0079a5"
}

The CloudFormation create-stack command creates the stack and deploys it. If validation passes and nothing causes the stack creation to roll back, proceed to the next step.

Tip: To check the progress of stack creation, navigate to the CloudFormation console.
In the navigation pane, choose Stacks.
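You can also follow the stack from the AWS Cloud9 terminal rather than the console. A minimal sketch:

# Block until stack creation finishes, then print the final status.
aws cloudformation wait stack-create-complete --stack-name gluecrawler
aws cloudformation describe-stacks --stack-name gluecrawler \
  --query 'Stacks[0].StackStatus' --output text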

17. To verify that the AWS Glue database was created in the stack, run the following command:

aws glue get-databases

The output is similar to the following:

{
    "DatabaseList": [
        {
            "Name": "cfn-database-weather",
            "Description": "AWS Glue container to hold metadata tables for the weather crawler",
            "Parameters": {},
            "CreateTime": 1649267047.0,
            "CreateTableDefaultPermissions": [
                {
                    "Principal": {
                        "DataLakePrincipalIdentifier": "IAM_ALLOWED_PRINCIPALS"
                    },
                    "Permissions": [
                        "ALL"
                    ]
                }
            ],
            "CatalogId": "034140262343"
        },
        {
            "Name": "weatherdata",
            "CreateTime": 1649263434.0,
            "CreateTableDefaultPermissions": [
                {
                    "Principal": {
                        "DataLakePrincipalIdentifier": "IAM_ALLOWED_PRINCIPALS"
                    },
                    "Permissions": [
                        "ALL"
                    ]
                }
            ],
            "CatalogId": "034140262343"
        }
    ]
}

18. Verify that the crawler was created in the stack.


o To verify that the crawler was created, run the following command:

aws glue list-crawlers

The output is similar to the following:

{
    "CrawlerNames": [
        "Weather",
        "cfn-crawler-weather"
    ]
}

o To retrieve the details of the crawler, run the following command.

aws glue get-crawler --name cfn-crawler-weather

The output is similar to the following:

{
    "Crawler": {
        "Name": "cfn-crawler-weather",
        "Role": "WeatherCrawler-001-CFNRoleWeather-17WB9OM5H5MFL",
        "Targets": {
            "S3Targets": [
                {
                    "Path": "s3://noaa-ghcn-pds/csv/by_year/",
                    "Exclusions": []
                }
            ],
            "JdbcTargets": [],
            "MongoDBTargets": [],
            "DynamoDBTargets": [],
            "CatalogTargets": [],
            "DeltaTargets": []
        },
        "DatabaseName": "cfn-database-weather",
        "Description": "AWS Glue crawler to crawl weather data",
        "Classifiers": [],
        "RecrawlPolicy": {
            "RecrawlBehavior": "CRAWL_EVERYTHING"
        },
        "SchemaChangePolicy": {
            "UpdateBehavior": "UPDATE_IN_DATABASE",
            "DeleteBehavior": "LOG"
        },
        "LineageConfiguration": {
            "CrawlerLineageSettings": "DISABLE"
        },
        "State": "READY",
        "TablePrefix": "cfn_sample_1-weather",
        "CrawlElapsedTime": 0,
        "CreationTime": 1649083535.0,
        "LastUpdated": 1649083535.0,
        "Version": 1,
        "Configuration": "{\"Version\":1.0,\"CrawlerOutput\":{\"Partitions\":{\"AddOrUpdateBehavior\":\"InheritFromTable\"},\"Tables\":{\"AddOrUpdateBehavior\":\"MergeNewColumns\"}}}",
        "LakeFormationConfiguration": {
            "UseLakeFormationCredentials": false,
            "AccountId": ""
        }
    }
}

Task 4: Reviewing the IAM policy for Athena and AWS Glue access

Review the Policy-For-Data-Scientists policy in IAM.

 In the search box to the right of Services, search for and choose IAM to open the IAM
console.
 In the navigation pane, choose Users.

Note that mary is one of the IAM users that is listed. This user is part of the DataScienceGroup IAM group.

 Choose the link for the DataScienceGroup IAM group.


 On the DataScienceGroup details page, choose the Permissions tab.
 In the list of policies that are attached to the group, choose the link for the Policy-For-Data-Scientists policy.

The Policy-For-Data-Scientists details page opens. Review the permissions that are associated with this policy. Notice that the permissions provide limited access for only the Athena, AWS Glue, and Amazon S3 services.
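The same policy can also be inspected from the AWS CLI. The following sketch assumes the DataScienceGroup group and Policy-For-Data-Scientists policy names shown above:

# List the policies attached to the group.
aws iam list-attached-group-policies --group-name DataScienceGroup

# Fetch the default version of the policy document.
POLICY_ARN=$(aws iam list-attached-group-policies --group-name DataScienceGroup \
  --query "AttachedPolicies[?PolicyName=='Policy-For-Data-Scientists'].PolicyArn" --output text)
VERSION_ID=$(aws iam get-policy --policy-arn "$POLICY_ARN" --query 'Policy.DefaultVersionId' --output text)
aws iam get-policy-version --policy-arn "$POLICY_ARN" --version-id "$VERSION_ID"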

Task 5: Confirming that Mary can access and use the AWS Glue crawler

20. Retrieve the credentials for the mary IAM user, and store these as bash variables.
o In the search box next to Services, search for and choose CloudFormation.
o In the navigation pane, choose Stacks.
o Choose the link for the stack that created the lab environment. The stack name includes
a random string of letters and numbers, and the stack should have the oldest creation
time.
o On the stack details page, choose the Outputs tab.

Note: When you create a CloudFormation template, you can choose to output information about the resources that the template will create. The CloudFormation template that created the resources in your lab environment output the access key and secret access key for the mary user.

o Copy the value of MarysAccessKey to your clipboard.


o Return to the AWS Cloud9 terminal.
o To create a variable for the access key, run the following command. Replace <ACCESS-KEY> with the value from your clipboard.

AK=<ACCESS-KEY>

o Return to the CloudFormation console, and copy the value of MarysSecretAccessKey to your clipboard.
o Return to the AWS Cloud9 terminal.
o To create a variable for the secret access key, run the following command. Replace
<SECRET-ACCESS-KEY> with the value from your clipboard.

SAK=<SECRET-ACCESS-KEY>

To test whether the mary user can perform a specific command, you can pass the user's credentials, stored in the bash variables AK and SAK, as environment variables on the command. The AWS CLI then attempts to perform that command as the specified user.
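For example, before testing any AWS Glue commands, you could confirm which identity the CLI is using when Mary's credentials are supplied; the Arn in the output should reference the mary user:

# Show the identity that the CLI calls are made as (sketch).
AWS_ACCESS_KEY_ID=$AK AWS_SECRET_ACCESS_KEY=$SAK aws sts get-caller-identity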

21. Test Mary's access to the AWS Glue crawler.


o To test whether the mary user can perform the list-crawlers command, run the following
command:

AWS_ACCESS_KEY_ID=$AK AWS_SECRET_ACCESS_KEY=$SAK aws glue list-crawlers

The output is similar to the following and looks like the output that was displayed after
you ran the command earlier:

{
    "CrawlerNames": [
        "Weather",
        "cfn-crawler-weather"
    ]
}

o To test whether the mary user can perform the get-crawler command, run the following
command:

AWS_ACCESS_KEY_ID=$AK AWS_SECRET_ACCESS_KEY=$SAK aws glue get-crawler --name cfn-crawler-weather

The output is similar to the following and looks like the output that was displayed after
you ran the command earlier. Note that the state of the crawler is READY, but no status
information is displayed. This is because the crawler hasn't run yet.

{
    "Crawler": {
        "Name": "cfn-crawler-weather",
        "Role": "gluelab",
        "Targets": {
            "S3Targets": [
                {
                    "Path": "s3://noaa-ghcn-pds/csv/by_year/",
                    "Exclusions": []
                }
            ],
            "JdbcTargets": [],
            "MongoDBTargets": [],
            "DynamoDBTargets": [],
            "CatalogTargets": [],
            "DeltaTargets": []
        },
        "DatabaseName": "cfn-database-weather",
        "Description": "AWS Glue crawler to crawl weather data",
        "Classifiers": [],
        "RecrawlPolicy": {
            "RecrawlBehavior": "CRAWL_EVERYTHING"
        },
        "SchemaChangePolicy": {
            "UpdateBehavior": "UPDATE_IN_DATABASE",
            "DeleteBehavior": "LOG"
        },
        "LineageConfiguration": {
            "CrawlerLineageSettings": "DISABLE"
        },
        "State": "READY",
        "TablePrefix": "cfn_sample_1-weather",
        "CrawlElapsedTime": 0,
        "CreationTime": 1649267047.0,
        "LastUpdated": 1649267047.0,
        "Version": 1,
        "Configuration": "{\"Version\":1.0,\"CrawlerOutput\":{\"Partitions\":{\"AddOrUpdateBehavior\":\"InheritFromTable\"},\"Tables\":{\"AddOrUpdateBehavior\":\"MergeNewColumns\"}}}",
        "LakeFormationConfiguration": {
            "UseLakeFormationCredentials": false,
            "AccountId": ""
        }
    }
}

22. Test that the mary user can run the crawler.
o Run the following command.

AWS_ACCESS_KEY_ID=$AK AWS_SECRET_ACCESS_KEY=$SAK aws glue start-crawler --name cfn-crawler-weather

If the crawler runs successfully, the terminal doesn't display any output.

o To observe the crawler running and adding data to the table, navigate to the AWS Glue
console.
o In the navigation pane, choose Crawlers.

Here you can see status information for the crawler, as shown in the following screenshot.

When the status changes to Ready, the crawler is finished running. It might take a few
minutes.

Return to the AWS Cloud9 terminal.

o To confirm that the crawler is finished running, run the following command.

AWS_ACCESS_KEY_ID=$AK AWS_SECRET_ACCESS_KEY=$SAK aws glue get-crawler --name cfn-crawler-weather

The output is similar to the following:

{
    "Crawler": {
        "Name": "cfn-crawler-weather",
        "Role": "gluelab",
        "Targets": {
            "S3Targets": [
                {
                    "Path": "s3://noaa-ghcn-pds/csv/by_year/",
                    "Exclusions": []
                }
            ],
            "JdbcTargets": [],
            "MongoDBTargets": [],
            "DynamoDBTargets": [],
            "CatalogTargets": [],
            "DeltaTargets": []
        },
        "DatabaseName": "cfn-database-weather",
        "Description": "AWS Glue crawler to crawl weather data",
        "Classifiers": [],
        "RecrawlPolicy": {
            "RecrawlBehavior": "CRAWL_EVERYTHING"
        },
        "SchemaChangePolicy": {
            "UpdateBehavior": "UPDATE_IN_DATABASE",
            "DeleteBehavior": "LOG"
        },
        "LineageConfiguration": {
            "CrawlerLineageSettings": "DISABLE"
        },
        "State": "READY",
        "TablePrefix": "cfn_sample_1-weather",
        "CrawlElapsedTime": 0,
        "CreationTime": 1649267047.0,
        "LastUpdated": 1649267047.0,
        "LastCrawl": {
            "Status": "SUCCEEDED",
            "LogGroup": "/aws-glue/crawlers",
            "LogStream": "cfn-crawler-weather",
            "MessagePrefix": "5ef3cff5-ce6c-45d5-8359-e223a4227570",
            "StartTime": 1649267649.0
        },
        "Version": 1,
        "Configuration": "{\"Version\":1.0,\"CrawlerOutput\":{\"Partitions\":{\"AddOrUpdateBehavior\":\"InheritFromTable\"},\"Tables\":{\"AddOrUpdateBehavior\":\"MergeNewColumns\"}}}",
        "LakeFormationConfiguration": {
            "UseLakeFormationCredentials": false,
            "AccountId": ""
        }
    }
}

Notice that the LastCrawl section is included, and the status in that section is SUCCEEDED. This means that Mary was able to run the crawler successfully.
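As an optional further check, crawler run metrics such as the number of tables created or updated can be retrieved with Mary's credentials, assuming the policy also allows the glue:GetCrawlerMetrics action:

# Retrieve run metrics for the crawler (sketch).
AWS_ACCESS_KEY_ID=$AK AWS_SECRET_ACCESS_KEY=$SAK aws glue get-crawler-metrics \
  --crawler-name-list cfn-crawler-weather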

23. To record your progress, choose Submit at the top of these instructions.
