CS 7650 Summer 2024 Syllabus

The Natural Language Processing (CS 7650) course, taught by Mark Riedl in Summer 2024, focuses on equipping students with the skills to develop NLP algorithms and systems. Prerequisites include proficiency in Python, familiarity with data structures, and a background in basic probability and linear algebra. The course includes programming assignments, quizzes, a written report, and a final exam, with a grading scale and policies on collaboration and academic integrity clearly outlined.


Natural Language Processing (CS 7650)

Summer 2024
Instructor: Mark Riedl ([email protected])

General Course Overview


Natural Language Processing (NLP) seeks to endow computers with the ability to intelligently
process human language. NLP components are used in conversational agents and other systems
that engage in dialogue with humans, automatic translation between human languages,
automatic answering of questions using large text collections, the extraction of structured
information from text, tools that help human authors, and many, many more.

This course will teach you the fundamental ideas used in key NLP components as well as current
state-of-the-art practice in developing NLP algorithms.

Prerequisites
Students are expected to be proficient in the Python programming language. Course
requirements include programming assignments written in Python
using Jupyter notebooks on a personal computer with a GPU or on Google Colab (or a similar
cloud development environment that makes GPUs available).

Students are expected to have a background in data structures and some exposure to
computational complexity (e.g., analysis of algorithm complexity, finite automata). Additionally,
students are expected to have a background in basic probability, linear algebra, and calculus.
Students are encouraged to have taken a course in artificial intelligence or machine learning;
familiarity with linear classifiers, perceptrons, naive Bayes, and logistic regression is especially
relevant. While helpful, this background is not assumed.

Course Materials and Resources


Online Textbook: Natural Language Processing (2018) by Jacob Eisenstein.

Computing Environment: Programming assignments will be done in Python using Jupyter
Notebooks. Assignments can be performed on personal computers with sufficiently modern
CPUs. Students may benefit from NVIDIA GPUs capable of running CUDA; however, we find that
a GPU is not strictly required. Google Colab is a cloud environment that runs Jupyter Notebooks and
can access free GPUs. The free tier of Colab has service limits, and a Google Colab Pro account,
which costs approximately $10 per month, provides access to more RAM and has higher caps on
GPU usage. Students may explore other comparable services, though we have not investigated
their practicality.

Canvas: Course materials, assignment downloads, quizzes, and exams.


Ed Discussion: Official announcements and discussion forums. All class discussions will
be on the Ed Discussion site. Here are some very specific guidelines for these discussions, which
must be adhered to:
• All posts must be professional and cordial and related to the course material at
hand.
• Students WILL NOT post specific answers to any of the assignments to Ed Discussion
before the due date of said assignments unless the TAs request it in a PRIVATE post.
• Before asking a question on the forum, students should search for an answer to their
question. It most probably has been discussed already.
• The instructor team will attempt to answer all questions where possible, but please do NOT
expect immediate responses. TAs are humans too, with courses and other
responsibilities. TAs are instructed to let students answer each other’s questions too, as
that supports more interactive learning. It is always wise to start programming
assignments early, and to help others out when you can. Set the example you hope to
benefit from later.
• Students can post anonymously to the class, but their IDENTITIES will be known by the
instructor team.
• The instructor team is required to maintain the privacy of all students, so please ensure that you
communicate with them privately (using the private channels via Ed Discussion).
• If there is a complaint about the class, please DO NOT post a public note to Ed
Discussion. Please communicate directly with the instructor team. We will do our best
to address it. If it is NOT addressed, please use OMS Assistance.

Gradescope: Students will download assignment instructions and materials from Canvas and
submit their solutions to Gradescope for grading.

No information will be shared via any other site (Facebook, etc.). Students are welcome to
create their own social media sites, but none of the instructors are required to be on those sites
and will not participate there regularly.

Grading
• Quizzes: 10%
• Programming Assignments: There will be 6 programming assignments:
o 1: Introduction to neural networks (10%)
o 2: Classification (10%)
o 3a: Language Modeling, part a (10%)
o 3b: Language Modeling, part b (10%)
o 4: Distributional Semantics (10%)
o 5: Mini-project (20%)
• Final Exam: There will be one final, comprehensive exam (10%)
• Written Report: In lieu of a midterm exam, you will be asked to select and review a
paper from a recent NLP conference (10%)
Programming Assignments
The primary goal of this class is to provide hands-on learning experiences building natural
language processing systems. We have broken these experiences into 6 programming
assignments.

Programming assignments will be conducted in Python using Jupyter notebooks. You may use
Google Colab, which natively supports Jupyter. You may also run the notebooks locally on your
own machine. If you run the code on your own machine, you should use JupyterLab version 4.0
or greater.

The first assignment will familiarize you with the programming environment and the PyTorch API (a
Python library for building and training neural networks).

Assignments 2, 3a, 3b, and 4 will walk you through the construction of increasingly
sophisticated natural language processing models, most of which will be based on neural
networks. This will largely involve filling in the code for pre-defined functions.

The final programming assignment, the mini-project, will ask you to prepare a
predetermined dataset and to build, train, and test a specified type of natural language
processing model from scratch. The notebook will walk you through the steps but will not
pre-define functions to be completed.

All programming assignments except the last will be graded via autograder. The autograder will
be provided as part of the notebook so that you can perform self-assessment. Grades will be
determined by the same autograder. Any attempt to modify the autograder will result in an
automatic zero. The mini-project programming assignment will be manually graded via code
inspection and written report.

Each programming assignment except the mini-project is worth 10% of the final grade. The
mini-project is worth 20% of the final grade. All assignments together are worth 70% of the final
grade.

Programming assignments will be submitted to Gradescope. Programming assignment
notebooks should have notebook cell outputs saved. This is important because the autograder
will report the grade in the cell outputs, which will then be verified by graders.
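
One way to confirm that outputs are saved before submitting is to inspect the notebook file itself. The sketch below is not part of the course tooling; it assumes the standard Jupyter notebook JSON layout (a top-level "cells" list in which each code cell carries "execution_count" and "outputs"), and the function names and file name are hypothetical. It flags non-empty code cells that appear never to have been run, which is only a heuristic for "outputs not saved".

```python
import json


def unexecuted_code_cells(notebook: dict) -> list[int]:
    """Return indices of non-empty code cells with no execution count and no outputs."""
    flagged = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells never carry outputs
        source = cell.get("source", "")
        text = "".join(source) if isinstance(source, list) else source
        if not text.strip():
            continue  # ignore empty code cells
        if cell.get("execution_count") is None and not cell.get("outputs"):
            flagged.append(index)
    return flagged


def check_notebook_file(path: str) -> list[int]:
    """Load an .ipynb file (hypothetical path) and run the check above."""
    with open(path, encoding="utf-8") as handle:
        return unexecuted_code_cells(json.load(handle))


# A tiny in-memory notebook used only to illustrate the check.
example_notebook = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Title"]},
        {"cell_type": "code", "source": ["print('ok')"], "execution_count": 1,
         "outputs": [{"output_type": "stream", "name": "stdout", "text": ["ok\n"]}]},
        {"cell_type": "code", "source": ["run_autograder()"], "execution_count": None,
         "outputs": []},
    ]
}
print(unexecuted_code_cells(example_notebook))
```

For a real submission, something like check_notebook_file("assignment.ipynb") would list the cells to re-run and save before uploading.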

Exams
The goal of the exams is to consolidate the knowledge and learning experience of students
about the materials covered in the course modules. There is one un-proctored test at the end of
the course, worth 10% of your overall grade. The test will be delivered via Canvas.
Written Report
In lieu of a midterm exam, you will select and review a conference paper from a recent NLP
conference. You will be provided a short list of papers to select from. The written
report will be no longer than 5 pages and will address a set of prompts/questions that we will
provide in advance, along with additional guidelines and information on effective paper reading.
Details on the prompts and exact paper length requirement will be provided when the report
assignment instructions are released. This assignment will be worth 10% of your overall grade.

Quizzes
Quizzes act as attention tests at the end of each module. Quizzes will involve multiple-choice
questions about the lecture materials and will be delivered via Canvas. Together, the quizzes
total 10% of the grade.

Grading Scale
Grading Scale (for each assignment/unit and for the entire class).
A: At or above 90%
B: 80%-89.99%
C: 70%-79.99%
D: 60%-69.99%
F: Below 60%
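
As an illustration of how the weights in the Grading section combine with this scale, here is a minimal sketch. The component names, the treatment of missing components as zero, and the absence of rounding are assumptions made for the example; the syllabus does not state a rounding policy.

```python
# Weights (in percent of the final grade) taken from the Grading section.
WEIGHTS = {
    "quizzes": 10,
    "hw1": 10,
    "hw2": 10,
    "hw3a": 10,
    "hw3b": 10,
    "hw4": 10,
    "mini_project": 20,
    "final_exam": 10,
    "written_report": 10,
}


def overall_percent(scores: dict) -> float:
    """Weighted average of component scores, each given on a 0-100 scale.

    Components missing from `scores` count as 0 (an assumption of this sketch).
    """
    return sum(weight * scores.get(name, 0) for name, weight in WEIGHTS.items()) / 100


def letter_grade(percent: float) -> str:
    """Map a percentage to a letter using the cutoffs in the Grading Scale."""
    for cutoff, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if percent >= cutoff:
            return letter
    return "F"


example_scores = {name: 85 for name in WEIGHTS}
example_scores["mini_project"] = 95
total = overall_percent(example_scores)
print(total, letter_grade(total))
```

Because the mini-project carries twice the weight of any other single component, a 10-point change in its score moves the overall percentage by 2 points.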

Late Day Policy


Over the course of the semester, you'll have 5 “free” late days to submit programming
assignments (except the final mini-project). A late day is used one minute after the due date. A
second late day is used 24 hours after that, and so on. Late days are determined by Gradescope
submission timestamps. Our intention is to give you some flexibility around your work
commitments, family obligations, vacations, and the like.

Additional rules:
• For every extra late day (past the 5 “free” late days) used, you'll incur a penalty of 25%
from your final grade.
• If you have a medical or family emergency, please contact the Dean of Students, who
may grant an exception to the late policy if your circumstances warrant it. We must
receive approval from the Dean to grant an exception.

Late days cannot be used on quizzes, the written report, exams, or the final mini-project.
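
The late-day counting described above can be sketched as follows. This is one reading of the policy, not the official Gradescope calculation: it treats any submission after the due time as using one late day and counts another for each further 24 hours started, and it reads the 25% penalty as accumulating per extra late day. The function names are hypothetical.

```python
import math
from datetime import datetime, timedelta

FREE_LATE_DAYS = 5            # stated in the policy
PENALTY_PER_EXTRA_DAY = 25    # percent, stated in the policy


def late_days_used(due: datetime, submitted: datetime) -> int:
    """Number of late days consumed by one submission (0 if on time)."""
    lateness = submitted - due
    if lateness <= timedelta(0):
        return 0
    # Any lateness at all uses a day; each further 24 hours started uses another.
    return math.ceil(lateness / timedelta(days=1))


def penalty_percent(late_days_per_assignment: list[int]) -> int:
    """Penalty incurred once the free late days are exhausted."""
    extra_days = max(0, sum(late_days_per_assignment) - FREE_LATE_DAYS)
    return extra_days * PENALTY_PER_EXTRA_DAY


due = datetime(2024, 5, 26, 23, 59)
print(late_days_used(due, due + timedelta(minutes=1)))           # 1
print(late_days_used(due, due + timedelta(hours=24, minutes=1))) # 2
print(penalty_percent([2, 3, 1]))                                # 25
```

Both timestamps must be expressed in the same time zone (or both be timezone-aware) for the subtraction to be meaningful.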

Regrading Policy
Regrade requests can be made via Gradescope. Please provide clear details as to why you are
requesting a regrade. All regrade requests must be made within ONE (1) week of the grade
release. For grades released in the last week of the term, the regrade request must be made by
the last day of the final exam week.
Due Dates
All due dates will be on Canvas, and the time zone will be Anywhere on Earth (AoE).
Please plan accordingly, especially around Daylight Saving Time changes in the US and in your
location.

Honor Code
Georgia Tech aims to cultivate a community based on trust, academic integrity, and honor.
Students are expected to act according to the highest ethical standards. The Georgia Tech
Academic Honor Code applies to all aspects of this course. Plagiarism is a violation of the
Academic Honor Code. To plagiarize is defined by Webster’s as “to steal and pass off (the ideas
or words of another) as one’s own; use (another's production) without crediting the source.”
Any student suspected of cheating or plagiarizing on a quiz, exam, or assignment will be
reported to the Office of Student Integrity, who will investigate the incident and identify the
appropriate penalty for violations. For any questions involving these or any other Academic
Honor Code issues, please consult us or http://catalog.gatech.edu/policies/honor-code/.

Collaboration Policy
Collaboration between students on work assigned in class is fine. You are encouraged to discuss
your work with each other. But each individual student MUST submit their own work, done
solely by themselves. In some cases, you may have had a fellow student or a non-student friend
help you with an assignment. You are REQUIRED to acknowledge any help you may have
received in completing the work assigned, even as small as suggesting a possible path to a
solution. Please be explicit and provide details. We will be checking for code plagiarism in our
assessment, so please do NOT copy code from the Web/Internet. Any code snippets must be
cited and limited to a maximum of 5 lines. We understand you may not be familiar with some
libraries and APIs presented in this class and you will likely look up usage examples for individual
functions. You may study these examples, but the code used in your assignment must be your
own.

To protect yourself and to protect others, we recommend the following heuristics when
communicating with others about course assignments:

1. Do not copy and paste your own code to share with someone else. If they in turn copy
your code into their assignment, this will trigger plagiarism detectors. You may want to
write out your suggestions in English (or your natural language of choice) or pseudocode
that is substantially different from Python.
2. Do not copy more than 5 lines of any code that you find on the internet or that is
communicated to you. When more than one person in a class copies the same piece of
code from the internet, this triggers plagiarism detectors. Instead, use this as a learning
episode: study the code, close down the source, and try to write it yourself. You will find
this more fulfilling in the long run than copying, even if it is less than 5 lines.
3. Do not share your screen when your Google Colab notebook is visible so that others can
see, record, and screen capture.
4. If communicating in person or via video conferencing, use the “whiteboard policy”: write
by hand (if possible) on a whiteboard app and erase the whiteboard afterward. Do not
take a photo or screen capture.

Deviating from these heuristics does not automatically qualify as academic misconduct;
however, following these heuristics greatly reduces the probability that your collaboration will
cross the line into misconduct.

As part of this course’s grading process, any suspicion of copying WILL be reported to the Office
of Student Integrity for further analysis.

All students must also ensure that they DO NOT make any of the code for problem sets publicly
available and are required to take steps to prevent future students from having access to it.
Consequently, if you're using any version control systems such as git, please make sure that you
mark your repositories as private.

You may not collaborate at all on the exams or quizzes. Students are not to discuss any
questions or answers from the exams with classmates or anyone else until after the testing
period is complete.

Use of ChatGPT and other Large Language Models


We treat AI-based assistance, such as ChatGPT, Copilot, Bard, Claude, GPT-3, GPT-4, or similar
(generally understood to be a language model with over 1 billion parameters), the same way we
treat collaboration with other people: you are welcome to talk about your ideas and work with
other people, both inside and outside the class, as well as with AI-based assistants. However, all
work you submit must be your own. You should never include in your assignment anything that
was not written directly by you without proper citation (including quotation marks and in-line
citation for direct quotes).

Anything in your assignment that you did not write and that lacks proper citation will be
treated as an academic misconduct case. If you are unsure where the line is between
collaborating with AI and copying from AI, we recommend the following heuristics:

1. Never hit "Copy" within your conversation with an AI assistant. You can copy your own
work into your conversation, but do not copy anything from the conversation back into
your assignment. Instead, use your interaction with the AI assistant as a learning
experience, then let your assignment reflect your improved understanding.
2. Do not have your assignment and the AI agent open at the same time. As above, use
your conversation with the AI as a learning experience, then close the interaction down,
open your assignment, and let your assignment reflect your revised knowledge. This
heuristic includes avoiding using AI directly integrated into your composition
environment: just as you should not let a classmate write content or code directly into
your submission, so also you should avoid using tools that directly add content to your
submission.
3. In the likely event that Google Colab integrates language-to-code generation capabilities,
it is recommended that you do not activate this function. You may never have to write
some pieces of code from scratch, but you will learn more by doing it yourself at least
once.

Deviating from these heuristics does not automatically qualify as academic misconduct;
however, following these heuristics essentially guarantees your collaboration will not cross the
line into misconduct.

Accommodations for Students with Disabilities

If you have learning needs that require special accommodation, contact the Office of Disability
Services at 404-894-2563 or http://disabilityservices.gatech.edu/ as soon as possible to make an
appointment, discuss your special needs, and obtain an accommodations letter. Please
also talk with us to discuss your learning needs.

Student-Faculty Expectations Agreement

At Georgia Tech, it is important to strive for an atmosphere of mutual respect,
acknowledgement, and responsibility between faculty members and the student body. Please
see http://catalog.gatech.edu/rules/22/ for an articulation of some basic expectations that you
can have of us and that we have of you. These were adopted by both the faculty senate and the
student government. In the end, simple respect for knowledge, hard work, and cordial
interactions will help build the environment we seek. Therefore, we encourage you to remain
committed to the ideals of Georgia Tech while in this class, especially during class discussion.

Inclusion
The Georgia Institute of Technology is committed to creating a campus free of discrimination on
the basis of race, color, religion, sex, national origin, age, disability, sexual orientation, gender
identity, or veteran status. We further affirm the importance of cultivating an intellectual
climate that allows us to better understand the similarities and differences of those who
constitute the Georgia Tech community, as well as the necessity of working against inequalities
that may also manifest here as they do in the broader society.

If you have questions

We’ll be using Ed Discussion in this course. Please use that forum to post questions and
comments. The instructional staff and other students will be of assistance there. The instructors
may offer live office hours at certain milestones during the semester.

If, after contacting your TA and the instructor, you do not feel your issue has been resolved, you
may escalate the issue by emailing [email protected].
Schedule
The following schedule is a suggested timeline for engaging with modules, though module
content can be accessed at any time. The due dates below are for planning purposes and are
subject to change. Official due dates will be on Canvas. Students are responsible for monitoring
Canvas and Ed Discussion for announcements about due dates.

| Week | Date | Module | Reading | Released | Due |
|---|---|---|---|---|---|
| 1 | May 13 | 1: Intro to NLP; 2: Foundations | Ch 1; Appendix A | May 15: HW 1 (neural nets), Module 1 quiz, Module 2 quiz | |
| 2 | May 20 | 3: Classification | Ch 2, 3 | May 20: HW 2 (classification), Module 3 quiz | May 26: HW 1, Module 1 quiz, Module 2 quiz, Module 3 quiz |
| 3 | May 27 | 4: Language models | Ch 6 | May 27: Module 4 quiz | June 2: Module 4 quiz |
| 4 | June 3 | 5: Semantics | Ch 14 | June 3: HW 3a (language models), Module 5 quiz | June 9: HW 2, Module 5 quiz |
| 5 | June 10 | 6: Modern Neural architectures | | June 10: HW 3b (language models), Module 6 quiz. HW 3b (language models), released 6/14 | June 16: HW 3a, Module 6 quiz |
| 6 | June 17 | 7: Information retrieval | | June 17: Written report, Module 7 quiz | June 23: Module 7 quiz |
| 7 | June 24 | 8: Task-oriented dialogue | Ch 19 | June 24: HW 4 (distributional semantics), Module 8 quiz | June 30: Written report, HW 3b, Module 8 quiz |
| 8 | July 1 | 9: Summarization | Ch 11 | July 1: Module 9 quiz | July 7: Module 9 quiz |
| 9 | July 8 | 10: Machine Reading; 11: Open-Domain Question-answering | | July 8: HW 5 (mini-project), Module 10 quiz, Module 11 quiz | July 14: HW 4, Module 10 quiz, Module 11 quiz |
| 10 | July 15 | 12: Machine Translation | | July 15: Module 12 quiz | July 21: Module 12 quiz |
| 11 | July 22 | 13: Privacy-Preserving NLP; 14: Responsible AI | | July 22: Final exam, Module 13 quiz, Module 14 quiz | July 28: Module 13 quiz, Module 14 quiz |
| 12 | July 29 | | | | July 30: Final exam, HW 5 |
| | Aug 1 | Last day of semester | | | |

All due dates are at midnight Anywhere on Earth (AoE). For example, HW 1 is due at midnight
AoE on Sunday, May 26, which is 8:00 AM Atlanta time (EDT) on Monday, May 27.
But don't worry, Canvas adjusts the due date based on your computer's locale; be sure your
computer locale is set correctly.
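
The AoE conversion can be checked with a few lines of Python. This is a minimal sketch under stated assumptions: AoE is taken as UTC-12, "midnight AoE on" a date is read as the end of that calendar day, Atlanta is assumed to be on EDT (UTC-4) throughout the summer term, and the May 26 date is the HW 1 due date listed in the schedule. The helper name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

AOE = timezone(timedelta(hours=-12), "AoE")   # Anywhere on Earth, UTC-12
EDT = timezone(timedelta(hours=-4), "EDT")    # assumed Atlanta offset in summer


def aoe_deadline(year: int, month: int, day: int) -> datetime:
    """Instant at which the given calendar day ends everywhere on Earth."""
    start_of_day = datetime(year, month, day, tzinfo=AOE)
    return start_of_day + timedelta(days=1)


deadline = aoe_deadline(2024, 5, 26)          # HW 1 due date from the schedule
atlanta = deadline.astimezone(EDT)
print(atlanta.isoformat())
```

Fixed offsets keep the sketch dependency-free; where time zone data is available, zoneinfo.ZoneInfo("America/New_York") would handle the switch between EDT and EST automatically.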
