T-Th 11:10–12:30, Cockins 312
Instructor: Michael White
What makes Siri and Alexa tick? How does Google Translate make sense of 100+ languages? And how the heck does ChatGPT work? (Or not?)
In this course, you will gain insight into the fundamentals of how computers are used to represent, process and organize textual and spoken information. We will cover the theory and practice of human language technology, going behind the scenes of internet search engines, spam filters, spell and grammar checkers, dialogue systems, automatic translators and more, discussing both how they work and why they often don’t. We will also consider social and ethical issues such as privacy, job creation and loss due to language technologies, and the nature of consciousness and machine intelligence.
General Education Goals and Expected Learning Outcomes
GEL Quantitative Reasoning: Mathematical and Logical Analysis
Goals: Students develop skills in quantitative literacy and logical reasoning, including the ability to identify valid arguments, and use mathematical models.
Expected Learning Outcomes:
Students comprehend mathematical concepts and methods adequate to construct valid arguments, understand inductive and deductive reasoning, and increase their general problem solving skills.
GEN Foundation: Mathematical and Quantitative Reasoning
Goals: Successful students will be able to apply quantitative or logical reasoning and/or mathematical/statistical methods to understand and solve problems and will be able to communicate their results.
Expected Learning Outcomes:
Successful students are able to:
- Use logical, mathematical and/or statistical concepts and methods to represent real-world situations.
- Use diverse logical, mathematical and/or statistical approaches, technologies and tools to communicate about data symbolically, visually, numerically and verbally.
- Draw appropriate inferences from data based on quantitative analysis and/or logical reasoning.
- Make and evaluate important assumptions in estimation, modeling, logical argumentation and/or data analysis.
- Evaluate social and ethical implications in mathematical and quantitative reasoning.
The course satisfies the goals and learning outcomes by using natural language systems to motivate students to exercise and develop a range of basic skills in formal and computational analysis. The course philosophy is to ground abstract concepts in real world examples. We introduce strings, regular expressions, finite-state and context-free grammars, as well as probabilistic algorithms defined over these structures and techniques for probing and evaluating systems that rely on these algorithms. The course goes beyond merely subjective evaluation of systems, emphasizing analysis and reasoning to draw and argue for valid conclusions about the design, capabilities and behavior of natural language systems.
We’ll be using the Carmen system for the schedule, homework and reading assignments. There will also be discussion forums for posting questions and providing feedback (comments, complaints or ideas) during the course.
Note that email from Carmen is sent to your official email address (Name.Number@osu.edu). You should read email sent to your official OSU account on a daily basis.
We’ll be using the draft second edition of the textbook, also entitled (not coincidentally!) Language and Computers, by Lelia Glass, Markus Dickinson, Chris Brew and Detmar Meurers (available on Carmen). We will also draw from the NLTK Book, entitled Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit, by Steven Bird, Ewan Klein, and Edward Loper. This book is freely available online. Finally, we’ll use some material from the draft third edition of Speech and Language Processing by Dan Jurafsky and James H. Martin, also currently available free online. Other readings may also be assigned periodically.
Online quizzes will assess your understanding of the readings prior to the classes covering the material. Classes will be dedicated to in-class activities that explore selected topics in greater depth as well as topics not covered by the textbooks.
Materials for in-class activities for each unit will be posted on Carmen, as will the slides presented in class. These slides are meant to aid classroom discussion and cannot replace actually being in class.
The basic requirement is regular attendance in class and active participation. There will be one to two quizzes and (roughly) one homework assignment per textbook chapter, which will give you the opportunity to explore new aspects of the topics discussed in class. There will also be an essay on social/ethical considerations involving language technology. The midterm will be on the material covered in the first half of the class; the final will be on the material covered in the second half of the class, assuming the material from the first half as background knowledge.
You will need to bring a laptop or tablet to each class session for in-class activities. You may either use your own device or obtain a loaner via the Student Technology Loan Program.
For the in-class activities (and some homework exercises) we will make use of Google Colab notebooks, which run on a cloud-based service and combine textual descriptions with executable Python code. These notebooks are very convenient, especially insofar as they avoid the need to install any software on your own device. Using them requires you to use (or create) your own Google account, subject to Google’s terms of service. Alternatively, you can choose to install Python on your own device, along with various Python libraries including NLTK. Similarly, some activities will involve using large language models such as OpenAI’s ChatGPT or Google’s Bard, which have their own terms of service; alternative activities will be made available upon timely request.
While this course does not have Python coding skills as a prerequisite, and does not aim to teach you to be a proficient Python programmer, through the course you will learn enough Python coding basics to get started with using natural language data for linguistic analysis and data science.
Grades will be assigned according to the following scheme:
- Quizzes (5%): Quizzes will be administered online through Carmen and are due by midnight of the day indicated. The quizzes are naturally open book, but you should finish the reading before attempting them as only one attempt is allowed. They will be shut off automatically once the deadline is reached, so do not put them off to the last minute! Note that I do not promise to remind you when you have a quiz due; it is your responsibility to keep up with the schedule on Carmen. The lowest quiz grade will be dropped.
- Homework assignments (40%): Homework assignments are due by the beginning of class, in Carmen. PDF format is preferred. No late homeworks will be accepted. The lowest homework grade will be dropped.
Homeworks should be done individually. Homework problems are typically similar to ones explored in groups during class; as such, regular attendance and active participation are vital to doing well on the homeworks.
- Essay (15%): A 1000–1500 word essay on a topic dealing with the social implications of language technology.
- Midterm exam (20%): The midterm will be given in class on Tuesday, October 17.
- Final exam (20%): The final will be given on Monday, December 11 (12:00–1:45).
- Class participation (+5%): Given that the homeworks and exams reflect the material covered in class, attendance is essential for doing well in this course, as is your active participation in class discussion and in-class activities. As such, participation will contribute bonus credit of up to 5% to your grade, based on the number of in-class activities completed.
Grades will be assigned using the standard OSU scale.
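As a rough illustration only (not an official grade calculator), the weighting above can be sketched in Python. The sketch assumes that every score is on a 0–100 scale, that quizzes and homeworks are each averaged with equal weight after dropping the lowest one, and that the participation bonus is proportional to the fraction of in-class activities completed. The function names and the proportional-bonus rule are hypothetical; the actual computation in Carmen may differ.

```python
# Component weights taken from the grading scheme above (they sum to 100%).
WEIGHTS = {
    "quizzes": 0.05,
    "homework": 0.40,
    "essay": 0.15,
    "midterm": 0.20,
    "final": 0.20,
}
MAX_BONUS = 5.0  # participation bonus, in percentage points


def drop_lowest(scores):
    """Average a list of 0-100 scores after dropping the single lowest one."""
    if not scores:
        return 0.0
    if len(scores) == 1:
        return float(scores[0])
    kept = sorted(scores)[1:]  # discard the lowest score
    return sum(kept) / len(kept)


def course_percentage(quizzes, homeworks, essay, midterm, final,
                      activities_done, activities_total):
    """Weighted course percentage plus participation bonus.

    ASSUMPTION: the bonus scales linearly with the share of in-class
    activities completed; the syllabus only says it is based on the
    number completed. The result is not capped at 100 here.
    """
    components = {
        "quizzes": drop_lowest(quizzes),
        "homework": drop_lowest(homeworks),
        "essay": essay,
        "midterm": midterm,
        "final": final,
    }
    base = sum(WEIGHTS[name] * components[name] for name in WEIGHTS)
    bonus = MAX_BONUS * activities_done / activities_total if activities_total else 0.0
    return base + bonus


if __name__ == "__main__":
    total = course_percentage(
        quizzes=[80, 100, 60],     # 60 is dropped, average 90
        homeworks=[70, 90, 85],    # 70 is dropped, average 87.5
        essay=88, midterm=75, final=82,
        activities_done=20, activities_total=25,
    )
    print(round(total, 1))  # 88.1
```

In this made-up example the weighted components contribute 84.1 points and participation adds 4.0, for a total of 88.1 before conversion to a letter grade.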
If you know you won’t be able to make a deadline or exam, please see me before you miss the deadline or exam. If you miss the midterm or final, you will have to provide extensive written documentation for your excuse.
I expect you to respect one another, to respect me, and to respect yourself. To that end, I expect you to obey the following rules:
- Participate: share experiences, ask questions, express your opinions. Ask me to provide more information, send me emails or see me during office hours for help, clarification, or recommendations for further research.
- Do not read newspapers, materials from other classes, Instagram posts, email, etc. in class. Do not pack up early. Switch off your cell phone. If for some reason you must leave early or you have an important call coming in, notify me before class.
Policy on Academic Misconduct
It is the responsibility of the Committee on Academic Misconduct to investigate or establish procedures for the investigation of all reported cases of student academic misconduct. The term “academic misconduct” includes all forms of student academic misconduct wherever committed; illustrated by, but not limited to, cases of plagiarism and dishonest practices in connection with examinations. Instructors shall report all instances of alleged academic misconduct to the committee (Faculty Rule 3335-5-487). For additional information, see the Code of Student Conduct.
Artificial Intelligence and Academic Integrity
There has been a significant increase in the popularity and availability of a variety of generative artificial intelligence (AI) tools, including ChatGPT, Sudowrite and others. These tools will help shape the future of work, research and technology but when used in the wrong way, they can stand in conflict with academic integrity at Ohio State.
All students have important obligations under the Code of Student Conduct to complete all academic and scholarly activities with fairness and honesty. Our professional students also have the responsibility to uphold the professional and ethical standards found in their respective academic honor codes. Specifically, students are not to use unauthorized assistance in the laboratory, on field work, in scholarship or on a course assignment unless such assistance has been authorized specifically by the course instructor. In addition, students are not to submit their work without acknowledging any word-for-word use and/or paraphrasing of writing, ideas or other work that is not their own. These requirements apply to all students: undergraduate, graduate, and professional.
To maintain a culture of integrity and respect, these generative AI tools should not be used in the completion of course assignments unless an instructor for a given course specifically authorizes their use. Some instructors may approve of using generative AI tools in the academic setting for specific goals. However, these tools should be used only with the explicit and clear permission of each individual instructor, and then only in the ways allowed by the instructor.
Students with Disabilities
The university strives to maintain a healthy and accessible environment to support student learning in and out of the classroom. If you anticipate or experience academic barriers based on your disability (including mental health, chronic, or temporary medical conditions), please let me know immediately so that we can privately discuss options. To establish reasonable accommodations, I may request that you register with Student Life Disability Services. After registration, make arrangements with me as soon as possible to discuss your accommodations so that they may be implemented in a timely fashion.
If you are isolating while waiting for a COVID-19 test result, please let me know immediately. Those testing positive for COVID-19 should refer to the Safe and Healthy Buckeyes site for resources. Beyond five days of the required COVID-19 isolation period, I may rely on Student Life Disability Services to establish further reasonable accommodations. SLDS contact information: email@example.com; 614-292-3307; slds.osu.edu.
It is Ohio State’s policy to reasonably accommodate the sincerely held religious beliefs and practices of all students. The policy permits a student to be absent for up to three days each academic semester for reasons of faith or religious or spiritual belief.
Students planning to use religious beliefs or practices accommodations for course requirements must inform the instructor in writing no later than 14 days after the course begins. The instructor is then responsible for scheduling an alternative time and date for the course requirement, which may be before or after the original time and date of the course requirement. These alternative accommodations will remain confidential. It is the student’s responsibility to ensure that all course assignments are completed.
Weather or Other Short-Term Closing
Should in-person classes be canceled, we will meet virtually via CarmenZoom during our regularly scheduled time. I will share any updates via Carmen and email.
This syllabus is subject to change. All important changes will be made in writing (email), with ample time for adjustment.