LING 3802 — Language and Computers (Spring ’24)

T-Th 11:10–12:30, Baker 285
Instructor: Michael White

Description

What makes Siri and Alexa tick? How does Google Translate make sense of 100+ languages? And how the heck does ChatGPT work? (Or not?)

In this course, you will gain insight into the fundamentals of how computers are used to represent, process and organize textual and spoken information. We will cover the theory and practice of human language technology, going behind the scenes of internet search engines, spam filters, spell and grammar checkers, chatbots and dialogue systems, automatic translators and more, discussing both how they work and why they often don’t. We will also examine social and ethical issues such as privacy, job creation and loss due to language technologies, and the nature of consciousness and machine intelligence.

General Education Goals and Expected Learning Outcomes

GEL Quantitative Reasoning: Mathematical and Logical Analysis

Goals: Students develop skills in quantitative literacy and logical reasoning, including the ability to identify valid arguments, and use mathematical models.

Expected Learning Outcomes:
Students comprehend mathematical concepts and methods adequate to construct valid arguments, understand inductive and deductive reasoning, and increase their general problem solving skills.

The course satisfies the goals and learning outcomes by using natural language systems to motivate students to exercise and develop a range of basic skills in formal and computational analysis. The course philosophy is to ground abstract concepts in real world examples. We introduce strings, regular expressions, finite-state and context-free grammars, as well as probabilistic algorithms defined over these structures and techniques for probing and evaluating systems that rely on these algorithms. The course goes beyond merely subjective evaluation of systems, emphasizing analysis and reasoning to draw and argue for valid conclusions about the design, capabilities and behavior of natural language systems.

GEN Theme: Number, Nature, and Mind

There are three goals associated with this theme:

Goal 1: Successful students will analyze an important topic or idea at a more advanced and in-depth level than the foundations. In this context, “advanced” refers to courses that are e.g., synthetic, rely on research or cutting-edge findings, or deeply engage with the subject matter, among other possibilities.

Expected Learning Outcomes: Successful students are able to …

  • 1.1 Engage in critical and logical thinking.
  • 1.2 Engage in an advanced, in-depth, scholarly exploration of the topic or ideas within this theme.

Goal 2: Successful students will integrate approaches to the theme by making connections to out-of-classroom experiences with academic knowledge or across disciplines and/or to work they have done in previous classes and that they anticipate doing in future.

Expected Learning Outcomes: Successful students are able to …

  • 2.1 Identify, describe, and synthesize approaches or experiences.
  • 2.2 Demonstrate a developing sense of self as a learner through reflection, self-assessment, and creative work, building on prior experiences to respond to new and challenging contexts.

Goal 3: Successful students will experience and examine mathematics as an abstract formal system accessible to mental manipulation and/or mathematics as a tool for describing and understanding the natural world.

Expected Learning Outcomes: Successful students are able to …

  • 3.1 Analyze and describe how mathematics functions as an idealized system that enables logical proof and/or as a tool for describing and understanding the natural world or human cognition.

This course will meet these objectives in the following ways:

  • In this course you will learn to analyze human language using multiple types of mathematical reasoning drawn from fields such as statistics and formal language theory. In-class exercises, class discussions, and homework assignments will give you the opportunity to apply mathematical reasoning to specific sets of language data and to reason about what analyses of the data follow as logical consequences. You will think critically about what these types of reasoning can show about language as a human cognitive system, including their limitations. At the end of the course, you will write a final essay on the social or ethical implications of language technology (specific topic of your choice). This will allow you to engage more deeply with the course material in an area of personal interest, to synthesize different perspectives on language technologies, and to draw on mathematical reasoning skills developed during the semester in order to build a convincing argument about the role of language technology in modern society. Most topics involve issues that can be argued either way.
  • You are encouraged to draw connections between the concepts introduced in class and your prior knowledge. This knowledge may come from previous coursework in related areas (e.g., Foundations GE coursework in Mathematical and Quantitative Reasoning), everyday experience interacting with language technologies, and/or introspection about how language works and how you as a speaker internally process language as a cognitive system. In-class exercises, class discussions, and homework assignments in this class will analyze language technologies that you probably use every day (e.g., predictive texting, voice assistants), so this class will give you a chance to connect course concepts to your experience using these technologies.
  • In this course you will extensively practice applying mathematical tools to understand the natural phenomenon of human language. You will identify the advantages and disadvantages of studying language using these idealized, formal systems.

Carmen

We’ll be using the Carmen system for the schedule, homework and reading assignments. There will also be discussion forums for posting questions and providing feedback (comments, complaints or ideas) during the course.

Note that email from Carmen is sent to your official email address (Name.Number@osu.edu). You should read email sent to your official OSU account on a daily basis.

Readings

We’ll be using the draft second edition of the textbook also entitled (not coincidentally!) Language and Computers, by Lelia Glass, Markus Dickinson, Chris Brew and Detmar Meurers (available on Carmen). We will also draw from the NLTK Book, entitled Natural Language Processing with Python — Analyzing Text with the Natural Language Toolkit, by Steven Bird, Ewan Klein, and Edward Loper. This book is available freely on-line. Finally, we’ll use some material from the draft third edition of Speech and Language Processing by Dan Jurafsky and James H. Martin, also currently available free online. Other readings may also be assigned periodically.

Online quizzes will assess your understanding of the readings prior to the classes covering the material. Classes will be dedicated to in-class activities that explore selected topics in greater depth as well as topics not covered by the textbooks.

Materials for in-class activities for each unit will be posted on Carmen, as will the slides presented in class. These slides are meant to aid classroom discussion and cannot replace actually being in class.

Requirements

The basic requirement is regular attendance in class and active participation. There will be one to two quizzes and (roughly) one homework assignment per textbook chapter, which will give you the opportunity to explore new aspects of the topics discussed in class. There will also be an essay on social/ethical considerations involving language technology. The midterm will be on the material covered in the first half of the class; the final will be on the material covered in the second half of the class, assuming the material from the first half as background knowledge.

Hardware/Software

You will need to either bring a laptop or tablet to each class session for in-class activities or use a lab computer. (If not using a lab computer, you may either use your own device or obtain a loaner via the Student Technology Loan Program.)

For the in-class activities (and some homework exercises) we will make use of Google Colab, a cloud-based service whose notebooks combine textual descriptions with executable Python code. These notebooks are very convenient, especially insofar as they avoid the need to install any software on your own device. Using them requires you to use (or create) your own Google account, subject to Google’s terms of service. Alternatively, you can choose to install Python on your own device, along with various Python libraries including NLTK. Similarly, some activities will involve using large language models such as OpenAI’s ChatGPT or Google’s Bard, which have their own terms of service; alternative activities will be made available upon timely request.

While this course does not have Python coding skills as a prerequisite, and does not aim to teach you to be a proficient Python programmer, through the course you will learn enough Python coding basics to get started with using natural language data for linguistic analysis and data science.

Grading

Grades will be assigned according to the following scheme:

  • Quizzes (5%): Quizzes will be administered online through Carmen and are due by midnight of the day indicated. The quizzes are, naturally, open book, but you should finish the reading before attempting them, as only one attempt is allowed. They will be shut off automatically once the deadline is reached, so do not put them off until the last minute! Note that I do not promise to remind you when you have a quiz due; it is your responsibility to keep up with the schedule on Carmen. The lowest quiz grade will be dropped.
  • Homework assignments (40%): Homework assignments are due by the beginning of class, in Carmen. PDF format is preferred. No late homeworks will be accepted. The lowest homework grade will be dropped. Homeworks should be done individually. Homework problems are typically similar to ones explored in groups during class; as such, regular attendance and active participation are vital to doing well on the homeworks.
  • Essay (15%): A 1000–1500 word essay on a topic dealing with the social implications of language technology.
  • Midterm exam (20%): The midterm will be given in class on Thursday, February 29.
  • Final exam (20%): The final will be given on Monday, April 29 (10:00–11:45) in our regular classroom. Update (3/18): If you do better on the final exam than on the midterm, your final exam score will count 30% and your midterm score 10%.
  • Class participation (+5%): Given that the homeworks and exams reflect the material covered in class, attendance is essential for doing well in this course, as is your active participation in class discussion and in-class activities. As such, participation will contribute bonus credit of up to 5% to your grade, based on the number of in-class activities completed.

Grades will be assigned using the standard OSU scale.
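
To make the arithmetic of the scheme above concrete, here is a minimal Python sketch of how the weights might combine. It is an illustration under stated assumptions, not the official grade calculation: the function name course_grade and its parameters are hypothetical, all scores are assumed to be percentages from 0 to 100, quizzes and homeworks are assumed to be weighted equally within their categories, and the participation bonus is taken as a number of percentage points (0 to 5) supplied by the caller, since the syllabus does not say how completed activities map to bonus credit.

```python
def course_grade(quizzes, homeworks, essay, midterm, final, participation_bonus=0.0):
    """Estimate an overall course percentage (hypothetical helper, not official).

    quizzes, homeworks: non-empty lists of percentages (0-100).
    essay, midterm, final: percentages (0-100).
    participation_bonus: bonus percentage points, capped at 5.
    """
    def mean_drop_lowest(scores):
        # Drop the single lowest score, then average the rest
        # (assumes equal weighting within the category).
        kept = sorted(scores)[1:] if len(scores) > 1 else list(scores)
        return sum(kept) / len(kept)

    # Update (3/18): a better final than midterm shifts weight to the final.
    if final > midterm:
        midterm_weight, final_weight = 0.10, 0.30
    else:
        midterm_weight, final_weight = 0.20, 0.20

    total = (0.05 * mean_drop_lowest(quizzes)
             + 0.40 * mean_drop_lowest(homeworks)
             + 0.15 * essay
             + midterm_weight * midterm
             + final_weight * final)

    # Participation is bonus credit of up to 5 percentage points.
    return total + min(participation_bonus, 5.0)


# Made-up example scores, for illustration only.
example = course_grade(
    quizzes=[80, 100, 90],
    homeworks=[70, 90, 85],
    essay=88,
    midterm=75,
    final=85,
    participation_bonus=3,
)
print(round(example, 2))  # 88.95
```

In this made-up example the final (85) beats the midterm (75), so the exams count 30% and 10% respectively; with the scores swapped, each would count 20%.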

Make-up Policy

If you know you won’t be able to make a deadline or exam, please see me before you miss the deadline or exam. If you miss the midterm or final, you will have to provide extensive written documentation for your excuse.

Class Etiquette

I expect you to respect one another, to respect me, and to respect yourself. To that end, I expect you to obey the following rules:

  • Participate: share experiences, ask questions, express your opinions. Ask me to provide more information, send me emails or see me during office hours for help, clarification, or recommendations for further research.
  • Do not read newspapers, materials from other classes, Instagram posts, email, etc. in class. Do not pack up early. Switch off your cell phone. If for some reason you must leave early or you have an important call coming in, notify me before class.

Policy on Academic Misconduct

It is the responsibility of the Committee on Academic Misconduct to investigate or establish procedures for the investigation of all reported cases of student academic misconduct. The term “academic misconduct” includes all forms of student academic misconduct wherever committed; illustrated by, but not limited to, cases of plagiarism and dishonest practices in connection with examinations. Instructors shall report all instances of alleged academic misconduct to the committee (Faculty Rule 3335-5-487). For additional information, see the Code of Student Conduct.

Artificial Intelligence and Academic Integrity

There has been a significant increase in the popularity and availability of a variety of generative artificial intelligence (AI) tools, including ChatGPT, Sudowrite and others. These tools will help shape the future of work, research and technology but when used in the wrong way, they can stand in conflict with academic integrity at Ohio State.

All students have important obligations under the Code of Student Conduct to complete all academic and scholarly activities with fairness and honesty. Our professional students also have the responsibility to uphold the professional and ethical standards found in their respective academic honor codes. Specifically, students are not to use unauthorized assistance in the laboratory, on field work, in scholarship or on a course assignment unless such assistance has been authorized specifically by the course instructor. In addition, students are not to submit their work without acknowledging any word-for-word use and/or paraphrasing of writing, ideas or other work that is not their own. These requirements apply to all students: undergraduate, graduate, and professional.

To maintain a culture of integrity and respect, these generative AI tools should not be used in the completion of course assignments unless an instructor for a given course specifically authorizes their use. Some instructors may approve of using generative AI tools in the academic setting for specific goals. However, these tools should be used only with the explicit and clear permission of each individual instructor, and then only in the ways allowed by the instructor.

Students with Disabilities

The university strives to maintain a healthy and accessible environment to support student learning in and out of the classroom. If you anticipate or experience academic barriers based on your disability (including mental health, chronic, or temporary medical conditions), please let me know immediately so that we can privately discuss options. To establish reasonable accommodations, I may request that you register with Student Life Disability Services. After registration, make arrangements with me as soon as possible to discuss your accommodations so that they may be implemented in a timely fashion.

If you are isolating while waiting for a COVID-19 test result, please let me know immediately. Those testing positive for COVID-19 should refer to the Safe and Healthy Buckeyes site for resources. Beyond five days of the required COVID-19 isolation period, I may rely on Student Life Disability Services to establish further reasonable accommodations. SLDS contact information: slds@osu.edu; 614-292-3307; slds.osu.edu.

Religious Accommodations

It is Ohio State’s policy to reasonably accommodate the sincerely held religious beliefs and practices of all students. The policy permits a student to be absent for up to three days each academic semester for reasons of faith or religious or spiritual belief.

Students planning to use religious beliefs or practices accommodations for course requirements must inform the instructor in writing no later than 14 days after the course begins. The instructor is then responsible for scheduling an alternative time and date for the course requirement, which may be before or after the original time and date of the course requirement. These alternative accommodations will remain confidential. It is the student’s responsibility to ensure that all course assignments are completed.

Weather or Other Short-Term Closing

Should in-person classes be canceled, we will meet virtually via CarmenZoom during our regularly scheduled time. I will share any updates via Carmen and email.

Disclaimer

This syllabus is subject to change. All important changes will be made in writing (email or Carmen announcement), with ample time for adjustment.