Ling 8800 — Seminar in Computational Linguistics (Autumn ’16)

Autumn ’16, M 10:45–12:30, Caldwell Lab 183
Instructor: Michael White

Recognizing and Generating Paraphrases

Description

There’s more than one way to skin a cat — in other words, Dermis and feline can be divorced by manifold methods (internet), There are more ways to kill a dog than hanging (1678), or TIMTOWTDI, i.e. There is more than one way to do it (Perl motto).

The myriad ways in which the same idea can be expressed in natural language pose a central problem in computational linguistics, creating difficult challenges for tasks such as information extraction, question answering and summarization that require some degree of paraphrase identification. At the same time, the periphrastic capabilities found in language offer opportunities for avoiding repetition and varying complexity or expressiveness in natural language generation.

In this seminar, we will read and discuss seminal and recent research on recognizing and generating paraphrases, with an eye towards applications including question answering, text simplification, and data augmentation. Topics will be drawn from those listed below, as well as any related ones suggested by student input.

  • paraphrase identification
  • paraphrase resources
  • paraphrase alignment
  • paraphrasing as monolingual machine translation
  • paraphrasing applications: question answering, text
    simplification, data augmentation

Expectations

Students will be expected to actively participate in the discussion and research carried out in the seminar. As detailed below, students will be required to facilitate discussions and post questions on the readings in advance, as well as locate relevant background/tutorial materials. Additionally, students taking the course for 3 credits will be required to carry out a class project on a topic related to the seminar; alternatively, for students already working on a related topic, integrating their focus into the seminar will be an option.

Prerequisites

Ling 5802 or equivalent, or permission of the instructor.

Carmen

We’ll use Carmen to schedule discussion facilitators and post advance questions on the readings, as well as links to background/tutorial materials. We’ll also use it for submitting project documents.

Requirements

Class participation (25%)

We are aiming for a dynamic discussion of papers, not death by PowerPoint. Thus, we plan on taking a page from Eric Fosler-Lussier’s playbook and requiring everyone (this includes you!) to post at least one question to the discussion list on Carmen by 8 p.m. the evening before each week’s readings will be discussed. Participants should also feel free to share their (initial) thoughts and views of the papers in their posts. In particular, questions of the type “What did they mean by X” or “Why did they do X instead of Y” are encouraged. Remember that most of the papers are targeted to people who are already expert in the area, so you shouldn’t expect to always understand everything. Airing such questions can help everyone gain a better understanding of the paper, even those who thought they understood it!

Additionally, for this year’s seminar we are going to split each week’s meeting into one part devoted to the primary readings and one part devoted to background/tutorial materials, which students will be responsible for locating and going over in class. Thus, the expected schedule for each week is as follows:

  • Tuesday: Main readings for the next week assigned
  • Thursday: Skim readings, looking for issues or techniques where background/tutorial materials would be helpful; start scouring the web for any such materials
  • Friday: Post links to background/tutorial materials on Carmen by 8 p.m., including an explanation of what was found to be helpful
  • Sunday: Post questions on main readings on Carmen by 8 p.m.
  • Monday: Go over background/tutorial materials and discuss readings

Facilitating discussions (25%)

Each week’s meeting will have a discussion facilitator. For the main readings, the facilitator should look over the posted questions and choose a subset for discussion. In class, the facilitator should start the session with a brief, five to ten minute summary of the papers, including the highlights and lowlights. Following the opening summary, the facilitator is responsible for managing the discussion, and ensuring that as many viewpoints are heard as possible.

For the part of the meeting on background/tutorial materials, the facilitator should come prepared to go over the materials that s/he found, as well as to determine when it would make sense to ask other participants to go through the materials they found. Note that participants other than the facilitator should therefore also come prepared to go over the background/tutorial materials they found, at least briefly.

Students will be required to facilitate one or two sessions during the course. If the discussion does not take up the entire class period, the remaining time may be used to (informally) discuss class projects.

Term project (50%)

As noted above, students taking the course for 3 credits will be required to carry out a term project, either alone or in a team. A project sketch must be presented informally in class for brainstorming during the eighth week, followed by a presentation during the last week of class and a final report due on the day the final exam would be held (if there were one).

For students taking the course for 1 credit, no project will be required, and class participation and facilitating discussions will each count for half of the class requirements.

Topics

The topics and readings we expect to cover are listed below; these will be refined as the course progresses.

Paraphrase Identification

Paraphrase Resources

Paraphrase Alignment

Paraphrasing as Monolingual Machine Translation

Paraphrasing Applications: Question Answering, Text Simplification, Data Augmentation

Policy on Academic Misconduct

As with any class at this university, students are required to follow the Ohio State Code of Student Conduct. In particular, note that students are not allowed to, among other things, submit plagiarized (copied but unacknowledged) work for credit. If any violation occurs, the instructor is required to report the violation to the Committee on Academic Misconduct.

Students with Disabilities

Students who need an accommodation based on the impact of a disability should contact me to arrange an appointment as soon as possible to discuss the course format, to anticipate needs, and to explore potential accommodations. I rely on the Office for Disability Services for assistance in verifying the need for accommodations and developing accommodation strategies. Students who have not previously contacted the Office for Disability Services are encouraged to do so (292-3307; http://www.ods.ohio-state.edu).

Disclaimer

This syllabus is subject to change. All important changes will be made in
writing (email), with ample time for adjustment.