Clippers 10/4: Andy Goodhart on sentiment analysis for multi-label classification

Title: Perils of Legitimacy: How Legitimation Strategies Sow the Seeds of Failure in International Order

Abstract: Autocratic states are challenging U.S. power and the terms of the post-WWII security order. U.S. policy debates have focused on specific military and economic responses that might preserve the United States’ favorable position, while largely taking for granted that the effort should be organized around a core of like-minded liberal states. I treat this U.S. emphasis on promoting a liberal narrative of international order as an effort to make U.S. hegemony acceptable to domestic and foreign audiences; it is a strategy to legitimate a U.S.-led international hierarchy and mobilize political cooperation. Framing legitimacy in liberal terms is only one option, however. Dominant states have used a range of legitimation strategies, each of which presents distinct advantages and disadvantages. The main choice these hierarchs face is whether to emphasize the order’s ability to solve problems or to advocate for a governing ideology like liberalism. This project aims to explain why leading states in the international system choose performance- or ideologically-based legitimation strategies and to identify the advantages and disadvantages of each.

This research applies sentiment analysis techniques, which were designed to characterize text based on positive or negative language, to the multi-label classification of foreign policy texts. The goal is to take a corpus of foreign policy speeches and documents that include rhetoric intended to justify an empire or hegemon’s international behavior and build a data set that shows variation in this rhetoric over time. Custom dictionaries reflect the vocabulary each hierarch uses to articulate its value proposition to subordinate political actors. The output of the model is the percentage of each text devoted to performance- and ideologically-based legitimation strategies. Using sentiment analysis for document classification represents an improvement over supervised machine learning techniques, because it does not require the time-consuming step of creating training sets. It is also better suited to multi-label classification, in which each document belongs to multiple categories. Supervised machine learning techniques are better suited to texts that are either homogeneous in their category (e.g., a press release is either about health care or about foreign policy) or easily divided into sections that belong to homogeneous categories.
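
To illustrate the dictionary-based scoring described above, here is a minimal sketch in Python. It is not the speaker's implementation: the two word lists, the tokenizer, and the function names (`tokenize`, `legitimation_shares`) are hypothetical stand-ins, and the sketch assumes that "percentage of each text" means the share of word tokens that match each dictionary.

```python
import re
from collections import Counter

# Hypothetical mini-dictionaries for illustration only; the custom
# dictionaries used in the research are not given in the abstract.
DICTIONARIES = {
    "performance": {"security", "stability", "prosperity", "trade", "protect"},
    "ideology": {"freedom", "democracy", "liberty", "rights", "liberal"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def legitimation_shares(text):
    """Return, per label, the percentage of tokens found in that dictionary.

    A text can score above zero on several labels at once, which is what
    makes this a multi-label measure rather than a single-category one.
    """
    tokens = tokenize(text)
    if not tokens:
        return {label: 0.0 for label in DICTIONARIES}
    counts = Counter(tokens)
    return {
        label: 100.0 * sum(counts[word] for word in vocab) / len(tokens)
        for label, vocab in DICTIONARIES.items()
    }


sample = "We defend freedom and democracy while we protect trade and security."
shares = legitimation_shares(sample)
for label, pct in shares.items():
    print(f"{label}: {pct:.1f}% of tokens")
```

Scoring every document in a dated corpus this way would give one row per text with a share for each strategy, which is the kind of over-time variation the data set is meant to capture. Exact token matching is a simplification; handling inflected forms or multi-word phrases would need additional steps not shown here.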