August 21st, 2017
Text Analytics for Mining Twitter Data
Dr. Stuart Shulman
Founder & CEO, Texifter
Inventor, entrepreneur, US Soccer National C licensed high school and Olympic Development Program coach, proud Revolution Academy dad, CEO @texifter, inventor of @discovertext, and Taoist garlic grower.
Overview of Workshop
Participate in this workshop to learn how to build custom machine classifiers for sifting Twitter data. The topics covered include how to:
• construct precise social data fetch queries,
• use Boolean search on resulting archives,
• filter on metadata or other project attributes,
• tabulate, explore, and set aside duplicates, and cluster near-duplicates,
• crowdsource human coding,
• measure inter-rater reliability,
• adjudicate coder disagreements, and
• build high-quality word sense and topic disambiguation engines.
DiscoverText is designed specifically for collecting and cleaning up messy Twitter and other text data streams. Use basic research measurement tools to improve human and machine performance classifying data over time. The workshop covers how to reach and substantiate inferences using a theoretical and applied model informed by a decade of interdisciplinary, National Science Foundation-funded research into the text classification problem.
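To make the inter-rater reliability topic from the list above concrete, here is a minimal sketch of one common agreement statistic, Cohen's kappa, for two coders labeling the same Tweets. It is an illustration only: the function name, the labels, and the sample data are assumptions made for this example, and nothing here describes how DiscoverText itself computes reliability.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders who labeled the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(labels_a)

    # Observed agreement: share of items where the two coders chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: probability both pick the same label if each coder
    # labeled at random according to their own label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    categories = set(freq_a) | set(freq_b)
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)

    if expected == 1.0:
        # Both coders used a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical codes for ten "penguins" Tweets (illustrative data only).
coder_a = ["nhl", "nhl", "other", "other", "nhl", "other", "nhl", "other", "nhl", "nhl"]
coder_b = ["nhl", "other", "other", "other", "nhl", "other", "nhl", "nhl", "nhl", "nhl"]

kappa = cohens_kappa(coder_a, coder_b)
print(round(kappa, 3))
```

Kappa corrects raw percent agreement for the agreement expected by chance, so it is a stricter measure than simply counting matching labels. With more than two coders, a multi-rater statistic such as Fleiss' kappa or Krippendorff's alpha would be the usual choice.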
The key breakthrough led to a patent (US No. 9,275,291) being issued on March 1, 2016. We built tools for adjudicating the work of coders. For example, if I ask 10 students to look at 100 Tweets that mention “penguins” and code whether or not they are about the NHL’s Pittsburgh Penguins, there will be imperfect agreement. Some coders will have deeper knowledge of the subject, and some Tweets will be inscrutably ambiguous. Adjudication allows an expert to review the way the group labeled the Tweets and decide who was right and who was wrong. This method of validation creates a “gold standard,” and it allows us to score, over time, the likelihood that an individual coder will create a valid observation. Participants will learn how to apply “CoderRank” in machine learning. The major idea of the workshop is that when training machines for text analysis, greater reliance should be placed on the input of those humans most likely to create a valid observation. Texifter proposed a unique way to recursively validate, measure, and rank humans on trust and knowledge vectors, and called it CoderRank.
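The general idea of leaning on the coders most likely to be right can be sketched in a few lines. The example below is not the patented CoderRank method, whose details are not given here. It is a simplified stand-in, assuming each coder is scored by plain accuracy against the adjudicated gold standard and that unadjudicated Tweets are then labeled by an accuracy-weighted vote. All names and data are hypothetical.

```python
from collections import defaultdict


def coder_accuracy(codings, gold):
    """Score each coder by the share of adjudicated items they labeled correctly.

    codings: {coder_name: {item_id: label}}
    gold:    {item_id: label} as decided by the expert adjudicator
    """
    scores = {}
    for coder, labels in codings.items():
        judged = [item for item in labels if item in gold]
        if not judged:
            continue  # no adjudicated items yet, so no score for this coder
        correct = sum(labels[item] == gold[item] for item in judged)
        scores[coder] = correct / len(judged)
    return scores


def weighted_label(item_id, codings, scores):
    """Pick the label with the largest total weight, weighting votes by coder score."""
    totals = defaultdict(float)
    for coder, labels in codings.items():
        if item_id in labels and coder in scores:
            totals[labels[item_id]] += scores[coder]
    if not totals:
        return None
    return max(totals, key=totals.get)


# Hypothetical codings of four Tweets by three coders (illustrative data only).
codings = {
    "ana": {"t1": "nhl", "t2": "other", "t3": "nhl", "t4": "nhl"},
    "ben": {"t1": "nhl", "t2": "nhl", "t3": "other", "t4": "other"},
    "cy": {"t1": "other", "t2": "nhl", "t3": "other", "t4": "other"},
}
# The expert has adjudicated t1 to t3; t4 has no gold label yet.
gold = {"t1": "nhl", "t2": "other", "t3": "nhl"}

scores = coder_accuracy(codings, gold)
print(scores)
print(weighted_label("t4", codings, scores))
```

In this toy data, a simple majority vote on the unadjudicated Tweet t4 would follow the two coders who were usually wrong, while the weighted vote follows the one coder whose earlier labels matched the gold standard. That contrast is the point of the sketch. A production system would also need to handle ties, small sample sizes, and scores that change as more items are adjudicated.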
The Quello Center is hosting this workshop. It will be free to participants, but limited in numbers, so please sign up as soon as possible to reserve your space with Anne Marie Salter at the Quello Center: firstname.lastname@example.org
Dr. Stuart W. Shulman is founder & CEO of Texifter. He was a Research Associate Professor of Political Science at the University of Massachusetts Amherst and the founding Director of the Qualitative Data Analysis Program (QDAP) at the University of Pittsburgh and at UMass Amherst. Dr. Shulman is Editor Emeritus of the Journal of Information Technology & Politics, the official journal of the Information Technology & Politics section of the American Political Science Association.