Text Mining
A Guidebook for the Social Sciences

April 2016 | 208 pages | SAGE Publications, Inc
Online communities generate massive volumes of natural language data, and the social sciences continue to learn how best to use this new information and the technology available for analyzing it. Text Mining brings together a broad range of contemporary qualitative and quantitative methods to provide strategic and practical guidance on analyzing large text collections. This accessible book, written by a sociologist and a computer scientist, surveys the fast-changing landscape of data sources, programming languages, software packages, and methods of analysis available today. Suitable for novice and experienced researchers alike, the book will help readers use text mining techniques more efficiently and productively.

Available with
 Perusall—an eBook that makes it easier to prepare for class
Perusall is an award-winning eBook platform featuring social annotation tools that allow students and instructors to collaboratively mark up and discuss their SAGE textbook. Backed by research and supported by technological innovations developed at Harvard University, this process of learning through collaborative annotation keeps your students engaged and makes teaching easier and more effective.

Part I: Digital Texts, Digital Social Science
1. Social Science and the Digital Text Revolution
Learning Objectives


History of Text Analysis

Risk and Rewards of Text Mining for the Social Sciences

Social Data from Digital Environments

Theory and Metatheory

Ethics of Text Mining

Organization of This Volume

2. Research Design Strategies
Learning Objectives


Levels of Analysis

Strategies for Document Selection and Sampling

Types of Inferential Logic

Approaches to Research Design

Part II: Text Mining Fundamentals

3. Web Crawling and Scraping
Learning Objectives


Web Statistics

Web Crawling

Web Scraping

Software for Web Crawling and Scraping

4. Lexical Resources
Learning Objectives



Roget's Thesaurus

Linguistic Inquiry and Word Count

General Inquirer


Downloadable Lexical Resources and APIs

5. Basic Text Processing
Learning Objectives



Stopword Removal

Stemming and Lemmatization

Text Statistics

Language Models

Other Text Processing

Software for Text Processing

6. Supervised Learning
Learning Objectives

Feature Representation and Weighting

Supervised Learning Algorithms

Evaluation of Supervised Learning

Software for Supervised Learning

Part III: Text Analysis Methods from the Humanities and Social Sciences
7. Thematic Analysis, QDAS, and Visualization
Learning Objectives

Thematic Analysis

Qualitative Data Analysis Software

Visualization Tools

8. Narrative Analysis
Learning Objectives


Conceptual Foundations

Mixed Methods of Narrative Analysis

Automated Approaches to Narrative Analysis

Future Directions

Specialized Software for Narrative Analysis

9. Metaphor Analysis
Learning Objectives


Theoretical Foundations

Qualitative Metaphor Analysis

Mixed Methods of Metaphor Analysis

Automated Metaphor Identification Methods

Software for Metaphor Analysis

Part IV: Text Mining Methods from Computer Science
10. Word and Text Relatedness
Learning Objectives


Theoretical Foundations

Corpus-based and Knowledge-based Measures of Relatedness

Software and Datasets for Word and Text Relatedness

Further Reading

11. Text Classification
Learning Objectives


Applications of Text Classification

Representing Texts for Supervised Text Classification

Text Classification Algorithms

Bootstrapping in Text Classification

Evaluation of Text Classification

Software and Datasets for Text Classification

12. Information Extraction
Learning Objectives


Entity Extraction

Relation Extraction

Web Information Extraction

Template Filling

Software and Datasets for Information Extraction and Text Mining

13. Information Retrieval
Learning Objectives


Theoretical Foundations

Components of an Information Retrieval System

Information Retrieval Models

The Vector-Space Model

Evaluation of Information Retrieval Models

Web-Based Information Retrieval

Software and Datasets for Information Retrieval

14. Sentiment Analysis
Learning Objectives


Theoretical Foundations




Future Directions

Software and Datasets for Word and Text Relatedness

15. Topic Models
Learning Objectives


Digital Humanities

Political Science


Software for Topic Modeling

Part V: Conclusions
16. Text Mining, Text Analysis, and the Future of Social Science

Social and Computer Science Collaboration



Student Resource Site

Visit the companion website for free access to data files and links to web resources.

Text Mining and Analysis is a comprehensive book that deals with the latest developments of text mining research, methodology, and applications. An excellent choice for anyone who wants to learn how these emerging practices can benefit their own research in an era of Big Data.

Kenneth C. C. Yang
The University of Texas at El Paso

This is a clear, comprehensive and thorough description of new text mining techniques and their applications: a "must" for students and social researchers who wish to understand how to tackle the challenges raised by Big Data.

Aude Bicquelet
London School of Economics

Clear presentation of text mining best practices. It also calls attention to the need to develop complex interpretation strategies for data acquired through various mining practices.

Mr Elias Ortega-Aponte
Graduate Division of Religion, Drew University
September 9, 2016

Sample Materials & Chapters

Chapter 3

Chapter 9

Gabe Ignatow

Gabe Ignatow is Professor of Sociology and Director of Graduate Studies at the University of North Texas. His research interests are mainly in the areas of sociological theory, digital research methods, cognitive social science, and the philosophy of social science. His most recent books are Text Mining and An Introduction to Text Mining, both coauthored with Rada Mihalcea (University of Michigan). He is also a coeditor, with Wayne Brekhus (University of Missouri), of the Oxford Handbook of Cognitive Sociology.

Rada F. Mihalcea

Rada Mihalcea is a professor of computer science and engineering at the University of Michigan. Her research interests are in computational linguistics, with a focus on lexical semantics, multilingual natural language processing, and computational social sciences. She serves or has served on the editorial boards of the following journals: Computational Linguistics, Language Resources and Evaluation, Natural Language Engineering, Research on Language and Computation, IEEE Transactions on Affective Computing, and Transactions of the Association for Computational Linguistics. She was a general...

ISBN: 9781483369341
