Wednesday/Friday 10:00-11:30, SBS 414

Prof. Juan Pablo Pardo-Guerra

This class introduces students to critical perspectives on data, data science methods, machine learning, and algorithms from and for the social sciences. SOCG 290 is organized as a workshop: students work in groups on original research projects spanning three thematic units (plus additional on-demand topics) that cover core competencies of computational social scientists.

As such, this course does not: 1) teach you how to program; 2) teach formal computational methods; 3) teach magical techniques for predicting human behavior; or 4) lead to a profitable career at Facebook. The added value of the course is getting you to reflect on the assumptions, practices, and implications of working with ‘big’ data. This far outweighs knowing how to code: recent public controversies around big data are all the result of coding practices that didn’t take into account how algorithms and data structures reproduce biases, inequalities, and stereotypes. In this class, you will learn how to reflect on these and other issues and incorporate them into your research design to produce theoretically robust and methodologically coherent sociological claims.


WEEK 1 – What is data?

Key themes:

  1. Why survey design still matters
  2. ‘Natural’ data
  3. Data structures and ‘indexicality’
    1. Categorical “back to the futures”
  4. Data provenance
  5. Working with large numbers
  6. Correlation is not causation
  7. Some thoughts on social data (ergodicity, predictability, and emergence)

**McFarland, D. A., & McFarland, H. R. (2015). Big Data and the danger of being precisely inaccurate. Big Data & Society.

**Lewis, K. (2015). Three fallacies of digital footprints. Big Data & Society.

**Bright, J. (2017). Big social science: doing big data in the social sciences. In Fielding, N., Lee, R., & Blank, G. The SAGE Handbook of online research methods (pp. 125-139).

Boellstorff, Tom (2013) “Making big data, in theory” First Monday, 18(10). DOI: 10.5210/fm.v18i10.4869.

Desrosières, Alain (1991) “How to Make Things Which Hold Together: Social Science, Statistics and the State” in Peter Wagner et al. Discourses on Society: The Shaping of the Social Science Disciplines. Springer.

Rosenberg, Daniel (2013) “Data before the fact” in Gitelman, Lisa (ed.) “Raw Data” Is an Oxymoron, MIT Press: Cambridge, MA.

Rieder, Bernhard (2012) “What is PageRank? A historical and conceptual investigation of a recursive status index” Computational Culture. Available at:

Newitz, A. (2017) “The secret lives of Google raters” At:

WEEK 2 – Why does big data matter?

Key themes:

  1. Thinking about your projects
  2. A methodological revolution?
    1. Big data across the social sciences
  3. A social revolution?
    1. Big data in prediction, surveillance, and nudging

**Brayne, Sarah. (2017) “Big data surveillance: the case of policing” American Sociological Review

Introna, Lucas. (2015) “Algorithms, governance and governmentality” Science, Technology, & Human Values

**Fourcade, Marion and Kieran Healy (2013) “Classification Situations: Life-chances in the Neoliberal Era.” Accounting, Organizations, and Society 38: 559–572.

**boyd, danah and Kate Crawford (2012) “Critical questions for big data: provocations for a cultural, technological, and scholarly phenomenon”, Information, Communication & Society 15(5)

UNIT 2 – METHODS

WEEK 3 – Counting things

Key themes:

  1. What new things can we do when we count in different ways?
  2. Data access and modeling
  3. Moving beyond regressions

**Legewie, J., & Schaeffer, M. (2016). Contested Boundaries: Explaining Where Ethnoracial Diversity Provokes Neighborhood Conflict. American Journal of Sociology, 122(1), 125-161.

**Leung, M. D. (2014). Dilettante or Renaissance Person? How the Order of Job Experiences Affects Hiring in an External Labor Market. American Sociological Review, 79(1), 136-158.

**Salganik, Matthew. Bit by bit: Social research in the digital age. Princeton University Press, 2019.

WEEK 4 – Looking for Relatives

Key themes:

  1. What does relational data actually represent?
  2. Techniques for analyzing relational data

**Knigge, A., Maas, I., & van Leeuwen, M. H. (2014). Sources of sibling (dis)similarity: Total family impact on status variation in the Netherlands in the nineteenth century. American Journal of Sociology, 120(3), 908-948.

**Lin, K. H., & Lundquist, J. (2013). Mate selection in cyberspace: The intersection of race, gender, and education. American Journal of Sociology, 119(1), 183-215.

**Curington, Celeste Vaughan, Ken-Hou Lin, and Jennifer Hickes Lundquist. “Positioning multiraciality in cyberspace: Treatment of multiracial daters in an online dating website.” American Sociological Review 80, no. 4 (2015): 764-788.

WEEK 5 – Doing things with words (1)

Key themes:

  1. Counting words without structure
  2. Can we ‘count’ culture?

Bail, C. A. (2014). The cultural environment: Measuring culture with big data. Theory and Society, 43(3-4), 465-482.

**Bail, C. A. (2012). The fringe effect: Civil society organizations and the evolution of media discourse about Islam since the September 11th attacks. American Sociological Review, 77(6), 855-879.

**Fligstein, N., Stuart Brundage, J., & Schultz, M. (2017). Seeing Like the Fed: Culture, Cognition, and Framing in the Failure to Anticipate the Financial Crisis of 2008. American Sociological Review

**Karell, D., & Freedman, M. (2019). Rhetorics of Radicalism. American Sociological Review, 84(4), 726–753.

WEEK 6 – Doing things with words (2)

Key themes:

  1. Culture as a semantic structure
  2. Tracking language use

**Rule, A., Cointet, J. P., & Bearman, P. S. (2015). Lexical shifts, substantive changes, and continuity in State of the Union discourse, 1790–2014. Proceedings of the National Academy of Sciences, 112(35), 10837-10844.

**Kozlowski, A. C., Taddy, M., & Evans, J. A. (2019). The Geometry of Culture: Analyzing the Meanings of Class through Word Embeddings. American Sociological Review, 84(5), 905–949.

**Bail, C.A., Brown, T.W. and Wimmer, A., 2019. Prestige, Proximity, and Prejudice: How Google Search Terms Diffuse across the World. American Journal of Sociology, 124(5), pp.1496-1548.

WEEK 7 – Project Workshop

Healy, K., & Moody, J. (2014). Data visualization in sociology. Annual Review of Sociology, 40, 105-128.

Healy, K. (2017) Data visualization for social science at

UNIT 3 – WHAT NOT TO DO

WEEK 8 – Case 1: The Kosinski affair

Key themes:

  1. The pitfalls of prediction and accuracy
  2. Beware of opaque techniques

**Kosinski, M., & Wang, Y. (2017). Deep neural networks are more accurate than humans at detecting sexual orientation from facial images.

The Guardian “New AI can guess whether you’re gay or straight from a photograph” at

WEEK 9 – Case 2: Biased algorithms

Key themes:

  1. Can algorithms be ‘fair’?
  2. Is programming necessarily biased?

Knight, W. (2017) “Biased algorithms are everywhere, and no one seems to care”. MIT Technology Review.

Spielkamp, M. (2017) “Inspecting algorithms for bias” in MIT Technology Review

Cain Miller, C. (2015) “When algorithms discriminate” NY Times

WEEK 10 – Presentations