OBME 2130 - Digital Methods for Social Sciences

Course type: Seminar

Semester: Spring 2017-2018

Number of hours: 24

Language of tuition: English


This course does not require any prior skill or knowledge in coding or statistics, but curiosity about digital space and algorithms is welcome.

Course Description

The first aim of this course is to teach students how to appraise and study digital environments using quantitative analysis and software. The course also focuses on designing an original research project that makes use of quantitative analysis of online empirical data. Classes will alternate theoretical discussions of recent scientific papers (case studies or methodological articles) with more practical training. Beyond the readings, students will produce an original empirical analysis of a web corpus (online comments, tweets, reviews, emails, interaction networks, etc.), covering the framing of the research question, data collection, research strategy, and visualization of the outcome. Possible research methodologies for the (group) projects will be introduced and discussed in class throughout the semester. The first session will introduce the challenges of data analytics in web studies at large. It will be followed by 8 sessions focusing on specific methodological aspects (corpus collection, textual coding, network analysis, topic detection, etc.). The last three sessions will center on the students' collective projects.


COINTET, Jean-Philippe (Associate professor)

Pedagogical format

The pedagogical format is strongly oriented toward a workshop-style class. Typically, each session will start with a short theoretical talk introducing the topic. A discussion of the reading will follow, before the class turns to applied mode, where students will practice data analysis by themselves. Students are required to bring their laptops. Installing OpenOffice is strongly advised.

Course validation

The final collective project will count for 70% of the final grade. One or two individual take-home papers will also be assessed to complement this evaluation.


The workload should be limited to two (max three) hours a week. Each week, students will be required either to read a paper or to make progress on their collective project (about 2.5 hours).

Required reading

  • Evans, James A., and Pedro Aceves. "Machine translation: mining text for social theory." Annual Review of Sociology 42 (2016): 21-50.
  • McFarland, D. A., Lewis, K., Goldberg, A., Sep. 2015. Sociology in the Era of Big Data: The Ascent of Forensic Social Science. The American Sociologist.
  • Boyd, Danah, and Kate Crawford. "Six provocations for big data." A decade in internet time: Symposium on the dynamics of the internet and society. Vol. 21. Oxford: Oxford Internet Institute, 2011.
  • Manovich, L., 2011. Trending: The promises and the challenges of big social data. Debates in the digital humanities 2, 460–475.
  • Grimmer, J., Stewart, B. M., 2013. Text as data: The promise and pitfalls of automatic content analysis methods for political texts. Political analysis, 267–297.

Additional required reading

  • Lazer, D., Kennedy, R., King, G., Vespignani, A., 2014. The parable of Google Flu: traps in big data analysis. Science 343 (6176), 1203–1205.
  • Demazière, D., Brossaud, C., Trabal, P., Van Meter, K. M., 2006. Analyses textuelles en sociologie (logiciels, méthodes, usages). Didact. Méthodes (Rennes).
  • Moretti, F., 2004. Distant Reading, 2nd Edition. Addison–Wesley

Course outline and bibliographies

Session 1: Big data for social sciences, from promises to reality
Required readings:

  • Boyd, Danah, and Kate Crawford. "Six provocations for big data." A decade in internet time: Symposium on the dynamics of the internet and society. Vol. 21. Oxford: Oxford Internet Institute, 2011

Recommended readings:

  • Lazer, David, et al. "Life in the network: the coming age of computational social science." Science (New York, NY) 323.5915 (2009): 721

Session 2: Coding textual documents (1) - frequentist approach
Required readings:

  • Klingenstein, Sara, Tim Hitchcock, and Simon DeDeo. "The civilizing process in London’s Old Bailey." Proceedings of the National Academy of Sciences 111.26 (2014): 9419-9424

Recommended readings:

  • Michel, J. B., Shen, Y. K., Aiden, A. P., Veres, A., Gray, M. K., The Google Books Team, Pickett, J. P., Hoiberg, D., Clancy, D., Norvig, P., Orwant, J., Pinker, S., Nowak, M. A., Aiden, E. L., Jan. 2011. Quantitative Analysis of Culture Using Millions of Digitized Books. Science (New York, NY) 331 (6014), 176–182.

Session 3: Coding textual documents (2) - modeling expression
Required readings:

  • Mohr, John W., et al. "Graphing the grammar of motives in National Security Strategies: Cultural interpretation, automated text analysis and the drama of global politics." Poetics 41.6 (2013): 670-700.          

Recommended readings:

  • Boltanski, Luc. 2. "Le système actanciel de la dénonciation". Editions Métailié, 1990
  • Franzosi, Roberto. "From words to numbers: A generalized and linguistics-based coding procedure for collecting textual data." Sociological methodology (1989): 263-298.         

Session 4: Classifying words and documents - Topic models & information retrieval
Required readings:

  • Blei, David M. "Probabilistic topic models." Communications of the ACM 55.4 (2012): 77-84.

Recommended readings:

  • DiMaggio, Paul, Manish Nag, and David Blei. "Exploiting affinities between topic modeling and the sociological perspective on culture: Application to newspaper coverage of US government arts funding." Poetics 41.6 (2013): 570-606

Session 5: Words, authors, institutions: measuring proximities in a heterogeneous world
Required readings:

  • Bourret, Pascale, et al. "A new clinical collective for French cancer genetics: A heterogeneous mapping analysis." Science, Technology, & Human Values 31.4 (2006): 431-464.

Recommended readings:

  • Joseph, K., Wei, W., Carley, K. M., 2017. Girls rule, boys drool: Extracting semantic and affective stereotypes from Twitter. In: 2017 ACM Conference on Computer Supported Cooperative Work (CSCW).

Session 6: Mining sentiments
Required readings:

  • Fan, Rui, et al. "Anger is more influential than joy: Sentiment correlation in Weibo." PloS one 9.10 (2014): e110184.

Recommended readings:

  • Joshi, M., Das, D., Gimpel, K., Smith, N. A., 2010. Movie reviews and revenues: An experiment in text regression. In: Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, pp. 293–296.

Session 7: Visualization strategies
Required readings:

  • Skupin, André, and Sara Irina Fabrikant. "Spatialization methods: a cartographic research agenda for non-geographic information visualization." Cartography and Geographic Information Science 30.2 (2003): 99-119.

Session 8: Issue Mapping
Required readings:

  • Marres, Noortje. "Why map issues? On controversy analysis as a digital method." Science, Technology, & Human Values 40.5 (2015): 655-686.

Recommended readings:

  • Latour, Bruno, et al. "‘The whole is always smaller than its parts’–a digital test of Gabriel Tarde's monads." The British journal of sociology 63.4 (2012): 590-615.

Session 9: Collecting and delineating corpora
Required readings:

  • King, Gary, Patrick Lam, and Margaret Roberts. Computer-assisted keyword and document set discovery from unstructured text. Working Paper, 2016.

Session 10: Investigating online political processes
Required readings:

  • Barberá, Pablo, et al. "Tweeting from left to right: Is online political communication more than an echo chamber?." Psychological science 26.10 (2015): 1531-1542.

Recommended readings:

  • Bakshy, Eytan, Solomon Messing, and Lada A. Adamic. "Exposure to ideologically diverse news and opinion on Facebook." Science 348.6239 (2015): 1130-1132.

Session 11: Digital revolution?
Required readings:

  • Wilson, Christopher, and Alexandra Dunn. "The Arab Spring| Digital media in the Egyptian revolution: descriptive analysis from the Tahrir data set." International Journal of Communication 5 (2011): 25.

Recommended readings:

  • Tremayne, Mark. "Anatomy of protest in the digital era: A network analysis of Twitter and Occupy Wall Street." Social Movement Studies 13.1 (2014): 110-126.

Session 12: Research Projects
Assignment for this session (if applicable):
Each group of students is required to present its research project, based on data analysis, which will then be discussed collectively.

Biographical Information

Jean-Philippe Cointet has recently joined the Sciences Po médialab, where he works on the development of innovative computational sociology methods. Prior to his arrival, he participated in various quali-quantitative research projects, including social media analysis (Facebook, public comments), science dynamics (oncology collective thoughts (CIHR project), synthetic biology emergence), and political processes (political discourses, climate change negotiations). He also designs the CorText platform. He holds a PhD in Complex Systems and was trained as an engineer at Ecole Polytechnique. He is also an adjunct research scholar at INCITE, Columbia University.