OBME 2130 - Digital Methods for Social Sciences

Course type: Seminar

Semester: Autumn 2017-2018

Number of hours: 24

Language of tuition: English


No prerequisites, although basic knowledge of descriptive statistics and/or coding skills is welcome.

Course Description

This course introduces a wide range of digital methods for analyzing textual corpora. Its main objective is to teach students how to design empirical protocols that make use of data analytics in their research projects. Classes will alternate between theoretical teaching and applied training. For the applied part, students can choose among a variety of empirical datasets, such as online comments, tweets, press articles on a given issue, political discourses, and climate change negotiation reports. Students may also propose to work on data connected with their own research project. Each session will introduce new tools and/or data sources, which students will be required to practice through group exercises. The first session will introduce the challenges of text analytics for the social sciences at large. It will be followed by 8 sessions focusing on specific methodological aspects (textual coding, network analysis, classification, visualization, corpus collection, etc.). The last four sessions will be oriented toward research questions, illustrating how social science problems can be addressed using data analysis methods. The course is workshop-oriented: methods will systematically be accompanied by empirical applications and practiced by students.


COINTET, Jean-Philippe (Associate professor)

Pedagogical format

The class is strongly workshop-oriented. Each session will start with a short theoretical talk introducing the topic. A discussion of the reading will follow, before the class moves to applied mode, in which students practice data analysis themselves. Students are required to bring their own laptop with the proper software installed beforehand (the list of open-source software will be provided during the semester).

Course validation

Students will be assessed on group work assignments throughout the semester (70%). An individual research proposal, due by the end of the semester, will account for the rest of the final grade (30%).


Throughout the semester, students will be asked to read one paper per week. These readings will be complemented by more practical, collective assignments (groups of 3-4 students). Additionally, a final individual paper is due in the 11th week.

Required reading

  • Evans, James A., and Pedro Aceves. "Machine translation: mining text for social theory." Annual Review of Sociology 42 (2016): 21-50.
  • Wagner-Pacifici, Robin, John W. Mohr, and Ronald L. Breiger. "Ontologies, methodologies, and new uses of Big Data in the social and cultural sciences." Big Data & Society 2.2 (2015): 2053951715613810.
  • McFarland, D. A., Lewis, K., Goldberg, A., Sep. 2015. Sociology in the Era of Big Data: The Ascent of Forensic Social Science. The American Sociologist.
  • Boyd, Danah, and Kate Crawford. "Six provocations for big data." A decade in internet time: Symposium on the dynamics of the internet and society. Vol. 21. Oxford: Oxford Internet Institute, 2011.

Additional required reading

  • Manovich, L., 2011. Trending: The promises and the challenges of big social data. Debates in the digital humanities 2, 460–475.
  • Lazer, D., Kennedy, R., King, G., Vespignani, A., 2014. The parable of Google Flu: Traps in big data analysis. Science 343 (6176), 1203–1205.
  • Grimmer, J., Stewart, B. M., 2013. Text as data: The promise and pitfalls of automatic content analysis methods for political texts. Political Analysis, 267–297.
  • Demazière, D., Brossaud, C., Trabal, P., Van Meter, K. M., 2006. Analyses textuelles en sociologie (logiciels, méthodes, usages). Didact. Méthodes (Rennes).
  • Moretti, F., 2004. Distant Reading, 2nd Edition. Addison–Wesley

Course outline and bibliographies

Session 1: Big data for social sciences, from promises to reality
Required readings:

  • Boyd, Danah, and Kate Crawford. "Six provocations for big data." A decade in internet time: Symposium on the dynamics of the internet and society. Vol. 21. Oxford: Oxford Internet Institute, 2011

Recommended readings:

  • Lazer, David, et al. "Life in the network: the coming age of computational social science." Science (New York, NY) 323.5915 (2009): 721

Session 2: Coding textual documents (1) - frequentist approach
Required readings:

  • Klingenstein, Sara, Tim Hitchcock, and Simon DeDeo. "The civilizing process in London’s Old Bailey." Proceedings of the National Academy of Sciences 111.26 (2014): 9419-9424

Recommended readings:

  • Michel, J. B., Shen, Y. K., Aiden, A. P., Veres, A., Gray, M. K., The Google Books Team, Pickett, J. P., Hoiberg, D., Clancy, D., Norvig, P., Orwant, J., Pinker, S., Nowak, M. A., Aiden, E. L., Jan. 2011. Quantitative Analysis of Culture Using Millions of Digitized Books. Science (New York, NY) 331 (6014), 176–182.

Session 3: Coding textual documents (2) - modeling expression
Required readings:

  • Mohr, John W., et al. "Graphing the grammar of motives in National Security Strategies: Cultural interpretation, automated text analysis and the drama of global politics." Poetics 41.6 (2013): 670-700.          

Recommended readings:

  • Boltanski, Luc. 2. "Le système actanciel de la dénonciation". Editions Métailié, 1990
  • Franzosi, Roberto. "From words to numbers: A generalized and linguistics-based coding procedure for collecting textual data." Sociological methodology (1989): 263-298.         

Session 4: Classifying words and documents - Topic models & information retrieval
Required readings:

  • Blei, David M. "Probabilistic topic models." Communications of the ACM 55.4 (2012): 77-84.

Recommended readings:

  • DiMaggio, Paul, Manish Nag, and David Blei. "Exploiting affinities between topic modeling and the sociological perspective on culture: Application to newspaper coverage of US government arts funding." Poetics 41.6 (2013): 570-606

Session 5: Words, authors, institutions: measuring proximities in a heterogeneous world
Required readings:

  • Bourret, Pascale, et al. "A new clinical collective for French cancer genetics: A heterogeneous mapping analysis." Science, Technology, & Human Values 31.4 (2006): 431-464.

Recommended readings:

  • Joseph, K., Wei, W., Carley, K. M., 2017. Girls rule, boys drool: Extracting semantic and affective stereotypes from Twitter. In: 2017 ACM Conference on Computer Supported Cooperative Work (CSCW).

Session 6: Mining sentiments
Required readings:

  • Fan, Rui, et al. "Anger is more influential than joy: Sentiment correlation in Weibo." PloS one 9.10 (2014): e110184.

Recommended readings:

  • Joshi, M., Das, D., Gimpel, K., Smith, N. A., 2010. Movie reviews and revenues: An experiment in text regression. In: Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, pp. 293–296.

Session 7: Visualization strategies
Required readings:

  • Skupin, André, and Sara Irina Fabrikant. "Spatialization methods: a cartographic research agenda for non-geographic information visualization." Cartography and Geographic Information Science 30.2 (2003): 99-119.

Session 8: Issue Mapping
Required readings:

  • Marres, Noortje. "Why map issues? On controversy analysis as a digital method." Science, Technology, & Human Values 40.5 (2015): 655-686.

Recommended readings:

  • Latour, Bruno, et al. "‘The whole is always smaller than its parts’–a digital test of Gabriel Tardes' monads." The British journal of sociology 63.4 (2012): 590-615.

Session 9: Collecting and delineating corpora
Required readings:

  • King, Gary, Patrick Lam, and Margaret Roberts. Computer-assisted keyword and document set discovery from unstructured text. Working Paper, 2016.

Session 10: Investigating online political processes
Required readings:

  • Barberá, Pablo, et al. "Tweeting from left to right: Is online political communication more than an echo chamber?." Psychological science 26.10 (2015): 1531-1542.

Recommended readings:

  • Bakshy, Eytan, Solomon Messing, and Lada A. Adamic. "Exposure to ideologically diverse news and opinion on Facebook." Science 348.6239 (2015): 1130-1132.

Session 11: Digital revolution?
Required readings:

  • Wilson, Christopher, and Alexandra Dunn. "The Arab Spring| Digital media in the Egyptian revolution: descriptive analysis from the Tahrir data set." International Journal of Communication 5 (2011): 25.

Recommended readings:

  • Tremayne, Mark. "Anatomy of protest in the digital era: A network analysis of Twitter and Occupy Wall Street." Social Movement Studies 13.1 (2014): 110-126.

Session 12: Research Projects
Assignment for this session:
Each group of students will present its research project based on data analysis, which will then be discussed collectively.

Biographical Information

Jean-Philippe Cointet recently joined the Sciences Po médialab, where he works on the development of innovative computational sociology methods. Prior to his arrival, he participated in various quali-quantitative research projects on social media analysis (Facebook, public comments), science dynamics (oncology collective thoughts (CIHR project), synthetic biology emergence), and political processes (political discourses, climate change negotiations). He also designs the CorText platform. He holds a PhD in Complex Systems and was trained as an engineer at Ecole Polytechnique. He is also an adjunct research scholar at INCITE, Columbia University.