<!-- TITLE: Project Overview -->
<!-- SUBTITLE: A large-N qualitative study of the history of Sociological theory -->
Check out the [brainstorming document](/journal-analysis/brainstorm), which is messier and filled with way too many ideas.
See also the [annotated bibliography](/journal-analysis/literature), which summarizes relevant research.
I wrote an [area concentration exam](/uploads/journal-analysis/mcgail-computational-sociology-area-concentration.pdf "Mcgail Computational Sociology Area Concentration") preparing for this analysis.
Important learning: [age, period, and cohort effects](/learning/age-period-cohort)
+ If I stacked up the full contents of all X journals I am analyzing, it would [fill a room] [reach to the top of the Sociology building] [weigh as much as an elephant]
+ It would take me $100,000 / (5 * 365) = 54.8$ years to read them all, if I read five papers every day.
# In a sentence
I report on the cultural patterns I observe in theory sections of sociology papers.
# In a paragraph
Each sentence in the theory section of a Sociology article is a co-occurrence of the author, year, (term, term, term), and the citation attached to it, among other qualities. It is a labeled multi-edge in a multi-mode network. We can collapse along any collections of these node-types and derive new networks, connecting citations to years, or authors, years, and keywords. In this context, Martin & Lee (2018) give us the language to ask, for instance, what meaning a year has in terms of keywords, and to compare meanings and "meaningfulness". Changes in the structure of these derivative networks indicate shifts in meaning, highlighting conceptual and motivational changes. We can also generate "approval" matrices for each combination of traits, the difference between $P(\text{A}) * P(\text{B})$ and $P(\text{A} \cap \text{B})$, where A and B are some combination of qualities. We can use the variance and entropy in these matrices to quantify the "strength and salience of a symbolic boundary" (Edelmann 2018). These two formalizations help to build a quantitative history of concept-development in sociological theory. I examine some structures and patterns in this history, and relate them to research questions in the sociology of science.
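As a toy illustration of the "approval" score above: a minimal pure-Python sketch, where the trait names, the 0/1 indicator data, and the `boundary_strength` helper (population variance of all pairwise approvals) are my own illustrative assumptions, not Edelmann's exact formalization or this project's implementation.

```python
from itertools import combinations
from statistics import pvariance

def approval(a, b):
    """'Approval' of two traits: P(A and B) - P(A) * P(B), computed
    from two equal-length 0/1 indicator sequences over sentences.
    Positive values mean the traits co-occur more often than
    independence would predict; negative values mean they avoid
    each other."""
    n = len(a)
    p_joint = sum(x * y for x, y in zip(a, b)) / n
    return p_joint - (sum(a) / n) * (sum(b) / n)

def boundary_strength(traits):
    """A variance-style summary over all pairwise approvals, in the
    spirit of Edelmann's (2018) 'strength of a symbolic boundary'.
    `traits` maps trait name -> 0/1 indicator list."""
    scores = [approval(traits[u], traits[v])
              for u, v in combinations(traits, 2)]
    return pvariance(scores)

# invented toy data: presence flags for three terms across four sentences
traits = {
    "capital": [1, 1, 0, 1],
    "habitus": [1, 1, 0, 0],
    "network": [0, 0, 1, 1],
}
print(approval(traits["capital"], traits["habitus"]))  # 0.125
print(boundary_strength(traits))
```

In the real data each row would be one theory-section sentence and each column one quality (term, citation, year, institution), so the same two functions scale to any pair of trait collections.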
# What I need to do
+ extract the context of all in-text citations, with metadata
+ find term lists which are of interest
    + helps focus the analysis, and gives us more computational legroom
    + uses a chi2 measure (cortext) to assess whether a word or phrase is meaningful
        + see [here](https://docs.cortext.net/lexical-extraction/) for docs
        + the chi2 is from Manning and Schütze, [ch. 5](https://nlp.stanford.edu/fsnlp/promo/colloc.pdf) ([whole book](https://www.cs.vassar.edu/~cs366/docs/Manning_Schuetze_StatisticalNLP.pdf)); it simply tests whether two words occur independently of each other
+ find the terms which show some "pattern" over another quality
    + and over the citations
    + and the years
    + and the institutions
+ what is "variation"?
    + how do I judge whether there's anything to be seen?
    + intuitively, it's having enough numbers in a (generalized) column of this dataset
+ communicate that meaning concisely; make a way to "see" the data
+ [Web of Science field tags](https://images.webofknowledge.com/images/help/WOS/hs_wos_fieldtags.html)
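The chi2 collocation test in the list above can be sketched in a few lines; this is the 2x2 contingency version from Manning and Schütze's chapter 5, and the counts in the example are invented for illustration, not values from the corpus.

```python
def chi2_bigram(o11, o12, o21, o22):
    """Pearson chi-squared for a 2x2 bigram contingency table
    (Manning & Schuetze, ch. 5):
      o11 = count(w1 w2)
      o12 = count(w1 followed by a word other than w2)
      o21 = count(a word other than w1, followed by w2)
      o22 = count(neither)
    Large values suggest the two words are NOT independent,
    i.e. the bigram is a real collocation."""
    n = o11 + o12 + o21 + o22
    numerator = n * (o11 * o22 - o12 * o21) ** 2
    denominator = (o11 + o12) * (o11 + o21) * (o12 + o22) * (o21 + o22)
    return numerator / denominator

# invented toy counts for a candidate bigram in a 1000-bigram corpus
score = chi2_bigram(20, 30, 40, 910)
print(round(score, 1))  # 107.9, far above the 3.84 cutoff at p = 0.05
```

A perfectly independent table (e.g. `chi2_bigram(10, 90, 90, 810)`) scores exactly 0, which is the sense in which the measure "simply determines if two words are independent of each other".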
# Paper Outline
Total paper length: 20 pages, single spaced + visualizations
+ Abstract (1 paragraph)
+ Introduction / motivation (1.5 page)
+ Introduction to data (1 page)
+ Introduction to methods (1 page)
+ Describe the contours of meaning (4 pages + visualizations)
    + These descriptions will mostly be clean visualizations of metrics; the approach is entirely inductive, at least at the beginning.
+ Relate these descriptives to real-world happenings (in-depth understanding) (8 pages)
    + Take a few case-studies of patterns and examine them in detail
+ Discussion and conclusion
## Introduction to data
[Data descriptives](/journal-analysis/data)