Blog Post 1: Linguistic Background of Genital Descriptions

hw1
blogpost1
Emily Miller
internetarchivebooks
internetarchive
ggplot2
Author

Emily Miller

Published

September 18, 2022

Summary

There is a long history of how attitudes around sex and genitalia have affected the treatment of patients. I would like to study the language describing vulvas and penises, and see if that language use changes over time. This post contains historical background, summary of literature, and description of my data sources.

Background

I recently came across a New York Times article discussing the lack of research on vaginal anatomy and how this disparity has affected the outcomes of patients throughout history. I was already acquainted with the lack of general knowledge of vulvar anatomy, demonstrated by a study finding only 9% of volunteers from an outpatient urogynecology clinic could successfully identify 7 structures of the vulva (labia minora and majora, vagina, urethra, anus, peroneum, and clitoris; El-Hamamsy et al., 2022). This ignorance is reinforced in our studies of other species as well. A review of primarily literature looking at the evolution of genitals found that approximately 1/2 of papers looked at penises alone, roughly 8% looked at vulvas alone, and the remainder reported on both (Ah-King et al, 2014). The lack of knowledge on the spectrum of differences in genitalia impacts the quality of our scientific hypotheses, patient care and outcomes, and other areas.

The history of the study of genitalia provides insight into the way genitalia and pleasure are treated today. The writings of two greco-roman physicians shaped opinions of early European anatomists. Both views believed that women are inferior derivatives of men, and the vagina to be “an inside-out and inferior mirror version of the penis” (Moore, 2018). However, one view, derived from the writings of Hippocrates of Kos in the late 300s BCE, saw women’s pleasure as necessary for conception and recognized the clitoris as essential to this means. The other view, evolved from the writings of Galen of Pergamon in the mid 100s BCE, viewed women to be incapable of pleasure and did not make reference to the clitoris (Moore, 2018). With the resurgence of cadaveric dissections in the 16th and 17th century, early European anatomists began to codify vulvar anatomy and accept the existence of the clitoris. The 18th and 19th centuries saw a rise in anti-masturbation campaigns and genital mutilation, but no change in the view that women and to a lesser extent vulvar anatomy were lesser male homologues. Only in the mid 20th century –with advances in embriological research and through feminist activism–was women’s supposed homology of men rejected (Moore, 2018). The effect was massive. For example, the NIH passed regulations requiring clinical trials to include women in the 1993s, because they previously assumed that women would have the same effects as men because they were derivatives of them (Liu & Mager, 2016). We continue to feel the effects of this recent history today, and highlights the work we must continue to do to identify areas of bias in our healthcare system and medical knowledge base.

What interests me for this project is the way the shift in thought could be seen in the language used to describe sexes and genitals. For example, even today, a recent survey of medical textbooks found that only a third labeled all structures of the clitoris (Beni et al., 2022), which continues the long history of ignorance of vulvar anatomy. I will describe the current available literature and the dataset I have chosen, along with other considerations.

Literature Review

Methods

I reviewed PubMed, where I performed the following searches: “medical textbook*” AND (genital* OR vagin* OR clit* OR vulv* OR pubic OR gynec*); and “medical textbook*” AND (genital* OR vagin* OR clit* OR vulv* OR pubic OR gynec*). Results were filtered for those published after 2000. Articles describing disease pathology, provider attitudes, did not discuss text methods or that focused on non-Western texts were excluded.

Results

Below I summarize findings for 2 papers for the assignment. The remaining papers I have put in a separate table as a reference for myself in the future. These don’t use quantitative methods, or I used them specifically for reviewing their methods.

Code
primary_lit %>% kable %>% 
  kable_styling()
Question Berttoti & Miner, 2019 Walsh & Elhadad, 2014
Research aim Review sociolinguistic patterns surrounding contraception risk across gynecology textbooks Compare social histories in clinician notes of inpatient and outpatient treatment centers
Data source and collection Sampled randomly 15 medical schools in the United States. Researchers contacted the bookstores and chose 7 books which appeared most frequently EMR were retrieved from Columbia University Medical Center inpatient and outpatient notes for a 5 year period with IRB approval. They mentioned although the dataset is limited in terms of year span it is an important use of machine learning.
Hypothesis Textbook claims relating to pharmacological contraceptives would be associated with more factive language Clinician notes may differ between inpatient and outpatient.
Methodology Manually identified and selected sentences that described risks or benefits of contraceptives. Researchers identified: Reporting verbs: these show author’s opinions on importance/truth of finding (ex “confirm, demonstrate, show”) Clauses: can show hedging/intensifying effect (ex “however, despite, nonetheless”) Attitude markers: also reflect authors’ positions (ex “remarkable, should, unfortunately”) Hedges and boosters: show tentativeness/certainty of authors’ view of claim (ex “apparently, presumable, clearly, many”) Concessions: use of rhetorical device that ‘bolsters the speaker’s case and weakens its counter’ Sentences were then given a score to show the level of certainty associated with a claim Only the social history section of the note was taken. R was used to remove stopwords and identified contractions (ex: “h o” = “history of”). Inpatient and outpatient notes were then separated into 2 separate corpuses and LDA topic models were applied to both. Topic labels were then manually assigned. Perplexity models (seeing whether the language is the same in 2 different settings) was applied.
Findings Textbooks tended to use language that conveyed certainty surrounding the lack of risk of pharmaceutical interventions and used language insinuating doubt when describing the “relevance of risk”. These linguistic strategies reveal a pro-pharmaceutical perspective of the literature Social histories didn’t typically include religious affiliation, alternative healthcare practices, or stressors in either setting. Inpatient notes tended not to include clear demarcations of parts of history (ex alcohol use, employment, family support) compared to outpatient.
My take I read this paper to see how data were selected and how they tagged their data. I would like to use some of the strategies that the authors of this study used when comparing genitals and also comparing across time periods. Specifically, I think the use of reporting verbs, hedges/boosters, and attitude markers would be especially interesting, and I would like to review this further.. I read this paper because I wanted to see how the messiness of medical data can be leveraged, here in the perplexity analysis. This paper made me want to use topic modeling to compare textbooks before the feminist movement and after, which is around the mid 1900s. I may look into the use of perplexity analyses.

This is the other table.

Code
other_lit %>% kable %>% 
  kable_styling()
Citation Research Aim Data source and collection Methodology (methods used) Findings My thoughts
Andrikopoulou et al., 2013 Discover the consistency of vulvar dimension reporting in gynecology textbooks All gynecology and anatomy textbooks in select Universities of the London area was performed, as well as review of syllabi of undergrads. Vulval pathology textbooks were omitted. Dimensions of vulval structures and vagina were recorded as well as illustration types. Gynecology textbooks tended to have more information, and only gynecology books had measurements. These measurements were primarily taken decades ago. Authors feel as though without literature for physicians to rely on, they may turn to personal experiences instead. Did not discuss bias. I read this to review their data selection methods and see how they processed their data. This made me think that I would like to see if measurements are reported for vaginas in comparison to penises.
Berotti & Miner 2021 Reviewing gynecology materials suggested by the American College of Obstetricians and Gynecologists (ACOG) to assess bias in reporting contraceptive’s risk Took 15 top materials recommended by ACOG and 15 top gynecology books (same method as Berotti & Miner 2019). Applied Critical Discourse Analysis to the texts, which is done by hand. This helps researchers identify areas of bias in texts, especially if they’re framed as impartial. Pharmacological contraceptives are framed as safe as compared to the lifestyle choices and bodies of women. The effectiveness is valued more than the autonomy of the person using it, and side effects are seen as minimal in this light. I read this because I was interested in CDA and how they processed text. From my brief search, there are few articles that combine CDA and NLP. Given more time, it would be interesting to perform this type of analysis on my data.
Gwan et al., 2022 Compare skin tone representation of vulvar conditions Chose most widely used OBG textbooks available through university, and confirmed with medical school faculty of their relevance Authors employed volunteers to review skin tones based on the Fitzpatrick Scale Darker skin tones represented more pathologic vulvar conditions. This was especially significant in textbooks for residents I read this to see their methods. The authors say that because these textbooks are well known and used across many institutions, they don’t consider this a potential for bias. I like that they validated the use of these books with faculty, which (if I am doing a larger project) I would like to do.
Whalen, 2005 Develop a method for summarizing medical textbook and encyclopedias automatically Uses sources in the Columbia University Health Sciences Library and Harrison’s Online. Utilizes Unified Medical Language System (UMLS) which is a database of medical language for NLP. They also make use of MetaMap, which is used for tagging parts of speech while also using UMLS and other programs. Whalen uses these resources while developing an algorithm. Summaries are clustered according to a user’s search term and the algorithm is promising for future study. I read this to see ways that medical texts can be processed. I didn’t know about UMLS, I will look into whether or not R has any packages available in this regard.
Yacob et al., 2020 Compare readability and reliability of wikipedia vs medical textbooks, as physicians frequently reference both Used Qualifying Exam topics for vascular surgery as ways to choose Wikipedia articles and Schwart’s Principles of Surgeries which is a well known textbook Assess readability using Flesch Reading Ease score along with Flesch-Kincaid Grade Level system among others. DISCERN was used to evaluate the quality and completeness of the articles, which is manually done. Wikipedia was more readable compared to the textbook, but had higher rates of omission and lower quality grades. I read this to see methods of how they chose their texts and how they processed them. This gave me an idea that I could potentially look at change in readability over time. “Womanhood” and topics related to sex were considered mysterious, so I wonder if this would translate in language difficulty.

Study Design

Data Selection

To look at the change in word use surrounding genitals over time, I am going to use textbooks as my text source. Most primary source literature from the early 1900s tends to discuss case studies rather than reviewing anatomy itself. Textbooks, in comparison, are sources of authority yet also contain bias. Textbooks play a critical role in introducing students to a field or topic by summarizing findings in the primary literature in a given field (Bertotti & Miner, 2019). In doing so, the deliberation behind hypotheses is omitted and framed as a coordinated whole. Writers typically use different language patterns which assert a hypothesis is true, making it difficult for students to ponder the lens of the literature they are reading (Bertotti & Miner, 2019). Student impressions of the topics they learn are critical. For example, a recent paper reviewing all available medical and gynecology textbooks in a select number of London University Libraries found that none reviewed variability in labial morphology (Andrikopoulou et al., 2013; Hayes & Temple-Smith, 2022). Another paper reported that an audit of referrals for labiaplasty in the UK found providers describing the labia as “leathery or pendulous”, which was deemed “non-scientific” and “pejorative” by the authors (Hayes & Temple-Smith, 2022). Using textbooks allows us to see the attitudes and hypotheses of the primary literature of the time, while also inferring the effects these viewpoints had on the administration of medical care.

For this pilot study, in order to narrow the scope of my project, I am looking at textbooks published between the early 1900s until the present. I hope to examine and perhaps capture the shift in language used as a result of the feminist movement in the mid 1900s (Moore, 2018). I also intend to look at anatomy and physiology textbooks rather than surgery-specific textbooks. From my review of the data, surgery books tend to describe techniques rather than purported function and general descriptions of genitals. Anatomy and physiology books tend to do the latter, so my hope is to capture this language shift in these books.

Data Sources

The Internet Archive (IA) has a collection of over 20 million text resources (EBooks and Texts, n.d.). To name a few of their massive collection, they have resources from: google books, PubMed, and other databases; public libraries and those from private institutions such as the Yale Medical Library; as well as other user-uploaded texts. Once texts have been uploaded to IA, they are automatically OCR’d (EBooks and Texts, n.d.) and then made available to download with the R-package “internetarchive” (Mullen et al., n.d.). These texts include medical textbooks and journal articles dating back to the 1500s (EBooks and Texts, n.d.). I will review the way in which I access this data, as well as select relevant texts in the next blog post.

Potential Areas of Selection Bias

There is some potential bias in using sources that are in a “rare” book collection. Collections are usually curated by specific individuals, so the books in these collections reflect the interests and biases of these individuals. So called “rare” books that are the focus of many collections may be anomalous to the actual body of work available to most readers at the time. In addition, these historical sources don’t always make it to the present, either because of poor conditions or lack of printing. These factors may mean that the books available in this collection are not representative of the true attitudes of writers at the time. Statistics can be used to account for this bias in some cases in downstream analysis (Zimran, 2020), but it is critical to acknowledge this bias before continuing with analysis (Inwood & Maxwell-Stewart, 2020; Zimran, 2020). In a future study, I might reach out to curators of these historical collections to identify anomalous books and randomly sample from these.

Next, many resources used for teaching in academic institutions are not open-access, which is frequently the texts that are uploaded to IA. The copyright of a resource would need to expire to become open-access. However, some users will upload textbooks that they use so that others don’t need to purchase them. Berotti & Miner handled this possible bias in an interesting way. They randomly selected medical schools, called their bookstores, then selected the top books reported by the schools (2019, 2021). If I were to do this, I would need to OCR these texts myself. For this pilot study, using the books available through the IA will reflect a bias of the collection similar to the historical books. In further studies, it would make sense to perform a random sample similar to Berotti & Miner. Similar to the historical collection, the more recent resources available may not entirely represent the current distribution of medical textbooks.

Questions and Future Directions

This is my current thoughts and questions:

  • Is there a difference in language use between penises and vulvas?
    • Are there different sentiments associated with this language?
    • Are there differences in readability scores in descriptions of genitals compared to general anatomy?
    • Are there differences in uses of adjectives and hedgers/boosters?
  • Is there a change in language use over time?
    • Can we see a change in language before and after the feminist movement?

Bibliography

Ah-King, M., Barron, A. B., & Herberstein, M. E. (2014). Genital evolution: why are females still understudied?. PLoS biology, 12(5), e1001851. https://doi.org/10.1371/journal.pbio.1001851 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4011675/

Andrikopoulou, M., Michala, L., Creighton, S. M., & Liao, L. M. (2013). The normal vulva in medical textbooks. Journal of obstetrics and gynaecology : the journal of the Institute of Obstetrics and Gynaecology, 33(7), 648–650. https://doi.org/10.3109/01443615.2013.807782 https://pubmed.ncbi.nlm.nih.gov/24127945/

Beni, R., Fisher, L., & Longhurst, G. J. (2022). The importance of diverse and accurate descriptions of genital anatomy in textbooks. Anatomical sciences education, 15(5), 985–988. https://doi.org/10.1002/ase.2200 https://pubmed.ncbi.nlm.nih.gov/35593013/

Bertotti, A. M., Mann, E. S., & Miner, S. A. (2021). Efficacy as safety: Dominant cultural assumptions and the assessment of contraceptive risk. Social science & medicine (1982), 270, 113547. https://doi.org/10.1016/j.socscimed.2020.113547 https://pubmed.ncbi.nlm.nih.gov/33455813/

Bertotti, A. M., & Miner, S. A. (2019). Constructing contentious and noncontentious facts: How gynecology textbooks create certainty around pharma-contraceptive safety. Social studies of science, 49(2), 245–263. https://doi.org/10.1177/0306312719834676 https://pubmed.ncbi.nlm.nih.gov/30841787/

EBooks and texts. Internet Archive. (n.d.). Retrieved November 2, 2022, from https://archive.org/details/texts

El-Hamamsy, D., Parmar, C., Shoop-Worrall, S., & Reid, F. M. (2022). Public understanding of female genital anatomy and pelvic organ prolapse (POP); a questionnaire-based pilot study. International urogynecology journal, 33(2), 309–318. https://doi.org/10.1007/s00192-021-04727-9 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8803818/

Gwan, A., Tanner-Sanders, L. N., Nair, N., Chapple, A. G., & Jernigan, A. (2022). Equity in visual representation of vulvar conditions in major gynecology textbooks. Journal of the National Medical Association, 114(3), 314–323. https://doi.org/10.1016/j.jnma.2022.02.005

Hayes, J. A., & Temple-Smith, M. J. (2022). New context, new content-Rethinking genital anatomy in textbooks. Anatomical sciences education, 15(5), 943–956. https://doi.org/10.1002/ase.2173 https://anatomypubs.onlinelibrary.wiley.com/doi/full/10.1002/ase.2173

Inwood, K., & Maxwell-Stewart, H. (2020). Selection Bias and Social Science History. Social Science History, 44(3), 411-416. doi:10.1017/ssh.2020.18 https://www.cambridge.org/core/journals/social-science-history/article/selection-bias-and-social-science-history/329E909CB9F8942DDEDE95A3D27BFCD7

Liu, K. A., & Mager, N. A. (2016). Women’s involvement in clinical trials: historical perspective and future implications. Pharmacy practice, 14(1), 708. https://doi.org/10.18549/PharmPract.2016.01.708 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4800017/

Moore, A. M. (2018). Victorian Medicine Was Not Responsible for Repressing the Clitoris: Rethinking Homology in the Long History of Women’s Genital Anatomy. Signs: Journal of Women in Culture and Society, 44(1), 53–81.doi:10.1086/698277 https://www.journals.uchicago.edu/doi/full/10.1086/698277#rf50

Mullen, L., Chamberlain, S., Padgham, M., & Ye, S. (n.d.). Ropensci/internetarchive: Search the internet archive, retrieve metadata, and download files. GitHub. Retrieved November 2, 2022, from https://github.com/ropensci/internetarchive Verma, S., Ernst, M., & Just, R.. (2021). Removing biased data to improve fairness and accuracy. arXiv, 2021. https://arxiv.org/pdf/2102.03054.pdf

Walsh, C., & Elhadad, N. (2014). Modeling clinical context: rediscovering the social history and evaluating language from the clinic to the wards. AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science, 2014, 224–231. https://pubmed.ncbi.nlm.nih.gov/25717417/

Whalen G. (2005). Medical textbook summarization and guided navigation using statistical sentence extraction. AMIA … Annual Symposium proceedings. AMIA Symposium, 2005, 814–818. https://pubmed.ncbi.nlm.nih.gov/16779153/

Yacob, M., Lotfi, S., Tang, S., & Jetty, P. (2020). Wikipedia in Vascular Surgery Medical Education: Comparative Study. JMIR medical education, 6(1), e18076. https://doi.org/10.2196/18076 https://pubmed.ncbi.nlm.nih.gov/32417754/