<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20120330//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd">
<!--<?xml-stylesheet type="text/xsl" href="article.xsl"?>-->
<article article-type="research-article" dtd-version="1.2" xml:lang="en" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<front>
<journal-meta>
<journal-id journal-id-type="issn">2940-1348</journal-id>
<journal-title-group>
<journal-title>Journal of Computational Literary Studies</journal-title>
</journal-title-group>
<issn pub-type="epub">2940-1348</issn>
<publisher>
<publisher-name>Technische Universit&#228;t Darmstadt</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.48694/jcls.3567</article-id>
<article-categories>
<subj-group>
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Gender Depiction in Portuguese</article-title>
<subtitle>Distant Reading Brazilian and Portuguese Literature</subtitle>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<contrib-id contrib-id-type="orcid">https://orcid.org/0000-0001-6807-8558</contrib-id>
<name>
<surname>Freitas</surname>
<given-names>Cl&#225;udia</given-names>
</name>
<xref ref-type="aff" rid="aff-1">1</xref>
</contrib>
<contrib contrib-type="author">
<contrib-id contrib-id-type="orcid">https://orcid.org/0000-0002-3108-7706</contrib-id>
<name>
<surname>Santos</surname>
<given-names>Diana</given-names>
</name>
<xref ref-type="aff" rid="aff-2">2</xref>
</contrib>
</contrib-group>
<aff id="aff-1"><label>1</label>Department of Letters, Pontifical Catholic University of Rio de Janeiro <ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://ror.org/05vghhr25">ROR</ext-link>, Rio de Janeiro, Brazil</aff>
<aff id="aff-2"><label>2</label>Department of Literature, Area Studies and European Languages, University of Oslo <ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://ror.org/05vghhr25">ROR</ext-link>, Oslo, Norway</aff>
<pub-date publication-format="electronic" date-type="pub" iso-8601-date="2024-02-06">
<day>14</day>
<month>02</month>
<year>2024</year>
</pub-date>
<pub-date pub-type="collection">
<year>2023</year>
</pub-date>
<volume>2</volume>
<issue>1</issue>
<fpage>1</fpage>
<lpage>29</lpage>
<history>
<date date-type="received" iso-8601-date="2023-01-19">
<day>19</day>
<month>01</month>
<year>2023</year>
</date>
<date date-type="accepted" iso-8601-date="2023-12-11">
<day>11</day>
<month>12</month>
<year>2023</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright: &#x00A9; 2024 The Author(s)</copyright-statement>
<copyright-year>2024</copyright-year>
<license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>The text of this work is released under the Creative Commons license CC BY 4.0 International. You can find the contract text of the license at <uri xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</uri>. The illustrations are excluded from this license, here the copyright lies with the respective rights holder.</license-p>
</license>
</permissions>
<self-uri xlink:href="https://jcls.io/article/10.48694/jcls.3567/"/>
<abstract>
<p>In this paper, we look at how masculine and feminine characters are described in literature in Portuguese using a publicly available literary corpus: <italic>Literateca</italic>. We investigate the words used to characterise human beings, which we classify into four broad categories: social, appearance, character, and emotional. We study the influence of genre, literary school, author gender, and time, among other factors.</p>
</abstract>
<kwd-group>
<kwd>distant reading</kwd>
<kwd>annotation</kwd>
<kwd>Brazilian literature</kwd>
<kwd>Portuguese literature</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="S1">
<title>1. Introduction</title>
<p>The way people are described is a rich source of information about societies and cultures, revealing the values and beliefs of those who describe them. In addition to proper names, there are many other ways of designating humans, such as general human nouns like <italic>man, woman, person, gentleman, lady</italic>, and designations based on traits or functions of the people mentioned (places of origin, professions, family ties, etc., as in <italic>Brazilian, doctor, mother, foreigner</italic>).</p>
<p>In this paper, we look into how human beings are characterised in literature in Portuguese &#8211; also called Lusophone literature &#8211; using a distant reading approach. In particular, we want to investigate the influence of features such as authorship, geographical origin, historical period and gender (both character gender and authorial gender).</p>
<p>Inspired by the warning in Moretti and Sobchuk (<xref ref-type="bibr" rid="B15">2019</xref>), we try to go beyond simple visualisations by date or author and add other ways of looking at the data. Following their &#8220;dissecting table&#8221; analogy, our aim is to find out which pieces can provide pertinent analyses and trigger meaningful readings. We therefore search for &#8220;creative cuttings&#8221; &#8211; such as the &#8220;volume&#8221; of speech verbs in Katsma (<xref ref-type="bibr" rid="B9">2018</xref>) &#8211; that give us new insights. Specifically, we add the class &#8216;human depiction&#8217; to our data; still, we aim for consensual and understandable categories, like &#8220;century&#8221; in history.</p>
<sec id="S1.1">
<title>1.1 Gender in Literature</title>
<p>The theme of gender roles in fiction texts has received increasing attention in the Digital Humanities community, as the following works testify.</p>
<p>Looking at English literature (104,000 works, from 1703 to 2009), Underwood et al. (<xref ref-type="bibr" rid="B30">2018</xref>) found that the gender difference between characters became less pronounced from the middle of the nineteenth century to the present day: Actions and attributes of characters became less defined by gender categories. In other words, gender roles tend to become more flexible. At the same time, they also found a decrease in the number of feminine characters, with the volume of fiction written by women from 1850 to 1950 dropping by half.</p>
<p>Exploring the <italic>Black Drama</italic> collection, which contains plays written between 1950 and 2006, Argamon et al. (<xref ref-type="bibr" rid="B1">2009</xref>) report poor results when trying to automatically distinguish the gender of the authors and/or characters. However, they found differences in the way masculine and feminine authors and characters use language. Feminine playwrights allocate more than half (52.1%) of speeches to feminine characters, while 34.7% of the speeches in plays by masculine authors belong to feminine characters.</p>
<p>Working with present-day Dutch literary fiction (170 novels published in one sample year), Smeets (<xref ref-type="bibr" rid="B28">2021</xref>) found a similar imbalance between the numbers of masculine and feminine characters. However, the author questions what he describes as a &#8220;perhaps naive mimetic assumption&#8221; according to which the relative absence of feminine characters is a result of their unequal status in society. According to his results, feminine characters, although fewer in number, occupy a relatively central position in their fictional social networks &#8211; they have more relations, both in general and with important characters.</p>
<p>Hoyle et al. (<xref ref-type="bibr" rid="B8">2019</xref>), using 3.5 million digitised books in English, analyse the lexical choices (adjectives and verbs) associated with gendered nouns and find that positive adjectives used to describe women were more often related to their bodies than adjectives used to describe men. Following the same trend, Schulz and Bahn&#237;k (<xref ref-type="bibr" rid="B26">2019</xref>) explore the depiction of male and female characters using the Google Books Ngram corpus, focusing on twentieth-century English-language fiction. The study analyses adjective-noun bigrams associated with the words <italic>man, woman, boy</italic>, and <italic>girl</italic>, and reports that adjectives associated with <italic>men</italic> are more positive (&#8220;honest&#8221;, &#8220;wise&#8221;, &#8220;honorable&#8221;, and &#8220;able&#8221;) than those associated with <italic>women</italic> (&#8220;vulgar&#8221;, &#8220;foolish&#8221;). As for preferences, &#8220;charming&#8221;, &#8220;fashionable&#8221;, and &#8220;warm&#8221; were relatively feminine words, while &#8220;lazy&#8221; and &#8220;mean&#8221; were relatively masculine words. On the one hand, men were described in decreasingly masculine terms from the beginning to the end of the twentieth century; on the other hand, the masculinity of adjectives used to describe women started to increase slightly from 1968 to 2000.</p>
<p>Weingart and Jorgensen (<xref ref-type="bibr" rid="B31">2013</xref>) performed a computational analysis of gendered bodies in ca. 200 European fairy tales (German, French and Italian folklore texts translated into English). They show that feminine characters are more likely than masculine characters to be described with appearance-evaluative words, suggesting that men are associated with the mind and women with the body.</p>
<p>Cerm&#225;kov&#225; and Mahlberg (<xref ref-type="bibr" rid="B5">2022</xref>) explore linguistic descriptions of gendered body language and compare nineteenth-century British children&#8217;s literature (<italic>ChiLit Corpus</italic>) with contemporary fiction for children (the <italic>OCC2000</italic>+ <italic>Corpus</italic>, a subcorpus of the <italic>Oxford Children&#8217;s Corpus</italic>). Using a corpus linguistic approach, the authors study sequences of five words which contain at least one body part noun and a marker of gender. They found fewer clusters for feminine characters in the nineteenth century. The contemporary data suggest a trend for feminine and masculine clusters to become more similar, and an increasing range of options for the description of feminine characters and their interactional spaces. Using the same <italic>ChiLit Corpus</italic>, Cerm&#225;kov&#225; and Mahlberg (<xref ref-type="bibr" rid="B4">2021</xref>) focused on nouns &#8211; excluding proper names &#8211; frequently used to label people, and found that <italic>mothers</italic> are the most frequently occurring feminine characters in the corpus.</p>
<p>It is also worth noting the existence of studies such as Cao and Daum&#233; (<xref ref-type="bibr" rid="B3">2021</xref>) and Lucy and Bamman (<xref ref-type="bibr" rid="B11">2021</xref>). The first one explores the consequences of gender bias for machine learning. The paper investigates how different aspects of linguistic notions of gender impact an annotator&#8217;s judgements of anaphora, and points out that a significant possible source of bias comes from the annotations themselves &#8211; from underspecified annotation guidelines and the human annotators. The authors emphasise that both humans and systems should not over-rely on cues such as names, semantically gendered nouns, and terms of address, relying on &#8220;relatively safe&#8221; cues like syntax instead. At the other pole of the machine learning approach, the study conducted by Lucy and Bamman (<xref ref-type="bibr" rid="B11">2021</xref>) raises questions about how to avoid unintended social biases when using large language models for storytelling. Focusing on how GPT-3 may perceive a character&#8217;s gender based on textual features such as personal pronouns (<italic>he/she/her</italic>, etc.), the work finds that stories generated by GPT-3 place masculine and feminine characters in different topics and exhibit many gender stereotypes: For example, feminine characters are more associated with family and appearance than masculine characters.</p>
<p>In this paper, we also try to contribute to the investigation of gender roles using works written in Portuguese. As a crossover between Corpus Linguistics and Digital Humanities, we use morpho-syntactic and semantic information automatically provided by the PALAVRAS parser (<xref ref-type="bibr" rid="B2">Bick 2014</xref>), and add extra semantic annotations, which are described below.</p>
<p>With Larson (<xref ref-type="bibr" rid="B10">2017</xref>), we recognise that using gender as a variable in Natural Language Processing is an ethical issue and that we need to explain explicitly what &#8220;gender&#8221; means in this work. As Larson points out, there are many views of how gender functions as a social construct. In this study, we treat gender as binary, since in the vast majority of works in our corpus, gender was mainly constructed in terms of the binary distinction femininity/masculinity. We acknowledge, however, that the category &#8220;gender&#8221; can be more complex than this binary distinction, and that these kinds of studies, which describe the cultural apparatus around gender over an extended period of time, do not in any way purport to assert what gender is, but only how it has been and is perceived. Such studies should therefore not be used to reinforce gender stereotypes, as Mandell (<xref ref-type="bibr" rid="B12">2019</xref>) warns.</p>
</sec>
<sec id="S1.2">
<title>1.2 Previous Work for Portuguese</title>
<p>For distant reading of Portuguese, we are aware of some works dealing with characters in literature (<xref ref-type="bibr" rid="B18">Santos and Freitas 2019</xref>), as well as of the DIP challenge for automatic character identification in Portuguese (<xref ref-type="bibr" rid="B23">Santos et al. 2022b</xref>), to which we will come back later.</p>
<p>Our point of departure is the work by Freitas et al. (<xref ref-type="bibr" rid="B6">2022</xref>)<xref ref-type="fn" rid="n1">1</xref>, extended in Silva&#8217;s master&#8217;s thesis (<xref ref-type="bibr" rid="B27">Silva 2021</xref>), which proposed a fourfold classification for human characterisation: Human attributes were organised into social, appearance, character, and emotional characteristics.</p>
<p>Using <italic>OBras</italic>, a corpus of Brazilian literature in the public domain (<xref ref-type="bibr" rid="B19">Santos et al. 2018</xref>), they studied 223 works by 25 Brazilian authors, two of whom were women (authoring three novels altogether), and observed the following trends:</p>
<list list-type="bullet">
<list-item><p>Men were more frequently described than women (60%-40%), which may be related to the fact that masculine characters outnumbered feminine ones in roughly the same proportion.</p></list-item>
<list-item><p>The most frequent masculine characterising words were <italic>bom</italic> (&#8216;good&#8217;), <italic>s&#233;rio</italic> (&#8216;honest&#8217;), <italic>rico</italic> (&#8216;rich&#8217;), and <italic>alto</italic> (&#8216;tall&#8217;), while <italic>bonita</italic> (&#8216;beautiful&#8217;) was by far the top characteristic for women.</p></list-item>
<list-item><p>Almost 50% of the words depicting women were about beauty (namely <italic>bonita</italic> and <italic>bela</italic>).</p></list-item>
<list-item><p>Character and social predication were most frequent for men; for women, social characterisation is reduced to <italic>married</italic> and <italic>rich</italic>.</p></list-item>
<list-item><p>Emotional characterisations like <italic>feliz</italic> (&#8216;happy&#8217;) were (almost) exclusively used for women.</p></list-item>
</list>
<p>We wanted to check whether these observations hold true for a wider collection that also includes Portuguese literature.</p>
</sec>
<sec id="S1.3">
<title>1.3 A Brief Comparison with DIP</title>
<p>It is useful to compare and contrast our study with the recent DIP challenge for Portuguese (<italic>Desafio de Identifica&#231;&#227;o de Personagens</italic>), an evaluation contest for identifying literary characters and some information about them in Brazilian and Portuguese works (<xref ref-type="bibr" rid="B20">Santos et al. 2022a</xref>, <xref ref-type="bibr" rid="B21">2023</xref>). By describing both approaches and pointing out their differences, we shed some light on different ways of looking at (roughly) the same data.</p>
<p>For DIP, the unit is the literary character, and so the challenge looked at their gender, their profession, occupation and/or social status, and their family relationships with other characters. In addition, &#8220;literary character&#8221; in DIP does not include all people.<xref ref-type="fn" rid="n2">2</xref> In the present study, we try to look at all mentions of characterisation of people in the works, so all numbers reported in this paper are not per character, but per mention of people.</p>
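<p>To make the difference in unit concrete, the following Python sketch counts the same set of characterisations in both ways. The mentions, character identifiers and function names are hypothetical and serve only as an illustration; they are not taken from <italic>Literateca</italic> or from DIP.</p>
<preformat preformat-type="code">
```python
from collections import Counter

# Hypothetical mentions: (character_id, character_gender, depicting_word).
# Character "A" is described three times, "B" and "C" once each.
mentions = [
    ("A", "F", "bonita"),
    ("A", "F", "bela"),
    ("A", "F", "inquieta"),
    ("B", "M", "rico"),
    ("C", "M", "arisco"),
]

def per_mention(mentions):
    """Count every characterisation separately (the unit used in this paper)."""
    return Counter(gender for _char, gender, _word in mentions)

def per_character(mentions):
    """Count each distinct character only once (closer to the DIP unit)."""
    gender_of = {char: gender for char, gender, _word in mentions}
    return Counter(gender_of.values())

print(per_mention(mentions))    # 3 feminine mentions, 2 masculine mentions
print(per_character(mentions))  # 1 feminine character, 2 masculine characters
```
</preformat>
<p>Because one feminine character is described three times, the per-mention count favours feminine mentions while the per-character count favours masculine characters; figures obtained with the two units are therefore not directly interchangeable.</p>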
<p>We will discuss and compare the findings about character gender in <xref ref-type="sec" rid="S4.7">subsection 4.7</xref>.</p>
</sec>
<sec id="S1.4">
<title>1.4 The Importance of Studying Literature in Portuguese</title>
<p>Portuguese has a rich literary tradition, but unfortunately digitisation efforts lag behind those for other languages, as discussed, for example, in Sch&#246;ch et al. (<xref ref-type="bibr" rid="B24">2021</xref>).</p>
<p>Also, despite the high number of Portuguese speakers in the world, major actors in the big data landscape have not endowed Portuguese with the &#8220;current&#8221; tools that are available for other languages, even those with far fewer speakers/readers/writers, like Hebrew or Italian: There is, for example, no Google Book N-grams<xref ref-type="fn" rid="n3">3</xref> service for Portuguese.</p>
<p>Likewise, recent reviews of the computational literature landscape, finding too few internationally published DH papers on Portuguese, have decided not to review or include papers in Portuguese, thereby actively contributing to the lack of information on Lusophone materials and studies. For example, Sch&#246;ch et al. (<xref ref-type="bibr" rid="B25">2022, 4</xref>) state:</p>
<disp-quote>
<p>&#8220;several languages, however, were represented only with relatively low numbers of articles or papers, and in order not to misrepresent the research communities these publications stem from, we decided not to take the materials in several languages into account [&#8230;].&#8221;</p>
</disp-quote>
<p>This is one of the reasons why we are writing this paper for an international audience. Our results may not be very different from those obtained by our English-speaking or English-studying colleagues, but they are novel because they come from completely different data.</p>
</sec>
</sec>
<sec id="S2">
<title>2. The Material</title>
<p>We provide here an overview of the data used, partly to make it known and, we hope, useful to other researchers, and not least because it illustrates the methodological problems such data raise.</p>
<p>Attempting to complement close readings of canonical authors with a wider material, following Moretti (<xref ref-type="bibr" rid="B13">2000</xref>, <xref ref-type="bibr" rid="B14">2013</xref>) and Underwood (<xref ref-type="bibr" rid="B29">2019</xref>), we use as many books as possible whose full text is currently publicly available in Portuguese to investigate properties of literary text that can be identified automatically.</p>
<p>In order for these data to be shareable and replicable for studies, we restrict our data (mostly<xref ref-type="fn" rid="n4">4</xref>) to books in the public domain. We are aware that many more texts exist in electronic form, but using them would either infringe copyright law or, at the very least, risk creating materials usable only for our own study, which could not be shared with others.</p>
<p>Also, it is important to stress that we are referring to textual versions of the works, not simply images. Optical character recognition for Portuguese, especially for old books, is not yet good enough, so all books that were not born digital have been revised by humans.</p>
<sec id="S2.1">
<title>2.1 Corpus</title>
<p>We used <italic>Literateca</italic> version 11.1, created on 26 May 2023, comprising ca. 32 million tokens of (original) prose (excluding drama) from 1700 on. A quantitative overview of the material is given in <xref ref-type="table" rid="T1">Table 1</xref>. <xref ref-type="fig" rid="F1">Figure 1</xref> shows the distribution of the material over time, by size in words.</p>
<table-wrap id="T1">
<caption>
<p><bold>Table 1:</bold> Size of the material: prose from 1700 to the present.</p>
</caption>
<table>
<thead>
<tr>
<th align="left" valign="top">Literature</th>
<th align="right" valign="top">No. of tokens</th>
<th align="right" valign="top">No. of works</th>
<th align="right" valign="top">No. of authors</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">Total</td>
<td align="right" valign="top">32,718,621</td>
<td align="right" valign="top">669</td>
<td align="right" valign="top">200</td>
</tr>
<tr>
<td align="left" valign="top">Portuguese</td>
<td align="right" valign="top">20,639,007</td>
<td align="right" valign="top">306</td>
<td align="right" valign="top">127</td>
</tr>
<tr>
<td align="left" valign="top">Brazilian</td>
<td align="right" valign="top">12,079,614</td>
<td align="right" valign="top">355</td>
<td align="right" valign="top">73</td>
</tr>
</tbody>
</table>
</table-wrap>
<fig id="F1">
<caption>
<p><bold>Figure 1:</bold> Distribution of words per decade.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="jcls-3567_freitas-g1.png"/>
</fig>
<p><italic>Literateca</italic> results from merging several literary corpora written in Portuguese, and thus has some particularities:</p>
<list list-type="bullet">
<list-item><p>It includes literary works by canonical authors, but also other works by those canonical writers that are not usually or necessarily deemed literary, such as newspaper chronicles, letters, memoirs, travelogues, and even scholarly works such as history books or ethnographic studies. For earlier centuries, even sermons are included. However, these genres are only included for canonical writers.<xref ref-type="fn" rid="n5">5</xref></p></list-item>
<list-item><p>It includes drama, poetry, and prose.</p></list-item>
<list-item><p>Some of the works have updated orthography, others keep the original orthography. Given that there have been several norms of Portuguese spelling across the centuries, this means that there can be a variety of forms for the same word.</p></list-item>
<list-item><p>While some authors have all their works included, others have only a few, or just one. Especially for non-canonical writers, there is no claim to completeness.</p></list-item>
<list-item><p>Given that the works have been digitised by different bodies and with different tools and for different purposes, there is no claim to homogeneity: Works can come from the first or the last paper version, they may keep their prefaces or not, they have different ways of describing chapters, etc.</p></list-item>
<list-item><p>All works are marked with author, author gender, date of publication, variety of Portuguese, genre, and whether they are original or translated. Some texts are also classified by the literary school they belong to.</p></list-item>
</list>
<p>We tried to use as much of this material as we could, but we removed poetry and drama. Removing poetry is a natural choice because of its syntactic idiosyncrasies (which lead to worse parser performance), and because we expect poetry to contain fewer mentions of fictional characters. We also removed drama, including drama in prose, because it was heavily unbalanced: Most of the plays were from Portugal.</p>
<p>For prose, we started from everything published since 1700. It is nevertheless important to recognise that our corpus is not balanced, and that the lion&#8217;s share is fiction. We then selected different subsets for different research questions:</p>
<list list-type="bullet">
<list-item><p>Fiction only and non-fiction only, to see whether character depiction differs across the fiction/non-fiction divide.</p></list-item>
<list-item><p>Only works published after 1840, to compare Brazilian and Portuguese authors.</p></list-item>
<list-item><p>Only fiction published after 1840, to compare Brazilian and Portuguese literature.</p></list-item>
</list>
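<p>Since every work carries metadata such as date, variety of Portuguese and genre, the subsets above amount to simple filters over that metadata. The following Python sketch shows one way to express them; the <monospace>Work</monospace> record, its field names, the genre labels and the sample works are hypothetical, and reading &#8220;after 1840&#8221; as a year strictly greater than 1840 is an assumption.</p>
<preformat preformat-type="code">
```python
from dataclasses import dataclass

@dataclass
class Work:
    # Hypothetical metadata record; field names are illustrative, not Literateca's.
    title: str
    year: int
    variety: str   # "BR" or "PT"
    genre: str     # e.g. "novel", "chronicle", "sermon", "poetry", "drama"
    fiction: bool

EXCLUDED_GENRES = {"poetry", "drama"}

def prose_since(works, start=1700):
    """Prose published from `start` on, with poetry and drama removed."""
    return [w for w in works if w.year >= start and w.genre not in EXCLUDED_GENRES]

def research_subsets(works, cutoff=1840):
    """Build the subsets used for the different research questions."""
    prose = prose_since(works)
    # Assumption: "after 1840" is read as year > cutoff.
    recent = [w for w in prose if w.year > cutoff]
    return {
        "fiction": [w for w in prose if w.fiction],
        "non_fiction": [w for w in prose if not w.fiction],
        "after_1840": recent,
        "fiction_after_1840": [w for w in recent if w.fiction],
    }

def split_by_variety(works):
    """Separate Brazilian and Portuguese works for comparison."""
    return {v: [w for w in works if w.variety == v] for v in ("BR", "PT")}

# Invented sample works.
works = [
    Work("W1", 1760, "PT", "sermon", False),
    Work("W2", 1881, "BR", "novel", True),
    Work("W3", 1890, "PT", "novel", True),
    Work("W4", 1870, "PT", "drama", True),  # removed: drama
]
subsets = research_subsets(works)
print({name: [w.title for w in ws] for name, ws in subsets.items()})
print({v: [w.title for w in ws] for v, ws in split_by_variety(subsets["fiction_after_1840"]).items()})
```
</preformat>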
<p>See <xref ref-type="fig" rid="F2">Figure 2</xref> and <xref ref-type="fig" rid="F3">Figure 3</xref> for a bird&#8217;s eye view of the genre distributions in total and in fiction.</p>
<fig id="F2">
<caption>
<p><bold>Figure 2:</bold> Genre in the full corpus. The unit is the work. In red are the works written by Portuguese authors.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="jcls-3567_freitas-g2.png"/>
</fig>
<fig id="F3">
<caption>
<p><bold>Figure 3:</bold> Genre in the fiction corpus. The unit is the work. In red are the works written by feminine authors.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="jcls-3567_freitas-g3.png"/>
</fig>
<p>Only in <xref ref-type="fig" rid="F3">Figure 3</xref> do we include the variable author gender, since it is only in fiction that we have texts written by women.</p>
<p>In <xref ref-type="table" rid="T2">Table 2</xref>, we give the number of words in the material published after 1840.</p>
<table-wrap id="T2">
<caption>
<p><bold>Table 2:</bold> Size in words of the different materials after 1840.</p>
</caption>
<table>
<thead>
<tr>
<th align="left" valign="top"></th>
<th align="right" valign="top">Fiction</th>
<th align="right" valign="top">Non-fiction</th>
<th align="right" valign="top">Total</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">Brazil</td>
<td align="right" valign="top">10,547,327</td>
<td align="right" valign="top">1,532,287</td>
<td align="right" valign="top">12,079,614</td>
</tr>
<tr>
<td align="left" valign="top">Portugal</td>
<td align="right" valign="top">15,280,938</td>
<td align="right" valign="top">5,358,069</td>
<td align="right" valign="top">20,639,007</td>
</tr>
<tr>
<td align="left" valign="top">Total</td>
<td align="right" valign="top">25,828,265</td>
<td align="right" valign="top">6,890,356</td>
<td align="right" valign="top">32,718,621</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="S2.2">
<title>2.2 Gender Attribution</title>
<p>We explore the influence of gender in both character description and authorship. Masculine and feminine gender labels were manually ascribed to writers, as our corpus contains works written by canonical authors, which are widely discussed in literary studies. For the non-canonical authors, gender was attributed based either on gender-inflected forms (such as adjectives) used in prefaces or on their proper names. As for the characters, the gender labels were automatically assigned by the PALAVRAS parser, and then manually revised by linguists (<xref ref-type="bibr" rid="B16">Rocha et al. 2019</xref>; <xref ref-type="bibr" rid="B27">Silva 2021</xref>). The linguistic clues used to attribute and revise gender were syntactic agreement and morphological features.</p>
<p>Portuguese is a Romance language that requires speakers to specify the gender of nouns (both common and proper nouns) and adjectives. The main formal clue to distinguish masculine and feminine forms is the word&#8217;s ending: Masculine forms tend to end in <italic>-o</italic>, feminine ones tend to end in <italic>-a</italic>, and those ending in <italic>-e</italic> can be either feminine or masculine &#8211; <italic>ponte</italic> (&#8216;bridge&#8217;) is feminine, and <italic>pente</italic> (&#8216;comb&#8217;) is masculine. However, there is no perfect equivalence between the ending in <italic>-o</italic> or <italic>-a</italic> and the masculine or feminine gender, respectively &#8211; <italic>planeta</italic> (&#8216;planet&#8217;) is masculine, and <italic>tribo</italic> (&#8216;tribe&#8217;) is feminine. Therefore, observing syntactic agreement between the head noun and its modifiers is the most reliable way to assign morphological gender.</p>
<p>When calculating the gender of depicting words, we take into account the gender of the nominal head (noun, proper noun or pronoun) being characterised, not the gender of the words (modifiers) associated with it. This is because, although adjectives can be inflected for gender in most cases, the search patterns we used also retrieve nouns, which are not inflected for gender. Thus, a noun like <italic>anjo</italic> (&#8216;angel&#8217;) is always masculine, even if the angel mentioned is a woman. When the gender of the nominal head is considered, <italic>anjo</italic>, although a masculine common noun, is counted as characterising a feminine character if it modifies a feminine head.</p>
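<p>The contrast between the two clues can be illustrated with a toy Python sketch: a naive ending-based guess, and a guess read off an agreeing determiner. The determiner table and both functions are hypothetical simplifications for illustration only; the parser used in this study draws on much richer morphological and syntactic information.</p>
<preformat preformat-type="code">
```python
# Toy illustration of the two gender clues: word ending vs. syntactic agreement.

def gender_from_ending(noun):
    """Naive guess from the final vowel; nouns in -e (or anything else) stay undecided."""
    if noun.endswith("o"):
        return "M"
    if noun.endswith("a"):
        return "F"
    return None

# A small, hypothetical table of determiners whose form shows gender overtly.
DETERMINER_GENDER = {"o": "M", "os": "M", "um": "M", "este": "M",
                     "a": "F", "as": "F", "uma": "F", "esta": "F"}

def gender_from_agreement(noun_phrase):
    """Take the gender of the first determiner that agrees with the head noun."""
    for word in noun_phrase:
        if word in DETERMINER_GENDER:
            return DETERMINER_GENDER[word]
    return None

for phrase in (["o", "planeta"], ["a", "tribo"], ["a", "ponte"], ["o", "pente"]):
    noun = phrase[-1]
    print(noun, gender_from_ending(noun), gender_from_agreement(phrase))
```
</preformat>
<p>For these four nouns the ending rule is either wrong (<italic>planeta</italic>, <italic>tribo</italic>) or undecided (<italic>ponte</italic>, <italic>pente</italic>), whereas agreement recovers the gender in every case.</p>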
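<p>A minimal Python sketch of this counting choice follows, assuming the search patterns return (head, head gender, depicting word) triples; the triples and the function name are invented for illustration. The grammatical gender of the depicting word itself is never consulted.</p>
<preformat preformat-type="code">
```python
from collections import Counter

# Hypothetical (head, head_gender, depicting_word) triples, as search patterns might return them.
triples = [
    ("ela", "F", "anjo"),       # 'anjo' is a masculine noun, but it characterises a feminine head
    ("menino", "M", "anjo"),
    ("mulher", "F", "honesta"),
    ("padre", "M", "arisco"),
]

def counts_by_head_gender(triples):
    """Tally each depicting word under the gender of the head it characterises,
    ignoring the grammatical gender of the depicting word itself."""
    counts = {"M": Counter(), "F": Counter()}
    for _head, head_gender, word in triples:
        counts[head_gender][word] += 1
    return counts

counts = counts_by_head_gender(triples)
print(counts["F"])  # 'anjo' and 'honesta', once each
print(counts["M"])  # 'anjo' and 'arisco', once each
```
</preformat>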
</sec>
</sec>
<sec id="S3">
<title>3. The Process</title>
<p>We wanted to identify all cases where human beings were mentioned to find out how they were described or depicted. We extended the search patterns used by Silva (<xref ref-type="bibr" rid="B27">2021</xref>)<xref ref-type="fn" rid="n6">6</xref> in two ways: (i) We enriched the lexicon of general human nouns, including names of professions as targets, and (ii) having extended the number of works analysed to include works written by Portuguese authors, we broadened the lexicon of characterising words<xref ref-type="fn" rid="n7">7</xref>, based on the prose of the eighteenth, nineteenth and twentieth centuries in <italic>Literateca</italic>. During data analysis, we had to revisit the previous classification, which led to a refinement of the classification guidelines and a reclassification of a few words.</p>
<p>We start from the idea that specific linguistic patterns indicate certain (semantic) relationships. Accordingly, we used a set of patterns, relying on the automatic morpho-syntactic annotation, to search the material for descriptions of human beings. Below are some examples of what the patterns yielded (the patterns are publicly available).</p>
<list list-type="gloss">
<list-item>
<list list-type="wordfirst">
<list-item><p>(1)</p></list-item>
</list>
</list-item>
<list-item>
<list list-type="sentence-gloss">
<list-item>
<list list-type="final-sentence">
<list-item><p>&#8211; Ouviste? &#8211; perguntou <bold>ela inquieta.</bold> [&#8211; Did you hear? &#8211; <bold>she</bold> asked <bold>restlessly.</bold>]</p></list-item>
</list>
</list-item>
</list>
</list-item>
</list>
<list list-type="gloss">
<list-item>
<list list-type="wordfirst">
<list-item><p>(2)</p></list-item>
</list>
</list-item>
<list-item>
<list list-type="sentence-gloss">
<list-item>
<list list-type="final-sentence">
<list-item><p>&#8230;acudiu logo o <bold>padre</bold>, muito <bold>arisco.</bold> [&#8230;came the <bold>priest</bold>, very <bold>skittish.</bold>]</p></list-item>
</list>
</list-item>
</list>
</list-item>
</list>
<list list-type="gloss">
<list-item>
<list list-type="wordfirst">
<list-item><p>(3)</p></list-item>
</list>
</list-item>
<list-item>
<list list-type="sentence-gloss">
<list-item>
<list list-type="final-sentence">
<list-item><p>Uma <bold>mulher honesta</bold> n&#227;o tem segredos para seu marido! [An <bold>honest woman</bold> has no secrets from her husband!]</p></list-item>
</list>
</list-item>
</list>
</list-item>
</list>
<list list-type="gloss">
<list-item>
<list list-type="wordfirst">
<list-item><p>(4)</p></list-item>
</list>
</list-item>
<list-item>
<list list-type="sentence-gloss">
<list-item>
<list list-type="final-sentence">
<list-item><p><bold>D. Joana Tecla</bold> era <bold>idiota.</bold> [<bold>Mrs. Joana Tecla</bold> was an <bold>idiot.</bold>]</p></list-item>
</list>
</list-item>
</list>
</list-item>
</list>
<list list-type="gloss">
<list-item>
<list list-type="wordfirst">
<list-item><p>(5)</p></list-item>
</list>
</list-item>
<list-item>
<list list-type="sentence-gloss">
<list-item>
<list list-type="final-sentence">
<list-item><p>Em todo o caso era uma bela <bold>mulher, alta</bold> e forte sem ser gorda&#8230; [In any case, she was a beautiful <bold>woman, tall</bold> and strong without being fat&#8230;]</p></list-item>
</list>
</list-item>
</list>
</list-item>
</list>
<list list-type="gloss">
<list-item>
<list list-type="wordfirst">
<list-item><p>(6)</p></list-item>
</list>
</list-item>
<list-item>
<list list-type="sentence-gloss">
<list-item>
<list list-type="final-sentence">
<list-item><p>&#8230;calado como a tarde triste, um <bold>homem</bold>, ainda <bold>mo&#231;o</bold>, vestido como os ess&#234;nios taciturnos, caminhava&#8230; [&#8230;silent as the sad afternoon, a <bold>man</bold>, still <bold>young</bold>, dressed like&#8230;]</p></list-item>
</list>
</list-item>
</list>
</list-item>
</list>
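<p>The actual patterns are publicly available and run over the corpus&#8217;s automatic morpho-syntactic annotation; the Python sketch below only illustrates the general idea. It assumes a hypothetical token representation (form, lemma, simplified part-of-speech tag), tiny illustrative lexicons, and two toy patterns: a human noun or proper name directly followed by a depicting adjective, or followed by a copula and a depicting adjective. None of these names, tags or patterns are taken from the project itself.</p>

```python
from typing import NamedTuple


class Token(NamedTuple):
    """Hypothetical token: surface form, lemma, simplified POS tag."""
    form: str
    lemma: str
    pos: str  # assumed tags: "N", "PROP", "ADJ", "V", "DET", ...


# Illustrative mini-lexicons (assumptions); the real lists are far larger.
HUMAN_NOUNS = {"homem", "mulher", "padre", "senhora"}
DEPICTING_WORDS = {"arisco", "honesto", "idiota", "moço", "triste"}
COPULAS = {"ser", "estar", "parecer"}


def is_human_target(tok: Token) -> bool:
    # In this toy version, every proper name is assumed to denote a person.
    return tok.pos == "PROP" or (tok.pos == "N" and tok.lemma in HUMAN_NOUNS)


def is_depicting(tok: Token) -> bool:
    return tok.pos == "ADJ" and tok.lemma in DEPICTING_WORDS


def find_depictions(tokens: list[Token]) -> list[tuple[str, str]]:
    """Return (target lemma, depicting lemma) pairs matched by two toy patterns."""
    hits = []
    for i, tok in enumerate(tokens):
        if not is_human_target(tok):
            continue
        # Toy pattern A: target immediately followed by a depicting adjective.
        if len(tokens) > i + 1 and is_depicting(tokens[i + 1]):
            hits.append((tok.lemma, tokens[i + 1].lemma))
        # Toy pattern B: target + copula + depicting adjective.
        if (len(tokens) > i + 2 and tokens[i + 1].lemma in COPULAS
                and is_depicting(tokens[i + 2])):
            hits.append((tok.lemma, tokens[i + 2].lemma))
    return hits


# Example (3), "Uma mulher honesta ...", with a hand-assigned toy annotation.
sentence = [
    Token("Uma", "um", "DET"),
    Token("mulher", "mulher", "N"),
    Token("honesta", "honesto", "ADJ"),
]
print(find_depictions(sentence))  # [('mulher', 'honesto')]
```

<p>Real patterns also have to cope with intervening punctuation and adverbs, as in examples (2), (5) and (6), which these toy patterns deliberately ignore.</p>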
<p>We then classified each word of the aforementioned list (in the examples above, the words that characterise the human beings) into four non-mutually exclusive classes according to the type of characterisation: social, emotional, physical (appearance), and character. To group these idiosyncratic data and provide a better view from afar, we analysed the most frequent words and derived the four classes from them. We used the class &#8216;other&#8217; when none of the four applied, and assigned one or more of the four otherwise. The scope of each category and the main choices involved are as follows:</p>
<disp-quote>
<p><bold>social</bold> In addition to professions, occupations, and social status, we also included absence of profession like <italic>mendigo</italic> (&#8216;beggar&#8217;), nationality, civil status, family relations, political opinions like <italic>mon&#225;rquico</italic> (&#8216;monarchist&#8217;), and cases which are a consequence of social intercourse, like <italic>ignorante</italic> (&#8216;ignorant&#8217;) or <italic>educado</italic> (&#8216;civil&#8217; or &#8216;knowledgeable&#8217;).</p>
<p><bold>appearance</bold> Physical appearance, including clothing or lack of it, as well as those features associated with time, such as <italic>jovem</italic> (&#8216;young&#8217;) or <italic>velho</italic> (&#8216;old&#8217;).</p>
<p><bold>emotional</bold> Feelings, emotions, and emotional tendencies.</p>
<p><bold>character</bold> Personality traits, also including cognitive properties, such as intelligence or lack of it. It also includes evaluations according to social conduct, such as <italic>honesto</italic> (&#8216;honest&#8217;), <italic>malcriado</italic> (&#8216;rude&#8217;) or <italic>pretensioso</italic> (&#8216;snobbish&#8217;).</p>
</disp-quote>
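<p>Because the four classes are not mutually exclusive, each lexicon entry can be thought of as a word mapped to a set of classes, with &#8216;other&#8217; reserved for words that fit none of the four. The Python sketch below encodes that rule. The data structure, the function name class_rule_violations, and the toy entries are illustrative assumptions, not the project&#8217;s files; the class assignments follow examples discussed in this section.</p>

```python
from enum import Enum


class Axis(Enum):
    SOCIAL = "social"
    APPEARANCE = "appearance"
    EMOTION = "emotion"
    CHARACTER = "character"
    OTHER = "other"


# Toy excerpt of a multi-label lexicon (hypothetical, for illustration only).
LEXICON: dict[str, frozenset[Axis]] = {
    "mendigo": frozenset({Axis.SOCIAL}),
    "monárquico": frozenset({Axis.SOCIAL}),
    "jovem": frozenset({Axis.APPEARANCE}),
    "triste": frozenset({Axis.EMOTION}),
    "honesto": frozenset({Axis.CHARACTER}),
    "acanhado": frozenset({Axis.CHARACTER, Axis.EMOTION}),
}


def class_rule_violations(lexicon: dict[str, frozenset[Axis]]) -> list[str]:
    """Words with no class, or with 'other' combined with one of the four classes."""
    bad = []
    for word, axes in lexicon.items():
        if not axes or (Axis.OTHER in axes and len(axes) > 1):
            bad.append(word)
    return bad


print(class_rule_violations(LEXICON))  # []
print(class_rule_violations({"x": frozenset({Axis.OTHER, Axis.SOCIAL})}))  # ['x']
```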
<p>Each category works as a label encoding one of four perspectives on people: &#8216;appearance&#8217; refers to what is visible; &#8216;social&#8217; refers to the various roles someone can play in society; &#8216;character&#8217; refers to internal/cognitive characteristics; and &#8216;emotion&#8217; refers to emotional traits. More broadly, we can also distinguish two large classes: internal characteristics (&#8216;character&#8217; + &#8216;emotion&#8217;) and external characteristics (&#8216;appearance&#8217; + &#8216;social&#8217;), a grouping we use in <xref ref-type="fig" rid="F17">Figure 17</xref> below. Many of the classified words can also apply to non-human entities, as in example (7); as long as they could also modify a human being, they were classified accordingly. However, the results presented in the following sections refer only to cases where the characterisation was assigned to human beings, as in example (8), since only such cases are retrieved by the patterns.</p>
<list list-type="gloss">
<list-item>
<list list-type="wordfirst">
<list-item><p>(7)</p></list-item>
</list>
</list-item>
<list-item>
<list list-type="sentence-gloss">
<list-item>
<list list-type="final-sentence">
<list-item><p>&#8211; Que <bold>triste</bold> pensamento! [What a <bold>sad</bold> thought!]</p></list-item>
</list>
</list-item>
</list>
</list-item>
</list>
<list list-type="gloss">
<list-item>
<list list-type="wordfirst">
<list-item><p>(8)</p></list-item>
</list>
</list-item>
<list-item>
<list list-type="sentence-gloss">
<list-item>
<list list-type="final-sentence">
<list-item><p>&#8211; Mas a <bold>triste</bold> senhora continuava a choramingar. [But the <bold>sad</bold> woman kept weeping.]</p></list-item>
</list>
</list-item>
</list>
</list-item>
</list>
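<p>The broader split into internal (&#8216;character&#8217; + &#8216;emotion&#8217;) and external (&#8216;appearance&#8217; + &#8216;social&#8217;) characteristics can be derived mechanically from the four labels. A minimal Python sketch, assuming each retrieved characterising word (already restricted to human targets by the patterns) is looked up in a word-to-classes mapping; the function names and toy data are hypothetical:</p>

```python
from collections import Counter

INTERNAL = {"character", "emotion"}
EXTERNAL = {"appearance", "social"}


def broad_groups(classes: set[str]) -> set[str]:
    """Map fine-grained classes to the broad internal/external groups."""
    groups = set()
    if not INTERNAL.isdisjoint(classes):
        groups.add("internal")
    if not EXTERNAL.isdisjoint(classes):
        groups.add("external")
    return groups  # a word classified only as 'other' maps to no broad group


def count_groups(words: list[str], lexicon: dict[str, set[str]]) -> Counter:
    """Count each word occurrence at most once per broad group."""
    counts = Counter()
    for word in words:
        counts.update(broad_groups(lexicon.get(word, set())))
    return counts


# Toy lexicon and toy list of words retrieved for human targets.
lexicon = {"triste": {"emotion"}, "alto": {"appearance"}, "acanhado": {"character", "emotion"}}
print(count_groups(["triste", "alto", "acanhado", "triste"], lexicon))
# Counter({'internal': 3, 'external': 1})
```

<p>Note that <italic>acanhado</italic>, classified as both &#8216;character&#8217; and &#8216;emotion&#8217;, counts once towards the internal group in this sketch.</p>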
<p>We classified the retrieved words out of context, except in those rare cases where we had to check whether an adjective was used to characterise human beings at all in the corpus.<xref ref-type="fn" rid="n8">8</xref> For example, we initially intended to discard the words <italic>gran&#237;tico</italic> (&#8216;made of granite&#8217;) and <italic>triunfal</italic> (&#8216;triumphal&#8217;), but the corpus contained instances where both were applied to human characters, so they were retained in our list.</p>
<list list-type="gloss">
<list-item>
<list list-type="wordfirst">
<list-item><p>(9)</p></list-item>
</list>
</list-item>
<list-item>
<list list-type="sentence-gloss">
<list-item>
<list list-type="final-sentence">
<list-item><p>&#8211; Sim, o velho Afonso &#233; <bold>gran&#237;tico</bold>&#8230; [&#8211; Yes, old Afonso is <bold>made of granite</bold>&#8230;]</p></list-item>
</list>
</list-item>
</list>
</list-item>
</list>
<list list-type="gloss">
<list-item>
<list list-type="wordfirst">
<list-item><p>(10)</p></list-item>
</list>
</list-item>
<list-item>
<list list-type="sentence-gloss">
<list-item>
<list list-type="final-sentence">
<list-item><p>Nunca as mulheres <bold>triunfais</bold> me fizeram bater o cora&#231;&#227;o&#8230; [<bold>Triumphal</bold> women never made my heart beat&#8230;]</p></list-item>
</list>
</list-item>
</list>
</list-item>
</list>
<p>The classification was done manually by the authors of this paper, and divergences were thoroughly discussed. We discarded words either because (i) they were not characterisation words, (ii) they resulted from wrong parsing, or (iii) we decided they were not relevant to our goals. More specifically, we applied the following exclusion criteria:</p>
<list list-type="bullet">
<list-item><p>We did not take into account &#8220;complex adjectives&#8221; in the sense of having more than one word, like <italic>bem intencionado</italic> (&#8216;having good intentions&#8217;), <italic>mal intencionado</italic> (&#8216;having bad intentions&#8217;), <italic>bem educado<xref ref-type="fn" rid="n9">9</xref></italic> (&#8216;polite&#8217;), etc.</p></list-item>
<list-item><p>We did not classify relational adjectives, such as <italic>partid&#225;rio</italic> (<italic>de&#8230;</italic>) (&#8216;partisan&#8217;), <italic>apologista</italic> (<italic>de&#8230;</italic>) (&#8216;in favour of&#8217;), <italic>compar&#225;vel</italic> (<italic>a &#8230;</italic>) (&#8216;comparable to&#8217;), <italic>emparelhado com</italic> (&#8216;pairing with&#8217;), <italic>similhante a</italic> (&#8216;similar to&#8217;), since a precise characterisation would require a close reading of each sentence.</p></list-item>
<list-item><p>We dismissed misspellings, except for lack of diacritics.<xref ref-type="fn" rid="n10">10</xref> Our rationale is that, in future improved versions of the corpus, misspelled words will be corrected and thus correctly annotated.</p></list-item>
</list>
<p>Following the annotation approach adopted in the AC/DC project (<xref ref-type="bibr" rid="B17">Santos 2014</xref>), which underlies <italic>Literateca</italic>, we assigned multiple classes when two or more categories/senses could apply to a characterising word (vague or ambiguous words). References to madness, for instance, were classified as both &#8216;social&#8217; and &#8216;character&#8217;. The same is true for habits like <italic>madrugador</italic> (&#8216;early riser&#8217;) and <italic>b&#234;bado</italic> (&#8216;drunkard&#8217; or &#8216;drunk&#8217;), which can be due either to biology or to social upbringing. The word <italic>acanhado</italic> (&#8216;shy&#8217;) can be interpreted as describing a reserved person (thus &#8216;character&#8217;) or a fearful one (&#8216;emotion&#8217;), and the same applies to <italic>impaciente</italic> (&#8216;impatient&#8217;), which can be interpreted as anxious (&#8216;emotion&#8217;) or restless (&#8216;character&#8217;).</p>
<p>Finally, cases such as <italic>maravilhoso</italic> (&#8216;wonderful&#8217;), <italic>incompar&#225;vel</italic> (&#8216;incomparable&#8217;), <italic>ideal</italic> (&#8216;ideal&#8217;) or <italic>horr&#237;vel</italic> (&#8216;horrible&#8217;), where it is not clear to which axis they apply out of context, were classified as referring simultaneously to &#8216;character&#8217;, &#8216;social&#8217; and &#8216;appearance&#8217;.</p>
<p>To verify the degree of reliability of the classifications and the adequacy of the classes, Silva (<xref ref-type="bibr" rid="B27">2021</xref>) carried out a study on the inter-annotator agreement of 15 people in the classification of occurrences considered especially difficult. The degree of agreement was 80%. We have not carried out any further studies on this matter.</p>
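<p>The agreement measure used by Silva (<xref ref-type="bibr" rid="B27">2021</xref>) is not detailed here. Purely as an illustration, the Python sketch below computes one common option, mean pairwise observed agreement (the share of annotator pairs that assign the same label to an item), which, unlike Fleiss&#8217; kappa, does not correct for chance. Encoding a multi-label answer as a single sorted string is likewise an assumption of this sketch.</p>

```python
from itertools import combinations


def label_key(classes: set[str]) -> str:
    """Assumed encoding: a multi-label answer becomes one sorted string."""
    return "+".join(sorted(classes))


def pairwise_agreement(answers_per_item: list[list[set[str]]]) -> float:
    """Mean observed agreement over all annotator pairs of all items."""
    agree = total = 0
    for answers in answers_per_item:
        keys = [label_key(a) for a in answers]
        for a, b in combinations(keys, 2):
            agree += a == b  # True counts as 1, False as 0
            total += 1
    return agree / total


# Toy data: two items, three annotators each (not Silva's data).
items = [
    [{"social"}, {"social"}, {"character"}],
    [{"character", "emotion"}, {"emotion", "character"}, {"character", "emotion"}],
]
print(round(pairwise_agreement(items), 3))  # 0.667
```

<p>With 15 annotators, each item contributes 15 &#215; 14 / 2 = 105 annotator pairs to such a measure.</p>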
<p>After this classification, we ended up with a list of 4,481 words that can be used to depict human beings (see <xref ref-type="table" rid="T3">Table 3</xref>).<xref ref-type="fn" rid="n11">11</xref> Because our patterns operate on lemmas, and the parser lemmatises past participles as verb infinitives, some entries in the list are verb infinitives standing for the corresponding past participle forms.</p>
<table-wrap id="T3">
<caption>
<p><bold>Table 3:</bold> Depicting words by category. Recall that words can belong to more than one category.</p>
</caption>
<table>
<tbody>
<tr>
<td align="left" valign="top">type</td>
<td align="right" valign="top">size</td>
</tr>
<tr>
<td align="left" valign="top">Social</td>
<td align="right" valign="top">1,391</td>
</tr>
<tr>
<td align="left" valign="top">Appearance</td>
<td align="right" valign="top">672</td>
</tr>
<tr>
<td align="left" valign="top">Emotion</td>
<td align="right" valign="top">514</td>
</tr>
<tr>
<td align="left" valign="top">Character</td>
<td align="right" valign="top">1,578</td>
</tr>
<tr>
<td align="left" valign="top">Other</td>
<td align="right" valign="top">326</td>
</tr>
<tr>
<td align="left" valign="top">Total</td>
<td align="right" valign="top">4,481</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>To provide a richer description of this list, <xref ref-type="table" rid="T4">Table 4</xref> shows how many depicting words are vague or ambiguous, i.e. belong to more than one category.</p>
<table-wrap id="T4">
<caption>
<p><bold>Table 4:</bold> Words belonging to more than one category, by combination of categories.</p>
</caption>
<table>
<tbody>
<tr>
<td align="left" valign="top">type</td>
<td align="right" valign="top">size</td>
</tr>
<tr>
<td align="left" valign="top">Appearance, character</td>
<td align="right" valign="top">88</td>
</tr>
<tr>
<td align="left" valign="top">Appearance, emotion</td>
<td align="right" valign="top">12</td>
</tr>
<tr>
<td align="left" valign="top">Appearance, social</td>
<td align="right" valign="top">8</td>
</tr>
<tr>
<td align="left" valign="top">Appearance, character, social</td>
<td align="right" valign="top">10</td>
</tr>
<tr>
<td align="left" valign="top">Character, emotion</td>
<td align="right" valign="top">107</td>
</tr>
<tr>
<td align="left" valign="top">Character, emotion, social</td>
<td align="right" valign="top">1</td>
</tr>
<tr>
<td align="left" valign="top">Character, social</td>
<td align="right" valign="top">80</td>
</tr>
<tr>
<td align="left" valign="top">Emotion, social</td>
<td align="right" valign="top">9</td>
</tr>
<tr>
<td align="left" valign="top">Total with more than one class</td>
<td align="right" valign="top">315</td>
</tr>
</tbody>
</table>
</table-wrap>
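<p>Both kinds of counts can be derived from a word-to-classes mapping. The Python sketch below uses a hypothetical toy lexicon (not the project&#8217;s list): it counts words per category, where a word with several classes contributes to each of them, and tallies the class combinations of words with more than one class.</p>

```python
from collections import Counter


def words_per_category(lexicon: dict[str, set[str]]) -> Counter:
    """Per-category counts: a word with k classes is counted once in each of its k rows."""
    counts = Counter()
    for classes in lexicon.values():
        counts.update(classes)
    return counts


def multi_class_combinations(lexicon: dict[str, set[str]]) -> Counter:
    """Frequency of each class combination among words with more than one class."""
    return Counter(
        ", ".join(sorted(classes)) for classes in lexicon.values() if len(classes) > 1
    )


# Toy lexicon (illustrative assignments taken from examples in the text).
toy = {
    "rico": {"social"},
    "belo": {"appearance"},
    "acanhado": {"character", "emotion"},
    "impaciente": {"character", "emotion"},
    "maravilhoso": {"appearance", "character", "social"},
}
print(words_per_category(toy))
print(multi_class_combinations(toy))
```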
<p>We then annotated the corpus with this new classification<xref ref-type="fn" rid="n12">12</xref> and computed how often and when the words were used to describe human beings.</p>
<p>We start by providing a picture of the distribution of mentions of human characters over time in <xref ref-type="fig" rid="F4">Figure 4</xref>, as well as how many depicting events we were able to identify in <xref ref-type="fig" rid="F5">Figure 5</xref>.</p>
<fig id="F4">
<caption>
<p><bold>Figure 4:</bold> Number of mentions of characterised people in the corpus per decade.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="jcls-3567_freitas-g4.png"/>
</fig>
<fig id="F5">
<caption>
<p><bold>Figure 5:</bold> Relative characterisation per person, per decade.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="jcls-3567_freitas-g5.png"/>
</fig>
<p>A comment is in order: The 1830s are a clear outlier because the whole decade is represented by a single short text of 19,334 words, a political pamphlet by Alexandre Herculano. The same happens with the 1950s, which are represented in the material by only 4,777 words of Jorge Amado&#8217;s <italic>Gabriela, Cravo e Canela</italic>.</p>
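<p>A per-decade ratio such as the one in <xref ref-type="fig" rid="F5">Figure 5</xref> can be sketched as below. The sketch assumes (a reading, not a documented definition) that the ratio divides characterised mentions by all mentions of people, aggregated over the texts of each decade. Because a single short text can dominate a decade, it also flags decades whose texts add up to few words; the 50,000-word threshold, the record layout and all numbers are hypothetical.</p>

```python
from collections import defaultdict


def per_decade_ratio(records, min_words=50_000):
    """records: (year, n_words, n_mentions, n_characterised) per text (hypothetical layout).

    Returns {decade: (characterised / mentions, is_sparse)}; a decade is flagged
    as sparse when its texts add up to fewer than min_words words.
    """
    totals = defaultdict(lambda: [0, 0, 0])
    for year, words, mentions, characterised in records:
        t = totals[year // 10 * 10]
        t[0] += words
        t[1] += mentions
        t[2] += characterised
    result = {}
    for decade, (words, mentions, characterised) in sorted(totals.items()):
        ratio = characterised / mentions if mentions else float("nan")
        result[decade] = (ratio, min_words > words)
    return result


# Toy records, not corpus figures.
toy_records = [(1835, 20_000, 200, 30), (1881, 120_000, 2_000, 300), (1888, 80_000, 1_000, 100)]
for decade, (ratio, sparse) in per_decade_ratio(toy_records).items():
    print(decade, f"{ratio:.3f}", "sparse" if sparse else "")
```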
</sec>
<sec id="S4">
<title>4. Analysis</title>
<p>The first thing we report is the proportion of these subclasses in our material. <xref ref-type="table" rid="T5">Table 5</xref> shows the raw numbers, and also those referring to masculine and feminine characters.<xref ref-type="fn" rid="n13">13</xref> <xref ref-type="fig" rid="F6">Figure 6</xref> displays the overall distribution of characterisation words.</p>
<table-wrap id="T5">
<caption>
<p><bold>Table 5:</bold> Different depiction classes, in general and per gender of the characterised person, using the subject&#8217;s gender.</p>
</caption>
<table>
<tbody>
<tr>
<td align="left" valign="top"></td>
<td align="right" valign="top">Total</td>
<td align="right" valign="top">Mentions of masculine characters</td>
<td align="right" valign="top">Mentions of feminine characters</td>
</tr>
<tr>
<td align="left" valign="top">People</td>
<td align="right" valign="top">578,815</td>
<td align="right" valign="top">352,851</td>
<td align="right" valign="top">173,370</td>
</tr>
<tr>
<td align="left" valign="top">Characterised people</td>
<td align="right" valign="top">80,415</td>
<td align="right" valign="top">52,252</td>
<td align="right" valign="top">24,664</td>
</tr>
<tr>
<td align="left" valign="top">Social</td>
<td align="right" valign="top">11,793</td>
<td align="right" valign="top">7,813</td>
<td align="right" valign="top">3,534</td>
</tr>
<tr>
<td align="left" valign="top">Appearance</td>
<td align="right" valign="top">15,394</td>
<td align="right" valign="top">9,099</td>
<td align="right" valign="top">5,862</td>
</tr>
<tr>
<td align="left" valign="top">Emotion</td>
<td align="right" valign="top">9,670</td>
<td align="right" valign="top">5,562</td>
<td align="right" valign="top">3,895</td>
</tr>
<tr>
<td align="left" valign="top">Character</td>
<td align="right" valign="top">23,880</td>
<td align="right" valign="top">16,542</td>
<td align="right" valign="top">6,394</td>
</tr>
</tbody>
</table>
</table-wrap>
<fig id="F6">
<caption>
<p><bold>Figure 6:</bold> Distribution of characterisation words among the four classes for all, masculine and feminine, depictions.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="jcls-3567_freitas-g6.png"/>
</fig>
<p>The first observation is that there are many more mentions of masculine than of feminine characters in the material (roughly twice as many). Feminine characters are, however, characterised almost as often as masculine ones: 14.2% of feminine mentions against 14.8% of masculine mentions.</p>
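<p>These figures follow directly from <xref ref-type="table" rid="T5">Table 5</xref> (characterised mentions divided by all mentions, per gender). A quick Python check, with the counts copied from the table:</p>

```python
# Counts copied from Table 5.
mentions = {"masculine": 352_851, "feminine": 173_370}
characterised = {"masculine": 52_252, "feminine": 24_664}

for gender in mentions:
    rate = characterised[gender] / mentions[gender]
    print(f"{gender}: {rate:.1%} of mentions are characterised")

ratio = mentions["masculine"] / mentions["feminine"]
print(f"masculine/feminine mentions: {ratio:.2f}")
```

<p>This prints 14.8% for masculine and 14.2% for feminine mentions, and a masculine-to-feminine ratio of about 2.04.</p>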
<p>The second remark is that by far the most frequent subclass deals with character (most frequent words: <italic>bom</italic> (&#8216;good&#8217;), <italic>grande</italic> (&#8216;great&#8217;), <italic>honrado</italic> (&#8216;honourable, honest&#8217;), <italic>simples</italic> (&#8216;simple&#8217;), <italic>digno</italic> (&#8216;with dignity&#8217;), <italic>excelente</italic> (&#8216;excellent&#8217;)), followed by appearance (most frequent words: <italic>velho</italic> (&#8216;old&#8217;), <italic>novo</italic> (&#8216;young&#8217;)<xref ref-type="fn" rid="n14">14</xref>, <italic>antigo</italic> (&#8216;old-fashioned&#8217;), <italic>jovem</italic> (&#8216;young&#8217;), <italic>belo</italic> (&#8216;beautiful&#8217;), <italic>formoso</italic> (&#8216;beautiful&#8217;), <italic>bonito</italic> (&#8216;pretty&#8217;)).</p>
<p>Social characterisation comes third (most frequent words: <italic>rico</italic> (&#8216;rich&#8217;), <italic>ilustre</italic> (&#8216;illustrious&#8217;), <italic>nobre</italic> (&#8216;noble&#8217;), <italic>casado</italic> (&#8216;married&#8217;), <italic>c&#233;lebre</italic> (&#8216;famous&#8217;), <italic>pobre</italic> (&#8216;poor&#8217;), <italic>livre</italic> (&#8216;free&#8217;), <italic>famoso</italic> (&#8216;famous&#8217;)), while emotional characterisation is the least frequent (most frequent words: <italic>pobre</italic> (&#8216;poor&#8217;), <italic>infeliz</italic> (&#8216;unhappy&#8217;), <italic>valente</italic> (&#8216;brave&#8217;), <italic>feliz</italic> (&#8216;happy&#8217;), <italic>triste</italic> (&#8216;sad&#8217;), <italic>desgra&#231;ado</italic> (&#8216;miserable&#8217;), <italic>alegre</italic> (&#8216;joyful&#8217;), <italic>humilde</italic> (&#8216;humble&#8217;)).</p>
<p>A third observation is that feminine characters are more likely than masculine ones to be characterised by their appearance (23.8% vs. 17.4% of characterised mentions), which confirms the findings of previous studies and to which we return in <xref ref-type="sec" rid="S4.2">subsection 4.2</xref>.</p>
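<p>The 23.8% and 17.4% figures correspond to appearance depictions divided by the characterised mentions of each gender in <xref ref-type="table" rid="T5">Table 5</xref>. The Python sketch below computes that share for all four classes, with counts copied from the table; since a mention may receive several classes, or only &#8216;other&#8217;, the shares need not add up to 100%.</p>

```python
# Counts copied from Table 5.
characterised = {"masculine": 52_252, "feminine": 24_664}
by_class = {
    "social": {"masculine": 7_813, "feminine": 3_534},
    "appearance": {"masculine": 9_099, "feminine": 5_862},
    "emotion": {"masculine": 5_562, "feminine": 3_895},
    "character": {"masculine": 16_542, "feminine": 6_394},
}

for cls, counts in by_class.items():
    # Share of characterised mentions of each gender that receive this class.
    shares = {g: counts[g] / characterised[g] for g in characterised}
    print(f"{cls:10} masc {shares['masculine']:.1%}  fem {shares['feminine']:.1%}")
```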
<sec id="S4.1">
<title>4.1 Does Textual Genre Matter?</title>
<p>Does it make more sense to look only at literary texts, removing travelogues, essays, history and political writings?</p>
<p>On the one hand, we kept all the material because we wanted to look at the way people described people in Portuguese. On the other hand, it is conceivable that the kinds of information given about people differ considerably when one writes a history of the Inquisition, an essay about fellow writers, or an account of crossing Africa, compared with a narrative that introduces fictional characters.</p>
<p>We therefore re-ran our queries after removing all texts not classified as novels, novellas or short stories (see the new numbers in <xref ref-type="table" rid="T6">Table 6</xref>).</p>
<table-wrap id="T6">
<caption>
<p><bold>Table 6:</bold> Different depiction classes, in general and per gender of the characterised person (using the subject&#8217;s gender), in novels, novellas, and short stories only.</p>
</caption>
<table>
<tbody>
<tr>
<td align="left" valign="top"></td>
<td align="right" valign="top">Total</td>
<td align="right" valign="top">Masculine</td>
<td align="right" valign="top">Feminine</td>
</tr>
<tr>
<td align="left" valign="top">Words</td>
<td align="right" valign="top">25,828,265</td>
<td align="right" valign="top"></td>
<td align="right" valign="top"></td>
</tr>
<tr>
<td align="left" valign="top">Mentions of people</td>
<td align="right" valign="top">490,892</td>
<td align="right" valign="top">291,403</td>
<td align="right" valign="top">159,216</td>
</tr>
<tr>
<td align="left" valign="top">Characterised mentions of people</td>
<td align="right" valign="top">47,450</td>
<td align="right" valign="top">30,036</td>
<td align="right" valign="top">16,620</td>
</tr>
<tr>
<td align="left" valign="top">Social</td>
<td align="right" valign="top">8,968</td>
<td align="right" valign="top">5,720</td>
<td align="right" valign="top">2,979</td>
</tr>
<tr>
<td align="left" valign="top">Appearance</td>
<td align="right" valign="top">12,951</td>
<td align="right" valign="top">7,401</td>
<td align="right" valign="top">5,226</td>
</tr>
<tr>
<td align="left" valign="top">Emotion</td>
<td align="right" valign="top">8,767</td>
<td align="right" valign="top">4,922</td>
<td align="right" valign="top">3,665</td>
</tr>
<tr>
<td align="left" valign="top">Character</td>
<td align="right" valign="top">19,002</td>
<td align="right" valign="top">12,587</td>
<td align="right" valign="top">5,773</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>It is interesting to see that removing the non-fictional prose genres does not change the relative order of the subcategories but increases the percentage of feminine characters, from 30.0% to 32.4%, and characterised feminine characters, from 33.2% to 35.0%.</p>
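<p>The 30.0%, 32.4% and 35.0% figures are reproduced when the feminine counts are divided by the Total column of <xref ref-type="table" rid="T5">Table 5</xref> and <xref ref-type="table" rid="T6">Table 6</xref>, which also includes mentions not assigned to either gender. A quick Python check:</p>

```python
def share(part: int, total: int) -> str:
    """Format part/total as a percentage with one decimal."""
    return f"{part / total:.1%}"


# Feminine mentions over all mentions (Table 5, whole material; Table 6, fiction only).
print("all material:", share(173_370, 578_815))
print("fiction only:", share(159_216, 490_892))
# Characterised feminine mentions over all characterised mentions (Table 6).
print("characterised, fiction:", share(16_620, 47_450))
```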
<p>As to the characterisation of masculine and feminine characters, we have similar trends to those presented for the full material, as shown in <xref ref-type="fig" rid="F7">Figure 7</xref>: Masculine targets are characterised, by far, by their character, while feminine targets are (almost) equally characterised by their appearance and their character.</p>
<fig id="F7">
<caption>
<p><bold>Figure 7:</bold> Relative characterisation per gender in novels, novellas, and short stories.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="jcls-3567_freitas-g7.png"/>
</fig>
<p>For the non-fiction part, let us see whether the picture is different. In <xref ref-type="table" rid="T7">Table 7</xref>, we describe the masculine and feminine characterisations in the (considerably smaller) non-fiction part.</p>
<table-wrap id="T7">
<caption>
<p><bold>Table 7:</bold> Different depiction classes, in general and per gender of the characterised person (using the subject&#8217;s gender), in non-fiction only.</p>
</caption>
<table>
<tbody>
<tr>
<td align="left" valign="top"></td>
<td align="right" valign="top">Total</td>
<td align="right" valign="top">Masculine</td>
<td align="right" valign="top">Feminine</td>
</tr>
<tr>
<td align="left" valign="top">Words</td>
<td align="right" valign="top">6,890,356</td>
<td align="right" valign="top"></td>
<td align="right" valign="top"></td>
</tr>
<tr>
<td align="left" valign="top">Mentions of people</td>
<td align="right" valign="top">87,923</td>
<td align="right" valign="top">61,448</td>
<td align="right" valign="top">14,154</td>
</tr>
<tr>
<td align="left" valign="top">of people</td>
<td align="right" valign="top">10,537</td>
<td align="right" valign="top">8,033</td>
<td align="right" valign="top">1,899</td>
</tr>
<tr>
<td align="left" valign="top">Social</td>
<td align="right" valign="top">2,825</td>
<td align="right" valign="top">2,093</td>
<td align="right" valign="top">555</td>
</tr>
<tr>
<td align="left" valign="top">Appearance</td>
<td align="right" valign="top">2,443</td>
<td align="right" valign="top">1,698</td>
<td align="right" valign="top">636</td>
</tr>
<tr>
<td align="left" valign="top">Emotion</td>
<td align="right" valign="top">966</td>
<td align="right" valign="top">688</td>
<td align="right" valign="top">245</td>
</tr>
<tr>
<td align="left" valign="top">Character</td>
<td align="right" valign="top">4,878</td>
<td align="right" valign="top">3,955</td>
<td align="right" valign="top">621</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The percentages of feminine characters and of feminine characterisations shrink considerably, to 16% and 18% respectively, confirming that women are even less prominent in texts about the public sphere.</p>
<p>In non-fiction, social characteristics are, overall, more frequent than appearance. &#8216;Character&#8217; remains the most frequent way of describing people, and &#8216;emotion&#8217; the least frequent.</p>
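<p>Both observations can be checked against <xref ref-type="table" rid="T7">Table 7</xref>. The Python sketch below, with counts copied from the table, computes the feminine shares and ranks the four classes by frequency in non-fiction.</p>

```python
# Counts copied from Table 7 (non-fiction).
total_mentions, fem_mentions = 87_923, 14_154
total_characterised, fem_characterised = 10_537, 1_899
classes = {"social": 2_825, "appearance": 2_443, "emotion": 966, "character": 4_878}

print(f"feminine mentions: {fem_mentions / total_mentions:.0%}")
print(f"feminine characterised mentions: {fem_characterised / total_characterised:.0%}")
# Rank classes from most to least frequent.
print("classes by frequency:", sorted(classes, key=classes.get, reverse=True))
```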
<p>In <xref ref-type="fig" rid="F8">Figure 8</xref>, we present the distribution of the four kinds of features and see that the few mentions of women that are present have a large proportion of appearance descriptions, even more in non-fiction than in fiction.</p>
<fig id="F8">
<caption>
<p><bold>Figure 8:</bold> Relative characterisation per gender in non-fiction.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="jcls-3567_freitas-g8.png"/>
</fig>
</sec>
<sec id="S4.2">
<title>4.2 Differences when Describing Masculine and Feminine Characters</title>
<p>The previous figures show that &#8216;appearance&#8217; is more frequently used when describing feminine characters. Based on the entire data set, this can also be seen in the bar plot in <xref ref-type="fig" rid="F9">Figure 9</xref>.</p>
<fig id="F9">
<caption>
<p><bold>Figure 9:</bold> Relative characterisation per gender in the whole material as a bar plot.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="jcls-3567_freitas-g9.png"/>
</fig>
<p>However, this is just the tip of the iceberg. The analysis of depictive words preferentially used with masculine or feminine characters can be more revealing than the general analysis presented in <xref ref-type="fig" rid="F9">Figure 9</xref>, which takes into account the whole range of depictive words. To count as &#8216;preferred&#8217;, a word must (i) be used with masculine targets in at least 80% of its occurrences, or with feminine targets in more than 60% of its occurrences; and (ii) have a total frequency of 4 or more.</p>
<p>In cases where different lexical items correspond to gendered male/female pairs (<italic>m&#227;e/pai</italic> (&#8216;mother/father&#8217;); <italic>rainha/rei</italic> (&#8216;queen/king&#8217;); <italic>namorada/namorado</italic> (&#8216;girlfriend/boyfriend&#8217;) etc.), we manually grouped the elements of the pair as if they shared the same lemma, so that they could be included in the preference count.</p>
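<p>The selection criterion and the pairing of gendered lexical items can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors&#8217; code: occurrences are assumed to arrive as (lemma, target gender) pairs with gender &#8216;m&#8217; or &#8216;f&#8217;, the total frequency is assumed to count only those two genders, and the pair table is a hypothetical excerpt.</p>

```python
from collections import Counter

# Hypothetical excerpt: gendered pairs grouped under one shared key.
PAIR_KEY = {
    "mãe": "mãe/pai", "pai": "mãe/pai",
    "rainha": "rainha/rei", "rei": "rainha/rei",
    "namorada": "namorada/namorado", "namorado": "namorada/namorado",
}


def preferred_words(occurrences, masc_min=0.80, fem_min=0.60, min_freq=4):
    """occurrences: iterable of (lemma, gender), gender in {'m', 'f'} (assumed).

    A word is masculine-preferred if at least masc_min of its occurrences have
    masculine targets, feminine-preferred if more than fem_min have feminine
    targets, and in both cases it needs a total frequency of at least min_freq.
    """
    counts = {}
    for lemma, gender in occurrences:
        key = PAIR_KEY.get(lemma, lemma)
        counts.setdefault(key, Counter())[gender] += 1

    preferred = {}
    for key, c in counts.items():
        total = c["m"] + c["f"]
        if min_freq > total:
            continue  # criterion (ii): too rare
        if c["m"] / total >= masc_min:
            preferred[key] = "m"
        elif c["f"] / total > fem_min:
            preferred[key] = "f"
    return preferred


# Toy occurrences, not corpus data.
occ = (
    [("valente", "m")] * 9 + [("valente", "f")]   # 90% masculine: 'm'
    + [("meigo", "f")] * 3 + [("meigo", "m")]     # 75% feminine: 'f'
    + [("bom", "m")] * 6 + [("bom", "f")] * 4     # neither threshold met
    + [("feio", "f")] * 3                         # total frequency 3, excluded
)
print(preferred_words(occ))  # {'valente': 'm', 'meigo': 'f'}
```

<p>The sketch reproduces the asymmetric thresholds literally: a word used with feminine targets in exactly 60% of its occurrences is not selected, whereas exactly 80% suffices for masculine targets.</p>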
<p>The new data are presented in <xref ref-type="fig" rid="F10">Figure 10</xref>, which shows a somewhat different picture: (i) words of the emotional axis are rare and almost disappear for feminine characters; (ii) the balance between &#8216;appearance&#8217; and &#8216;character&#8217; in feminine depiction gives way to a characterisation based mainly on &#8216;appearance&#8217;, which accounts for half of all preferred feminine characterisations; and (iii) &#8216;appearance&#8217;, the second most frequent characterisation overall (of both masculine and feminine characters), drops to third place for masculine characters and rises to first place for feminine characters.</p>
<fig id="F10">
<caption>
<p><bold>Figure 10:</bold> Preferred characterisation per gender.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="jcls-3567_freitas-g10.png"/>
</fig>
<p>The &#8216;appearance&#8217; axis has roughly similar raw frequencies for both genders, but <xref ref-type="fig" rid="F11">Figure 11</xref> and <xref ref-type="fig" rid="F12">Figure 12</xref>, which complement <xref ref-type="fig" rid="F10">Figure 10</xref>, provide details that enrich the analysis.<xref ref-type="fn" rid="n15">15</xref></p>
<fig id="F11">
<caption>
<p><bold>Figure 11:</bold> Preferred characterisation of masculine characters.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="jcls-3567_freitas-g11.png"/>
</fig>
<fig id="F12">
<caption>
<p><bold>Figure 12:</bold> Preferred characterisation of feminine characters.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="jcls-3567_freitas-g12.png"/>
</fig>
<p>As noted in previous studies, typically feminine social characterisations relate to the family environment (<italic>m&#227;e</italic> (&#8216;mother&#8217;), <italic>prima</italic> (&#8216;cousin&#8217;)). However, mentions of marital status stand out: <italic>casada</italic> (&#8216;married&#8217;) and <italic>vi&#250;va</italic> (&#8216;widow&#8217;) are the most frequent words, but <italic>ad&#250;ltera</italic> (&#8216;adulteress&#8217;) is frequent as well. Conversely, marital status is absent from the typically masculine social characterisations, which relate instead to (positive) social recognition, such as <italic>ilustre</italic> (&#8216;illustrious&#8217;), <italic>c&#233;lebre</italic> (&#8216;famous&#8217;), <italic>not&#225;vel</italic> (&#8216;remarkable&#8217;), <italic>famoso</italic> (another word for &#8216;famous&#8217;), and <italic>poderoso</italic> (&#8216;powerful&#8217;).</p>
<p>On the feminine emotional axis, words associated with love and sweetness (<italic>adorada</italic> (&#8216;adored&#8217;) and <italic>meiga</italic> (&#8216;sweet&#8217;)) stand out, but so do words associated with sadness and insecurity (<italic>pobre</italic> (&#8216;poor&#8217;), <italic>chorosa</italic> (&#8216;tearful&#8217;), <italic>ciumenta</italic> (&#8216;jealous&#8217;), <italic>ofendida</italic> (&#8216;offended&#8217;)) and fear (<italic>espavorida</italic> (&#8216;terrified&#8217;)). On the masculine side, bravery is the highlight: <italic>valente</italic> (&#8216;brave&#8217;) is by far the most frequent word, and <italic>atrevido</italic> (&#8216;cheeky/audacious&#8217;) ranks sixth. Humility (<italic>humilde</italic> (&#8216;humble&#8217;)) and anger (<italic>furioso</italic> (&#8216;furious&#8217;)) rank second and third, respectively. Anxiety also appears: <italic>desesperado</italic> (&#8216;desperate&#8217;) is the fourth most frequent emotional word for masculine characters.</p>
<p>Finally, masculine characters seem to be taken by surprise more often than feminine ones, frequently being <italic>assombrado</italic> (&#8216;astonished&#8217;), <italic>surpreso</italic> (&#8216;surprised&#8217;), and <italic>maravilhado</italic> (&#8216;amazed&#8217;), which might be due to their role in narrative events.</p>
<p>&#8216;Appearance&#8217;, although highly typical for feminine targets, varies relatively little in terms of the most frequently mentioned attributes: Beauty (<italic>bonita, formosa, bela, linda</italic>, Portuguese words for &#8216;beautiful&#8217;; <italic>encantadora</italic> (&#8216;charming&#8217;)) or the lack of it (<italic>feia</italic> (&#8216;ugly&#8217;)) are the most frequent features. In the masculine appearance axis, age and size, instead of beauty, are the most frequently mentioned attributes (<italic>velho</italic> (&#8216;old&#8217;) and <italic>jovem</italic> (&#8216;young&#8217;); <italic>robusto</italic> (&#8216;robust&#8217;), <italic>grande</italic> (&#8216;big&#8217;) and <italic>baixo</italic> (&#8216;short&#8217;)).</p>
<p>On the typically masculine character axis, positive traits such as <italic>grande</italic> (&#8216;great&#8217;), <italic>simples</italic> (&#8216;simple&#8217;), <italic>verdadeiro</italic> (&#8216;real&#8217;), <italic>valente</italic> (&#8216;brave&#8217;), <italic>livre</italic> (&#8216;free&#8217;), and <italic>h&#225;bil</italic> (&#8216;skillful&#8217;) stand out. Other frequently mentioned positive traits are <italic>generoso</italic> (&#8216;generous&#8217;) and <italic>habilidoso</italic> (&#8216;skillful&#8217;). Negative highlights are <italic>mau</italic> (&#8216;bad&#8217;), <italic>terr&#237;vel</italic> (&#8216;terrible&#8217;), and <italic>rude</italic> (&#8216;rude&#8217;). For feminine targets, the highlights are, in general, positive and associated with virtue: <italic>virtuosa</italic> (&#8216;virtuous&#8217;), <italic>santa</italic> (&#8216;holy&#8217;), and <italic>inocente</italic> (&#8216;innocent&#8217;). Other typically feminine characterisation words are <italic>meiga</italic> (&#8216;sweet&#8217;) and <italic>d&#243;cil</italic> (&#8216;docile&#8217;), but we also see <italic>fraca</italic> (&#8216;weak&#8217;), which contrasts with masculine strength.</p>
</sec>
<sec id="S4.3">
<title>4.3 Does the Gender of the Author Matter?</title>
<p>Do these findings vary according to the author&#8217;s gender? In our material (see <xref ref-type="table" rid="T8">Table 8</xref>), feminine authors use proportionally more appearance descriptions than masculine ones, as shown in <xref ref-type="fig" rid="F13">Figure 13</xref>.</p>
<fig id="F13">
<caption>
<p><bold>Figure 13:</bold> Characterisation by masculine and feminine authors. Note the different y-axis scales.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="jcls-3567_freitas-g13.png"/>
</fig>
<table-wrap id="T8">
<caption>
<p><bold>Table 8:</bold> Different depiction classes for masculine and feminine authors in novels, novellas, and short stories.</p>
</caption>
<table>
<tbody>
<tr>
<td align="left" valign="top"></td>
<td align="right" valign="top">Total</td>
<td align="right" valign="top">Feminine author</td>
<td align="right" valign="top">Masculine author</td>
</tr>
<tr>
<td align="left" valign="top">Words</td>
<td align="right" valign="top">25,828,265</td>
<td align="right" valign="top">1,206,744</td>
<td align="right" valign="top">24,621,521</td>
</tr>
<tr>
<td align="left" valign="top">People</td>
<td align="right" valign="top">490,892</td>
<td align="right" valign="top">24,271</td>
<td align="right" valign="top">466,621</td>
</tr>
<tr>
<td align="left" valign="top">Characterised people</td>
<td align="right" valign="top">57,680</td>
<td align="right" valign="top">2,235</td>
<td align="right" valign="top">55,445</td>
</tr>
<tr>
<td align="left" valign="top">Social</td>
<td align="right" valign="top">8,968</td>
<td align="right" valign="top">355</td>
<td align="right" valign="top">8,613</td>
</tr>
<tr>
<td align="left" valign="top">Appearance</td>
<td align="right" valign="top">12,951</td>
<td align="right" valign="top">595</td>
<td align="right" valign="top">12,356</td>
</tr>
<tr>
<td align="left" valign="top">Emotion</td>
<td align="right" valign="top">8,704</td>
<td align="right" valign="top">533</td>
<td align="right" valign="top">8,171</td>
</tr>
<tr>
<td align="left" valign="top">Character</td>
<td align="right" valign="top">19,002</td>
<td align="right" valign="top">887</td>
<td align="right" valign="top">18,115</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>However, there is a huge difference in the size of the compared material: There are only 1.2 million words written by women, compared to almost 25 million words written by men. In fact, this is an inescapable problem, given the small number of texts by women in our corpus: only 19 authors, who wrote 33 works in prose. <xref ref-type="fn" rid="n16">16</xref></p>
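<p>Because the two sub-corpora differ so much in size, the raw counts in <xref ref-type="table" rid="T8">Table 8</xref> are easier to compare once normalised. A minimal sketch in Python, assuming the number of characterised people as the denominator (our choice for illustration; the normalisation behind <xref ref-type="fig" rid="F13">Figure 13</xref> may differ); the names <monospace>TABLE_8</monospace> and <monospace>class_shares</monospace> are hypothetical:</p>

```python
# Counts copied from Table 8 (novels, novellas and short stories).
TABLE_8 = {
    "feminine author": {"words": 1_206_744, "characterised": 2_235,
                        "social": 355, "appearance": 595,
                        "emotion": 533, "character": 887},
    "masculine author": {"words": 24_621_521, "characterised": 55_445,
                         "social": 8_613, "appearance": 12_356,
                         "emotion": 8_171, "character": 18_115},
}

CLASSES = ("social", "appearance", "emotion", "character")


def class_shares(row):
    """Each depiction class as a proportion of characterised people.

    Assumption: characterised people is the denominator. One mention can
    carry several characterisations, so the shares need not sum to 1.
    """
    return {c: row[c] / row["characterised"] for c in CLASSES}


for author, row in TABLE_8.items():
    shares = class_shares(row)
    cells = "  ".join(f"{c} {shares[c]:.1%}" for c in CLASSES)
    print(f"{author:17s} {cells}")
```

<p>Under this normalisation, &#8216;appearance&#8217; amounts to about 26.6% of the characterised people in works by women and 22.3% in works by men.</p>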
<p>Even though the material is very unbalanced, we tried to discern any interesting trends in the works written by women in terms of whose appearance is described more &#8211; could it be that they would emphasise or concentrate more on the appearance of masculine characters?</p>
<p>We get 265 appearance descriptions of feminine characters and 319 of masculine characters, out of 985 characterisations of feminine characters and 1,195 characterisations of masculine characters. In other words, 26.9% of feminine characterisations and 26.7% of masculine characterisations involve appearance. We acknowledge that the numbers are too small to be conclusive. In any case, it is conspicuous that, in literature written by women, the appearance of feminine and masculine characters is described in roughly the same proportion.</p>
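<p>One way to make &#8220;too small to be conclusive&#8221; concrete is a two-proportion z-test (normal approximation). This standard test is not part of our analysis and is added only as an illustration; it assumes that each characterisation is an independent observation, which holds only approximately, since the same character can be characterised several times. The function name <monospace>two_proportion_z_test</monospace> is hypothetical:</p>

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-sided z-test for H0: x1/n1 == x2/n2 (normal approximation)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for Z ~ N(0, 1).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p1, p2, z, p_value


# Appearance descriptions out of all characterisations, works by women only.
p_fem, p_masc, z, p = two_proportion_z_test(265, 985, 319, 1195)
print(f"feminine {p_fem:.1%}  masculine {p_masc:.1%}  z = {z:.2f}  p = {p:.2f}")
```

<p>For these counts the sketch gives z of about 0.11 and a two-sided p-value of about 0.91: The data cannot distinguish the two proportions, which is not the same as showing that they are equal.</p>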
<p>Despite the imbalanced data, <xref ref-type="fig" rid="F14">Figure 14</xref> shows the preferred characterisations broken down by the gender of both the characters and the writers. Below, we sketch some differences between human depiction in works written by men and in works written by women. The main difference is the increase of &#8216;appearance&#8217; in masculine characterisation in works written by women.</p>
<fig id="F14">
<caption>
<p><bold>Figure 14:</bold> Preferred characterisation by masculine and feminine authors.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="jcls-3567_freitas-g14.png"/>
</fig>
<p>Beginning with feminine characters and focusing on women writers only, we found that <italic>married</italic> is no longer among the most frequent social depictions, but <italic>widow</italic> and <italic>single</italic> remain. Although beauty is still frequent, less space is devoted to it in works written by women; age, by contrast, is more present (<italic>young</italic> and <italic>old</italic>). As for emotional characterisation, <italic>happy</italic> and <italic>adorable</italic> are the highlights, and none of the preferred emotional words relate to sadness. As for character, the most prominent feminine depiction words are <italic>honest, infamous, crazy, refined</italic>, and <italic>dangerous</italic>. Turning to masculine characters, in the social axis they are mainly <italic>married</italic> and <italic>noble</italic>. Positive emotions are present for masculine characters as well (e.g. <italic>happy/pleased, enthusiastic</italic>), but bravery (<italic>brave</italic>) has only one occurrence. Masculine &#8216;appearance&#8217; follows the general trend, and masculine characters are mainly <italic>kind</italic> and <italic>honourable</italic>.</p>
</sec>
<sec id="S4.4">
<title>4.4 Differences between Brazil and Portugal</title>
<p>Are there differences between the two countries with regard to people&#8217;s characterisation?</p>
<p>We compared the works from 1840 to the present day (Brazil became independent in 1822, and, as already mentioned, for the 1830s we only have one work, by a Portuguese author).</p>
<p>We decided to compare only novels, novellas and short stories between the two countries because the non-fiction parts differ widely: While we have a large body of texts on history on the Portuguese side, we have almost only short essays in newspapers on the Brazilian side. The results are presented in <xref ref-type="table" rid="T9">Table 9</xref>.</p>
<table-wrap id="T9">
<caption>
<p><bold>Table 9:</bold> Different depiction classes in novels, novellas and short stories, in general, and per author nationality after 1840.</p>
</caption>
<table>
<tbody>
<tr>
<td align="left" valign="top"></td>
<td align="right" valign="top">Total</td>
<td align="right" valign="top">Brazil</td>
<td align="right" valign="top">Portugal</td>
</tr>
<tr>
<td align="left" valign="top">People</td>
<td align="right" valign="top">486,575</td>
<td align="right" valign="top">209,283</td>
<td align="right" valign="top">277,292</td>
</tr>
<tr>
<td align="left" valign="top">Characterised people</td>
<td align="right" valign="top">46,704</td>
<td align="right" valign="top">19,642</td>
<td align="right" valign="top">27,062</td>
</tr>
<tr>
<td align="left" valign="top">Social</td>
<td align="right" valign="top">8,887</td>
<td align="right" valign="top">3,545</td>
<td align="right" valign="top">5,342</td>
</tr>
<tr>
<td align="left" valign="top">Appearance</td>
<td align="right" valign="top">12,877</td>
<td align="right" valign="top">6,199</td>
<td align="right" valign="top">6,678</td>
</tr>
<tr>
<td align="left" valign="top">Emotion</td>
<td align="right" valign="top">8,704</td>
<td align="right" valign="top">4,874</td>
<td align="right" valign="top">3,650</td>
</tr>
<tr>
<td align="left" valign="top">Character</td>
<td align="right" valign="top">18,782</td>
<td align="right" valign="top">7,649</td>
<td align="right" valign="top">11,133</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>We can see that the proportions of &#8216;character&#8217; and &#8216;social&#8217; characterisation are somewhat higher in Portuguese literature, while the other categories &#8211; especially emotion &#8211; are more pronounced in Brazilian literature. One may wonder whether this is due to a more socially rigid society in Portugal, or whether the cause lies in the historical novels (almost absent in the Brazilian material and quite frequent in the Portuguese material).</p>
<p>We also investigated whether the differences between feminine and masculine characters are more marked in the Brazilian material, or of a different kind than in the Portuguese material. For this, we created <xref ref-type="table" rid="T10">Table 10</xref>, which shows that Brazilian literature has a higher proportion of mentions of feminine characters (36.5%) than Portuguese literature (29.7%). This may again be due to the historical novels, but needs further investigation.</p>
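<p>These two percentages correspond to the feminine share of the &#8216;People&#8217; row of <xref ref-type="table" rid="T10">Table 10</xref>. A minimal check, assuming that the denominator is the column total, which also includes mentions without an assigned gender (cf. footnote <xref ref-type="fn" rid="n13">13</xref>); <monospace>PEOPLE</monospace> and <monospace>feminine_share</monospace> are hypothetical names:</p>

```python
# "People" row of Table 10: mentions of people (not distinct characters)
# in fiction after 1840, by author nationality.
PEOPLE = {
    "Brazil": {"total": 202_829, "fem": 74_020, "masc": 118_088},
    "Portugal": {"total": 275_301, "fem": 81_847, "masc": 165_796},
}


def feminine_share(row):
    """Feminine mentions over all mentions.

    The total also includes mentions to which the parser assigned no
    gender (M/F), so fem + masc is smaller than the total.
    """
    return row["fem"] / row["total"]


for country, row in PEOPLE.items():
    print(f"{country}: {feminine_share(row):.1%} of mentions are feminine")
```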
<table-wrap id="T10">
<caption>
<p><bold>Table 10:</bold> Different depiction classes in novels, novellas, and short stories after 1840 per author nationality and per gender of the characterised.</p>
</caption>
<table>
<tbody>
<tr>
<td align="left" valign="top"></td>
<td align="right" valign="top">Br total</td>
<td align="right" valign="top">Br fem.</td>
<td align="right" valign="top">Br masc.</td>
<td align="right" valign="top">Pt total</td>
<td align="right" valign="top">Pt fem.</td>
<td align="right" valign="top">Pt masc.</td>
</tr>
<tr>
<td align="left" valign="top">People</td>
<td align="right" valign="top">202,829</td>
<td align="right" valign="top">74,020</td>
<td align="right" valign="top">118,088</td>
<td align="right" valign="top">275,301</td>
<td align="right" valign="top">81,847</td>
<td align="right" valign="top">165,796</td>
</tr>
<tr>
<td align="left" valign="top">Characterised people</td>
<td align="right" valign="top">17,453</td>
<td align="right" valign="top">6,381</td>
<td align="right" valign="top">10,591</td>
<td align="right" valign="top">24,548</td>
<td align="right" valign="top">8,452</td>
<td align="right" valign="top">15,372</td>
</tr>
<tr>
<td align="left" valign="top">Social</td>
<td align="right" valign="top">3,545</td>
<td align="right" valign="top">1,216</td>
<td align="right" valign="top">2,217</td>
<td align="right" valign="top">5,342</td>
<td align="right" valign="top">1,753</td>
<td align="right" valign="top">3,434</td>
</tr>
<tr>
<td align="left" valign="top">Appearance</td>
<td align="right" valign="top">6,199</td>
<td align="right" valign="top">2,579</td>
<td align="right" valign="top">3,472</td>
<td align="right" valign="top">6,678</td>
<td align="right" valign="top">2,618</td>
<td align="right" valign="top">3,885</td>
</tr>
<tr>
<td align="left" valign="top">Emotion</td>
<td align="right" valign="top">3,474</td>
<td align="right" valign="top">1,444</td>
<td align="right" valign="top">1,949</td>
<td align="right" valign="top">5,230</td>
<td align="right" valign="top">2,206</td>
<td align="right" valign="top">2,925</td>
</tr>
<tr>
<td align="left" valign="top">Character</td>
<td align="right" valign="top">7,649</td>
<td align="right" valign="top">2,446</td>
<td align="right" valign="top">4,955</td>
<td align="right" valign="top">11,133</td>
<td align="right" valign="top">3,292</td>
<td align="right" valign="top">7,452</td>
</tr>
</tbody>
</table>
</table-wrap>
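<p>In <xref ref-type="table" rid="T10">Table 10</xref>, we also see that the social status of masculine characters is slightly more prominent in Portuguese literature: The ratio of &#8216;social&#8217; characterisations to characterised masculine mentions is 22.3% in Portuguese and 20.9% in Brazilian literature.</p>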
<p>If we now compare the distribution by country and by gender, presented in <xref ref-type="fig" rid="F15">Figure 15</xref>, masculine characters seem to be similarly depicted, although in Portuguese-authored works there is a slightly more balanced distribution between appearance, social and emotion axes. In Brazilian-authored works, besides the emphasis on &#8216;appearance&#8217;, there is proportionally less use of the character axis, which leads to a smaller difference between characterisations by &#8216;appearance&#8217; and by &#8216;character&#8217;. For feminine characters, there are relatively fewer mentions of their social status and emotional states in Brazilian-authored works.</p>
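<p>The comparison above can be approximated from <xref ref-type="table" rid="T10">Table 10</xref> by dividing each depiction class by the number of characterised people in the same column. This is a sketch under that assumption (<xref ref-type="fig" rid="F15">Figure 15</xref> may be normalised differently); <monospace>TABLE_10</monospace> and <monospace>profile</monospace> are hypothetical names:</p>

```python
CLASSES = ("social", "appearance", "emotion", "character")

# Table 10: characterised people and depiction classes in fiction after 1840,
# per author nationality and per gender of the characterised.
TABLE_10 = {
    ("Brazil", "fem"): {"characterised": 6_381, "social": 1_216,
                        "appearance": 2_579, "emotion": 1_444, "character": 2_446},
    ("Brazil", "masc"): {"characterised": 10_591, "social": 2_217,
                         "appearance": 3_472, "emotion": 1_949, "character": 4_955},
    ("Portugal", "fem"): {"characterised": 8_452, "social": 1_753,
                          "appearance": 2_618, "emotion": 2_206, "character": 3_292},
    ("Portugal", "masc"): {"characterised": 15_372, "social": 3_434,
                           "appearance": 3_885, "emotion": 2_925, "character": 7_452},
}


def profile(row):
    """Depiction classes relative to characterised people (assumed denominator)."""
    return {c: row[c] / row["characterised"] for c in CLASSES}


profiles = {key: profile(row) for key, row in TABLE_10.items()}
for (country, gender), shares in profiles.items():
    cells = "  ".join(f"{c} {shares[c]:.1%}" for c in CLASSES)
    print(f"{country:8s} {gender:4s}  {cells}")
```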
<fig id="F15">
<caption>
<p><bold>Figure 15:</bold> Characterisation by country.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="jcls-3567_freitas-g15.png"/>
</fig>
</sec>
<sec id="S4.5">
<title>4.5 Differences among Authors</title>
<p>In <xref ref-type="table" rid="T11">Table 11</xref>, we show the distribution of the types of characterisation for 12 canonical authors, six Brazilian and six Portuguese.</p>
<table-wrap id="T11">
<caption>
<p><bold>Table 11:</bold> Different depiction classes per author. &#8220;nr&#8221; shows the number of different fiction works by that author in <italic>Literateca</italic> and &#8220;mfreq&#8221; the most frequent characterising word.</p>
</caption>
<table>
<tbody>
<tr>
<td align="left" valign="top">Author</td>
<td align="left" valign="top">Country</td>
<td align="right" valign="top">nr</td>
<td align="right" valign="top">Total</td>
<td align="right" valign="top">Character</td>
<td align="right" valign="top">Social</td>
<td align="right" valign="top">Appearance</td>
<td align="right" valign="top">Emotion</td>
<td align="right" valign="top">mfreq</td>
</tr>
<tr>
<td align="left" valign="top">Camilo Castelo Branco</td>
<td align="left" valign="top">PT</td>
<td align="right" valign="top">42</td>
<td align="right" valign="top">4,045</td>
<td align="right" valign="top">1,781</td>
<td align="right" valign="top">938</td>
<td align="right" valign="top">845</td>
<td align="right" valign="top">481</td>
<td align="right" valign="top">pobre</td>
</tr>
<tr>
<td align="left" valign="top">Machado de Assis</td>
<td align="left" valign="top">BR</td>
<td align="right" valign="top">140</td>
<td align="right" valign="top">1,864</td>
<td align="right" valign="top">793</td>
<td align="right" valign="top">219</td>
<td align="right" valign="top">643</td>
<td align="right" valign="top">209</td>
<td align="right" valign="top">bom</td>
</tr>
<tr>
<td align="left" valign="top">E&#231;a de Queir&#243;s</td>
<td align="left" valign="top">PT</td>
<td align="right" valign="top">16</td>
<td align="right" valign="top">2,487</td>
<td align="right" valign="top">1,019</td>
<td align="right" valign="top">420</td>
<td align="right" valign="top">923</td>
<td align="right" valign="top">125</td>
<td align="right" valign="top">bom</td>
</tr>
<tr>
<td align="left" valign="top">JM de Macedo</td>
<td align="left" valign="top">BR</td>
<td align="right" valign="top">7</td>
<td align="right" valign="top">1,325</td>
<td align="right" valign="top">411</td>
<td align="right" valign="top">232</td>
<td align="right" valign="top">515</td>
<td align="right" valign="top">167</td>
<td align="right" valign="top">velho</td>
</tr>
<tr>
<td align="left" valign="top">Alu&#237;sio Azevedo</td>
<td align="left" valign="top">BR</td>
<td align="right" valign="top">13</td>
<td align="right" valign="top">1,307</td>
<td align="right" valign="top">513</td>
<td align="right" valign="top">191</td>
<td align="right" valign="top">374</td>
<td align="right" valign="top">229</td>
<td align="right" valign="top">pobre</td>
</tr>
<tr>
<td align="left" valign="top">Jos&#233; d&#8217;Alencar</td>
<td align="left" valign="top">BR</td>
<td align="right" valign="top">15</td>
<td align="right" valign="top">887</td>
<td align="right" valign="top">331</td>
<td align="right" valign="top">154</td>
<td align="right" valign="top">370</td>
<td align="right" valign="top">32</td>
<td align="right" valign="top">velho</td>
</tr>
<tr>
<td align="left" valign="top">Coelho Neto</td>
<td align="left" valign="top">BR</td>
<td align="right" valign="top">17</td>
<td align="right" valign="top">966</td>
<td align="right" valign="top">369</td>
<td align="right" valign="top">81</td>
<td align="right" valign="top">440</td>
<td align="right" valign="top">76</td>
<td align="right" valign="top">velho</td>
</tr>
<tr>
<td align="left" valign="top">Humberto de Campos</td>
<td align="left" valign="top">BR</td>
<td align="right" valign="top">6</td>
<td align="right" valign="top">766</td>
<td align="right" valign="top">169</td>
<td align="right" valign="top">193</td>
<td align="right" valign="top">368</td>
<td align="right" valign="top">36</td>
<td align="right" valign="top">velho</td>
</tr>
<tr>
<td align="left" valign="top">J&#250;lio Dinis</td>
<td align="left" valign="top">PT</td>
<td align="right" valign="top">9</td>
<td align="right" valign="top">1,038</td>
<td align="right" valign="top">430</td>
<td align="right" valign="top">127</td>
<td align="right" valign="top">302</td>
<td align="right" valign="top">179</td>
<td align="right" valign="top">pobre</td>
</tr>
<tr>
<td align="left" valign="top">Te&#243;filo Braga</td>
<td align="left" valign="top">PT</td>
<td align="right" valign="top">4</td>
<td align="right" valign="top">419</td>
<td align="right" valign="top">144</td>
<td align="right" valign="top">82</td>
<td align="right" valign="top">112</td>
<td align="right" valign="top">81</td>
<td align="right" valign="top">pobre</td>
</tr>
<tr>
<td align="left" valign="top">Alexandre Herculano</td>
<td align="left" valign="top">PT</td>
<td align="right" valign="top">8</td>
<td align="right" valign="top">809</td>
<td align="right" valign="top">321</td>
<td align="right" valign="top">201</td>
<td align="right" valign="top">228</td>
<td align="right" valign="top">59</td>
<td align="right" valign="top">velho</td>
</tr>
<tr>
<td align="left" valign="top">Raul Brand&#227;o</td>
<td align="left" valign="top">PT</td>
<td align="right" valign="top">5</td>
<td align="right" valign="top">206</td>
<td align="right" valign="top">73</td>
<td align="right" valign="top">24</td>
<td align="right" valign="top">102</td>
<td align="right" valign="top">7</td>
<td align="right" valign="top">grande</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>We can see that there are some differences among these authors. They all agree in not emphasising explicitly emotional description, and four authors follow exactly the &#8220;general&#8221; pattern in fiction (first &#8216;character&#8217;, then &#8216;appearance&#8217;, then &#8216;social&#8217;, and finally &#8216;emotion&#8217;): Machado de Assis, E&#231;a de Queir&#243;s, Te&#243;filo Braga, and Alexandre Herculano. Alu&#237;sio Azevedo and J&#250;lio Dinis follow it except that they use more &#8216;emotion&#8217; than &#8216;social&#8217; characterisations.</p>
<p>However, in Joaquim Manuel de Macedo, Jos&#233; de Alencar, Coelho Neto, and Raul Brand&#227;o, &#8216;appearance&#8217; is the most frequent characterisation and &#8216;character&#8217; the second most frequent.</p>
<p>As to the relative order of &#8216;character&#8217; and &#8216;social&#8217; characterisation, Humberto de Campos is the only one who reverses the &#8220;canonical&#8221; order, using more &#8216;social&#8217; characterisations than those reflecting &#8216;character&#8217;, while Camilo Castelo Branco (incidentally the author with the highest number of works in <italic>Literateca</italic>) is the only one who uses more &#8216;social&#8217; than &#8216;appearance&#8217; characterisations.</p>
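<p>The orderings of the four classes can be read off <xref ref-type="table" rid="T11">Table 11</xref> mechanically. The sketch below ranks each author&#8217;s counts and compares the ranking with the order observed in fiction as a whole (&#8216;character&#8217;, &#8216;appearance&#8217;, &#8216;social&#8217;, &#8216;emotion&#8217;); ties, which do not occur in this table, would be broken by column order. <monospace>TABLE_11</monospace> and <monospace>ranking</monospace> are hypothetical names, and author labels are copied from the table:</p>

```python
# Table 11 column order for the four depiction classes.
COLUMNS = ("character", "social", "appearance", "emotion")
# Order of the classes in fiction as a whole (Table 8 totals).
GENERAL_PATTERN = ("character", "appearance", "social", "emotion")

TABLE_11 = {
    "Camilo Castelo Branco": (1_781, 938, 845, 481),
    "Machado de Assis": (793, 219, 643, 209),
    "Eça de Queirós": (1_019, 420, 923, 125),
    "JM de Macedo": (411, 232, 515, 167),
    "Aluísio Azevedo": (513, 191, 374, 229),
    "José d'Alencar": (331, 154, 370, 32),
    "Coelho Neto": (369, 81, 440, 76),
    "Humberto de Campos": (169, 193, 368, 36),
    "Júlio Dinis": (430, 127, 302, 179),
    "Teófilo Braga": (144, 82, 112, 81),
    "Alexandre Herculano": (321, 201, 228, 59),
    "Raul Brandão": (73, 24, 102, 7),
}


def ranking(counts):
    """Depiction classes sorted from most to least frequent (stable on ties)."""
    by_class = dict(zip(COLUMNS, counts))
    return tuple(sorted(by_class, key=by_class.get, reverse=True))


for author, counts in TABLE_11.items():
    order = ranking(counts)
    flag = "general pattern" if order == GENERAL_PATTERN else "differs"
    print(f"{author:22s} {' > '.join(order):42s} {flag}")
```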
<p>In any case, the authors also differ in how much they characterise: <xref ref-type="fig" rid="F16">Figure 16</xref> illustrates how many characterisations each author uses relative to the number of words.</p>
<fig id="F16">
<caption>
<p><bold>Figure 16:</bold> Characterisation by author. From left to right, Humberto de Campos, Joaquim Manuel de Macedo, Alu&#237;sio Azevedo, Camilo Castelo Branco, J&#250;lio Dinis, Jos&#233; de Alencar, E&#231;a de Queir&#243;s, Machado de Assis, Coelho Neto, Alexandre Herculano, Raul Brand&#227;o, and Te&#243;filo Braga.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="jcls-3567_freitas-g16.png"/>
</fig>
<p>In <xref ref-type="fig" rid="F17">Figure 17</xref>, we represent each author in a plane formed by internal and external characteristics.</p>
<fig id="F17">
<caption>
<p><bold>Figure 17:</bold> Characterisation by author in terms of type and relative weight of characterisation.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="jcls-3567_freitas-g17.png"/>
</fig>
</sec>
<sec id="S4.6">
<title>4.6 The Influence of Literary School</title>
<p>For a subset of the works of <italic>Literateca</italic>, we have metadata about the literary school to which they belong, as described in Santos et al. (<xref ref-type="bibr" rid="B22">2020</xref>).</p>
<p>We selected all works marked as romantic in one group (11,850,395 words, 175 books) and those marked as realist or naturalistic (7,616,384 words, 121 different books) in another group<xref ref-type="fn" rid="n17">17</xref> to see whether one could identify differences regarding people&#8217;s depictions just based on this fourfold sub-classification, and also according to the gender of who gets characterised. The results are presented in <xref ref-type="table" rid="T12">Table 12</xref> and in <xref ref-type="fig" rid="F18">Figure 18</xref>.</p>
<fig id="F18">
<caption>
<p><bold>Figure 18:</bold> Characterisation per literary school and per gender.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="jcls-3567_freitas-g18.png"/>
</fig>
<table-wrap id="T12">
<caption>
<p><bold>Table 12:</bold> Different depiction classes in novels, novellas and short stories per literary school and per gender of the characterised.</p>
</caption>
<table>
<tbody>
<tr>
<td align="left" valign="top"></td>
<td align="right" valign="top">Romantic total</td>
<td align="right" valign="top">Romantic fem.</td>
<td align="right" valign="top">Romantic masc.</td>
<td align="right" valign="top">Realist total</td>
<td align="right" valign="top">Realist fem.</td>
<td align="right" valign="top">Realist masc.</td>
</tr>
<tr>
<td align="left" valign="top">People</td>
<td align="right" valign="top">238,338</td>
<td align="right" valign="top">74,991</td>
<td align="right" valign="top">142,245</td>
<td align="right" valign="top">149,699</td>
<td align="right" valign="top">52,771</td>
<td align="right" valign="top">86,861</td>
</tr>
<tr>
<td align="left" valign="top">Characterised</td>
<td align="right" valign="top">22,733</td>
<td align="right" valign="top">8,140</td>
<td align="right" valign="top">14,041</td>
<td align="right" valign="top">13,834</td>
<td align="right" valign="top">5,187</td>
<td align="right" valign="top">8,244</td>
</tr>
<tr>
<td align="left" valign="top">Social</td>
<td align="right" valign="top">4,629</td>
<td align="right" valign="top">1,510</td>
<td align="right" valign="top">3,002</td>
<td align="right" valign="top">2,516</td>
<td align="right" valign="top">946</td>
<td align="right" valign="top">1,501</td>
</tr>
<tr>
<td align="left" valign="top">Appearance</td>
<td align="right" valign="top">5,573</td>
<td align="right" valign="top">2,279</td>
<td align="right" valign="top">3,179</td>
<td align="right" valign="top">3,944</td>
<td align="right" valign="top">1,678</td>
<td align="right" valign="top">2,147</td>
</tr>
<tr>
<td align="left" valign="top">Emotion</td>
<td align="right" valign="top">4,370</td>
<td align="right" valign="top">1,932</td>
<td align="right" valign="top">2,350</td>
<td align="right" valign="top">2,635</td>
<td align="right" valign="top">1,112</td>
<td align="right" valign="top">1,464</td>
</tr>
<tr>
<td align="left" valign="top">Character</td>
<td align="right" valign="top">9,389</td>
<td align="right" valign="top">2,899</td>
<td align="right" valign="top">6,237</td>
<td align="right" valign="top">5,649</td>
<td align="right" valign="top">1,819</td>
<td align="right" valign="top">3,650</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The first interesting remark is that there are (relatively) more mentions of feminine characters in realist works than in romantic ones. However, a larger share of feminine mentions is characterised in romantic books (10.9%, against 9.9% of masculine mentions) than in realist ones (9.8%, against 9.5% of masculine mentions).</p>
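<p>The percentages in the preceding paragraph follow from the &#8216;People&#8217; and &#8216;Characterised&#8217; rows of <xref ref-type="table" rid="T12">Table 12</xref>. A minimal sketch reproducing them; <monospace>TABLE_12</monospace>, <monospace>feminine_mention_share</monospace> and <monospace>characterisation_rate</monospace> are hypothetical names:</p>

```python
# Table 12: mentions of people and characterised mentions per school and gender.
TABLE_12 = {
    ("romantic", "fem"): {"people": 74_991, "characterised": 8_140},
    ("romantic", "masc"): {"people": 142_245, "characterised": 14_041},
    ("realist", "fem"): {"people": 52_771, "characterised": 5_187},
    ("realist", "masc"): {"people": 86_861, "characterised": 8_244},
}
# Column totals, which also include mentions without an assigned gender.
TOTAL_PEOPLE = {"romantic": 238_338, "realist": 149_699}


def feminine_mention_share(school):
    """Share of all mentions of people that are feminine, per school."""
    return TABLE_12[(school, "fem")]["people"] / TOTAL_PEOPLE[school]


def characterisation_rate(school, gender):
    """Share of mentions of a given gender that receive any characterisation."""
    row = TABLE_12[(school, gender)]
    return row["characterised"] / row["people"]


for school in ("romantic", "realist"):
    print(f"{school}: {feminine_mention_share(school):.1%} feminine mentions; "
          f"characterised fem {characterisation_rate(school, 'fem'):.1%}, "
          f"masc {characterisation_rate(school, 'masc'):.1%}")
```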
<p>We see that in romanticism there are far more &#8216;character&#8217; characterisations of masculine characters than in realism, where the relationship among all kinds of characterisation is stable across genders. In addition, realism describes the physical appearance of both genders, while romanticism prefers feminine appearance.</p>
</sec>
<sec id="S4.7">
<title>4.7 Going back to DIP</title>
<p>DIP has clearly demonstrated that there are fewer feminine characters in Lusophone literature. In this study, however, we see that those feminine characters are relatively more characterised, at least for &#8216;appearance&#8217;, than the masculine ones.</p>
<p>Ideally, and in the near future, we would like to connect these two ways of distantly reading literature, providing for each literary work not only its description in terms of characters (as DIP does) but also how each character is characterised. This would combine the present work with some form of anaphora resolution, both for depictions that are not proper names and for cases where human subjects (whether or not proper names) are omitted: Freitas and Souza (<xref ref-type="bibr" rid="B7">2021</xref>) found omitted subjects in 41% of clauses in Brazilian literary material.</p>
<p>We might therefore link types of characters with particular clusters of properties, such as the beautiful rich woman, the poor honest lad, or the evil old priest.</p>
</sec>
</sec>
<sec id="S5">
<title>5. Concluding Remarks</title>
<p>In this paper, we offered some insights into human depiction based on distant reading of literature in Portuguese. We can summarise our results as follows: Human depiction seems to follow the order &#8216;character&#8217;, &#8216;social&#8217;, &#8216;appearance&#8217;, &#8216;emotion&#8217; for masculine characters, and &#8216;character&#8217;, &#8216;appearance&#8217;, &#8216;social&#8217;, &#8216;emotion&#8217; for feminine characters. If we consider only preferred depiction words, differences between feminine and masculine characters become more pronounced, and changing the lens &#8211; from distant to close reading &#8211; reveals that the features associated with characters are related to their gender. The results also suggest an impact of the author&#8217;s gender on the types of characterisation used, but the limited number of works written by women prevents a more definite conclusion.</p>
<p>We acknowledge that the material we used (in number of works and words) is smaller than that used in other studies conducted under the umbrella of Digital Humanities. However, our findings show that an advantage of annotated data is that trends and patterns can be seen even in moderately sized collections. Furthermore, we stress that another aim of this work is to convince the community, mainly the Portuguese-speaking one, to enlarge Portuguese-language literary collections with machine-readable texts.</p>
<p>In the near future, we would like to assess the precision of each rule used, correct the mistakes detected, and widen the scope of characterisation. We are aware that human depiction is not restricted to the lexical-syntactic patterns we used; detecting other ways in which the Portuguese language expresses characterisation is, therefore, a natural way to continue the investigation.</p>
<p>We are also aware that our study mainly reflects the vision of male authors of the nineteenth and early twentieth centuries. Therefore, it is by no means an unbiased description of gender. Other studies that we may undertake on this material will add an evaluative view: Which of these ways of depicting are positive, negative, or neutral? This is more straightforward for character and emotional words, but also possible for appearance and even social descriptions. We could also separate age from appearance and check what this dimension might bring.</p>
<p>In any case, all the material is open for inspection, from the lists of the characterising words to the patterns used, and the annotated works themselves, which allow interested researchers to repeat our searches and even refine them.</p>
</sec>
<sec id="S6">
<title>6. Data Availability</title>
<p>We make the following resources available on Zenodo:</p>
<list list-type="bullet">
<list-item><p>the list of characterising words, classified in five classes: <ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://doi.org/10.5281/zenodo.7979566">https://doi.org/10.5281/zenodo.7979566</ext-link>;</p></list-item>
<list-item><p>the patterns to find them in the corpus, together with the commands to create the tables and/or figures used in the paper: <ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://doi.org/10.5281/zenodo.7979619">https://doi.org/10.5281/zenodo.7979619</ext-link>.</p></list-item>
</list>
</sec>
</body>
<back>
<sec id="S7">
<title>7. Acknowledgements</title>
<p>We thank Funda&#231;&#227;o Cient&#237;fica para a Computa&#231;&#227;o Nacional (FCCN) of Portugal for hosting the corpora on which this study is conducted, and UNINETT Sigma2 &#8211; the National Infrastructure for High Performance Computing and Data Storage in Norway for the computational resources. We thank the reviewers for constructive criticism and the audience at the CCLS for comments and suggestions.</p>
</sec>
<sec id="S8">
<title>8. Author Contributions</title>
<p><bold>Cl&#225;udia Freitas:</bold> Conceptualization, Writing &#8211; original draft, review &amp; editing</p>
<p><bold>Diana Santos:</bold> Conceptualization, Writing &#8211; original draft, review &amp; editing</p>
</sec>
<fn-group>
<fn id="n1"><p>Although published in 2022, the work was conducted in 2018.</p></fn>
<fn id="n2"><p>By this, we mean that when people are mentioned to specify a time frame or authorship, as in <italic>During D. Jo&#227;o VI&#8217;s reign</italic> or <italic>Goethe&#8217;s Faust</italic>, neither &#8220;D. Jo&#227;o VI&#8221; nor &#8220;Goethe&#8221; was considered a character. This turned out to be a controversial decision, and one that is hard to apply in historical novels. In any case, it represents an unusual way of looking at literary characters, which needs to be documented.</p></fn>
<fn id="n3"><p>See: <ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://books.google.com/ngrams/">https://books.google.com/ngrams/</ext-link>.</p></fn>
<fn id="n4"><p>Exceptions are excerpts of books existing in parallel corpora or texts whose authors gave us permission to use them.</p></fn>
<fn id="n5"><p>By this, we mean that established authors who belong to the Portuguese and Brazilian canons have been fully digitised, i.e., everything they published is available. This is in strong contrast with non-canonical authors, of whom only some works (mainly novels) may have been digitised in the context of other projects.</p></fn>
<fn id="n6"><p>These, in turn, improve on the patterns used in Freitas et al. (<xref ref-type="bibr" rid="B6">2022</xref>).</p></fn>
<fn id="n7"><p>The list comprises not only adjectives and nouns, but also verbs (for past participles), given that PALAVRAS analyses most participles as verbs, even in adjectival contexts.</p></fn>
<fn id="n8"><p>Actually, there was one case where we consistently considered the context: In Portuguese, the word <italic>grande</italic> can mean either <italic>big</italic> or <italic>great</italic>. Since each meaning corresponds, in general, to a different syntactic position &#8211; <italic>grande homem</italic> (&#8216;great man&#8217;); <italic>homem grande</italic> (&#8216;big man&#8217;), we used this information to correctly classify each of the occurrences: &#8216;character&#8217; or &#8216;appearance&#8217;, respectively.</p></fn>
<fn id="n9"><p>Note, however, that <italic>educado</italic> and <italic>bem-educado</italic>, being single words, were included.</p></fn>
<fn id="n10"><p>That is, we treated missing accents as something that could already be present in the original printed edition, not as OCR mistakes.</p></fn>
<fn id="n11"><p>Available from <ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://www.linguateca.pt/Gramateca/ListaPredicadoresClassificados.txt">https://www.linguateca.pt/Gramateca/ListaPredicadoresClassificados.txt</ext-link>.</p></fn>
<fn id="n12"><p>The classification is encoded in the following tags: <monospace>pred:carater</monospace>, <monospace>pred:aparencia</monospace>, <monospace>pred:social</monospace>, and <monospace>pred:emo</monospace>. To find them in <italic>Literateca</italic>, search for <monospace>[sema=&#34;.*pred:social.*&#34;]</monospace>, and analogously for the other tags.</p></fn>
<fn id="n13"><p>Note that the numbers do not add up because in some cases the parser cannot assign a morphological gender and marks the word as M/F. Also, recall that by &#8220;character&#8221; we mean here mentions of people, not distinct characters.</p></fn>
<fn id="n14"><p>It may seem surprising at first to classify age as appearance, but age is something that we assess visually.</p></fn>
<fn id="n15"><p>In <xref ref-type="fig" rid="F11">Figure 11</xref> and <xref ref-type="fig" rid="F12">Figure 12</xref>, labels such as <italic>beautiful_1</italic> and <italic>pretty_2</italic> correspond to different Portuguese words that share the same English translation, such as <italic>bonita</italic> and <italic>formosa</italic>, both of which could be translated as &#8216;pretty&#8217;.</p></fn>
<fn id="n16"><p>Namely, ordered by decreasing number of words in the corpus: J&#250;lia Lopes de Almeida, Virg&#237;nia de Castro e Almeida, Ana Pl&#225;cido, Teresa Margarida da Silva e Orta, Maria Am&#225;lia Vaz de Carvalho, Maria O&#8217;Neill, Maria Firmina dos Reis, Florbela Espanca, M.M.S.A. e Vasconcelos, Cl&#225;udia Campos, Maur&#237;cia C. de Figueiredo, Maria Lu&#237;sa Marques da Silva, Matilde Isabel de Santana e Vasconcelos Moniz Bettencourt, Ana de Castro Os&#243;rio, Alice Moderno, Maria Peregrina de Sousa, Paulina Filad&#233;lfia, Clarice Lispector, and S&#244;nia Coutinho.</p></fn>
<fn id="n17"><p>Note that the groups are not mutually exclusive: a few books are classified as both romantic and realist, corresponding to the transition between the two schools.</p></fn>
</fn-group>
<ref-list>
<ref id="B1"><mixed-citation publication-type="webpage"><string-name><surname>Argamon</surname>, <given-names>Shlomo</given-names></string-name>, <string-name><given-names>Charles</given-names> <surname>Cooney</surname></string-name>, <string-name><given-names>Russell</given-names> <surname>Horton</surname></string-name>, <string-name><given-names>Mark</given-names> <surname>Olsen</surname></string-name>, <string-name><given-names>Sterling Stuart</given-names> <surname>Stein</surname></string-name>, and <string-name><given-names>Robert</given-names> <surname>Voyer</surname></string-name> (<year>2009</year>). <chapter-title>&#8220;Gender, Race, and Nationality in Black Drama, 1950&#8211;2006: Mining Differences in Language Use in Authors and their Characters&#8221;</chapter-title>. In: <source>Digital Humanities Quarterly</source> <volume>3</volume> (<issue>2</issue>). <uri>http://www.digitalhumanities.org/dhq/vol/3/2/000043/000043.html</uri> (visited on 01/17/2023).</mixed-citation></ref>
<ref id="B2"><mixed-citation publication-type="book"><string-name><surname>Bick</surname>, <given-names>Eckhard</given-names></string-name> (<year>2014</year>). <chapter-title>&#8220;PALAVRAS, a Constraint Grammar-based Parsing System for Portuguese&#8221;</chapter-title>. In: <source>Working with Portuguese Corpora</source>. Ed. by <string-name><given-names>Tony Berber</given-names> <surname>Sardinha</surname></string-name> and <string-name><given-names>Thelma</given-names> <surname>de Lurdes S&#227;o Bento Ferreira</surname></string-name>. <publisher-name>Bloomsbury Academic</publisher-name>, <fpage>279</fpage>&#8211;<lpage>302</lpage>.</mixed-citation></ref>
<ref id="B3"><mixed-citation publication-type="journal"><string-name><surname>Cao</surname>, <given-names>Yang Trista</given-names></string-name> and <string-name><given-names>Hal</given-names> <surname>Daum&#233;</surname> <suffix>III</suffix></string-name> (<year>2021</year>). <article-title>&#8220;Toward Gender-Inclusive Coreference Resolution: An Analysis of Gender and Bias Throughout the Machine Learning Lifecycle&#8221;</article-title>. In: <source>Computational Linguistics</source> <volume>47</volume> (<issue>3</issue>), <fpage>615</fpage>&#8211;<lpage>661</lpage>. <pub-id pub-id-type="doi">10.1162/coli_a_00413</pub-id>.</mixed-citation></ref>
<ref id="B4"><mixed-citation publication-type="journal"><string-name><surname>&#268;erm&#225;kov&#225;</surname>, <given-names>Anna</given-names></string-name> and <string-name><given-names>Michaela</given-names> <surname>Mahlberg</surname></string-name> (<year>2021</year>). <article-title>&#8220;The Representation of Mothers and the Gendered Social Structure of Nineteenth-Century Children&#8217;s Literature&#8221;</article-title>. In: <source>English Text Construction</source> <volume>14</volume> (<issue>2</issue>), <fpage>119</fpage>&#8211;<lpage>149</lpage>. <pub-id pub-id-type="doi">10.1075/etc.00044.cer</pub-id>.</mixed-citation></ref>
<ref id="B5"><mixed-citation publication-type="journal"><string-name><surname>&#268;erm&#225;kov&#225;</surname>, <given-names>Anna</given-names></string-name> and <string-name><given-names>Michaela</given-names> <surname>Mahlberg</surname></string-name> (<year>2022</year>). <article-title>&#8220;Gendered Body Language in Children&#8217;s Literature Over Time&#8221;</article-title>. In: <source>Language and Literature</source> <volume>31</volume> (<issue>1</issue>), <fpage>11</fpage>&#8211;<lpage>40</lpage>. <pub-id pub-id-type="doi">10.1177/09639470211072154</pub-id>.</mixed-citation></ref>
<ref id="B6"><mixed-citation publication-type="journal"><string-name><surname>Freitas</surname>, <given-names>Cl&#225;udia</given-names></string-name>, <string-name><given-names>Fl&#225;via</given-names> <surname>Martins</surname></string-name>, and <string-name><given-names>Liana</given-names> <surname>Biar</surname></string-name> (<year>2022</year>). <article-title>&#8220;Um &#8216;olhar discursivo&#8217; sobre Predica&#231;&#227;o e G&#234;nero: Aproxima&#231;&#245;es Metodol&#243;gicas entre Corpus e Discurso&#8221;</article-title>. In: <source>Texto Livre</source>. <pub-id pub-id-type="doi">10.35699/1983-3652.2022.36213</pub-id>.</mixed-citation></ref>
<ref id="B7"><mixed-citation publication-type="journal"><string-name><surname>Freitas</surname>, <given-names>Cl&#225;udia</given-names></string-name> and <string-name><given-names>Elvis</given-names> <surname>Souza</surname></string-name> (<year>2021</year>). <article-title>&#8220;Sujeito oculto &#224;s claras: uma abordagem descritivo-computacional / Omitted subjects revealed: a quantitative-descriptive approach&#8221;</article-title>. In: <source>Revista de Estudos da Linguagem</source> <volume>29</volume> (<issue>2</issue>), <fpage>1033</fpage>&#8211;<lpage>1058</lpage>. <pub-id pub-id-type="doi">10.17851/2237-2083.29.2.1033-1058</pub-id>.</mixed-citation></ref>
<ref id="B8"><mixed-citation publication-type="book"><string-name><surname>Hoyle</surname>, <given-names>Alexander Miserlis</given-names></string-name>, <string-name><given-names>Lawrence</given-names> <surname>Wolf-Sonkin</surname></string-name>, <string-name><given-names>Hanna</given-names> <surname>Wallach</surname></string-name>, <string-name><given-names>Isabelle</given-names> <surname>Augenstein</surname></string-name>, and <string-name><given-names>Ryan</given-names> <surname>Cotterell</surname></string-name> (<year>2019</year>). <chapter-title>&#8220;Unsupervised Discovery of Gendered Language through Latent-Variable Modeling&#8221;</chapter-title>. In: <source>Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics</source>. <publisher-name>Association for Computational Linguistics</publisher-name>, <fpage>1706</fpage>&#8211;<lpage>1716</lpage>. <pub-id pub-id-type="doi">10.18653/v1/P19-1167</pub-id>.</mixed-citation></ref>
<ref id="B9"><mixed-citation publication-type="webpage"><string-name><surname>Katsma</surname>, <given-names>Holst</given-names></string-name> (<year>2018</year>). <source>Loudness in the Novel</source>. <uri>https://litlab.stanford.edu/LiteraryLabPamphlet7.pdf</uri> (visited on 01/17/2023).</mixed-citation></ref>
<ref id="B10"><mixed-citation publication-type="book"><string-name><surname>Larson</surname>, <given-names>Brian</given-names></string-name> (<year>2017</year>). <chapter-title>&#8220;Gender as a Variable in Natural-Language Processing: Ethical Considerations&#8221;</chapter-title>. In: <source>Proceedings of the First ACL Workshop on Ethics in Natural Language Processing</source>. <publisher-name>Association for Computational Linguistics</publisher-name>, <fpage>1</fpage>&#8211;<lpage>11</lpage>. <pub-id pub-id-type="doi">10.18653/v1/W17-1601</pub-id>.</mixed-citation></ref>
<ref id="B11"><mixed-citation publication-type="book"><string-name><surname>Lucy</surname>, <given-names>Li</given-names></string-name> and <string-name><given-names>David</given-names> <surname>Bamman</surname></string-name> (<year>2021</year>). <chapter-title>&#8220;Gender and Representation Bias in GPT-3 Generated Stories&#8221;</chapter-title>. In: <source>Proceedings of the Third Workshop on Narrative Understanding</source>. <publisher-name>Association for Computational Linguistics</publisher-name>, <fpage>48</fpage>&#8211;<lpage>55</lpage>. <pub-id pub-id-type="doi">10.18653/v1/2021.nuse-1.5</pub-id>.</mixed-citation></ref>
<ref id="B12"><mixed-citation publication-type="book"><string-name><surname>Mandell</surname>, <given-names>Laura</given-names></string-name> (<year>2019</year>). <chapter-title>&#8220;Gender and Cultural Analytics: Finding or Making Stereotypes?&#8221;</chapter-title> In: <source>Debates in the Digital Humanities</source>. Ed. by <string-name><given-names>Matthew K.</given-names> <surname>Gold</surname></string-name> and <string-name><given-names>Lauren F.</given-names> <surname>Klein</surname></string-name>. <publisher-name>Manifold Scholarship</publisher-name>, <fpage>3</fpage>&#8211;<lpage>26</lpage>. <pub-id pub-id-type="doi">10.5749/j.ctvg251hk.4</pub-id>.</mixed-citation></ref>
<ref id="B13"><mixed-citation publication-type="journal"><string-name><surname>Moretti</surname>, <given-names>Franco</given-names></string-name> (<year>2000</year>). <article-title>&#8220;The Slaughterhouse of Literature&#8221;</article-title>. In: <source>Modern Language Quarterly</source> <volume>61</volume> (<issue>1</issue>). <pub-id pub-id-type="doi">10.1215/00267929-61-1-207</pub-id>.</mixed-citation></ref>
<ref id="B14"><mixed-citation publication-type="book"><string-name><surname>Moretti</surname>, <given-names>Franco</given-names></string-name> (<year>2013</year>). <source>Distant Reading</source>. <publisher-name>Verso Books</publisher-name>.</mixed-citation></ref>
<ref id="B15"><mixed-citation publication-type="journal"><string-name><surname>Moretti</surname>, <given-names>Franco</given-names></string-name> and <string-name><given-names>Oleg</given-names> <surname>Sobchuk</surname></string-name> (<year>2019</year>). <article-title>&#8220;Hidden in Plain Sight: Data Visualization in the Humanities&#8221;</article-title>. In: <source>New Left Review</source>, <fpage>86</fpage>&#8211;<lpage>115</lpage>.</mixed-citation></ref>
<ref id="B16"><mixed-citation publication-type="journal"><string-name><surname>Rocha</surname>, <given-names>Lu&#237;sa</given-names></string-name>, <string-name><given-names>Cl&#225;udia</given-names> <surname>Freitas</surname></string-name>, and <string-name><given-names>Diana</given-names> <surname>Santos</surname></string-name> (<year>2019</year>). <article-title>&#8220;Prepara&#231;&#227;o para Leitura Distante em Portugu&#234;s: Di&#225;logos entre PLN e Humanidades Digitais&#8221;</article-title>. In: <source>Anais do TILic 2019</source>. <uri>http://hdl.handle.net/10400.26/31834</uri>.</mixed-citation></ref>
<ref id="B17"><mixed-citation publication-type="webpage"><string-name><surname>Santos</surname>, <given-names>Diana</given-names></string-name> (<year>2014</year>). <chapter-title>&#8220;Corpora at Linguateca: Vision and Roads Taken&#8221;</chapter-title>. In: <source>Working with Portuguese Corpora</source>. Ed. by <string-name><given-names>Tony Berber</given-names> <surname>Sardinha</surname></string-name> and <string-name><given-names>Thelma</given-names> <surname>de Lurdes S&#227;o Bento Ferreira</surname></string-name>. <publisher-name>Bloomsbury Academic</publisher-name>, <fpage>219</fpage>&#8211;<lpage>236</lpage>. <uri>http://hdl.handle.net/10400.26/20539</uri> (visited on 01/17/2023).</mixed-citation></ref>
<ref id="B18"><mixed-citation publication-type="webpage"><string-name><surname>Santos</surname>, <given-names>Diana</given-names></string-name> and <string-name><given-names>Cl&#225;udia</given-names> <surname>Freitas</surname></string-name> (<year>2019</year>). <article-title>&#8220;Estudando Personagens na Literatura Lus&#243;fona&#8221;</article-title>. In: <source>Proceedings of the 12th Symposium in Information and Human Language Technology and Collocated Events (STIL)</source>, <fpage>48</fpage>&#8211;<lpage>52</lpage>. <uri>https://comissoes.sbc.org.br/ce-pln/stil2019/proceedings-stil-2019-Final-Publicacao.pdf</uri> (visited on 01/17/2023).</mixed-citation></ref>
<ref id="B19"><mixed-citation publication-type="journal"><string-name><surname>Santos</surname>, <given-names>Diana</given-names></string-name>, <string-name><given-names>Cl&#225;udia</given-names> <surname>Freitas</surname></string-name>, and <string-name><given-names>Eckhard</given-names> <surname>Bick</surname></string-name> (<year>2018</year>). <article-title>&#8220;OBras: a Fully Annotated and Partially Human-revised Corpus of Brazilian Literary Works in Public Domain&#8221;</article-title>. In: <source>CorLex</source>. <uri>http://hdl.handle.net/10400.26/31830</uri>.</mixed-citation></ref>
<ref id="B20"><mixed-citation publication-type="webpage"><string-name><surname>Santos</surname>, <given-names>Diana</given-names></string-name>, <string-name><given-names>Cristina</given-names> <surname>Mota</surname></string-name>, <string-name><given-names>Emanoel</given-names> <surname>Pires</surname></string-name>, <string-name><given-names>Marcia Caetano</given-names> <surname>Langfeldt</surname></string-name>, <string-name><given-names>Rebeca Schumacher</given-names> <surname>Fu&#227;o</surname></string-name>, and <string-name><given-names>Roberto</given-names> <surname>Willrich</surname></string-name> (<year>2022a</year>). <source>Introduction to DIP: Goal, Setup, Resources and Results</source>. <uri>https://www.linguateca.pt/aval_conjunta/dip/apr_encontro/DIPpresentation.pdf</uri> (visited on 01/17/2023).</mixed-citation></ref>
<ref id="B21"><mixed-citation publication-type="journal"><string-name><surname>Santos</surname>, <given-names>Diana</given-names></string-name>, <string-name><given-names>Cristina</given-names> <surname>Mota</surname></string-name>, <string-name><given-names>Emanoel</given-names> <surname>Pires</surname></string-name>, <string-name><given-names>Marcia Caetano</given-names> <surname>Langfeldt</surname></string-name>, <string-name><given-names>Rebeca Schumacher</given-names> <surname>Fu&#227;o</surname></string-name>, and <string-name><given-names>Roberto</given-names> <surname>Willrich</surname></string-name> (<year>2023</year>). <article-title>&#8220;DIP - Desafio de Identifica&#231;&#227;o de Personagens: Objectivo, Organiza&#231;&#227;o, Recursos e Resultados&#8221;</article-title>. In: <source>Linguam&#225;tica</source> <volume>15</volume> (<issue>1</issue>), <fpage>3</fpage>&#8211;<lpage>30</lpage>. <pub-id pub-id-type="doi">10.21814/lm.15.1.399</pub-id>.</mixed-citation></ref>
<ref id="B22"><mixed-citation publication-type="journal"><string-name><surname>Santos</surname>, <given-names>Diana</given-names></string-name>, <string-name><given-names>Emanoel</given-names> <surname>Pires</surname></string-name>, <string-name><given-names>Cl&#225;udia</given-names> <surname>Freitas</surname></string-name>, <string-name><given-names>Rebeca Schumacher</given-names> <surname>Fu&#227;o</surname></string-name>, and <string-name><given-names>Jo&#227;o Marques</given-names> <surname>Lopes</surname></string-name> (<year>2020</year>). <article-title>&#8220;Periodiza&#231;&#227;o Autom&#225;tica: Estudos Lingu&#237;stico-Estat&#237;sticos de Literatura Lus&#243;fona&#8221;</article-title>. In: <source>Linguam&#225;tica</source> <volume>12</volume> (<issue>1</issue>), <fpage>81</fpage>&#8211;<lpage>95</lpage>. <pub-id pub-id-type="doi">10.21814/lm.12.1.314</pub-id>.</mixed-citation></ref>
<ref id="B23"><mixed-citation publication-type="journal"><string-name><surname>Santos</surname>, <given-names>Diana</given-names></string-name>, <string-name><given-names>Roberto</given-names> <surname>Willrich</surname></string-name>, <string-name><given-names>Marcia</given-names> <surname>Langfeldt</surname></string-name>, <string-name><given-names>Ricardo Gaiotto</given-names> <surname>de Moraes</surname></string-name>, <string-name><given-names>Cristina</given-names> <surname>Mota</surname></string-name>, <string-name><given-names>Emanoel</given-names> <surname>Pires</surname></string-name>, <string-name><given-names>Rebeca</given-names> <surname>Schumacher</surname></string-name>, and <string-name><given-names>Paulo Silva</given-names> <surname>Pereira</surname></string-name> (<year>2022b</year>). <article-title>&#8220;Identifying Literary Characters in Portuguese: Challenges of an International Shared Task&#8221;</article-title>. In: <source>Proceedings of the 15th International Conference of Computational Processing of the Portuguese Language (PROPOR)</source>, <fpage>413</fpage>&#8211;<lpage>419</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-030-98305-5_39</pub-id>.</mixed-citation></ref>
<ref id="B24"><mixed-citation publication-type="journal"><string-name><surname>Sch&#246;ch</surname>, <given-names>Christof</given-names></string-name>, <string-name><given-names>Tomaz</given-names> <surname>Erjavec</surname></string-name>, <string-name><given-names>Roxana</given-names> <surname>Patras</surname></string-name>, and <string-name><given-names>Diana</given-names> <surname>Santos</surname></string-name> (<year>2021</year>). <article-title>&#8220;Creating the European Literary Text Collection (ELTeC): Challenges and Perspectives&#8221;</article-title>. In: <source>Modern Languages Open</source> <volume>2021</volume> (<issue>1</issue>), <fpage>1</fpage>&#8211;<lpage>19</lpage>. <pub-id pub-id-type="doi">10.3828/mlo.v0i0.364</pub-id>.</mixed-citation></ref>
<ref id="B25"><mixed-citation publication-type="journal"><string-name><surname>Sch&#246;ch</surname>, <given-names>Christof</given-names></string-name>, <string-name><given-names>Evgeniia</given-names> <surname>Fileva</surname></string-name>, and <string-name><given-names>Julia</given-names> <surname>Dudar</surname></string-name> (<year>2022</year>). <article-title>&#8220;CLS INFRA D3.1 Baseline Methodological User Needs Analysis&#8221;</article-title>. In: <source>Zenodo</source>. <pub-id pub-id-type="doi">10.5281/zenodo.6389333</pub-id>.</mixed-citation></ref>
<ref id="B26"><mixed-citation publication-type="journal"><string-name><surname>Schulz</surname>, <given-names>Daniel</given-names></string-name> and <string-name><given-names>&#352;tep&#225;n</given-names> <surname>Bahn&#237;k</surname></string-name> (<year>2019</year>). <article-title>&#8220;Gender Associations in the Twentieth-Century English-Language Literature&#8221;</article-title>. In: <source>Journal of Research in Personality</source> <volume>81</volume>, <fpage>88</fpage>&#8211;<lpage>97</lpage>. <pub-id pub-id-type="doi">10.1016/j.jrp.2019.05.010</pub-id>.</mixed-citation></ref>
<ref id="B27"><mixed-citation publication-type="thesis"><string-name><surname>Silva</surname>, <given-names>Fl&#225;via Martins da Rosa Pereira da</given-names></string-name> (<year>2021</year>). <chapter-title>&#8220;Diferencia&#231;&#245;es de G&#234;nero na Caracteriza&#231;&#227;o de Personagens: Uma Proposta Metodol&#243;gica e Primeiros Resultados&#8221;</chapter-title>. MA thesis. <publisher-name>PUC-Rio</publisher-name>. <uri>https://www.maxwell.vrac.puc-rio.br/54130/54130.PDF</uri> (visited on 01/17/2023).</mixed-citation></ref>
<ref id="B28"><mixed-citation publication-type="webpage"><string-name><surname>Smeets</surname>, <given-names>Roel</given-names></string-name> (<year>2021</year>). <source>Character Constellations: Representations of Social Groups in Present-Day Dutch Literary Fiction</source>. <publisher-name>Leuven University Press</publisher-name>. <uri>http://www.jstor.org/stable/j.ctv21wj5cb</uri> (visited on 12/17/2022).</mixed-citation></ref>
<ref id="B29"><mixed-citation publication-type="book"><string-name><surname>Underwood</surname>, <given-names>Ted</given-names></string-name> (<year>2019</year>). <source>Distant Horizons: Digital Evidence and Literary Change</source>. <publisher-name>University of Chicago Press</publisher-name>. <pub-id pub-id-type="doi">10.7208/9780226612973</pub-id>.</mixed-citation></ref>
<ref id="B30"><mixed-citation publication-type="journal"><string-name><surname>Underwood</surname>, <given-names>Ted</given-names></string-name>, <string-name><given-names>David</given-names> <surname>Bamman</surname></string-name>, and <string-name><given-names>Sabrina</given-names> <surname>Lee</surname></string-name> (<year>2018</year>). <article-title>&#8220;The Transformation of Gender in English-Language Fiction&#8221;</article-title>. In: <source>Journal of Cultural Analytics</source> <volume>3</volume> (<issue>2</issue>). <pub-id pub-id-type="doi">10.22148/16.019</pub-id>.</mixed-citation></ref>
<ref id="B31"><mixed-citation publication-type="journal"><string-name><surname>Weingart</surname>, <given-names>Scott</given-names></string-name> and <string-name><given-names>Jeana</given-names> <surname>Jorgensen</surname></string-name> (<year>2013</year>). <article-title>&#8220;Computational Analysis of the Body in European Fairy Tales&#8221;</article-title>. In: <source>Literary and Linguistic Computing</source> <volume>28</volume> (<issue>3</issue>), <fpage>404</fpage>&#8211;<lpage>416</lpage>. <pub-id pub-id-type="doi">10.1093/llc/fqs015</pub-id>.</mixed-citation></ref>
</ref-list>
</back>
</article>