Week 4 Note

Chapter 4 of 'Corpora in Applied Linguistics' by Susan Hunston discusses foundational statistical methods in corpus linguistics, focusing on frequency-based analysis techniques. It covers various analytical methods such as frequency and normalization, keyword analysis, collocation measures, and lexical bundles, emphasizing their roles in exploring linguistic patterns across different texts. The chapter also addresses the applications and challenges of these methods in revealing thematic and stylistic insights while cautioning against potential biases in interpretation.

Uploaded by

Thutra Dinh

Summary of Chapter 4: Foundational Quantitative Concepts in Corpus Linguistics

Corpora in Applied Linguistics, Susan Hunston
1. Introduction
● Focus: Introduces statistical methods foundational to corpus linguistics and emphasizes
frequency-based analysis techniques.
● Purpose: Explains how these methods provide insights into the general linguistic
features of corpora.
● Structure:
1. Techniques for analyzing words and phrases:
▪ Frequency and normalization (Section 4.3).
▪ Keyword analysis (Section 4.4).
▪ Collocation measures (Section 4.5).
▪ Lexical bundles (Section 4.6).
2. Techniques for analyzing categories:
▪ Multidimensional analysis (MDA, Section 4.7).
▪ Semantic annotation (Section 4.8).


● Key Point: These methods help researchers explore and compare linguistic patterns
across various text types and genres, complementing more focused studies like
concordance line analysis.

2. Frequency and Normalization


2.1 Frequency
● Definition: Measures how often a word or lemma occurs in a corpus.
o Example: The word "disappearance" occurs 632 times in the British National
Corpus (BNC).
● Issue: Raw frequency is uninformative without context. Researchers need comparative
frameworks to interpret these numbers.
2.2 Normalization
● Purpose: Accounts for corpus size to make word frequencies comparable.

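● Formula: Normalized frequency = (raw frequency ÷ corpus size) × basis
o Example:
▪ Word frequency: 350.
▪ Corpus size: 15,000 tokens.
▪ Basis: 1,000 tokens.
▪ Normalized frequency = (350 ÷ 15,000) × 1,000 ≈ 23.33 occurrences per 1,000 tokens.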
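
The normalization step divides the raw count by the corpus size and scales the result to the chosen basis. A minimal Python sketch of that arithmetic, using the figures from the example; the function name `normalized_frequency` is illustrative and not taken from the chapter:

```python
def normalized_frequency(raw_count: int, corpus_size: int, basis: int = 1_000) -> float:
    """Return the number of occurrences per `basis` tokens."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return raw_count / corpus_size * basis


# Figures from the example: 350 occurrences in a 15,000-token corpus.
per_thousand = normalized_frequency(350, 15_000, basis=1_000)
print(round(per_thousand, 2))  # 23.33
```

Changing `basis` to 1,000,000 would give the per-million figure often reported for large corpora.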

2.3 Comparisons
1. Within-Corpus Comparison:
o Examines relative word frequencies in a single corpus.
o Example:
▪ "Appearance" occurs 5,310 times in the BNC compared to
"disappearance" (632 times). The frequency gap is explained by semantic
diversity: "appearance" has multiple meanings, while "disappearance" is
more specific.

2. Between-Corpus Comparison:
o Compares frequencies of words across different corpora.
o Example:
▪ In the Global Environmental Change (GEC) corpus, "disappearance" occurs 34 times out of 80 total instances of "appearance" and "disappearance," forming 42.5% of the total.
▪ In the BNC, "disappearance" forms only 10.6% of the total, reflecting different topical emphases.
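
The two percentages can be checked from the counts given in these notes. The sketch below is only that arithmetic; the helper name `share` is illustrative:

```python
def share(part: int, total: int) -> float:
    """Percentage that `part` makes up of `total`."""
    return part / total * 100


# BNC counts from the notes: "appearance" 5,310 and "disappearance" 632.
bnc_appearance, bnc_disappearance = 5_310, 632
bnc_share = share(bnc_disappearance, bnc_appearance + bnc_disappearance)

# GEC counts from the notes: 34 of the 80 combined instances.
gec_share = share(34, 80)

print(f"BNC: {bnc_share:.1f}%")  # BNC: 10.6%
print(f"GEC: {gec_share:.1f}%")  # GEC: 42.5%
```

Comparing shares rather than raw counts is what makes the two corpora comparable despite their very different sizes.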

3. Keywords
3.1 Definition
● Keywords are statistically prominent words in one corpus compared to another.

● Purpose: Identify themes, stylistic elements, or topical focus.
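
These notes do not say which statistic is used to decide that a word is "statistically prominent." As an assumption, the sketch below uses the log-likelihood keyness calculation that is common in corpus tools: it compares a word's observed frequency in each corpus with the frequency expected if the word were spread evenly across both. All counts are hypothetical.

```python
import math


def log_likelihood_keyness(freq_study: int, size_study: int,
                           freq_ref: int, size_ref: int) -> float:
    """Log-likelihood keyness of one word (assumed statistic, not from the notes)."""
    total_freq = freq_study + freq_ref
    total_size = size_study + size_ref
    # Expected counts if the word were equally common in both corpora.
    expected_study = size_study * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    ll = 0.0
    for observed, expected in ((freq_study, expected_study), (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll


# Hypothetical counts: a word occurring 120 times in a 25,000-token study
# corpus and 300 times in a 900,000-token reference corpus.
score = log_likelihood_keyness(120, 25_000, 300, 900_000)
print(f"keyness (log-likelihood): {score:.2f}")
```

A higher score means stronger evidence that the word's relative frequency differs between the two corpora; a word with identical relative frequencies scores zero.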


3.2 Examples
1. Shakespeare Analysis (Scott & Tribble, 2006):
o Keywords in Romeo and Juliet: "love," "death," "banished," "night," "poison."
o Insights:
▪ Thematic terms (e.g., "love") reflect plot content and dialogues.
▪ Stylistic terms (e.g., "night") concentrate in specific scenes or speeches.
2. Travel Writing Analysis (Gerbig, 2010):
o 19th-century keywords: "desert," "reindeer," "tent" (explorative and exotic
narratives).
o 21st-century keywords: "visa," "taxi," "guesthouse" (practical and relatable
themes).
3. Political Manifestos (Rayson, 2008):
o Labour Party keywords: "reform," "new."
o Liberal Democrats: "freedom," "entitled."
o Findings: Labour emphasizes societal change, while Liberal Democrats focus on
individual rights.
3.3 Keyword Studies Considerations
● Reference Corpus Selection: Researchers can compare:
1. Specialized corpora against general corpora (e.g., Romeo and Juliet vs. all
Shakespeare plays).
2. Sub-corpora within the same dataset (e.g., character-specific speech in Romeo
and Juliet).
3.4 Limitations
● Exaggerated contrasts: Because keyword analysis foregrounds differences, it may overstate contrasts between corpora and obscure what they share.
● Stereotypes: Rayson et al. (1997) highlighted gender-associated keywords, which inadvertently reinforced stereotypes about male and female speech patterns.

4. Measuring Collocation
4.1 Definition
● Collocation: Statistically significant co-occurrence of words within a specified span
(e.g., ±4 words).
● Purpose: Explores contextual patterns and thematic significance.
4.2 Metrics
1. Log-Likelihood:
o Measures how significant a co-occurrence is.
o Example: "Species" + "of" is highly significant due to recurring phrases like
"species of bird."
2. Mutual Information (MI):
o Highlights strong, rare pairings.
o Example: "Mutability of species" has high MI because "mutability" co-occurs
almost exclusively with "species."
3. T-Score:
o Balances strength with evidence, prioritizing frequent combinations.
o Example: "Species" + "new" reflects common academic usage in evolutionary
contexts.
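
The contrast between MI and t-score can be illustrated with a small calculation. The sketch below uses simplified textbook formulas that ignore span size: expected co-occurrence is (node frequency × collocate frequency) ÷ corpus size, MI is log2(observed ÷ expected), and t-score is (observed − expected) ÷ √observed. Only the co-occurrence counts (176, 41, 6) come from these notes; the corpus size and individual word frequencies are hypothetical.

```python
import math


def expected_cooccurrence(node_freq: int, collocate_freq: int, corpus_size: int) -> float:
    """Co-occurrences expected by chance (simplified: span size is ignored)."""
    return node_freq * collocate_freq / corpus_size


def mutual_information(observed: int, node_freq: int, collocate_freq: int, corpus_size: int) -> float:
    expected = expected_cooccurrence(node_freq, collocate_freq, corpus_size)
    return math.log2(observed / expected)


def t_score(observed: int, node_freq: int, collocate_freq: int, corpus_size: int) -> float:
    expected = expected_cooccurrence(node_freq, collocate_freq, corpus_size)
    return (observed - expected) / math.sqrt(observed)


CORPUS_SIZE = 100_000   # hypothetical
SPECIES_FREQ = 500      # hypothetical frequency of the node word "species"

# collocate -> (observed co-occurrences from the notes, hypothetical collocate frequency)
collocates = {"of": (176, 3_000), "new": (41, 150), "mutability": (6, 6)}

results = {}
for word, (observed, freq) in collocates.items():
    mi = mutual_information(observed, SPECIES_FREQ, freq, CORPUS_SIZE)
    t = t_score(observed, SPECIES_FREQ, freq, CORPUS_SIZE)
    results[word] = (mi, t)
    print(f"species + {word}: MI = {mi:.2f}, t-score = {t:.2f}")
```

Under these assumed frequencies, "mutability" gets the highest MI because it is rare and almost always occurs with "species," while "of" gets the highest t-score because the pairing is so frequent.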
4.3 Examples
● In The Rough Guide to Evolution:
o "Species" collocates with "of" (176 times), "new" (41 times), and "mutability" (6
times).
o Findings illustrate phraseologies:
▪ "Species of [noun]" (e.g., "species of bird").
▪ "[Adjective] species" (e.g., "new species").

5. Lexical Bundles
5.1 Definition
● Multi-word units recurring frequently in a corpus (e.g., "on the other hand").

● Automatically identified based on thresholds for frequency and dispersion.
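
Automatic identification can be sketched as counting every n-word sequence and keeping those that pass both thresholds. The code below is a minimal illustration, not the procedure of any particular tool: it assumes whitespace tokenization, four-word bundles, and toy thresholds (`min_freq`, `min_texts`) chosen only for the example.

```python
from collections import Counter, defaultdict


def lexical_bundles(texts, n=4, min_freq=2, min_texts=2):
    """Return n-grams meeting a frequency threshold and a dispersion threshold."""
    counts = Counter()
    text_ids = defaultdict(set)  # which texts each n-gram appears in
    for text_id, text in enumerate(texts):
        tokens = text.lower().split()  # simplistic tokenization (assumption)
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            counts[gram] += 1
            text_ids[gram].add(text_id)
    return {
        " ".join(gram): freq
        for gram, freq in counts.items()
        if freq >= min_freq and len(text_ids[gram]) >= min_texts
    }


# Toy corpus of three short "texts" (invented for illustration).
texts = [
    "on the other hand the results were mixed",
    "the model is simple but on the other hand it is slow",
    "in this small sample in this small sample",
]
bundles = lexical_bundles(texts)
print(bundles)  # {'on the other hand': 2}
```

The sequence "in this small sample" also occurs twice, but only in one text, so the dispersion threshold excludes it. This is why dispersion is checked alongside frequency: it filters out sequences that are repeated by a single author or document.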


5.2 Examples
● Academic Writing (Global Environmental Change Corpus):
o Subject-specific bundles: "Impacts of climate change" (536 occurrences).
o General-purpose bundles: "As well as the" (381 occurrences).
5.3 Functions (Biber, 2006b):
1. Stance Expressions: Convey attitudes or likelihood (e.g., "It is important to").
2. Discourse Organizers: Structure arguments (e.g., "On the other hand").
3. Referential Expressions: Specify attributes or relationships (e.g., "At the end of the
year").
5.4 Applications
● Lexical bundles reflect disciplinary norms:
o Sciences: Frequent bundles related to methodology (e.g., "in the case of").
o Humanities: Focus on framing arguments (e.g., "on the basis of").

6. Multidimensional Analysis (MDA)


6.1 Definition
● Statistical technique comparing linguistic feature distributions across sub-corpora.
● Introduced by Biber (1988).
6.2 Process
1. Tagging: Annotate corpora for linguistic features (e.g., pronouns, tenses).
2. Factor Analysis: Group co-occurring features into factors.
3. Interpretation: Assign dimensions based on feature patterns.
6.3 Dimensions (Biber, 1988):
1. Involved vs. Informational Production:
o Positive features: First-person pronouns, contractions.
o Negative features: Nouns, prepositions.
2. Narrative vs. Non-Narrative Concerns:
o Positive features: Past tense, public verbs.
o Negative features: Present tense, attributive adjectives.
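
Once a dimension's positive and negative features are known, texts can be placed along it. These notes do not describe the scoring step, so the sketch below rests on an assumption: each feature's rate is standardized across texts (z-scores), and a text's score is the sum of its positive-feature z-scores minus the sum of its negative-feature z-scores. The feature rates are hypothetical, and the factor analysis itself is not shown.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize a list of rates; a constant feature maps to all zeros."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]


def dimension_scores(rates, positive, negative):
    """rates maps feature name -> list of per-text rates (one entry per text)."""
    standardized = {f: z_scores(rates[f]) for f in positive + negative}
    n_texts = len(next(iter(rates.values())))
    return [
        sum(standardized[f][i] for f in positive)
        - sum(standardized[f][i] for f in negative)
        for i in range(n_texts)
    ]


# Hypothetical rates per 1,000 tokens for three texts:
# a conversation, a piece of fiction, and an academic article.
rates = {
    "first_person_pronouns": [60.0, 25.0, 5.0],
    "contractions": [45.0, 20.0, 1.0],
    "nouns": [140.0, 190.0, 300.0],
    "prepositions": [85.0, 110.0, 150.0],
}
scores = dimension_scores(
    rates,
    positive=["first_person_pronouns", "contractions"],
    negative=["nouns", "prepositions"],
)
for label, score in zip(["conversation", "fiction", "academic"], scores):
    print(f"{label}: {score:.2f}")
```

With these invented rates, the conversation lands at the "involved" end of Dimension 1 and the academic article at the "informational" end.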

7. Applications and Challenges


7.1 Applications
● Thematic analysis: Keywords and collocations identify topics and stylistic patterns (e.g.,
political discourse).
● Genre-specific insights: Lexical bundles reveal academic writing conventions.
7.2 Challenges
● Statistical measures like MI may highlight rare, unrepresentative collocations.
● Overemphasis on differences risks perpetuating stereotypes (e.g., gender-specific
keywords).
