Lancaster University - Week 1 Lecture: Part 3

So we've just looked at markup, and I think, for most purposes, you can really forget it exists. All it is, is the magic behind the scenes that allows you, as a linguist or a person interested in language, to carry out various forms of intelligent search on corpus data. But when I talked about something being representative before, I started to imply that corpora come, if you like, in different shapes and sizes, or at least in different flavours. There are different things you can do with different types of corpora.

By the way, 'corpora' is the plural of corpus. It's a very unfortunate plural, but that's the plural we're stuck with. So we have to say 'different types of corpora'. How might we begin to rough out a typology of different types of corpus? Well, for example, we could talk about genre.

What is the principal genre? What is the range of genres represented in that corpus? So I might have quite a specialised corpus with a very focused genre: let's say, for example, newspaper material. Very focused. Very specialised. Using that type of corpus, I could look at the language of newspapers at the time and in the place the material was gathered.

And I've mentioned time already. Time also specialises and limits the focus of the data. So, for example, if I'm interested in news reporting in the 21st century, looking at a newspaper from 1651 is clearly problematic. But if I'm interested in newspapers in England in the 1650s, a very specialised collection of newspapers from 1651 (British English, 1651, newspapers) is very helpful to me. And, as I've just implied, the location or, if you like, the variety of language the material is produced in creates another form of specialisation.

So there's a whole range of ways in which a corpus might be more tightly focused, but very often we want to appeal to some type of general corpus, as I talked about before. Here we can look at large corpora such as the British National Corpus, 100 million words of spoken and written English. Note, by the way, that the mode of communication, spoken or written, can also generate different types of corpora. Spoken language differs from written language in a range of ways, as you'll discover.

But the British National Corpus is composed of a large variety of genres of writing, nearly 88 million words of it. There is informative writing, in broadly eight types: world affairs, leisure, the arts, commerce and finance, belief and thought, social science, applied science, and natural and pure science. There is also imaginative writing, though the one genre represented there is fiction. You can see on the slide the relative proportions of data in those categories. So that's an attempt to produce a broad collection of British English across a range of types of writing.

On the speech side, there are just over 10 million words of speech, and, of course, producing that was a major undertaking. Lots of people carried around a tape recorder, recording everyday conversations, and then some very brave and noble people typed it all up so that we could search the data by keyword, for words, and so on. It's broadly split in two. The first part is the so-called spoken demographic component, which is informal conversation sampled from across the UK, across a range of social classes and ages, and from both male and female speakers.

The second part is the spoken context-governed component, what you might call more task-centred speech recorded in specific locations, because we know that spoken language can vary by location. I might speak more formally when I'm talking to my bank manager than when I'm talking to my nephew, for example. So we were trying to think about the shape of spoken language, if you like, and represent it as best we could in that spoken collection. Later in the course, you'll get the opportunity to search the British National Corpus yourself, and that should prove very interesting indeed.

So that's an example of a large corpus that tries to represent the language in general. But there are still further types of corpora. There are multilingual corpora. To some extent, you could say: well, there's a corpus of English and there's a corpus of Spanish, so I can contrast the two languages by looking at those. I can also collect those corpora in a way that ensures the contrast is productive. You don't want to collect a corpus of English from one genre and then a corpus of Spanish from a completely different genre, because some of the things you observe might then be an artefact, or a byproduct if you like, of the genre in question. So very often, when people are going to look across languages, they try to balance those corpora so that they're broadly comparable in terms of the design decisions made in creating them.

But in a weaker sense of multilingual, perhaps you can look at varieties of one language. So you can contrast American English, British English and Indian English, if you like. Large collections of corpora have been produced to do exactly that: the so-called ICE family of corpora, from the International Corpus of English initiative run out of University College London, where they've tried to build corpora with roughly the same design for a whole range of varieties of English, to allow people to look at the differences between those varieties.

There are also things called parallel corpora. Again, let's think about English and Spanish. A corpus I built, called the CRATER corpus, contained English original texts and their translations into Spanish. That, of course, is rather distinct from comparing native-speaker English with native-speaker Spanish. You're looking at something that has been translated, so as well as being able to look at the language, if you like, through a distorted mirror, you can also focus on those distortions and look at what the process of translation does when you translate from English into Spanish. So if you also have a native-speaker Spanish corpus for comparison, you can do some very interesting work with this type of corpus data: looking at what the process of translation does when you convert English into Spanish, and then comparing the result with native-speaker Spanish.
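
To make that translated-versus-native comparison a little more concrete, here is a minimal Python sketch. It assumes both corpora are already tokenised into lists of lowercase words, and it normalises counts to a rate per million words so that samples of different sizes can be set side by side. The function names and the two tiny "corpora" are invented for illustration; they are not taken from CRATER or from any real corpus.

```python
from collections import Counter


def per_million(tokens, word):
    """Occurrences of `word` per million tokens (0.0 for an empty corpus)."""
    if not tokens:
        return 0.0
    return Counter(tokens)[word] * 1_000_000 / len(tokens)


def compare_rates(word, translated, native):
    """Return (translated rate, native rate, ratio) for one word.

    A ratio well above 1 hints that the translations over-use the word
    relative to native-speaker text; well below 1, that they under-use it.
    If the word never occurs in the native sample, the ratio is infinity.
    """
    t_rate = per_million(translated, word)
    n_rate = per_million(native, word)
    ratio = t_rate / n_rate if n_rate else float("inf")
    return t_rate, n_rate, ratio


# Hypothetical toy data: one "translated" and one "native" Spanish sample.
translated = "el gobierno dijo que el plan es el mejor plan".split()
native = "la casa es grande y el perro es muy bueno".split()

print(compare_rates("el", translated, native))  # (300000.0, 100000.0, 3.0)
```

With samples of ten words each the ratio means nothing; on real corpora you would want far larger samples, and usually a significance test, before reading anything into a difference.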

There's also the learner corpus. Time and again throughout this course, in some of the conversation videos and also in one of the lectures, we'll be looking at learner corpora: language data produced by people speaking a second language. Let's say, for example, you hear me speak French. It isn't pretty, so I hope you don't have to, but if you do hear me trying to speak French, you'll hear learner language. I was taught French at school. I'm a native speaker of English, and my production in this so-called L2, or second language, could be gathered together into a corpus so that you could systematically analyse, for example, the many errors I'm afraid I'm likely to make when I speak French. Very interesting, as we'll see.

Also, of course, we have historical or diachronic corpora: corpora that allow us to look at the language developing, changing, or sometimes remaining the same, over a long period of time. A good example is the Helsinki corpus, one and a half million words of text covering English from about 700 AD to 1700. You can go back through time looking at changes in the English language using a corpus like that. Very, very helpful.

Why very, very helpful? Well, put it this way: suppose you didn't agree with the corpus approach and really thought it was best just to work on the basis of intuitions, or perhaps by observing a few individuals and taking notes. I wouldn't necessarily agree with you that that's always the best way of working, but I would challenge you to use that way of working to look at, say, Early Modern English. There are no speakers of Early Modern English left. There's nobody you can observe who speaks it, and you have very few, if any, intuitions about it. What you really need to do then is look back at the record of it and study that.

Another type of corpus is the monitor corpus, which is very useful for looking at very rapid change in language: typically, new words coming into use and old words dying out quite rapidly. A good example is the Bank of English, developed at Birmingham University in the UK, which is constantly, if you like, being added to. Think about the formation of sedimentary rocks: the endless layering of mud being compressed to form rock within which there are strata.

And I think the monitor corpus is easily viewed in that way. As the language gets pressed down into the corpus, you're able to go back through its different layers, back through time, on a very fine-grained basis. This type of corpus is updated very frequently, usually almost daily, so you can see new words come into use. You can see their birth. Very, very helpful for linguists.

Now, there are many other types of corpora, but that gives you a good flavour of the types of corpus
data that you'll be hearing about on this course.
