What do we mean when we talk about web corpora and how are they built? DiLCo Methods Day 2022 (7 October): Digital language variation in context (DiLCo)
Regionales Rechenzentrum der Universität Hamburg / MCC / Lecture2Go
Publisher/Corporate body:
Universität Hamburg
Year of publication:
2022
Media type:
Audiovisual
Keywords:
DiLCo
NLP
corpus linguistics
social media
data collection
Language, Literature, Media (SLM I + II)
Description:
Using texts from the web to observe language seems simple, but methodological issues are inevitable, so the data collection phase can become a project in itself. After a brief history of web corpus linguistics, corpus-building methods will be reviewed, from the major data sources and their quirks to concrete steps for discovering and processing web page content, including the example of blogs and blog comments.
DiLCo Methods Day 2022: Natural language processing for digital language
DiLCo organised a "Methods Day" on the computational and quantitative analysis of born-digital language. The workshop targets linguists as well as other students and researchers from the humanities and beyond who want to broaden their methodological skills. Three lectures will introduce current techniques for meaning representation and for social media data collection and analysis.