September 23rd, 2010

Stanford has been a subscribing member of the Linguistic Data Consortium for several years, so we have most of the corpora released by the LDC.

Where are the corpora?

The corpus inventory lists the location of each corpus that we own. Each corpus is in one of three places:

  • on Stanford’s AFS file system,
  • on the NLP file system, or
  • on LDC Online, available for download (this applies to smaller corpora; the Corpus TA can download these for you).

How do I access the corpora?

See the instructions for access for full details of licensing agreements and the location of the corpora.

We also maintain a list of useful tools and utilities.

The corpus TA

The Corpus TA maintains the department’s collection of corpora, installs corpora and software, and provides support to members of the linguistics department, including students taking classes in the department. If you want to register for any of the corpus-related services (e.g. AFS access), please read about getting access and then contact the Corpus TA.

If you have questions about how to start your research (e.g. how to formulate a search within a specific search language) or how to access the corpora you are interested in, we encourage you to contact the Corpus TA. Also try out our blog about corpora and corpus methods.


Natalia Silveira
natalias at stanford

Previous corpus TAs

  • Sam Bowman (2012-2013)
  • Tyler Schnoebelen (2011-2012)
  • David Clausen (2010-2011)
  • Robert Munro (2009-2010)
  • Uriel Cohen-Priva (2008-2009)
  • Anubha Kothari (2007-2008)
  • Harry Tily (2006-2007)
  • Liz Coppock (2005-2006)
  • Neal Snider (2004-2005)
  • Florian Jaeger (2003-2004)