Accessing corpora

Our corpora are available in soft copy on AFS and in hard copy from the corpus TA.  To access the corpora via either of these means, you need to agree to the conditions of use, as detailed below.

Licenses, copyright, and user agreements

In order to use any of the corpora or the software on AFS, you need to register.  By registering, you agree to inform yourself of and observe all access and use restrictions for the corpora/software, and not to copy them, or large portions of them, to non-Stanford machines.  These rules also apply to corpora that can be freely downloaded from the web.  Be aware that failing to follow these rules violates federal copyright law (in several countries, depending on the corpus).

We urge corpus users to be responsible about reporting the source of information obtained in corpus searches.  At a minimum, the corpus creator should be identified and bibliographic information on the source document should be included with any citation. 

How to register

General use agreement

In order to register, please send an email to the corpus TA containing the following information.

  • An indication that you will inform yourself of copyright restrictions and user agreements for any corpus you will use on AFS.  You can copy the following sentences into your email: "I will inform myself about any copyright restrictions that hold for the corpora on AFS.  I recognize that it is my responsibility to do so.  I will also follow all guidelines outlined by user agreements (if there are any) of any corpus I will use."
  • Your SUNetID (not your student ID number, but your user name for the Stanford network)
  • Your first and last name
  • If you need to use any of the restricted corpora, include the appropriate agreement (see list below)
  • Your departmental affiliation (and degree you're pursuing, if a student)
  • Sponsor (only if you are not within the linguistics department):  which professor or class do you need corpus access for?  Please cc your advisor or sponsor if you cite one.

As mentioned above, we understand that, by registering, you agree to the general user agreement that holds for all corpora at Stanford.

Email the Corpus TA

Corpora with special access restrictions

Some corpora require a special signed agreement from you.  Before the corpus TA can give you access to those corpora, you need to hand in the signed agreement to the corpus TA.  Corpora that need a special signed agreement have special protection on AFS: you must be added to a specific group to be able to use them.  Some corpora are subject to other kinds of limitations, for example a maximum number of simultaneously registered users.  To get access to any corpus that is subject to special regulations, contact the corpus TA and tell them which corpus you are interested in.

The restricted corpora, the group each one requires, and the user agreement or other access condition for each are listed below.

Corpus/Corpora collection | Group membership necessary | Link to user agreement
Avocado Research Email Collection | corpora-avocado | user agreement
Buckeye Corpus | corpora-buckeye | limited eligibility for access; contact corpus TA
CELEX 2 | corpora-celex | user agreement
LINK Project Switchboard Corpus | corpora-link | use must be reported; contact corpus TA
PPCME2 | corpora-ppcme2 | limited to 5 simultaneous users; contact corpus TA
Reuters Corpus | corpora-reuters | user agreement
Stanford Speed-date Corpus | corpora-speeddate | contact Dan Jurafsky for approval and forward to corpus TA
TDT Pilot Study Corpus | corpora-tdtpilot | user agreement
TIPSTER Complete | corpora-tipster | user agreement
YCOE | corpora-ycoe | user agreement