Accessing corpora
Our corpora are available in soft copy on AFS and in hard copy from the corpus TA. To access the corpora via either of these means, you need to agree to the conditions of use, as detailed below.
Licenses, copyright, and user agreements
In order to use any of the corpora or the software on AFS, you need to register. By registering, you agree to inform yourself of and observe all access and use restrictions for the corpora/software, and not to copy them, or large portions of them, to non-Stanford machines. These rules also apply to corpora that can be freely downloaded from the web. Be aware that failing to follow these rules violates federal copyright law (in several countries, depending on the corpus).
We urge corpus users to be responsible about reporting the source of information obtained in corpus searches. At a minimum, the corpus creator should be identified and bibliographic information on the source document should be included with any citation.
How to register
General use agreement
To register, please send an email to the corpus TA containing the following information:
- An indication that you will inform yourself of copyright restrictions and user agreements for any corpus you will use on AFS. You can copy the following sentences into your email: "I will inform myself about any copyright restrictions that hold for the corpora on AFS. I recognize that it is my responsibility to do so. I will also follow all guidelines outlined by user agreements (if there are any) of any corpus I will use."
- Your SUNetID (not your student ID number, but your user name for the Stanford network)
- Your first and last name
- If you need to use any of the restricted corpora, include the appropriate agreement (see list below)
- Your departmental affiliation (and degree you're pursuing, if a student)
- Sponsor (only if you are not within the linguistics department): which professor or class do you need corpus access for? If you cite a sponsor, please cc your advisor/sponsor.
As mentioned above, we understand that, by registering, you agree to the general user agreement that holds for all corpora at Stanford.
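The checklist above can be assembled into an email draft. The following is only a minimal sketch: the function name, parameter names, and field labels are hypothetical and not a required format, and only the quoted agreement sentences are taken from the list above. It builds the message text and does not send anything.

```python
# Hypothetical helper: builds the body of a registration email from the
# items listed above. All names and labels here are illustrative assumptions.
AGREEMENT = (
    "I will inform myself about any copyright restrictions that hold for "
    "the corpora on AFS. I recognize that it is my responsibility to do so. "
    "I will also follow all guidelines outlined by user agreements "
    "(if there are any) of any corpus I will use."
)


def registration_email(sunet_id, name, affiliation, sponsor=None, restricted=()):
    """Return the email body as a single string."""
    lines = [
        AGREEMENT,
        "",
        f"SUNetID: {sunet_id}",
        f"Name: {name}",
        f"Affiliation: {affiliation}",
    ]
    # The sponsor is only needed for users outside the linguistics department.
    if sponsor:
        lines.append(f"Sponsor: {sponsor}")
    # Restricted corpora additionally require their own agreement.
    if restricted:
        lines.append("Restricted corpora requested: " + ", ".join(restricted))
    return "\n".join(lines)


if __name__ == "__main__":
    print(registration_email("jdoe", "Jane Doe", "Linguistics, PhD",
                             restricted=["CELEX 2"]))
```

The draft still has to be sent to the corpus TA, with the sponsor cc'd where one is cited and any signed agreements handed in separately.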
Corpora with special access restrictions
Some corpora require a special signed agreement from you. Before the corpus TA can give you access to those corpora, you need to hand in the signed agreement to the corpus TA. Corpora that need a special signed agreement have special protection on AFS: you must be added to a specific group to be able to use them. Some corpora are subject to other kinds of limitations, for example a maximum number of simultaneously registered users. To get access to any corpus that is subject to special regulations, contact the corpus TA and tell them which corpus you are interested in.
The restricted corpora, the AFS groups that control access to them, and their user agreements or other access conditions are listed below.
Corpus or corpus collection | Required group membership | User agreement or access condition |
---|---|---|
Avocado Research Email Collection | corpora-avocado | user agreement |
Buckeye Corpus | corpora-buckeye | limited eligibility for access; contact corpus TA |
CELEX 2 | corpora-celex | user agreement |
LINK Project Switchboard Corpus | corpora-link | use must be reported; contact corpus TA |
PPCME2 | corpora-ppcme2 | limited to 5 simultaneous users; contact corpus TA |
Reuters Corpus | corpora-reuters | user agreement |
TDT Pilot Study Corpus | corpora-tdtpilot | user agreement |
TIPSTER Complete | corpora-tipster | user agreement |
YCOE | corpora-ycoe | user agreement |