UNIVERSITY OF PENNSYLVANIA - AFRICAN STUDIES CENTER
Archive-name: text-archives/09-Aug-90
Original-posting-by: dsims@uceng.UC.EDU (david l sims)
Original-subject: Summary of Text Archive Sites (long)
Reposted-by: firstname.lastname@example.org (Edward Vielmetti)
The following article is a summary of the responses I received to my query for text archive sites. Thanks go to those who replied to me. You were most helpful.
This summary contains two articles. The first provides information on the Oxford Text Archive. The second lists text archive sites around the world (search for "Article" to skip ahead, since Article 1 is quite long).
----- Article 1 of 2-----
WHAT IS THE OXFORD TEXT ARCHIVE?
The Oxford Text Archive is a facility provided by Oxford University Computing Service. It has no connexion with Oxford University Press or any other commercial organisation and exists to serve the interests of the academic community by providing archival and dissemination facilities for electronic texts at low cost.
The Archive offers scholars long term storage and maintenance of their electronic texts free of charge. It manages non-commercial distribution of electronic texts and information about them on behalf of its depositors.
WHAT TEXTS DOES IT CONTAIN?
The Archive contains electronic versions of literary works by many major authors in Greek, Latin, English and a dozen or more other languages. It contains collections and corpora of unpublished materials prepared by field workers in linguistics. It contains electronic versions of some standard reference works. It has copies of texts and corpora prepared by individual scholars and major research projects worldwide. The total size of the Archive exceeds a gigabyte and there are about a thousand titles in its catalogue.
WHERE CAN I GET A CATALOGUE?
The Catalogue is available in paper form by post from the address below. New editions are published at least twice a year. It is also available in electronic form, either as a formatted file for display at a terminal or in a tagged form using SGML. These files are available from a number of different places under various names. For EARN or INTERNET users, the most convenient source is probably LISTSERV@BROWNVM, which makes the files available under the names OTALIST FORMAT and OTALIST SGML. If you are a JANET user you can consult the list interactively on HUMBUL, or request a copy from OXFORD.VAX (the filenames are OX$DOC:TEXTARCHIVE.LIST and OX$DOC:TEXTARCHIVE.SGML respectively). Wherever you are, you can send a note to ARCHIVE@VAX.OXFORD.AC.UK specifying which form you want.
WHAT ARE THE TEXTS LIKE?
Because the texts come from so many different sources, they are held in many different formats. The texts also vary greatly in their accuracy and in the features which have been encoded. Some have been proofread to a high standard, while others may have come straight from an optical scanner. Some have been extensively tagged with special purpose analytic codes, and others are simply designed to mimic the appearance of the printed source.
HOW USABLE ARE THE TEXTS?
Most of the texts can be used with commonly available text indexing and concordancing software, or can easily be converted for that purpose. All texts are held as `plain ASCII' files on magnetic tape, with no special formatting codes. Documentation of the coding scheme used in each text is supplied with it, wherever possible.
WHAT ABOUT COPYRIGHT?
Most of the texts in the Archive are subject to some form of copyright restriction. The Archive's obligations to its depositors generally restrict use of the texts to private study and research. In some cases, depositors have also authorised use of the texts in teaching. In all cases, users of the texts must agree not to use the texts commercially and not to redistribute copies of them without consultation.
HOW DO I ACCESS THE TEXTS?
If you are a registered user of Oxford University Computing Service (i.e. you have an account on OXFORD.VAX), just send an e-mail message to the username ARCHIVE specifying which texts you want to use and for what purpose. Copies of texts in categories U and A only are also available to scholars elsewhere, as described further below. Copies of texts are usually distributed on magnetic tape or cartridge, though smaller texts can be sent on diskette or (within the UK only) over JANET. There is a distribution charge to cover media and postage which *must* be paid in advance.
WHAT DO THE CODES IN THE CATALOGUE MEAN?
Each title in the list is preceded by a code made up of a single letter indicating the availability of the text (U, A or X), in some cases followed by a star; a number identifying the text; and another single letter which gives some idea of the size of the text.
Availability codes:
    X   Available only to registered OUCS users. May not be copied.
    U   Freely available for scholarly use in private research.
    U*  Freely available for scholarly use in private research and also for teaching purposes.
    A   Available for scholarly use, but only with written authorisation from the depositor.
Size codes:
    A   Size less than 512 Kb
    B   Size between 512 Kb and 1 Mb
    C   Size between 1 and 2 Mb
    D   Size between 2 and 5 Mb
    E   Size greater than 5 Mb
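As an illustration only, a catalogue code of the shape described above can be split into its parts with a short Python sketch. The text does not say how the parts are separated in the printed catalogue, so the pattern below assumes optional spaces or hyphens between them. The names `parse_code` and `CatalogueCode` and the sample codes are hypothetical and do not come from the Archive.

```python
import re
from typing import NamedTuple

# Assumed layout: availability letter (U, A or X), optional star, text number,
# size letter (A-E). The separators (spaces or hyphens) are a guess.
CODE = re.compile(r"^([UAX])(\*?)[\s-]*(\d+)[\s-]*([A-E])$")


class CatalogueCode(NamedTuple):
    availability: str  # "U", "A" or "X"
    teaching: bool     # True when the availability letter carries a star
    number: int        # number identifying the text
    size: str          # size code "A" to "E"


def parse_code(code: str) -> CatalogueCode:
    """Split a catalogue code into availability, star, number and size."""
    match = CODE.match(code.strip().upper())
    if match is None:
        raise ValueError(f"not a recognisable catalogue code: {code!r}")
    availability, star, number, size = match.groups()
    return CatalogueCode(availability, star == "*", int(number), size)


if __name__ == "__main__":
    # Made-up code: freely available, usable for teaching, text 123, size B.
    print(parse_code("U*-123-B"))
```

A code whose availability letter is X would parse in the same way, but such texts may not be copied, so an ordering script would want to reject them before going any further.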
Depending on format, a standard 600 foot magnetic tape will hold up to 50 texts of size category A. Most texts of size code A will fit on a standard double density floppy diskette; any text of size code A or B will fit on a standard high density diskette.
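The rule of thumb above can be written down as a small lookup. This is a sketch rather than an official Archive tool: the function name `media_for` and the media labels are hypothetical, and the double density entry is marked as "most texts" because the paragraph above does not promise that every size A text will fit.

```python
def media_for(size_code):
    """List the media a single text of the given size code could be sent on.

    Tape and cartridge are always listed, since they are the usual
    distribution media; diskettes are added only for the smaller codes.
    """
    code = size_code.strip().upper()
    if code not in {"A", "B", "C", "D", "E"}:
        raise ValueError(f"unknown size code: {size_code!r}")

    media = ["tape", "cartridge"]
    if code in ("A", "B"):
        # Any text of size code A or B fits on a high density diskette.
        media.insert(0, "HD diskette")
    if code == "A":
        # Most, but not necessarily all, size A texts fit on double density.
        media.insert(0, "DD diskette (most texts)")
    return media


if __name__ == "__main__":
    for size in "ABCDE":
        print(size, media_for(size))
```

The lookup covers one text at a time; an order for several texts would need their sizes added together before a medium is chosen.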
WHAT DO I DO TO ORDER A COPY OF A TEXT?
For texts with availability code U, the only authorisation needed is your signature on the Order Form. For A category texts, you must also provide written authorisation from the depositor of the text; you should therefore ask us for depositor details before ordering. All orders must be prepaid to the account of Oxford University Computing Service, in sterling or in US dollars. We cannot issue invoices, and any orders which are not prepaid or not submitted on the standard order form will be ignored.
======================================================================
Oxford Text Archive       email  ARCHIVE @ UK.Ac.Oxford.VAX
OUCS, 13 Banbury Road     voice  +44 (865) 273 238
Oxford OX2 6NN, UK        fax    +44 (865) 273 275
OXFORD TEXT ARCHIVE
*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*
* Hardcopy of this form must be returned, duly completed, to  *
*     Oxford Text Archive                                     *
*     13 Banbury Road                                         *
*     Oxford OX2 6NN                                          *
*     UK                                                      *
* NB The whole of this document must be returned IN HARDCOPY  *
* All relevant parts of the form must be completed            *
* Payment must accompany the order                            *
* Forms returned electronically will be ignored               *
*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*
SECTION ONE: User Declaration
In consideration of The Oxford Archive agreeing to supply me certain texts in machine-readable form together with supporting documentation as listed in Part Two below, hereinafter called 'the texts', I hereby undertake:-
(1) To use the texts for purposes of private scholarly research only and not for profit (this shall not preclude the publication in a scholarly context of analyses or interpretations derived from the texts). To use and make available to others for educational purposes only texts specifically designated as `available for teaching purposes'.
(2) To acknowledge in any work, published or unpublished, based in whole or in part on analyses made of the texts both the original depositors and the Archive.
(3) Not to copy in whole or in part the text, except insofar as this may be necessary for security purposes or for my own personal use. Not to distribute the text to third parties, nor to publish or reproduce it in any way, except for teaching purposes, where so permitted. Copyright of all machine-readable texts issued by the Archive is reserved to the Depositors.
(4) To give access to the text only to persons directly associated with me or working under my control and to require of such persons signed undertakings neither to use the text except in connexion with my academic purposes nor to give access to the text to others; these signed undertakings to be made available to the Archive on request.
(5) Not to hold the Archive liable for any errors of transcription which may be found in the texts, but to notify the Archive of such errors wherever possible.
(6) To pay such charges as the Archive may determine from time to time to cover the cost of supplying the texts.
NOTE: Only texts with an availability code of U or A may be ordered. Texts with Availability Code of A may be included in this list only if authorisation from the depositor accompanies this form. Depositor details are available on demand.
Texts may be supplied on Magnetic Tape, Diskette or Data Cartridge. Pricing is different for each format. Use tape if you are ordering more than a megabyte or so of data.
Texts required on TAPE:
    Tape density:  1600 or 6250
    Tape format:   ASCII or EBCDIC
                   Labelled or Unlabelled
                   Fixed or Variable length
Texts required on DISKETTE:
    DD (360/720 Kb) or HD (1.2/1.4 Mb)
    MS/DOS or Macintosh
    3.5" or 5.25"
*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*
* For information, contact:                                   *
*     Oxford Text Archive                                     *
*     13 Banbury Road                                         *
*     Oxford OX2 6NN                                          *
*     UK                                                      *
* NB The whole of this document must be returned IN HARDCOPY  *
* All relevant parts of the form must be completed            *
* Payment must accompany the order                            *
* Forms returned electronically will be ignored               *
*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*
----- Article 2 of 2-----
For the past year, the Georgetown Center for Text and Technology has been gathering information about archives and projects in electronic text throughout the world. Listed below -- in alphabetical order by country and city -- are the titles of over 270 projects, brief descriptions of their contents, and the names and addresses of contact persons.
Our list is certainly not complete, and we invite members of this bulletin board who know of other projects (or corrections to the current catalog) to bring them to our attention.
Further information about specific projects -- on such topics as time period, languages encoded, intended use, file formats, means of access, and sources -- can be obtained by writing to the address below. The entire file, however, is under constant revision and has not been edited for distribution.
Michael Neuman, Director
Georgetown Center for Text and Technology
238 Reiss Science Building
Washington, DC 20057
(202) 687-6096
neuman@guvax
email@example.com
========================================================================
Examples from the "List of Archives and Projects in Machine-Readable Text, April 2, 1990"
*NOTE*: For the full list, please contact M. Neuman at the addresses listed below.
(Academy of Finland)/ CNA = Neo-Assyrian Text Corpus Project
    Textbank/database of all texts of Neo-Assyrian Empire
    Robert M. Whiting, Managing Editor; Simo Parpola, Director
    CNA/Neo-Assyrian Text Corpus Project
    Dept. of Asian and African Studies
    University of Helsinki
    Fabianinkatu 24 A 226, SF-00100 Helsinki
    Finland
    tel. +358 0 191 3289 (Whiting); +358 0 191 2093 (Parpola)
    BITNET: whiting@finuh parpola@finuh
    Internet: firstname.lastname@example.org email@example.com
Birmingham (Univ)/ Egyptian Daily Press Textbase
    Textbank of 300,000 words
    Adnan al-Jubouri
    Birmingham University (Aston Triangle)
CA Davis (Univ CA)/ Project Rhetor
    Textfiles of approx. 5000 authors and 15,000-18,000 works in 12 languages
    James J. Murphy, Director
    Project Rhetor
    Rhetoric Department
    University of California at Davis
    Davis, CA 95616
    tel. (916) 752-0813
IL Chicago (Univ)/ ARTFL = American and French Research on the Treasury of the French Language
    Textbase of 150 million words from 1700 works in 2000 texts (classic literature to non-fiction prose and technical writing)
    The ARTFL Project
    Prof. Robert Morrissey, Director
    Department of Romance Languages and Literatures
    University of Chicago
    1050 E. 59th St.
    Chicago, IL 60637
    tel. (312) 702-8488
    BITNET: xrtmjm9@uchimvs1 or firstname.lastname@example.org
    ARPA/Internet: MJM9@SPHINX.UCHICAGO.EDU
    or
    Institut National de la Langue Francaise
    Centre National de la Recherche Scientifique
    52 Boulevard de Magenta
    75010 Paris
    France
IL Chicago (Univ)/ Ethiopic Etymological Database
    Gene Gragg
    Internet: email@example.com
Lille (Univ)/ CREDO = Centre de Recherches sur la Documentation et l'information: Cultures et religions antiques
    Bibliographic database
    Gerard Losfeld, Director
    BITNET: losfeld@frcitl71
London (Univ)/ School of Oriental and African Studies
London (Islamic Computing Centre)/ al-Hadith and Al-Qur'an
    Textbank for Islamic studies
    A. Barkatulla
    Islamic Computing Centre
    72 St. Thomas Road
    London N4 2QJ
    tel. 01-359-6233
    (Insufficient address -- letter returned)
Lyon (Univ, INaLF-CNRS)/ URL 6 = Groupe d'Études Lexicologiques et Lexicographiques des XVI et XVII Siècles
    Jacques Abelard
    Université de Lyon 2
    86 rue Pasteur
    69365 Lyon CEDEX 7
Nijmegen (Univ)/ Nijmegen Arabic Corpus
    Textbank of 1 million words in Modern Standard Arabic for study of linguistics
    Everhard Ditters
    T.C.M.O.
    University of Nijmegen
    PO Box 9103
    6500 HD Nijmegen
    The Netherlands
    tel. (NL)-080-512996
    BITNET: u279300@hnykun11
PA Philadelphia (Univ PA)/ CATSS = Computer Assisted Tools for Septuagint Studies
    Textbank and database with text-critical, lexical, grammatical, translational, conceptual, and bibliographical tools
    Dr. Robert A. Kraft, Co-director
    CATSS
    Department of Religious Studies
    Box 36 College Hall
    University of Pennsylvania
    Philadelphia, PA 19104-6303
    tel. (215) 898-5827
    BITNET: kraft@penndrls
Riyadh (King Saud University)/ Textbank of Qur'ān and Ḥadīth
    Computer Information Center or Dept. of Religious Studies
    King Saud University
    P.O. Box 2454
    Riyadh, Saudi Arabia 11451
TX Dallas (Summer Institute of Linguistics)/
    Textbank of minority language groups
    S.I.L. Inc.
    7500 West Camp Wisdom Road
    Dallas, Texas 75236
Compiled and Maintained by:
The Georgetown Center for Text and Technology
238 Reiss Science Building
Washington, DC 20057