README.TXT for COREX version 4
April 26 2002

This is release 4 of COREX, the exploitation software provided for CGN
release 5.

---------------------------------------------------------------------------
---------------------------------------------------------------------------
1. New in this release:
---------------------------------------------------------------------------

- Metadata search. You can select subcorpora by specifying metadata
  constraints. Much more metadata is available than in the previous
  release, but not all available metadata fields have been provided yet.
  This will be remedied in the future.
- Viewing and searching of phonetic tiers.
- Iterative search. You can use the results of a previous content or
  metadata search as the start set for a new query.
- Printing of the display of the COREX viewer.
- Saving content search results in a file for further processing.
- Support for Mac OS X.

For those curious about the metadata vocabulary used, we refer to the ISLE
Metadata Initiative at http://www.mpi.nl/ISLE.

For the MPI COREX team:
Daan Broeder, Max-Planck Institute for Psycholinguistics
http://www.mpi.nl/COREX

---------------------------------------------------------------------------
---------------------------------------------------------------------------
2. Annotation files in XML format
---------------------------------------------------------------------------

This CD also includes all the available CGN annotation files in XML format.
These files are no longer available on the other CGN CDs. They are stored
in compressed form and you can uncompress them using the gzip programme.
A Windows version of gzip is included.

The annotation files can be found in the directories

    Corpora/cgn/annot/*/pri
    Corpora/cgn/annot/*/tag
    Corpora/cgn/annot/*/skp
    Corpora/cgn/annot/*/bpt
    Corpora/cgn/annot/*/lnx

It is also possible to use the COREX software to extract and uncompress
individual annotation files and copy them to another directory.
See the manual.

---------------------------------------------------------------------------
---------------------------------------------------------------------------
3. CD Demo function
---------------------------------------------------------------------------

Under Windows/* this CD can be used to give a demonstration of COREX
without installing the software and data to hard disk. You will find a file
CorexDemo.bat that will start up COREX. In this mode you can access only
the single audio file provided on this CD: "fn000001.wav".

---------------------------------------------------------------------------
---------------------------------------------------------------------------
4. INSTALLATION:
---------------------------------------------------------------------------

This is version 4 of the COREX exploitation software for the CGN corpus
release 4.

This software was tested on Windows 2000, Windows NT, Windows 98,
Sun Solaris (SPARC), Linux and Mac OS X.

For Windows platforms the installation script copies a complete Java
environment to the local computer, in order to be independent of local
Java installations and to avoid version mismatches with them.

You need about 600 MB of free disk space for the COREX programme
environment and all the CGN annotation files, which are also installed by
the install script. The annotation files need to be installed on your
computer to leave the CD drive free for loading the CDs with audio files.

Although the total number of text fragments is now near 10000, we managed
to put the whole COREX release on one CD. We did this by compressing all
metadata description files and most annotation files. If, however, you need
access to the transcription files themselves, as is possible from COREX,
you can decompress them. The COREX software will work with both compressed
and uncompressed files. The special ".sea" and ".syn" files should remain
uncompressed!

You do not need to remove any older COREX installations if you have enough
disk space available.
But this release should be installed in its own directory (the install
scripts do so by default).

The install scripts copy lots of data, so be patient! If the install
process breaks off before finishing for whatever reason, you should remove
the install directory before trying again!

To install for Windows platforms:

- Run the file install-win.bat. This will create a directory C:\COREX4 by
  default and copy all necessary files to this directory. If you want to
  install the software in another directory, for instance in D:\MYDIR,
  type:

      install-win.bat D:\MYDIR

  The install script will also create a bat file "corex.bat". This bat
  file starts up the COREX programme.
- Create a desktop icon that points to the file C:\COREX\corex.bat if you
  like.

(Note) Installing on Windows 98 might produce an error at the end of the
procedure. This error can be ignored; the programme has still been
installed correctly.

To install for Unix platforms (only tested on Linux & Solaris):

- Run the install-ux.pl perl script with the desired installation
  directory as its argument. For instance, to install in /usr/local:

      perl install-ux.pl /usr/local/

  The install script will create a command file "corex" that, when
  executed, will start up the COREX programme.

Both the java and perl executables are expected to be in your path!

Some jars will be copied to the installation directory. However, they
might not be compatible with your version of Java. In that case you will
have to experiment a bit with other versions!

Linux configuration:
    SuSE-Linux 7.1
    Java 1.3.0_2
    Perl 5.6.0
    xerces 1.4.2
    jmf 2.1

Sparc Solaris configuration:
    SunOS 5.8
    Java 1.3.1
    Perl 5.005
    xerces 1.4.2
    jmf 2.1

COREX assumes that the cdrom mount point is "/cdrom/cdrom0" for Solaris and
"/cdrom" for Linux. These are the defaults. If you have another mount point
you can modify the "corex" startup file by adding the parameter

    -DBrowser.cdRoot="my-cd-mount-point"

to the java command string.
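As an illustration of the mount point parameter described above, the
following shell sketch prints the option to add for a given mount point.
The function name corex_cd_opt is hypothetical and is not part of the COREX
distribution. The defaults are the ones stated in this README, and
/mnt/cdrom is only an example path.

```shell
#!/bin/sh
# Sketch only: build the extra java option for the CD mount point.
# corex_cd_opt is a hypothetical helper, not part of the COREX distribution.
corex_cd_opt() {
    # Use the mount point given as the first argument, if any.
    if [ -n "${1:-}" ]; then
        root="$1"
    elif [ "$(uname -s)" = "SunOS" ]; then
        root="/cdrom/cdrom0"    # Solaris default named in this README
    else
        root="/cdrom"           # Linux default named in this README
    fi
    printf '%s\n' "-DBrowser.cdRoot=$root"
}

# Example: print the option for a CD mounted at /mnt/cdrom
corex_cd_opt /mnt/cdrom
```

The printed option can then be added by hand to the java command string in
the "corex" startup file; the rest of that command should stay as the
install script wrote it.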
To install for Mac OS X:

- Run the install-macosx.pl perl script with the desired installation
  directory as its argument. For instance, to install in /users/myname:

      perl install-macosx.pl /users/myname/

  The install script will create a command file "corex" that, when
  executed, will start up the COREX programme.

Both the java and perl executables are expected to be in your path!

Some jars will be copied to the installation directory. However, they
might not be compatible with your version of Java. In that case you will
have to experiment a bit with other versions! We tested this release with
Java version 1.3.1.

---------------------------------------------------------------------------
---------------------------------------------------------------------------
5. REQUIRED SOFTWARE:
---------------------------------------------------------------------------

For the Windows platforms all software is included and will be installed,
except for other programmes such as Praat and Portray that can serve as
extensions for COREX. For communication between COREX and Praat the
sendpraat programme is also needed. The Windows version of sendpraat is
included. For Portray you need to put the appropriate tcl/tk scripts in the
installation directory.

For the Unix platforms you need to supply Java, JMF and Perl yourself.

For a list of the required versions, see our website
http://www.mpi.nl/COREX

---------------------------------------------------------------------------
---------------------------------------------------------------------------
6. REQUIRED HARDWARE:
---------------------------------------------------------------------------

A Pentium II 400 with 128 MB is the very least you should have. You may try
less powerful platforms, but be prepared to be patient.

---------------------------------------------------------------------------
---------------------------------------------------------------------------
7.
AVAILABLE EXTENSIONS
---------------------------------------------------------------------------

Using the COREX programme together with the "Praat" programme imposes
higher speed and memory requirements. Be sure to keep the list of data
objects within Praat to a minimum.

Using the COREX programme together with "Portray" (see the documentation on
the CGN annotation CDs) allows you to view the CGN syntax annotation files.
You need to provide the Portray tcl/tk scripts yourself and put them in
your installation directory.

---------------------------------------------------------------------------
---------------------------------------------------------------------------
8. NETWORK ACCESS.
---------------------------------------------------------------------------

The COREX programme does not need network access. If, however, your machine
has network access and you remove some of the configuration files, the
COREX programme will automatically try to load these files from the
network.

---------------------------------------------------------------------------
---------------------------------------------------------------------------
10. KNOWN BUGS:
---------------------------------------------------------------------------

1. Under Linux some of the panels sometimes come up with minimal size. You
   can simply resize them by hand.

USER MANUAL:

There is a (short) user manual: corexman.doc. Updates will be placed on
http://www.mpi.nl/COREX

FURTHER INFORMATION:

For the latest news on "known bugs", "installation problems" and "software
updates", see the website: http://www.mpi.nl/COREX

BUGREPORTS:

Reports on bugs not mentioned here or at http://www.mpi.nl/COREX can be
sent to: cgn@let.kun.nl
Be sure to write "COREX" in the subject field.

---------------------------------------------------------------------------
---------------------------------------------------------------------------
RIGHTS & RESTRICTIONS & GUARANTEES:
---------------------------------------------------------------------------

This software version is for use with CGN annotation & audio data. All
other use is unsupported. All commercial use is prohibited.

This version is delivered as such and no claims are made concerning the
functionality of the software, except that we will try to remove bugs and
provide a functioning version, where the term "functioning" will be defined
in consultation with the proper CGN management.

(C) Max Planck Institute for Psycholinguistics
Wundtlaan 1
6525 XD Nijmegen