Compiling CWB from source under Windows
Compilation and installation of CWB version 3.0
Status of the Cygwin port
The Cygwin port of the CWB is experimental. While the source code compiles without error messages and basic CQP queries work well, it is currently not possible to process large corpora (e.g. in the 100M word range).
The problems we have encountered may be due to limitations in the virtual memory management of Windows and the Cygwin emulation layer. Apparently, a user process is limited to a 2 GB address space in Windows, and Cygwin seems to impose further restrictions. Help from someone with more experience in Windows and Cygwin programming would be highly welcome, so that we can identify the true cause of the problems and try to find a workaround.
In the long term, we hope to offer a native Windows port of the CWB. Please join the CWBdev mailing list if you would be interested in working on this port.
Prerequisites
In order to compile and run the CWB tools under Windows, you need the Cygwin environment. A standard Cygwin installation will suffice, plus the following packages (you will find them in the Devel section of the Cygwin setup program):
bison
flex
gcc
libncurses-devel
make
perl
We recommend installing the simple text editor nano (from the Editors section) for editing configuration files, but you can also use your favourite Windows text editor.
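Before going further, it can be useful to confirm that the command-line tools from the package list are actually available in the Cygwin shell. The following is a minimal sketch, not part of the CWB distribution: missing_tools is a hypothetical helper name, and the check only covers packages that provide a command of the same name (libncurses-devel is a library and is not detected this way).

```shell
# Hypothetical helper: print the name of every given command
# that cannot be found on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Check the tools named in the package list above.
missing=$(missing_tools bison flex gcc make perl)
if [ -n "$missing" ]; then
    echo "Missing tools:" $missing >&2
fi
```

If the script prints nothing, all five tools were found; otherwise, install the listed packages with the Cygwin setup program.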
Getting the source code
Get the source code from here, and unpack it:
tar xf cwb-XXXXXX.tgz (current version 2.2.b99-RC1)
Enter the new directory:
cd cwb-XXXXXX (current version 2.2.b99-RC1)
Important note: In principle, you can unpack the CWB source code anywhere you like, but don't put it on a network drive (we've encountered some weird errors there) and make sure that the directory path does not contain blanks (which will happen e.g. if you put the source code on your Desktop). The best solution is probably to keep the source code somewhere in the Cygwin directory tree, e.g. your Cygwin home directory.
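The requirement that the directory path contain no blanks can be checked from the shell before compiling. This is only a sketch under the assumption that you run it from inside the unpacked source directory; path_has_blanks is a hypothetical helper name, and the check does not detect network drives.

```shell
# Hypothetical helper: succeed (exit status 0) if the given path
# contains at least one blank.
path_has_blanks() {
    case "$1" in
        *" "*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Warn if the current directory would cause trouble for the build.
if path_has_blanks "$PWD"; then
    echo "Warning: '$PWD' contains blanks; move the CWB source elsewhere." >&2
fi
```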
Edit the makefile
First you need to set a few parameters in config.mk using your favourite text editor (but not Microsoft Word!). If you have installed the nano package as recommended above, just type the following command:
nano -w config.mk
Otherwise, navigate to the cwb-XXXXXX directory in Windows Explorer and open the file config.mk with a text editor.
In the platform directive, insert cygwin
#
# PLATFORM-SPECIFIC CONFIGURATION (OS and CPU type)
#
# Pre-defined platform configuration files:
#       unix           standard Unix configuration [must set ENDIAN manually!]
#       linux          i386-Linux (generic)
#       linux-64       - configuration for 64-bit CPUs
#       linux-opteron  - with optimimzation for AMD Opteron processor
#       darwin         Mac OS X / Darwin [use one of the CPU-specific entries below]
#       darwin-g4      - with optimization for PowerPC G4 processor
#       darwin-g5      - with optimization for PowerPC G5 processor
#       darwin-i386    - configuration for i386-compatible processors
#       darwin-64      - 64-bit build on Intel Core2 and newer processors
#       darwin-core2   - optimised build for Core 2 CPU (requires Xcode 3.1)
#       solaris        SUN Solaris 8 for SPARC CPU
#       cygwin         Win32 build using Cygwin emulation layer (experimental)
#
include $(TOP)/config/platform/cygwin
The site directive also has to be changed to cygwin
#
# SITE-SPECIFIC CONFIGURATION (installation path and other local settings)
#
# Pre-defined site configuration files:
#       standard         standard configuration (installation in /usr/local tree)
#       classic          "classic" configuration (CWB v2.2, uses /corpora/c1/registry)
#       osx-fink         Mac OS X installation in Fink's /sw tree
#       binary-release   Build binary package for release (static if possible, local install in build/ tree)
#       osx-release      ... for Mac OS X
#       linux-release    ... for i386 Linux
#       solaris-release  ... for SUN Solaris 2.x
#       linux-rpm        ... build binary RPM package on Linux (together with rpm-linux.spec)
#       cygwin           Win32 / Cygwin configuration (experimental)
#
include $(TOP)/config/site/cygwin
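If you prefer not to edit the file by hand, the two include lines can also be rewritten with sed. The following is a minimal sketch rather than an official CWB script: set_cygwin_config is a hypothetical helper name, and it assumes that config.mk contains exactly one active include line for the platform and one for the site, each starting at the beginning of a line as shown above.

```shell
# Hypothetical helper: point the platform and site include directives
# of a config.mk file at the cygwin configuration files.
set_cygwin_config() {
    config_file=${1:-config.mk}
    # Keep a backup so the original settings can be restored.
    cp "$config_file" "$config_file.bak" || return 1
    # Single quotes stop the shell from expanding $(TOP).
    sed -e 's|^include \$(TOP)/config/platform/.*|include $(TOP)/config/platform/cygwin|' \
        -e 's|^include \$(TOP)/config/site/.*|include $(TOP)/config/site/cygwin|' \
        "$config_file.bak" > "$config_file"
}
```

Calling set_cygwin_config without arguments in the cwb-XXXXXX directory modifies config.mk and leaves the previous version in config.mk.bak. Comment lines are not touched, because they start with # rather than include.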
Compilation
The easiest way to compile the CWB is to type
make all
at the command line and go fetch a cup of coffee (due to the overhead of the Cygwin emulation layer, compilation is much slower than on Unix systems).
Since the Cygwin port is still experimental, it is probably a good idea to compile each component of the CWB separately. This will make it easier to recognise compilation errors and warnings. First, clean up any old files and check dependencies:
make clean
make depend
Then, compile the editline library used by CQP, which is included in the CWB source code distribution:
make editline
Now compile the corpus library:
make cl
Then the utilities:
make utils
And finally CQP:
make cqp
You may also want to check that the manpages are up to date:
make man
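The step-by-step build above can be wrapped in a small shell function that stops at the first failing component and keeps a separate log for each step, which makes errors and warnings easier to locate. This is a sketch only: build_components and the build-TARGET.log file names are hypothetical, while the make targets are the ones listed above.

```shell
# Hypothetical helper: run the make targets one by one, writing the output
# of each step to its own log file and stopping at the first failure.
# The optional first argument names the make command to use (default: make).
build_components() {
    make_cmd=${1:-make}
    for target in clean depend editline cl utils cqp man; do
        echo "== make $target =="
        if ! "$make_cmd" "$target" > "build-$target.log" 2>&1; then
            echo "make $target failed; see build-$target.log" >&2
            return 1
        fi
    done
}
```

Run build_components from the top-level source directory after editing config.mk. If a step fails, the corresponding log file contains the compiler messages for that component only.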
Installation
Now we're ready to install the whole toolkit:
make install
If you have set up Cygwin with a separate administrator account, you may need to type sudo make install and enter the administrator password here.