DMtools
=======

A toolbox for efficient and flexible data mining.

Authors:
- Ole M. Nielsen
- Peter Christen
- Markus Hegland
- Timothy Hancock
- Tatiana Semenova

Contact and more information:
- URL:    http://csl.anu.edu.au/ml/dm/
- E-Mail: Ole.Nielsen@anu.edu.au
          Peter.Christen@anu.edu.au

Introduction
------------

The DMtools are developed by the Data Mining group at the Australian
National University (ANU), Canberra.

Efficient and flexible data exploration and analysis are often crucial for
the successful outcome of a data mining project. The DMtools help the data
miner handle common data mining tasks efficiently.

Written in the scripting language Python (www.python.org), the DMtools are
built on two core features:
(1) caching of time-consuming functions including, but not limited to,
    database queries, and
(2) parallelism within a collection of independent queries.

The toolbox provides a number of routines for basic data mining tasks. On
top of these the user can add further functions (mainly domain and data
collection dependent) for complex data mining tasks.

For more details see our web page at http://csl.anu.edu.au/ml/dm/ and the
scientific publication describing the toolbox,

  "A Toolbox Approach to Flexible and Efficient Data Mining",

to be presented at the 9th International Python Conference (Long Beach,
March 2001) and the 5th Pacific-Asia Conference on Knowledge Discovery and
Data Mining (PAKDD, Hong Kong, April 2001), available online at
http://csl.anu.edu.au/ml/dm/dm_publications.html

Files
-----

This distribution of the DMtools contains the following files:

- caching.py      The caching module
- database.py     The database access module
- aggregation.py  A module providing several aggregation functions
- README          This readme file
- LICENSE         A copy of the GNU General Public License

Installation
------------

- To use the DMtools, you need Python installed on your system. See the
  Python homepage (http://www.python.org) for downloading and installing.
- You also need a relational database for which a Python database module is
  available. For more information about database access within Python, have
  a look at the Python database SIG:
    http://www.python.org/topics/database/
  We are using MySQL, available from:
    http://www.mysql.com
- We use the MySQLdb.py module by Andy Dustman, see:
    http://dustman.net/andy/python/MySQLdb/
  The DMtools are tested with version 0.2.2 of MySQLdb.
- The caching.py module works best if the zlib module
  (http://www.cdrom.com/pub/infozip/zlib/) is installed.
- Both the caching.py and database.py modules have a dictionary of options
  at the beginning of the code. Please change the entries to values
  appropriate for your system, using the function set_option() in either
  module.
- The caching module contains a built-in self-test function which can be
  started with:

    >>> import caching
    >>> caching.test()

License
-------

This software is distributed under the GNU General Public License. For more
details see the LICENSE file.
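
Illustration: caching a time-consuming function
-----------------------------------------------

The caching feature named in the introduction can be illustrated with a
small, self-contained sketch. It is not the code in caching.py and does not
use that module's API. The names cached_call and slow_square and the cache
directory are hypothetical, chosen only for this illustration, and the sketch
assumes Python 3. It stores the pickled, zlib-compressed result of a function
call on disk and reuses it when the same function is called again with the
same arguments.

```python
import hashlib
import os
import pickle
import tempfile
import zlib

# Hypothetical default location for cached results (an assumption for this sketch).
CACHE_DIR = os.path.join(tempfile.gettempdir(), "dmtools_cache_sketch")


def cached_call(func, args=(), cache_dir=CACHE_DIR):
    """Return func(*args), reusing a result stored on disk when one exists."""
    os.makedirs(cache_dir, exist_ok=True)

    # Identify the cache entry by the function's name and its arguments.
    key_bytes = pickle.dumps((func.__module__, func.__qualname__, args))
    path = os.path.join(cache_dir, hashlib.sha256(key_bytes).hexdigest())

    if os.path.exists(path):
        # Cache hit: decompress and unpickle the stored result.
        with open(path, "rb") as fh:
            return pickle.loads(zlib.decompress(fh.read()))

    # Cache miss: evaluate the function, then store the compressed result.
    result = func(*args)
    with open(path, "wb") as fh:
        fh.write(zlib.compress(pickle.dumps(result)))
    return result


if __name__ == "__main__":
    demo_dir = tempfile.mkdtemp()  # fresh directory, so the demo starts uncached

    def slow_square(x):
        # Stand-in for an expensive operation such as a database query.
        return x * x

    first = cached_call(slow_square, (12,), demo_dir)   # evaluated and stored
    second = cached_call(slow_square, (12,), demo_dir)  # read back from disk
    print(first, second)
```

This sketch does not notice when the function body or the underlying data
changes, so stale entries have to be deleted by hand. Only load cache files
from a directory you trust, since unpickling untrusted data is unsafe.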