DMtools
One of our research projects is the development of DMtools, an
efficient and flexible data mining toolbox that handles common
data mining tasks and is based on the scripting language Python.
Two core features of the toolbox are caching of database queries
and parallelism within a collection of independent queries.
The toolbox provides a number of routines for basic data mining
tasks. On top of these, the user can add further functions, mainly
domain- and data-collection-dependent ones, for complex and
time-consuming data mining tasks.
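
The caching idea can be illustrated with a minimal sketch. This is not the DMtools caching module or its interface. The names cached_call, slow_query and CACHE_DIR are hypothetical, and the sketch assumes that a query can be wrapped in an ordinary Python function whose arguments can be pickled. Results are stored on disk, keyed by the function name and its arguments, so that a repeated call is answered from the stored file.

```python
import hashlib
import os
import pickle
import tempfile

# Hypothetical cache location; a real setup would use a persistent directory.
CACHE_DIR = tempfile.mkdtemp(prefix="query_cache_")


def cached_call(func, args=(), cache_dir=CACHE_DIR):
    """Return func(*args), reusing a result stored on disk if one exists."""
    # Key the cache entry on the function's module, name and arguments.
    key_bytes = pickle.dumps((func.__module__, func.__name__, args))
    key = hashlib.sha1(key_bytes).hexdigest()
    path = os.path.join(cache_dir, key + ".pkl")

    if os.path.exists(path):
        # Cache hit: load the stored result instead of re-evaluating.
        with open(path, "rb") as fh:
            return pickle.load(fh)

    # Cache miss: evaluate the function and store the result.
    result = func(*args)
    with open(path, "wb") as fh:
        pickle.dump(result, fh)
    return result


calls = []  # records real evaluations, for illustration only


def slow_query(table, column):
    """Stand-in for an expensive database query (hypothetical)."""
    calls.append((table, column))
    return {"table": table, "column": column, "count": 42}


first = cached_call(slow_query, ("patients", "age"))
second = cached_call(slow_query, ("patients", "age"))  # answered from the cache
```

The sketch does not notice when the function's code or the underlying data changes, so stale entries would have to be cleared by hand.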
The DMtools have been presented at two data mining
conferences:
- DMtools - Open Source Software for Database Mining
Peter Christen, Ole M. Nielsen and Markus Hegland
Workshop on Database Support for KDD (at the PKDD'2001 Conference),
Freiburg, Germany, September 2001. (Workshop proceedings online)
Paper (ps.gz, 81 KB)
Slides (ps.gz, 284 KB)
Slides (pdf.gz, 435 KB)
- A Toolbox Approach to Flexible and Efficient Data Mining
Ole M. Nielsen, Peter Christen, Markus Hegland, Tatiana Semenova
and Timothy Hancock
PAKDD-2001 Conference, Hong Kong, April 2001.
Published in the Springer Lecture Notes in Computer Science,
Artificial Intelligence series, LNAI 2035.
Copyright for this publication is held by Springer-Verlag.
Paper (ps.gz, 157 KB)
Paper (pdf.gz, 166 KB)
Currently, three modules of the DMtools are available for download
under the GNU General Public License (dmtools.tgz contains all
files as a compressed archive):
- caching.py
The caching module, which provides supervised caching for any
Python function.
- database.py
The database module contains a number of functions to access a
MySQL database and perform various standard queries in a flexible
and efficient way. If supported by the database architecture,
queries are run in parallel.
- aggregation.py
This module contains some functions for efficient data
aggregation. More functions are under development and will be
added later.
- README
Gives more information on how to install and run the DMtools.
- LICENSE
A copy of the GNU General Public License.
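
As a rough illustration of running a collection of independent queries in parallel, the following sketch uses a thread pool from the Python standard library. It does not show the interface of database.py. The functions run_query and run_independent_queries are hypothetical, and run_query only builds a query string instead of contacting a MySQL server.

```python
from concurrent.futures import ThreadPoolExecutor


def run_query(query):
    """Stand-in for sending one query to the database (hypothetical)."""
    table, column = query
    # A real implementation would execute this statement and return rows.
    return f"SELECT COUNT({column}) FROM {table}"


def run_independent_queries(queries, max_workers=4):
    """Run independent queries concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_query, queries))


queries = [("patients", "age"), ("claims", "cost"), ("visits", "date")]
results = run_independent_queries(queries)
```

Because the queries do not depend on each other, no coordination is needed beyond collecting the results. Whether this gives a speed-up in practice depends on how many concurrent connections the database server can handle.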
Note: Not all functionality has been tested under Windows!
Please contact us
if you have questions, problems or ideas. All feedback is much
appreciated.
Authors
Ole Nielsen (Ole.Nielsen@anu.edu.au)
Peter Christen (peter (dot) christen {at} anu [dot] edu {dot} au)