Publications
2008:
- Data Mining and Analytics 2008
John Roddick, Jiuyong Li, Peter Christen and Paul Kennedy
(editors).
Proceedings of the
Seventh Australasian Data Mining
Conference
(AusDM 2008), Glenelg, Adelaide, November
2008.
Conferences in Research and Practice in
Information Technology (CRPIT), vol. 87.
- Towards Scalable Real-Time Entity Resolution using a
Similarity-Aware Inverted Index Approach
Peter Christen and Ross Gayler
In proceedings of the Seventh Australasian Data Mining
Conference (AusDM 2008), Glenelg, Adelaide, November
2008.
To be published in
Conferences in Research and Practice in
Information Technology (CRPIT), vol. 87.
Accepted
paper (10 pages, pdf, 218 KB)
- Probabilistic Data Generation
Agus Pudjijono
Master of Computing (Honours) thesis, ANU Department of
Computer Science, November 2008.
Thesis (pdf.gz, 2.4 MB)
- Visualization of Temporal Changes in Cluster Structures
using Self-Organizing Maps
Denny, Graham Williams, and Peter Christen
Accepted as regular paper for the
IEEE
International Conference on Data Mining (ICDM), Pisa,
Italy, December 2008.
Please contact
Denny if you are interested in this
paper.
- Automatic Record Linkage using Seeded Nearest Neighbour
and Support Vector Machine Classification
Peter Christen
Proceedings of the ACM SIGKDD 2008 conference, Las Vegas,
August 2008.
Paper available online.
- Febrl - An Open Source Data Cleaning, Deduplication and
Record Linkage System with a Graphical User Interface
Peter Christen
Proceedings of the
demo session at the
ACM SIGKDD
2008 conference, Las Vegas, August 2008.
Paper available online.
- Automatic Training Example Selection for Scalable
Unsupervised Record Linkage
Peter Christen
Proceedings of the
Pacific-Asia Conference on Knowledge Discovery
and Data Mining (PAKDD), Osaka, Japan, May 2008.
Paper available online.
Submitted
paper (12 pages, pdf, 146 KB)
Submitted
paper (12 pages, ps.gz, 142 KB)
- Exploratory Hot Spot Profile Analysis using an Interactive
Visual Drill-Down Self-Organizing Maps
Denny, Graham J. Williams and Peter Christen
Proceedings of the
Pacific-Asia Conference on Knowledge Discovery
and Data Mining (PAKDD), Osaka, Japan, May 2008.
Paper available online.
- Febrl - A Freely Available Record Linkage System
with a Graphical User Interface
Peter Christen
Proceedings of the
Australasian Workshop on Health Data and
Knowledge Management (HDKM), Wollongong,
January 2008.
Paper
(pdf, 748 KB) available online from
Conferences in Research and Practice in
Information Technology (CRPIT), vol. 80.
2007:
- Data Mining and Analytics 2007
Peter Christen, Paul J. Kennedy, Jiuyong Li, Inna Kolyshkina
and Graham J. Williams (editors).
Proceedings of the
Sixth Australasian Data Mining Conference
(AusDM 2007), Gold Coast, Australia, December 2007.
Conferences in Research and Practice in
Information Technology (CRPIT), vol. 70.
- A Two-Step Classification Approach to Unsupervised Record
Linkage
Peter Christen
In proceedings of the Sixth Australasian Data Mining Conference
(AusDM 2007), Gold Coast, December 2007.
Paper
(pdf, 440 KB) available online from
Conferences in Research and Practice in
Information Technology (CRPIT), vol. 70.
- Exploratory Multilevel Hot Spot Analysis: Australian
Taxation Office Case Study
Denny, Graham J. Williams, and Peter Christen
In proceedings of the Sixth Australasian Data Mining Conference
(AusDM 2007), Gold Coast, December 2007.
Paper
(pdf, 759 KB) available online from
Conferences in Research and Practice in
Information Technology (CRPIT), vol. 70.
- Evaluation of a Graduate Level Data Mining Course
with Industry Participants
Peter Christen
In proceedings of the Sixth Australasian Data Mining Conference
(AusDM 2007), Gold Coast, December 2007.
Paper
(pdf, 436 KB) available online from
Conferences in Research and Practice in
Information Technology (CRPIT), vol. 70.
- Towards parameter-free blocking for scalable record
linkage
Peter Christen
Technical Report TR-CS-07-03,
ANU Joint Computer Science Technical Report
Series, August 2007.
Report
(pdf, 201 KB)
Report
(ps.gz, 199 KB)
- Quality and Complexity Measures for Data Linkage and
Deduplication
Peter Christen and Karl Goiser
Chapter in the book
Quality
Measures in Data Mining, vol. 43,
Studies in Computational Intelligence.
F. Guillet and H. Hamilton (eds), Springer, March 2007.
Available online at
SpringerLink.
2006:
- Privacy-Preserving Data Linkage and Geocoding: Current
Approaches and Research Directions
Peter Christen
In proceedings of the Workshop on Privacy Aspects of Data Mining (PADM)
held at the IEEE International Conference on Data
Mining (ICDM), Hong Kong, December 2006.
Final 5-page version:
Paper
(pdf, 53 KB)
Paper
(ps.gz, 35 KB)
Submitted 11-page version:
Paper
(pdf, 118 KB)
Paper
(ps.gz, 74 KB)
- A Comparison of Personal Name Matching: Techniques and
Practical Issues
Peter Christen
In proceedings of the Workshop on Mining Complex Data (MCD)
held at the IEEE International Conference on Data
Mining (ICDM), Hong Kong, December 2006.
Final 5-page version:
Paper
(pdf, 57 KB)
Paper
(ps.gz, 40 KB)
Submitted 12-page version available as:
Technical Report TR-CS-06-02,
ANU Joint Computer Science Technical Report
Series, September 2006.
Report
(pdf, 248 KB)
Report
(ps.gz, 236 KB)
- Dynamic Algorithm Selection Using Reinforcement Learning
Warren Armstrong, Peter Christen, Eric McCreath and Alistair
Rendell
Proceedings of the
Workshop on Integrating AI and Data Mining,
Hobart, Australia, December 2006.
Paper
(pdf, 254 KB)
- Towards Automated Record Linkage
Karl Goiser and Peter Christen
In proceedings of the Fifth Australasian Data Mining Conference
(AusDM 2006), Sydney, November 2006.
Paper
(pdf, 513 KB) available online from
Conferences in Research and Practice in
Information Technology (CRPIT), vol. 61.
- Secure Health Data Linkage and Geocoding: Current
Approaches and Research Directions
Peter Christen and Tim Churches
Proceedings of the
National e-Health Privacy and Security
Symposium (ehPASS), Brisbane, October 2006.
Paper
(pdf, 139 KB)
Paper
(ps.gz, 127 KB)
- Automated Geocoding of Routinely Collected Health Data
in New South Wales
Richard Summerhayes, Paul Holder, John Beard, Peter
Christen, Alan Willmore and Tim Churches
The NSW Public Health Bulletin,
volume 17, number 3-4, March-April 2006.
Online version available
here.
- A Probabilistic Geocoding System Utilising a Parcel Based
Address File
Peter Christen, Alan Willmore and Tim Churches
In Advances in Data Mining: Theory, Methodology,
Techniques, and Applications. Simeon Simoff and Graham
Williams (editors). State-of-the-Art Lecture Notes in
Artificial Intelligence, Volume 3755, Springer-Verlag,
2006.
Available online at
SpringerLink, LNCS 3755.
Copyright for this publication is held by
Springer-Verlag.
2005:
- Automated Probabilistic Address Standardisation and
Verification
Peter Christen and Daniel Belacic
Proceedings of the
Fourth Australasian
Data Mining Conference (AusDM 2005), Sydney, December 2005.
Paper
(pdf, 146 KB)
Paper
(ps.gz, 204 KB)
- Assessing Deduplication and Data Linkage Quality: What to
Measure?
Peter Christen and Karl Goiser
Proceedings of the
Fourth Australasian Data Mining
Conference (AusDM 2005), Sydney, December 2005.
Paper
(pdf, 178 KB)
Paper
(ps.gz, 163 KB)
- Probabilistic Data Generation for Deduplication and
Data Linkage
Peter Christen
Proceedings of the
Sixth International Conference on Intelligent
Data Engineering and Automated Learning (IDEAL'05),
Brisbane, July 2005.
Available online at
SpringerLink, LNCS 3578.
Copyright for this publication is held by
Springer-Verlag.
Paper
(pdf, 124 KB)
Paper
(ps.gz, 135 KB)
- Febrl - Freely extensible biomedical record linkage
(Manual, release 0.3)
Peter Christen and Tim Churches
Available online from
SourceForge.Net, April 2005.
Manual
(pdf, 960 KB)
Manual
(pdf, 282 KB)
- A Probabilistic Deduplication, Record Linkage
and Geocoding System
Peter Christen and Tim Churches
Proceedings of the
ARC Health Data Mining workshop,
University of South Australia, April 2005.
Paper
(pdf, 136 KB)
Paper
(ps.gz, 134 KB)
2004:
- A Probabilistic Geocoding System based on
a National Address File
Peter Christen, Tim Churches and Alan Willmore
Accepted for the Australasian Data Mining Conference,
Cairns, December 2004.
Paper
(pdf, 120 KB)
Paper
(ps.gz, 128 KB)
- Some Methods for Blindfolded Record Linkage
Tim Churches and Peter Christen
Published online at BioMed Central
Medical Informatics and Decision Making,
June 2004.
For abstract and downloadable PDF file see
here.
- A Parallel Open Source Data Linkage System
Peter Christen, Tim Churches and Markus Hegland
Proceedings of the 8th
PAKDD'04
(Pacific-Asia Conference on Knowledge Discovery and Data
Mining), Sydney, May 2004.
Springer Lecture Notes in Artificial Intelligence,
(3056). Available online at
Springer Online.
Copyright for this publication is held by
Springer-Verlag.
Paper
(pdf, 203 KB)
Paper
(ps.gz, 82 KB)
- Blind Data Linkage using n-gram Similarity
Comparisons
Tim Churches and Peter Christen
Proceedings of the 8th
PAKDD'04
(Pacific-Asia Conference on Knowledge Discovery and Data
Mining), Sydney, May 2004.
Springer Lecture Notes in Artificial Intelligence,
(3056). Available online at
Springer Online.
Copyright for this publication is held by
Springer-Verlag.
Paper
(long version, pdf, 177 KB)
Paper
(long version, ps.gz, 69 KB)
2003:
- A Comparison of Fast Blocking Methods for Record
Linkage
Rohan Baxter, Peter Christen and Tim Churches
Proceedings of the Workshop on Data Cleaning, Record
Linkage and Object Consolidation at the
Ninth ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining, Washington DC,
August 2003.
Paper 3 pages
(pdf, 87 KB)
Paper 6 pages
(pdf, 138 KB)
- Parallelisation of Sparse Grids for Large Scale
Data Analysis
Jochen Garcke, Markus Hegland and Ole Nielsen
In Computational Science - ICCS 2003, LNCS 2659,
International Conference Melbourne, Australia and St.
Petersburg, Russia, June 2003, Proceedings, Part III,
pages 683-692, P.M.A. Sloot, D. Abramson, A. Bogdanov,
J. Dongarra, A. Zomaya and Y. Gorbachev (Eds.), Springer.
- A Logical Formalisation of the Fellegi-Holt Method
of Data Cleaning
Agnes Boskovitz, Rajeev Gore and Markus Hegland
Technical Report TR-ARP-02-03, Automated Reasoning Project,
Australian National University, Canberra, May 2003.
Paper (pdf, 132 KB)
- Parallel Algorithms for Predictive Modelling
Markus Hegland
Draft, April 2003.
Paper
(pdf, 327 KB)
- Approximation of a Thin Plate Spline Smoother Using
Continuous Piecewise Polynomial Functions
Stephen Roberts, Markus Hegland and Irfan Altas
SIAM Journal on Numerical Analysis, vol. 41, no. 1,
pages 208-234, 2003.
Available
online.
- Adaptive sparse grids
Markus Hegland
Proceedings of 10th Computational Techniques and Applications
Conference CTAC-2001, ANZIAM Journal, vol. 44, pages C335-C353,
editors: K. Burrage and Roger B. Sidje, April 2003.
Available
online.
- Data Mining - Challenges, Models, Methods and Algorithms
Markus Hegland
Draft, February 2003.
Paper
(pdf, 884 KB)
2002:
- Preparation of name and address data for record linkage
using hidden Markov models
Tim Churches, Peter Christen, Kim Lim and Justin X Zhu
Published online at BioMed Central
Medical Informatics and Decision Making,
December 2002.
For abstract and downloadable PDF file see
here.
- Probabilistic Name and Address Cleaning and
Standardisation
Peter Christen, Tim Churches and Justin Xi Zhu
Proceedings of the
Australasian Data Mining Workshop, Canberra,
December 2002.
Paper
(ps.gz, 74 KB)
Paper
(pdf, 158 KB)
- How Fast is '-fast'? Performance Analysis of KDD
Applications using Hardware Performance Counters on
UltraSPARC-III
Adam Czezowski and Peter Christen
Proceedings of the
Australasian Data Mining Workshop, Canberra,
December 2002.
Paper
(ps.gz, 82 KB)
Paper
(pdf, 174 KB)
- High-Performance Computing Techniques for
Record Linkage
Peter Christen, Justin Xi Zhu, Markus Hegland, Stephen
Roberts, Ole M. Nielsen, Tim Churches and Kim Lim
Proceedings of the Australian Health Outcomes
Conference (AHOC-2002), Canberra, July 2002.
Paper
(ps.gz, 95 KB)
Paper
(pdf, 233 KB)
- Additive Sparse Grid Fitting
Markus Hegland
Submitted to Fifth International Conference on
Curves and Surfaces organized by AFA-SMAI,
Saint-Malo, France, 27 June - 3 July, 2002.
Paper
(pdf, 136 KB)
- Graphical Models for High Dimensional Density Estimation
Gordon T. Deane
Honours Thesis in the Department of Mathematics at the Australian National University,
14 June 2002.
Thesis
(pdf, 1.1 MB)
- Algorithms for Association Rules
Markus Hegland
Course Notes, 2002.
Paper
(pdf, 95 KB)
- Symmetry and Fractal-like Structures in the Statistics
of Sequence Comparison
Hilary S Booth, Shev F MacNamara, Ole M. Nielsen and
Susan R. Wilson
Submitted to Journal of Computational Biology and
Proceedings of RECOMB2003, Berlin, April 2003.
Paper
(ps.gz, 105 KB)
Paper
(pdf, 98 KB)
- Parallel Computing Techniques for
High-Performance Probabilistic Record Linkage
Peter Christen, Markus Hegland, Stephen Roberts,
Ole M. Nielsen, Tim Churches and Kim Lim
Proceedings of the Symposium on Health Data
Linkage, Sydney, March 2002.
Paper
(ps.gz, 91 KB)
Paper
(pdf.gz, 107 KB)
Paper
(pdf, 228 KB)
- Performance Analysis of KDD Applications using
Hardware Event Counters
Peter Christen and Adam Czezowski
Technical Report TR-CS-02-01, ANU Joint Computer
Science Technical Report Series, February 2002.
Report
(ps.gz, 131 KB)
Report
(pdf.gz, 173 KB)
2001:
- Adaptive Sparse Grids
Markus Hegland
Submitted to the
CTAC-2001 Conference,
Brisbane, 16-18 July 2001.
Paper
(ps.gz, 142 KB)
- DMtools - Open Source Software for Database
Mining
Peter Christen, Ole M. Nielsen and Markus Hegland
Workshop on Database Support
for KDD (at the
PKDD'2001 Conference), Freiburg,
Germany, September 2001. (Workshop proceedings
online)
Paper
(ps.gz, 81 KB)
- Scalable Parallel Algorithms for Surface Fitting
and Data Mining
Peter Christen, Markus Hegland, Ole M. Nielsen,
Stephen Roberts, Peter E. Strazdins and Irfan Altas
Elsevier Journal of Parallel Computing,
special issue on Aspects of Parallel Computing for
Linear Systems and Associated Problems, Volume 27,
Number 7, September 2001.
- Parallel Data Mining on a Beowulf Cluster
Peter Christen, Ole M. Nielsen, Markus Hegland and
Peter E. Strazdins
Proceedings of the
HPC
Asia 2001 Conference, Gold Coast, Queensland,
Australia, September 2001.
Paper
(ps.gz, 264 KB)
Paper
(pdf.gz, 232 KB)
- Towards a Parallel Data Mining Toolbox
Peter Christen, Markus Hegland, Ole M. Nielsen,
Stephen Roberts, Peter E. Strazdins, Irfan Altas,
Tatiana Semenova and Timothy Hancock
Proceedings of the 15th International Parallel and
Distributed Processing Symposium
(IPDPS-2001), San Francisco,
April 2001. Workshop
Parallel and Distributed Data Mining.
Copyright 2001 Institute of Electrical and Electronics
Engineers (IEEE). Reprinted for the Proceedings of the
IPDPS-2001.
Paper
(ps.gz, 140 KB)
- A Scalable Parallel FEM Surface Fitting Algorithm
for Data Mining
Peter Christen, Markus Hegland, Stephen Roberts,
Ole M. Nielsen and Irfan Altas
International
Workshop on Mining Spatial and Temporal Data
(at the
PAKDD-2001
Conference), Hong Kong, April 2001.
Paper
(ps.gz, 229 KB)
- A Toolbox Approach to Flexible and Efficient
Data Mining
Ole M. Nielsen, Peter Christen, Markus Hegland,
Tatiana Semenova and Timothy Hancock
PAKDD-2001 Conference, Hong Kong,
April 2001.
Published in the
Springer Lecture Notes in Computer
Science, Artificial Intelligence series,
LNAI2035.
Copyright for this publication is held by
Springer-Verlag.
Paper
(ps.gz, 157 KB)
Paper
(pdf.gz, 166 KB)
- Data Mining with Python
Ole M. Nielsen, Peter Christen, Markus Hegland and
Tatiana Semenova
9th
International Python Conference, Long Beach,
California, March 2001.
Paper available upon request from:
Ole Nielsen.
- A Scalable Parallel FEM Surface Fitting
Algorithm for Data Mining
Peter Christen, Markus Hegland, Stephen Roberts
and Irfan Altas
Technical Report TR-CS-01-01, ANU Joint Computer
Science Technical Report Series, January 2001.
Report
(ps.gz, 254 KB)
Report
(pdf.gz, 301 KB)
- Discovery and Classification of Variable Stars
William Patrick Clarke
A thesis submitted for the degree of Master of
Science at the Australian National University,
January 2001.
Thesis
(ps.gz, 4.3 MB)
2000:
- High Dimensional Smoothing Based on Multilevel
Analysis
Markus Hegland, Ole M. Nielsen and Zuowei Shen
Submitted,
November 2000.
Paper
(ps.gz, 576 KB)
- Data Mining of Administrative Claims Data of
Pathology Services
Simon Hawkins, Graham Williams, Rohan Baxter,
Peter Christen, Michael Fett, Markus Hegland,
Fuchun Huang, Ole M. Nielsen, Tatiana Semenova and
Andrew Smith
Proceedings of the Thirty-Fourth Hawaii International
Conference on System Sciences (HICSS-34),
January 2001.
Available upon request from:
Rohan
Baxter, CSIRO CMIS.
- Scalable Parallel Algorithms for Predictive
Modelling
Peter Christen, Markus Hegland, Ole M. Nielsen,
Stephen Roberts and Irfan Altas
Proceedings of the Data Mining 2000 Conference,
Cambridge, UK, August 2000.
Paper
(ps.gz, 605 KB)
- Parallel Performance of Fast Wavelet Transforms
Ole M. Nielsen and Markus Hegland
International Journal of High Speed Computing, Vol. 11,
No. 1 (2000) 55-74.
Paper
(ps.gz, 584 KB)
Paper
(pdf.gz, 115 KB)
- Developing a Spline Smoothed Density
Giles Hooker
A thesis submitted for the degree of Bachelor of
Science with Honours at the Australian National
University, 2000.
Thesis (ps.gz, 6.0 MB)
1999:
- Additive Models in High Dimensions
Markus Hegland and Vladimir Pestov
Research Report 99-33, School of Mathematical and
Computing Sciences, Victoria University of Wellington,
December 1999.
Available online at
xxx.llnl.gov.
Paper
(pdf, 242 MB)
- Computational Challenges in Data Mining
Markus Hegland
Proceedings of the CTAC-99 Conference,
Canberra, September 1999.
Paper
(ps.gz, 583 KB)
- Finite Element Thin Plate Splines in Density
Estimation
Markus Hegland, Giles Hooker and Stephen Roberts
Proceedings of the CTAC-99 Conference,
Canberra, September 1999.
Paper
(ps.gz, 360 KB)
- Identification and Classification of interesting
variable stars in the MACHO database
Bill Clarke and Markus Hegland
Proceedings of the CTAC-99 Conference,
Canberra, September 1999.
- High Dimensional Wavelet Smoothing
Ole Møller Nielsen
Proceedings of the CTAC-99 Conference, Canberra,
September 1999.
Paper
(ps.gz, 1.5 MB)
Paper
(pdf.gz, 1.5 MB)
- Parallelization of a Finite Element Surface
Fitting Algorithm for Data Mining
Peter Christen, Irfan Altas, Markus Hegland,
Stephen Roberts, Kevin Burrage and Roger Sidje
Proceedings of the CTAC-99 Conference,
Canberra, September 1999.
Paper
(ps.gz, 552 KB)
- The Integrated Delivery of Large-Scale Data
Mining: The ACSYS Data Mining Project
Graham Williams, Irfan Altas, Sergey Bakin,
Peter Christen, Markus Hegland, Alonso Marquez,
Peter Milne, Rajehndra Nagappan and Stephen Roberts
KDD-99 Workshop on Large-Scale Parallel KDD Systems,
San Diego, August 1999,
Springer Lecture Notes in Artificial Intelligence 1759.
- A Parallel Finite Element Surface Fitting Algorithm
for Data Mining
Peter Christen, Irfan Altas, Markus Hegland,
Stephen Roberts, Kevin Burrage and Roger Sidje
Proceedings of the ParCo-99 Conference, Delft,
August 1999.
- A Parallel Solver for Generalised Additive
Models
Markus Hegland, Ian McIntosh and Berwin Turlach
Computational Statistics and Data Analysis, 31(4),
pages 377-396, 1999.
Paper
(ps.gz, 579 KB)
- Mining Taxation Data with the Parallel BMARS
Algorithm
Sergey Bakin, Markus Hegland and Graham Williams
Parallel Algorithms and Applications, Vol. 15, Gordon and
Breach Publishing, 2000.
1998:
- Data-Mining Massive Time Series Astronomical
Data Sets - a Case Study
Michael K. Ng, Zhexue Huang and Markus Hegland
Second Pacific-Asia Conference on Knowledge Discovery
in Data Bases (PAKDD98), 1998.
Paper
(ps.gz, 226 KB)
Paper
(pdf.gz, 480 KB)
- Finite Element Thin Plate Splines for Data Mining
Applications
Markus Hegland, Steve Roberts and Irfan Altas
In 'Mathematical Methods for Curves & Surfaces II',
M. Daehlen, T. Lyche and L.L. Schumaker, 1998,
Vanderbilt University Press, Nashville, TN.
Prepublished as SMS report MRR 057-97.
Paper
(ps.gz, 148 KB)
Paper
(pdf.gz, 137 KB)
1997:
- Cluster Analysis using Triangulation
Markus Hegland and C. Eldershaw
Computational Techniques and Applications:
CTAC-97, B.J. Noye, M.D. Teubner and A.W. Gill,
Eds., (World Scientific, Singapore).
Paper
(ps.gz, 132 KB)
- Can MARS be improved with B-splines?
Sergey Bakin, Markus Hegland, Mike Osborne
Computational Techniques and Applications:
CTAC-97, B.J. Noye, M.D. Teubner and A.W. Gill,
Eds., (World Scientific, Singapore).
Paper
(ps.gz, 188 KB)