Parallel Large Scale Techniques for High-Performance
Record Linkage
Publications and Presentations
Publications:
2008
- Febrl - An Open Source Data Cleaning, Deduplication and
Record Linkage System with a Graphical User Interface
Peter Christen
Accepted for the demo session at the ACM SIGKDD
2008 conference, Las Vegas, August 2008.
Submitted paper (4 pages, pdf, 562 KB)
- Automatic Record Linkage using Seeded Nearest Neighbour
and Support Vector Machine Classification
Peter Christen
Accepted for the ACM SIGKDD 2008 conference, Las Vegas,
August 2008.
Submitted paper
(9 pages, pdf, 169 KB)
- Automatic Training Example Selection for Scalable
Unsupervised Record Linkage
Peter Christen
Proceedings of the
Pacific-Asia Conference on Knowledge Discovery
and Data Mining (PAKDD), Osaka, Japan, May 2008.
Paper available online.
Submitted paper (12 pages, pdf, 146 KB)
Submitted paper (12 pages, ps.gz, 142 KB)
- Febrl - A Freely Available Record Linkage System
with a Graphical User Interface
Peter Christen
Proceedings of the
Australasian Workshop on Health Data and
Knowledge Management (HDKM), Wollongong,
January 2008.
Paper
(pdf, 748 KB) available online from
Conferences in Research and Practice in
Information Technology (CRPIT), vol. 80.
2007
- A Two-Step Classification Approach to Unsupervised Record
Linkage
Peter Christen
In proceedings of the Sixth Australasian Data Mining Conference
(AusDM 2007), Gold Coast, December 2007.
Paper
(pdf, 440 KB) available online from
Conferences in Research and Practice in
Information Technology (CRPIT), vol. 70.
- Towards parameter-free blocking for scalable record
linkage
Peter Christen
Technical Report TR-CS-07-03
ANU Joint Computer Science Technical Report
Series, August 2007.
Report
(pdf, 201 KB)
Report
(ps.gz, 199 KB)
- Quality and Complexity Measures for Data Linkage and
Deduplication
Peter Christen and Karl Goiser
Chapter in the book
Quality
Measures in Data Mining, vol. 43,
Studies in Computational Intelligence.
F. Guillet and H. Hamilton (eds), Springer, March 2007.
Available online at
SpringerLink.
2006
- Privacy-Preserving Data Linkage and Geocoding: Current
Approaches and Research Directions
Peter Christen
In proceedings of the Workshop on Privacy Aspects of Data Mining (PADM)
held at the IEEE International Conference on Data
Mining (ICDM), Hong Kong, December 2006.
Final 5-page version:
Paper
(pdf, 53 KB)
Paper
(ps.gz, 35 KB)
Submitted 11-page version:
Paper
(pdf, 118 KB)
Paper
(ps.gz, 74 KB)
- A Comparison of Personal Name Matching: Techniques and
Practical Issues
Peter Christen
In proceedings of the Workshop on Mining Complex Data (MCD)
held at the IEEE International Conference on Data
Mining (ICDM), Hong Kong, December 2006.
Final 5-page version:
Paper
(pdf, 57 KB)
Paper
(ps.gz, 40 KB)
Submitted 12-page version available as:
Technical Report TR-CS-06-02
ANU Joint Computer Science Technical Report
Series, September 2006.
Report
(pdf, 248 KB)
Report
(ps.gz, 236 KB)
- Towards Automated Record Linkage
Karl Goiser and Peter Christen
In proceedings of the Fifth Australasian Data Mining Conference
(AusDM 2006), Sydney, November 2006.
Paper
(pdf, 513 KB) available online from
Conferences in Research and Practice in
Information Technology (CRPIT), vol. 61.
- Secure Health Data Linkage and Geocoding: Current
Approaches and Research Directions
Peter Christen and Tim Churches
Proceedings of the
National e-Health Privacy and Security
Symposium (ehPASS), Brisbane, October 2006.
Paper
(pdf, 139 KB)
Paper
(ps.gz, 127 KB)
- Automated Geocoding of Routinely Collected Health Data
in New South Wales
Richard Summerhayes, Paul Holder, John Beard, Peter
Christen, Alan Willmore and Tim Churches
The NSW Public Health Bulletin,
volume 17, number 3-4, March-April 2006.
Online version available
here.
- A Probabilistic Geocoding System Utilising a Parcel Based
Address File
Peter Christen, Alan Willmore and Tim Churches
In Advances in Data Mining: Theory, Methodology,
Techniques, and Applications. Simeon Simoff and Graham
Williams (editors). State-of-the-Art Lecture Notes in
Artificial Intelligence, Volume 3755, Springer-Verlag,
2006.
Available online at
SpringerLink, LNCS 3755.
Copyright for this publication is held by
Springer-Verlag.
2005
- Automated Probabilistic Address Standardisation and
Verification
Peter Christen and Daniel Belacic
Proceedings of the
Fourth Australasian
Data Mining Conference (AusDM 2005), Sydney, December 2005.
Paper
(pdf, 146 KB)
Paper
(ps.gz, 204 KB)
- Assessing Deduplication and Data Linkage Quality: What to
Measure?
Peter Christen and Karl Goiser
Proceedings of the
Fourth Australasian Data Mining
Conference (AusDM 2005), Sydney, December 2005.
Paper
(pdf, 178 KB)
Paper
(ps.gz, 163 KB)
- Probabilistic Data Generation for Deduplication and
Data Linkage
Peter Christen
Proceedings of the
Sixth International Conference on Intelligent
Data Engineering and Automated Learning (IDEAL'05),
Brisbane, July 2005.
Available online at
SpringerLink, LNCS 3578.
Copyright for this publication is held by
Springer-Verlag.
Paper
(pdf, 124 KB)
Paper
(ps.gz, 135 KB)
- Febrl - Freely extensible biomedical record linkage
(Manual, release 0.3)
Peter Christen and Tim Churches
Available online from
SourceForge.Net, April 2005.
Manual
(pdf, 960 KB)
Manual
(pdf, 282 KB)
- A Probabilistic Deduplication, Record Linkage
and Geocoding System
Peter Christen and Tim Churches
Proceedings of the
ARC Health Data Mining workshop,
University of South Australia, April 2005.
Paper
(pdf, 136 KB)
Paper
(ps.gz, 134 KB)
2004
- A Probabilistic Geocoding System based on
a National Address File
Peter Christen, Tim Churches and Alan Willmore
Accepted for the Australasian Data Mining Conference,
Cairns, December 2004.
Paper
(pdf, 120 KB)
Paper
(ps.gz, 128 KB)
- Some Methods for Blindfolded Record Linkage
Tim Churches and Peter Christen
Published online at BioMed Central
Medical Informatics and Decision Making,
June 2004.
For abstract and downloadable PDF file see
here.
- A Parallel Open Source Data Linkage System
Peter Christen, Tim Churches and Markus Hegland
Proceedings of the 8th
PAKDD'04
(Pacific-Asia Conference on Knowledge Discovery and Data
Mining), Sydney, May 2004.
Springer Lecture Notes in Artificial Intelligence,
(3056). Available online at
Springer Online.
Copyright for this publication is held by
Springer-Verlag.
Paper
(pdf, 203 KB)
Paper
(ps.gz, 82 KB)
- Blind Data Linkage using n-gram Similarity
Comparisons
Tim Churches and Peter Christen
Proceedings of the 8th
PAKDD'04
(Pacific-Asia Conference on Knowledge Discovery and Data
Mining), Sydney, May 2004.
Springer Lecture Notes in Artificial Intelligence,
(3056). Available online at
Springer Online.
Copyright for this publication is held by
Springer-Verlag.
Paper
(long version, pdf, 177 KB)
Paper
(long version, ps.gz, 69 KB)
2003
2002
- Preparation of name and address data for record linkage
using hidden Markov models
Tim Churches, Peter Christen, Kim Lim and Justin Xi Zhu
Published online at BioMed Central
Medical Informatics and Decision Making,
December 2002.
For abstract and downloadable PDF file see
here.
- Probabilistic Name and Address Cleaning and
Standardisation
Peter Christen, Tim Churches and Justin Xi Zhu
Proceedings of the
Australasian Data Mining Workshop, Canberra,
December 2002.
Paper
(ps.gz, 74 KB)
Paper
(pdf, 158 KB)
- Febrl - Freely extensible biomedical record linkage
Peter Christen and Tim Churches
ANU Computer Science Technical Reports TR-CS-02-05,
Australian National University, Canberra, October 2002.
Available
here.
- High-Performance Computing Techniques for
Record Linkage
Peter Christen, Justin Zhu, Markus Hegland, Stephen Roberts,
Ole M. Nielsen, Tim Churches and Kim Lim
Proceedings of the Australian Health Outcomes
Conference (AHOC-2002), Canberra, July 2002.
Paper
(ps.gz, 95 KB)
Paper
(pdf, 233 KB)
- Parallel Computing Techniques for
High-Performance Probabilistic Record Linkage
Peter Christen, Markus Hegland, Stephen Roberts,
Ole M. Nielsen, Tim Churches and Kim Lim
Proceedings of the Symposium on Health Data
Linkage, Sydney, March 2002.
Paper
(ps.gz, 91 KB)
Paper
(pdf, 228 KB)
Presentations:
2008
2007
2006
- Privacy-Preserving Data Linkage and Geocoding: Current
Approaches and Research Directions
Peter Christen
Presentation at the Workshop on Privacy Aspects of Data Mining (PADM)
held at the IEEE International Conference on Data
Mining (ICDM), Hong Kong, December 2006.
Slides
(pdf, 859 KB)
Slides
8up (ps.gz, 1.4 MB)
- A Comparison of Personal Name Matching: Techniques and
Practical Issues
Peter Christen
Presentation at the Workshop on Mining Complex Data (MCD)
held at the IEEE International Conference on Data
Mining (ICDM), Hong Kong, December 2006.
Slides
(pdf, 830 KB)
Slides
8up (ps.gz, 1.3 MB)
- Recent Developments in Data Linkage and Research at the
ANU
Peter Christen
Invited talk at the
Australian
Taxation Office, Data matching personnel, Canberra,
December 2006.
Slides
10up (ps.gz, 1.6 MB)
- Data Quality Aspects in Data Mining, Data Linkage and
Geocoding
Peter Christen
Invited talk at
Geoscience
Australia, Canberra, November 2006.
Slides
(pdf, 1.7 MB)
Slides
(ps.gz, 1.2 MB)
Slides
9up (ps.gz, 1.2 MB)
- Secure Health Data Linkage and Geocoding: Current
Approaches and Research Directions
Peter Christen and Tim Churches
Presentation at the
National e-Health Privacy and Security
Symposium (ehPASS), Brisbane, October 2006.
Slides
(pdf, 779 KB)
Slides
9up (ps.gz, 363 KB)
- Data Linkage Techniques: Past, Present and Future
Peter Christen
Invited talk at the
Australian
Taxation Office, Canberra, October 2006.
(same set of slides as used for the Analytics Practise
Group presentation, see below).
- Data Linkage Techniques: Past, Present and Future
Peter Christen
Invited talk at the
Canberra
Analytics Practise Group, Canberra, August 2006.
Slides 8up
(pdf, 1.5 MB)
Slides 8up
(ps.gz, 630 KB)
Slides
(pdf, 1.5 MB)
2005
- De-duplication and Data Linkage Quality: What to measure?
Karl Goiser
Presentation at the
Fourth Australasian
Data Mining Conference (AusDM 2005), Sydney, December 2005.
Slides
8up (pdf, 395 KB)
Slides
(pdf, 292 KB)
- Automated Probabilistic Address Standardisation and
Verification
Peter Christen
Presentation at the
Fourth Australasian
Data Mining Conference (AusDM 2005), Sydney, December 2005.
Slides
8up (ps.gz, 428 KB)
Slides
(pdf, 892 KB)
Slides
(ps.gz, 425 KB)
- Recent Developments in Data Linkage Technologies
Peter Christen
Invited talk at the
Data Linkage Symposium of the Canberra
Branch of the
Statistical Society of Australia,
Canberra, September 2005.
Slides
(pdf, 1.7 MB)
Slides 8up
(ps.gz, 690 KB)
- Probabilistic Data Generation for Deduplication and
Data Linkage
Peter Christen
Presentation at the
Sixth International Conference on Intelligent
Data Engineering and Automated Learning (IDEAL'05),
Brisbane, July 2005.
Slides
(pdf, 704 KB)
Slides
(ps.gz, 269 KB)  
Slides 8up
(pdf, 677 KB)
Slides 8up
(ps.gz, 272 KB)
- Probabilistic Deduplication, Data Linkage and Geocoding
Peter Christen
Presentation at the
DAMA Canberra Chapter,
June 2005.
Slides 8up
(pdf, 2.7 MB)
Slides 8up
(ps.gz, 1.3 MB)
- Probabilistic Deduplication, Record Linkage
and Geocoding
Peter Christen
Guest lecture for
MATH1500: ANU Computational Science Undergraduate
Seminar, ANU, May 2005.
Slides 8up
(pdf, 2.1 MB)
Slides 8up
(ps.gz, 989 KB)
- Febrl - A parallel open source record linkage and geocoding
system
Peter Christen
Presentation at the Data Linkage Workshop,
Australian Bureau
of Statistics, Canberra, April 2005.
Slides 8up
(pdf, 2.4 MB)
Slides 8up
(ps.gz, 1.2 MB)
- A Probabilistic Deduplication, Record Linkage
and Geocoding System
Peter Christen and Tim Churches
Presentation at the
ARC Health Data Mining workshop,
University of South Australia, April 2005.
Slides 4up
(pdf, 854 KB)
Slides 4up
(ps.gz, 389 KB)
Slides
(pdf, 885 KB)
Slides
(ps.gz, 386 KB)
2004
- A Probabilistic Geocoding System based on
a National Address File
Peter Christen, Tim Churches and Alan Willmore
Presentation at the Australasian Data Mining Conference,
Cairns, December 2004.
Slides 4up
(pdf, 1.6 MB)
Slides 4up
(ps.gz, 752 KB)
Slides
(pdf, 1.6 MB)
Slides
(ps.gz, 751 KB)
- Febrl - A parallel open source data linkage and geocoding
system
Peter Christen
Presentation at the Open Source Workshop,
Australian
Bureau of Statistics, Canberra, July 2004.
Slides 4up
(pdf, 1.3 MB)
Slides 4up
(ps.gz, 595 KB)
- A Parallel Open Source Data Linkage System
Peter Christen, Tim Churches and Markus Hegland
Presentation at the 8th
PAKDD'04
(Pacific-Asia Conference on Knowledge Discovery and Data
Mining), Sydney, May 2004.
Slides
(pdf, 656 KB)
Slides 4up
(pdf, 638 KB)
- Blind Data Linkage using n-gram Similarity
Comparisons
Tim Churches and Peter Christen
Presentation at the 8th
PAKDD'04
(Pacific-Asia Conference on Knowledge Discovery and Data
Mining), Sydney, May 2004.
Slides
(pdf, 510 KB)
Slides
4up (pdf, 501 KB)
2003
2002
- Probabilistic Name and Address Cleaning and
Standardisation
Peter Christen, Tim Churches and Justin Xi Zhu
Presentation at the
Australasian Data Mining Workshop, Canberra,
December 2002.
Slides
4up (ps.gz, 345 KB)
Slides
4up (pdf, 764 KB)
- High-Performance Computing Techniques for
Record Linkage
Peter Christen, Tim Churches, Markus Hegland, Kim Lim, Ole M.
Nielsen, Stephen Roberts and Justin Xi Zhu
Presentation at the Australian Health Outcomes
Conference (AHOC-2002), Canberra, July 2002.
Slides 4up
(ps.gz, 1.6 MB)
Slides 4up
(pdf, 1.5 MB)
- Parallel Techniques for High-Performance
Record Linkage (Data Matching)
Peter Christen
Seminar at the ANU Department of Computer
Science, Canberra, June 2002.
Slides
(ps.gz, 532 KB)
Slides
(pdf, 1.2 MB)
Slides 4up
(ps.gz, 535 KB)
Slides 4up
(pdf, 1.1 MB)
- Parallel Computing Techniques for High-Performance
Probabilistic Record Linkage
Peter Christen, Tim Churches, Markus Hegland, Kim Lim, Ole M.
Nielsen and Stephen Roberts
Presentation at the Symposium on Health Data
Linkage, Sydney, March 2002.
Slides
(ps.gz, 1.5 MB)
Slides
(pdf, 654 KB)
Third Party Publications:
- Adaptive Filtering for Efficient Record Linkage
Lifang Gu and Rohan Baxter
2004 SIAM Int. Conf. on Data Mining, April 22-24, Orlando,
Florida
Available online (PDF)
- Record Linkage: Current Practice and Future
Directions
Lifang Gu, Rohan Baxter, Deanne Vickers and Chris
Rainsford
Technical Report 03/83, April 2003, CSIRO Mathematical and
Information Sciences, GPO Box 664, Canberra 2601, Australia
Available online (PDF)