Duplicate text detection system now integrated with conference management software

The system is currently being used by IEEE and ACM, and helps them enforce their new 30%-policy.


Prof. Igor Markov’s duplicate text detection system, called DUDE, is now integrated with START V2, the signature product of Softconf. Softconf is an internet company dedicated to organizing conferences, workshops and other software development events, and START V2 is its web-based solution for managing peer-reviewed conferences and workshops.

Created in 2008, DUDE applies techniques used by web search engines to detect matching text in sets of technical papers. When researchers submit papers to conferences, their papers should not overlap with previously published work, and it can be tedious for reviewers to check each paper to make sure the work is original and innovative.

DUDE consists of two parts. The first part is a web server that stores files containing hash codes computed from the papers. Hash codes are distributed to users, but confidential conference submissions never reach the server under normal operation. The second part is the client, which is run on a conference program committee machine. This program computes hash codes from submitted papers, and then consults the DUDE server to find other papers containing the same phrases.
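
The two-part design can be illustrated with a minimal Python sketch. It is not DUDE's actual code: the five-word phrase length, the SHA-1 hash, the names phrase_hashes and find_matches, and the sample paper texts are all assumptions made for illustration. The point it shows is that the submission is hashed locally and only hash codes are compared, so the submitted text itself never has to leave the committee machine.

```python
import hashlib
import re

PHRASE_WORDS = 5  # assumed phrase length; the article does not state DUDE's value


def phrase_hashes(text, n=PHRASE_WORDS):
    """Return the set of hash codes for every n-word phrase in the text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    hashes = set()
    for i in range(len(words) - n + 1):
        phrase = " ".join(words[i:i + n])
        # Truncated SHA-1 digest as the hash code (an assumption, not DUDE's scheme).
        hashes.add(hashlib.sha1(phrase.encode("utf-8")).hexdigest()[:16])
    return hashes


# Stand-in for the hash files the server distributes: paper id -> hash codes.
published_index = {
    "paper-A": phrase_hashes(
        "we propose a fast placement algorithm for large mixed size circuit designs"
    ),
    "paper-B": phrase_hashes(
        "a survey of routing techniques for modern field programmable gate arrays"
    ),
}


def find_matches(submission_text, index):
    """Client side: hash the submission locally, then count shared phrases per paper."""
    submission = phrase_hashes(submission_text)
    return {
        paper_id: len(submission & hashes)
        for paper_id, hashes in index.items()
        if submission & hashes
    }


submission = (
    "In this work we propose a fast placement algorithm for large mixed size designs"
)
matches = find_matches(submission, published_index)
print(matches)
```

In this toy run only the hypothetical "paper-A" shares phrases with the submission, so it is the only entry returned.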

The system does not make moral judgments about whether matching text amounts to “too much overlap” or whether two papers “fairly closely match,” but rather sorts matching papers to highlight the most similar pairs. It generates reports for conference committees, pointing out and annotating any similarities that exist. Conference committees make all of the decisions in accordance with their conference policies.
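
Sorting by similarity without imposing a cutoff could look like the sketch below. The article does not say which similarity measure DUDE uses, so Jaccard similarity over sets of phrase hash codes is an assumption here, as are the function names and the toy integer hash codes.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: shared items divided by all distinct items."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_pairs(submission_hashes, index):
    """Return (paper_id, similarity) pairs, most similar first, omitting zero matches."""
    scored = [
        (paper_id, jaccard(submission_hashes, hashes))
        for paper_id, hashes in index.items()
    ]
    matching = [pair for pair in scored if pair[1] > 0]
    return sorted(matching, key=lambda pair: pair[1], reverse=True)


# Toy hash codes (small integers) standing in for real phrase hashes.
submission_hashes = {1, 2, 3, 4}
index = {
    "paper-X": {1, 2, 3, 4, 5},
    "paper-Y": {4, 9},
    "paper-Z": {7, 8},
}

ranked = rank_pairs(submission_hashes, index)
for paper_id, score in ranked:
    print(paper_id, round(score, 2))
```

The sketch only orders the pairs. It applies no threshold, which mirrors the article's point that the judgment is left to the conference committee.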

DUDE is integrated with START V2 so that program chairs can invoke the system with a push of a button and add notes next to papers suspected of plagiarism. The integration also simplifies the management of databases needed to detect plagiarism.

The system is currently being used by IEEE and ACM, and helps them enforce their new 30%-policy, which requires at least 30% new material compared to earlier conference publications.
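
As a rough illustration of the arithmetic behind such a policy, the sketch below estimates the share of new material as the fraction of a submission's phrase hash codes that do not appear in an earlier publication. How IEEE and ACM actually measure "new material" is not stated in the article, so this measure, the function name, and the toy data are assumptions.

```python
NEW_MATERIAL_THRESHOLD = 0.30  # the 30% figure from the policy described above


def new_material_fraction(submission_hashes, earlier_hashes):
    """Fraction of the submission's phrase hashes absent from the earlier paper."""
    if not submission_hashes:
        return 0.0
    new_hashes = submission_hashes - earlier_hashes
    return len(new_hashes) / len(submission_hashes)


# Toy data: ten phrase hashes in the submission, eight of them seen before.
submission_hashes = set(range(10))
earlier_hashes = set(range(8))

fraction = new_material_fraction(submission_hashes, earlier_hashes)
needs_review = fraction < NEW_MATERIAL_THRESHOLD  # a flag for the committee, not a verdict
print(f"{fraction:.0%} new material, needs review: {needs_review}")
```

Here two of the ten phrases are new, giving 20% new material, so the pair would be flagged for the committee to look at.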