Exporting and Importing Records

Why Export and Import?

Consider the following situations:

In each of the above situations, records must be selected from one user's RefCollection and sent to another user. This differs from generating a list of references in some format, such as HTML: here the complete records and all their associated data are transferred. RefKeep's Export/Import feature does this.

How to Export and Import

Exporting

Exporting involves the following steps:

  1. Choose the records to be exported. This can be done in two ways:

    1. Export only the record currently displayed on the screen and its associated data. This is done by pressing the following button:

      Figure 1.

    2. Export multiple records in one go. Bring up the Record Chooser dialog box shown below by selecting File->Export Records.

      Figure 2.

      Select the records to be exported from the different panes of the tabbed pane. Then press the EXPORT button, which appears towards the right of the Record Chooser dialog, to export the selected records.

  2. In the File Chooser dialog that pops up, choose the name of the ZIP file that will contain the exported records.

  3. That's it! The selected records and their associated data are exported to the specified file. This ZIP file can be sent across (for example, as an email attachment) to the desired recipient, who can then import the records as described next.

Importing

Importing records into the currently selected (in focus) RefCollection is a one-step process. Select the menu option File->Import Records and specify the name of the ZIP file containing the exported records.

Identifying Duplicate Records while Importing

While importing a set of records into a RefCollection, it is possible that the RefCollection already contains some of the records being imported. Identifying similar records is a big challenge. RefKeep attempts to solve the problem through a simple Similarity Identifying Scheme: it computes the Hamming distance between the character count arrays of the titles (or names, URLs, etc.) of the two records being compared, and divides that distance by the average length of the two titles to obtain a Similarity Rating. If the rating is less than a particular threshold value, the two records are deemed to be identical and the record is not imported. All references to this particular record in the set of records being imported are adjusted to point to the existing record. For example, suppose you are importing the book "Introduction to Algorithms", with an author named "Cormen", into a RefCollection which already contains a book by "Cormen". In that case a duplicate Author record "Cormen" is not added; instead, the Author field of the Book record being imported is set to the existing author "Cormen".

If the value of the Similarity Rating is above a second, higher threshold, the two records are considered different and the imported record is incorporated into the RefCollection. If the rating falls between the two thresholds, the system cannot determine automatically whether the records are identical. It then displays a summary of the two records being compared, as shown below. The user decides whether the two records are identical by pressing YES, they are IDENTICAL or NO, they are NOT IDENTICAL. The record is incorporated or not on the basis of the user's response.

Figure 3.
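
The scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not RefKeep's actual code: the function names and the two threshold values are hypothetical, titles are lower-cased before counting, and "Hamming distance between the character count arrays" is read as the number of characters whose counts differ between the two titles (summing the absolute differences of the counts would be another plausible reading).

```python
from collections import Counter

# Hypothetical thresholds; the values RefKeep actually uses are not documented here.
IDENTICAL_BELOW = 0.1
DIFFERENT_ABOVE = 0.4


def char_counts(title):
    """Count how often each character occurs (lower-casing is an assumption)."""
    return Counter(title.lower())


def similarity_rating(title_a, title_b):
    """Distance between the two count arrays divided by the average title length."""
    counts_a, counts_b = char_counts(title_a), char_counts(title_b)
    # Number of characters whose counts differ between the two titles.
    distance = sum(
        1
        for ch in counts_a.keys() | counts_b.keys()
        if counts_a[ch] != counts_b[ch]
    )
    average_length = (len(title_a) + len(title_b)) / 2
    if average_length == 0:
        return 0.0  # two empty titles: nothing distinguishes them
    return distance / average_length


def classify(title_a, title_b):
    """Return 'identical', 'different', or 'ask user' for the in-between case."""
    rating = similarity_rating(title_a, title_b)
    if rating < IDENTICAL_BELOW:
        return "identical"
    if rating > DIFFERENT_ABOVE:
        return "different"
    return "ask user"
```

With these assumed thresholds, classify("Cormen", "Knuth") returns "different", while two copies of "Introduction to Algorithms" are classified as "identical". Because only character counts are compared, anagrams such as "listen" and "silent" get a rating of 0, which is one reason a scheme like this is best treated as a first approximation.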

RefKeep currently uses a very simple Similarity Detection algorithm. We are working on better methods. If you have any ideas, kindly email

Exporting/Importing Source Files of Paper Records

Paper Records can have Source Files associated with them. These source files usually contain the actual paper contents - for example, the PDF or HTML file containing the text and figures of the paper represented by the particular Paper Record. While exporting Paper Records that have associated source files, the user is presented with the dialog box shown below, in which the source files to be included in the ZIP file can be selected:

Figure 4.

Please note that some of the source files may be large; including them can significantly increase the size of the ZIP file created. The source files should therefore be selected only after noting their individual file sizes (given in brackets).

While importing Paper Records, the associated source files (if the exporting user chose to include them) are extracted to a sub-directory called paperStore, located in the directory containing the RefCollection .rcl file. If paperStore already contains a file with the same name as a file being imported, the user is given the option to overwrite the existing file, rename, or ignore (skip) the file.
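
The extraction step and its three conflict options can be illustrated with a short Python sketch. It is not RefKeep's implementation: the function names, the on_conflict parameter, the "_1", "_2" renaming pattern, and the idea of passing the source file names in explicitly are all assumptions made for the example, since the internal layout of the exported ZIP file is not described here.

```python
import zipfile
from pathlib import Path

CONFLICT_OPTIONS = ("overwrite", "rename", "ignore")


def unused_name(path):
    """Return a sibling path such as paper_1.pdf that does not exist yet."""
    counter = 1
    while True:
        candidate = path.with_name(f"{path.stem}_{counter}{path.suffix}")
        if not candidate.exists():
            return candidate
        counter += 1


def import_source_files(zip_path, members, collection_file, on_conflict="ignore"):
    """Extract the named members into the paperStore directory next to the .rcl file.

    Returns the list of paths actually written; ignored files are left out.
    """
    if on_conflict not in CONFLICT_OPTIONS:
        raise ValueError(f"on_conflict must be one of {CONFLICT_OPTIONS}")
    store = Path(collection_file).parent / "paperStore"
    store.mkdir(exist_ok=True)
    written = []
    with zipfile.ZipFile(zip_path) as archive:
        for member in members:
            # Keep only the file name so a member path cannot escape paperStore.
            target = store / Path(member).name
            if target.exists():
                if on_conflict == "ignore":
                    continue
                if on_conflict == "rename":
                    target = unused_name(target)
                # "overwrite" falls through and replaces the existing file.
            target.write_bytes(archive.read(member))
            written.append(target)
    return written
```

In an interactive program the on_conflict choice would come from the user for each clashing file; a single argument is used here only to keep the sketch short.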