Abstract

  • Existing approaches are structured and static, which makes it difficult to represent textual data.
  • This paper’s approach: semantic network to represent the data.
  • Uses graph theory and NLP approaches to train a classifier for assigning text documents to relevant classes.
    • This is based on conceptual network distances.
  • Potential to generalise to unseen classes with unseen concepts since the classifier is trained on conceptual distances.
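
A minimal sketch of the "conceptual network distance" idea from the bullets above, not the paper's implementation. Everything named here is an assumption made for illustration: the toy network `CONCEPT_NET`, the hop-count distance, the mean-of-nearest-distances feature, and the fixed `threshold`. The threshold stands in for the decision boundary that a trained classifier would learn.

```python
from collections import deque

# Hypothetical toy concept network, stored as undirected adjacency sets.
CONCEPT_NET = {
    "pump": {"leak", "motor"},
    "leak": {"pump", "seal"},
    "seal": {"leak"},
    "motor": {"pump", "bearing"},
    "bearing": {"motor"},
    "invoice": {"payment"},
    "payment": {"invoice"},
}


def distance(net, source, target):
    """Breadth-first shortest-path length in hops; None if unreachable."""
    if source not in net or target not in net:
        return None
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == target:
            return hops
        for neighbour in net[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return None


def mean_nearest_distance(net, doc_concepts, class_concepts):
    """Average, over document concepts, of the distance to the closest class concept."""
    nearest = []
    for concept in doc_concepts:
        reachable = [
            d for d in (distance(net, concept, k) for k in class_concepts)
            if d is not None
        ]
        if reachable:
            nearest.append(min(reachable))
    # No document concept reaches the class: treat as infinitely far.
    return sum(nearest) / len(nearest) if nearest else float("inf")


def is_related(net, doc_concepts, class_concepts, threshold=1.5):
    """Assumed decision rule: related if the distance feature is small enough."""
    return mean_nearest_distance(net, doc_concepts, class_concepts) <= threshold
```

Because the feature is a distance rather than a concept identity, the same rule applies to a class whose concepts never appeared in training, provided those concepts exist in the network. With the toy data, `is_related(CONCEPT_NET, ["leak", "motor"], ["pump"])` holds, while a document mentioning only `"invoice"` is unreachable from `"pump"` and is rejected.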

Conclusion

  • Core idea: linking structured data models with unstructured textual data.
  • Concept network is an intermediary between data schema (structured layer) and unstructured sources (which are themselves represented as an unstructured concept network).
  • Classifier uses the proposed layered network structure to calculate concept distances for text documents, then uses those distances to determine whether a document is related to a class.
  • Classifier assigns a text record to a class if the concepts in the record are semantically close enough to the concepts of the class in the concept networks.
  • The approach is tested on a dataset of work order records.
  • Classifier targets relatedness (whether a record and a class are related), not necessarily the class label itself.
  • Concepts are extracted from the data rather than defined by the user.
    • This means we can derive context from the training set instead of defining it ourselves.
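
A rough sketch of how concepts might be derived from the records themselves and tied back to a schema, under stated assumptions. These notes do not say how the paper extracts concepts or builds edges, so the whitespace tokeniser, the `STOPWORDS` list, the co-occurrence edges, the sample `RECORDS`, and the `SCHEMA_LINKS` mapping are all hypothetical stand-ins.

```python
from itertools import combinations

# Hypothetical stopword list; a real pipeline would use proper NLP tooling.
STOPWORDS = {"the", "a", "on", "of", "and", "in", "is"}

# Hypothetical work order records (unstructured layer).
RECORDS = [
    "Replaced the seal on the pump.",
    "Pump motor bearing is worn.",
]

# Hypothetical links from schema classes (structured layer) to concepts.
SCHEMA_LINKS = {"PumpAsset": {"pump", "impeller"}}


def extract_concepts(text):
    """Lower-case, strip punctuation, drop stopwords; return sorted unique tokens."""
    tokens = [t.strip(".,;:").lower() for t in text.split()]
    return sorted({t for t in tokens if t and t not in STOPWORDS})


def build_concept_network(records):
    """Connect every pair of concepts that co-occur in the same record."""
    net = {}
    for text in records:
        concepts = extract_concepts(text)
        for concept in concepts:
            net.setdefault(concept, set())  # keep concepts that occur alone
        for a, b in combinations(concepts, 2):
            net[a].add(b)
            net[b].add(a)
    return net


def class_concepts(schema_links, net, class_name):
    """Concepts linked to a schema class that were actually seen in the data."""
    return {c for c in schema_links.get(class_name, set()) if c in net}


network = build_concept_network(RECORDS)
pump_concepts = class_concepts(SCHEMA_LINKS, network, "PumpAsset")
```

The concept network here is never written by hand: its nodes and edges come entirely from `RECORDS`, and the schema layer only points into it. A schema link to a concept that never occurs in the data (`"impeller"` in this toy example) is simply dropped.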

Introduction