Publications

Displaying 1 - 6 of 6
  • Galke, L., Vagliano, I., Franke, B., Zielke, T., & Scherp, A. (2023). Lifelong learning on evolving graphs under the constraints of imbalanced classes and new classes. Neural networks, 164, 156-176. doi:10.1016/j.neunet.2023.04.022.

    Abstract

    Lifelong graph learning deals with the problem of continually adapting graph neural network (GNN) models to changes in evolving graphs. We address two critical challenges of lifelong graph learning in this work: dealing with new classes and tackling imbalanced class distributions. The combination of these two challenges is particularly relevant since newly emerging classes typically resemble only a tiny fraction of the data, adding to the already skewed class distribution. We make several contributions: First, we show that the amount of unlabeled data does not influence the results, which is an essential prerequisite for lifelong learning on a sequence of tasks. Second, we experiment with different label rates and show that our methods can perform well with only a tiny fraction of annotated nodes. Third, we propose the gDOC method to detect new classes under the constraint of having an imbalanced class distribution. The critical ingredient is a weighted binary cross-entropy loss function to account for the class imbalance. Moreover, we demonstrate combinations of gDOC with various base GNN models such as GraphSAGE, Simplified Graph Convolution, and Graph Attention Networks. Lastly, our k-neighborhood time difference measure provably normalizes the temporal changes across different graph datasets. With extensive experimentation, we find that the proposed gDOC method is consistently better than a naive adaption of DOC to graphs. Specifically, in experiments using the smallest history size, the out-of-distribution detection score of gDOC is 0.09 compared to 0.01 for DOC. Furthermore, gDOC achieves an Open-F1 score, a combined measure of in-distribution classification and out-of-distribution detection, of 0.33 compared to 0.25 of DOC (32% increase).

    Additional information

    Link to preprint version code datasets
  • Galke, L., Franke, B., Zielke, T., & Scherp, A. (2021). Lifelong learning of graph neural networks for open-world node classification. In Proceedings of the 2021 International Joint Conference on Neural Networks (IJCNN). Piscataway, NJ: IEEE. doi:10.1109/IJCNN52387.2021.9533412.

    Abstract

    Graph neural networks (GNNs) have emerged as the standard method for numerous tasks on graph-structured data such as node classification. However, real-world graphs are often evolving over time and even new classes may arise. We model these challenges as an instance of lifelong learning, in which a learner faces a sequence of tasks and may take over knowledge acquired in past tasks. Such knowledge may be stored explicitly as historic data or implicitly within model parameters. In this work, we systematically analyze the influence of implicit and explicit knowledge. Therefore, we present an incremental training method for lifelong learning on graphs and introduce a new measure based on k-neighborhood time differences to address variances in the historic data. We apply our training method to five representative GNN architectures and evaluate them on three new lifelong node classification datasets. Our results show that no more than 50% of the GNN's receptive field is necessary to retain at least 95% accuracy compared to training over the complete history of the graph data. Furthermore, our experiments confirm that implicit knowledge becomes more important when fewer explicit knowledge is available.
  • Galke, L., Seidlmayer, E., Lüdemann, G., Langnickel, L., Melnychuk, T., Förstner, K. U., Tochtermann, K., & Schultz, C. (2021). COVID-19++: A citation-aware Covid-19 dataset for the analysis of research dynamics. In Y. Chen, H. Ludwig, Y. Tu, U. Fayyad, X. Zhu, X. Hu, S. Byna, X. Liu, J. Zhang, S. Pan, V. Papalexakis, J. Wang, A. Cuzzocrea, & C. Ordonez (Eds.), Proceedings of the 2021 IEEE International Conference on Big Data (pp. 4350-4355). Piscataway, NJ: IEEE.

    Abstract

    COVID-19 research datasets are crucial for analyzing research dynamics. Most collections of COVID-19 research items do not to include cited works and do not have annotations
    from a controlled vocabulary. Starting with ZB MED KE data on COVID-19, which comprises CORD-19, we assemble a new dataset that includes cited work and MeSH annotations for all records. Furthermore, we conduct experiments on the analysis of research dynamics, in which we investigate predicting links in a co-annotation graph created on the basis of the new dataset. Surprisingly, we find that simple heuristic methods are better at
    predicting future links than more sophisticated approaches such as graph neural networks.
  • Melnychuk, T., Galke, L., Seidlmayer, E., Förster, K. U., Tochtermann, K., & Schultz, C. (2021). Früherkennung wissenschaftlicher Konvergenz im Hochschulmanagement. Hochschulmanagement, 16(1), 24-28.

    Abstract

    It is crucial for universities to recognize early signals of scientific convergence. Scientific convergence describes a dynamic pattern where the distance between different fields of knowledge shrinks over time. This knowledge
    space is beneficial to radical innovations and new promising research topics. Research in converging areas of knowledge can therefore allow universities to establish a leading position in the science community.
    The Q-AKTIV project develops a new approach on the basis of machine learning to identify scientific convergence at an early stage. In this work, we briefly present this approach and the first results of empirical validation. We discuss the benefits of an instrument building on our approach for the strategic management of universities and
    other research institutes.
  • Galke, L., Mai, F., Schelten, A., Brunch, D., & Scherp, A. (2017). Using titles vs. full-text as source for automated semantic document annotation. In O. Corcho, K. Janowicz, G. Rizz, I. Tiddi, & D. Garijo (Eds.), Proceedings of the 9th International Conference on Knowledge Capture (K-CAP 2017). New York: ACM.

    Abstract

    We conduct the first systematic comparison of automated semantic
    annotation based on either the full-text or only on the title metadata
    of documents. Apart from the prominent text classification baselines
    kNN and SVM, we also compare recent techniques of Learning
    to Rank and neural networks and revisit the traditional methods
    logistic regression, Rocchio, and Naive Bayes. Across three of our
    four datasets, the performance of the classifications using only titles
    reaches over 90% of the quality compared to the performance when
    using the full-text.
  • Galke, L., Saleh, A., & Scherp, A. (2017). Word embeddings for practical information retrieval. In M. Eibl, & M. Gaedke (Eds.), INFORMATIK 2017 (pp. 2155-2167). Bonn: Gesellschaft für Informatik. doi:10.18420/in2017_215.

    Abstract

    We assess the suitability of word embeddings for practical information retrieval scenarios. Thus, we assume that users issue ad-hoc short queries where we return the first twenty retrieved documents after applying a boolean matching operation between the query and the documents. We compare the performance of several techniques that leverage word embeddings in the retrieval models to compute the similarity between the query and the documents, namely word centroid similarity, paragraph vectors, Word Mover’s distance, as well as our novel inverse document frequency (IDF) re-weighted word centroid similarity. We evaluate the performance using the ranking metrics mean average precision, mean reciprocal rank, and normalized discounted cumulative gain. Additionally, we inspect the retrieval models’ sensitivity to document length by using either only the title or the full-text of the documents for the retrieval task. We conclude that word centroid similarity is the best competitor to state-of-the-art retrieval models. It can be further improved by re-weighting the word frequencies with IDF before aggregating the respective word vectors of the embedding. The proposed cosine similarity of IDF re-weighted word vectors is competitive to the TF-IDF baseline and even outperforms it in case of the news domain with a relative percentage of 15%.

Share this page