The success of language models based on the Transformer architecture appears to be inconsistent with observed anisotropic properties of representations learned by such models. We resolve this by showing, contrary to previous studies, that the representations do not occupy a narrow cone, but rather drift in common directions.
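The distinction can be made concrete on synthetic data (an illustrative sketch, not the paper's experiment): vectors sharing a common drift direction show high average cosine similarity, the usual anisotropy diagnostic of a "narrow cone", yet become nearly isotropic once the mean is removed.

```python
# Sketch with synthetic vectors: a shared drift direction produces high
# average cosine similarity ("narrow cone"), which largely disappears
# after mean-centering. Dimensions and magnitudes are illustrative.
import numpy as np

rng = np.random.default_rng(0)
drift = rng.normal(size=768) * 5.0              # common drift direction
vecs = rng.normal(size=(1000, 768)) + drift     # isotropic noise + drift

def avg_cosine(x):
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    sims = x @ x.T
    # Average pairwise cosine similarity, excluding the diagonal of ones.
    return (sims.sum() - len(x)) / (len(x) * (len(x) - 1))

print(avg_cosine(vecs))                  # close to 1: looks like a narrow cone
print(avg_cosine(vecs - vecs.mean(0)))   # near 0 once the drift is removed
```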
In this paper we investigate how the BERT architecture and its pre-training protocol affect the geometry of its embeddings and the effectiveness of its features for classification tasks. As an auto-encoding model, BERT produces representations during pre-training that are context dependent and at the same time must be able to “reconstruct” the original input sentences. The complex interaction of these two requirements within the Transformer layers leads to interesting geometric properties of the embeddings and subsequently affects the inherent discriminability of the resulting representations. Our experimental results illustrate that BERT models do not produce “effective” contextualized representations for words.
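For reference, contextualized token representations of the kind analyzed here can be extracted as follows; this is a minimal sketch using the HuggingFace `transformers` API (an assumption about tooling, as the paper's own extraction pipeline is not specified).

```python
# Minimal sketch: extract contextualized token embeddings from a
# pre-trained BERT model; the same surface form "bank" receives a
# different vector in each sentence context.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

sentences = ["The bank raised interest rates.", "She sat on the river bank."]
batch = tokenizer(sentences, return_tensors="pt", padding=True)
with torch.no_grad():
    hidden = model(**batch).last_hidden_state   # (batch, seq_len, 768)

bank_id = tokenizer.convert_tokens_to_ids("bank")
for i, sent in enumerate(sentences):
    idx = batch.input_ids[i].tolist().index(bank_id)
    print(sent, "->", hidden[i, idx, :4])       # first few dimensions
```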
In this paper, we propose two deep-learning-based models for supervised word sense disambiguation (WSD): a model based on a bidirectional long short-term memory (BiLSTM) network, and an attention model based on the self-attention architecture.
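A minimal sketch of the BiLSTM variant follows (illustrative vocabulary, dimensions, and sense inventory; not the authors' exact architecture): the sense of an ambiguous target word is predicted from the bidirectional LSTM state at the target position.

```python
# Sketch of a BiLSTM sense classifier: encode the sentence, take the
# hidden state at the target word's position, and classify its sense.
import torch
import torch.nn as nn

class BiLSTMWSD(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=128, hidden=256, n_senses=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.classify = nn.Linear(2 * hidden, n_senses)

    def forward(self, token_ids, target_pos):
        # token_ids: (batch, seq_len); target_pos: (batch,) index of the target word
        states, _ = self.lstm(self.embed(token_ids))   # (batch, seq, 2*hidden)
        target = states[torch.arange(token_ids.size(0)), target_pos]
        return self.classify(target)                   # (batch, n_senses) logits

model = BiLSTMWSD()
logits = model(torch.randint(0, 10000, (2, 12)), torch.tensor([3, 7]))
print(logits.shape)  # torch.Size([2, 5])
```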
In this work, we derive the learning rules for the skip-gram model and establish their close relationship to competitive learning. In addition, we derive constraints that characterize the globally optimal solution of the skip-gram model and validate them experimentally.
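For context, the learning rule in question can be written down from the standard softmax formulation of skip-gram (Mikolov et al.); whether the paper uses exactly this parameterization is an assumption.

```latex
% Standard skip-gram softmax objective and the resulting gradient with
% respect to the input vector of the center word w_I.
\[
p(w_O \mid w_I)
  = \frac{\exp\!\big({v'_{w_O}}^{\top} v_{w_I}\big)}
         {\sum_{w \in V} \exp\!\big({v'_{w}}^{\top} v_{w_I}\big)},
\qquad
\frac{\partial \log p(w_O \mid w_I)}{\partial v_{w_I}}
  = v'_{w_O} - \sum_{w \in V} p(w \mid w_I)\, v'_{w}.
\]
```

The update pulls the input vector toward the output vector of the observed context word and pushes it away from the probability-weighted average of all output vectors, which is suggestive of the competitive-learning connection the abstract refers to.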
In this paper, we propose a novel deep neural network architecture for supervised medical word sense disambiguation.
Predicting the risk of mortality for patients with acute myocardial infarction (AMI) from electronic health record (EHR) data can help identify high-risk patients who might need more tailored care. Our prior work used only the structured clinical data from MIMIC-III. In this study, we enhance that work by adding word embedding features derived from free-text discharge summaries. The average accuracy of our deep learning models was 92.89% and the average F-measure was 0.928.
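A minimal sketch of the feature-combination idea (hypothetical dimensions and embedding table; not the authors' exact pipeline): averaged word-embedding features from a discharge summary are concatenated with structured clinical features and fed to a feed-forward classifier.

```python
# Sketch: average word embeddings from discharge-summary tokens,
# concatenate with structured clinical features, and predict a
# mortality-risk logit with a small feed-forward network.
import numpy as np
import torch
import torch.nn as nn

EMB_DIM, N_STRUCT = 100, 30
# Hypothetical embedding table; in practice a pre-trained vocabulary.
embeddings = {w: np.random.randn(EMB_DIM) for w in ["chest", "pain", "troponin"]}

def summary_features(tokens):
    # Mean of known token embeddings; zeros if none are in the vocabulary.
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    return np.mean(vecs, axis=0) if vecs else np.zeros(EMB_DIM)

class MortalityNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(EMB_DIM + N_STRUCT, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, text_feats, struct_feats):
        return self.net(torch.cat([text_feats, struct_feats], dim=-1))  # logit

text = torch.tensor(summary_features(["chest", "pain"]), dtype=torch.float32)
struct = torch.randn(N_STRUCT)
print(torch.sigmoid(MortalityNet()(text, struct)))  # predicted risk
```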