In this paper, we propose a novel deep neural network architecture for supervised medical word sense disambiguation. Our architecture is based on a layered bidirectional LSTM network, upon which max-pooling over multiple time steps is performed to create a dense representation of the context. In addition, we introduce four different adjustments to the output of the LSTM in order to find the most suitable input form for the max-pooling layer. Results show that the best model outperforms the current state-of-the-art model on the MSH WSD dataset. Moreover, we also train a "universal" network to disambiguate all the target ambiguous words together. In the universal network, we concatenate the embedding of the ambiguous word to the max-pooled vector as a "hint" layer. Results show that the universal network achieves a test accuracy of nearly 90 percent.
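As a rough illustration of the architecture described above, the following PyTorch sketch shows a bidirectional LSTM whose outputs are max-pooled over time, with the ambiguous word's embedding optionally concatenated to the pooled vector as the "hint" layer of the universal network. The class name, hyperparameters, and layer sizes are illustrative assumptions, not the configuration used in the paper.

```python
import torch
import torch.nn as nn

class BiLSTMMaxPoolWSD(nn.Module):
    """Sketch: layered BiLSTM + max-pooling over time steps for WSD.
    All hyperparameters below are placeholders, not the paper's values."""

    def __init__(self, vocab_size, emb_dim=100, hidden_dim=128,
                 num_layers=2, num_senses=2, use_hint=False):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, num_layers=num_layers,
                            bidirectional=True, batch_first=True)
        self.use_hint = use_hint
        clf_in = 2 * hidden_dim + (emb_dim if use_hint else 0)
        self.classifier = nn.Linear(clf_in, num_senses)

    def forward(self, context_ids, target_ids=None):
        # context_ids: (batch, seq_len); target_ids: (batch,) index of the
        # ambiguous word in the vocabulary (used only for the "hint" layer).
        emb = self.embedding(context_ids)              # (batch, seq_len, emb_dim)
        out, _ = self.lstm(emb)                        # (batch, seq_len, 2*hidden_dim)
        pooled, _ = out.max(dim=1)                     # max-pool over time steps
        if self.use_hint and target_ids is not None:
            hint = self.embedding(target_ids)          # (batch, emb_dim)
            pooled = torch.cat([pooled, hint], dim=1)  # concatenate the "hint"
        return self.classifier(pooled)                 # sense logits
```

In this sketch, the per-word models would set `use_hint=False` with a word-specific sense inventory, while the universal network would set `use_hint=True` and share one classifier across all target words.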