Poverty, rurality, and disability are the three major barriers to achieving Universal Health Coverage (UHC). The advent of Information and Communication Technologies (ICT) and remote healthcare systems plays a significant role in reaching unreached communities, addressing the issues of rurality and poverty. However, persons with disabilities (PWDs), especially people with speech and hearing impairments, find it difficult to participate in remote healthcare systems because they cannot communicate with a remote doctor. This paper introduces the design of a 'Sign Language Transformer (SLT)' that enables patients who know sign language to communicate with a remote doctor who cannot interpret such signs. The primary functions of this SLT are to recognize signs/gestures from video images and translate them into both text and speech (SLTT), and to translate the doctor's speech into sign language (STSL). Sign representation of words and sentences requires hand gestures, movement, and orientation. Several techniques, such as the two-stream CNN, the two-stream 3D CNN, the LSTM, the 3D CNN + ConvLSTM, the 3D CNN, and the 3D CNN + LSTM, are commonly used to recognize human gestures. The proposed SLT model will evaluate the performance of these techniques in transforming Bangla Sign Language and recommend the most suitable one for designing a sign language transformer.
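To illustrate the spatiotemporal feature extraction that underlies the 3D CNN variants listed above, the following minimal NumPy sketch implements a single-channel 3D convolution over a video clip. This is an illustrative sketch only, not the paper's implementation; the function and variable names are hypothetical, and a real model would stack many such filters with learned weights, nonlinearities, and pooling.

```python
import numpy as np

def conv3d(video, kernel):
    """Valid (no-padding) 3D convolution of a (T, H, W) clip with a
    (t, h, w) kernel, capturing joint temporal and spatial structure."""
    T, H, W = video.shape
    t, h, w = kernel.shape
    out = np.zeros((T - t + 1, H - h + 1, W - w + 1))
    for i in range(out.shape[0]):          # slide over frames (time)
        for j in range(out.shape[1]):      # slide over rows (space)
            for k in range(out.shape[2]):  # slide over columns (space)
                out[i, j, k] = np.sum(video[i:i+t, j:j+h, k:k+w] * kernel)
    return out

# Hypothetical input: a 16-frame grayscale clip of 32x32 pixels.
clip = np.random.rand(16, 32, 32)
kern = np.random.rand(3, 3, 3)  # a 3x3x3 spatiotemporal filter
feat = conv3d(clip, kern)
print(feat.shape)  # (14, 30, 30)
```

Because the kernel spans frames as well as pixels, each output value responds to motion across adjacent frames, which is what distinguishes 3D CNNs from frame-by-frame 2D CNNs for gesture recognition.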