Research on physical human-robot interaction (pHRI) is expected to enable competitive and cooperative human-robot tasks that involve force interaction. However, compared with human-human interaction, current pHRI approaches lack tactical considerations: they neither estimate intentions from human behavior nor select policies suited to an opponent's changing policy. We therefore propose a reinforcement learning model that estimates the opponent's changing policy from time-series observations and represents the agent's policy in a common latent space, drawing on descriptions of tactics in open-skill sports. We evaluate the agent in two novel physical, competitive environments: a push-hands game and air hockey. The results confirm that the latent space properly encodes policy information, since the latent variables representing the agent's own policy and the opponent's policy each influence the agent's behavior. Together, the two latent variables clearly express how the agent estimates the opponent's policy and selects its own.
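The mechanism described above can be illustrated with a minimal sketch. This is not the paper's implementation: the encoder, weights, dimensions, and counter-policy rule below are hypothetical stand-ins for learned components, shown only to make the two-latent structure concrete (one latent estimated from observed opponent behavior, one representing the agent's own policy, both conditioning the action).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions; the paper's actual sizes are not specified here.
OBS_DIM, LATENT_DIM, ACT_DIM = 6, 2, 2

# Hypothetical fixed weights; in the proposed model these would be
# learned jointly with the policy via reinforcement learning.
W_enc = rng.normal(scale=0.1, size=(LATENT_DIM, OBS_DIM))
W_pi = rng.normal(scale=0.1, size=(ACT_DIM, OBS_DIM + 2 * LATENT_DIM))

def estimate_opponent_latent(obs_history, decay=0.9):
    """Encode a time series of observations into an opponent-policy latent
    via an exponentially weighted recurrence (a stand-in for a learned
    recurrent encoder)."""
    z = np.zeros(LATENT_DIM)
    for obs in obs_history:
        z = decay * z + (1 - decay) * np.tanh(W_enc @ obs)
    return z

def act(obs, z_self, z_opp):
    """Policy conditioned on the current observation and both latents:
    the agent's own policy latent and the estimated opponent latent."""
    features = np.concatenate([obs, z_self, z_opp])
    return np.tanh(W_pi @ features)

# Usage: the opponent latent is re-estimated each step from recent
# observations, while the agent's own latent selects a response in the
# shared latent space (the negation below is a toy choice, not the
# paper's selection rule).
history = [rng.normal(size=OBS_DIM) for _ in range(10)]
z_opp = estimate_opponent_latent(history)
z_self = -z_opp
action = act(history[-1], z_self, z_opp)
```

Because both latents live in the same space, inspecting them side by side shows how the agent's policy choice tracks its estimate of the opponent.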