So we want to recover the tag sequence T which is most likely given the
observed word sequence W, i.e. the one which maximises p(T|W). There's no
obvious way to do this directly, but Bayes' rule tells us that

    p(T|W) = p(T) * p(W|T) / p(W)
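To make the arithmetic concrete, here is a small Python sketch with made-up numbers. It assumes (hypothetically) that only two candidate tag sequences for W = "time flies" are in play, so p(W) is just the sum of p(T) * p(W|T) over those two. A real p(W) would sum over every possible T. The names prior, likelihood and posterior are illustrative only.

```python
# Hypothetical numbers for W = "time flies"; only two candidate tag
# sequences are considered here, purely for illustration.
prior = {            # p(T), made-up values
    ("NN", "VBZ"): 0.02,
    ("VB", "NNS"): 0.01,
}
likelihood = {       # p(W|T), made-up values
    ("NN", "VBZ"): 0.003,
    ("VB", "NNS"): 0.004,
}

def posterior(prior, likelihood):
    """Apply Bayes' rule: p(T|W) = p(T) * p(W|T) / p(W)."""
    joint = {t: prior[t] * likelihood[t] for t in prior}
    # p(W) is the sum of p(T) * p(W|T) over the candidate sequences T
    # (here only the two candidates listed above).
    p_w = sum(joint.values())
    return {t: j / p_w for t, j in joint.items()}

post = posterior(prior, likelihood)
for tags, p in post.items():
    print(tags, round(p, 3))
```

With these numbers the posteriors come out at 0.6 and 0.4. The sequence with the larger numerator also has the larger posterior, since dividing by p(W) rescales every candidate by the same amount.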
So we want to find the sequence T which maximises THAT value, and
that's not too hard: p(W) doesn't change as we change T, so we just
need to maximise the numerator, p(T) * p(W|T). A simple tagger might
do this by using observed tag bigram frequencies to estimate p(T)
(i.e. approximating it as the product of the p(t[i]|t[i-1])), and
similarly [finally he gets to the answer to your question!] using the
product of the frequency-based estimates of the individual
p(w[i]|t[i]) to estimate p(W|T).
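Here's a rough Python sketch of that simple tagger, under some assumptions of my own: a tiny hypothetical hand-tagged corpus, a made-up start pseudo-tag "<s>", plain relative-frequency estimates with no smoothing, and brute-force enumeration of every tag sequence. That last is only sensible for toy inputs; a real tagger would find the maximising T with dynamic programming (the Viterbi algorithm) instead.

```python
from collections import Counter
from itertools import product

# Hypothetical toy training corpus of (word, tag) sentences.
corpus = [
    [("the", "DT"), ("dog", "NN"), ("runs", "VBZ")],
    [("the", "DT"), ("runs", "NNS"), ("end", "VBP")],
    [("dogs", "NNS"), ("run", "VBP")],
]

START = "<s>"  # assumed sentence-start pseudo-tag

bigrams = Counter()      # counts of (previous tag, tag)
prev_counts = Counter()  # how often each tag occurs as the previous tag
emissions = Counter()    # counts of (tag, word)
tag_counts = Counter()   # counts of each tag

for sent in corpus:
    prev = START
    for word, tag in sent:
        bigrams[(prev, tag)] += 1
        prev_counts[prev] += 1
        emissions[(tag, word)] += 1
        tag_counts[tag] += 1
        prev = tag

def p_tags(tags):
    """Estimate p(T) as the product of bigram estimates p(t[i] | t[i-1])."""
    p, prev = 1.0, START
    for t in tags:
        if prev_counts[prev] == 0:  # tag never seen with a successor
            return 0.0
        p *= bigrams[(prev, t)] / prev_counts[prev]
        prev = t
    return p

def p_words_given_tags(words, tags):
    """Estimate p(W|T) as the product of the individual p(w[i] | t[i])."""
    p = 1.0
    for w, t in zip(words, tags):
        p *= emissions[(t, w)] / tag_counts[t]
    return p

def best_tags(words):
    """Pick the T maximising p(T) * p(W|T); p(W) is the same for every T."""
    tagset = sorted(tag_counts)
    return max(
        product(tagset, repeat=len(words)),
        key=lambda tags: p_tags(tags) * p_words_given_tags(words, tags),
    )

print(best_tags(["the", "dog", "runs"]))
```

For example, best_tags(["the", "dog", "runs"]) picks ("DT", "NN", "VBZ") even though "runs" was also seen tagged NNS, because the bigram estimate for NN followed by NNS is zero in this corpus.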
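Needless to say, any table of word/tag pair frequencies will allow you
to derive estimates of either p(w|t) or p(t|w): divide the count for a
pair (w,t) by the total count for the tag t to get p(w|t), or by the
total count for the word w to get p(t|w). It just depends on what you
sum over what, as it were.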
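To see the "what you sum over what" point, here's a Python sketch that turns one hypothetical table of pair counts into both conditionals. The function name conditional and its given parameter are my own invention, not from any library.

```python
from collections import Counter

# Hypothetical word/tag pair frequencies, e.g. counted from a tagged corpus.
pair_counts = Counter({
    ("runs", "VBZ"): 3,
    ("runs", "NNS"): 1,
    ("dogs", "NNS"): 4,
})

def conditional(pair_counts, given):
    """Normalise pair counts into a conditional distribution keyed by (w, t).

    given="tag":  p(w|t) = count(w, t) / sum over w' of count(w', t)
    given="word": p(t|w) = count(w, t) / sum over t' of count(w, t')
    """
    if given not in ("tag", "word"):
        raise ValueError("given must be 'tag' or 'word'")
    totals = Counter()
    for (w, t), c in pair_counts.items():
        totals[t if given == "tag" else w] += c  # the marginal we divide by
    return {
        (w, t): c / totals[t if given == "tag" else w]
        for (w, t), c in pair_counts.items()
    }

p_w_given_t = conditional(pair_counts, given="tag")
p_t_given_w = conditional(pair_counts, given="word")
print(p_w_given_t[("runs", "NNS")], p_t_given_w[("runs", "NNS")])
```

With these counts p(runs|NNS) comes out at 0.2 (1 of the 5 NNS tokens) while p(NNS|runs) is 0.25 (1 of the 4 tokens of "runs"): same count, different denominator.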
Hope this helps,
ht