For a pure tone, the frequency is determined mechanically by the ear. The basilar membrane essentially plays the role of the diaphragm in a microphone, but it has a tapered shape and variable stiffness, so its resonant frequency changes with position. As a result, the membrane itself performs something close to a mechanical Fourier transform of the acoustic signal that enters the cochlea.
Positioned along the basilar membrane are 'hair cells' that detect vibration. Because of the mechanical nature of the membrane, a sound of a specific frequency will cause significant vibration only in one specific region, so only the cells in that region are activated. The auditory nerve provides a massively parallel connection from the many thousands of hair cells to the brain. There is a limited amount of signal processing before the signal from the cells reaches the auditory nerve, e.g. combining signals from a few adjacent cells onto a single nerve fiber.
So, in the case of a pure tone, the pitch has already been determined before the signal reaches the brain, because individual nerve fibers are tuned to specific frequencies. All the brain has to do is see which fibers are active, in much the same way as you can tell whether something has touched you on your thumb or your little finger, because the different regions are connected to the brain via different nerve fibers. There is a little more subtlety here, in that the tuning isn't perfect: the cells and membrane have a finite, albeit quite narrow, bandwidth, so a pure tone will activate a group of adjacent nerve fibers, and the brain would determine the pitch as being at the center of that group.
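The "center of the active group" idea can be sketched as a toy calculation. This is only an illustration under made-up assumptions, not a model of real cochlear tuning: the number of fibers, their log-spaced center frequencies, the Gaussian tuning shape and the `TUNING_WIDTH` value are all hypothetical.

```python
import numpy as np

# Hypothetical layout: 201 fibers with center frequencies log-spaced
# from 100 Hz to 10 kHz (not physiological values).
centers_hz = np.geomspace(100.0, 10_000.0, 201)
TUNING_WIDTH = 0.05  # assumed tuning width, in natural-log frequency units


def fiber_activation(tone_hz):
    """Toy tuning curves: Gaussian in log-frequency around each fiber's center."""
    distance = np.log(centers_hz) - np.log(tone_hz)
    return np.exp(-0.5 * (distance / TUNING_WIDTH) ** 2)


def place_pitch(activation):
    """Estimate pitch as the activation-weighted center of the active fibers."""
    log_centroid = np.sum(activation * np.log(centers_hz)) / np.sum(activation)
    return float(np.exp(log_centroid))


activation = fiber_activation(440.0)
active_count = int(np.sum(activation > 0.1))  # several neighbours respond, not one
estimate_hz = place_pitch(activation)
print(active_count, round(estimate_hz, 1))
```

Even though no single fiber in this toy layout is centered exactly on the tone, the weighted center of the active group lands very close to it, which is the point being made above.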
There is a more fundamental problem when talking about determination of pitch (or frequency) and latency: uncertainty. The precision to which a frequency can be measured is limited by the duration of the signal (or of the observation); roughly speaking, the achievable frequency resolution is on the order of the reciprocal of the duration. If you suddenly switch on a sine wave, the abrupt onset is a discontinuity in the waveform's amplitude envelope, and it spreads energy across a very wide range of frequencies. The more gradually the signal is switched on (or off), the smaller the discontinuity, and the narrower the band of frequencies that it contains. This is a fundamental mathematical problem, not a biological one. Measuring the latency of pitch determination is therefore very difficult: you must switch the sound on gradually, otherwise you spray a whole range of frequencies into the ear and cannot be sure that what you are measuring is genuine. It's been a very long time since I did my research into pitch processing in the ear, but I seem to recall that we used 50 ms attack/decay ramps precisely to avoid this 'spectral splatter' problem.
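To get a feel for the size of the effect, here is a rough numerical sketch comparing an abruptly gated tone with one that has 50 ms ramps. The sample rate, tone frequency, duration, raised-cosine ramp shape and the ±200 Hz "on-band" region are all my own assumptions for illustration; only the 50 ms figure comes from the recollection above.

```python
import numpy as np

FS = 48_000          # assumed sample rate, Hz
TONE_HZ = 1_000.0    # assumed tone frequency
DURATION_S = 0.2     # assumed total tone duration


def gated_tone(ramp_s):
    """Sine tone with raised-cosine attack/decay ramps (ramp_s = 0 means abrupt)."""
    t = np.arange(int(FS * DURATION_S)) / FS
    x = np.sin(2 * np.pi * TONE_HZ * t)
    n_ramp = int(FS * ramp_s)
    if n_ramp > 0:
        ramp = 0.5 * (1 - np.cos(np.pi * np.arange(n_ramp) / n_ramp))
        x[:n_ramp] *= ramp          # gradual switch-on
        x[-n_ramp:] *= ramp[::-1]   # gradual switch-off
    return x


def off_band_fraction(x, half_width_hz=200.0):
    """Fraction of signal energy lying more than half_width_hz from the tone."""
    # Surround the tone with silence so the onset and offset are analysed too.
    padded = np.concatenate([np.zeros(len(x)), x, np.zeros(len(x))])
    power = np.abs(np.fft.rfft(padded)) ** 2
    freqs = np.fft.rfftfreq(len(padded), d=1 / FS)
    off_band = np.abs(freqs - TONE_HZ) > half_width_hz
    return power[off_band].sum() / power.sum()


abrupt = off_band_fraction(gated_tone(0.0))
ramped = off_band_fraction(gated_tone(0.05))
print(f"abrupt: {abrupt:.2e}  ramped: {ramped:.2e}")
```

With these settings the abruptly gated tone puts far more of its energy away from 1000 Hz than the ramped one does, which is the 'spectral splatter' in numerical form.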
What is much more interesting is how the brain determines pitch for a sound without a single defined frequency. For example, a signal containing components at 1000 Hz, 1500 Hz and 2000 Hz is heard with a pitch of 500 Hz, even though there is no 500 Hz component (the 'missing fundamental' effect). That's a much more complicated problem, and I'm not sure that it is one that has been fully solved.
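As a purely signal-processing illustration (not a claim about what the brain actually does), an autocorrelation of that three-component signal picks out the 500 Hz periodicity even though the spectrum has nothing at 500 Hz. The sample rate, signal length and search range below are arbitrary assumptions, and `autocorr_pitch` is a hypothetical helper written for this example.

```python
import numpy as np

FS = 48_000                       # assumed sample rate, Hz
COMPONENTS_HZ = (1000.0, 1500.0, 2000.0)


def autocorr_pitch(signal, fs, fmin=50.0, fmax=3000.0):
    """Return the frequency whose period gives the largest autocorrelation."""
    signal = signal - np.mean(signal)
    n = len(signal)
    # Full autocorrelation; keep only the non-negative lags.
    acf = np.correlate(signal, signal, mode="full")[n - 1:]
    lag_min = int(fs / fmax)      # skip the zero-lag peak
    lag_max = int(fs / fmin)
    best_lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
    return fs / best_lag


t = np.arange(int(0.1 * FS)) / FS                      # 100 ms of signal
signal = sum(np.sin(2 * np.pi * f * t) for f in COMPONENTS_HZ)
pitch_hz = autocorr_pitch(signal, FS)
print(pitch_hz)
```

The waveform repeats every 2 ms because all three components complete a whole number of cycles in that time, so the strongest non-zero autocorrelation lag corresponds to 500 Hz. Whether the auditory system uses anything like this is exactly the open question.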