Any libraries or methods for syncing audio with text?

I'm making an ebook for practice, and I want to highlight the text that corresponds to the audio. Both the audio and the text are in Korean. Any ideas or thoughts?