Comparison of Three Tones of the Vowel /e/ for Lip-Sync Animation

Anung Rachman, Risanuri Hidayat, Hanung Adi Nugroho

Abstract


Lip-sync animation technology is currently developing rapidly alongside the growth of the creative industry. A method often used to produce lip movement is the phoneme-to-viseme map. Phoneme-to-viseme maps were initially built on established standard phoneme inventories, but these inventories have since evolved to meet specific needs. Vowel phonemes play the largest role because speech energy is concentrated in them. Because vowel pronunciation varies widely, the set of vowel phonemes included in such maps also varies. This variation affects the accuracy of the phoneme-to-viseme map and, in turn, the accuracy of the animated lip movement. This paper examines the significance of the differences among three tones of the Indonesian vowel /e/, in terms of both audio features and visual features, to support a standard vowel inventory for phoneme-to-viseme maps. The methods used are an LPC filter to extract formant-frequency features, Par-CLR to extract visual features, and statistical tests to determine the significance of the differences. The results show that some of these tones differ significantly from one another. A phoneme-to-viseme map will therefore be more accurate if it includes all three /e/ tones.
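The LPC-based formant extraction mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the LPC order, sample rate, windowing, or root-selection thresholds, so every parameter below is an assumption, and the function names `lpc_polynomial`, `estimate_formants`, and `synthesize_vowel` are hypothetical helpers written for this illustration, not the authors' implementation. The sketch uses the standard autocorrelation method and reads formant candidates from the angles of the LPC polynomial roots; a synthetic signal stands in for recorded speech.

```python
import numpy as np


def lpc_polynomial(frame, order):
    """Return A(z) = [1, -a_1, ..., -a_p] using the autocorrelation method."""
    windowed = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    n = len(windowed)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(windowed[: n - k], windowed[k:]) for k in range(order + 1)])
    # Solve the normal equations R a = r[1:], where R[i, j] = r[|i - j|].
    lags = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    a = np.linalg.solve(r[lags], r[1:])
    return np.concatenate(([1.0], -a))


def estimate_formants(frame, sample_rate, order=12,
                      min_hz=90.0, max_bandwidth_hz=400.0):
    """Estimate formant frequencies (Hz) from the roots of the LPC polynomial.

    The order and the two thresholds are assumed values, not taken from the paper.
    """
    roots = np.roots(lpc_polynomial(frame, order))
    roots = roots[np.imag(roots) > 0]  # keep one root per conjugate pair
    freqs = np.angle(roots) * sample_rate / (2 * np.pi)
    bandwidths = -np.log(np.abs(roots)) * sample_rate / np.pi
    keep = (freqs > min_hz) & (bandwidths < max_bandwidth_hz)
    return np.sort(freqs[keep])


def synthesize_vowel(formants_hz, sample_rate, n_samples,
                     bandwidth_hz=100.0, seed=0):
    """Hypothetical test signal: white noise through an all-pole resonator bank."""
    denom = np.array([1.0])
    for f in formants_hz:
        radius = np.exp(-np.pi * bandwidth_hz / sample_rate)
        theta = 2 * np.pi * f / sample_rate
        denom = np.convolve(denom, [1.0, -2 * radius * np.cos(theta), radius ** 2])
    excitation = np.random.default_rng(seed).standard_normal(n_samples)
    out = np.zeros(n_samples)
    for n in range(n_samples):
        acc = excitation[n]
        for k in range(1, len(denom)):
            if n - k >= 0:
                acc -= denom[k] * out[n - k]
        out[n] = acc
    return out


if __name__ == "__main__":
    # Illustrative resonances only; these are not measurements of any /e/ tone.
    signal = synthesize_vowel([500.0, 1800.0], sample_rate=8000, n_samples=4096)
    print(estimate_formants(signal, sample_rate=8000, order=4))
```

With real recordings, the per-tone formant estimates from many utterances would then be passed to a significance test; the abstract does not name which test was used.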

Keywords


Lip-Sync Animation; Phoneme-to-Viseme Map; Audio Features; Visual Features


DOI: http://dx.doi.org/10.26418/jp.v5i2.32667


Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.  