1.1. State-of-the-art study on bimodal corpus design
1.2. Inventory of Romanian language data collections available at partners or in third-party coalitions, and of their storage formats.
1.3. Functional and architectural design of the infrastructure that will host the consortium's resources and tools for processing and access and the realization of a prototype
1.4. Dissemination.
2.1. Creating the common infrastructure for storing bimodal resources and for processing and searching tools
2.2. Designing solutions for the harmonization of different representations of existing collections (annotations and metadata)
2.3. Creating format converters for the harmonization of different representations to a standard representation agreed upon within the consortium
2.4. Harmonization of existing collections
2.5. Dissemination
3.1. Increasing the size of the oral corpus with new recordings that duplicate texts from the CoRoLa corpus
3.2. Increasing the size of the bimodal corpus: metadata filling-in, alignment with the help of the algorithms developed in projects P2, P3 and P4 and manual and semiautomatic annotations of the bimodal corpus
3.3. Extracting statistics on the bimodal corpus
3.4. Designing applications for exploiting the bimodal corpus and the technologies for written and oral texts processing, created in projects P2, P3 and P4
3.5. Management and dissemination.
4.1. Other applications for the exploitation of the bimodal corpus and of the speech and text processing technologies developed in ReTeRom
4.2. Dissemination of the bimodal corpus
1.5. Defining the functional and architectural specifications of the integrated and configurable text processing platform
1.6. Defining the software modules and services offered by the project; identifying necessary adaptations for existing NLP modules and new modules needed
1.7. Making the necessary adaptations for the existing NLP modules identified in Activities 1.5 and 1.6
1.8. Creating and validating (possibly with necessary manual corrections) a bimodal corpus lexicon and incorporating it into the existing lexicon
1.9. Dissemination
2.6. Implementation of new modules conforming to the defined functional specifications
2.7. Phonetic transcription of the words from the validated lexicon
2.8. Implementation of the prototype for the integrated and configurable platform; testing, evaluating and validating the prototype
2.9. Processing the textual component of the bimodal corpus collected in project 1. Validation and correction of processing errors
2.10. Dissemination
3.6. Analysis of the errors of the ASR and TTS systems trained in projects 3 and 4 on the annotated and corrected bimodal corpus aggregated in project 1
3.7. Finalizing the development, testing and validation of the integrated and configurable platform for processing texts in Romanian; ready-to-use solution
3.8. Disseminating the results of TEPROLIN
4.3. Testing the dockerised TEPROLIN platform on new corpora
4.4. Dissemination of the dockerised TEPROLIN platform
1.10. Study of well-known methods on the use of complementary ASR systems for the automatic generation of annotations
1.11. Study of well-known methods for alignment of approximate transcripts with speech signal
1.12. Study of well-known methods for generating confidence scores for Automatic Speech Recognition (ASR)
1.13. Design and implementation of a basic solution for automatic speech annotation using complementary ASR systems
1.14. Dissemination
2.11. Designing and implementing a basic solution for filtering and aligning the approximate transcriptions with the speech signal
2.12. Designing and implementing a basic solution for generating ASR confidence scores
2.13. Enhancing the automatic speech recognition solution using complementary ASR systems
2.14. Dissemination
3.9. Analysis of the impact of using complementary ASRs for generating annotations within the context of improving ASR systems
3.10. Improving the solution for filtering and aligning approximate transcription with the speech signal
3.11. Improving the solution for generating confidence scores for ASR
3.12. Analysis of the impact of using approximate transcriptions for retraining ASR systems
3.13. Analysis of the impact of using confidence scores for filtering ASR transcriptions for retraining ASR systems
3.14. Dissemination
4.5. Management and dissemination
1.15. Identifying prosody patterns; highlighting correlations between text (morphology, syntax) and vocal signal
1.16. Identifying methods for automatic recognition and classification of the expression style in textual data sources
1.17. Analysis of the methods for automatic control and adaptation of the speakers' expressivity in the text-to-speech synthesis systems
1.18. Implementation of the automatic prosody control module
1.19. Dissemination
2.15. Implementing a module for identification of the speech style and expressivity level from text analysis
2.16. Implementing a module for the adaptation of the TTS system to a new speaker
2.17. Implementing a module for transplantation of a speaker’s prosody in the TTS system
2.18. Improving the prosody modelling and control component; software testing and validation/demonstration activities
2.19. Dissemination
3.15. Developing a new technology for adapting the synthetic voice to the style and expressivity of a new speaker
3.16. Developing a new method for quick adaptation of the synthetic voice using atypical audio data
3.17. Integrating a new technology and demonstrating it in the creation of human-computer interfaces for speech synthesis
3.18. Dissemination
4.6. Final evaluation and distribution of Project 4 technologies
4.7. Dissemination
Technical-Scientific Report for ReTeRom Phase I (2018).
TADARAV: Study of well-known methods on the use of complementary ASR systems for the automatic generation of annotations.
TADARAV: Study of well-known methods for alignment of approximate transcripts with speech signal.
TADARAV: Study of well-known methods for generating confidence scores for Automatic Speech Recognition (ASR).
SINTERO: Identifying prosody patterns; highlighting correlations between text (morphology, syntax) and vocal signal.
SINTERO: Identifying methods for automatic recognition and classification of the expression style in textual data sources.
SINTERO: Analysis of the methods for automatic control and adaptation of the speakers' expressivity in the text-to-speech synthesis systems.
SINTERO: Implementation of the automatic prosody control module.
COBILIRO: State-of-the-art study on bimodal corpus design.
COBILIRO: Inventory of Romanian language data collections available at partners or in third-party coalitions, and of their storage formats.
COBILIRO: Functional and architectural design of the infrastructure that will host the consortium's resources and tools for processing and access and the realization of a prototype.
DISSEMINATION: Dissemination and participation in technical-scientific events, including in the media.
TEPROLIN: Defining the functional and architectural specifications of the integrated and configurable text processing platform.
TEPROLIN: Defining the software modules and services offered by the project; identifying necessary adaptations for existing NLP modules and new modules needed.
TEPROLIN: Making the necessary adaptations for the existing NLP modules identified in Activities 1.5 and 1.6
ICIA: Web page launch.
COBILIRO: Creating the common infrastructure for storing bimodal resources and for processing and searching tools.
COBILIRO: Designing solutions for the harmonization of different representations of existing collections (annotations and metadata).
COBILIRO: Creating format converters for the harmonization of different representations to a standard representation agreed upon within the consortium.
COBILIRO: Harmonization of existing collections.
COBILIRO: Dissemination.
TEPROLIN: Implementation of new modules conforming to the defined functional specifications.
TEPROLIN: Implementation of the prototype for the integrated and configurable platform; testing, evaluating and validating the prototype.
TADARAV: Act 2.11 - Designing and implementing a basic solution for filtering and aligning the approximate transcriptions with the speech signal.
TADARAV: Act 2.12 - Designing and implementing a basic solution for generating ASR confidence scores.
TADARAV: Act 2.13 - Enhancing the automatic speech recognition solution using complementary ASR systems.
TADARAV: Act 2.14 - Dissemination.
TEPROLIN: Processing the textual component of the bimodal corpus collected in project 1. Validation and correction of processing errors.
SINTERO: Implementing a module for identification of the speech style and expressivity level from text analysis.
SINTERO: Implementing a module for the adaptation of the TTS system to a new speaker.
SINTERO: Implementing a module for transplantation of a speaker’s prosody in the TTS system.
SINTERO: Improving the prosody modelling and control component; software testing and validation/demonstration activities.
SINTERO: Dissemination.
Technical-Scientific Report for ReTeRom Phase II (2019).
TEPROLIN: Dissemination.
Events organized in the ReTeRom project
TEPROLIN: Romanian Portal of Language Technologies.
Scientific and technical report (2018 - September 2020)
COBILIRO: Increasing the size of the oral corpus with new recordings that duplicate texts from the CoRoLa corpus
COBILIRO: Increasing the size of the bimodal corpus: metadata filling-in, alignment with the help of the algorithms developed in projects P2, P3 and P4 and manual and semiautomatic annotations of the bimodal corpus
COBILIRO: Extracting statistics on the bimodal corpus
COBILIRO: Designing applications for exploiting the bimodal corpus and the technologies for written and oral texts processing, created in projects P2, P3 and P4
COBILIRO: Management and dissemination
TEPROLIN: Analysis of the errors of the ASR and TTS systems trained in projects 3 and 4 on the annotated and corrected bimodal corpus aggregated in project 1
TEPROLIN: Finalizing the development, testing and validation of the integrated and configurable platform for processing texts in Romanian; ready-to-use solution
TEPROLIN: Dissemination
TADARAV: Analysis of the impact of using complementary ASRs for generating annotations within the context of improving ASR systems
TADARAV: Analysis of the impact of using approximate transcriptions for retraining ASR systems
TADARAV: Analysis of the impact of using confidence scores for filtering ASR transcriptions for retraining ASR systems
TADARAV: Dissemination
SINTERO: Developing a new technology for adapting the synthetic voice to the style and expressivity of a new speaker
SINTERO: Developing a new method for quick adaptation of the synthetic voice using atypical audio data
SINTERO: Integrating a new technology and demonstrating it in the creation of human-computer interfaces for speech synthesis.
SINTERO: Dissemination
Scientific and technical report phase III
Resources and technologies for the development of human-computer interfaces in the Romanian language.
Final phase Workshop
COBILIRO: Other applications for the exploitation of the bimodal corpus and of the speech and text processing technologies developed in ReTeRom
COBILIRO: Dissemination of the bimodal corpus
TEPROLIN: Testing the dockerised TEPROLIN platform on new corpora
TEPROLIN: Dissemination of the dockerised TEPROLIN platform
SINTERO: Final evaluation and distribution of Project 4 technologies
SINTERO: Dissemination