Artificial Intelligence and Knowledge Processing - 5th International Conference, Proceedings
Publisher
Springer Science and Business Media Deutschland GmbH
Thesis author
Alexander Rodriguez-Lopez
Julián David Forero
Abstract
This work presents the design and validation of a semantic retrieval and natural language generation pipeline that supports molecular docking studies through compound recommendation and scientific literature contextualization. The pipeline integrates biomedical data processing, semantic indexing with FAISS, and language modeling with BioBERT and TinyLlama-1.1B-Chat to generate concise, informative responses. A curated scientific corpus was built from sources such as PubMed, and embeddings were generated to enable semantic similarity search. The system was tested using ellagic acid as the query compound; it retrieved relevant literature and presented detailed chemical information from the COCONUT database. This research also aligns with ongoing work by the Semill-IAS research group at Universidad del Rosario, which focuses on inclusive health technologies, in particular the application of artificial intelligence in health contexts. The proposed solution demonstrates the potential of combining machine learning models with biomedical informatics for accessible, data-driven decision support in health and drug discovery.
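The retrieve-then-generate flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a toy bag-of-words vector stands in for BioBERT embeddings, a brute-force cosine search stands in for a FAISS index, and the prompt is only assembled, not passed to TinyLlama. The corpus snippets and all function names (`embed`, `build_index`, `search`, `build_prompt`) are hypothetical and chosen for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical snippets standing in for curated PubMed abstracts.
CORPUS = [
    "Ellagic acid shows antioxidant activity in polyphenol-rich fruits.",
    "Molecular docking predicts binding poses of ligands to protein targets.",
    "Transformer language models generate fluent text from prompts.",
]


def embed(text):
    """Toy stand-in for a BioBERT embedding: L2-normalised bag-of-words."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {tok: c / norm for tok, c in counts.items()} if norm else {}


def similarity(a, b):
    """Inner product of two normalised sparse vectors (cosine similarity)."""
    return sum(weight * b.get(tok, 0.0) for tok, weight in a.items())


def build_index(docs):
    """Embed every document once; a real system would add these to FAISS."""
    return [embed(doc) for doc in docs]


def search(index, docs, query, k=2):
    """Return the k documents most similar to the query, best first."""
    q = embed(query)
    scored = [(similarity(q, vec), i) for i, vec in enumerate(index)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(docs[i], score) for score, i in scored[:k]]


def build_prompt(query, hits):
    """Assemble retrieved passages into a prompt for a chat language model."""
    context = "\n".join(f"- {doc}" for doc, _ in hits)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer concisely:"


index = build_index(CORPUS)
hits = search(index, CORPUS, "ellagic acid", k=1)
prompt = build_prompt("ellagic acid", hits)
print(prompt)
```

In a fuller pipeline, under the same assumptions, `embed` would be replaced by a dense encoder and `search` by an inner-product index over normalised vectors; the retrieval and prompt-assembly steps would keep the same shape.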