- Working on a new NLP-based product, similar to a retrieval-augmented generation application.
- Implementing the product's backend: data storage, services that convert any text file to markdown and then to a vector representation, text-to-speech services, the retrieval and generation systems, several smaller supporting services, and test code. (Python, FastAPI, PostgreSQL, SQLModel, vLLM, pytest)
- Fine-tuning transformer-encoder models for the retrieval system, such as ColBERT and bi-encoders with Matryoshka embeddings. (HuggingFace Transformers, PyTorch, TPU training, PyTorch Lightning)
- Phone number available on request by email
- carlesoctavianus@tuta.io
- Jakarta Barat, Jakarta, ID
Carles is a Mathematics graduate with a keen interest in building ML-powered products. He has experience with TensorFlow and PyTorch and has researched and developed models for NLP problems, with a specific focus on information retrieval. His work has received recognition: his team's project was named one of the best company capstone projects in Bangkit 2023. He also has experience deploying ML models on cloud platforms, as demonstrated in that capstone project. Currently, Carles is attending Generasi GIGIH 3.0 on the fullstack engineering track to deepen his understanding of software engineering and build better ML-powered products.
Experience
Kecilin is a SaaS company with a primary focus on data compression. Kecilin has secured $4 million in pre-seed funding from PT Mandiri Capital Indonesia.
- During my internship, I focused on creating reusable computer vision components and tools in Python. These components were used to build client applications for computer vision requests. The toolkit was similar in spirit to the roboflow/supervision API for computer vision tasks.
- Additionally, I created proof-of-concept projects that were presented and delivered to clients. For example, I developed a line counting feature to track the number of people entering or leaving a specific area, as well as a heatmap analysis feature to identify the most crowded areas each day.
Generasi GIGIH was designed by the GoTo Impact Foundation and GoTo to help young Indonesian technology talents keep pace with the rapidly growing technology industry. Through competency training and the right mindset, the program aims to produce critical, tenacious, resilient, and highly competitive technology talents capable of facing future challenges. It is expected to help accelerate digital transformation while making technology a positive agent of change in Indonesia. Carles completed the fullstack engineering track, which includes courses on frontend, backend, and DevOps, with MongoDB, Express, React, and Node.js (MERN) as the main stack.
- Completed self-paced courses on frontend and backend development through Progate.
- Attended a 6-week, 3-hour-per-day live session with experienced mentors and engineers from various tech companies.
- Created a final fullstack project, which is a clone of the Tokopedia Play web app.
- Completed a capstone project with other participants, in collaboration with Talensia, to digitalize the learning journal platform.
As a Bangkit student, I completed various courses and contributed to a capstone project, gaining experience in machine learning and cloud platforms.
- Completed the courses required for the TensorFlow Developer Certificate.
- Collaborated with five other students and Dicoding Indonesia to create a semantic search engine that can find relevant documents even with zero lexical overlap between the query and the document. More details about the project can be found in the Projects section.
- Our capstone project was recognized as one of the best in the Company Capstone Project category (9 out of 29 teams).
Projects
A search engine that can find results even when there is no lexical overlap between the query and the document. Includes discussion autotagging to help new users properly tag their discussions.
- Improved search functionality to increase MRR@10 from 10.3 to 35.27.
- Incorporated sparse retrieval techniques like BM25 to create a keyword-aware semantic search engine.
- Created a model for automatic tagging of user-created discussions on Dicoding.
Skills
- Node.js
- Express
- RESTful API
- MongoDB
- Docker
- React
- HTML
- CSS
- TensorFlow
- PyTorch
- HuggingFace
- Tabular Data
- NLP
- Computer Vision
- TPU Training
- Docker Deployment
Education
Bachelor, Mathematics
Awards
Languages
- English · Limited Working Proficiency
- Indonesian · Native Speaker