Hendra Bunyamin is a lecturer who graduated from the Mathematics department of Bandung Institute of Technology in 1999 and from the Software Engineering Informatics department of the same university in 2003.
He is passionate about teaching, mainly Mathematics and Programming.
His research interests are machine learning and its applications.
He also enjoys sharing his faith and his understanding of maths and machine learning on his blog.
An information retrieval (IR) system searches for and retrieves information relevant to users’ needs: it retrieves and displays documents that are relevant to the user’s input (the query). One way to retrieve information relevant to the query is to match the query semantically against the document collection, and Latent Semantic Indexing (LSI) is one such method. For example, given the query ‘purchase’, the words ‘purchase’ and ‘buy’ match semantically, so LSI retrieves documents that contain either or both of them. This thesis compares the performance of the LSI method with that of the vector method. Performance is measured by non-interpolated average precision (NIAP). (download dataset) (download pdf)
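As an illustration of the evaluation measure named above, here is a minimal sketch of non-interpolated average precision for a single query. It assumes the common definition, in which the precision at the rank of each retrieved relevant document is summed and then divided by the total number of relevant documents; the thesis may normalize differently. The function name `average_precision` and the document IDs are hypothetical.

```python
def average_precision(ranked_ids, relevant_ids):
    """Non-interpolated average precision for one ranked result list.

    Assumed definition: sum the precision at each rank where a relevant
    document appears, then divide by the total number of relevant documents.
    """
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / len(relevant)


# Hypothetical ranking: relevant documents appear at ranks 1 and 4,
# and one relevant document (d9) is never retrieved.
ranking = ["d3", "d1", "d7", "d2", "d5"]
relevant = {"d3", "d2", "d9"}
niap = average_precision(ranking, relevant)  # (1/1 + 2/4) / 3 = 0.5
```

Averaging this value over all test queries gives a single score per retrieval method, which is how two methods such as LSI and the vector method could be compared.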
A great deal of information is now available in the form of online documents. Researchers have investigated automatic text categorization as part of the effort to organize this information for users. Much of that research focuses on topical categorization, which sorts documents by subject (for example, sports vs. politics). Recently, however, a new focus has emerged: sorting or classifying documents by their sentiment, that is, the overall opinion toward the subject under discussion (for example, whether a product review is positive or negative). This research investigates the effectiveness of machine learning techniques for solving the sentiment classification problem. (download pdf)
We consider the following retweet prediction task: given a tweet, predict whether it will be retweeted. In the past, a wide range of learning methods and features has been proposed for this task. We provide a systematic comparison of the performance of these learning methods and features in terms of prediction accuracy and feature importance. Specifically, from each previously published approach we take the best-performing features and group them into two sets: user features and tweet features. In addition, we contrast five learning methods, both linear and non-linear, and we examine the added value of a previously proposed time-sensitive modeling approach. To the authors’ knowledge, this is the first attempt to collect the best-performing features and contrast linear and non-linear learning methods. We perform our comparisons on a single dataset and find that user features such as the number of times a user is listed, the number of followers, and the average number of tweets published per day contribute most strongly to prediction accuracy across the selected learning methods. We also find that random forest-based learning, which has not been employed in previous studies, achieves the highest performance among the learning methods we consider. Finally, we find that on top of properly tuned learning methods, the benefits of time-sensitive modeling are very limited. (download pdf)
Topic modeling is an elegant method for discovering hidden structure in knowledge collections such as news archives, blogs, web pages, scientific articles, books, images, voices, videos, and social media. The basic topic model is Latent Dirichlet Allocation (LDA), and this paper uses LDA to automatically cluster topics from a collection of final-project abstracts. We compare two methods: LDA as a unigram model and LDA with a skip-gram model. Our results are evaluated by an expert against readily available categories. Overall, the words from each topic are indeed keywords describing that topic; moreover, the combination of LDA and the skip-gram model is capable of capturing key phrases from each topic. (download pdf)
Tracing content similarity in scientific writing is one way to reduce or eliminate plagiarism among researchers, including students in higher education. This research aims to build a simple web-based application that uses an inverted index to find how much a document has in common with the documents already on file. All reference documents are stored in a local database to simplify the search. The documents being compared are PDF files containing part or all of a student’s final-project report written in Indonesian. Based on the experiments conducted, the resulting application can measure how similar the given sentences and documents are to the reference documents stored in the database. (download pdf)
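To make the inverted-index idea concrete, here is a minimal sketch under stated assumptions. The abstract does not specify the tokenizer, the storage layer, or the similarity score, so the regular-expression tokenizer, the in-memory dictionary, and the "fraction of shared unique terms" score below are all illustrative choices, and the names `tokenize`, `build_inverted_index`, and `overlap_scores` are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the set of reference document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def overlap_scores(index, query_text):
    """Score each reference document by the fraction of the query's
    unique terms that it shares (an assumed, simple similarity measure)."""
    query_terms = set(tokenize(query_text))
    if not query_terms:
        return {}
    shared = defaultdict(int)
    for term in query_terms:
        # Only documents listed under this term are touched,
        # which is the point of using an inverted index.
        for doc_id in index.get(term, ()):
            shared[doc_id] += 1
    return {doc_id: count / len(query_terms) for doc_id, count in shared.items()}


# Hypothetical reference collection and submitted text.
references = {
    "ref1": "information retrieval system",
    "ref2": "sentiment classification of documents",
}
index = build_inverted_index(references)
scores = overlap_scores(index, "retrieval system for documents")
# ref1 shares 2 of the 4 query terms, ref2 shares 1 of 4.
```

Because the lookup starts from the query's terms, reference documents that share no term with the submitted text are never visited, which keeps the comparison cheap as the collection grows.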
Managing risk is important. Organizations are starting to see the value of, or to ask for, strategic solutions for managing risk. Risk refers to a deviation from what the organization plans or expects. Risk has an upside (opportunity) as well as a downside, the potential negative impact on an asset. This downside risk (loss) can prevent companies from achieving their strategic goals. Organizations can turn risks into opportunities through effective risk management. For public companies that have subsidiaries in many countries, one of the risks that should be managed is country risk. Country risk is defined as the risk that a foreign government will default on its bonds or other financial commitments. It also refers to the broader notion of the degree to which political and economic unrest affects the securities of issuers that do business in a particular country. In this research, we analyze the effect of country risk on company performance. We employ linear regression to model the effect, and the result shows that country risk has a significant negative influence on Return on Equity (ROE). We also build nine models that use machine learning algorithms to predict country risk ratings from country risk reports. Among them, the decision tree algorithm achieves the highest accuracy, 31.25%, on our dataset. Our results show, first, that international companies with overseas subsidiaries can benefit from using country risk as a tool to measure returns. Second, the decision tree algorithm should be utilized to help decision makers determine country risk based on country reports; however, the effect of time-series data on the machine learning algorithms still needs further investigation. (download pdf)
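For readers unfamiliar with the regression step, the sketch below fits a one-variable least-squares line. It is only an illustration: the abstract does not state the model specification, the function name `fit_line` is hypothetical, and the numbers are made up for easy arithmetic and are not taken from the study.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up toy data, NOT from the paper: a hypothetical country risk score
# and a hypothetical ROE in percent for five firms.
risk_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
roe_percent = [12.0, 10.0, 8.0, 6.0, 4.0]
slope, intercept = fit_line(risk_scores, roe_percent)
# A negative slope means higher risk goes with lower ROE in this toy data.
```

A negative fitted slope corresponds to the direction of the effect described above. Judging whether such a slope is statistically significant requires a separate test, which this sketch does not perform.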
The main objective of this research is to develop an image recognition system for distinguishing dog breeds using Keras’ pre-trained Convolutional Neural Network (CNN) models and to compare the accuracy of those models. Specifically, the models used are ResNet50, Xception, and VGG16. The system we develop is a web application built with the Flask framework. This research also explains how deep learning approaches such as CNNs can distinguish an object in an image. After manually testing the system on a set of images, we find that the models perform differently and that Xception is the best in terms of accuracy. We also test end users’ acceptance of the user interface we develop. (download pdf) (source code)