Simple and efficient classification scheme based on specific vocabulary

Author(s)
Savoy, Jacques 
Institut d'informatique 
Zubaryeva, Olena 
Institut d'informatique 
Publication date
2012
In
Computational Management Science
Vol.
9
No.
3
From page
401
To page
415
Keywords
  • Statistics in lexical analysis
  • Corpus linguistics
  • Text categorization
  • Machine learning
  • Natural language processing (NLP)
Abstract
Assuming a binomial distribution for word occurrence, we propose computing a standardized Z score to define the specific vocabulary of a subset compared to that of the entire corpus. This approach is applied to weight the terms (character n-grams, words, stems, lemmas, or sequences of them) that characterize a document. We then show how these Z score values can be used to derive a simple and efficient categorization scheme. To evaluate this proposition and demonstrate its effectiveness, we conduct two experiments. In the first, the system must categorize speeches given by B. Obama as either electoral or presidential. In the second, sentences are extracted from these speeches and then categorized under the headings electoral or presidential. Based on these evaluations (10-fold cross-validation), the proposed classification scheme tends to perform better than a support vector machine model in both experiments. Compared with a Naïve Bayes classifier, it performs better on the first test and slightly worse on the second.
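As a rough illustration of the idea in the abstract, the sketch below computes a standardized Z score for each term of a subset relative to the whole corpus. It assumes the usual binomial standardization, Z = (tf - n'·p) / sqrt(n'·p·(1 - p)), where tf is the term's count in the subset, n' is the subset size in tokens, and p is the term's relative frequency in the entire corpus. The function names, the toy tokens, and the decision rule (assign a document to the category whose Z scores sum highest) are assumptions made for illustration only; the abstract does not specify the exact rule used in the article.

```python
import math
from collections import Counter


def z_scores(subset_tokens, corpus_tokens):
    """Z score of each corpus term for a subset, under a binomial model.

    Assumption: the corpus includes the subset, and p is estimated as the
    term's relative frequency in the whole corpus.
    """
    corpus_counts = Counter(corpus_tokens)
    subset_counts = Counter(subset_tokens)
    n_corpus = len(corpus_tokens)
    n_subset = len(subset_tokens)
    scores = {}
    for term, corpus_tf in corpus_counts.items():
        p = corpus_tf / n_corpus
        expected = n_subset * p                 # binomial mean n'·p
        variance = n_subset * p * (1.0 - p)     # binomial variance n'·p·(1-p)
        if variance == 0.0:
            continue                            # degenerate case, no score
        scores[term] = (subset_counts[term] - expected) / math.sqrt(variance)
    return scores


def classify(doc_tokens, scores_by_category):
    """Hypothetical rule: choose the category with the largest Z score sum."""
    totals = {
        category: sum(scores.get(token, 0.0) for token in doc_tokens)
        for category, scores in scores_by_category.items()
    }
    return max(totals, key=totals.get)


# Toy tokens invented for this example (not taken from the speeches).
electoral = "change hope vote change campaign vote".split()
presidential = "congress budget congress policy budget nation".split()
corpus = electoral + presidential

scores_by_category = {
    "electoral": z_scores(electoral, corpus),
    "presidential": z_scores(presidential, corpus),
}
label = classify("vote for change".split(), scores_by_category)
print(label)
```

Terms with a large positive Z score are over-represented in the subset and can be read as its specific vocabulary, while terms with a large negative score are under-represented. Terms absent from the corpus contribute nothing in this sketch.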
Identifiers
https://libra.unine.ch/handle/123456789/9562
10.1007/s10287-012-0149-z
Publication type
journal article
File(s) to download
 main article: Savoy_Jacques-Simple_and_efficient_classification-20130104.pdf (7.21 MB)