Producer: PROVALIS Research
Latest Version: 8.0
Operating System: Windows
Trial Version: Yes
Areas: Text Analysis | Document Analysis | Research | Textual Fraud Detection

Overview

Content analysis and text mining software for documents.

WordStat is flexible, easy-to-use text analysis software.

It can be used by anyone who needs to work with large quantities of unstructured documents such as customer feedback, emails, open-ended responses, interview transcripts, incident reports, legal documents, print news content, blogs or websites, and who therefore needs a fast text mining tool whose features work together with state-of-the-art quantitative content analysis tools.

The seamless integration of WordStat with SimStat (a statistical data analysis tool), QDA Miner (qualitative data analysis software) and Stata (the comprehensive statistical software from StataCorp) offers unprecedented flexibility for analyzing text and relating its content to structured information, including numerical and categorical data.


Explore relationships between unstructured text and structured data such as dates, numbers or categorical data. Identify temporal trends or differences between subgroups, or assess relationships with ratings and other kinds of categorical or numerical data, using statistical and graphical tools (correspondence analysis, heatmaps, bubble charts, etc.).



Analyze large amounts of unstructured information with WordStat. The software can process 25 million words per minute, quickly extract themes and automatically identify patterns using clustering, multidimensional scaling, proximity plots and more.


Quickly and easily extract meaning from large amounts of text data using Explorer mode, made especially for those with little text mining experience. In one click, you can extract the most frequent words and phrases and the most salient topics in your documents.
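As a rough illustration of what extracting the most frequent words and phrases involves, here is a minimal Python sketch. It is not WordStat's implementation; the function names `tokenize` and `top_terms` and the sample documents are hypothetical, and a real tool would also handle stop words, lemmatization and longer phrases.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def top_terms(documents, n_words=2, top=3):
    """Count single words and n-word phrases across all documents."""
    words, phrases = Counter(), Counter()
    for doc in documents:
        tokens = tokenize(doc)
        words.update(tokens)
        # Slide a window of n_words tokens to build candidate phrases.
        phrases.update(
            " ".join(tokens[i:i + n_words])
            for i in range(len(tokens) - n_words + 1)
        )
    return words.most_common(top), phrases.most_common(top)

docs = [
    "The delivery was late and the customer service was slow.",
    "Customer service fixed the late delivery quickly.",
]
frequent_words, frequent_phrases = top_terms(docs)
print(frequent_words)
print(frequent_phrases)
```

In this toy example the function word "the" tops the word list, which is why content analysis tools usually apply an exclusion list before counting.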


Import Word, Excel, HTML, XML, SPSS, Stata, NVivo, PDFs, as well as images. Connect and directly import from social media, emails, web survey platforms, and reference management tools.


Get a quick overview of the most salient topics from very large text collections using state-of-the-art automatic topic extraction based on words, phrases and related words (including misspellings).


Explore relationships among words or concepts and retrieve text segments associated with specific connections.
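The idea of linking words that occur together and then retrieving the text behind a link can be sketched in a few lines of Python. This is only an assumption about the general technique, not WordStat's algorithm; `cooccurrences`, the vocabulary and the sample segments are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

def cooccurrences(segments, vocabulary):
    """Count how often two vocabulary words appear in the same segment.

    Also keeps the segments behind each pair so they can be retrieved later.
    """
    pair_counts = Counter()
    pair_segments = {}
    for segment in segments:
        words = set(re.findall(r"[a-z]+", segment.lower()))
        present = sorted(words & vocabulary)
        for pair in combinations(present, 2):
            pair_counts[pair] += 1
            pair_segments.setdefault(pair, []).append(segment)
    return pair_counts, pair_segments

segments = [
    "The battery drains fast when the screen is bright.",
    "Great screen, but the battery is weak.",
    "Shipping was quick.",
]
counts, examples = cooccurrences(segments, {"battery", "screen", "shipping"})
print(counts[("battery", "screen")])
print(examples[("battery", "screen")])
```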


Achieve full text analysis automation using existing dictionaries or create your own categorization model with words, phrases, proximity rules and more


Build your dictionary faster with tools for extracting common phrases and technical terms and for quickly identifying misspellings, synonyms, antonyms and related words in your text collection.


Develop and optimize automatic document classification models using Naïve Bayes and K-Nearest Neighbours
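To give a sense of how a Naive Bayes document classifier works, the following is a minimal pure-Python sketch of a multinomial model with Laplace smoothing. It is a textbook illustration under stated assumptions, not WordStat's implementation; the function names and the tiny training set are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def train(labelled_docs):
    """Collect per-class word counts and per-class document counts."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for text, label in labelled_docs:
        class_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocabulary

def classify(text, model):
    """Return the class with the highest log posterior (Laplace smoothing)."""
    word_counts, class_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_docs)
        for word in tokenize(text):
            if word in vocabulary:  # ignore words never seen in training
                score += math.log(
                    (word_counts[label][word] + 1)
                    / (total_words + len(vocabulary))
                )
        scores[label] = score
    return max(scores, key=scores.get)

training = [
    ("refund not received, very disappointed", "complaint"),
    ("item arrived broken and support ignored me", "complaint"),
    ("fast shipping and great quality", "praise"),
    ("great support, very happy with the item", "praise"),
]
model = train(training)
prediction = classify("broken item and no refund", model)
print(prediction)
```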


Verify or dig deeper into your analysis by going back to the text from almost any feature, chart or graph. You can use the Keyword Retrieval or Keyword-in-Context features to retrieve sentences, paragraphs or whole documents. This is particularly helpful when building taxonomies or for word-sense disambiguation. You can also attach QDA Miner codes to retrieved segments
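A Keyword-in-Context listing shows each occurrence of a keyword with a few words on either side, which is what makes it useful for word-sense disambiguation. The sketch below is a generic illustration, not WordStat's feature; `kwic` and its `width` parameter are hypothetical names.

```python
import re

def kwic(text, keyword, width=3):
    """Return (left context, keyword, right context) for each keyword hit."""
    tokens = re.findall(r"\w+", text)
    hits = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits

text = ("The bank approved the loan. We walked along the river bank "
        "before the bank opened.")
hits = kwic(text, "bank")
for left, word, right in hits:
    print(f"{left:>20} | {word} | {right}")
```

Reading the contexts side by side makes it easy to separate the financial sense of "bank" from the riverside sense.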


Combine WordStat with a state-of-the-art qualitative coding tool (QDA Miner) for more precise exploration of data or more in-depth analysis of specific documents or extracted text segments when needed


Relate unstructured text data with geographic information and create interactive plots of data points, thematic maps, and heatmaps, along with a geocoding web service for transforming location names, postal codes and IP addresses into latitude and longitudes



Automatically extract named entities that can be added to the categorization dictionary using an easy drag-and-drop-operation


Easily export text analysis results to common industry file formats such as Excel, SPSS, ASCII, HTML, XML, MS Word and graphs such as PNG, BMP and JPEG



Use Python scripts and the full range of open-source Python libraries to preprocess or transform text documents for analysis in WordStat.

What’s New

What’s New in Version 8.0? 

WordStat 8 has new features and options to increase flexibility and allow the software to be used by less-experienced and expert users alike.
We know you are being inundated by more and more text data and are looking for ways to analyze and categorize it. You are looking for tools to help you quickly find themes, context, important concepts, and meaning in large amounts of text data. We know this is a challenge for expert data scientists and less experienced researchers and analysts. Finding a way to support these groups was the thinking behind WordStat 8. We wanted to improve usability and flexibility while still improving performance and precision.

We believe the new approach of WordStat 8 accomplishes these goals.  You can process enormous amounts of unstructured data in just seconds with minimal experience or easily build your own extensive categorization dictionaries to perform precise measurement of concepts.

1. Standalone text mining platform

Learning new software can be a daunting task, especially software with as many features as WordStat. Previously, WordStat was an add-on module of QDA Miner, which required users to learn not only WordStat but also elements of QDA Miner in order to set up their projects. WordStat 8 is now a standalone product. This reduces the complexity and the learning curve, as users can now create their projects directly in WordStat. However, it may still be run as a content analysis add-on of QDA Miner, Stata, or SimStat.

You can now create a project in WordStat itself from different sources:

• Documents: MS Word, RTF, PDF, HTML, etc.
• Data files: Excel, CSV, Stata, etc.
• Web survey platforms: SurveyMonkey, Qualtrics, SurveyGizmo, etc.
• Reference management tools: Endnote, Zotero, Mendeley
• Social media services: Twitter, Facebook, Reddit, RSS Feeds, Youtube
• Email platforms: Outlook, Gmail, Hotmail, Mbox, and EML format
• Many other sources…

2. New explorer mode

A new Explorer mode has been implemented to allow users with little text mining experience to quickly and easily extract meaning from large amounts of text data. You can identify the most frequent words and phrases and extract the most salient topics in your documents with the improved topic modeling tool of WordStat 8. At any time you can switch to the expert mode, which gives you access to all of WordStat's features, including content analysis dictionaries, crosstabs, and co-occurrence analysis.

3. Improved topic modeling

The existing topic modeling routine benefits from numerous improvements such as an additional extraction algorithm (NNMF) for faster topic extraction, as well as an innovative topic enrichment process. This technique allows one to move beyond the “bag-of-words” solution typical of traditional topic modeling by automatically selecting related phrases and providing suggestions for additional expressions, potential exceptions and spelling corrections. All these innovations should lead to a more precise and comprehensive measurement of salient topics in your text collection.

4. New and improved graphic displays

WordStat 8 has several new graphic displays to help you better understand the results of your data analysis. These include improved, interactive word clouds as well as donut and radar charts.

5. Deviation table

This is a brand new feature in WordStat 8. It was added after the initial release, so you need WordStat 8.0.7 or later to have access to it. The Deviation Table lets you see which words or phrases are used more or less often across the values of another variable. You first need to activate the crosstab button to see the icon. Right-clicking gives access to KWIC and Delete, and lets you save the table as tab-delimited text, HTML or bitmap. To learn more about this specific feature of WordStat 8, click on the following link: deviation-table/

6. Export results to Tableau Software

With a simple click, you can also export your results to Tableau Software to use its advanced interactive data visualization tools.

7. Improved content analysis dictionary building

Several new features and improvements have been made to the categorization dictionaries section to help you be more precise in your text searches and get more accurate results:

• Case-sensitive entries: the categorization dictionaries and the exclusion list now support case-sensitive entries to disambiguate words such as “Bill” and “bill”, “Buck” and “buck” or “us” and “US”.
• Regular Expression (Regex) searches: we have created a Regular Expression Editor where you can create your own Regex formulas to quickly extract specific information from your text data such as email addresses or postal codes.
• New substitution process: we have improved the substitution process by splitting it in two. By separating it from our lemmatization process, you can easily track substitutions and keep your content dictionary free of misspelled words.

Exclusion and substitution lists along with your categorization dictionary can now be saved into a categorization model file. This file can be used on other WordStat projects as well as in QDA Miner, WordStat Document Explorer, or in our SDK.
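The case-sensitive entries and regular expression searches described in this section can be illustrated with Python's standard `re` module. This is a sketch only: the syntax accepted by WordStat's Regular Expression Editor may differ, the two patterns are deliberately simplified (the postal code pattern assumes the Canadian format), and `count_entry` is a hypothetical helper.

```python
import re

# Illustrative patterns only; real email and postal code formats vary.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
CANADIAN_POSTAL_CODE = re.compile(r"\b[A-Z]\d[A-Z] ?\d[A-Z]\d\b")

def count_entry(text, entry, case_sensitive):
    """Count whole-word matches of a dictionary entry."""
    flags = 0 if case_sensitive else re.IGNORECASE
    return len(re.findall(rf"\b{re.escape(entry)}\b", text, flags))

text = ("Contact us at info@example.com or write to H2X 1Y4. "
        "The US office will bill Bill directly.")

emails = EMAIL.findall(text)
postal_codes = CANADIAN_POSTAL_CODE.findall(text)
us_country = count_entry(text, "US", case_sensitive=True)
us_any_case = count_entry(text, "US", case_sensitive=False)
print(emails, postal_codes, us_country, us_any_case)
```

With case sensitivity on, only the country abbreviation "US" is counted; with it off, the pronoun "us" is counted as well, which is the ambiguity case-sensitive entries are meant to remove.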

8. Improved interface

The improved interface allows you to quickly access and compare results, so you can extract valuable insights in a few seconds.

[Screenshots: WordStat 7 interface and WordStat 8 interface]

9. Transform text using Python scripts

WordStat 8 makes it possible for NLP data scientists to use Python scripts and the full range of open-source Python libraries to preprocess or transform text documents for analysis in WordStat. This new feature increases the flexibility of WordStat and allows users to apply their Python programming skills.
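The kind of preprocessing such a script might perform can be sketched with the Python standard library alone. How WordStat passes documents to a script and reads the result back is not described here, so the `preprocess` function below is a hypothetical, self-contained transformation rather than an example of WordStat's scripting interface.

```python
import html
import re

def preprocess(document):
    """Strip markup, decode entities, remove URLs and normalize whitespace."""
    text = re.sub(r"<[^>]+>", " ", document)   # drop HTML-like tags
    text = html.unescape(text)                 # decode entities such as &amp;
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    return re.sub(r"\s+", " ", text).strip()   # collapse runs of whitespace

raw = "<p>Great   product &amp; fast delivery!</p> See https://example.com/review"
clean = preprocess(raw)
print(clean)
```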

10. Numerical transformation

A new numerical transformation dialog box allows you to compute numerical variables from other variables with up to 50 transformation functions, including trigonometric, statistical, and random number functions. Conditional transformations can also be performed using an IF-THEN-ELSE logical structure.
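The following Python sketch shows the general idea of computing new variables from existing ones, including a conditional IF-THEN-ELSE rule. It does not reproduce the syntax of WordStat's transformation dialog; the variable names (`income`, `month`, `rating`) and the `transform` function are hypothetical.

```python
import math

def transform(records):
    """Add computed variables to each record (a dict of variable values)."""
    for row in records:
        # Mathematical function: natural log of income.
        row["log_income"] = math.log(row["income"])
        # Trigonometric function: place the month on a yearly cycle.
        row["month_sin"] = math.sin(2 * math.pi * row["month"] / 12)
        # Conditional rule: IF rating >= 4 THEN 1 ELSE 0.
        row["satisfied"] = 1 if row["rating"] >= 4 else 0
    return records

data = transform([
    {"income": 1000.0, "month": 3, "rating": 5},
    {"income": 2500.0, "month": 9, "rating": 2},
])
print(data)
```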


11. Binning

A binning feature can now be used to transform continuous values into a smaller number of distinct categories. It may be used to reduce the effect of numerical outliers or abnormal distributions, or to convert a continuous numerical variable into an ordinal one. It is especially useful for creating graphical displays of comparisons when the number of distinct values in the numerical variable is too large.
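Binning with fixed cut points can be sketched as follows. This is a generic illustration, not WordStat's binning dialog; `bin_values`, the cut points and the labels are hypothetical, and other schemes (equal-width or equal-frequency bins) are equally common.

```python
from bisect import bisect_right

def bin_values(values, edges, labels):
    """Map each value to the label of the interval it falls into.

    edges are ascending cut points; a value equal to a cut point goes to
    the higher bin, so len(labels) must equal len(edges) + 1.
    """
    if len(labels) != len(edges) + 1:
        raise ValueError("need exactly one more label than cut points")
    return [labels[bisect_right(edges, v)] for v in values]

ages = [18, 25, 34, 35, 52, 97]
age_groups = bin_values(
    ages,
    edges=[25, 35, 50],
    labels=["<25", "25-34", "35-49", "50+"],
)
print(age_groups)
```

Note how the outlying value 97 ends up in the same ordinal category as 52, which is how binning limits the influence of extreme values.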

12. Analysis of emojis

Emojis have become ubiquitous in social media, text messaging, emails and other electronic communications and are often used to represent an object, express an idea or sentiment, or add a nuance to a written message. They are often an integral part of the message and can hardly be ignored. WordStat 8.0 can transform emojis into their text representation, allowing you to analyze them either on their own or as part of the whole message.
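One simple way to turn emojis into a text representation is to replace each symbol with its official Unicode name, as in the sketch below. This is an assumption about the general approach, not a description of how WordStat does it; `emojis_to_text` is a hypothetical name, and the sketch does not handle multi-code-point emojis such as skin tone or ZWJ sequences.

```python
import unicodedata

def emojis_to_text(message):
    """Replace symbol characters with their Unicode names in brackets."""
    parts = []
    for char in message:
        # Category "So" (Symbol, other) covers most emoji code points.
        if unicodedata.category(char) == "So":
            name = unicodedata.name(char, "UNKNOWN SYMBOL")
            parts.append(f"[{name.lower().replace(' ', '_')}]")
        else:
            parts.append(char)
    return "".join(parts)

# \U0001F600 is the "grinning face" emoji, written as an escape sequence.
converted = emojis_to_text("Love this phone \U0001F600")
print(converted)
```

Once converted, a token such as `[grinning_face]` can be counted or added to a categorization dictionary like any other word.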

13. Explore your documents from Windows Explorer

The new Document Explorer tool allows users to quickly explore the content of their documents from Windows Explorer without the need to import documents or create a project. You just have to select the documents you would like to explore or the folder containing them, right-click and select Explore to quickly identify the most frequent words and phrases and where they are in your documents. With a simple right-click, you can also perform a semantic search on your documents using an existing categorization dictionary or classify documents using a prediction model in WordStat.


WordStat for Stata


Stata is a complete, integrated statistical software package created by StataCorp LP (www.stata.com). It provides a wide range of statistical analysis, data management, and graphics features. The latest versions of Stata added many new features, including a long string data type that can store documents of up to 2 billion characters alongside numerical and categorical data. One could thus create a statistical database with journal abstracts, news transcripts, patents, incident reports, customer feedback, interviews, and so on.
WordStat for Stata was created to allow Stata 13 and Stata 16 users running under Windows to apply text analytics techniques to any string variables stored in a Stata data file. WordStat combines natural language processing, content analysis, and statistical techniques to quickly extract topics, patterns, and relationships in large amounts of text. It can process millions of words in seconds and compare extracted themes across any other numerical, categorical, or date variables in the Stata file.

What is it used for?

WordStat can be used by anyone who needs to quickly extract and analyze information stored in Stata text variables. It may be used for:

• Direct import of text and quantitative data from social media, online survey platforms, and reference management tools
• Content analysis of open-ended responses, interview or focus group transcripts
• Business intelligence and competitive web sites analysis
• Information extraction and knowledge discovery from incident reports, customer complaints
• Content analysis of news coverage or scientific literature (scientometrics or bibliometrics studies)
• Automatic tagging and classification of documents
• Fraud detection, authorship attribution, patent analysis
• Taxonomy development and validation
• Etc. (for some examples of studies using WordStat, see the Studies page).



Integrated exploratory text mining and visualization tools such as clustering, multidimensional scaling, proximity plots, and more, to quickly extract themes and automatically identify patterns



Get a quick overview of the most salient topics from large text collections. A side panel allows one to compare the frequency of specific topics across other variables using bar charts or line charts



Use existing or create custom dictionaries composed of words, word patterns, phrases and proximity rules. Get computer assistance for building taxonomies with phrase and named-entity extraction, misspelling replacements, integrated thesaurus, etc.



Explore relationships between unstructured text and structured data with statistical and graphical tools (correspondence analysis, heatmaps, bubble charts, etc.)



Explore relationships among words or extracted concepts using force-based graphs, multidimensional scaling or circular graphs. Retrieve text segments associated with specific connections



Develop automatic document classification models by using Naive Bayes and K-Nearest Neighbors. Classification models may then be saved on disk and reapplied on new data




Illustrate patterns and explore complex phenomena with interactive visualization tools such as bar charts, line charts, heatmaps, word clouds, bubble charts, MDS plots, etc. Copy and paste charts or save them to disk in BMP, JPG, or PNG file formats.



The Document Conversion Wizard makes it easy to import documents stored in various file formats (.DOC, HTML, PDF, TXT) into a new Stata .dta file and to automatically extract numeric and alphanumeric values from structured documents.



Technical Information

• Operating System: Microsoft Windows 2000, XP, Vista, 7, 8 and 10
• Memory: From 256 MB (XP) to 1 GB (Vista, Windows 7, 8 and 10)
• Disk Space: 40 MB

QDA Miner will run on Mac OS using a virtual machine solution or Boot Camp, and on Linux computers using CrossOver or Wine.

