
How do you extract data from a text file in MATLAB?

Published in MATLAB File I/O

To extract data from a text file in MATLAB, pick a reading function that matches the file's structure: extractFileText for raw text content, readtable for tabular data such as CSV, and extractHTMLText for HTML code that is already in memory.

Extracting Plain Text Data

For general text files, such as .txt files that contain unstructured or semi-structured text, the most straightforward approach is often the extractFileText function. Note that extractFileText is part of Text Analytics Toolbox, so that toolbox must be installed.

As noted in the reference: "Usually, the easiest way to import text data into MATLAB is to use the extractFileText function. This function extracts the text data from text, PDF, HTML, and Microsoft Word files." This function is excellent for simply pulling out the raw text content of a file.

Using extractFileText

The basic syntax involves providing the file path to the function.

Example:

filePath = 'my_document.txt'; % Specify the path to your text file
fileContent = extractFileText(filePath); % Extract the text content
disp(fileContent); % Display the extracted text

This returns the entire content of the file as a single string. You can then process this text further using MATLAB's string and text analysis functions.
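As a minimal sketch of that follow-up processing, the snippet below writes a small sample file, reads it back with extractFileText, and filters its lines. The file name and its contents are hypothetical and exist only to make the example self-contained; extractFileText assumes Text Analytics Toolbox is available.

```matlab
% Create a small sample file so the sketch is self-contained (hypothetical content).
filePath = fullfile(tempdir, 'sample_notes.txt');
fid = fopen(filePath, 'w');
fprintf(fid, 'alpha: 1\nbeta: 2\ngamma: 3\n');
fclose(fid);

% Read the whole file as text (requires Text Analytics Toolbox).
fileContent = extractFileText(filePath);

% Split the text into lines and drop any empty ones.
textLines = splitlines(fileContent);
textLines = textLines(strlength(textLines) > 0);

% Keep only the lines that mention "beta".
matches = textLines(contains(textLines, "beta"));
disp(matches);
```

splitlines, strlength, and contains are base MATLAB string functions, so only the reading step depends on the toolbox.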

Extracting Structured Text Data (CSV, Delimited Files)

If your text file contains data organized in a structured format, like comma-separated values (CSV), tab-delimited data, or fixed-width columns, using functions specifically designed for tabular data is generally more efficient and convenient.

According to the reference: "To import text from CSV and Microsoft Excel files, use readtable." While Excel files are binary, readtable is also the go-to for many common text-based tabular formats like CSV because it automatically detects delimiters and headers, returning the data in a table format, which is easy to work with in MATLAB.

Using readtable

readtable can handle various delimiters and data types within columns.

Example:

csvFilePath = 'my_data.csv'; % Specify the path to your CSV file
dataTable = readtable(csvFilePath); % Read the data into a table
disp(dataTable); % Display the extracted table data

readtable offers numerous options to customize the import process, such as specifying delimiters, handling missing values, and selecting specific columns.
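The sketch below illustrates a few of those options on a semicolon-delimited file: setting the delimiter, treating a placeholder as missing, and importing only selected columns. The file name, column names, and the 'NA' placeholder are assumptions made for illustration.

```matlab
% Create a small semicolon-delimited sample file (hypothetical data).
csvFilePath = fullfile(tempdir, 'sample_data.csv');
fid = fopen(csvFilePath, 'w');
fprintf(fid, 'id;name;score\n1;Ann;9.5\n2;Bob;NA\n3;Cy;7.0\n');
fclose(fid);

% Detect import options, stating the delimiter explicitly.
opts = detectImportOptions(csvFilePath, 'Delimiter', ';');

% Force the score column to numeric and treat 'NA' as a missing value.
opts = setvartype(opts, 'score', 'double');
opts = setvaropts(opts, 'score', 'TreatAsMissing', 'NA');

% Import only the columns of interest.
opts.SelectedVariableNames = {'id', 'score'};

dataTable = readtable(csvFilePath, opts);
disp(dataTable);
```

With these settings the 'NA' entry is imported as NaN rather than as text, which keeps the score column numeric.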

Special Case: Extracting Text from HTML Code

While extractFileText can handle HTML files, if you have HTML content already loaded as a string or character array (e.g., fetched from a web page), you can use a different function to extract just the visible text content.

The reference mentions: "To extract text from HTML code, use extractHTMLText." This function is useful for stripping HTML tags and retrieving the readable text part of an HTML snippet.
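A minimal sketch of that call, assuming Text Analytics Toolbox is installed; the HTML snippet is made up for illustration:

```matlab
% HTML code already held in memory as a string (hypothetical snippet).
code = "<html><body><h1>Title</h1><p>Some <b>bold</b> text.</p></body></html>";

% Strip the tags and keep the readable text (requires Text Analytics Toolbox).
str = extractHTMLText(code);
disp(str);
```

In practice the HTML code would usually come from elsewhere, for example a page fetched with webread.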

Choosing the Right Function

Here's a quick summary of the functions based on the file type and content structure:

File/Content Type | Recommended Function | Description
Plain Text Files (.txt) | extractFileText | Extracts all text content as a string.
Structured Text (CSV, etc.) | readtable | Imports delimited or fixed-width data into a table format.
HTML Files | extractFileText | Extracts all text content from the HTML file.
HTML Code (string) | extractHTMLText | Extracts visible text content from an HTML string, stripping tags.
PDF, Word Files | extractFileText | Extracts all text content from the specified file types.

By selecting the appropriate function based on whether you need raw text content or structured tabular data, you can efficiently extract information from your text files in MATLAB.
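One way to apply this choice in code is to branch on the file extension. The snippet below is only a sketch: the sample file and the extension lists are assumptions, and the extractFileText branch needs Text Analytics Toolbox.

```matlab
% Create a small CSV file so the sketch can run on its own (hypothetical data).
filePath = fullfile(tempdir, 'choose_demo.csv');
fid = fopen(filePath, 'w');
fprintf(fid, 'x,y\n1,2\n3,4\n');
fclose(fid);

% Pick the reader from the file extension.
[~, ~, ext] = fileparts(filePath);
switch lower(ext)
    case '.csv'
        % Tabular data: import as a table.
        data = readtable(filePath);
    case {'.txt', '.pdf', '.html', '.docx'}
        % Raw text: requires Text Analytics Toolbox.
        data = extractFileText(filePath);
    otherwise
        error('Unsupported file extension: %s', ext);
end
disp(data);
```

An extension is only a hint about the content. A .txt file that holds delimited data, for instance, is usually better read with readtable.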