Easily Read & Parse a Text or CSV File into a JavaScript Array
Full Source Code: You can find the complete source code of the examples in this repo
Reading a file is a straightforward task. The important part is planning what to include and how to organize the code, so please focus on the reasoning behind the organization.
File Types: CSV, TXT, and Binary Files
Computers use two main file types based on file content: binary files and text files.
Text (txt) files are ordinary files that contain textual (non-binary) content that humans can read. Binary files, on the other hand, contain encoded data that software can understand: for example, image files, music files, and old Excel files (new Excel files are ZIP archives of XML documents, so they are still binary on disk).
CSV files are a subtype of text file that contains comma-separated values. Although the name implies that the separator is a comma, it can also be (and often is) a semicolon, a tab, or any pre-determined separator. The first line of a CSV file may contain titles.
Below is an example of the contents of a TXT file:
Pencil
Ruler
Eraser
Below is an example of the contents of a CSV file, including titles:
id;name;color
1;Pencil;red
2;Ruler;blue
7;Eraser;white
The Code
Now, we will start to write code. I could write the code as cryptic one-liners, but I prefer organizing it and explaining the thinking behind it. I also included TypeDoc (JSDoc) documentation. The following code snippets are written in TypeScript; if you are using JavaScript, simply delete the types from the code.
Also, we won't use any callback functions, because async functions are the standard in modern JavaScript. Create a TypeScript file or a JavaScript file and save the functions below.
Read TXT File
Our first task is to read a local text file. We won't use any npm modules, because the Node.js fs module is enough to read the file and return all of its lines as a list of strings.
EOL Problem
Each operating system has different End of Line (EOL) characters. We could take the EOL character as a function parameter or use the current operating system's EOL. Both approaches have drawbacks: the first forces us to know the EOL character of the file in advance, which is rarely the case; the second mistakenly assumes the given file was created on the operating system we are currently using.
Instead, I prefer a regular expression that splits the lines on all possible end-of-line character combinations.
The async function below reads a file and returns its contents as an array of strings. Since browsers do not have a local file system, the code below only works on your computer or on the server side.
There are libraries that emulate a file system on the client side, but for this example, I prefer the local file system because it is easier to access.
The code example below takes a file path, reads the lines of the file, and returns each line as an element in a JavaScript array.
import { readFile } from "node:fs/promises";

/**
 * Reads lines of a text file.
 *
 * @param filePath is the path of the file.
 * @returns an array containing each line of the file.
 */
async function readFileLines(filePath: string): Promise<string[]> {
  const fileContent = await readFile(filePath, "utf8");
  // Split lines considering every operating system's end-of-line characters.
  const eolRegex = /\r\n|\n|\r/;
  const lines = fileContent.split(eolRegex);
  // Remove the last line if it is empty.
  if (lines.at(-1) === "") lines.pop(); // ES2022 feature
  return lines;
}
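For example, given the items.txt file shown earlier, the function returns each item as a string. Note that a trailing newline at the end of the file would not produce an empty last element, thanks to the pop above:

const lines = await readFileLines("items.txt");
// lines: ["Pencil", "Ruler", "Eraser"]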
Exercise
What if the given file is not found? We should throw a descriptive error. Try to implement this functionality.
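If you want a hint, below is one possible sketch. The wrapper name readFileLinesSafe and the error message are illustrative, not part of the repo:

/**
 * Reads lines of a text file, throwing a descriptive error if the file is missing.
 *
 * @param filePath is the path of the file.
 * @returns an array containing each line of the file.
 */
async function readFileLinesSafe(filePath: string): Promise<string[]> {
  try {
    return await readFileLines(filePath);
  } catch (error) {
    // "ENOENT" is the Node.js error code for a missing file or directory.
    if ((error as NodeJS.ErrnoException).code === "ENOENT") {
      throw new Error(`File not found: ${filePath}`);
    }
    throw error; // Rethrow anything else (e.g., permission errors) unchanged.
  }
}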
Parse CSV File Data
Now, we want to parse structured CSV data. We again read each line of the file using the previous async function and use the code example below to parse the data. This function can be used on both the server side and the client side in any major browser because it doesn't depend on the file system; we may even pass it the contents of a form field from the front end.
/**
 * Parses CSV data.
 *
 * @param lines are an array of CSV data.
 * @param separator is the character used to separate the data.
 * @returns array of lines having each field as a value in an array.
 */
function parseCSV(lines: string[], separator = ";"): string[][] {
  return lines.map((line) => line.split(separator));
}
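Since parseCSV works on plain strings, it does not need the file system at all. For instance, we could parse data that arrived from the front end (formData below is an illustrative stand-in for such input):

const formData = "1;Pencil;red\n2;Ruler;blue";
const rows = parseCSV(formData.split(/\r\n|\n|\r/));
// rows: [["1", "Pencil", "red"], ["2", "Ruler", "blue"]]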
Parse CSV File Data with Titles
The following code snippet has two functions. The parseCSVWithTitles function uses the first line to determine the titles of the fields and parses the given CSV using the previous function. The getRecordWithTitles function uses the titles as object keys for each record's values. The goal is to get an array of objects, one object per record.
/**
 * Given the titles and data array, creates an object with keys from the titles and the given values.
 *
 * @param titles are the titles of the array values.
 * @param values are the values of the fields.
 * @returns object with titles as keys and values as values.
 *
 * @example
 * const object = getRecordWithTitles(["id", "name"], ["1", "pencil"]); // { id: "1", name: "pencil" }
 */
function getRecordWithTitles(titles: string[], values: string[]): Record<string, string> {
  return Object.fromEntries(titles.map((title, index) => [title, values[index]]));
}
/**
 * Parses CSV data containing titles in the first element of the array.
 *
 * @param lines are an array of CSV data.
 * @param separator is the character used to separate the data.
 * @returns an array of objects with titles as keys and field values as values.
 */
function parseCSVWithTitles(lines: string[], separator = ";"): Record<string, string>[] {
  const valueLines = parseCSV(lines, separator);
  const titles = valueLines.shift() ?? [];
  return valueLines.map((valueLine) => getRecordWithTitles(titles, valueLine));
}
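With the example CSV contents from the beginning of the article, the result is an array of plain objects:

const records = parseCSVWithTitles(["id;name;color", "1;Pencil;red", "2;Ruler;blue", "7;Eraser;white"]);
// records:
// [
//   { id: "1", name: "Pencil", color: "red" },
//   { id: "2", name: "Ruler", color: "blue" },
//   { id: "7", name: "Eraser", color: "white" },
// ]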
Use the Code
const txtLines = await readFileLines("items.txt");
const csvLines = await readFileLines("items.csv");
const csvLinesWithTitles = await readFileLines("items-with-titles.csv");
console.info(txtLines);
console.info(parseCSV(csvLines));
console.info(parseCSVWithTitles(csvLinesWithTitles));
These variables are used only to name the results and make the code readable.
Further Exercise
The code example above works well for small files but struggles as the file size grows. Reading a whole file in one go and storing it in the limited memory of the Node.js process is not good practice: reading large files calls for a different approach than reading small files. You should use streams for large files. I leave the implementation as an exercise for the reader.
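As a starting point, below is a minimal sketch of the stream-based approach using the built-in node:readline module. The function name streamFileLines is illustrative, and this is one possible direction, not the only solution:

import { createReadStream } from "node:fs";
import { createInterface } from "node:readline";

/**
 * Yields lines of a file one by one without loading the whole file into memory.
 *
 * @param filePath is the path of the file.
 */
async function* streamFileLines(filePath: string): AsyncGenerator<string> {
  const readLineInterface = createInterface({
    input: createReadStream(filePath, "utf8"),
    crlfDelay: Infinity, // Treat "\r\n" as a single line break.
  });
  for await (const line of readLineInterface) yield line;
}

// Process each line as it arrives instead of buffering the whole file:
// for await (const line of streamFileLines("big-file.csv")) { /* parse the line */ }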
Your Thoughts Matter
I organized the functions to minimize dependency on the local machine's file system, so there are no readCSVFile or readCSVFileWithTitles functions. This way, you can reuse the parsing functions in the front end in the future.
I want to hear your opinions about the organization of the code and its functions. Do you think my organization is good enough for the task? Do you have a better idea for the organization? Please share your thoughts in the comments.