[Speech sample] Added numpy array support (#5479)
* [Speech sample] Added numpy array support * Added zlib library submodule to thirdparty * Added cnpy library original version * Changed cnpy library: some methods and fixed Klocwork issues * Change cmakelists and documentation * Added license information in TPP * Added support make install to thirdparty components
This commit is contained in:
parent
968888b510
commit
63a335f0b3
4
.gitmodules
vendored
4
.gitmodules
vendored
@ -18,3 +18,7 @@
|
||||
path = thirdparty/xbyak
|
||||
url = https://github.com/herumi/xbyak.git
|
||||
ignore = dirty
|
||||
[submodule "thirdparty/zlib/zlib"]
|
||||
path = thirdparty/zlib/zlib
|
||||
url = https://github.com/madler/zlib.git
|
||||
ignore = dirty
|
||||
|
@ -72,6 +72,18 @@ endif()
|
||||
|
||||
ie_cpack_add_component(cpp_samples DEPENDS core)
|
||||
|
||||
install(DIRECTORY ../thirdparty/zlib
|
||||
DESTINATION ${IE_CPACK_IE_DIR}/samples/cpp/thirdparty
|
||||
COMPONENT cpp_samples
|
||||
USE_SOURCE_PERMISSIONS
|
||||
PATTERN .clang-format EXCLUDE)
|
||||
|
||||
install(DIRECTORY ../thirdparty/cnpy
|
||||
DESTINATION ${IE_CPACK_IE_DIR}/samples/cpp/thirdparty
|
||||
COMPONENT cpp_samples
|
||||
USE_SOURCE_PERMISSIONS
|
||||
PATTERN .clang-format EXCLUDE)
|
||||
|
||||
if(UNIX)
|
||||
install(DIRECTORY samples/
|
||||
DESTINATION ${IE_CPACK_IE_DIR}/samples/cpp
|
||||
|
@ -129,6 +129,14 @@ if(EXISTS "${CMAKE_CURRENT_SOURCE_DIR}/thirdparty/gflags")
|
||||
add_gflags()
|
||||
endif()
|
||||
|
||||
if(EXISTS "${CMAKE_CURRENT_SOURCE_DIR}/thirdparty/zlib")
|
||||
add_subdirectory(thirdparty/zlib EXCLUDE_FROM_ALL)
|
||||
endif()
|
||||
|
||||
if(EXISTS "${CMAKE_CURRENT_SOURCE_DIR}/thirdparty/cnpy")
|
||||
add_subdirectory(thirdparty/cnpy EXCLUDE_FROM_ALL)
|
||||
endif()
|
||||
|
||||
if (CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
|
||||
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wall")
|
||||
endif()
|
||||
|
@ -2,7 +2,11 @@
|
||||
# SPDX-License-Identifier: Apache-2.0
|
||||
#
|
||||
|
||||
file (GLOB SRC ${CMAKE_CURRENT_SOURCE_DIR}/*.cpp
|
||||
file (GLOB HDR ${CMAKE_CURRENT_SOURCE_DIR}/*.hpp)
|
||||
${CMAKE_CURRENT_SOURCE_DIR}/*.h)
|
||||
|
||||
ie_add_sample(NAME speech_sample
|
||||
SOURCES "${CMAKE_CURRENT_SOURCE_DIR}/main.cpp"
|
||||
HEADERS "${CMAKE_CURRENT_SOURCE_DIR}/speech_sample.hpp"
|
||||
DEPENDENCIES ie_samples_utils)
|
||||
SOURCES ${SRC}
|
||||
HEADERS ${HDR}
|
||||
DEPENDENCIES cnpy ie_samples_utils)
|
||||
|
@ -2,7 +2,7 @@
|
||||
|
||||
This sample demonstrates how to execute an Asynchronous Inference of acoustic model based on Kaldi\* neural networks and speech feature vectors.
|
||||
|
||||
The sample works with Kaldi ARK files only, so it does not cover an end-to-end speech recognition scenario (speech to text), requiring additional preprocessing (feature extraction) to get a feature vector from a speech signal, as well as postprocessing (decoding) to produce text from scores.
|
||||
The sample works with Kaldi ARK or Numpy* uncompressed NPZ files, so it does not cover an end-to-end speech recognition scenario (speech to text), requiring additional preprocessing (feature extraction) to get a feature vector from a speech signal, as well as postprocessing (decoding) to produce text from scores.
|
||||
|
||||
Automatic Speech Recognition C++ sample application demonstrates how to use the following Inference Engine C++ API in applications:
|
||||
|
||||
@ -27,8 +27,8 @@ Basic Inference Engine API is covered by [Hello Classification C++ sample](../he
|
||||
|
||||
## How It Works
|
||||
|
||||
Upon the start-up, the application reads command line parameters and loads a Kaldi-trained neural network along with Kaldi ARK speech feature vector file to the Inference Engine plugin. Then it performs inference on all speech utterances stored in the input ARK file. Context-windowed speech frames are processed in batches of 1-8
|
||||
frames according to the `-bs` parameter. Batching across utterances is not supported by this sample. When inference is done, the application creates an output ARK file. If the `-r` option is given, error
|
||||
Upon the start-up, the application reads command line parameters, loads a specified model and input data to the Inference Engine plugin, performs synchronous inference on all speech utterances stored in the input file. Context-windowed speech frames are processed in batches of 1-8
|
||||
frames according to the `-bs` parameter. Batching across utterances is not supported by this sample. When inference is done, the application creates an output file. If the `-r` option is given, error
|
||||
statistics are provided for each speech utterance as shown above.
|
||||
|
||||
You can see the explicit description of
|
||||
@ -43,7 +43,7 @@ Several parameters control neural network quantization. The `-q` flag determines
|
||||
Three modes are supported:
|
||||
|
||||
- *static* - The first
|
||||
utterance in the input ARK file is scanned for dynamic range. The scale factor (floating point scalar multiplier) required to scale the maximum input value of the first utterance to 16384 (15 bits) is used
|
||||
utterance in the input file is scanned for dynamic range. The scale factor (floating point scalar multiplier) required to scale the maximum input value of the first utterance to 16384 (15 bits) is used
|
||||
for all subsequent inputs. The neural network is quantized to accommodate the scaled input dynamic range.
|
||||
- *dynamic* - The user may specify a scale factor via the `-sf` flag that will be used for static quantization.
|
||||
- *user-defined* - The scale factor for each input batch is computed
|
||||
@ -99,17 +99,17 @@ speech_sample [OPTION]
|
||||
Options:
|
||||
|
||||
-h Print a usage message.
|
||||
-i "<path>" Required. Paths to .ark files. Example of usage: <file1.ark,file2.ark> or <file.ark>.
|
||||
-i "<path>" Required. Paths to input files. Example of usage: <file1.ark,file2.ark> or <file.ark> or <file.npz>.
|
||||
-m "<path>" Required. Path to an .xml file with a trained model (required if -rg is missing).
|
||||
-o "<path>" Optional. Output file name to save ark scores.
|
||||
-o "<path>" Optional. Output file name to save scores. Example of usage: <output.ark> or <output.npz>
|
||||
-d "<device>" Optional. Specify a target device to infer on. CPU, GPU, MYRIAD, GNA_AUTO, GNA_HW, GNA_SW_FP32, GNA_SW_EXACT and HETERO with combination of GNA
|
||||
as the primary device and CPU as a secondary (e.g. HETERO:GNA,CPU) are supported. The list of available devices is shown below. The sample will look for a suitable plugin for device specified.
|
||||
-pc Optional. Enables per-layer performance report.
|
||||
-q "<mode>" Optional. Input quantization mode: "static" (default), "dynamic", or "user" (use with -sf).
|
||||
-q "<mode>" Optional. Input quantization mode: static (default), dynamic, or user (use with -sf).
|
||||
-qb "<integer>" Optional. Weight bits for quantization: 8 or 16 (default)
|
||||
-sf "<double>" Optional. User-specified input scale factor for quantization (use with -q user). If the network contains multiple inputs, provide scale factors by separating them with commas.
|
||||
-bs "<integer>" Optional. Batch size 1-8 (default 1)
|
||||
-r "<path>" Optional. Read reference score .ark file and compare scores.
|
||||
-r "<path>" Optional. Read referefile and compare scores. Example of usage: <reference.ark> or <reference.npz>
|
||||
-rg "<path>" Read GNA model from file using path/filename provided (required if -m is missing).
|
||||
-wg "<path>" Optional. Write GNA model to file using path/filename provided.
|
||||
-we "<path>" Optional. Write GNA embedded model to file using path/filename provided.
|
||||
@ -118,10 +118,9 @@ Options:
|
||||
If you use the cw_l or cw_r flag, then batch size and nthreads arguments are ignored.
|
||||
-cw_r "<integer>" Optional. Number of frames for right context windows (default is 0). Works only with context window networks.
|
||||
If you use the cw_r or cw_l flag, then batch size and nthreads arguments are ignored.
|
||||
-oname "<outputs>" Optional. Layer names for output blobs. The names are separated with ",". Allows to change the order of output layers for -o flag.
|
||||
Example: Output1:port,Output2:port.
|
||||
-iname "<inputs>" Optional. Layer names for input blobs. The names are separated with ",". Allows to change the order of input layers for -i flag.
|
||||
Example: Input1,Input2
|
||||
-oname "<string>" Optional. Layer names for output blobs. The names are separated with "," Example: Output1:port,Output2:port
|
||||
-iname "<string>" Optional. Layer names for input blobs. The names are separated with "," Example: Input1,Input2
|
||||
-pwl_me "<double>" Optional. The maximum percent of error for PWL function.The value must be in <0, 100> range. The default value is 1.0.
|
||||
|
||||
Available target devices: <devices>
|
||||
|
||||
@ -168,7 +167,7 @@ All of them can be downloaded from [https://storage.openvinotoolkit.org/models_c
|
||||
|
||||
## Sample Output
|
||||
|
||||
The acoustic log likelihood sequences for all utterances are stored in the Kaldi ARK file, `scores.ark`. If the `-r` option is used, a report on the statistical score error is generated for each utterance such as
|
||||
The acoustic log likelihood sequences for all utterances are stored in the file. Example `scores.ark` or `scores.npz`. If the `-r` option is used, a report on the statistical score error is generated for each utterance such as
|
||||
the following:
|
||||
|
||||
```sh
|
||||
|
144
inference-engine/samples/speech_sample/fileutils.cpp
Normal file
144
inference-engine/samples/speech_sample/fileutils.cpp
Normal file
@ -0,0 +1,144 @@
|
||||
// Copyright (C) 2018-2021 Intel Corporation
|
||||
// SPDX-License-Identifier: Apache-2.0
|
||||
//
|
||||
|
||||
#include "fileutils.hpp"
|
||||
|
||||
void ArkFile::GetFileInfo(const char* fileName, uint32_t numArrayToFindSize, uint32_t* ptrNumArrays, uint32_t* ptrNumMemoryBytes) {
|
||||
uint32_t numArrays = 0;
|
||||
uint32_t numMemoryBytes = 0;
|
||||
|
||||
std::ifstream in_file(fileName, std::ios::binary);
|
||||
if (in_file.good()) {
|
||||
while (!in_file.eof()) {
|
||||
std::string line;
|
||||
uint32_t numRows = 0u, numCols = 0u, num_bytes = 0u;
|
||||
std::getline(in_file, line, '\0'); // read variable length name followed by space and NUL
|
||||
std::getline(in_file, line, '\4'); // read "BFM" followed by space and control-D
|
||||
if (line.compare("BFM ") != 0) {
|
||||
break;
|
||||
}
|
||||
in_file.read(reinterpret_cast<char*>(&numRows), sizeof(uint32_t)); // read number of rows
|
||||
std::getline(in_file, line, '\4'); // read control-D
|
||||
in_file.read(reinterpret_cast<char*>(&numCols), sizeof(uint32_t)); // read number of columns
|
||||
num_bytes = numRows * numCols * sizeof(float);
|
||||
in_file.seekg(num_bytes, in_file.cur); // read data
|
||||
|
||||
if (numArrays == numArrayToFindSize) {
|
||||
numMemoryBytes += num_bytes;
|
||||
}
|
||||
numArrays++;
|
||||
}
|
||||
in_file.close();
|
||||
} else {
|
||||
throw std::runtime_error(std::string("Failed to open %s for reading in GetFileInfo()!\n") + fileName);
|
||||
}
|
||||
|
||||
if (ptrNumArrays != NULL)
|
||||
*ptrNumArrays = numArrays;
|
||||
if (ptrNumMemoryBytes != NULL)
|
||||
*ptrNumMemoryBytes = numMemoryBytes;
|
||||
}
|
||||
|
||||
void ArkFile::LoadFile(const char* fileName, uint32_t arrayIndex, std::string& ptrName, std::vector<uint8_t>& memory, uint32_t* ptrNumRows,
|
||||
uint32_t* ptrNumColumns, uint32_t* ptrNumBytesPerElement) {
|
||||
std::ifstream in_file(fileName, std::ios::binary);
|
||||
if (in_file.good()) {
|
||||
uint32_t i = 0;
|
||||
while (i < arrayIndex) {
|
||||
std::string line;
|
||||
uint32_t numRows = 0u, numCols = 0u;
|
||||
std::getline(in_file, line, '\0'); // read variable length name followed by space and NUL
|
||||
std::getline(in_file, line, '\4'); // read "BFM" followed by space and control-D
|
||||
if (line.compare("BFM ") != 0) {
|
||||
break;
|
||||
}
|
||||
in_file.read(reinterpret_cast<char*>(&numRows), sizeof(uint32_t)); // read number of rows
|
||||
std::getline(in_file, line, '\4'); // read control-D
|
||||
in_file.read(reinterpret_cast<char*>(&numCols), sizeof(uint32_t)); // read number of columns
|
||||
in_file.seekg(numRows * numCols * sizeof(float), in_file.cur); // read data
|
||||
i++;
|
||||
}
|
||||
if (!in_file.eof()) {
|
||||
std::string line;
|
||||
std::getline(in_file, ptrName, '\0'); // read variable length name followed by space and NUL
|
||||
std::getline(in_file, line, '\4'); // read "BFM" followed by space and control-D
|
||||
if (line.compare("BFM ") != 0) {
|
||||
throw std::runtime_error(std::string("Cannot find array specifier in file %s in LoadFile()!\n") + fileName);
|
||||
}
|
||||
in_file.read(reinterpret_cast<char*>(ptrNumRows), sizeof(uint32_t)); // read number of rows
|
||||
std::getline(in_file, line, '\4'); // read control-D
|
||||
in_file.read(reinterpret_cast<char*>(ptrNumColumns), sizeof(uint32_t)); // read number of columns
|
||||
in_file.read(reinterpret_cast<char*>(&memory.front()),
|
||||
*ptrNumRows * *ptrNumColumns * sizeof(float)); // read array data
|
||||
}
|
||||
in_file.close();
|
||||
} else {
|
||||
throw std::runtime_error(std::string("Failed to open %s for reading in LoadFile()!\n") + fileName);
|
||||
}
|
||||
|
||||
*ptrNumBytesPerElement = sizeof(float);
|
||||
}
|
||||
|
||||
void ArkFile::SaveFile(const char* fileName, bool shouldAppend, std::string name, void* ptrMemory, uint32_t numRows, uint32_t numColumns) {
|
||||
std::ios_base::openmode mode = std::ios::binary;
|
||||
if (shouldAppend) {
|
||||
mode |= std::ios::app;
|
||||
}
|
||||
std::ofstream out_file(fileName, mode);
|
||||
if (out_file.good()) {
|
||||
out_file.write(name.c_str(), name.length()); // write name
|
||||
out_file.write("\0", 1);
|
||||
out_file.write("BFM ", 4);
|
||||
out_file.write("\4", 1);
|
||||
out_file.write(reinterpret_cast<char*>(&numRows), sizeof(uint32_t));
|
||||
out_file.write("\4", 1);
|
||||
out_file.write(reinterpret_cast<char*>(&numColumns), sizeof(uint32_t));
|
||||
out_file.write(reinterpret_cast<char*>(ptrMemory), numRows * numColumns * sizeof(float));
|
||||
out_file.close();
|
||||
} else {
|
||||
throw std::runtime_error(std::string("Failed to open %s for writing in SaveFile()!\n") + fileName);
|
||||
}
|
||||
}
|
||||
|
||||
void NumpyFile::GetFileInfo(const char* fileName, uint32_t numArrayToFindSize, uint32_t* ptrNumArrays, uint32_t* ptrNumMemoryBytes) {
|
||||
uint32_t numArrays = 0;
|
||||
uint32_t numMemoryBytes = 0;
|
||||
|
||||
cnpy::npz_t my_npz1 = cnpy::npz_load(fileName);
|
||||
auto it = my_npz1.begin();
|
||||
std::advance(it, numArrayToFindSize);
|
||||
|
||||
numArrays = my_npz1.size();
|
||||
cnpy::NpyArray my_npy = it->second;
|
||||
numMemoryBytes = my_npy.data_holder->size();
|
||||
|
||||
if (ptrNumArrays != NULL)
|
||||
*ptrNumArrays = numArrays;
|
||||
if (ptrNumMemoryBytes != NULL)
|
||||
*ptrNumMemoryBytes = numMemoryBytes;
|
||||
}
|
||||
|
||||
void NumpyFile::LoadFile(const char* fileName, uint32_t arrayIndex, std::string& ptrName, std::vector<uint8_t>& memory, uint32_t* ptrNumRows,
|
||||
uint32_t* ptrNumColumns, uint32_t* ptrNumBytesPerElement) {
|
||||
cnpy::npz_t my_npz1 = cnpy::npz_load(fileName);
|
||||
auto it = my_npz1.begin();
|
||||
std::advance(it, arrayIndex);
|
||||
ptrName = it->first;
|
||||
cnpy::NpyArray my_npy = it->second;
|
||||
*ptrNumRows = my_npy.shape[0];
|
||||
*ptrNumColumns = my_npy.shape[1];
|
||||
|
||||
for (size_t i = 0; i < my_npy.data_holder->size(); i++) {
|
||||
memory.at(i) = my_npy.data_holder->at(i);
|
||||
}
|
||||
|
||||
*ptrNumBytesPerElement = sizeof(float);
|
||||
}
|
||||
|
||||
void NumpyFile::SaveFile(const char* fileName, bool shouldAppend, std::string name, void* ptrMemory, uint32_t numRows, uint32_t numColumns) {
|
||||
std::string mode;
|
||||
shouldAppend ? mode = "a" : mode = "w";
|
||||
std::vector<size_t> shape {numRows, numColumns};
|
||||
cnpy::npz_save(fileName, name, reinterpret_cast<float*>(ptrMemory), shape, mode);
|
||||
}
|
100
inference-engine/samples/speech_sample/fileutils.hpp
Normal file
100
inference-engine/samples/speech_sample/fileutils.hpp
Normal file
@ -0,0 +1,100 @@
|
||||
// Copyright (C) 2018-2021 Intel Corporation
|
||||
// SPDX-License-Identifier: Apache-2.0
|
||||
//
|
||||
|
||||
#pragma once
|
||||
#include <cnpy.h>
|
||||
|
||||
#include <samples/common.hpp>
|
||||
#include <samples/slog.hpp>
|
||||
|
||||
/// @brief Interface to work with files like input and output
|
||||
class BaseFile {
|
||||
public:
|
||||
virtual void LoadFile(const char* fileName, uint32_t arrayIndex, std::string& ptrName, std::vector<uint8_t>& memory, uint32_t* ptrNumRows,
|
||||
uint32_t* ptrNumColumns, uint32_t* ptrNumBytesPerElement) = 0;
|
||||
|
||||
virtual void SaveFile(const char* fileName, bool shouldAppend, std::string name, void* ptrMemory, uint32_t numRows, uint32_t numColumns) = 0;
|
||||
|
||||
virtual void GetFileInfo(const char* fileName, uint32_t numArrayToFindSize, uint32_t* ptrNumArrays, uint32_t* ptrNumMemoryBytes) = 0;
|
||||
};
|
||||
|
||||
/// @brief Responsible to work with .ark files
|
||||
class ArkFile : public BaseFile {
|
||||
public:
|
||||
/**
|
||||
* @brief Get info from Kaldi ARK speech feature vector file
|
||||
* @param fileName .ark file name
|
||||
* @param numArrayToFindSize number speech feature vectors in the file
|
||||
* @param ptrNumArrays pointer to specific number array
|
||||
* @param ptrNumMemoryBytes pointer to specific number of memory bytes
|
||||
* @return none.
|
||||
*/
|
||||
virtual void GetFileInfo(const char* fileName, uint32_t numArrayToFindSize, uint32_t* ptrNumArrays, uint32_t* ptrNumMemoryBytes);
|
||||
|
||||
/**
|
||||
* @brief Load Kaldi ARK speech feature vector file
|
||||
* @param fileName .ark file name
|
||||
* @param arrayIndex number speech feature vector in the file
|
||||
* @param ptrName reference to variable length name
|
||||
* @param memory reference to speech feature vector to save
|
||||
* @param ptrNumRows pointer to number of rows to read
|
||||
* @param ptrNumColumns pointer to number of columns to read
|
||||
* @param ptrNumBytesPerElement pointer to number bytes per element (size of float by default)
|
||||
* @return none.
|
||||
*/
|
||||
virtual void LoadFile(const char* fileName, uint32_t arrayIndex, std::string& ptrName, std::vector<uint8_t>& memory, uint32_t* ptrNumRows,
|
||||
uint32_t* ptrNumColumns, uint32_t* ptrNumBytesPerElement);
|
||||
|
||||
/**
|
||||
* @brief Save Kaldi ARK speech feature vector file
|
||||
* @param fileName .ark file name
|
||||
* @param shouldAppend bool flag to rewrite or add to the end of file
|
||||
* @param name reference to variable length name
|
||||
* @param ptrMemory pointer to speech feature vector to save
|
||||
* @param numRows number of rows
|
||||
* @param numColumns number of columns
|
||||
* @return none.
|
||||
*/
|
||||
virtual void SaveFile(const char* fileName, bool shouldAppend, std::string name, void* ptrMemory, uint32_t numRows, uint32_t numColumns);
|
||||
};
|
||||
|
||||
/// @brief Responsible to work with .npz files
|
||||
class NumpyFile : public BaseFile {
|
||||
public:
|
||||
/**
|
||||
* @brief Get info from Numpy* uncompressed NPZ speech feature vector file
|
||||
* @param fileName .npz file name
|
||||
* @param numArrayToFindSize number speech feature vectors in the file
|
||||
* @param ptrNumArrays pointer to specific number array
|
||||
* @param ptrNumMemoryBytes pointer to specific number of memory bytes
|
||||
* @return none.
|
||||
*/
|
||||
virtual void GetFileInfo(const char* fileName, uint32_t numArrayToFindSize, uint32_t* ptrNumArrays, uint32_t* ptrNumMemoryBytes);
|
||||
|
||||
/**
|
||||
* @brief Load Numpy* uncompressed NPZ speech feature vector file
|
||||
* @param fileName .npz file name
|
||||
* @param arrayIndex number speech feature vector in the file
|
||||
* @param ptrName reference to variable length name
|
||||
* @param memory reference to speech feature vector to save
|
||||
* @param ptrNumRows pointer to number of rows to read
|
||||
* @param ptrNumColumns pointer to number of columns to read
|
||||
* @param ptrNumBytesPerElement pointer to number bytes per element (size of float by default)
|
||||
* @return none.
|
||||
*/
|
||||
virtual void LoadFile(const char* fileName, uint32_t arrayIndex, std::string& ptrName, std::vector<uint8_t>& memory, uint32_t* ptrNumRows,
|
||||
uint32_t* ptrNumColumns, uint32_t* ptrNumBytesPerElement);
|
||||
|
||||
/**
|
||||
* @brief Save Numpy* uncompressed NPZ speech feature vector file
|
||||
* @param fileName .npz file name
|
||||
* @param shouldAppend bool flag to rewrite or add to the end of file
|
||||
* @param name reference to variable length name
|
||||
* @param ptrMemory pointer to speech feature vector to save
|
||||
* @param numRows number of rows
|
||||
* @param numColumns number of columns
|
||||
* @return none.
|
||||
*/
|
||||
virtual void SaveFile(const char* fileName, bool shouldAppend, std::string name, void* ptrMemory, uint32_t numRows, uint32_t numColumns);
|
||||
};
|
@ -24,6 +24,7 @@
|
||||
#include <utility>
|
||||
#include <vector>
|
||||
|
||||
#include "fileutils.hpp"
|
||||
#include "speech_sample.hpp"
|
||||
|
||||
#define MAX_SCORE_DIFFERENCE 0.0001f // max score difference for frame error threshold
|
||||
@ -63,144 +64,15 @@ struct InferRequestStruct {
|
||||
/**
|
||||
* @brief Check number of input files and model network inputs
|
||||
* @param numInputs number model inputs
|
||||
* @param numInputArkFiles number of input ARK files
|
||||
* @param numInputFiles number of input files
|
||||
* @return none.
|
||||
*/
|
||||
void CheckNumberOfInputs(size_t numInputs, size_t numInputArkFiles) {
|
||||
if (numInputs != numInputArkFiles) {
|
||||
void CheckNumberOfInputs(size_t numInputs, size_t numInputFiles) {
|
||||
if (numInputs != numInputFiles) {
|
||||
throw std::logic_error("Number of network inputs (" + std::to_string(numInputs) +
|
||||
")"
|
||||
" is not equal to number of ark files (" +
|
||||
std::to_string(numInputArkFiles) + ")");
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* @brief Get info from Kaldi ARK speech feature vector file
|
||||
* @param fileName .ark file name
|
||||
* @param numArrayToFindSize number speech feature vectors in the file
|
||||
* @param ptrNumArrays pointer to specific number array
|
||||
* @param ptrNumMemoryBytes pointer to specific number of memory bytes
|
||||
* @return none.
|
||||
*/
|
||||
void GetKaldiArkInfo(const char* fileName, uint32_t numArrayToFindSize, uint32_t* ptrNumArrays, uint32_t* ptrNumMemoryBytes) {
|
||||
uint32_t numArrays = 0;
|
||||
uint32_t numMemoryBytes = 0;
|
||||
|
||||
std::ifstream in_file(fileName, std::ios::binary);
|
||||
if (in_file.good()) {
|
||||
while (!in_file.eof()) {
|
||||
std::string line;
|
||||
uint32_t numRows = 0u, numCols = 0u, num_bytes = 0u;
|
||||
std::getline(in_file, line, '\0'); // read variable length name followed by space and NUL
|
||||
std::getline(in_file, line, '\4'); // read "BFM" followed by space and control-D
|
||||
if (line.compare("BFM ") != 0) {
|
||||
break;
|
||||
}
|
||||
in_file.read(reinterpret_cast<char*>(&numRows), sizeof(uint32_t)); // read number of rows
|
||||
std::getline(in_file, line, '\4'); // read control-D
|
||||
in_file.read(reinterpret_cast<char*>(&numCols), sizeof(uint32_t)); // read number of columns
|
||||
num_bytes = numRows * numCols * sizeof(float);
|
||||
in_file.seekg(num_bytes, in_file.cur); // read data
|
||||
|
||||
if (numArrays == numArrayToFindSize) {
|
||||
numMemoryBytes += num_bytes;
|
||||
}
|
||||
numArrays++;
|
||||
}
|
||||
in_file.close();
|
||||
} else {
|
||||
fprintf(stderr, "Failed to open %s for reading in GetKaldiArkInfo()!\n", fileName);
|
||||
exit(-1);
|
||||
}
|
||||
|
||||
if (ptrNumArrays != NULL)
|
||||
*ptrNumArrays = numArrays;
|
||||
if (ptrNumMemoryBytes != NULL)
|
||||
*ptrNumMemoryBytes = numMemoryBytes;
|
||||
}
|
||||
|
||||
/**
|
||||
* @brief Load Kaldi ARK speech feature vector file
|
||||
* @param fileName .ark file name
|
||||
* @param arrayIndex number speech feature vector in the file
|
||||
* @param ptrName reference to variable length name
|
||||
* @param memory reference to speech feature vector to save
|
||||
* @param ptrNumRows pointer to number of rows to read
|
||||
* @param ptrNumColumns pointer to number of columns to read
|
||||
* @param ptrNumBytesPerElement pointer to number bytes per element (size of float by default)
|
||||
* @return none.
|
||||
*/
|
||||
void LoadKaldiArkArray(const char* fileName, uint32_t arrayIndex, std::string& ptrName, std::vector<uint8_t>& memory, uint32_t* ptrNumRows,
|
||||
uint32_t* ptrNumColumns, uint32_t* ptrNumBytesPerElement) {
|
||||
std::ifstream in_file(fileName, std::ios::binary);
|
||||
if (in_file.good()) {
|
||||
uint32_t i = 0;
|
||||
while (i < arrayIndex) {
|
||||
std::string line;
|
||||
uint32_t numRows = 0u, numCols = 0u;
|
||||
std::getline(in_file, line, '\0'); // read variable length name followed by space and NUL
|
||||
std::getline(in_file, line, '\4'); // read "BFM" followed by space and control-D
|
||||
if (line.compare("BFM ") != 0) {
|
||||
break;
|
||||
}
|
||||
in_file.read(reinterpret_cast<char*>(&numRows), sizeof(uint32_t)); // read number of rows
|
||||
std::getline(in_file, line, '\4'); // read control-D
|
||||
in_file.read(reinterpret_cast<char*>(&numCols), sizeof(uint32_t)); // read number of columns
|
||||
in_file.seekg(numRows * numCols * sizeof(float), in_file.cur); // read data
|
||||
i++;
|
||||
}
|
||||
if (!in_file.eof()) {
|
||||
std::string line;
|
||||
std::getline(in_file, ptrName, '\0'); // read variable length name followed by space and NUL
|
||||
std::getline(in_file, line, '\4'); // read "BFM" followed by space and control-D
|
||||
if (line.compare("BFM ") != 0) {
|
||||
fprintf(stderr, "Cannot find array specifier in file %s in LoadKaldiArkArray()!\n", fileName);
|
||||
exit(-1);
|
||||
}
|
||||
in_file.read(reinterpret_cast<char*>(ptrNumRows), sizeof(uint32_t)); // read number of rows
|
||||
std::getline(in_file, line, '\4'); // read control-D
|
||||
in_file.read(reinterpret_cast<char*>(ptrNumColumns), sizeof(uint32_t)); // read number of columns
|
||||
in_file.read(reinterpret_cast<char*>(&memory.front()),
|
||||
*ptrNumRows * *ptrNumColumns * sizeof(float)); // read array data
|
||||
}
|
||||
in_file.close();
|
||||
} else {
|
||||
fprintf(stderr, "Failed to open %s for reading in LoadKaldiArkArray()!\n", fileName);
|
||||
exit(-1);
|
||||
}
|
||||
|
||||
*ptrNumBytesPerElement = sizeof(float);
|
||||
}
|
||||
|
||||
/**
|
||||
* @brief Save Kaldi ARK speech feature vector file
|
||||
* @param fileName .ark file name
|
||||
* @param shouldAppend bool flag to rewrite or add to the end of file
|
||||
* @param name reference to variable length name
|
||||
* @param ptrMemory pointer to speech feature vector to save
|
||||
* @param numRows number of rows
|
||||
* @param numColumns number of columns
|
||||
* @return none.
|
||||
*/
|
||||
void SaveKaldiArkArray(const char* fileName, bool shouldAppend, std::string name, void* ptrMemory, uint32_t numRows, uint32_t numColumns) {
|
||||
std::ios_base::openmode mode = std::ios::binary;
|
||||
if (shouldAppend) {
|
||||
mode |= std::ios::app;
|
||||
}
|
||||
std::ofstream out_file(fileName, mode);
|
||||
if (out_file.good()) {
|
||||
out_file.write(name.c_str(), name.length()); // write name
|
||||
out_file.write("\0", 1);
|
||||
out_file.write("BFM ", 4);
|
||||
out_file.write("\4", 1);
|
||||
out_file.write(reinterpret_cast<char*>(&numRows), sizeof(uint32_t));
|
||||
out_file.write("\4", 1);
|
||||
out_file.write(reinterpret_cast<char*>(&numColumns), sizeof(uint32_t));
|
||||
out_file.write(reinterpret_cast<char*>(ptrMemory), numRows * numColumns * sizeof(float));
|
||||
out_file.close();
|
||||
} else {
|
||||
throw std::runtime_error(std::string("Failed to open %s for writing in SaveKaldiArkArray()!\n") + fileName);
|
||||
" is not equal to number of input files (" +
|
||||
std::to_string(numInputFiles) + ")");
|
||||
}
|
||||
}
|
||||
|
||||
@ -637,7 +509,20 @@ int main(int argc, char* argv[]) {
|
||||
return 0;
|
||||
}
|
||||
|
||||
std::vector<std::string> inputArkFiles;
|
||||
BaseFile* file;
|
||||
BaseFile* fileOutput;
|
||||
ArkFile arkFile;
|
||||
NumpyFile numpyFile;
|
||||
auto extInputFile = fileExt(FLAGS_i);
|
||||
if (extInputFile == "ark") {
|
||||
file = &arkFile;
|
||||
} else if (extInputFile == "npz") {
|
||||
file = &numpyFile;
|
||||
} else {
|
||||
throw std::logic_error("Invalid input file");
|
||||
}
|
||||
|
||||
std::vector<std::string> inputFiles;
|
||||
std::vector<uint32_t> numBytesThisUtterance;
|
||||
uint32_t numUtterances(0);
|
||||
if (!FLAGS_i.empty()) {
|
||||
@ -646,19 +531,19 @@ int main(int argc, char* argv[]) {
|
||||
|
||||
uint32_t currentNumUtterances(0), currentNumBytesThisUtterance(0);
|
||||
while (getline(stream, outStr, ',')) {
|
||||
std::string filename(fileNameNoExt(outStr) + ".ark");
|
||||
inputArkFiles.push_back(filename);
|
||||
std::string filename(fileNameNoExt(outStr) + "." + extInputFile);
|
||||
inputFiles.push_back(filename);
|
||||
|
||||
GetKaldiArkInfo(filename.c_str(), 0, ¤tNumUtterances, ¤tNumBytesThisUtterance);
|
||||
file->GetFileInfo(filename.c_str(), 0, ¤tNumUtterances, ¤tNumBytesThisUtterance);
|
||||
if (numUtterances == 0) {
|
||||
numUtterances = currentNumUtterances;
|
||||
} else if (currentNumUtterances != numUtterances) {
|
||||
throw std::logic_error("Incorrect input files. Number of utterance must be the same for all ark files");
|
||||
throw std::logic_error("Incorrect input files. Number of utterance must be the same for all input files");
|
||||
}
|
||||
numBytesThisUtterance.push_back(currentNumBytesThisUtterance);
|
||||
}
|
||||
}
|
||||
size_t numInputArkFiles(inputArkFiles.size());
|
||||
size_t numInputFiles(inputFiles.size());
|
||||
// -----------------------------------------------------------------------------------------------------
|
||||
|
||||
// --------------------------- Step 1. Initialize inference engine core -------------------------------------
|
||||
@ -689,7 +574,7 @@ int main(int argc, char* argv[]) {
|
||||
if (!FLAGS_m.empty()) {
|
||||
/** Read network model **/
|
||||
network = ie.ReadNetwork(FLAGS_m);
|
||||
CheckNumberOfInputs(network.getInputsInfo().size(), numInputArkFiles);
|
||||
CheckNumberOfInputs(network.getInputsInfo().size(), numInputFiles);
|
||||
// -------------------------------------------------------------------------------------------------
|
||||
|
||||
// --------------------------- Set batch size ---------------------------------------------------
|
||||
@ -718,9 +603,9 @@ int main(int argc, char* argv[]) {
|
||||
slog::warn << "Custom scale factor will be ignored - using scale factor from provided imported gna model: " << FLAGS_rg << slog::endl;
|
||||
} else {
|
||||
auto scaleFactorInput = ParseScaleFactors(FLAGS_sf);
|
||||
if (numInputArkFiles != scaleFactorInput.size()) {
|
||||
if (numInputFiles != scaleFactorInput.size()) {
|
||||
std::string errMessage("Incorrect command line for multiple inputs: " + std::to_string(scaleFactorInput.size()) +
|
||||
" scale factors provided for " + std::to_string(numInputArkFiles) + " input files.");
|
||||
" scale factors provided for " + std::to_string(numInputFiles) + " input files.");
|
||||
throw std::logic_error(errMessage);
|
||||
}
|
||||
|
||||
@ -735,14 +620,14 @@ int main(int argc, char* argv[]) {
|
||||
if (!FLAGS_rg.empty()) {
|
||||
slog::info << "Using scale factor from provided imported gna model: " << FLAGS_rg << slog::endl;
|
||||
} else {
|
||||
for (size_t i = 0; i < numInputArkFiles; i++) {
|
||||
auto inputArkName = inputArkFiles[i].c_str();
|
||||
for (size_t i = 0; i < numInputFiles; i++) {
|
||||
auto inputFileName = inputFiles[i].c_str();
|
||||
std::string name;
|
||||
std::vector<uint8_t> ptrFeatures;
|
||||
uint32_t numArrays(0), numBytes(0), numFrames(0), numFrameElements(0), numBytesPerElement(0);
|
||||
GetKaldiArkInfo(inputArkName, 0, &numArrays, &numBytes);
|
||||
file->GetFileInfo(inputFileName, 0, &numArrays, &numBytes);
|
||||
ptrFeatures.resize(numBytes);
|
||||
LoadKaldiArkArray(inputArkName, 0, name, ptrFeatures, &numFrames, &numFrameElements, &numBytesPerElement);
|
||||
file->LoadFile(inputFileName, 0, name, ptrFeatures, &numFrames, &numFrameElements, &numBytesPerElement);
|
||||
auto floatScaleFactor = ScaleFactorForQuantization(ptrFeatures.data(), MAX_VAL_2B_FEAT, numFrames * numFrameElements);
|
||||
slog::info << "Using scale factor of " << floatScaleFactor << " calculated from first utterance." << slog::endl;
|
||||
std::string scaleFactorConfigKey = GNA_CONFIG_KEY(SCALE_FACTOR) + std::string("_") + std::to_string(i);
|
||||
@ -840,7 +725,7 @@ int main(int argc, char* argv[]) {
|
||||
// --------------------------- Prepare input blobs -----------------------------------------------------
|
||||
/** Taking information about all topology inputs **/
|
||||
ConstInputsDataMap cInputInfo = executableNet.GetInputsInfo();
|
||||
CheckNumberOfInputs(cInputInfo.size(), numInputArkFiles);
|
||||
CheckNumberOfInputs(cInputInfo.size(), numInputFiles);
|
||||
|
||||
/** Stores all input blobs data **/
|
||||
std::vector<Blob::Ptr> ptrInputBlobs;
|
||||
@ -934,7 +819,7 @@ int main(int argc, char* argv[]) {
|
||||
std::vector<uint8_t> ptrReferenceScores;
|
||||
score_error_t frameError, totalError;
|
||||
|
||||
ptrUtterances.resize(inputArkFiles.size());
|
||||
ptrUtterances.resize(inputFiles.size());
|
||||
|
||||
// initialize memory state before starting
|
||||
for (auto&& state : inferRequests.begin()->inferRequest.QueryState()) {
|
||||
@ -954,20 +839,20 @@ int main(int argc, char* argv[]) {
|
||||
|
||||
slog::info << "Number scores per frame : " << numScoresPerFrame << slog::endl;
|
||||
|
||||
/** Get information from ark file for current utterance **/
|
||||
numFrameElementsInput.resize(numInputArkFiles);
|
||||
for (size_t i = 0; i < inputArkFiles.size(); i++) {
|
||||
/** Get information from input file for current utterance **/
|
||||
numFrameElementsInput.resize(numInputFiles);
|
||||
for (size_t i = 0; i < inputFiles.size(); i++) {
|
||||
std::vector<uint8_t> ptrUtterance;
|
||||
auto inputArkFilename = inputArkFiles[i].c_str();
|
||||
auto inputFilename = inputFiles[i].c_str();
|
||||
uint32_t currentNumFrames(0), currentNumFrameElementsInput(0), currentNumBytesPerElementInput(0);
|
||||
GetKaldiArkInfo(inputArkFilename, utteranceIndex, &n, &numBytesThisUtterance[i]);
|
||||
file->GetFileInfo(inputFilename, utteranceIndex, &n, &numBytesThisUtterance[i]);
|
||||
ptrUtterance.resize(numBytesThisUtterance[i]);
|
||||
LoadKaldiArkArray(inputArkFilename, utteranceIndex, uttName, ptrUtterance, ¤tNumFrames, ¤tNumFrameElementsInput,
|
||||
¤tNumBytesPerElementInput);
|
||||
file->LoadFile(inputFilename, utteranceIndex, uttName, ptrUtterance, ¤tNumFrames, ¤tNumFrameElementsInput,
|
||||
¤tNumBytesPerElementInput);
|
||||
if (numFrames == 0) {
|
||||
numFrames = currentNumFrames;
|
||||
} else if (numFrames != currentNumFrames) {
|
||||
std::string errMessage("Number of frames in ark files is different: " + std::to_string(numFrames) + " and " +
|
||||
std::string errMessage("Number of frames in input files is different: " + std::to_string(numFrames) + " and " +
|
||||
std::to_string(currentNumFrames));
|
||||
throw std::logic_error(errMessage);
|
||||
}
|
||||
@ -979,19 +864,28 @@ int main(int argc, char* argv[]) {
|
||||
int i = 0;
|
||||
for (auto& ptrInputBlob : ptrInputBlobs) {
|
||||
if (ptrInputBlob->size() != numFrameElementsInput[i++] * batchSize) {
|
||||
throw std::logic_error("network input size(" + std::to_string(ptrInputBlob->size()) + ") mismatch to ark file size (" +
|
||||
throw std::logic_error("network input size(" + std::to_string(ptrInputBlob->size()) + ") mismatch to input file size (" +
|
||||
std::to_string(numFrameElementsInput[i - 1] * batchSize) + ")");
|
||||
}
|
||||
}
|
||||
|
||||
ptrScores.resize(numFrames * numScoresPerFrame * sizeof(float));
|
||||
if (!FLAGS_r.empty()) {
|
||||
/** Read ark file with reference scores **/
|
||||
/** Read file with reference scores **/
|
||||
BaseFile* fileReferenceScores;
|
||||
auto exReferenceScoresFile = fileExt(FLAGS_r);
|
||||
if (exReferenceScoresFile == "ark") {
|
||||
fileReferenceScores = &arkFile;
|
||||
} else if (exReferenceScoresFile == "npz") {
|
||||
fileReferenceScores = &numpyFile;
|
||||
} else {
|
||||
throw std::logic_error("Invalid Reference Scores file");
|
||||
}
|
||||
std::string refUtteranceName;
|
||||
GetKaldiArkInfo(reference_name_files[next_output].c_str(), utteranceIndex, &n, &numBytesReferenceScoreThisUtterance);
|
||||
fileReferenceScores->GetFileInfo(reference_name_files[next_output].c_str(), utteranceIndex, &n, &numBytesReferenceScoreThisUtterance);
|
||||
ptrReferenceScores.resize(numBytesReferenceScoreThisUtterance);
|
||||
LoadKaldiArkArray(reference_name_files[next_output].c_str(), utteranceIndex, refUtteranceName, ptrReferenceScores, &numFramesReference,
|
||||
&numFrameElementsReference, &numBytesPerElementReference);
|
||||
fileReferenceScores->LoadFile(reference_name_files[next_output].c_str(), utteranceIndex, refUtteranceName, ptrReferenceScores,
|
||||
&numFramesReference, &numFrameElementsReference, &numBytesPerElementReference);
|
||||
}
|
||||
|
||||
double totalTime = 0.0;
|
||||
@ -1009,7 +903,7 @@ int main(int argc, char* argv[]) {
|
||||
std::map<std::string, InferenceEngine::InferenceEngineProfileInfo> callPerfMap;
|
||||
|
||||
size_t frameIndex = 0;
|
||||
uint32_t numFramesArkFile = numFrames;
|
||||
uint32_t numFramesFile = numFrames;
|
||||
numFrames += FLAGS_cw_l + FLAGS_cw_r;
|
||||
uint32_t numFramesThisBatch {batchSize};
|
||||
|
||||
@ -1120,7 +1014,7 @@ int main(int argc, char* argv[]) {
|
||||
}
|
||||
|
||||
/** Iterate over all the input blobs **/
|
||||
for (size_t i = 0; i < numInputArkFiles; ++i) {
|
||||
for (size_t i = 0; i < numInputFiles; ++i) {
|
||||
MemoryBlob::Ptr minput = as<MemoryBlob>(ptrInputBlobs[i]);
|
||||
if (!minput) {
|
||||
std::string errMessage("We expect ptrInputBlobs[" + std::to_string(i) + "] to be inherited from MemoryBlob, " +
|
||||
@ -1141,14 +1035,14 @@ int main(int argc, char* argv[]) {
|
||||
inferRequest.numFramesThisBatch = numFramesThisBatch;
|
||||
|
||||
frameIndex += numFramesThisBatch;
|
||||
for (size_t j = 0; j < inputArkFiles.size(); j++) {
|
||||
for (size_t j = 0; j < inputFiles.size(); j++) {
|
||||
if (FLAGS_cw_l > 0 || FLAGS_cw_r > 0) {
|
||||
int idx = frameIndex - FLAGS_cw_l;
|
||||
if (idx > 0 && idx < static_cast<int>(numFramesArkFile)) {
|
||||
if (idx > 0 && idx < static_cast<int>(numFramesFile)) {
|
||||
inputFrame[j] += sizeof(float) * numFrameElementsInput[j] * numFramesThisBatch;
|
||||
} else if (idx >= static_cast<int>(numFramesArkFile)) {
|
||||
} else if (idx >= static_cast<int>(numFramesFile)) {
|
||||
inputFrame[j] =
|
||||
&ptrUtterances[j].front() + (numFramesArkFile - 1) * sizeof(float) * numFrameElementsInput[j] * numFramesThisBatch;
|
||||
&ptrUtterances[j].front() + (numFramesFile - 1) * sizeof(float) * numFrameElementsInput[j] * numFramesThisBatch;
|
||||
} else if (idx <= 0) {
|
||||
inputFrame[j] = &ptrUtterances[j].front();
|
||||
}
|
||||
@ -1179,9 +1073,17 @@ int main(int argc, char* argv[]) {
|
||||
// --------------------------- Step 8. Process output part 2 -------------------------------------------------------
|
||||
|
||||
if (!FLAGS_o.empty()) {
|
||||
auto exOutputScoresFile = fileExt(FLAGS_o);
|
||||
if (exOutputScoresFile == "ark") {
|
||||
fileOutput = &arkFile;
|
||||
} else if (exOutputScoresFile == "npz") {
|
||||
fileOutput = &numpyFile;
|
||||
} else {
|
||||
throw std::logic_error("Invalid Reference Scores file");
|
||||
}
|
||||
/* Save output data to file */
|
||||
bool shouldAppend = (utteranceIndex == 0) ? false : true;
|
||||
SaveKaldiArkArray(output_name_files[next_output].c_str(), shouldAppend, uttName, &ptrScores.front(), numFramesArkFile, numScoresPerFrame);
|
||||
fileOutput->SaveFile(output_name_files[next_output].c_str(), shouldAppend, uttName, &ptrScores.front(), numFramesFile, numScoresPerFrame);
|
||||
}
|
||||
|
||||
/** Show performance results **/
|
||||
|
@ -14,7 +14,7 @@
|
||||
static const char help_message[] = "Print a usage message.";
|
||||
|
||||
/// @brief message for images argument
|
||||
static const char input_message[] = "Required. Paths to .ark files. Example of usage: <file1.ark,file2.ark> or <file.ark>.";
|
||||
static const char input_message[] = "Required. Paths to input files. Example of usage: <file1.ark,file2.ark> or <file.ark> or <file.npz>.";
|
||||
|
||||
/// @brief message for model argument
|
||||
static const char model_message[] = "Required. Path to an .xml file with a trained model (required if -rg is missing).";
|
||||
@ -49,10 +49,10 @@ static const char custom_cpu_library_message[] = "Required for CPU plugin custom
|
||||
"Absolute path to a shared library with the kernels implementations.";
|
||||
|
||||
/// @brief message for score output argument
|
||||
static const char output_message[] = "Optional. Output file name to save ark scores.";
|
||||
static const char output_message[] = "Optional. Output file name to save scores. Example of usage: <output.ark> or <output.npz>";
|
||||
|
||||
/// @brief message for reference score file argument
|
||||
static const char reference_score_message[] = "Optional. Read reference score .ark file and compare scores.";
|
||||
static const char reference_score_message[] = "Optional. Read reference score file and compare scores. Example of usage: <reference.ark> or <reference.npz>";
|
||||
|
||||
/// @brief message for read GNA model argument
|
||||
static const char read_gna_model_message[] = "Read GNA model from file using path/filename provided (required if -m is missing).";
|
||||
|
@ -277,6 +277,9 @@ OTHER DEALINGS IN THE SOFTWARE.
|
||||
libnpy (https://github.com/llohse/libnpy/)
|
||||
Copyright (c) 2021 Leon Merten Lohse
|
||||
|
||||
rogersce/cnpy (https://github.com/rogersce/cnpy/)
|
||||
Copyright (c) Carl Rogers, 2011
|
||||
|
||||
Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
of this software and associated documentation files (the "Software"), to deal
|
||||
in the Software without restriction, including without limitation the rights
|
||||
@ -570,6 +573,27 @@ SOFTWARE.
|
||||
|
||||
-------------------------------------------------------------
|
||||
|
||||
15. ZLIB DATA COMPRESSION LIBRARY
|
||||
Copyright (C) 1995-2017 Jean-loup Gailly and Mark Adler
|
||||
|
||||
This software is provided 'as-is', without any express or implied
|
||||
warranty. In no event will the authors be held liable for any damages
|
||||
arising from the use of this software.
|
||||
|
||||
Permission is granted to anyone to use this software for any purpose,
|
||||
including commercial applications, and to alter it and redistribute it
|
||||
freely, subject to the following restrictions:
|
||||
|
||||
1. The origin of this software must not be misrepresented; you must not
|
||||
claim that you wrote the original software. If you use this software
|
||||
in a product, an acknowledgment in the product documentation would be
|
||||
appreciated but is not required.
|
||||
2. Altered source versions must be plainly marked as such, and must not be
|
||||
misrepresented as being the original software.
|
||||
3. This notice may not be removed or altered from any source distribution.
|
||||
|
||||
-------------------------------------------------------------
|
||||
|
||||
The following third party programs have their own third party program files. These additional third party program files are as follows:
|
||||
|
||||
oneAPI Deep Neural Network Library (oneDNN) Third Party Programs File is available here https://github.com/openvinotoolkit/openvino/blob/master/licensing/onednn_third-party-programs.txt
|
||||
|
3
thirdparty/CMakeLists.txt
vendored
3
thirdparty/CMakeLists.txt
vendored
@ -5,5 +5,6 @@
|
||||
add_subdirectory(ittapi)
|
||||
add_subdirectory(itt_collector)
|
||||
add_subdirectory(xbyak EXCLUDE_FROM_ALL)
|
||||
|
||||
add_subdirectory(zlib EXCLUDE_FROM_ALL)
|
||||
add_subdirectory(cnpy EXCLUDE_FROM_ALL)
|
||||
openvino_developer_export_targets(COMPONENT openvino_common TARGETS xbyak)
|
||||
|
21
thirdparty/cnpy/CMakeLists.txt
vendored
Normal file
21
thirdparty/cnpy/CMakeLists.txt
vendored
Normal file
@ -0,0 +1,21 @@
|
||||
CMAKE_MINIMUM_REQUIRED(VERSION 3.0 FATAL_ERROR)
|
||||
if(COMMAND cmake_policy)
|
||||
cmake_policy(SET CMP0003 NEW)
|
||||
endif(COMMAND cmake_policy)
|
||||
|
||||
project(CNPY)
|
||||
|
||||
set(TARGET_NAME "cnpy")
|
||||
add_library(cnpy STATIC "cnpy.cpp")
|
||||
|
||||
if(NOT ${CMAKE_CXX_COMPILER_ID} STREQUAL "MSVC")
|
||||
set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-all")
|
||||
set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Wno-all")
|
||||
target_compile_options(${TARGET_NAME} PUBLIC -Wno-unused-variable)
|
||||
endif()
|
||||
|
||||
target_link_libraries(${TARGET_NAME} PUBLIC zlib)
|
||||
target_include_directories(${TARGET_NAME} PUBLIC "${CMAKE_CURRENT_SOURCE_DIR}"
|
||||
"${CMAKE_CURRENT_SOURCE_DIR}/..")
|
||||
|
||||
set_target_properties(cnpy PROPERTIES FOLDER thirdparty)
|
21
thirdparty/cnpy/LICENSE
vendored
Normal file
21
thirdparty/cnpy/LICENSE
vendored
Normal file
@ -0,0 +1,21 @@
|
||||
The MIT License
|
||||
|
||||
Copyright (c) Carl Rogers, 2011
|
||||
|
||||
Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
of this software and associated documentation files (the "Software"), to deal
|
||||
in the Software without restriction, including without limitation the rights
|
||||
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
||||
copies of the Software, and to permit persons to whom the Software is
|
||||
furnished to do so, subject to the following conditions:
|
||||
|
||||
The above copyright notice and this permission notice shall be included in
|
||||
all copies or substantial portions of the Software.
|
||||
|
||||
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
||||
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
|
||||
THE SOFTWARE.
|
55
thirdparty/cnpy/README.md
vendored
Normal file
55
thirdparty/cnpy/README.md
vendored
Normal file
@ -0,0 +1,55 @@
|
||||
# Purpose:
|
||||
|
||||
NumPy offers the `save` method for easy saving of arrays into .npy and `savez` for zipping multiple .npy arrays together into a .npz file.
|
||||
|
||||
`cnpy` lets you read and write to these formats in C++.
|
||||
|
||||
The motivation comes from scientific programming where large amounts of data are generated in C++ and analyzed in Python.
|
||||
|
||||
Writing to .npy has the advantage of using low-level C++ I/O (fread and fwrite) for speed and binary format for size.
|
||||
The .npy file header takes care of specifying the size, shape, and data type of the array, so specifying the format of the data is unnecessary.
|
||||
|
||||
Loading data written in numpy formats into C++ is equally simple, but requires you to type-cast the loaded data to the type of your choice.
|
||||
|
||||
# Installation:
|
||||
|
||||
Default installation directory is /usr/local.
|
||||
To specify a different directory, add `-DCMAKE_INSTALL_PREFIX=/path/to/install/dir` to the cmake invocation in step 4.
|
||||
|
||||
1. get [cmake](www.cmake.org)
|
||||
2. create a build directory, say $HOME/build
|
||||
3. cd $HOME/build
|
||||
4. cmake /path/to/cnpy
|
||||
5. make
|
||||
6. make install
|
||||
|
||||
# Using:
|
||||
|
||||
To use, `#include"cnpy.h"` in your source code. Compile the source code mycode.cpp as
|
||||
|
||||
```bash
|
||||
g++ -o mycode mycode.cpp -L/path/to/install/dir -lcnpy -lz --std=c++11
|
||||
```
|
||||
|
||||
# Description:
|
||||
|
||||
There are two functions for writing data: `npy_save` and `npz_save`.
|
||||
|
||||
There are 3 functions for reading:
|
||||
- `npy_load` will load a .npy file.
|
||||
- `npz_load(fname)` will load a .npz and return a dictionary of NpyArray structues.
|
||||
- `npz_load(fname,varname)` will load and return the NpyArray for data varname from the specified .npz file.
|
||||
|
||||
The data structure for loaded data is below.
|
||||
Data is accessed via the `data<T>()`-method, which returns a pointer of the specified type (which must match the underlying datatype of the data).
|
||||
The array shape and word size are read from the npy header.
|
||||
|
||||
```c++
|
||||
struct NpyArray {
|
||||
std::vector<size_t> shape;
|
||||
size_t word_size;
|
||||
template<typename T> T* data();
|
||||
};
|
||||
```
|
||||
|
||||
See [example1.cpp](example1.cpp) for examples of how to use the library. example1 will also be build during cmake installation.
|
362
thirdparty/cnpy/cnpy.cpp
vendored
Normal file
362
thirdparty/cnpy/cnpy.cpp
vendored
Normal file
@ -0,0 +1,362 @@
|
||||
//Copyright (C) 2011 Carl Rogers
|
||||
//Released under MIT License
|
||||
//license available in LICENSE file, or at http://www.opensource.org/licenses/mit-license.php
|
||||
|
||||
#include"cnpy.h"
|
||||
#include<complex>
|
||||
#include<cstdlib>
|
||||
#include<algorithm>
|
||||
#include<cstring>
|
||||
#include<iomanip>
|
||||
#include<stdint.h>
|
||||
#include<stdexcept>
|
||||
#include <regex>
|
||||
|
||||
char cnpy::BigEndianTest() {
|
||||
int x = 1;
|
||||
return (((char *)&x)[0]) ? '<' : '>';
|
||||
}
|
||||
|
||||
char cnpy::map_type(const std::type_info& t)
|
||||
{
|
||||
if(t == typeid(float) ) return 'f';
|
||||
if(t == typeid(double) ) return 'f';
|
||||
if(t == typeid(long double) ) return 'f';
|
||||
|
||||
if(t == typeid(int) ) return 'i';
|
||||
if(t == typeid(char) ) return 'i';
|
||||
if(t == typeid(short) ) return 'i';
|
||||
if(t == typeid(long) ) return 'i';
|
||||
if(t == typeid(long long) ) return 'i';
|
||||
|
||||
if(t == typeid(unsigned char) ) return 'u';
|
||||
if(t == typeid(unsigned short) ) return 'u';
|
||||
if(t == typeid(unsigned long) ) return 'u';
|
||||
if(t == typeid(unsigned long long) ) return 'u';
|
||||
if(t == typeid(unsigned int) ) return 'u';
|
||||
|
||||
if(t == typeid(bool) ) return 'b';
|
||||
|
||||
if(t == typeid(std::complex<float>) ) return 'c';
|
||||
if(t == typeid(std::complex<double>) ) return 'c';
|
||||
if(t == typeid(std::complex<long double>) ) return 'c';
|
||||
|
||||
else return '?';
|
||||
}
|
||||
|
||||
template<> std::vector<char>& cnpy::operator+=(std::vector<char>& lhs, const std::string rhs) {
|
||||
lhs.insert(lhs.end(),rhs.begin(),rhs.end());
|
||||
return lhs;
|
||||
}
|
||||
|
||||
template<> std::vector<char>& cnpy::operator+=(std::vector<char>& lhs, const char* rhs) {
|
||||
//write in little endian
|
||||
size_t len = strlen(rhs);
|
||||
lhs.reserve(len);
|
||||
for(size_t byte = 0; byte < len; byte++) {
|
||||
lhs.push_back(rhs[byte]);
|
||||
}
|
||||
return lhs;
|
||||
}
|
||||
|
||||
void cnpy::parse_npy_header(unsigned char* buffer,size_t& word_size, std::vector<size_t>& shape, bool& fortran_order) {
|
||||
//std::string magic_string(buffer,6);
|
||||
uint8_t major_version = *reinterpret_cast<uint8_t*>(buffer+6);
|
||||
uint8_t minor_version = *reinterpret_cast<uint8_t*>(buffer+7);
|
||||
uint16_t header_len = *reinterpret_cast<uint16_t*>(buffer+8);
|
||||
std::string header(reinterpret_cast<char*>(buffer+9),header_len);
|
||||
|
||||
size_t loc1, loc2;
|
||||
|
||||
//fortran order
|
||||
loc1 = header.find("fortran_order")+16;
|
||||
fortran_order = (header.substr(loc1,4) == "True" ? true : false);
|
||||
|
||||
//shape
|
||||
loc1 = header.find("(");
|
||||
loc2 = header.find(")");
|
||||
|
||||
std::regex num_regex("[0-9][0-9]*");
|
||||
std::smatch sm;
|
||||
shape.clear();
|
||||
|
||||
std::string str_shape = header.substr(loc1+1,loc2-loc1-1);
|
||||
while(std::regex_search(str_shape, sm, num_regex)) {
|
||||
shape.push_back(std::stoi(sm[0].str()));
|
||||
str_shape = sm.suffix().str();
|
||||
}
|
||||
|
||||
//endian, word size, data type
|
||||
//byte order code | stands for not applicable.
|
||||
//not sure when this applies except for byte array
|
||||
loc1 = header.find("descr")+9;
|
||||
bool littleEndian = (header[loc1] == '<' || header[loc1] == '|' ? true : false);
|
||||
assert(littleEndian);
|
||||
|
||||
//char type = header[loc1+1];
|
||||
//assert(type == map_type(T));
|
||||
|
||||
std::string str_ws = header.substr(loc1+2);
|
||||
loc2 = str_ws.find("'");
|
||||
word_size = atoi(str_ws.substr(0,loc2).c_str());
|
||||
}
|
||||
|
||||
void cnpy::parse_npy_header(FILE* fp, size_t& word_size, std::vector<size_t>& shape, bool& fortran_order) {
|
||||
char buffer[256];
|
||||
std::string header;
|
||||
size_t res = fread(buffer,sizeof(char),11,fp);
|
||||
if (res != 11)
|
||||
throw std::runtime_error("parse_npy_header: failed fread");
|
||||
char * data = fgets(buffer, 256, fp);
|
||||
if (data != NULL) {
|
||||
header = data;
|
||||
}
|
||||
else {
|
||||
header = "";
|
||||
}
|
||||
assert(header[header.size()-1] == '\n');
|
||||
|
||||
size_t loc1, loc2;
|
||||
|
||||
//fortran order
|
||||
loc1 = header.find("fortran_order");
|
||||
if (loc1 == std::string::npos)
|
||||
throw std::runtime_error("parse_npy_header: failed to find header keyword: 'fortran_order'");
|
||||
loc1 += 16;
|
||||
fortran_order = (header.substr(loc1,4) == "True" ? true : false);
|
||||
|
||||
//shape
|
||||
loc1 = header.find("(");
|
||||
loc2 = header.find(")");
|
||||
if (loc1 == std::string::npos || loc2 == std::string::npos)
|
||||
throw std::runtime_error("parse_npy_header: failed to find header keyword: '(' or ')'");
|
||||
|
||||
std::regex num_regex("[0-9][0-9]*");
|
||||
std::smatch sm;
|
||||
shape.clear();
|
||||
|
||||
std::string str_shape = header.substr(loc1+1,loc2-loc1-1);
|
||||
while(std::regex_search(str_shape, sm, num_regex)) {
|
||||
shape.push_back(std::stoi(sm[0].str()));
|
||||
str_shape = sm.suffix().str();
|
||||
}
|
||||
|
||||
//endian, word size, data type
|
||||
//byte order code | stands for not applicable.
|
||||
//not sure when this applies except for byte array
|
||||
loc1 = header.find("descr");
|
||||
if (loc1 == std::string::npos)
|
||||
throw std::runtime_error("parse_npy_header: failed to find header keyword: 'descr'");
|
||||
loc1 += 9;
|
||||
bool littleEndian = (header[loc1] == '<' || header[loc1] == '|' ? true : false);
|
||||
assert(littleEndian);
|
||||
|
||||
//char type = header[loc1+1];
|
||||
//assert(type == map_type(T));
|
||||
|
||||
std::string str_ws = header.substr(loc1+2);
|
||||
loc2 = str_ws.find("'");
|
||||
word_size = atoi(str_ws.substr(0,loc2).c_str());
|
||||
}
|
||||
|
||||
void cnpy::parse_zip_footer(FILE* fp, uint16_t& nrecs, size_t& global_header_size, size_t& global_header_offset)
|
||||
{
|
||||
std::vector<char> footer(22);
|
||||
fseek(fp,-22,SEEK_END);
|
||||
size_t res = fread(&footer[0],sizeof(char),22,fp);
|
||||
if(res != 22)
|
||||
throw std::runtime_error("parse_zip_footer: failed fread");
|
||||
|
||||
uint16_t disk_no, disk_start, nrecs_on_disk, comment_len;
|
||||
disk_no = *(uint16_t*) &footer[4];
|
||||
disk_start = *(uint16_t*) &footer[6];
|
||||
nrecs_on_disk = *(uint16_t*) &footer[8];
|
||||
nrecs = *(uint16_t*) &footer[10];
|
||||
global_header_size = *(uint32_t*) &footer[12];
|
||||
global_header_offset = *(uint32_t*) &footer[16];
|
||||
comment_len = *(uint16_t*) &footer[20];
|
||||
|
||||
assert(disk_no == 0);
|
||||
assert(disk_start == 0);
|
||||
assert(nrecs_on_disk == nrecs);
|
||||
assert(comment_len == 0);
|
||||
}
|
||||
|
||||
cnpy::NpyArray load_the_npy_file(FILE* fp) {
|
||||
std::vector<size_t> shape;
|
||||
size_t word_size;
|
||||
bool fortran_order;
|
||||
cnpy::parse_npy_header(fp,word_size,shape,fortran_order);
|
||||
if (word_size >= 0 && word_size < ULLONG_MAX) {
|
||||
cnpy::NpyArray arr(shape, word_size, fortran_order);
|
||||
size_t nread = fread(arr.data<char>(), 1, arr.num_bytes(), fp);
|
||||
if (nread != arr.num_bytes())
|
||||
throw std::runtime_error("load_the_npy_file: failed fread");
|
||||
return arr;
|
||||
}
|
||||
else {
|
||||
throw std::runtime_error("load_the_npy_file: incorrect word_size");
|
||||
}
|
||||
}
|
||||
|
||||
cnpy::NpyArray load_the_npz_array(FILE* fp, uint32_t compr_bytes, uint32_t uncompr_bytes) {
|
||||
|
||||
std::vector<unsigned char> buffer_compr(compr_bytes);
|
||||
std::vector<unsigned char> buffer_uncompr(uncompr_bytes);
|
||||
size_t nread = fread(&buffer_compr[0],1,compr_bytes,fp);
|
||||
if(nread != compr_bytes)
|
||||
throw std::runtime_error("load_the_npy_file: failed fread");
|
||||
|
||||
int err;
|
||||
z_stream d_stream;
|
||||
|
||||
d_stream.zalloc = Z_NULL;
|
||||
d_stream.zfree = Z_NULL;
|
||||
d_stream.opaque = Z_NULL;
|
||||
d_stream.avail_in = 0;
|
||||
d_stream.next_in = Z_NULL;
|
||||
err = inflateInit2(&d_stream, -MAX_WBITS);
|
||||
|
||||
d_stream.avail_in = compr_bytes;
|
||||
d_stream.next_in = &buffer_compr[0];
|
||||
d_stream.avail_out = uncompr_bytes;
|
||||
d_stream.next_out = &buffer_uncompr[0];
|
||||
|
||||
err = inflate(&d_stream, Z_FINISH);
|
||||
err = inflateEnd(&d_stream);
|
||||
|
||||
std::vector<size_t> shape;
|
||||
size_t word_size;
|
||||
bool fortran_order;
|
||||
cnpy::parse_npy_header(&buffer_uncompr[0],word_size,shape,fortran_order);
|
||||
if (word_size >= 0 && word_size < ULLONG_MAX) {
|
||||
cnpy::NpyArray array(shape, word_size, fortran_order);
|
||||
|
||||
size_t offset = uncompr_bytes - array.num_bytes();
|
||||
memcpy(array.data<unsigned char>(), &buffer_uncompr[0] + offset, array.num_bytes());
|
||||
|
||||
return array;
|
||||
}
|
||||
else {
|
||||
throw std::runtime_error("load_the_npz_array: incorrect word_size");
|
||||
}
|
||||
}
|
||||
|
||||
cnpy::npz_t cnpy::npz_load(std::string fname) {
|
||||
FILE* fp = fopen(fname.c_str(),"rb");
|
||||
|
||||
if(!fp) {
|
||||
throw std::runtime_error("npz_load: Error! Unable to open file "+fname+"!");
|
||||
}
|
||||
|
||||
cnpy::npz_t arrays;
|
||||
|
||||
while(1) {
|
||||
std::vector<char> local_header(30);
|
||||
size_t headerres = fread(&local_header[0],sizeof(char),30,fp);
|
||||
if (headerres != 30) {
|
||||
fclose(fp);
|
||||
throw std::runtime_error("npz_load: failed fread");
|
||||
}
|
||||
//if we've reached the global header, stop reading
|
||||
if(local_header[2] != 0x03 || local_header[3] != 0x04) break;
|
||||
|
||||
//read in the variable name
|
||||
uint16_t name_len = *(uint16_t*) &local_header[26];
|
||||
std::string varname(name_len,' ');
|
||||
size_t vname_res = fread(&varname[0],sizeof(char),name_len,fp);
|
||||
if (vname_res != name_len) {
|
||||
fclose(fp);
|
||||
throw std::runtime_error("npz_load: failed fread");
|
||||
}
|
||||
//erase the lagging .npy
|
||||
varname.erase(varname.end()-4,varname.end());
|
||||
|
||||
//read in the extra field
|
||||
uint16_t extra_field_len = *(uint16_t*) &local_header[28];
|
||||
if(extra_field_len > 0) {
|
||||
std::vector<char> buff(extra_field_len);
|
||||
size_t efield_res = fread(&buff[0],sizeof(char),extra_field_len,fp);
|
||||
if (efield_res != extra_field_len) {
|
||||
fclose(fp);
|
||||
throw std::runtime_error("npz_load: failed fread");
|
||||
}
|
||||
}
|
||||
|
||||
uint16_t compr_method = *reinterpret_cast<uint16_t*>(&local_header[0]+8);
|
||||
uint32_t compr_bytes = *reinterpret_cast<uint32_t*>(&local_header[0]+18);
|
||||
uint32_t uncompr_bytes = *reinterpret_cast<uint32_t*>(&local_header[0]+22);
|
||||
|
||||
if(compr_method == 0) {arrays.push_back({ varname,load_the_npy_file(fp)});}
|
||||
else { arrays.push_back({ varname, load_the_npz_array(fp,compr_bytes,uncompr_bytes)});}
|
||||
}
|
||||
|
||||
fclose(fp);
|
||||
return arrays;
|
||||
}
|
||||
|
||||
cnpy::NpyArray cnpy::npz_load(std::string fname, std::string varname) {
|
||||
FILE* fp = fopen(fname.c_str(),"rb");
|
||||
|
||||
if(!fp) throw std::runtime_error("npz_load: Unable to open file "+fname);
|
||||
|
||||
while(1) {
|
||||
std::vector<char> local_header(30);
|
||||
size_t header_res = fread(&local_header[0],sizeof(char),30,fp);
|
||||
if(header_res != 30){
|
||||
fclose(fp);
|
||||
throw std::runtime_error("npz_load: failed fread");
|
||||
}
|
||||
//if we've reached the global header, stop reading
|
||||
if(local_header[2] != 0x03 || local_header[3] != 0x04) break;
|
||||
|
||||
//read in the variable name
|
||||
uint16_t name_len = *(uint16_t*) &local_header[26];
|
||||
std::string vname(name_len,' ');
|
||||
size_t vname_res = fread(&vname[0],sizeof(char),name_len,fp);
|
||||
if (vname_res != name_len) {
|
||||
fclose(fp);
|
||||
throw std::runtime_error("npz_load: failed fread");
|
||||
}
|
||||
vname.erase(vname.end()-4,vname.end()); //erase the lagging .npy
|
||||
|
||||
//read in the extra field
|
||||
uint16_t extra_field_len = *(uint16_t*) &local_header[28];
|
||||
fseek(fp,extra_field_len,SEEK_CUR); //skip past the extra field
|
||||
|
||||
uint16_t compr_method = *reinterpret_cast<uint16_t*>(&local_header[0]+8);
|
||||
uint32_t compr_bytes = *reinterpret_cast<uint32_t*>(&local_header[0]+18);
|
||||
uint32_t uncompr_bytes = *reinterpret_cast<uint32_t*>(&local_header[0]+22);
|
||||
|
||||
if(vname == varname) {
|
||||
NpyArray array = (compr_method == 0) ? load_the_npy_file(fp) : load_the_npz_array(fp,compr_bytes,uncompr_bytes);
|
||||
fclose(fp);
|
||||
return array;
|
||||
}
|
||||
else {
|
||||
//skip past the data
|
||||
uint32_t size = *(uint32_t*) &local_header[22];
|
||||
fseek(fp,size,SEEK_CUR);
|
||||
}
|
||||
}
|
||||
|
||||
fclose(fp);
|
||||
|
||||
//if we get here, we haven't found the variable in the file
|
||||
throw std::runtime_error("npz_load: Variable name "+varname+" not found in "+fname);
|
||||
}
|
||||
|
||||
cnpy::NpyArray cnpy::npy_load(std::string fname) {
|
||||
|
||||
FILE* fp = fopen(fname.c_str(), "rb");
|
||||
|
||||
if(!fp) throw std::runtime_error("npy_load: Unable to open file "+fname);
|
||||
|
||||
NpyArray arr = load_the_npy_file(fp);
|
||||
|
||||
fclose(fp);
|
||||
return arr;
|
||||
}
|
||||
|
||||
|
||||
|
272
thirdparty/cnpy/cnpy.h
vendored
Normal file
272
thirdparty/cnpy/cnpy.h
vendored
Normal file
@ -0,0 +1,272 @@
|
||||
//Copyright (C) 2011 Carl Rogers
|
||||
//Released under MIT License
|
||||
//license available in LICENSE file, or at http://www.opensource.org/licenses/mit-license.php
|
||||
|
||||
#ifndef LIBCNPY_H_
|
||||
#define LIBCNPY_H_
|
||||
|
||||
#include<string>
|
||||
#include<stdexcept>
|
||||
#include<sstream>
|
||||
#include<vector>
|
||||
#include<cstdio>
|
||||
#include<typeinfo>
|
||||
#include<iostream>
|
||||
#include<cassert>
|
||||
#include<zlib.h>
|
||||
#include<map>
|
||||
#include<memory>
|
||||
#include<stdint.h>
|
||||
#include<numeric>
|
||||
|
||||
namespace cnpy {
|
||||
|
||||
// In-memory view of one numpy array: a shared byte buffer plus the
// shape / element-size / ordering metadata parsed from the .npy header.
struct NpyArray {
    // Allocate a zero-filled buffer large enough for an array of the
    // given shape with `_word_size` bytes per element.
    NpyArray(const std::vector<size_t>& _shape, size_t _word_size, bool _fortran_order) :
        shape(_shape), word_size(_word_size), fortran_order(_fortran_order)
    {
        num_vals = 1;
        for (size_t dim : shape) {
            num_vals *= dim;
        }
        data_holder = std::make_shared<std::vector<char>>(num_vals * word_size);
    }

    // Empty array with no backing buffer (data_holder stays null).
    NpyArray() : shape(), word_size(0), fortran_order(false), num_vals(0) { }

    // Reinterpret the raw buffer as elements of type T.
    template<typename T>
    T* data() {
        return reinterpret_cast<T*>(data_holder->data());
    }

    // Read-only view of the raw buffer as elements of type T.
    template<typename T>
    const T* data() const {
        return reinterpret_cast<const T*>(data_holder->data());
    }

    // Copy the flat contents into a std::vector<T>.
    template<typename T>
    std::vector<T> as_vec() const {
        const T* first = data<T>();
        return std::vector<T>(first, first + num_vals);
    }

    // Total size of the backing buffer in bytes.
    size_t num_bytes() const {
        return data_holder->size();
    }

    std::shared_ptr<std::vector<char>> data_holder;
    std::vector<size_t> shape;
    size_t word_size;
    bool fortran_order;
    size_t num_vals;
};
|
||||
|
||||
// Ordered list of (array name, array) pairs loaded from a .npz archive.
using npz_t = std::vector<std::pair<std::string, NpyArray>>;

// Defined in cnpy.cpp. Presumably returns the numpy byte-order descr
// character for this host ('<' little / '>' big) — confirm there.
char BigEndianTest();
// Defined in cnpy.cpp. Maps a C++ type to its numpy descr type character
// (e.g. 'f', 'i', 'u') — confirm the exact mapping in cnpy.cpp.
char map_type(const std::type_info& t);
// Build the .npy preamble + header dict for an array of type T with `shape`.
template<typename T> std::vector<char> create_npy_header(const std::vector<size_t>& shape);
// Parse word size, shape and ordering from an open .npy file stream.
void parse_npy_header(FILE* fp,size_t& word_size, std::vector<size_t>& shape, bool& fortran_order);
// Same, but parsing from an in-memory header buffer.
void parse_npy_header(unsigned char* buffer,size_t& word_size, std::vector<size_t>& shape, bool& fortran_order);
// Read the ZIP end-of-central-directory record: record count plus
// central ("global") header size and offset.
void parse_zip_footer(FILE* fp, uint16_t& nrecs, size_t& global_header_size, size_t& global_header_offset);
// Load every array from a .npz archive.
npz_t npz_load(std::string fname);
// Load a single named array from a .npz archive; throws if not found.
NpyArray npz_load(std::string fname, std::string varname);
// Load a single .npy file.
NpyArray npy_load(std::string fname);
|
||||
|
||||
// Append the raw bytes of `rhs` to the byte buffer `lhs`, least-significant
// byte first. NOTE(review): bytes are copied in host memory order, so the
// "little endian" comment holds only on little-endian hosts — confirm if
// big-endian targets are ever needed.
template<typename T> std::vector<char>& operator+=(std::vector<char>& lhs, const T rhs) {
    //write in little endian
    const char* raw = reinterpret_cast<const char*>(&rhs);
    lhs.insert(lhs.end(), raw, raw + sizeof(T));
    return lhs;
}
|
||||
|
||||
// Specializations for string payloads, defined in cnpy.cpp — presumably they
// append the characters only (no length prefix, no trailing NUL); confirm there.
template<> std::vector<char>& operator+=(std::vector<char>& lhs, const std::string rhs);
template<> std::vector<char>& operator+=(std::vector<char>& lhs, const char* rhs);
|
||||
|
||||
|
||||
template<typename T> void npy_save(std::string fname, const T* data, const std::vector<size_t> shape, std::string mode = "w") {
|
||||
FILE* fp = NULL;
|
||||
std::vector<size_t> true_data_shape; //if appending, the shape of existing + new data
|
||||
|
||||
if(mode == "a") fp = fopen(fname.c_str(),"r+b");
|
||||
|
||||
if(fp) {
|
||||
//file exists. we need to append to it. read the header, modify the array size
|
||||
size_t word_size;
|
||||
bool fortran_order;
|
||||
parse_npy_header(fp,word_size,true_data_shape,fortran_order);
|
||||
assert(!fortran_order);
|
||||
|
||||
if(word_size != sizeof(T)) {
|
||||
std::cout<<"libnpy error: "<<fname<<" has word size "<<word_size<<" but npy_save appending data sized "<<sizeof(T)<<"\n";
|
||||
assert( word_size == sizeof(T) );
|
||||
}
|
||||
if(true_data_shape.size() != shape.size()) {
|
||||
std::cout<<"libnpy error: npy_save attempting to append misdimensioned data to "<<fname<<"\n";
|
||||
assert(true_data_shape.size() != shape.size());
|
||||
}
|
||||
|
||||
for(size_t i = 1; i < shape.size(); i++) {
|
||||
if(shape[i] != true_data_shape[i]) {
|
||||
std::cout<<"libnpy error: npy_save attempting to append misshaped data to "<<fname<<"\n";
|
||||
assert(shape[i] == true_data_shape[i]);
|
||||
}
|
||||
}
|
||||
true_data_shape[0] += shape[0];
|
||||
}
|
||||
else {
|
||||
fp = fopen(fname.c_str(),"wb");
|
||||
true_data_shape = shape;
|
||||
}
|
||||
|
||||
std::vector<char> header = create_npy_header<T>(true_data_shape);
|
||||
size_t nels = std::accumulate(shape.begin(),shape.end(),1,std::multiplies<size_t>());
|
||||
|
||||
fseek(fp,0,SEEK_SET);
|
||||
fwrite(&header[0],sizeof(char),header.size(),fp);
|
||||
fseek(fp,0,SEEK_END);
|
||||
fwrite(data,sizeof(T),nels,fp);
|
||||
fclose(fp);
|
||||
}
|
||||
|
||||
// Save `data` as the member `fname`.npy inside the ZIP archive `zipname`.
// mode "w" creates a fresh archive; mode "a" appends a new member to an
// existing one. The member is STORED (compression method 0), so the .npy
// bytes are written verbatim and CRC'd with zlib's crc32.
template<typename T> void npz_save(std::string zipname, std::string fname, const T* data, const std::vector<size_t>& shape, std::string mode = "w")
{
    //first, append a .npy to the fname
    fname += ".npy";

    //now, on with the show
    FILE* fp = NULL;
    uint16_t nrecs = 0;
    size_t global_header_offset = 0;
    std::vector<char> global_header;

    if(mode == "a") fp = fopen(zipname.c_str(),"r+b");

    if(fp) {
        //zip file exists. we need to add a new npy file to it.
        //first read the footer. this gives us the offset and size of the global header
        //then read and store the global header.
        //below, we will write the new data at the start of the global header then append the global header and footer below it
        size_t global_header_size;
        parse_zip_footer(fp,nrecs,global_header_size,global_header_offset);
        fseek(fp,global_header_offset,SEEK_SET);
        global_header.resize(global_header_size);
        size_t res = fread(&global_header[0],sizeof(char),global_header_size,fp);
        if(res != global_header_size){
            fclose(fp);
            throw std::runtime_error("npz_save: header read error while adding to existing zip");
        }
        // Seek back so the new member overwrites the old central directory;
        // the saved central directory is re-emitted after the new data below.
        fseek(fp,global_header_offset,SEEK_SET);
    }
    else {
        fp = fopen(zipname.c_str(),"wb");
    }

    std::vector<char> npy_header = create_npy_header<T>(shape);

    size_t nels = std::accumulate(shape.begin(),shape.end(),1,std::multiplies<size_t>());
    size_t nbytes = nels*sizeof(T) + npy_header.size();

    //get the CRC of the data to be added
    uint32_t crc = crc32(0L,(uint8_t*)&npy_header[0],npy_header.size());
    crc = crc32(crc,(uint8_t*)data,nels*sizeof(T));

    //build the local header (ZIP local file header, signature PK\x03\x04)
    std::vector<char> local_header;
    local_header += "PK"; //first part of sig
    local_header += (uint16_t) 0x0403; //second part of sig
    local_header += (uint16_t) 20; //min version to extract
    local_header += (uint16_t) 0; //general purpose bit flag
    local_header += (uint16_t) 0; //compression method (0 = stored)
    local_header += (uint16_t) 0; //file last mod time
    local_header += (uint16_t) 0; //file last mod date
    local_header += (uint32_t) crc; //crc
    local_header += (uint32_t) nbytes; //compressed size (== uncompressed: stored)
    local_header += (uint32_t) nbytes; //uncompressed size
    local_header += (uint16_t) fname.size(); //fname length
    local_header += (uint16_t) 0; //extra field length
    local_header += fname;

    //build global header (ZIP central directory record, signature PK\x01\x02),
    //appended after any records read back from the existing archive
    global_header += "PK"; //first part of sig
    global_header += (uint16_t) 0x0201; //second part of sig
    global_header += (uint16_t) 20; //version made by
    // Central record shares bytes 4..29 of the local header (version..name length).
    global_header.insert(global_header.end(),local_header.begin()+4,local_header.begin()+30);
    global_header += (uint16_t) 0; //file comment length
    global_header += (uint16_t) 0; //disk number where file starts
    global_header += (uint16_t) 0; //internal file attributes
    global_header += (uint32_t) 0; //external file attributes
    global_header += (uint32_t) global_header_offset; //relative offset of local file header, since it begins where the global header used to begin
    global_header += fname;

    //build footer (ZIP end-of-central-directory record, signature PK\x05\x06)
    std::vector<char> footer;
    footer += "PK"; //first part of sig
    footer += (uint16_t) 0x0605; //second part of sig
    footer += (uint16_t) 0; //number of this disk
    footer += (uint16_t) 0; //disk where footer starts
    footer += (uint16_t) (nrecs+1); //number of records on this disk
    footer += (uint16_t) (nrecs+1); //total number of records
    footer += (uint32_t) global_header.size(); //nbytes of global headers
    footer += (uint32_t) (global_header_offset + nbytes + local_header.size()); //offset of start of global headers, since global header now starts after newly written array
    footer += (uint16_t) 0; //zip file comment length

    //write everything: local header, npy header, raw data, then the rebuilt
    //central directory and footer. NOTE(review): if fopen("wb") failed above,
    //fp is NULL and nothing is written — the failure is silent.
    if (fp) {
        fwrite(&local_header[0], sizeof(char), local_header.size(), fp);
        fwrite(&npy_header[0], sizeof(char), npy_header.size(), fp);
        fwrite(data, sizeof(T), nels, fp);
        fwrite(&global_header[0], sizeof(char), global_header.size(), fp);
        fwrite(&footer[0], sizeof(char), footer.size(), fp);
        fclose(fp);
    }
}
|
||||
|
||||
// Convenience overload: save a std::vector as a one-dimensional .npy array.
template<typename T> void npy_save(std::string fname, const std::vector<T> data, std::string mode = "w") {
    const std::vector<size_t> shape{data.size()};
    npy_save(fname, &data[0], shape, mode);
}
|
||||
|
||||
// Convenience overload: save a std::vector as a one-dimensional array
// member inside the .npz archive `zipname`.
template<typename T> void npz_save(std::string zipname, std::string fname, const std::vector<T> data, std::string mode = "w") {
    const std::vector<size_t> shape{data.size()};
    npz_save(zipname, fname, &data[0], shape, mode);
}
|
||||
|
||||
template<typename T> std::vector<char> create_npy_header(const std::vector<size_t>& shape) {
|
||||
|
||||
std::vector<char> dict;
|
||||
dict += "{'descr': '";
|
||||
dict += BigEndianTest();
|
||||
dict += map_type(typeid(T));
|
||||
dict += std::to_string(sizeof(T));
|
||||
dict += "', 'fortran_order': False, 'shape': (";
|
||||
dict += std::to_string(shape[0]);
|
||||
for(size_t i = 1;i < shape.size();i++) {
|
||||
dict += ", ";
|
||||
dict += std::to_string(shape[i]);
|
||||
}
|
||||
if(shape.size() == 1) dict += ",";
|
||||
dict += "), }";
|
||||
//pad with spaces so that preamble+dict is modulo 16 bytes. preamble is 10 bytes. dict needs to end with \n
|
||||
int remainder = 16 - (10 + dict.size()) % 16;
|
||||
dict.insert(dict.end(),remainder,' ');
|
||||
dict.back() = '\n';
|
||||
|
||||
std::vector<char> header;
|
||||
header += (char) 0x93;
|
||||
header += "NUMPY";
|
||||
header += (char) 0x01; //major version of numpy format
|
||||
header += (char) 0x00; //minor version of numpy format
|
||||
header += (uint16_t) dict.size();
|
||||
header.insert(header.end(),dict.begin(),dict.end());
|
||||
|
||||
return header;
|
||||
}
|
||||
|
||||
|
||||
}
|
||||
|
||||
#endif
|
52
thirdparty/zlib/CMakeLists.txt
vendored
Normal file
52
thirdparty/zlib/CMakeLists.txt
vendored
Normal file
@ -0,0 +1,52 @@
|
||||
# Builds the bundled zlib sources (thirdparty/zlib/zlib submodule) as a
# static library target `zlib`, consumed by the cnpy wrapper.
project(zlib)

set(TARGET_NAME "zlib")

set(lib_srcs
    zlib/adler32.c
    zlib/compress.c
    zlib/crc32.c
    zlib/deflate.c
    zlib/gzclose.c
    zlib/gzlib.c
    zlib/gzread.c
    zlib/gzwrite.c
    zlib/inflate.c
    zlib/infback.c
    zlib/inftrees.c
    zlib/inffast.c
    zlib/trees.c
    zlib/uncompr.c
    zlib/zutil.c
)

set(lib_hdrs
    zlib/crc32.h
    zlib/deflate.h
    zlib/gzguts.h
    zlib/inffast.h
    zlib/inffixed.h
    zlib/inflate.h
    zlib/inftrees.h
    zlib/trees.h
    zlib/zutil.h
)

set(lib_ext_hdrs "zlib/zlib.h" "zlib/zconf.h")

add_library(${TARGET_NAME} STATIC ${lib_srcs} ${lib_hdrs} ${lib_ext_hdrs})

# Third-party code: relax warnings. Fixed: flags were appended to the
# directory-wide CMAKE_C_FLAGS/CMAKE_CXX_FLAGS; they are now scoped to this
# target only so they cannot leak into other targets.
if(NOT WIN32)
    target_compile_options(${TARGET_NAME} PRIVATE -Wno-all)
endif()

if(CMAKE_C_COMPILER_ID STREQUAL "MSVC")
    # /MP parallel compilation, /wd4996 silence deprecated-CRT warnings, /W3 warning level
    target_compile_options(${TARGET_NAME} PRIVATE /MP /wd4996 /W3)
endif()

# Consumers include <zlib.h> from the submodule directory; the parent
# directory is also exposed for includes of the form "zlib/zlib.h".
target_include_directories(${TARGET_NAME} PUBLIC "${CMAKE_CURRENT_SOURCE_DIR}/zlib"
                                                 "${CMAKE_CURRENT_SOURCE_DIR}/zlib/..")

set_target_properties(${TARGET_NAME} PROPERTIES FOLDER thirdparty)
|
1
thirdparty/zlib/zlib
vendored
Submodule
1
thirdparty/zlib/zlib
vendored
Submodule
@ -0,0 +1 @@
|
||||
Subproject commit cacf7f1d4e3d44d871b605da3b647f07d718623f
|
Loading…
Reference in New Issue
Block a user