Data compression
After a boom that coincided with the advent of the internet, digital cameras, and digital video and audio storage and playback devices, research on data compression rested on its laurels for a quarter of a century. Domain-dependent lossy algorithms of the time, such as JPEG, AVC, MP3 and others, achieved remarkable compression ratios and encoding and decoding speeds with acceptable data quality, which has kept them in common use to this day. However, recent computing paradigms such as cloud computing, edge computing, the Internet of Things (IoT), and digital preservation have gradually posed new challenges, and, as a consequence, development trends in data compression are focusing on concepts that were not previously in the spotlight. In this article, we critically evaluate the most prominent of these trends and explore their parallels, complementarities, and differences. Digital data restoration mimics the human ability to avoid memorising information that is satisfactorily retrievable from the context. Feature-based data compression introduces a two-level data representation with higher-level semantic features and with residuals that correct the feature-restored (predicted) data. Integrating the advantages of individual domain-specific data compression methods into a general approach is also challenging. To the best of our knowledge, no method yet addresses all these trends. Our methodology, COMPROMISE, has been developed precisely to make as many solutions to these challenges as possible interoperable. It incorporates features and digital restoration. Furthermore, it is largely domain-independent (general), asymmetric, and universal. The latter refers to the ability to compress data in a common framework in lossy, lossless, and near-lossless modes. COMPROMISE may also be considered an umbrella that links many existing domain-dependent and domain-independent methods, supports hybrid lossless-lossy techniques, and encourages the development of new data compression algorithms.
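As an illustration of the two-level feature/residual idea described above (not the actual COMPROMISE algorithm), the sketch below represents a 1-D signal by block-mean features, restores a prediction from them, and stores residuals either exactly (lossless), quantized to a tolerance (near-lossless), or not at all (lossy); the block size and tolerance are hypothetical choices.

```python
import numpy as np

def encode(signal, block=8, tolerance=0.0):
    """Represent a 1-D signal as block-mean features plus residuals.

    tolerance == 0      -> lossless residuals
    tolerance > 0       -> near-lossless (error bounded by tolerance)
    tolerance == np.inf -> lossy (features only, residuals dropped)
    """
    signal = np.asarray(signal, dtype=float)
    n = len(signal)
    padded = np.pad(signal, (0, (-n) % block), mode="edge")
    features = padded.reshape(-1, block).mean(axis=1)      # higher-level "features"
    prediction = np.repeat(features, block)[:n]            # data restored from features
    residual = signal - prediction
    if np.isinf(tolerance):
        residual = np.zeros_like(residual)                 # lossy: drop residuals
    elif tolerance > 0:
        step = 2 * tolerance
        residual = np.round(residual / step) * step        # near-lossless quantization
    return features, residual, n

def decode(features, residual, n, block=8):
    prediction = np.repeat(features, block)[:n]
    return prediction + residual
```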
- Keywords
- data compression, data restoration, feature, residual, universal algorithm
- Publication type
- Journal Article MeSH
The performance of ECG signal compression is influenced by many factors. However, no study has focused primarily on the possible effects of ECG pathologies on the performance of compression algorithms. This study evaluates whether the pathologies present in ECG signals affect the efficiency and quality of compression. A single-cycle fractal-based compression algorithm and a compression algorithm based on a combination of the wavelet transform and set partitioning in hierarchical trees are used to compress 125 15-lead ECG signals from the CSE database. The rhythm and morphology of these signals are newly annotated as physiological or pathological. The compression performance results are statistically evaluated. Using the two compression algorithms, physiological signals are compressed with better quality than pathological signals according to 8 and 9 out of 12 quality metrics, respectively. Moreover, it was statistically proven that pathological signals were compressed with lower efficiency than physiological signals. Signals with physiological rhythm and physiological morphology were compressed with the best quality. The worst results were obtained for the group of signals with pathological rhythm and pathological morphology. This study is the first to deal with the effects of ECG pathologies on the performance of compression algorithms. Signal-by-signal rhythm and morphology annotations (physiological/pathological) for the CSE database are newly published.
- MeSH
- Algorithms MeSH
- Databases, Factual MeSH
- Electrocardiography methods MeSH
- Fractals MeSH
- Data Compression methods MeSH
- Humans MeSH
- Wavelet Analysis MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
Electroencephalography (EEG) experiments typically generate vast amounts of data due to high sampling rates and the use of multiple electrodes to capture brain activity. Consequently, storing and transmitting these large datasets is challenging, necessitating compression techniques tailored to this data type. This study proposes one such method, which at its core uses an artificial neural network (specifically a convolutional autoencoder) to learn latent representations of the modelled EEG signals and perform lossy compression; this is further improved with lossless corrections based on a user-defined threshold for the maximum tolerable amplitude loss, resulting in a flexible near-lossless compression scheme. To test the viability of the approach, a case study was performed on a 256-channel binocular rivalry dataset, for which data-specific statistical analyses and preprocessing steps are also described. Compression results, evaluation metrics, and comparisons with baseline general-purpose compression methods suggest that the proposed method achieves substantial compression and speed, making it a promising topic for follow-up studies.
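A minimal sketch of the near-lossless correction step described above, assuming the lossy reconstruction (e.g. a convolutional autoencoder's output) is already available as an array; the thresholding and quantization details are illustrative, not the authors' exact scheme.

```python
import numpy as np

def near_lossless_corrections(original, reconstruction, max_abs_error):
    """Find sparse corrections so the corrected signal stays within max_abs_error."""
    error = np.asarray(original, dtype=float) - np.asarray(reconstruction, dtype=float)
    idx = np.flatnonzero(np.abs(error) > max_abs_error)
    # Quantize the stored corrections to the tolerance grid; every corrected
    # sample then lands within +-max_abs_error of the original.
    corrections = np.round(error[idx] / max_abs_error) * max_abs_error
    return idx, corrections

def apply_corrections(reconstruction, idx, corrections):
    corrected = np.asarray(reconstruction, dtype=float).copy()
    corrected[idx] += corrections
    return corrected
```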
- Keywords
- Artificial neural networks, Data compression, Electroencephalography, Machine learning, Neuroinformatics
- MeSH
- Autoencoder MeSH
- Electroencephalography * methods MeSH
- Data Compression * methods MeSH
- Humans MeSH
- Neural Networks, Computer * MeSH
- Signal Processing, Computer-Assisted * MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Dataset MeSH
We propose an efficient method for compressing Vietnamese text using n-gram dictionaries. It achieves a significantly better compression ratio than state-of-the-art methods on the same dataset. Given a text, the proposed method first splits it into n-grams and then encodes them based on n-gram dictionaries. In the encoding phase, we use a sliding window ranging from bigrams to five-grams to obtain the best encoding stream. Each n-gram is encoded in two to four bytes based on its corresponding n-gram dictionary. We collected a 2.5 GB text corpus from several Vietnamese news agencies to build n-gram dictionaries from unigrams to five-grams, resulting in dictionaries of 12 GB in total. To evaluate our method, we collected a test set of 10 text files of different sizes. The experimental results indicate that our method achieves a compression ratio of around 90% and outperforms state-of-the-art methods.
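The following sketch illustrates the greedy sliding-window encoding described above, assuming hypothetical dictionaries that map 2- to 5-grams to 2 to 4 byte codes; the literal fallback for out-of-dictionary words is an assumption, not part of the paper.

```python
def encode_ngrams(words, dictionaries):
    """Greedy longest-match n-gram encoding.

    `dictionaries` is assumed to map n (2..5) to {tuple_of_words: code_bytes};
    both the dictionaries and the code widths are hypothetical stand-ins for
    the paper's n-gram dictionaries.
    """
    out = bytearray()
    i = 0
    while i < len(words):
        for n in range(5, 1, -1):                      # prefer five-grams down to bigrams
            gram = tuple(words[i:i + n])
            code = dictionaries.get(n, {}).get(gram)
            if code is not None:
                out += code
                i += n
                break
        else:                                          # fall back to a literal unigram
            out += b"\x00" + words[i].encode("utf-8") + b"\x00"
            i += 1
    return bytes(out)
```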
- MeSH
- Algorithms * MeSH
- Asian People * MeSH
- Data Compression * MeSH
- Humans MeSH
- Vocabulary * MeSH
- Dictionaries as Topic * MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
The assessment of ECG signal quality after compression is an essential part of the compression process. Compression facilitates signal archiving, speeds up signal transmission, and reduces energy consumption. Conversely, lossy compression distorts the signals. Therefore, it is necessary to express compression performance through both compression efficiency and signal quality. This paper provides an overview of objective algorithms for assessing both ECG signal quality after compression and compression efficiency. This area lacks standardization, and no extensive review exists. Forty methods were tested in terms of their suitability for quality assessment. For this purpose, the whole CSE database was used. The tested signals were compressed using an algorithm based on SPIHT with varying efficiency. As a reference, the compressed signals were manually assessed by two experts and classified into three quality groups. Based on the experts' classification, we determined the corresponding value ranges of selected quality evaluation methods. The suitability of the methods for quality assessment was evaluated based on five criteria. For the assessment of ECG signal quality after compression, we recommend using a combination of these methods: PSim SDNN, QS, SNR1, MSE, PRDN1, MAX, STDERR, and WEDD SWT.
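For reference, a minimal sketch of a few of the standard distortion metrics in this family (MSE, PRD, PRDN, SNR), using their textbook definitions; the paper's remaining metrics (PSim SDNN, QS, MAX, STDERR, WEDD SWT, and the SNR1/PRDN1 variants) are not reproduced here.

```python
import numpy as np

def quality_metrics(x, y):
    """Common ECG compression quality metrics (x = original, y = reconstructed)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    e = x - y
    mse = np.mean(e ** 2)
    prd = 100 * np.sqrt(np.sum(e ** 2) / np.sum(x ** 2))
    prdn = 100 * np.sqrt(np.sum(e ** 2) / np.sum((x - np.mean(x)) ** 2))
    snr = 10 * np.log10(np.sum((x - np.mean(x)) ** 2) / np.sum(e ** 2))
    return {"MSE": mse, "PRD": prd, "PRDN": prdn, "SNR": snr}
```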
The utilization of computer vision in smart farming is becoming a trend in building agricultural automation schemes. Deep learning (DL) is known for its accuracy on computer vision tasks such as object detection and image classification. The superiority of a deep learning model for smart farming, the Progressive Contextual Excitation Network (PCENet), was also demonstrated in our recent study on classifying cocoa bean images. However, an assessment of computational time shows that the original PCENet model runs at only 0.101 s per image (9.9 FPS) on the Jetson Nano edge platform. Therefore, this research demonstrates a compression technique that accelerates the PCENet model by pruning filters. In our experiments, the compressed model reaches 16.7 FPS on the Jetson Nano. Moreover, the accuracy of the compressed model is maintained at 86.1%, compared to 86.8% for the original model. In addition, our approach is more accurate than the state-of-the-art ResNet18, which reaches only 82.7%. An assessment on the corn leaf disease dataset shows that the compressed model achieves an accuracy of 97.5%, while the original PCENet achieves 97.7%.
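A minimal sketch of magnitude-based filter pruning of the kind described above, assuming plain NumPy weight tensors; the L1-norm criterion and keep ratio are generic illustrations, not necessarily the exact procedure used to compress PCENet.

```python
import numpy as np

def select_filters_to_keep(conv_weights, keep_ratio=0.5):
    """Rank convolutional filters by L1 norm and keep the strongest ones.

    conv_weights has shape (out_channels, in_channels, k, k).
    """
    l1 = np.abs(conv_weights).reshape(conv_weights.shape[0], -1).sum(axis=1)
    n_keep = max(1, int(round(keep_ratio * conv_weights.shape[0])))
    keep = np.sort(np.argsort(l1)[::-1][:n_keep])      # indices of surviving filters
    return keep

def prune_layer(conv_weights, next_layer_weights, keep):
    """Drop pruned filters and the matching input channels of the following layer."""
    return conv_weights[keep], next_layer_weights[:, keep]
```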
- Keywords
- deep learning, model compression, progressive contextual excitation, pruning filters
- MeSH
- Automation MeSH
- Farms MeSH
- Physical Phenomena MeSH
- Data Compression * MeSH
- Agriculture * MeSH
- Publication type
- Journal Article MeSH
Distinguishing cause from effect is a scientific challenge resisting solutions from mathematics, statistics, information theory and computer science. Compression-Complexity Causality (CCC) is a recently proposed interventional measure of causality, inspired by Wiener-Granger's idea. It estimates causality based on change in dynamical compression-complexity (or compressibility) of the effect variable, given the cause variable. CCC works with minimal assumptions on given data and is robust to irregular-sampling, missing-data and finite-length effects. However, it only works for one-dimensional time series. We propose an ordinal pattern symbolization scheme to encode multidimensional patterns into one-dimensional symbolic sequences, and thus introduce the Permutation CCC (PCCC). We demonstrate that PCCC retains all advantages of the original CCC and can be applied to data from multidimensional systems with potentially unobserved variables which can be reconstructed using the embedding theorem. PCCC is tested on numerical simulations and applied to paleoclimate data characterized by irregular and uncertain sampling and limited numbers of samples.
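A minimal sketch of ordinal-pattern symbolization for a one-dimensional series, using a standard permutation-pattern encoding; the embedding dimension and delay are analysis choices, and the paper's extension to multidimensional patterns is not reproduced here.

```python
from itertools import permutations
import numpy as np

def ordinal_symbols(series, m=3, delay=1):
    """Map each length-m window of a 1-D series to an ordinal-pattern symbol.

    Symbols index the m! possible permutations; ties are broken by position.
    """
    patterns = {p: i for i, p in enumerate(permutations(range(m)))}
    symbols = []
    for start in range(len(series) - (m - 1) * delay):
        window = series[start:start + m * delay:delay]
        rank = tuple(int(r) for r in np.argsort(window, kind="stable"))
        symbols.append(patterns[rank])
    return np.array(symbols)
```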
Compression of ECG signals is essential, especially for signal transmission in telemedicine. Many compression algorithms exist, but they are described in varying levels of detail, tested on different datasets, and their performance is reported in different ways; there is a lack of standardization in this area. This study points out these drawbacks and presents a new compression algorithm that is properly described, tested, and objectively compared with the work of other authors. The study thus serves as an example of what such standardization should look like. The single-cycle fractal-based (SCyF) compression algorithm is introduced and tested on 4 different databases: the CSE database, the MIT-BIH arrhythmia database, a high-frequency signal database, and the Brno University of Technology ECG quality database (BUT QDB). The SCyF algorithm is always compared with the well-known algorithm based on the wavelet transform and set partitioning in hierarchical trees in terms of efficiency (2 methods) and signal quality/distortion after compression (12 methods). A detailed analysis of the results is provided. The SCyF compression algorithm reaches up to avL = 0.4460 bps and PRDN = 2.8236%.
An application of the wavelet transform to electrocardiography is described in this paper. The transform is used as the first stage of a lossy compression algorithm for efficient coding of resting ECG signals. The proposed technique is based on the decomposition of the ECG signal into a set of basis functions covering the time-frequency domain, so the non-stationary character of ECG data is taken into account. Some of the time-frequency signal components are removed because of their low influence on the signal characteristics. The remaining components are efficiently coded by quantization, composition into a sequence of coefficients, and compression with a run-length coder and an entropy (Huffman) coder. The proposed wavelet-based compression algorithm can compress data to an average code length of about 1 bit/sample. The algorithm can also be implemented in a real-time processing system, with the wavelet transform computed by the fast linear filters described in the paper.
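A minimal sketch of the pipeline described above (transform, discarding of low-influence components, quantization, run-length coding), using a Haar wavelet as a stand-in for the paper's wavelet filters; the threshold and quantization step are hypothetical, and the final Huffman stage is omitted.

```python
import numpy as np

def haar_level(x):
    """One level of the Haar wavelet transform -> (approximation, detail)."""
    x = x[: len(x) // 2 * 2]
    return (x[0::2] + x[1::2]) / np.sqrt(2), (x[0::2] - x[1::2]) / np.sqrt(2)

def compress(signal, levels=4, threshold=0.05, q_step=0.01):
    """Haar transform -> zero small details -> uniform quantization -> RLE."""
    approx, details = np.asarray(signal, dtype=float), []
    for _ in range(levels):
        approx, d = haar_level(approx)
        d[np.abs(d) < threshold] = 0.0                 # drop low-influence components
        details.append(d)
    coeffs = np.concatenate([approx] + details[::-1])
    q = np.round(coeffs / q_step).astype(int)          # uniform quantization
    # Run-length encode zero runs; an entropy (Huffman) coder would follow.
    rle, run = [], 0
    for v in q:
        if v == 0:
            run += 1
        else:
            if run:
                rle.append(("Z", run))
                run = 0
            rle.append(("V", int(v)))
    if run:
        rle.append(("Z", run))
    return rle
```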
- MeSH
- Acoustics MeSH
- Algorithms MeSH
- Time Factors MeSH
- Electrocardiography * MeSH
- Fourier Analysis MeSH
- Humans MeSH
- Rest physiology MeSH
- Signal Processing, Computer-Assisted * MeSH
- Pattern Recognition, Automated MeSH
- Telephone MeSH
- Telecommunications MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
3D macromolecular structural data is growing ever more complex and plentiful in the wake of substantive advances in experimental and computational structure determination methods, including macromolecular crystallography, cryo-electron microscopy, and integrative methods. Efficient means of working with 3D macromolecular structural data for archiving, analyses, and visualization are central to facilitating interoperability and reusability in compliance with the FAIR Principles. We address two challenges posed by growth in data size and complexity. First, data size is reduced by bespoke compression techniques. Second, complexity is managed through improved software tooling and by fully leveraging available data dictionary schemas. To this end, we introduce BinaryCIF, a serialization of Crystallographic Information File (CIF) format files that maintains full compatibility with related data schemas, such as PDBx/mmCIF, while reducing file sizes by more than a factor of two versus gzip-compressed CIF files. Moreover, for the largest structures, BinaryCIF provides even better compression: factors of ten and four versus CIF files and gzipped CIF files, respectively. Herein, we describe CIFTools, a set of libraries in Java and TypeScript for generic and typed handling of CIF and BinaryCIF files. Together, BinaryCIF and CIFTools enable lightweight, efficient, and extensible handling of 3D macromolecular structural data.
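To give a flavour of the kind of bespoke, column-oriented encodings a BinaryCIF-style format relies on, the sketch below chains delta and run-length encoding over an integer column; it is illustrative only and not the actual BinaryCIF codec.

```python
def delta_rle_encode(column):
    """Delta encoding followed by run-length encoding of an integer column."""
    deltas = [column[0]] + [b - a for a, b in zip(column, column[1:])]
    encoded = []
    for d in deltas:
        if encoded and encoded[-1][0] == d:
            encoded[-1][1] += 1                        # extend the current run
        else:
            encoded.append([d, 1])                     # start a new [value, run] pair
    return encoded

def delta_rle_decode(encoded):
    deltas = [value for value, run in encoded for _ in range(run)]
    out, acc = [], 0
    for d in deltas:
        acc += d
        out.append(acc)
    return out
```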
- MeSH
- Databases, Chemical MeSH
- Data Compression methods MeSH
- Crystallography methods MeSH
- Macromolecular Substances chemistry ultrastructure MeSH
- Models, Molecular * MeSH
- Software * MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Research Support, N.I.H., Extramural MeSH
- Research Support, U.S. Gov't, Non-P.H.S. MeSH
- Substances
- Macromolecular Substances MeSH