publications
publications by category in reverse chronological order.
2024
- [TMC’24] [TH-CPL A] [CCF A] Understanding Differencing Algorithms for Mobile Application Updates. IEEE Transactions on Mobile Computing (TMC), 2024. IF=7.9, CAS Q2, JCR Q1.
Mobile application updates occur frequently and continue to add considerable traffic to the Internet. Differencing algorithms, which compute a small delta between the new and old versions, are often employed to reduce update overhead. Researchers have proposed many differencing algorithms over the years. Unfortunately, it is currently unknown how these algorithms perform quantitatively across different categories of applications. It is also challenging to determine the impact of individual techniques and whether a technique from one algorithm can be integrated into another for further performance improvement. This paper conducts the first systematic study of the performance of four widely used differencing algorithms for mobile application updates (xdelta3, bsdiff, archive patcher, and HDiffPatch) with respect to five key metrics: compression ratio, differencing time and memory overhead, and reconstruction time and memory overhead. We perform measurements on 200 mobile applications and analyze the key techniques (such as decompressing-before-differencing, sliding window, and copy instruction merging) that influence the performance of these algorithms. We present four important findings that offer insights for further performance optimization. Guided by these insights, we also propose a novel algorithm, sdiff, which combines an appropriately chosen set of key techniques and achieves the smallest compression ratio compared with state-of-the-art algorithms.
- [USENIX ATC’24] [TH-CPL A] [CCF A] SimEnc: A High-Performance Similarity-Preserving Encryption Approach for Deduplication of Encrypted Docker Images. In Proc. of USENIX ATC, 2024. Acceptance Rate: 15.8% (77 out of 488).
Encrypted Docker images are becoming increasingly popular in Docker registries for privacy. As Docker registries manage an ever-growing number of images, deduplication becomes essential to conserve storage space. However, deduplicating encrypted images is difficult: deduplication exploits identical content, while encryption tries to make all content look random. Existing state-of-the-art works decompress images and apply message-locked encryption (MLE) to deduplicate encrypted images. Unfortunately, our measurements uncover two limitations of current works: (i) even minor modifications to image content can hinder MLE deduplication, and (ii) decompressing image layers increases the storage consumed by duplicate data and significantly compromises user pull latency and deduplication throughput. In this paper, we propose SimEnc, a high-performance similarity-preserving encryption approach for deduplication of encrypted Docker images. SimEnc is the first work that integrates the semantic hash technique into MLE to extract semantic information among layers for improving the deduplication ratio. SimEnc builds on a fast similarity space selection mechanism for flexibility. Unlike existing works that fully decompress the layer, we explore a new similarity space through Huffman decoding that achieves a better deduplication ratio and performance. Experiments show that SimEnc outperforms both the state-of-the-art encrypted serverless platform and the plaintext Docker registry, reducing storage consumption by up to 261.7% and 54.2%, respectively. SimEnc can also surpass both in pull latency.
- [IPSN’24] [TH-CPL A] [CCF B] dTEE: A Declarative Approach to Secure IoT Applications Using TrustZone. In Proc. of ACM/IEEE IPSN, 2024. Acceptance Rate: 21.5% (20 out of 93).
Internet of Things (IoT) applications have recently been widely used in safety-critical scenarios. To prevent sensitive information leaks, IoT device vendors provide hardware-assisted protections called Trusted Execution Environments (TEEs), such as ARM TrustZone. Programming a TEE-based application requires writing separate code for two components, which significantly slows down development. Existing solutions tackle this issue through automatic code partitioning but fail to handle two complicated scenarios: adding trusted logic and interacting with secure peripherals. We propose dTEE, a declarative approach to securing IoT applications based on TrustZone. dTEE enables developers to rapidly declare tiered-sensitive variables and functions in existing applications. In addition, dTEE automatically transforms device drivers into trusted ones. We evaluate dTEE on four real-world IoT applications and seven micro-benchmarks. Results show that dTEE achieves high expressiveness, supporting 50% more applications than existing approaches, and reduces lines of code by 90% compared with handcrafted development.
- [INFOCOM’24] [TH-CPL A] [CCF A] Exploiting Multiple Similarity Spaces for Efficient and Flexible Incremental Update of Mobile Apps. In Proc. of IEEE INFOCOM, 2024. Acceptance Rate: 19.6% (256 out of 1307).
Mobile application updates occur frequently and continue to add considerable traffic to the Internet. Differencing algorithms, which compute a small delta between the new and old versions, are often employed to reduce update overhead. Transforming the old and new files into decoded similarity spaces can drastically reduce the delta size. However, this transformation is often hindered by two practical problems: (1) insufficient decoding and (2) long recompression time. To address this challenge, we propose two general approaches for transforming compressed files (more specifically, deflate streams) into the full decoded similarity space and the partial decoded similarity space with low recompression time. The first approach uses a recompression-aware searching mechanism, built on a general full decoding tool, to transform a deflate stream into the full decoded similarity space with configurable searching complexity, even when the stream cannot be recompressed identically. The second approach uses a novel solution to transform a deflate stream into the partial decoded similarity space with differencing-friendly LZ77 token re-encoding. We also propose an algorithm called MDiffPatch that exploits both the full and partial decoded similarity spaces. The algorithm balances compression ratio and recompression time through a tunable parameter. Extensive evaluation results show that MDiffPatch achieves a lower compression ratio than state-of-the-art algorithms and that its tunable parameter enables a good tradeoff between compression ratio and recompression time.
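The idea shared by the TMC and INFOCOM entries, that differencing decoded data instead of deflate bytes shrinks the delta, can be illustrated with a toy sketch. The greedy COPY/ADD differencer below is hypothetical and is not xdelta3, bsdiff, HDiffPatch, sdiff, or MDiffPatch; `make_delta`, `apply_delta`, the 16-byte `BLOCK`, and the cost model in `delta_size` are illustrative assumptions. It differences the same small edit twice, once over zlib-compressed bytes and once over the decoded bytes, then rebuilds the compressed file on the "client" by patching in the decoded space and recompressing.

```python
import random
import zlib

BLOCK = 16  # hypothetical minimum match length for the toy index


def build_index(old: bytes, block: int) -> dict:
    """Map every block-sized substring of `old` to the offsets where it starts."""
    index: dict = {}
    for i in range(len(old) - block + 1):
        index.setdefault(old[i:i + block], []).append(i)
    return index


def make_delta(old: bytes, new: bytes, block: int = BLOCK) -> list:
    """Greedy COPY/ADD delta: at each position, take the longest match found in `old`."""
    index = build_index(old, block)
    ops, literal, j = [], bytearray(), 0
    while j < len(new):
        best_off, best_len = -1, 0
        for off in index.get(new[j:j + block], []):
            length = block  # the first `block` bytes match by construction
            while (off + length < len(old) and j + length < len(new)
                   and old[off + length] == new[j + length]):
                length += 1
            if length > best_len:
                best_off, best_len = off, length
        if best_len == 0:
            literal.append(new[j])  # no match here: keep the byte as a literal
            j += 1
            continue
        if literal:
            ops.append(("ADD", bytes(literal)))
            literal = bytearray()
        ops.append(("COPY", best_off, best_len))
        j += best_len
    if literal:
        ops.append(("ADD", bytes(literal)))
    return ops


def apply_delta(old: bytes, ops: list) -> bytes:
    """Reconstruct the new file from the old file and the delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "COPY":
            out += old[op[1]:op[1] + op[2]]
        else:
            out += op[1]
    return bytes(out)


def delta_size(ops: list) -> int:
    """Toy cost model: 8 bytes per COPY, 4-byte header plus payload per ADD."""
    return sum(8 if op[0] == "COPY" else 4 + len(op[1]) for op in ops)


# Synthetic "old" and "new" resources: the new version inserts a few bytes.
rng = random.Random(7)
words = [b"update", b"delta", b"patch", b"mobile", b"app",
         b"version", b"resource", b"class", b"dex", b"asset"]
old_raw = b" ".join(rng.choice(words) for _ in range(600))
new_raw = old_raw[:200] + b" hotfix " + old_raw[200:]
old_z, new_z = zlib.compress(old_raw, 9), zlib.compress(new_raw, 9)

z_ops = make_delta(old_z, new_z)          # differencing the compressed bytes
raw_ops = make_delta(old_raw, new_raw)    # decompress-before-differencing

# Client side: decode old, patch in the decoded space, then recompress.
# This matches new_z byte for byte only because the same zlib level is reused.
rebuilt = zlib.compress(apply_delta(zlib.decompress(old_z), raw_ops), 9)
print("delta over deflate bytes:", delta_size(z_ops), "B")
print("delta over decoded bytes:", delta_size(raw_ops), "B")
print("recompressed identically:", rebuilt == new_z)
```

Because deflate output depends on per-block Huffman tables and is a bit stream, a small edit in the decoded data typically perturbs most of the compressed bytes, so the compressed-space delta here is close to the full file size while the decoded-space delta is a handful of operations. The last step is the hard part in practice: identical recompression is trivial in this sketch because the same zlib level is reused, whereas real app packages may not be reproducible that way, which is where recompression cost and recompression-aware search come in.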
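The message-locked encryption (MLE) baseline discussed in the SimEnc entry, and its limitation (i), can be sketched as follows. This is a minimal illustration under stated assumptions, not SimEnc: the key is the SHA-256 of the content (convergent encryption), the cipher is a toy SHA-256 counter-mode keystream that is NOT secure, and `DedupRegistry` is a hypothetical registry that stores each distinct ciphertext once, keyed by its hash. Identical layers encrypted independently by two users deduplicate, while one small edit yields an unrelated ciphertext that must be stored in full.

```python
import hashlib


def keystream(key: bytes, n: int) -> bytes:
    """Toy keystream: SHA-256 in counter mode. Illustration only, NOT a vetted cipher."""
    out = bytearray()
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:n])


def mle_encrypt(message: bytes) -> tuple:
    """Message-locked (convergent) encryption: the key is derived from the message itself."""
    key = hashlib.sha256(message).digest()
    ciphertext = bytes(m ^ k for m, k in zip(message, keystream(key, len(message))))
    return key, ciphertext


def mle_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    """XOR with the same keystream restores the plaintext."""
    return bytes(c ^ k for c, k in zip(ciphertext, keystream(key, len(ciphertext))))


class DedupRegistry:
    """Hypothetical registry that stores each distinct ciphertext once, keyed by its hash."""

    def __init__(self) -> None:
        self.blobs: dict = {}

    def push(self, ciphertext: bytes) -> str:
        tag = hashlib.sha256(ciphertext).hexdigest()
        self.blobs.setdefault(tag, ciphertext)  # duplicate ciphertexts are not stored again
        return tag

    def stored_bytes(self) -> int:
        return sum(len(b) for b in self.blobs.values())


layer = b"".join(b"lib%03d.so\n" % i for i in range(400))  # stand-in for a layer's bytes
tweaked = layer.replace(b"lib007", b"lib7xx", 1)             # one small edit

registry = DedupRegistry()
key_a, ct_a = mle_encrypt(layer)     # user A
key_b, ct_b = mle_encrypt(layer)     # user B: same plaintext, encrypted independently
key_c, ct_c = mle_encrypt(tweaked)   # user C: minor modification
tag_a, tag_b, tag_c = registry.push(ct_a), registry.push(ct_b), registry.push(ct_c)
print(len(registry.blobs), registry.stored_bytes())  # prints: 2 8000
```

Deduplication works because the same plaintext always yields the same key and therefore the same ciphertext; the tweaked layer, differing in a few bytes, gets a different key and is stored in full. SimEnc's semantic hashing and Huffman-decoded similarity space, which target such near-duplicates, are not modeled here.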