Extracting GZip & Tar Files Natively in .NET Without External Libraries

Introduction

Imagine a .tar.gz file lands in your Azure Blob Storage container. Uncompressed, it yields a collection of individual files. The file's arrival triggers an Azure Function, which decompresses the contents and transfers them into a different container.

In this context, a team may instinctively reach for a robust library like SharpZipLib. But what if there is a mandate to do this without external dependencies? With .NET 7, that becomes possible.

.NET 7 introduces native support for TAR files, and GZip is handled by System.IO.Compression. Together, they let us decompress a .tar.gz file natively in .NET 7, with no need for external libraries.

This post will walk you through this process, providing a practical example using .NET 7 to show how this can be achieved.
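To make the GZip half of the story concrete before we get to TAR, here is a minimal sketch of an in-memory round trip with GZipStream. The class and method names (GZipRoundTrip, Compress, Decompress) are made up for illustration; only GZipStream and MemoryStream come from the framework.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class GZipRoundTrip
{
    // Hypothetical helper: gzip-compresses a byte array in memory.
    public static byte[] Compress(byte[] data)
    {
        using MemoryStream output = new MemoryStream();
        // Dispose the GZipStream before reading the buffer so the gzip footer is written.
        using (GZipStream gzip = new GZipStream(output, CompressionMode.Compress, leaveOpen: true))
        {
            gzip.Write(data, 0, data.Length);
        }
        return output.ToArray();
    }

    // Hypothetical helper: reverses Compress.
    public static byte[] Decompress(byte[] compressed)
    {
        using MemoryStream input = new MemoryStream(compressed);
        using GZipStream gzip = new GZipStream(input, CompressionMode.Decompress);
        using MemoryStream output = new MemoryStream();
        gzip.CopyTo(output);
        return output.ToArray();
    }

    public static void Main()
    {
        byte[] original = Encoding.UTF8.GetBytes("hello gzip, hello gzip, hello gzip");
        byte[] compressed = Compress(original);
        byte[] restored = Decompress(compressed);

        Console.WriteLine($"Original: {original.Length} bytes, compressed: {compressed.Length} bytes");
        Console.WriteLine(Encoding.UTF8.GetString(restored));
    }
}
```

A .tar.gz file is simply a TAR archive passed through this kind of compression, which is why the extraction later in this post has a GZip step and a TAR step.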

.NET 7: Native TAR Support

.NET 7 introduced the System.Formats.Tar namespace for working with TAR files. Its main types are:

  • System.Formats.Tar.TarFile to pack a directory into a TAR file or extract a TAR file to a directory
  • System.Formats.Tar.TarReader to read a TAR file
  • System.Formats.Tar.TarWriter to write a TAR file
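As a quick illustration of the first type in that list, the sketch below packs a directory into a TAR file with TarFile and unpacks it again. The class name, the PackAndUnpack helper and the temp-folder layout are assumptions made for this example, not part of the framework.

```csharp
using System;
using System.Formats.Tar;
using System.IO;

public static class TarFileRoundTrip
{
    // Hypothetical helper: packs sourceDir into tarPath, then unpacks it into targetDir.
    public static void PackAndUnpack(string sourceDir, string tarPath, string targetDir)
    {
        // includeBaseDirectory: false stores entries relative to sourceDir itself.
        TarFile.CreateFromDirectory(sourceDir, tarPath, includeBaseDirectory: false);

        // The destination directory is created up front before extracting into it.
        Directory.CreateDirectory(targetDir);
        TarFile.ExtractToDirectory(tarPath, targetDir, overwriteFiles: true);
    }

    public static void Main()
    {
        // Work inside a throwaway folder under the system temp directory.
        string root = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        string sourceDir = Path.Combine(root, "source");
        string targetDir = Path.Combine(root, "target");
        string tarPath = Path.Combine(root, "demo.tar");

        Directory.CreateDirectory(sourceDir);
        File.WriteAllText(Path.Combine(sourceDir, "hello.txt"), "hello tar");

        PackAndUnpack(sourceDir, tarPath, targetDir);

        Console.WriteLine(File.ReadAllText(Path.Combine(targetDir, "hello.txt")));

        Directory.Delete(root, recursive: true);
    }
}
```

TarFile is the convenient choice when whole directories go in or out. TarReader and TarWriter give entry-by-entry control, which is what the example further down relies on.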

These new capabilities significantly simplify working with TAR files in .NET. Let's dive in and look at a code sample that extracts a .tar.gz file natively in .NET 7.

A Simple Example In .NET 7

Below is a simple console app that extracts the contents of a .tar.gz file to a directory natively in .NET 7:

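using System;
using System.IO;
using System.IO.Compression;
using System.Formats.Tar;

class Program
{
    static void Main(string[] args)
    {
        string sourceTarGzFilePath = @"C:\_Temp\test.tar.gz";
        string targetDirectory = @"C:\_Temp\ExtractedFiles\";

        // Strip the trailing ".gz" so the intermediate file is named test.tar
        string tarFilePath = Path.ChangeExtension(sourceTarGzFilePath, null);

        Directory.CreateDirectory(targetDirectory);
        string targetRoot = Path.GetFullPath(targetDirectory);

        // Decompress the .gz file into a temporary .tar file
        using (FileStream originalFileStream = File.OpenRead(sourceTarGzFilePath))
        using (FileStream decompressedFileStream = File.Create(tarFilePath))
        using (GZipStream decompressionStream = new GZipStream(originalFileStream, CompressionMode.Decompress))
        {
            decompressionStream.CopyTo(decompressedFileStream);
        }

        // Extract the .tar file
        using (FileStream tarStream = File.OpenRead(tarFilePath))
        using (TarReader tarReader = new TarReader(tarStream))
        {
            TarEntry? entry;
            // Main is synchronous, so use the synchronous API rather than blocking on async calls
            while ((entry = tarReader.GetNextEntry()) != null)
            {
                if (entry.EntryType is TarEntryType.SymbolicLink or TarEntryType.HardLink or TarEntryType.GlobalExtendedAttributes)
                {
                    continue;
                }

                // Skip entries whose names would resolve outside the target directory
                string destinationPath = Path.GetFullPath(Path.Combine(targetRoot, entry.Name));
                if (!destinationPath.StartsWith(targetRoot, StringComparison.Ordinal))
                {
                    Console.WriteLine($"Skipping {entry.Name}");
                    continue;
                }

                if (entry.EntryType is TarEntryType.Directory)
                {
                    Directory.CreateDirectory(destinationPath);
                    continue;
                }

                // Make sure the parent directory exists before extracting the file
                string? parentDirectory = Path.GetDirectoryName(destinationPath);
                if (!string.IsNullOrEmpty(parentDirectory))
                {
                    Directory.CreateDirectory(parentDirectory);
                }

                Console.WriteLine($"Extracting {entry.Name}");
                entry.ExtractToFile(destinationPath, overwrite: true);
            }
        }

        // Delete the temporary .tar file
        File.Delete(tarFilePath);

        Console.WriteLine("Extraction Completed");
    }
}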

You can also find this on GitHub Gist.
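The console app above writes an intermediate .tar file to disk before extracting it. As a possible alternative, the GZipStream can be handed straight to the TAR APIs, so nothing temporary is written. The sketch below is one way this might look; the class name, the CreateSampleArchive helper (which only exists so the example has something to extract) and the temp paths are all assumptions for illustration.

```csharp
using System;
using System.Formats.Tar;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class TarGzStreaming
{
    // Hypothetical helper: writes a single-file .tar.gz so there is something to extract.
    public static void CreateSampleArchive(string tarGzPath, string entryName, string content)
    {
        using FileStream file = File.Create(tarGzPath);
        using GZipStream gzip = new GZipStream(file, CompressionMode.Compress);
        using TarWriter writer = new TarWriter(gzip, TarEntryFormat.Pax);

        PaxTarEntry entry = new PaxTarEntry(TarEntryType.RegularFile, entryName)
        {
            DataStream = new MemoryStream(Encoding.UTF8.GetBytes(content))
        };
        writer.WriteEntry(entry);
    }

    // Chains FileStream -> GZipStream -> TarFile, with no temporary .tar file on disk.
    public static void Extract(string tarGzPath, string targetDirectory)
    {
        Directory.CreateDirectory(targetDirectory);

        using FileStream file = File.OpenRead(tarGzPath);
        using GZipStream gzip = new GZipStream(file, CompressionMode.Decompress);
        TarFile.ExtractToDirectory(gzip, targetDirectory, overwriteFiles: true);
    }

    public static void Main()
    {
        // Work inside a throwaway folder under the system temp directory.
        string root = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        Directory.CreateDirectory(root);
        string archivePath = Path.Combine(root, "sample.tar.gz");
        string targetDirectory = Path.Combine(root, "extracted");

        CreateSampleArchive(archivePath, "hello.txt", "hello tar.gz");
        Extract(archivePath, targetDirectory);

        Console.WriteLine(File.ReadAllText(Path.Combine(targetDirectory, "hello.txt")));

        Directory.Delete(root, recursive: true);
    }
}
```

The trade-off is control: TarFile.ExtractToDirectory extracts everything in one call, whereas the TarReader loop lets you filter, rename or log entries one at a time. In the Azure Function scenario from the introduction, where the input would arrive as a stream rather than a local file, the streaming shape may be the more natural fit.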

Wrapping Up

The introduction of System.Formats.Tar in .NET 7 is a significant milestone for developers dealing with .tar.gz files. Combined with the established System.IO.Compression namespace, it lets us decompress and extract these archives natively, without relying on external libraries. The result is fewer external dependencies, less complexity, and a more self-contained .NET environment.

Author

Ricky Gummadi

Posted on

2023-06-25

Updated on

2023-07-17
