<properties linkid="hdinsight-dotnet-avro-serialization" urlDisplayName="HDInsight Microsoft .NET Library for Serialization with Avro" pageTitle="Serialize data with the Microsoft .NET Library for Avro | Azure" metaKeywords="" description="Learn how Azure HDInsight uses Avro to serialize big data." metaCanonical="" services="hdinsight" documentationCenter="" title="Serialize data with the Microsoft .NET Library for Avro " authors="bradsev" solutions="" manager="paulettm" editor="cgronlun" />
# Serialize data with the Microsoft .NET Library for Avro

## Overview

This topic shows how to use the Microsoft .NET Library for Avro to serialize objects and other data structures into streams in order to persist them to memory, a database, or a file, and how to deserialize them to recover the original objects.

### Apache Avro

The Microsoft .NET Library for Avro implements the Apache Avro data serialization system for the Microsoft .NET environment. Apache Avro provides a compact binary data interchange format for serialization. It uses [JSON](http://www.json.org) to define a language-agnostic schema that enables language interoperability: data serialized in one language can be read in another. Currently C, C++, C#, Java, PHP, Python, and Ruby are supported. Detailed information on the format can be found in the [Apache Avro Specification](http://avro.apache.org/docs/current/spec.html). Note that the current version of the Microsoft .NET Library for Avro does not support the Remote Procedure Call (RPC) part of this specification.

The serialized representation of an object in the Avro system consists of two parts: the schema and the actual value. The Avro schema describes the language-independent data model of the serialized data in JSON. It is present side by side with the binary representation of the data. Keeping the schema separate from the binary representation permits each object to be written with no per-value overhead, making serialization fast and the representation small.
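For illustration, a minimal Avro schema for a record with two fields might look like the following. This is only a sketch; the record and field names here are invented and do not appear in the samples below:

```json
{
    "type": "record",
    "name": "Sample.Reading",
    "fields": [
        { "name": "SensorId", "type": "int" },
        { "name": "Value", "type": "double" }
    ]
}
```

The schema names the record type and lists each field with its Avro primitive or complex type; the binary encoding of a value then carries no field names or type tags at all.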
### The Hadoop scenario

The Apache Avro serialization format is widely used in Azure HDInsight and other Apache Hadoop environments. Avro provides a convenient way to represent complex data structures within a Hadoop MapReduce job. The format of Avro files has been designed to support the distributed MapReduce programming model. The key feature that enables this distribution is that the files are "splittable", in the sense that one can seek to any point in a file and start reading from a particular block.

### Serialization in the Microsoft .NET Library for Avro

The .NET Library for Avro supports two ways of serializing objects:

- **Reflection**: The JSON schema for the types is automatically built from the data contract attributes of the .NET types to be serialized.
- **Generic record**: The JSON schema is explicitly specified in a record represented by the [**AvroRecord**](http://msdn.microsoft.com/en-us/library/microsoft.hadoop.avro.avrorecord.aspx) class when no .NET types are available to describe the schema for the data to be serialized.

When the data schema is known to both the writer and the reader of the stream, the data can be sent without its schema. When it is not, the schema must be shared by using an Avro container file. Other parameters, such as the codec used for data compression, can be specified. These scenarios are outlined in more detail and illustrated in the code examples below.
### Microsoft .NET Library for Avro prerequisites

- [Microsoft .NET Framework 4](http://www.microsoft.com/en-us/download/details.aspx?id=17851)
- [Newtonsoft Json.NET](http://james.newtonking.com/json) (version 5.0.5 or later)

Note that the Newtonsoft.Json.dll dependency is downloaded automatically with the installation of the Microsoft .NET Library for Avro; the procedure is provided in the following section.
### Microsoft .NET Library for Avro installation

The Microsoft .NET Library for Avro is distributed as a NuGet package that can be installed from Visual Studio by using the following procedure:

- Select the **Project** tab -> **Manage NuGet Packages...**
- Search for "Microsoft.Hadoop.Avro" in the **Online Search** box.
- Click the **Install** button next to **Microsoft .NET Library for Avro**.

Note that the Newtonsoft.Json.dll (>= 5.0.5) dependency is also downloaded automatically with the Microsoft .NET Library for Avro.
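Alternatively, the same package can be installed from the Package Manager Console in Visual Studio with a single command (the package ID shown is the one searched for above):

```powershell
Install-Package Microsoft.Hadoop.Avro
```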
## Guide to the samples

The five examples provided in this topic each illustrate a different scenario supported by the Microsoft .NET Library for Avro.

The first two show how to serialize and deserialize data into memory stream buffers by using reflection and generic records. The schema in these two cases is assumed to be shared between the readers and writers out of band, so the schema does not need to be serialized with the data in an Avro container file.

The third and fourth examples show how to serialize and deserialize data by using Avro object container files, with reflection and generic records. When data is stored in an Avro container file, its schema is always stored with it because the schema must be shared for deserialization.

The sample containing the first four examples can be downloaded from the [Azure code samples](http://code.msdn.microsoft.com/windowsazure/Serialize-data-with-the-86055923) site.

The fifth and final example shows how to use a custom compression codec for object container files. A sample containing the code for this example can be downloaded from the [Azure code samples](http://code.msdn.microsoft.com/windowsazure/Serialize-data-with-the-67159111) site.

The Microsoft .NET Library for Avro is designed to work with any stream. In these examples, data is manipulated by using memory streams rather than file streams or databases, for simplicity and consistency. The approach taken in a production environment depends on the exact scenario requirements, data sources and volume, performance constraints, and other factors.
* <a href="#Scenario1">**Serialization with reflection**</a>: The JSON schema for types to be serialized is automatically built from the data contract attributes.
* <a href="#Scenario2">**Serialization with generic record**</a>: The JSON schema is explicitly specified in a record when no .NET type is available for reflection.
* <a href="#Scenario3">**Serialization using object container files with reflection**</a>: The JSON schema is implicitly serialized with the data and shared by using an Avro container file.
* <a href="#Scenario4">**Serialization using object container files with generic record**</a>: The JSON schema is explicitly serialized with the data and shared by using an Avro container file.
* <a href="#Scenario5">**Serialization using object container files with a custom compression codec**</a>: The JSON schema is serialized with the data and shared by using an Avro container file with a custom .NET implementation of the deflate data compression codec.
<h2> <a name="Scenario1"></a>Serialization with reflection</h2>

The JSON schema for the types can be automatically built by the Microsoft .NET Library for Avro by using reflection from the data contract attributes of the C# objects to be serialized. The Microsoft .NET Library for Avro creates an [**IAvroSerializer<T>**](http://msdn.microsoft.com/en-us/library/dn627341.aspx) to identify the fields to be serialized.

In this example, objects (a **SensorData** class with a member **Location** struct) are serialized to a memory stream, and this stream is in turn deserialized. The result is then compared to the initial instance to confirm that the recovered **SensorData** object is identical to the original.

The schema in this example is assumed to be shared between the readers and writers, so the Avro object container format is not required. For an example of how to serialize and deserialize data into memory buffers by using reflection with the object container format when the schema must be serialized with the data, see <a href="#Scenario3">Serialization using object container files with reflection</a>.
```csharp
namespace Microsoft.Hadoop.Avro.Sample
{
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using System.Runtime.Serialization;
    using Microsoft.Hadoop.Avro.Container;

    //Sample class used in serialization samples
    [DataContract(Name = "SensorDataValue", Namespace = "Sensors")]
    internal class SensorData
    {
        [DataMember(Name = "Location")]
        public Location Position { get; set; }

        [DataMember(Name = "Value")]
        public byte[] Value { get; set; }
    }

    //Sample struct used in serialization samples
    [DataContract]
    internal struct Location
    {
        [DataMember]
        public int Floor { get; set; }

        [DataMember]
        public int Room { get; set; }
    }

    //This class contains all methods demonstrating
    //the usage of the Microsoft .NET Library for Avro
    public class AvroSample
    {
        //Serialize and deserialize a sample data set represented as an object by using reflection.
        //No explicit schema definition is required - the schema of the serialized objects is built automatically.
        public void SerializeDeserializeObjectUsingReflection()
        {
            Console.WriteLine("SERIALIZATION USING REFLECTION\n");
            Console.WriteLine("Serializing Sample Data Set...");

            //Create a new AvroSerializer instance; the default AvroDataContractResolver strategy
            //serializes only properties attributed with DataContract/DataMember
            var avroSerializer = AvroSerializer.Create<SensorData>();

            //Create a memory stream buffer
            using (var buffer = new MemoryStream())
            {
                //Create a data set using the sample class and struct
                var expected = new SensorData { Value = new byte[] { 1, 2, 3, 4, 5 }, Position = new Location { Room = 243, Floor = 1 } };

                //Serialize the data to the specified stream
                avroSerializer.Serialize(buffer, expected);

                Console.WriteLine("Deserializing Sample Data Set...");

                //Prepare the stream for deserializing the data
                buffer.Seek(0, SeekOrigin.Begin);

                //Deserialize data from the stream; it is returned as the same type used for serialization
                var actual = avroSerializer.Deserialize(buffer);

                Console.WriteLine("Comparing Initial and Deserialized Data Sets...");

                //Finally, verify that the deserialized data matches the original
                bool isEqual = this.Equal(expected, actual);
                Console.WriteLine("Result of Data Set Identity Comparison is {0}", isEqual);
            }
        }

        //
        //Helper methods
        //

        //Compare two SensorData objects
        private bool Equal(SensorData left, SensorData right)
        {
            return left.Position.Equals(right.Position) && left.Value.SequenceEqual(right.Value);
        }

        static void Main()
        {
            string sectionDivider = "---------------------------------------- ";

            //Create an instance of the AvroSample class and invoke methods
            //illustrating different serialization approaches
            AvroSample Sample = new AvroSample();

            //Serialization to memory using reflection
            Sample.SerializeDeserializeObjectUsingReflection();

            Console.WriteLine(sectionDivider);
            Console.WriteLine("Press any key to exit.");
            Console.Read();
        }
    }
}

// The example is expected to display the following output:
// SERIALIZATION USING REFLECTION
//
// Serializing Sample Data Set...
// Deserializing Sample Data Set...
// Comparing Initial and Deserialized Data Sets...
// Result of Data Set Identity Comparison is True
// ----------------------------------------
// Press any key to exit.
```
<h2> <a name="Scenario2"></a>Serialization with a generic record</h2>
A JSON schema can be explicitly specified in a generic record when reflection cannot be used because the data cannot be represented by using .NET classes with a data contract. This method is generally slower than using reflection and serializers for a specific C# class. In such cases, the schema for the data may also be dynamic, that is, not known at compile time. Data represented as comma-separated values (CSV) files whose schema is unknown until it is transformed to the Avro format at run time is an example of this sort of dynamic scenario.

This example shows how to create and use an [**AvroRecord**](http://msdn.microsoft.com/en-us/library/microsoft.hadoop.avro.avrorecord.aspx) to explicitly specify a JSON schema, how to populate it with the data, and then how to serialize and deserialize it. The result is then compared to the initial instance to confirm that the recovered record is identical to the original.

The schema in this example is assumed to be shared between the readers and writers, so the Avro object container format is not required. For an example of how to serialize and deserialize data into memory buffers by using a generic record with the object container format when the schema must be included with the serialized data, see the <a href="#Scenario4">Serialization using object container files with generic record</a> example.
```csharp
namespace Microsoft.Hadoop.Avro.Sample
{
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using System.Runtime.Serialization;
    using Microsoft.Hadoop.Avro.Container;
    using Microsoft.Hadoop.Avro.Schema;

    //This class contains all methods demonstrating
    //the usage of the Microsoft .NET Library for Avro
    public class AvroSample
    {
        //Serialize and deserialize a sample data set by using a generic record.
        //A generic record is a special class with the schema explicitly defined in JSON.
        //All serialized data should be mapped to the fields of the generic record,
        //which in turn is serialized.
        public void SerializeDeserializeObjectUsingGenericRecords()
        {
            Console.WriteLine("SERIALIZATION USING GENERIC RECORD\n");
            Console.WriteLine("Defining the Schema and creating Sample Data Set...");

            //Define the schema in JSON
            const string Schema = @"{
                                ""type"":""record"",
                                ""name"":""Microsoft.Hadoop.Avro.Specifications.SensorData"",
                                ""fields"":
                                    [
                                        {
                                            ""name"":""Location"",
                                            ""type"":
                                                {
                                                    ""type"":""record"",
                                                    ""name"":""Microsoft.Hadoop.Avro.Specifications.Location"",
                                                    ""fields"":
                                                        [
                                                            { ""name"":""Floor"", ""type"":""int"" },
                                                            { ""name"":""Room"", ""type"":""int"" }
                                                        ]
                                                }
                                        },
                                        { ""name"":""Value"", ""type"":""bytes"" }
                                    ]
                              }";

            //Create a generic serializer based on the schema
            var serializer = AvroSerializer.CreateGeneric(Schema);
            var rootSchema = serializer.WriterSchema as RecordSchema;

            //Create a memory stream buffer
            using (var stream = new MemoryStream())
            {
                //Create a generic record to represent the data
                dynamic location = new AvroRecord(rootSchema.GetField("Location").TypeSchema);
                location.Floor = 1;
                location.Room = 243;

                dynamic expected = new AvroRecord(serializer.WriterSchema);
                expected.Location = location;
                expected.Value = new byte[] { 1, 2, 3, 4, 5 };

                Console.WriteLine("Serializing Sample Data Set...");

                //Serialize the data
                serializer.Serialize(stream, expected);
                stream.Seek(0, SeekOrigin.Begin);

                Console.WriteLine("Deserializing Sample Data Set...");

                //Deserialize the data into a generic record
                dynamic actual = serializer.Deserialize(stream);

                Console.WriteLine("Comparing Initial and Deserialized Data Sets...");

                //Finally, verify the results
                bool isEqual = expected.Location.Floor.Equals(actual.Location.Floor);
                isEqual = isEqual && expected.Location.Room.Equals(actual.Location.Room);
                isEqual = isEqual && ((byte[])expected.Value).SequenceEqual((byte[])actual.Value);
                Console.WriteLine("Result of Data Set Identity Comparison is {0}", isEqual);
            }
        }

        static void Main()
        {
            string sectionDivider = "---------------------------------------- ";

            //Create an instance of the AvroSample class and invoke methods
            //illustrating different serialization approaches
            AvroSample Sample = new AvroSample();

            //Serialization to memory using a generic record
            Sample.SerializeDeserializeObjectUsingGenericRecords();

            Console.WriteLine(sectionDivider);
            Console.WriteLine("Press any key to exit.");
            Console.Read();
        }
    }
}

// The example is expected to display the following output:
// SERIALIZATION USING GENERIC RECORD
//
// Defining the Schema and creating Sample Data Set...
// Serializing Sample Data Set...
// Deserializing Sample Data Set...
// Comparing Initial and Deserialized Data Sets...
// Result of Data Set Identity Comparison is True
// ----------------------------------------
// Press any key to exit.
```
<h2> <a name="Scenario3"></a>Serialization using object container files and serialization with reflection</h2>
This example is similar to the scenario in the <a href="#Scenario1">first example</a>, where the schema is implicitly specified with reflection, except that here the schema is not assumed to be known to the reader that deserializes it. The **SensorData** objects to be serialized and their implicitly specified schema are stored in an object container file represented by the [**AvroContainer**](http://msdn.microsoft.com/en-us/library/microsoft.hadoop.avro.container.avrocontainer.aspx) class.

The data is serialized in this example with [**SequentialWriter&lt;SensorData&gt;**](http://msdn.microsoft.com/en-us/library/dn627340.aspx) and deserialized with [**SequentialReader&lt;SensorData&gt;**](http://msdn.microsoft.com/en-us/library/dn627340.aspx). The result is then compared to the initial instances to ensure they are identical.

The data in the object container file is compressed by using the default [**Deflate**][deflate-100] compression codec from .NET Framework 4.0. See the <a href="#Scenario5">last example</a> in this topic to learn how to use the more recent and superior version of the [**Deflate**][deflate-110] compression codec available in .NET Framework 4.5.
```csharp
namespace Microsoft.Hadoop.Avro.Sample
{
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using System.Runtime.Serialization;
    using Microsoft.Hadoop.Avro.Container;

    //Sample class used in serialization samples
    [DataContract(Name = "SensorDataValue", Namespace = "Sensors")]
    internal class SensorData
    {
        [DataMember(Name = "Location")]
        public Location Position { get; set; }

        [DataMember(Name = "Value")]
        public byte[] Value { get; set; }
    }

    //Sample struct used in serialization samples
    [DataContract]
    internal struct Location
    {
        [DataMember]
        public int Floor { get; set; }

        [DataMember]
        public int Room { get; set; }
    }

    //This class contains all methods demonstrating
    //the usage of the Microsoft .NET Library for Avro
    public class AvroSample
    {
        //Serializes and deserializes a sample data set by using reflection and Avro object container files.
        //The serialized data is compressed with the Deflate codec.
        public void SerializeDeserializeUsingObjectContainersReflection()
        {
            Console.WriteLine("SERIALIZATION USING REFLECTION AND AVRO OBJECT CONTAINER FILES\n");

            //Path for the Avro object container file
            string path = "AvroSampleReflectionDeflate.avro";

            //Create a data set using the sample class and struct
            var testData = new List<SensorData>
            {
                new SensorData { Value = new byte[] { 1, 2, 3, 4, 5 }, Position = new Location { Room = 243, Floor = 1 } },
                new SensorData { Value = new byte[] { 6, 7, 8, 9 }, Position = new Location { Room = 244, Floor = 1 } }
            };

            //Serialize and save the data to a file
            //Create a memory stream buffer
            using (var buffer = new MemoryStream())
            {
                Console.WriteLine("Serializing Sample Data Set...");

                //Create a SequentialWriter instance for type SensorData, which can serialize a sequence of SensorData objects to a stream
                //The data will be compressed using the Deflate codec
                using (var w = AvroContainer.CreateWriter<SensorData>(buffer, Codec.Deflate))
                {
                    using (var writer = new SequentialWriter<SensorData>(w, 24))
                    {
                        //Serialize the data to the stream by using the sequential writer
                        testData.ForEach(writer.Write);
                    }
                }

                //Save the stream to a file
                Console.WriteLine("Saving serialized data to file...");
                if (!WriteFile(buffer, path))
                {
                    Console.WriteLine("Error during file operation. Quitting method");
                    return;
                }
            }

            //Read and deserialize the data
            //Create a memory stream buffer
            using (var buffer = new MemoryStream())
            {
                Console.WriteLine("Reading data from file...");

                //Read the data from the object container file
                if (!ReadFile(buffer, path))
                {
                    Console.WriteLine("Error during file operation. Quitting method");
                    return;
                }

                Console.WriteLine("Deserializing Sample Data Set...");

                //Prepare the stream for deserializing the data
                buffer.Seek(0, SeekOrigin.Begin);

                //Create a SequentialReader for type SensorData, which will deserialize all serialized objects from the given stream
                //It allows iterating over the deserialized objects because it implements the IEnumerable<T> interface
                using (var reader = new SequentialReader<SensorData>(
                    AvroContainer.CreateReader<SensorData>(buffer, true)))
                {
                    var results = reader.Objects;

                    //Finally, verify that the deserialized data matches the original
                    Console.WriteLine("Comparing Initial and Deserialized Data Sets...");
                    int count = 1;
                    var pairs = testData.Zip(results, (serialized, deserialized) => new { expected = serialized, actual = deserialized });
                    foreach (var pair in pairs)
                    {
                        bool isEqual = this.Equal(pair.expected, pair.actual);
                        Console.WriteLine("For Pair {0} result of Data Set Identity Comparison is {1}", count, isEqual);
                        count++;
                    }
                }
            }

            //Delete the file
            RemoveFile(path);
        }

        //
        //Helper methods
        //

        //Compare two SensorData objects
        private bool Equal(SensorData left, SensorData right)
        {
            return left.Position.Equals(right.Position) && left.Value.SequenceEqual(right.Value);
        }

        //Save a memory stream to a new file with the given path
        private bool WriteFile(MemoryStream InputStream, string path)
        {
            if (!File.Exists(path))
            {
                try
                {
                    using (FileStream fs = File.Create(path))
                    {
                        InputStream.Seek(0, SeekOrigin.Begin);
                        InputStream.CopyTo(fs);
                    }
                    return true;
                }
                catch (Exception e)
                {
                    Console.WriteLine("The following exception was thrown during creation and writing to the file \"{0}\"", path);
                    Console.WriteLine(e.Message);
                    return false;
                }
            }
            else
            {
                Console.WriteLine("Can not create file \"{0}\". File already exists", path);
                return false;
            }
        }

        //Read the content of the file at the given path into a memory stream
        private bool ReadFile(MemoryStream OutputStream, string path)
        {
            try
            {
                using (FileStream fs = File.Open(path, FileMode.Open))
                {
                    fs.CopyTo(OutputStream);
                }
                return true;
            }
            catch (Exception e)
            {
                Console.WriteLine("The following exception was thrown during reading from the file \"{0}\"", path);
                Console.WriteLine(e.Message);
                return false;
            }
        }

        //Delete the file at the given path
        private void RemoveFile(string path)
        {
            if (File.Exists(path))
            {
                try
                {
                    File.Delete(path);
                }
                catch (Exception e)
                {
                    Console.WriteLine("The following exception was thrown during deleting the file \"{0}\"", path);
                    Console.WriteLine(e.Message);
                }
            }
            else
            {
                Console.WriteLine("Can not delete file \"{0}\". File does not exist", path);
            }
        }

        static void Main()
        {
            string sectionDivider = "---------------------------------------- ";

            //Create an instance of the AvroSample class and invoke methods
            //illustrating different serialization approaches
            AvroSample Sample = new AvroSample();

            //Serialization using reflection to an Avro object container file
            Sample.SerializeDeserializeUsingObjectContainersReflection();

            Console.WriteLine(sectionDivider);
            Console.WriteLine("Press any key to exit.");
            Console.Read();
        }
    }
}

// The example is expected to display the following output:
// SERIALIZATION USING REFLECTION AND AVRO OBJECT CONTAINER FILES
//
// Serializing Sample Data Set...
// Saving serialized data to file...
// Reading data from file...
// Deserializing Sample Data Set...
// Comparing Initial and Deserialized Data Sets...
// For Pair 1 result of Data Set Identity Comparison is True
// For Pair 2 result of Data Set Identity Comparison is True
// ----------------------------------------
// Press any key to exit.
```

<h2> <a name="Scenario4"></a>Serialization using object container files and serialization with generic record</h2>
This example is similar to the scenario in the <a href="#Scenario2">second example</a>, where the schema is explicitly specified with JSON, except that here the schema is not assumed to be known to the reader that deserializes it.

The test data set is collected into a list of [**AvroRecord**](http://msdn.microsoft.com/en-us/library/microsoft.hadoop.avro.avrorecord.aspx) objects by using an explicitly defined JSON schema and then stored in an object container file represented by the [**AvroContainer**](http://msdn.microsoft.com/en-us/library/microsoft.hadoop.avro.container.avrocontainer.aspx) class. This container file creates a writer that is used to serialize the data, uncompressed, to a memory stream that is then saved to a file. It is the [**Codec.Null**](http://msdn.microsoft.com/en-us/library/microsoft.hadoop.avro.container.codec.null.aspx) parameter used when creating the writer that specifies that this data will not be compressed.

The data is then read from the file and deserialized into a collection of objects. This collection is compared to the initial list of Avro records to confirm that they are identical.
```csharp
namespace Microsoft.Hadoop.Avro.Sample
{
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using System.Runtime.Serialization;
    using Microsoft.Hadoop.Avro.Container;
    using Microsoft.Hadoop.Avro.Schema;

    //This class contains all methods demonstrating
    //the usage of the Microsoft .NET Library for Avro
    public class AvroSample
    {
        //Serializes and deserializes a sample data set by using a generic record and Avro object container files.
        //The serialized data is not compressed.
        public void SerializeDeserializeUsingObjectContainersGenericRecord()
        {
            Console.WriteLine("SERIALIZATION USING GENERIC RECORD AND AVRO OBJECT CONTAINER FILES\n");

            //Path for the Avro object container file
            string path = "AvroSampleGenericRecordNullCodec.avro";

            Console.WriteLine("Defining the Schema and creating Sample Data Set...");

            //Define the schema in JSON
            const string Schema = @"{
                                ""type"":""record"",
                                ""name"":""Microsoft.Hadoop.Avro.Specifications.SensorData"",
                                ""fields"":
                                    [
                                        {
                                            ""name"":""Location"",
                                            ""type"":
                                                {
                                                    ""type"":""record"",
                                                    ""name"":""Microsoft.Hadoop.Avro.Specifications.Location"",
                                                    ""fields"":
                                                        [
                                                            { ""name"":""Floor"", ""type"":""int"" },
                                                            { ""name"":""Room"", ""type"":""int"" }
                                                        ]
                                                }
                                        },
                                        { ""name"":""Value"", ""type"":""bytes"" }
                                    ]
                              }";

            //Create a generic serializer based on the schema
            var serializer = AvroSerializer.CreateGeneric(Schema);
            var rootSchema = serializer.WriterSchema as RecordSchema;

            //Create generic records to represent the data
            var testData = new List<AvroRecord>();

            dynamic expected1 = new AvroRecord(rootSchema);
            dynamic location1 = new AvroRecord(rootSchema.GetField("Location").TypeSchema);
            location1.Floor = 1;
            location1.Room = 243;
            expected1.Location = location1;
            expected1.Value = new byte[] { 1, 2, 3, 4, 5 };
            testData.Add(expected1);

            dynamic expected2 = new AvroRecord(rootSchema);
            dynamic location2 = new AvroRecord(rootSchema.GetField("Location").TypeSchema);
            location2.Floor = 1;
            location2.Room = 244;
            expected2.Location = location2;
            expected2.Value = new byte[] { 6, 7, 8, 9 };
            testData.Add(expected2);

            //Serialize and save the data to a file
            //Create a memory stream buffer
            using (var buffer = new MemoryStream())
            {
                Console.WriteLine("Serializing Sample Data Set...");

                //Create a SequentialWriter instance which can serialize a sequence of generic records to a stream
                //The data will not be compressed (Null compression codec)
                using (var writer = AvroContainer.CreateGenericWriter(Schema, buffer, Codec.Null))
                {
                    using (var streamWriter = new SequentialWriter<object>(writer, 24))
                    {
                        //Serialize the data to the stream by using the sequential writer
                        testData.ForEach(streamWriter.Write);
                    }
                }

                Console.WriteLine("Saving serialized data to file...");

                //Save the stream to a file
                if (!WriteFile(buffer, path))
                {
                    Console.WriteLine("Error during file operation. Quitting method");
                    return;
                }
            }

            //Read and deserialize the data
            //Create a memory stream buffer
            using (var buffer = new MemoryStream())
            {
                Console.WriteLine("Reading data from file...");

                //Read the data from the object container file
                if (!ReadFile(buffer, path))
                {
                    Console.WriteLine("Error during file operation. Quitting method");
                    return;
                }

                Console.WriteLine("Deserializing Sample Data Set...");

                //Prepare the stream for deserializing the data
                buffer.Seek(0, SeekOrigin.Begin);

                //Create a SequentialReader which will deserialize all serialized objects from the given stream
                //It allows iterating over the deserialized objects because it implements the IEnumerable<T> interface
                using (var reader = AvroContainer.CreateGenericReader(buffer))
                {
                    using (var streamReader = new SequentialReader<object>(reader))
                    {
                        var results = streamReader.Objects;

                        Console.WriteLine("Comparing Initial and Deserialized Data Sets...");

                        //Finally, verify the results
                        var pairs = testData.Zip(results, (serialized, deserialized) => new { expected = (dynamic)serialized, actual = (dynamic)deserialized });
                        int count = 1;
                        foreach (var pair in pairs)
                        {
                            bool isEqual = pair.expected.Location.Floor.Equals(pair.actual.Location.Floor);
                            isEqual = isEqual && pair.expected.Location.Room.Equals(pair.actual.Location.Room);
                            isEqual = isEqual && ((byte[])pair.expected.Value).SequenceEqual((byte[])pair.actual.Value);
                            Console.WriteLine("For Pair {0} result of Data Set Identity Comparison is {1}", count, isEqual.ToString());
                            count++;
                        }
                    }
                }
            }

            //Delete the file
            RemoveFile(path);
        }

        //
        //Helper methods
        //

        //Save a memory stream to a new file with the given path
        private bool WriteFile(MemoryStream InputStream, string path)
        {
            if (!File.Exists(path))
            {
                try
                {
                    using (FileStream fs = File.Create(path))
                    {
                        InputStream.Seek(0, SeekOrigin.Begin);
                        InputStream.CopyTo(fs);
                    }
                    return true;
                }
                catch (Exception e)
                {
                    Console.WriteLine("The following exception was thrown during creation and writing to the file \"{0}\"", path);
                    Console.WriteLine(e.Message);
                    return false;
                }
            }
            else
            {
                Console.WriteLine("Can not create file \"{0}\". File already exists", path);
                return false;
            }
        }

        //Read the content of the file at the given path into a memory stream
        private bool ReadFile(MemoryStream OutputStream, string path)
        {
            try
            {
                using (FileStream fs = File.Open(path, FileMode.Open))
                {
                    fs.CopyTo(OutputStream);
                }
                return true;
            }
            catch (Exception e)
            {
                Console.WriteLine("The following exception was thrown during reading from the file \"{0}\"", path);
                Console.WriteLine(e.Message);
                return false;
            }
        }

        //Delete the file at the given path
        private void RemoveFile(string path)
        {
            if (File.Exists(path))
            {
                try
                {
                    File.Delete(path);
                }
                catch (Exception e)
                {
                    Console.WriteLine("The following exception was thrown during deleting the file \"{0}\"", path);
                    Console.WriteLine(e.Message);
                }
            }
            else
            {
                Console.WriteLine("Can not delete file \"{0}\". File does not exist", path);
            }
        }

        static void Main()
        {
            string sectionDivider = "---------------------------------------- ";

            //Create an instance of the AvroSample class and invoke methods
            //illustrating different serialization approaches
            AvroSample Sample = new AvroSample();

            //Serialization using a generic record to an Avro object container file
            Sample.SerializeDeserializeUsingObjectContainersGenericRecord();

            Console.WriteLine(sectionDivider);
            Console.WriteLine("Press any key to exit.");
            Console.Read();
        }
    }
}

// The example is expected to display the following output:
// SERIALIZATION USING GENERIC RECORD AND AVRO OBJECT CONTAINER FILES
//
// Defining the Schema and creating Sample Data Set...
// Serializing Sample Data Set...
// Saving serialized data to file...
// Reading data from file...
// Deserializing Sample Data Set...
// Comparing Initial and Deserialized Data Sets...
// For Pair 1 result of Data Set Identity Comparison is True
// For Pair 2 result of Data Set Identity Comparison is True
// ----------------------------------------
// Press any key to exit.
```
- <h2> <a name="Scenario5"></a>Serialization using object container files with a custom compression codec</h2>
- The example below shows how to use a custom compression codec for Avro object container files. The [Avro Specification](http://avro.apache.org/docs/current/spec.html#Required+Codecs) allows an optional compression codec to be used in addition to the default **Null** and **Deflate** codecs. This example does not implement a completely new codec such as Snappy (mentioned as a supported optional codec in the [Avro Specification](http://avro.apache.org/docs/current/spec.html#snappy)). Instead, it shows how to use the .NET Framework 4.5 implementation of the [**Deflate**][deflate-110] codec, which provides a better compression algorithm, based on the [zlib](http://zlib.net/) compression library, than the default .NET Framework 4.0 version.
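The custom codec below wraps `DeflateStream` in dedicated stream classes rather than using it directly because `DeflateStream` is not seekable, while the Avro object container writer requires seek support. The following small, self-contained sketch (independent of the Avro library, using only the standard `System.IO.Compression` API) illustrates that limitation:

```csharp
using System;
using System.IO;
using System.IO.Compression;

class DeflateSeekCheck
{
    static void Main()
    {
        using (var backing = new MemoryStream())
        using (var deflate = new DeflateStream(backing, CompressionMode.Compress, leaveOpen: true))
        {
            // DeflateStream reports that it is not seekable, and Seek throws,
            // so it cannot be handed to the Avro container writer as-is.
            Console.WriteLine("CanSeek = {0}", deflate.CanSeek);  // prints "CanSeek = False"
            try
            {
                deflate.Seek(0, SeekOrigin.Begin);
            }
            catch (NotSupportedException)
            {
                Console.WriteLine("Seek threw NotSupportedException");
            }
        }
    }
}
```

This is why the `CompressionStreamDeflate45` and `DecompressionStreamDeflate45` wrapper classes in the example delegate `Seek` and `Position` to the underlying buffer stream instead of to `DeflateStream` itself.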
- //
- // This code needs to be compiled with the Target Framework parameter set to ".NET Framework 4.5"
- // to ensure the desired implementation of the Deflate compression algorithm is used.
- // Ensure that your C# project is set up accordingly.
- //
- namespace Microsoft.Hadoop.Avro.Sample
- {
- using System;
- using System.Collections.Generic;
- using System.Diagnostics;
- using System.IO;
- using System.IO.Compression;
- using System.Linq;
- using System.Runtime.Serialization;
- using Microsoft.Hadoop.Avro.Container;
- #region Defining objects for serialization
- //Sample Class used in serialization samples
- [DataContract(Name = "SensorDataValue", Namespace = "Sensors")]
- internal class SensorData
- {
- [DataMember(Name = "Location")]
- public Location Position { get; set; }
- [DataMember(Name = "Value")]
- public byte[] Value { get; set; }
- }
- //Sample struct used in serialization samples
- [DataContract]
- internal struct Location
- {
- [DataMember]
- public int Floor { get; set; }
- [DataMember]
- public int Room { get; set; }
- }
- #endregion
- #region Defining custom codec based on .NET Framework V.4.5 Deflate
- //The Avro.NET Codec class contains two methods,
- //GetCompressedStreamOver(Stream uncompressed) and GetDecompressedStreamOver(Stream compressed),
- //which are the key ones for data compression.
- //To enable a custom codec, one needs to implement these methods for the required codec
- #region Defining Compression and Decompression Streams
- //DeflateStream (the class from the System.IO.Compression namespace that implements the Deflate algorithm)
- //cannot be used directly with Avro because it does not support vital operations such as Seek.
- //Thus one needs to implement two classes derived from Stream
- //(one for the compressed stream and one for the decompressed stream)
- //that use Deflate compression and implement all of the required features
- internal sealed class CompressionStreamDeflate45 : Stream
- {
- private readonly Stream buffer;
- private DeflateStream compressionStream;
- public CompressionStreamDeflate45(Stream buffer)
- {
- Debug.Assert(buffer != null, "Buffer is not allowed to be null.");
- this.compressionStream = new DeflateStream(buffer, CompressionLevel.Fastest, true);
- this.buffer = buffer;
- }
- public override bool CanRead
- {
- get { return this.buffer.CanRead; }
- }
- public override bool CanSeek
- {
- get { return true; }
- }
- public override bool CanWrite
- {
- get { return this.buffer.CanWrite; }
- }
- public override void Flush()
- {
- this.compressionStream.Close();
- }
- public override long Length
- {
- get { return this.buffer.Length; }
- }
- public override long Position
- {
- get
- {
- return this.buffer.Position;
- }
- set
- {
- this.buffer.Position = value;
- }
- }
- public override int Read(byte[] buffer, int offset, int count)
- {
- return this.buffer.Read(buffer, offset, count);
- }
- public override long Seek(long offset, SeekOrigin origin)
- {
- return this.buffer.Seek(offset, origin);
- }
- public override void SetLength(long value)
- {
- throw new NotSupportedException();
- }
- public override void Write(byte[] buffer, int offset, int count)
- {
- this.compressionStream.Write(buffer, offset, count);
- }
- protected override void Dispose(bool disposing)
- {
- base.Dispose(disposing);
- if (disposing)
- {
- this.compressionStream.Dispose();
- this.compressionStream = null;
- }
- }
- }
- internal sealed class DecompressionStreamDeflate45 : Stream
- {
- private readonly DeflateStream decompressed;
- public DecompressionStreamDeflate45(Stream compressed)
- {
- this.decompressed = new DeflateStream(compressed, CompressionMode.Decompress, true);
- }
- public override bool CanRead
- {
- get { return true; }
- }
- public override bool CanSeek
- {
- get { return true; }
- }
- public override bool CanWrite
- {
- get { return false; }
- }
- public override void Flush()
- {
- this.decompressed.Close();
- }
- public override long Length
- {
- get { return this.decompressed.Length; }
- }
- public override long Position
- {
- get
- {
- return this.decompressed.Position;
- }
- set
- {
- throw new NotSupportedException();
- }
- }
- public override int Read(byte[] buffer, int offset, int count)
- {
- return this.decompressed.Read(buffer, offset, count);
- }
- public override long Seek(long offset, SeekOrigin origin)
- {
- throw new NotSupportedException();
- }
- public override void SetLength(long value)
- {
- throw new NotSupportedException();
- }
- public override void Write(byte[] buffer, int offset, int count)
- {
- throw new NotSupportedException();
- }
- protected override void Dispose(bool disposing)
- {
- base.Dispose(disposing);
- if (disposing)
- {
- this.decompressed.Dispose();
- }
- }
- }
- #endregion
- #region Define Codec
- //Define the actual codec class containing the required methods for manipulating streams:
- //GetCompressedStreamOver(Stream uncompressed) and GetDecompressedStreamOver(Stream compressed)
- //The Codec class uses the classes for compressed and decompressed streams defined above
- internal sealed class DeflateCodec45 : Codec
- {
- //We merely use a different IMPLEMENTATION of Deflate, so the CodecName remains "deflate"
- public static readonly string CodecName = "deflate";
- public DeflateCodec45()
- : base(CodecName)
- {
- }
- public override Stream GetCompressedStreamOver(Stream decompressed)
- {
- if (decompressed == null)
- {
- throw new ArgumentNullException("decompressed");
- }
- return new CompressionStreamDeflate45(decompressed);
- }
- public override Stream GetDecompressedStreamOver(Stream compressed)
- {
- if (compressed == null)
- {
- throw new ArgumentNullException("compressed");
- }
- return new DecompressionStreamDeflate45(compressed);
- }