
/articles/hdinsight-dotnet-avro-serialization.md

https://github.com/dboyi/azure-content

<properties linkid="hdinsight-dotnet-avro-serialization" urlDisplayName="HDInsight Microsoft .NET Library for Serialization with Avro" pageTitle="Serialize data with the Microsoft .NET Library for Avro | Azure" metaKeywords="" description="Learn how Azure HDInsight uses Avro to serialize big data." metaCanonical="" services="hdinsight" documentationCenter="" title="Serialize data with the Microsoft .NET Library for Avro " authors="bradsev" solutions="" manager="paulettm" editor="cgronlun" />

# Serialize data with the Microsoft .NET Library for Avro

##Overview
This topic shows how to use the Microsoft .NET Library for Avro to serialize objects and other data structures into streams so that they can be persisted to memory, a database, or a file, and how to deserialize them to recover the original objects.

###Apache Avro
The Microsoft .NET Library for Avro implements the Apache Avro data serialization system for the Microsoft .NET environment. Apache Avro provides a compact binary data interchange format for serialization. It uses [JSON](http://www.json.org) to define a language-agnostic schema that underwrites language interoperability. Data serialized in one language can be read in another. Currently C, C++, C#, Java, PHP, Python, and Ruby are supported. Detailed information on the format can be found in the [Apache Avro Specification](http://avro.apache.org/docs/current/spec.html). Note that the current version of the Microsoft .NET Library for Avro does not support the Remote Procedure Calls (RPC) part of this specification.

The serialized representation of an object in the Avro system consists of two parts: the schema and the actual value. The Avro schema describes the language-independent data model of the serialized data with JSON. It is present side by side with a binary representation of the data. Keeping the schema separate from the binary representation permits each object to be written with no per-value overhead, making serialization fast and the representation small.
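
To make the "no per-value overhead" point concrete, here is a minimal sketch of how the Avro specification encodes an `int`: a zig-zag mapping followed by a variable-length byte sequence. The class and method names are hypothetical and the code is not taken from the Microsoft .NET Library for Avro; it only illustrates the binary format under the assumption of a record with two `int` fields.

```csharp
using System;
using System.Collections.Generic;

public static class AvroIntEncodingSketch
{
    // Zig-zag maps signed values to unsigned ones so that small magnitudes
    // (positive or negative) produce short encodings: 0->0, -1->1, 1->2, ...
    public static byte[] EncodeInt(int value)
    {
        uint zigzag = unchecked((uint)((value << 1) ^ (value >> 31)));
        var bytes = new List<byte>();

        // Emit 7 bits per byte, low-order group first.
        // The high bit of each byte signals that more bytes follow.
        while ((zigzag & ~0x7Fu) != 0)
        {
            bytes.Add((byte)((zigzag & 0x7F) | 0x80));
            zigzag >>= 7;
        }
        bytes.Add((byte)zigzag);
        return bytes.ToArray();
    }

    public static void Main()
    {
        // A record with two int fields is written as the field values back to back,
        // in schema order, with no field names or type tags in the data.
        var record = new List<byte>();
        record.AddRange(EncodeInt(1));    // hypothetical field "Floor"
        record.AddRange(EncodeInt(243));  // hypothetical field "Room"
        Console.WriteLine(BitConverter.ToString(record.ToArray())); // prints "02-E6-03"
    }
}
```

The two field values occupy three bytes in total. Field names and types live only in the schema, which is why the reader needs the schema to interpret the bytes.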

###The Hadoop scenario
The Apache Avro serialization format is widely used in Azure HDInsight and other Apache Hadoop environments. Avro provides a convenient way to represent complex data structures within a Hadoop MapReduce job. The format of Avro files has been designed to support the distributed MapReduce programming model. The key feature that enables the distribution is that the files are “splittable” in the sense that one can seek to any point in a file and start reading from a particular block.
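
As a rough illustration of what "splittable" means, the Avro specification places a 16-byte sync marker after every data block in a container file, so a reader that starts at an arbitrary offset can scan forward to the next marker and resume at a block boundary. The sketch below works on a toy byte array with a made-up marker value; the names are hypothetical and this is not how the Microsoft .NET Library for Avro or Hadoop implements it.

```csharp
using System;

public static class SyncMarkerSketch
{
    public const int SyncSize = 16;

    // Returns the offset of the first byte after the next sync marker found
    // at or after 'start', or -1 if no marker follows.
    public static int FindNextBlockStart(byte[] file, byte[] sync, int start)
    {
        for (int i = Math.Max(start, 0); i + sync.Length <= file.Length; i++)
        {
            int j = 0;
            while (j < sync.Length && file[i + j] == sync[j])
            {
                j++;
            }
            if (j == sync.Length)
            {
                return i + sync.Length;
            }
        }
        return -1;
    }

    public static void Main()
    {
        // Toy layout: [block A, 5 bytes][sync][block B, 7 bytes][sync].
        // The marker value below is invented for the example.
        var sync = new byte[SyncSize];
        for (int i = 0; i < SyncSize; i++)
        {
            sync[i] = (byte)(0xA0 + i);
        }
        var file = new byte[5 + SyncSize + 7 + SyncSize];
        Array.Copy(sync, 0, file, 5, SyncSize);
        Array.Copy(sync, 0, file, 5 + SyncSize + 7, SyncSize);

        // A worker assigned a split that starts at offset 3 skips ahead
        // to the first block boundary after that offset.
        Console.WriteLine(FindNextBlockStart(file, sync, 3)); // prints 21
    }
}
```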

###Serialization in the Microsoft .NET Library for Avro
The .NET Library for Avro supports two ways of serializing objects:

- **reflection**: The JSON schema for the types is automatically built from the data contract attributes of the .NET types to be serialized.
- **generic record**: A JSON schema is explicitly specified in a record represented by the [**AvroRecord**](http://msdn.microsoft.com/en-us/library/microsoft.hadoop.avro.avrorecord.aspx) class when no .NET types are present to describe the schema for the data to be serialized.

When the data schema is known to both the writer and the reader of the stream, the data can be sent without its schema. When this is not the case, the schema must be shared using an Avro container file. Other parameters, such as the codec used for data compression, can also be specified. These scenarios are outlined in more detail and illustrated in the code examples below.


###Microsoft .NET Library for Avro prerequisites
- [Microsoft .NET Framework v4.0](http://www.microsoft.com/en-us/download/details.aspx?id=17851)
- [Newtonsoft Json.NET](http://james.newtonking.com/json) (v5.0.5 or later)

Note that the Newtonsoft.Json.dll dependency is downloaded automatically with the installation of the Microsoft .NET Library for Avro, the procedure for which is provided in the following section.

###Microsoft .NET Library for Avro installation
The Microsoft .NET Library for Avro is distributed as a NuGet package that can be installed from Visual Studio using the following procedure:

- Select the **Project** tab -> **Manage NuGet Packages...**
- Search for "Microsoft.Hadoop.Avro" in the **Online Search** box.
- Click the **Install** button next to **Microsoft .NET Library for Avro**.

Note that the Newtonsoft.Json.dll (>= 5.0.5) dependency is also downloaded automatically with the Microsoft .NET Library for Avro.


##Guide to the samples
The five examples provided in this topic each illustrate a different scenario supported by the Microsoft .NET Library for Avro.

The first two show how to serialize and deserialize data into memory stream buffers using reflection and generic records. The schema in these two cases is assumed to be shared between the readers and writers out-of-band so that the schema does not need to be serialized with the data in an Avro container file.

The third and fourth examples show how to serialize and deserialize data into memory stream buffers using reflection and generic records with Avro object container files. When data is stored in an Avro container file, its schema is always stored with it because the schema must be shared for deserialization.

The sample containing the first four examples can be downloaded from the [Azure code samples](http://code.msdn.microsoft.com/windowsazure/Serialize-data-with-the-86055923) site.

The fifth and final example shows how to use a custom compression codec for object container files. A sample containing the code for this example can be downloaded from the [Azure code samples](http://code.msdn.microsoft.com/windowsazure/Serialize-data-with-the-67159111) site.

The Microsoft .NET Library for Avro is designed to work with any stream. In these examples, data is manipulated using memory streams rather than file streams or databases for simplicity and consistency. The approach taken in a production environment will depend on the exact scenario requirements, data source and volume, performance constraints, and other factors.

 * <a href="#Scenario1">**Serialization with reflection**</a>: The JSON schema for types to be serialized is automatically built from the data contract attributes.
 * <a href="#Scenario2">**Serialization with generic record**</a>: The JSON schema is explicitly specified in a record when no .NET type is available for reflection.
 * <a href="#Scenario3">**Serialization using object container files with reflection**</a>: The JSON schema is implicitly serialized with the data and shared using an Avro container file.
 * <a href="#Scenario4">**Serialization using object container files with generic record**</a>: The JSON schema is explicitly serialized with the data and shared using an Avro container file.
 * <a href="#Scenario5">**Serialization using object container files with a custom compression codec**</a>: The JSON schema is serialized with data and shared using an Avro container file with a customized .NET implementation of the deflate data compression codec.


<h2> <a name="Scenario1"></a>Serialization with reflection</h2>

The JSON schema for the types can be built automatically by the Microsoft .NET Library for Avro using reflection from the data contract attributes of the C# objects to be serialized. The library creates an [**IAvroSerializer<T>**](http://msdn.microsoft.com/en-us/library/dn627341.aspx) to identify the fields to be serialized.

In this example, objects (a **SensorData** class with a member **Location** struct) are serialized to a memory stream, and this stream is in turn deserialized. The result is then compared to the initial instance to confirm that the recovered **SensorData** object is identical to the original.

The schema in this example is assumed to be shared between the readers and writers, so the Avro object container format is not required. For an example of how to serialize and deserialize data into memory buffers using reflection with the object container format when the schema must be serialized with the data, see <a href="#Scenario3">Serialization using object container files with reflection</a>.

    namespace Microsoft.Hadoop.Avro.Sample
    {
        using System;
        using System.Collections.Generic;
        using System.IO;
        using System.Linq;
        using System.Runtime.Serialization;
        using Microsoft.Hadoop.Avro.Container;

        //Sample Class used in serialization samples
        [DataContract(Name = "SensorDataValue", Namespace = "Sensors")]
        internal class SensorData
        {
            [DataMember(Name = "Location")]
            public Location Position { get; set; }

            [DataMember(Name = "Value")]
            public byte[] Value { get; set; }
        }

        //Sample struct used in serialization samples
        [DataContract]
        internal struct Location
        {
            [DataMember]
            public int Floor { get; set; }

            [DataMember]
            public int Room { get; set; }
        }

        //This class contains all methods demonstrating
        //the usage of Microsoft .NET Library for Avro
        public class AvroSample
        {

            //Serialize and deserialize sample data set represented as an object using Reflection
            //No explicit schema definition is required - schema of serialized objects is automatically built
            public void SerializeDeserializeObjectUsingReflection()
            {

                Console.WriteLine("SERIALIZATION USING REFLECTION\n");
                Console.WriteLine("Serializing Sample Data Set...");

                //Create a new AvroSerializer instance; the AvroDataContractResolver serialization strategy
                //serializes only properties attributed with DataContract/DataMember
                var avroSerializer = AvroSerializer.Create<SensorData>();

                //Create a Memory Stream buffer
                using (var buffer = new MemoryStream())
                {
                    //Create a data set using sample Class and struct
                    var expected = new SensorData { Value = new byte[] { 1, 2, 3, 4, 5 }, Position = new Location { Room = 243, Floor = 1 } };

                    //Serialize the data to the specified stream
                    avroSerializer.Serialize(buffer, expected);


                    Console.WriteLine("Deserializing Sample Data Set...");

                    //Prepare the stream for deserializing the data
                    buffer.Seek(0, SeekOrigin.Begin);

                    //Deserialize data from the stream into the same type used for serialization
                    var actual = avroSerializer.Deserialize(buffer);

                    Console.WriteLine("Comparing Initial and Deserialized Data Sets...");

                    //Finally, verify that deserialized data matches the original one
                    bool isEqual = this.Equal(expected, actual);

                    Console.WriteLine("Result of Data Set Identity Comparison is {0}", isEqual);

                }
            }

            //
            //Helper methods
            //

            //Comparing two SensorData objects
            private bool Equal(SensorData left, SensorData right)
            {
                return left.Position.Equals(right.Position) && left.Value.SequenceEqual(right.Value);
            }



            static void Main()
            {

                string sectionDivider = "---------------------------------------- ";

                //Create an instance of AvroSample Class and invoke methods
                //illustrating different serializing approaches
                AvroSample Sample = new AvroSample();

                //Serialization to memory using Reflection
                Sample.SerializeDeserializeObjectUsingReflection();

                Console.WriteLine(sectionDivider);
                Console.WriteLine("Press any key to exit.");
                Console.Read();
            }
        }
    }
    // The example is expected to display the following output:
    // SERIALIZATION USING REFLECTION
    //
    // Serializing Sample Data Set...
    // Deserializing Sample Data Set...
    // Comparing Initial and Deserialized Data Sets...
    // Result of Data Set Identity Comparison is True
    // ----------------------------------------
    // Press any key to exit.


<h2> <a name="Scenario2"></a>Serialization with a generic record</h2>

A JSON schema can be explicitly specified in a generic record when reflection cannot be used because the data cannot be represented using .NET classes with a data contract. This method is generally slower than using reflection and serializers for a specific C# class. In such cases, the schema for the data may also be dynamic because it is not known at compile time. Data represented as Comma Separated Values (CSV) files whose schema is unknown until it is transformed to the Avro format at run time is an example of this sort of dynamic scenario.
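
The CSV case can be sketched as follows: the header row is read at run time and turned into an Avro record schema in JSON. This is only an illustration under simplifying assumptions: every column is typed as `string`, the column names are assumed to be valid Avro names that need no JSON escaping, and `CsvSchemaSketch`, `BuildRecordSchema`, and the `SensorReading` record name are hypothetical.

```csharp
using System;
using System.Linq;

public static class CsvSchemaSketch
{
    // Builds an Avro record schema in which every CSV column becomes a string field.
    // Assumes column names are valid Avro names (letters, digits, underscores).
    public static string BuildRecordSchema(string recordName, string headerLine)
    {
        var fields = headerLine
            .Split(',')
            .Select(column => column.Trim())
            .Select(column => "{\"name\":\"" + column + "\",\"type\":\"string\"}");

        return "{\"type\":\"record\",\"name\":\"" + recordName + "\",\"fields\":["
            + string.Join(",", fields) + "]}";
    }

    public static void Main()
    {
        // The header line would normally come from the first row of the CSV file.
        Console.WriteLine(BuildRecordSchema("SensorReading", "Floor, Room, Value"));
    }
}
```

A schema string produced this way has the same shape as the hand-written JSON schema used in this scenario, so it could in principle be passed to the same generic serializer.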

This example shows how to create and use an [**AvroRecord**](http://msdn.microsoft.com/en-us/library/microsoft.hadoop.avro.avrorecord.aspx) to explicitly specify a JSON schema, how to populate it with the data, and then how to serialize and deserialize it. The result is then compared to the initial instance to confirm that the recovered record is identical to the original.

The schema in this example is assumed to be shared between the readers and writers, so the Avro object container format is not required. For an example of how to serialize and deserialize data into memory buffers using a generic record with the object container format when the schema must be included with the serialized data, see the <a href="#Scenario4">Serialization using object container files with generic record</a> example.


    namespace Microsoft.Hadoop.Avro.Sample
    {
        using System;
        using System.Collections.Generic;
        using System.IO;
        using System.Linq;
        using System.Runtime.Serialization;
        using Microsoft.Hadoop.Avro.Container;
        using Microsoft.Hadoop.Avro.Schema;

        //This class contains all methods demonstrating
        //the usage of Microsoft .NET Library for Avro
        public class AvroSample
        {

            //Serialize and deserialize sample data set using Generic Record.
            //Generic Record is a special class with the schema explicitly defined in JSON.
            //All serialized data should be mapped to the fields of Generic Record,
            //which in turn will be then serialized.
            public void SerializeDeserializeObjectUsingGenericRecords()
            {
                Console.WriteLine("SERIALIZATION USING GENERIC RECORD\n");
                Console.WriteLine("Defining the Schema and creating Sample Data Set...");

                //Define the schema in JSON
                const string Schema = @"{
                                    ""type"":""record"",
                                    ""name"":""Microsoft.Hadoop.Avro.Specifications.SensorData"",
                                    ""fields"":
                                        [
                                            {
                                                ""name"":""Location"",
                                                ""type"":
                                                    {
                                                        ""type"":""record"",
                                                        ""name"":""Microsoft.Hadoop.Avro.Specifications.Location"",
                                                        ""fields"":
                                                            [
                                                                { ""name"":""Floor"", ""type"":""int"" },
                                                                { ""name"":""Room"", ""type"":""int"" }
                                                            ]
                                                    }
                                            },
                                            { ""name"":""Value"", ""type"":""bytes"" }
                                        ]
                                }";

                //Create a generic serializer based on the schema
                var serializer = AvroSerializer.CreateGeneric(Schema);
                var rootSchema = serializer.WriterSchema as RecordSchema;

                //Create a Memory Stream buffer
                using (var stream = new MemoryStream())
                {
                    //Create a generic record to represent the data
                    dynamic location = new AvroRecord(rootSchema.GetField("Location").TypeSchema);
                    location.Floor = 1;
                    location.Room = 243;

                    dynamic expected = new AvroRecord(serializer.WriterSchema);
                    expected.Location = location;
                    expected.Value = new byte[] { 1, 2, 3, 4, 5 };

                    Console.WriteLine("Serializing Sample Data Set...");

                    //Serialize the data
                    serializer.Serialize(stream, expected);

                    stream.Seek(0, SeekOrigin.Begin);

                    Console.WriteLine("Deserializing Sample Data Set...");

                    //Deserialize the data into a generic record
                    dynamic actual = serializer.Deserialize(stream);

                    Console.WriteLine("Comparing Initial and Deserialized Data Sets...");

                    //Finally, verify the results
                    bool isEqual = expected.Location.Floor.Equals(actual.Location.Floor);
                    isEqual = isEqual && expected.Location.Room.Equals(actual.Location.Room);
                    isEqual = isEqual && ((byte[])expected.Value).SequenceEqual((byte[])actual.Value);
                    Console.WriteLine("Result of Data Set Identity Comparison is {0}", isEqual);
                }
            }

            static void Main()
            {

                string sectionDivider = "---------------------------------------- ";

                //Create an instance of AvroSample Class and invoke methods
                //illustrating different serializing approaches
                AvroSample Sample = new AvroSample();

                //Serialization to memory using Generic Record
                Sample.SerializeDeserializeObjectUsingGenericRecords();

                Console.WriteLine(sectionDivider);
                Console.WriteLine("Press any key to exit.");
                Console.Read();
            }
        }
    }
    // The example is expected to display the following output:
    // SERIALIZATION USING GENERIC RECORD
    //
    // Defining the Schema and creating Sample Data Set...
    // Serializing Sample Data Set...
    // Deserializing Sample Data Set...
    // Comparing Initial and Deserialized Data Sets...
    // Result of Data Set Identity Comparison is True
    // ----------------------------------------
    // Press any key to exit.


<h2> <a name="Scenario3"></a>Serialization using object container files and serialization with reflection</h2>

This example is similar to the scenario in the <a href="#Scenario1">first example</a>, where the schema is implicitly specified with reflection, except that here the schema is not assumed to be known to the reader that deserializes it. The **SensorData** objects to be serialized and their implicitly specified schema are stored in an object container file represented by the [**AvroContainer**](http://msdn.microsoft.com/en-us/library/microsoft.hadoop.avro.container.avrocontainer.aspx) class.

The data is serialized in this example with [**SequentialWriter<SensorData>**](http://msdn.microsoft.com/en-us/library/dn627340.aspx) and deserialized with [**SequentialReader<SensorData>**](http://msdn.microsoft.com/en-us/library/dn627340.aspx). The result is then compared to the initial instances to ensure identity.

The data in the object container file is compressed using the default [**Deflate**][deflate-100] compression codec from .NET Framework 4.0. See the <a href="#Scenario5">last example</a> in this topic to learn how to use a more recent and superior version of the [**Deflate**][deflate-110] compression codec available in .NET Framework 4.5.
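
For readers unfamiliar with Deflate in .NET, the sketch below round-trips a buffer through `System.IO.Compression.DeflateStream`. It assumes this is the framework class the paragraph above refers to; the class and method names are hypothetical, and the code shows only the compression step, not how the Avro container codec is wired into the library.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;

public static class DeflateSketch
{
    public static byte[] Compress(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            // leaveOpen: true keeps 'output' usable after the DeflateStream is disposed.
            // Disposing the DeflateStream flushes the final compressed block.
            using (var deflate = new DeflateStream(output, CompressionMode.Compress, true))
            {
                deflate.Write(data, 0, data.Length);
            }
            return output.ToArray();
        }
    }

    public static byte[] Decompress(byte[] data)
    {
        using (var input = new MemoryStream(data))
        using (var inflate = new DeflateStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            inflate.CopyTo(output);
            return output.ToArray();
        }
    }

    public static void Main()
    {
        // Highly repetitive input, standing in for a block of similar records.
        byte[] original = Enumerable.Repeat((byte)42, 10000).ToArray();
        byte[] compressed = Compress(original);
        byte[] restored = Decompress(compressed);

        Console.WriteLine("Compressed is smaller: {0}", compressed.Length < original.Length);
        Console.WriteLine("Round trip preserved data: {0}", original.SequenceEqual(restored));
    }
}
```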

    namespace Microsoft.Hadoop.Avro.Sample
    {
        using System;
        using System.Collections.Generic;
        using System.IO;
        using System.Linq;
        using System.Runtime.Serialization;
        using Microsoft.Hadoop.Avro.Container;

        //Sample Class used in serialization samples
        [DataContract(Name = "SensorDataValue", Namespace = "Sensors")]
        internal class SensorData
        {
            [DataMember(Name = "Location")]
            public Location Position { get; set; }

            [DataMember(Name = "Value")]
            public byte[] Value { get; set; }
        }

        //Sample struct used in serialization samples
        [DataContract]
        internal struct Location
        {
            [DataMember]
            public int Floor { get; set; }

            [DataMember]
            public int Room { get; set; }
        }

        //This class contains all methods demonstrating
        //the usage of Microsoft .NET Library for Avro
        public class AvroSample
        {

            //Serializes and deserializes sample data set using Reflection and Avro Object Container Files
            //Serialized data is compressed with Deflate codec
            public void SerializeDeserializeUsingObjectContainersReflection()
            {

                Console.WriteLine("SERIALIZATION USING REFLECTION AND AVRO OBJECT CONTAINER FILES\n");

                //Path for Avro Object Container File
                string path = "AvroSampleReflectionDeflate.avro";

                //Create a data set using sample Class and struct
                var testData = new List<SensorData>
                        {
                            new SensorData { Value = new byte[] { 1, 2, 3, 4, 5 }, Position = new Location { Room = 243, Floor = 1 } },
                            new SensorData { Value = new byte[] { 6, 7, 8, 9 }, Position = new Location { Room = 244, Floor = 1 } }
                        };

                //Serializing and saving data to file
                //Creating a Memory Stream buffer
                using (var buffer = new MemoryStream())
                {
                    Console.WriteLine("Serializing Sample Data Set...");

                    //Create a SequentialWriter instance for type SensorData which can serialize a sequence of SensorData objects to stream
                    //Data will be compressed using Deflate codec
                    using (var w = AvroContainer.CreateWriter<SensorData>(buffer, Codec.Deflate))
                    {
                        using (var writer = new SequentialWriter<SensorData>(w, 24))
                        {
                            // Serialize the data to stream using the sequential writer
                            testData.ForEach(writer.Write);
                        }
                    }

                    //Save stream to file
                    Console.WriteLine("Saving serialized data to file...");
                    if (!WriteFile(buffer, path))
                    {
                        Console.WriteLine("Error during file operation. Quitting method");
                        return;
                    }
                }

                //Reading and deserializing data
                //Creating a Memory Stream buffer
                using (var buffer = new MemoryStream())
                {
                    Console.WriteLine("Reading data from file...");

                    //Reading data from Object Container File
                    if (!ReadFile(buffer, path))
                    {
                        Console.WriteLine("Error during file operation. Quitting method");
                        return;
                    }

                    Console.WriteLine("Deserializing Sample Data Set...");

                    //Prepare the stream for deserializing the data
                    buffer.Seek(0, SeekOrigin.Begin);

                    //Create a SequentialReader for type SensorData which will deserialize all serialized objects from the given stream
                    //Its Objects property allows iterating over the deserialized objects
                    using (var reader = new SequentialReader<SensorData>(
                        AvroContainer.CreateReader<SensorData>(buffer, true)))
                    {
                        var results = reader.Objects;

                        //Finally, verify that deserialized data matches the original one
                        Console.WriteLine("Comparing Initial and Deserialized Data Sets...");
                        int count = 1;
                        var pairs = testData.Zip(results, (serialized, deserialized) => new { expected = serialized, actual = deserialized });
                        foreach (var pair in pairs)
                        {
                            bool isEqual = this.Equal(pair.expected, pair.actual);
                            Console.WriteLine("For Pair {0} result of Data Set Identity Comparison is {1}", count, isEqual);
                            count++;
                        }
                    }
                }

                //Delete the file
                RemoveFile(path);
            }
 440            //
 441            //Helper methods
 442            //
 443
 444            //Comparing two SensorData objects
 445            private bool Equal(SensorData left, SensorData right)
 446            {
 447                return left.Position.Equals(right.Position) && left.Value.SequenceEqual(right.Value);
 448            }
 449
 450            //Saving memory stream to a new file with the given path
 451            private bool WriteFile(MemoryStream InputStream, string path)
 452            {
 453                if (!File.Exists(path))
 454                {
 455                    try
 456                    {
 457                        using (FileStream fs = File.Create(path))
 458                        {
 459                            InputStream.Seek(0, SeekOrigin.Begin);
 460                            InputStream.CopyTo(fs);
 461                        }
 462                        return true;
 463                    }
 464                    catch (Exception e)
 465                    {
 466                        Console.WriteLine("The following exception was thrown during creation and writing to the file \"{0}\"", path);
 467                        Console.WriteLine(e.Message);
 468                        return false;
 469                    }
 470                }
 471                else
 472                {
 473                    Console.WriteLine("Can not create file \"{0}\". File already exists", path);
 474                    return false;
 475
 476                }
 477            }
 478
 479            //Reading a file content using given path to a memory stream
 480            private bool ReadFile(MemoryStream OutputStream, string path)
 481            {
 482                try
 483                {
 484                    using (FileStream fs = File.Open(path, FileMode.Open))
 485                    {
 486                        fs.CopyTo(OutputStream);
 487                    }
 488                    return true;
 489                }
 490                catch (Exception e)
 491                {
 492                    Console.WriteLine("The following exception was thrown during reading from the file \"{0}\"", path);
 493                    Console.WriteLine(e.Message);
 494                    return false;
 495                }
 496            }
 497
 498            //Deleting file using given path
 499            private void RemoveFile(string path)
 500            {
 501                if (File.Exists(path))
 502                {
 503                    try
 504                    {
 505                        File.Delete(path);
 506                    }
 507                    catch (Exception e)
 508                    {
 509                        Console.WriteLine("The following exception was thrown during deleting the file \"{0}\"", path);
 510                        Console.WriteLine(e.Message);
 511                    }
 512                }
 513                else
 514                {
 515                    Console.WriteLine("Can not delete file \"{0}\". File does not exist", path);
 516                }
 517            }
 518
            static void Main()
            {

                string sectionDivider = "---------------------------------------- ";

                //Create an instance of AvroSample Class and invoke methods
                //illustrating different serializing approaches
                AvroSample Sample = new AvroSample();

                //Serialization using Reflection to Avro Object Container File
                Sample.SerializeDeserializeUsingObjectContainersReflection();

                Console.WriteLine(sectionDivider);
                Console.WriteLine("Press any key to exit.");
                Console.Read();
            }
        }
    }
    // The example is expected to display the following output:
    // SERIALIZATION USING REFLECTION AND AVRO OBJECT CONTAINER FILES
    //
    // Serializing Sample Data Set...
    // Saving serialized data to file...
    // Reading data from file...
    // Deserializing Sample Data Set...
    // Comparing Initial and Deserialized Data Sets...
    // For Pair 1 result of Data Set Identity Comparison is True
    // For Pair 2 result of Data Set Identity Comparison is True
    // ----------------------------------------
    // Press any key to exit.


 551<h2> <a name="Scenario4"></a>Serialization using object container files and serialization with generic record</h2>
 552
 553This example is similar to scenario in the <a href="#Scenario2"> second example</a> where the schema is explicitly specified with JSON, except that here the schema is not assumed to be known to the reader that deserializes it. 
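
As background on why the reader can work without prior knowledge of the schema: per the Apache Avro specification, an object container file begins with the four magic bytes `O`, `b`, `j`, 1, followed by file metadata that includes the writer's schema. The sketch below only checks those magic bytes. It uses plain .NET streams rather than the Microsoft .NET Library for Avro, and the names `AvroContainerProbe` and `HasAvroMagic` are hypothetical helpers introduced for illustration.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of the Microsoft .NET Library for Avro).
static class AvroContainerProbe
{
    // Per the Avro specification, a container file starts with 'O', 'b', 'j', 0x01.
    private static readonly byte[] Magic = { (byte)'O', (byte)'b', (byte)'j', 1 };

    // Returns true if the stream, read from its current position,
    // starts with the Avro object container magic bytes.
    public static bool HasAvroMagic(Stream stream)
    {
        var header = new byte[Magic.Length];
        int read = 0;

        // Stream.Read may return fewer bytes than requested, so loop until
        // the header is full or the stream ends.
        while (read < header.Length)
        {
            int n = stream.Read(header, read, header.Length - read);
            if (n == 0)
            {
                return false; // stream is shorter than the magic sequence
            }
            read += n;
        }

        for (int i = 0; i < Magic.Length; i++)
        {
            if (header[i] != Magic[i])
            {
                return false;
            }
        }
        return true;
    }
}
```

A check like this could be run on the memory stream after the file has been read and before the stream is rewound for deserialization; it does not parse the embedded schema itself.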

The test data set is collected into a list of [**AvroRecord**](http://msdn.microsoft.com/en-us/library/microsoft.hadoop.avro.avrorecord.aspx) objects using an explicitly defined JSON schema and then stored in an object container file represented by the [**AvroContainer**](http://msdn.microsoft.com/en-us/library/microsoft.hadoop.avro.container.avrocontainer.aspx) class. The container class creates a writer that serializes the data, uncompressed, to a memory stream that is then saved to a file. The [**Codec.Null**](http://msdn.microsoft.com/en-us/library/microsoft.hadoop.avro.container.codec.null.aspx) parameter passed when creating the writer specifies that the data will not be compressed.

The data is then read from the file and deserialized into a collection of objects. This collection is compared to the initial list of Avro records to confirm that they are identical.


    namespace Microsoft.Hadoop.Avro.Sample
    {
        using System;
        using System.Collections.Generic;
        using System.IO;
        using System.Linq;
        using System.Runtime.Serialization;
        using Microsoft.Hadoop.Avro.Container;
        using Microsoft.Hadoop.Avro.Schema;

        //This class contains all methods demonstrating
        //the usage of Microsoft .NET Library for Avro
        public class AvroSample
        {

            //Serializes and deserializes sample data set using Generic Record and Avro Object Container Files
            //Serialized data is not compressed
            public void SerializeDeserializeUsingObjectContainersGenericRecord()
            {
                Console.WriteLine("SERIALIZATION USING GENERIC RECORD AND AVRO OBJECT CONTAINER FILES\n");

                //Path for Avro Object Container File
                string path = "AvroSampleGenericRecordNullCodec.avro";

                Console.WriteLine("Defining the Schema and creating Sample Data Set...");

                //Define the schema in JSON
                const string Schema = @"{
                                ""type"":""record"",
                                ""name"":""Microsoft.Hadoop.Avro.Specifications.SensorData"",
                                ""fields"":
                                    [
                                        {
                                            ""name"":""Location"",
                                            ""type"":
                                                {
                                                    ""type"":""record"",
                                                    ""name"":""Microsoft.Hadoop.Avro.Specifications.Location"",
                                                    ""fields"":
                                                        [
                                                            { ""name"":""Floor"", ""type"":""int"" },
                                                            { ""name"":""Room"", ""type"":""int"" }
                                                        ]
                                                }
                                        },
                                        { ""name"":""Value"", ""type"":""bytes"" }
                                    ]
                            }";

                //Create a generic serializer based on the schema
                var serializer = AvroSerializer.CreateGeneric(Schema);
                var rootSchema = serializer.WriterSchema as RecordSchema;

                //Create a generic record to represent the data
                var testData = new List<AvroRecord>();

                dynamic expected1 = new AvroRecord(rootSchema);
                dynamic location1 = new AvroRecord(rootSchema.GetField("Location").TypeSchema);
                location1.Floor = 1;
                location1.Room = 243;
                expected1.Location = location1;
                expected1.Value = new byte[] { 1, 2, 3, 4, 5 };
                testData.Add(expected1);

                dynamic expected2 = new AvroRecord(rootSchema);
                dynamic location2 = new AvroRecord(rootSchema.GetField("Location").TypeSchema);
                location2.Floor = 1;
                location2.Room = 244;
                expected2.Location = location2;
                expected2.Value = new byte[] { 6, 7, 8, 9 };
                testData.Add(expected2);

                //Serializing and saving data to file
                //Create a MemoryStream buffer
                using (var buffer = new MemoryStream())
                {
                    Console.WriteLine("Serializing Sample Data Set...");

                    //Create a SequentialWriter instance for generic records (typed as object),
                    //which can serialize a sequence of AvroRecord objects to the stream
                    //Data will not be compressed (Null compression codec)
                    using (var writer = AvroContainer.CreateGenericWriter(Schema, buffer, Codec.Null))
                    {
                        using (var streamWriter = new SequentialWriter<object>(writer, 24))
                        {
                            // Serialize the data to stream using the sequential writer
                            testData.ForEach(streamWriter.Write);
                        }
                    }

                    Console.WriteLine("Saving serialized data to file...");

                    //Save stream to file
                    if (!WriteFile(buffer, path))
                    {
                        Console.WriteLine("Error during file operation. Quitting method");
                        return;
                    }
                }

                //Reading and deserializing the data
                //Create a Memory Stream buffer
                using (var buffer = new MemoryStream())
                {
                    Console.WriteLine("Reading data from file...");

                    //Reading data from Object Container File
                    if (!ReadFile(buffer, path))
                    {
                        Console.WriteLine("Error during file operation. Quitting method");
                        return;
                    }

                    Console.WriteLine("Deserializing Sample Data Set...");

                    //Prepare the stream for deserializing the data
                    buffer.Seek(0, SeekOrigin.Begin);

                    //Create a SequentialReader for generic records (typed as object),
                    //which will deserialize all serialized objects from the given stream
                    //It allows iterating over the deserialized objects because it implements IEnumerable<T> interface
                    using (var reader = AvroContainer.CreateGenericReader(buffer))
                    {
                        using (var streamReader = new SequentialReader<object>(reader))
                        {
                            var results = streamReader.Objects;

                            Console.WriteLine("Comparing Initial and Deserialized Data Sets...");

                            //Finally, verify the results
                            var pairs = testData.Zip(results, (serialized, deserialized) => new { expected = (dynamic)serialized, actual = (dynamic)deserialized });
                            int count = 1;
                            foreach (var pair in pairs)
                            {
                                bool isEqual = pair.expected.Location.Floor.Equals(pair.actual.Location.Floor);
                                isEqual = isEqual && pair.expected.Location.Room.Equals(pair.actual.Location.Room);
                                isEqual = isEqual && ((byte[])pair.expected.Value).SequenceEqual((byte[])pair.actual.Value);
                                Console.WriteLine("For Pair {0} result of Data Set Identity Comparison is {1}", count, isEqual.ToString());
                                count++;
                            }
                        }
                    }
                }

                //Delete the file
                RemoveFile(path);
            }

            //
            //Helper methods
            //

            //Saving memory stream to a new file with the given path
            private bool WriteFile(MemoryStream InputStream, string path)
            {
                if (!File.Exists(path))
                {
                    try
                    {
                        using (FileStream fs = File.Create(path))
                        {
                            InputStream.Seek(0, SeekOrigin.Begin);
                            InputStream.CopyTo(fs);
                        }
                        return true;
                    }
                    catch (Exception e)
                    {
                        Console.WriteLine("The following exception was thrown during creation and writing to the file \"{0}\"", path);
                        Console.WriteLine(e.Message);
                        return false;
                    }
                }
                else
                {
                    Console.WriteLine("Can not create file \"{0}\". File already exists", path);
                    return false;
                }
            }

            //Reading a file content using given path to a memory stream
            private bool ReadFile(MemoryStream OutputStream, string path)
            {
                try
                {
                    using (FileStream fs = File.Open(path, FileMode.Open))
                    {
                        fs.CopyTo(OutputStream);
                    }
                    return true;
                }
                catch (Exception e)
                {
                    Console.WriteLine("The following exception was thrown during reading from the file \"{0}\"", path);
                    Console.WriteLine(e.Message);
                    return false;
                }
            }

            //Deleting file using given path
            private void RemoveFile(string path)
            {
                if (File.Exists(path))
                {
                    try
                    {
                        File.Delete(path);
                    }
                    catch (Exception e)
                    {
                        Console.WriteLine("The following exception was thrown during deleting the file \"{0}\"", path);
                        Console.WriteLine(e.Message);
                    }
                }
                else
                {
                    Console.WriteLine("Can not delete file \"{0}\". File does not exist", path);
                }
            }

            static void Main()
            {

                string sectionDivider = "---------------------------------------- ";

                //Create an instance of AvroSample Class and invoke methods
                //illustrating different serializing approaches
                AvroSample Sample = new AvroSample();

                //Serialization using Generic Record to Avro Object Container File
                Sample.SerializeDeserializeUsingObjectContainersGenericRecord();

                Console.WriteLine(sectionDivider);
                Console.WriteLine("Press any key to exit.");
                Console.Read();
            }
        }
    }
    // The example is expected to display the following output:
    // SERIALIZATION USING GENERIC RECORD AND AVRO OBJECT CONTAINER FILES
    //
    // Defining the Schema and creating Sample Data Set...
    // Serializing Sample Data Set...
    // Saving serialized data to file...
    // Reading data from file...
    // Deserializing Sample Data Set...
    // Comparing Initial and Deserialized Data Sets...
    // For Pair 1 result of Data Set Identity Comparison is True
    // For Pair 2 result of Data Set Identity Comparison is True
    // ----------------------------------------
    // Press any key to exit.


<h2> <a name="Scenario5"></a>Serialization using object container files with a custom compression codec</h2>

The example below shows how to use a custom compression codec for Avro object container files. The [Avro Specification](http://avro.apache.org/docs/current/spec.html#Required+Codecs) allows the use of optional compression codecs in addition to the required **Null** and **Deflate** codecs. This example does not implement a completely new codec such as Snappy (mentioned as a supported optional codec in the [Avro Specification](http://avro.apache.org/docs/current/spec.html#snappy)). Instead, it shows how to use the .NET Framework 4.5 implementation of the [**Deflate**][deflate-110] codec, which is based on the [zlib](http://zlib.net/) compression library and provides better compression than the .NET Framework 4.0 implementation.
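
Before looking at the full codec, it may help to see the underlying .NET Framework 4.5 Deflate API in isolation. The sketch below round-trips a byte array through `DeflateStream`, using the `CompressionLevel` constructor overload that was introduced in .NET Framework 4.5. It does not use the Avro library at all, and the `DeflateRoundTrip` class and its method names are hypothetical, introduced only for illustration.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper showing the plain System.IO.Compression API.
static class DeflateRoundTrip
{
    // Compresses a byte array with the requested compression level.
    public static byte[] Compress(byte[] data, CompressionLevel level)
    {
        using (var output = new MemoryStream())
        {
            // leaveOpen: true keeps 'output' usable after the DeflateStream
            // is disposed; disposing it flushes the final compressed block.
            using (var deflate = new DeflateStream(output, level, true))
            {
                deflate.Write(data, 0, data.Length);
            }
            return output.ToArray();
        }
    }

    // Decompresses a byte array produced by Compress.
    public static byte[] Decompress(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var deflate = new DeflateStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            deflate.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

Note that the compressed bytes are complete only after the compressing `DeflateStream` has been disposed, which is why the sample codec that follows closes its inner `DeflateStream` before the underlying buffer is read.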


    //
    // This code needs to be compiled with the parameter Target Framework set as ".NET Framework 4.5"
    // to ensure the desired implementation of Deflate compression algorithm is used
    // Ensure your C# Project is set up accordingly
    //

    namespace Microsoft.Hadoop.Avro.Sample
    {
        using System;
        using System.Collections.Generic;
        using System.Diagnostics;
        using System.IO;
        using System.IO.Compression;
        using System.Linq;
        using System.Runtime.Serialization;
        using Microsoft.Hadoop.Avro.Container;

        #region Defining objects for serialization
        //Sample Class used in serialization samples
        [DataContract(Name = "SensorDataValue", Namespace = "Sensors")]
        internal class SensorData
        {
            [DataMember(Name = "Location")]
            public Location Position { get; set; }

            [DataMember(Name = "Value")]
            public byte[] Value { get; set; }
        }

        //Sample struct used in serialization samples
        [DataContract]
        internal struct Location
        {
            [DataMember]
            public int Floor { get; set; }

            [DataMember]
            public int Room { get; set; }
        }
        #endregion

        #region Defining custom codec based on .NET Framework V.4.5 Deflate
        //Avro.NET Codec class contains two methods
        //GetCompressedStreamOver(Stream uncompressed) and GetDecompressedStreamOver(Stream compressed)
        //which are the key ones for data compression.
        //To enable a custom codec one needs to implement these methods for the required codec

        #region Defining Compression and Decompression Streams
        //DeflateStream (class from System.IO.Compression namespace that implements Deflate algorithm)
        //cannot be directly used for Avro because it does not support vital operations like Seek.
        //Thus one needs to implement two classes inherited from Stream
        //(one for compressed and one for decompressed stream)
        //that use Deflate compression and implement all required features
        internal sealed class CompressionStreamDeflate45 : Stream
        {
            private readonly Stream buffer;
            private DeflateStream compressionStream;

            public CompressionStreamDeflate45(Stream buffer)
            {
                Debug.Assert(buffer != null, "Buffer is not allowed to be null.");

                this.compressionStream = new DeflateStream(buffer, CompressionLevel.Fastest, true);
                this.buffer = buffer;
            }

            public override bool CanRead
            {
                get { return this.buffer.CanRead; }
            }

            public override bool CanSeek
            {
                get { return true; }
            }

            public override bool CanWrite
            {
                get { return this.buffer.CanWrite; }
            }

            public override void Flush()
            {
                this.compressionStream.Close();
            }

            public override long Length
            {
                get { return this.buffer.Length; }
            }

            public override long Position
            {
                get
                {
                    return this.buffer.Position;
                }

                set
                {
                    this.buffer.Position = value;
                }
            }

            public override int Read(byte[] buffer, int offset, int count)
            {
                return this.buffer.Read(buffer, offset, count);
            }

            public override long Seek(long offset, SeekOrigin origin)
            {
                return this.buffer.Seek(offset, origin);
            }

            public override void SetLength(long value)
            {
                throw new NotSupportedException();
            }

            public override void Write(byte[] buffer, int offset, int count)
            {
                this.compressionStream.Write(buffer, offset, count);
            }

            protected override void Dispose(bool disposing)
            {
                base.Dispose(disposing);

                if (disposing)
                {
                    this.compressionStream.Dispose();
                    this.compressionStream = null;
                }
            }
        }

        internal sealed class DecompressionStreamDeflate45 : Stream
        {
            private readonly DeflateStream decompressed;

            public DecompressionStreamDeflate45(Stream compressed)
            {
                this.decompressed = new DeflateStream(compressed, CompressionMode.Decompress, true);
            }

            public override bool CanRead
            {
                get { return true; }
            }

            public override bool CanSeek
            {
                get { return true; }
            }

            public override bool CanWrite
            {
                get { return false; }
            }

            public override void Flush()
            {
                this.decompressed.Close();
            }

            public override long Length
            {
                get { return this.decompressed.Length; }
            }

            public override long Position
            {
                get
                {
                    return this.decompressed.Position;
                }

                set
                {
                    throw new NotSupportedException();
                }
            }

            public override int Read(byte[] buffer, int offset, int count)
            {
                return this.decompressed.Read(buffer, offset, count);
            }

            public override long Seek(long offset, SeekOrigin origin)
            {
                throw new NotSupportedException();
            }

            public override void SetLength(long value)
            {
                throw new NotSupportedException();
            }

            public override void Write(byte[] buffer, int offset, int count)
            {
                throw new NotSupportedException();
            }

            protected override void Dispose(bool disposing)
            {
                base.Dispose(disposing);

                if (disposing)
                {
                    this.decompressed.Dispose();
                }
            }
        }
        #endregion

        #region Define Codec
        //Define the actual codec class containing the required methods for manipulating streams:
        //GetCompressedStreamOver(Stream uncompressed) and GetDecompressedStreamOver(Stream compressed)
        //Codec class uses classes for compressed and decompressed streams defined above
        internal sealed class DeflateCodec45 : Codec
        {

            //We merely use different IMPLEMENTATION of Deflate, so the CodecName remains "deflate"
            public static readonly string CodecName = "deflate";

            public DeflateCodec45()
                : base(CodecName)
            {
            }

            public override Stream GetCompressedStreamOver(Stream decompressed)
            {
                if (decompressed == null)
                {
                    throw new ArgumentNullException("decompressed");
                }

                return new CompressionStreamDeflate45(decompressed);
            }

            public override Stream GetDecompressedStreamOver(Stream compress…
