/articles/hdinsight-dotnet-avro-serialization.md

https://github.com/dboyi/azure-content
<properties linkid="hdinsight-dotnet-avro-serialization" urlDisplayName="HDInsight Microsoft .NET Library for Serialization with Avro" pageTitle="Serialize data with the Microsoft .NET Library for Avro | Azure" metaKeywords="" description="Learn how Azure HDInsight uses Avro to serialize big data." metaCanonical="" services="hdinsight" documentationCenter="" title="Serialize data with the Microsoft .NET Library for Avro " authors="bradsev" solutions="" manager="paulettm" editor="cgronlun" />
# Serialize data with the Microsoft .NET Library for Avro

## Overview

This topic shows how to use the Microsoft .NET Library for Avro to serialize objects and other data structures into streams in order to persist them to memory, a database, or a file, and also how to deserialize them to recover the original objects.

### Apache Avro
The Microsoft .NET Library for Avro implements the Apache Avro data serialization system for the Microsoft .NET environment. Apache Avro provides a compact binary data interchange format for serialization. It uses [JSON](http://www.json.org) to define a language-agnostic schema that underwrites language interoperability: data serialized in one language can be read in another. Currently C, C++, C#, Java, PHP, Python, and Ruby are supported. Detailed information on the format can be found in the [Apache Avro Specification](http://avro.apache.org/docs/current/spec.html). Note that the current version of the Microsoft .NET Library for Avro does not support the Remote Procedure Call (RPC) part of this specification.

The serialized representation of an object in the Avro system consists of two parts: the schema and the actual value. The Avro schema describes the language-independent data model of the serialized data with JSON. It is present side by side with the binary representation of the data. Keeping the schema separate from the binary representation permits each object to be written with no per-value overhead, making serialization fast and the representation small.
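For illustration, a minimal record schema in Avro's JSON format might look like the following. The record name and fields here are hypothetical, not part of the samples in this topic; they simply show the shape of a record declaration with two primitive-typed fields:

```json
{
    "type": "record",
    "name": "Sample.Reading",
    "fields":
        [
            { "name": "Id", "type": "int" },
            { "name": "Value", "type": "double" }
        ]
}
```

The full schemas used in the generic-record examples later in this topic follow the same pattern, with a nested record type for the location.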
### The Hadoop scenario

The Apache Avro serialization format is widely used in Azure HDInsight and other Apache Hadoop environments. Avro provides a convenient way to represent complex data structures within a Hadoop MapReduce job. The format of Avro files has been designed to support the distributed MapReduce programming model. The key feature that enables the distribution is that the files are "splittable", in the sense that one can seek any point in a file and start reading from a particular block.

### Serialization in the Microsoft .NET Library for Avro

The .NET Library for Avro supports two ways of serializing objects:

- **Reflection**: The JSON schema for the types is automatically built from the data contract attributes of the .NET types to be serialized.
- **Generic record**: A JSON schema is explicitly specified in a record represented by the [**AvroRecord**](http://msdn.microsoft.com/en-us/library/microsoft.hadoop.avro.avrorecord.aspx) class when no .NET types are available to describe the schema for the data to be serialized.

When the data schema is known to both the writer and the reader of the stream, the data can be sent without its schema. When this is not the case, the schema must be shared using an Avro container file. Other parameters, such as the codec used for data compression, can be specified. These scenarios are outlined in more detail and illustrated in the code examples below.
### Microsoft .NET Library for Avro prerequisites

- [Microsoft .NET Framework 4.0](http://www.microsoft.com/en-us/download/details.aspx?id=17851)
- [Newtonsoft Json.NET](http://james.newtonking.com/json) (5.0.5 or later)

Note that the Newtonsoft.Json.dll dependency is downloaded automatically with the installation of the Microsoft .NET Library for Avro; the procedure for the installation is provided in the following section.

### Microsoft .NET Library for Avro installation

The Microsoft .NET Library for Avro is distributed as a NuGet package that can be installed from Visual Studio using the following procedure:

- Select the **Project** tab -> **Manage NuGet Packages...**
- Search for "Microsoft.Hadoop.Avro" in the **Online Search** box.
- Click the **Install** button next to **Microsoft .NET Library for Avro**.

Note that the Newtonsoft.Json.dll (>= 5.0.5) dependency is also downloaded automatically with the Microsoft .NET Library for Avro.
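The package can also be installed from the NuGet Package Manager Console with the standard **Install-Package** command, using the same package ID as in the search above:

```powershell
Install-Package Microsoft.Hadoop.Avro
```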
## Guide to the samples

Five examples provided in this topic each illustrate a different scenario supported by the Microsoft .NET Library for Avro.

The first two examples show how to serialize and deserialize data into memory stream buffers using reflection and generic records. The schema in these two cases is assumed to be shared between the readers and writers out of band, so the schema does not need to be serialized with the data in an Avro container file.

The third and fourth examples show how to serialize and deserialize data into memory stream buffers using reflection and generic records with Avro object container files. When data is stored in an Avro container file, its schema is always stored with it, because the schema must be shared for deserialization.

The sample containing the first four examples can be downloaded from the [Azure code samples](http://code.msdn.microsoft.com/windowsazure/Serialize-data-with-the-86055923) site.

The fifth and final example shows how to use a custom compression codec for object container files. A sample containing the code for this example can be downloaded from the [Azure code samples](http://code.msdn.microsoft.com/windowsazure/Serialize-data-with-the-67159111) site.

The Microsoft .NET Library for Avro is designed to work with any stream. In these examples, data is manipulated using memory streams rather than file streams or databases, for simplicity and consistency. The approach taken in a production environment will depend on the exact scenario requirements, data source and volume, performance constraints, and other factors.

* <a href="#Scenario1">**Serialization with reflection**</a>: The JSON schema for the types to be serialized is automatically built from the data contract attributes.
* <a href="#Scenario2">**Serialization with generic record**</a>: The JSON schema is explicitly specified in a record when no .NET type is available for reflection.
* <a href="#Scenario3">**Serialization using object container files with reflection**</a>: The JSON schema is implicitly serialized with the data and shared using an Avro container file.
* <a href="#Scenario4">**Serialization using object container files with generic record**</a>: The JSON schema is explicitly serialized with the data and shared using an Avro container file.
* <a href="#Scenario5">**Serialization using object container files with a custom compression codec**</a>: The JSON schema is serialized with the data and shared using an Avro container file with a customized .NET implementation of the deflate data compression codec.
<h2> <a name="Scenario1"></a>Serialization with reflection</h2>

The JSON schema for the types can be automatically built by the Microsoft .NET Library for Avro using reflection from the data contract attributes of the C# objects to be serialized. The Microsoft .NET Library for Avro creates an [**IAvroSerializer<T>**](http://msdn.microsoft.com/en-us/library/dn627341.aspx) to identify the fields to be serialized.

In this example, objects (a **SensorData** class with a member **Location** struct) are serialized to a memory stream, and this stream is in turn deserialized. The result is then compared to the initial instance to confirm that the recovered **SensorData** object is identical to the original.

The schema in this example is assumed to be shared between the readers and writers, so the Avro object container format is not required. For an example of how to serialize and deserialize data into memory buffers using reflection with the object container format, when the schema must be serialized with the data, see <a href="#Scenario3">Serialization using object container files with reflection</a>.
```csharp
namespace Microsoft.Hadoop.Avro.Sample
{
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using System.Runtime.Serialization;
    using Microsoft.Hadoop.Avro.Container;

    //Sample class used in the serialization samples
    [DataContract(Name = "SensorDataValue", Namespace = "Sensors")]
    internal class SensorData
    {
        [DataMember(Name = "Location")]
        public Location Position { get; set; }

        [DataMember(Name = "Value")]
        public byte[] Value { get; set; }
    }

    //Sample struct used in the serialization samples
    [DataContract]
    internal struct Location
    {
        [DataMember]
        public int Floor { get; set; }

        [DataMember]
        public int Room { get; set; }
    }

    //This class contains all methods demonstrating
    //the usage of the Microsoft .NET Library for Avro
    public class AvroSample
    {
        //Serialize and deserialize a sample data set represented as an object using reflection.
        //No explicit schema definition is required - the schema of the serialized objects is built automatically.
        public void SerializeDeserializeObjectUsingReflection()
        {
            Console.WriteLine("SERIALIZATION USING REFLECTION\n");
            Console.WriteLine("Serializing Sample Data Set...");

            //Create a new AvroSerializer instance; it uses the AvroDataContractResolver serialization strategy
            //for serializing only properties attributed with DataContract/DataMember
            var avroSerializer = AvroSerializer.Create<SensorData>();

            //Create a memory stream buffer
            using (var buffer = new MemoryStream())
            {
                //Create a data set using the sample class and struct
                var expected = new SensorData { Value = new byte[] { 1, 2, 3, 4, 5 }, Position = new Location { Room = 243, Floor = 1 } };

                //Serialize the data to the specified stream
                avroSerializer.Serialize(buffer, expected);

                Console.WriteLine("Deserializing Sample Data Set...");

                //Prepare the stream for deserializing the data
                buffer.Seek(0, SeekOrigin.Begin);

                //Deserialize data from the stream and cast it to the same type used for serialization
                var actual = avroSerializer.Deserialize(buffer);

                Console.WriteLine("Comparing Initial and Deserialized Data Sets...");

                //Finally, verify that the deserialized data matches the original
                bool isEqual = this.Equal(expected, actual);
                Console.WriteLine("Result of Data Set Identity Comparison is {0}", isEqual);
            }
        }

        //
        //Helper methods
        //

        //Compare two SensorData objects
        private bool Equal(SensorData left, SensorData right)
        {
            return left.Position.Equals(right.Position) && left.Value.SequenceEqual(right.Value);
        }

        static void Main()
        {
            string sectionDivider = "---------------------------------------- ";

            //Create an instance of the AvroSample class and invoke methods
            //illustrating different serialization approaches
            AvroSample Sample = new AvroSample();

            //Serialization to memory using reflection
            Sample.SerializeDeserializeObjectUsingReflection();

            Console.WriteLine(sectionDivider);
            Console.WriteLine("Press any key to exit.");
            Console.Read();
        }
    }
}

// The example is expected to display the following output:
// SERIALIZATION USING REFLECTION
//
// Serializing Sample Data Set...
// Deserializing Sample Data Set...
// Comparing Initial and Deserialized Data Sets...
// Result of Data Set Identity Comparison is True
// ----------------------------------------
// Press any key to exit.
```
<h2> <a name="Scenario2"></a>Serialization with a generic record</h2>

A JSON schema can be explicitly specified in a generic record when reflection cannot be used because the data cannot be represented by .NET classes with a data contract. This method is generally slower than using reflection and serializers for a specific C# class. In such cases, the schema for the data may also be dynamic, because it is not known at compile time. Data represented as comma-separated values (CSV) files whose schema is unknown until it is transformed to the Avro format at run time is an example of this sort of dynamic scenario.

This example shows how to create and use an [**AvroRecord**](http://msdn.microsoft.com/en-us/library/microsoft.hadoop.avro.avrorecord.aspx) to explicitly specify a JSON schema, how to populate it with the data, and then how to serialize and deserialize it. The result is then compared to the initial instance to confirm that the recovered record is identical to the original.

The schema in this example is assumed to be shared between the readers and writers, so the Avro object container format is not required. For an example of how to serialize and deserialize data into memory buffers using a generic record with the object container format, when the schema must be included with the serialized data, see the <a href="#Scenario4">Serialization using object container files with generic record</a> example.
```csharp
namespace Microsoft.Hadoop.Avro.Sample
{
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using System.Runtime.Serialization;
    using Microsoft.Hadoop.Avro.Container;
    using Microsoft.Hadoop.Avro.Schema;

    //This class contains all methods demonstrating
    //the usage of the Microsoft .NET Library for Avro
    public class AvroSample
    {
        //Serialize and deserialize a sample data set using a generic record.
        //A generic record is a special class with the schema explicitly defined in JSON.
        //All serialized data should be mapped to the fields of the generic record,
        //which in turn is serialized.
        public void SerializeDeserializeObjectUsingGenericRecords()
        {
            Console.WriteLine("SERIALIZATION USING GENERIC RECORD\n");
            Console.WriteLine("Defining the Schema and creating Sample Data Set...");

            //Define the schema in JSON
            const string Schema = @"{
                ""type"":""record"",
                ""name"":""Microsoft.Hadoop.Avro.Specifications.SensorData"",
                ""fields"":
                    [
                        {
                            ""name"":""Location"",
                            ""type"":
                                {
                                    ""type"":""record"",
                                    ""name"":""Microsoft.Hadoop.Avro.Specifications.Location"",
                                    ""fields"":
                                        [
                                            { ""name"":""Floor"", ""type"":""int"" },
                                            { ""name"":""Room"", ""type"":""int"" }
                                        ]
                                }
                        },
                        { ""name"":""Value"", ""type"":""bytes"" }
                    ]
            }";

            //Create a generic serializer based on the schema
            var serializer = AvroSerializer.CreateGeneric(Schema);
            var rootSchema = serializer.WriterSchema as RecordSchema;

            //Create a memory stream buffer
            using (var stream = new MemoryStream())
            {
                //Create a generic record to represent the data
                dynamic location = new AvroRecord(rootSchema.GetField("Location").TypeSchema);
                location.Floor = 1;
                location.Room = 243;

                dynamic expected = new AvroRecord(serializer.WriterSchema);
                expected.Location = location;
                expected.Value = new byte[] { 1, 2, 3, 4, 5 };

                Console.WriteLine("Serializing Sample Data Set...");

                //Serialize the data
                serializer.Serialize(stream, expected);
                stream.Seek(0, SeekOrigin.Begin);

                Console.WriteLine("Deserializing Sample Data Set...");

                //Deserialize the data into a generic record
                dynamic actual = serializer.Deserialize(stream);

                Console.WriteLine("Comparing Initial and Deserialized Data Sets...");

                //Finally, verify the results
                bool isEqual = expected.Location.Floor.Equals(actual.Location.Floor);
                isEqual = isEqual && expected.Location.Room.Equals(actual.Location.Room);
                isEqual = isEqual && ((byte[])expected.Value).SequenceEqual((byte[])actual.Value);
                Console.WriteLine("Result of Data Set Identity Comparison is {0}", isEqual);
            }
        }

        static void Main()
        {
            string sectionDivider = "---------------------------------------- ";

            //Create an instance of the AvroSample class and invoke methods
            //illustrating different serialization approaches
            AvroSample Sample = new AvroSample();

            //Serialization to memory using a generic record
            Sample.SerializeDeserializeObjectUsingGenericRecords();

            Console.WriteLine(sectionDivider);
            Console.WriteLine("Press any key to exit.");
            Console.Read();
        }
    }
}

// The example is expected to display the following output:
// SERIALIZATION USING GENERIC RECORD
//
// Defining the Schema and creating Sample Data Set...
// Serializing Sample Data Set...
// Deserializing Sample Data Set...
// Comparing Initial and Deserialized Data Sets...
// Result of Data Set Identity Comparison is True
// ----------------------------------------
// Press any key to exit.
```
<h2> <a name="Scenario3"></a>Serialization using object container files and serialization with reflection</h2>

This example is similar to the scenario in the <a href="#Scenario1">first example</a>, where the schema is implicitly specified with reflection, except that here the schema is not assumed to be known to the reader that deserializes it. The **SensorData** objects to be serialized and their implicitly specified schema are stored in an object container file represented by the [**AvroContainer**](http://msdn.microsoft.com/en-us/library/microsoft.hadoop.avro.container.avrocontainer.aspx) class.

The data in this example is serialized with [**SequentialWriter<SensorData>**](http://msdn.microsoft.com/en-us/library/dn627340.aspx) and deserialized with [**SequentialReader<SensorData>**](http://msdn.microsoft.com/en-us/library/dn627340.aspx). The result is then compared to the initial instances to ensure they are identical.

The data in the object container file is compressed using the default [**Deflate**][deflate-100] compression codec from .NET Framework 4. See the <a href="#Scenario5">last example</a> in this topic to learn how to use the more recent and superior version of the [**Deflate**][deflate-110] compression codec available in .NET Framework 4.5.
```csharp
namespace Microsoft.Hadoop.Avro.Sample
{
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using System.Runtime.Serialization;
    using Microsoft.Hadoop.Avro.Container;

    //Sample class used in the serialization samples
    [DataContract(Name = "SensorDataValue", Namespace = "Sensors")]
    internal class SensorData
    {
        [DataMember(Name = "Location")]
        public Location Position { get; set; }

        [DataMember(Name = "Value")]
        public byte[] Value { get; set; }
    }

    //Sample struct used in the serialization samples
    [DataContract]
    internal struct Location
    {
        [DataMember]
        public int Floor { get; set; }

        [DataMember]
        public int Room { get; set; }
    }

    //This class contains all methods demonstrating
    //the usage of the Microsoft .NET Library for Avro
    public class AvroSample
    {
        //Serialize and deserialize a sample data set using reflection and Avro object container files.
        //The serialized data is compressed with the Deflate codec.
        public void SerializeDeserializeUsingObjectContainersReflection()
        {
            Console.WriteLine("SERIALIZATION USING REFLECTION AND AVRO OBJECT CONTAINER FILES\n");

            //Path for the Avro object container file
            string path = "AvroSampleReflectionDeflate.avro";

            //Create a data set using the sample class and struct
            var testData = new List<SensorData>
            {
                new SensorData { Value = new byte[] { 1, 2, 3, 4, 5 }, Position = new Location { Room = 243, Floor = 1 } },
                new SensorData { Value = new byte[] { 6, 7, 8, 9 }, Position = new Location { Room = 244, Floor = 1 } }
            };

            //Serialize the data and save it to a file
            using (var buffer = new MemoryStream())
            {
                Console.WriteLine("Serializing Sample Data Set...");

                //Create a SequentialWriter instance for type SensorData, which can serialize a sequence of SensorData objects to a stream
                //The data will be compressed using the Deflate codec
                using (var w = AvroContainer.CreateWriter<SensorData>(buffer, Codec.Deflate))
                {
                    using (var writer = new SequentialWriter<SensorData>(w, 24))
                    {
                        //Serialize the data to the stream using the sequential writer
                        testData.ForEach(writer.Write);
                    }
                }

                //Save the stream to a file
                Console.WriteLine("Saving serialized data to file...");
                if (!WriteFile(buffer, path))
                {
                    Console.WriteLine("Error during file operation. Quitting method");
                    return;
                }
            }

            //Read and deserialize the data
            using (var buffer = new MemoryStream())
            {
                Console.WriteLine("Reading data from file...");

                //Read the data from the object container file
                if (!ReadFile(buffer, path))
                {
                    Console.WriteLine("Error during file operation. Quitting method");
                    return;
                }

                Console.WriteLine("Deserializing Sample Data Set...");

                //Prepare the stream for deserializing the data
                buffer.Seek(0, SeekOrigin.Begin);

                //Create a SequentialReader for type SensorData, which will deserialize all serialized objects from the given stream
                //It allows iterating over the deserialized objects because it implements the IEnumerable<T> interface
                using (var reader = new SequentialReader<SensorData>(
                    AvroContainer.CreateReader<SensorData>(buffer, true)))
                {
                    var results = reader.Objects;

                    //Finally, verify that the deserialized data matches the original
                    Console.WriteLine("Comparing Initial and Deserialized Data Sets...");
                    int count = 1;
                    var pairs = testData.Zip(results, (serialized, deserialized) => new { expected = serialized, actual = deserialized });
                    foreach (var pair in pairs)
                    {
                        bool isEqual = this.Equal(pair.expected, pair.actual);
                        Console.WriteLine("For Pair {0} result of Data Set Identity Comparison is {1}", count, isEqual);
                        count++;
                    }
                }
            }

            //Delete the file
            RemoveFile(path);
        }

        //
        //Helper methods
        //

        //Compare two SensorData objects
        private bool Equal(SensorData left, SensorData right)
        {
            return left.Position.Equals(right.Position) && left.Value.SequenceEqual(right.Value);
        }

        //Save a memory stream to a new file with the given path
        private bool WriteFile(MemoryStream InputStream, string path)
        {
            if (!File.Exists(path))
            {
                try
                {
                    using (FileStream fs = File.Create(path))
                    {
                        InputStream.Seek(0, SeekOrigin.Begin);
                        InputStream.CopyTo(fs);
                    }
                    return true;
                }
                catch (Exception e)
                {
                    Console.WriteLine("The following exception was thrown during creation and writing to the file \"{0}\"", path);
                    Console.WriteLine(e.Message);
                    return false;
                }
            }
            else
            {
                Console.WriteLine("Can not create file \"{0}\". File already exists", path);
                return false;
            }
        }

        //Read the content of the file at the given path into a memory stream
        private bool ReadFile(MemoryStream OutputStream, string path)
        {
            try
            {
                using (FileStream fs = File.Open(path, FileMode.Open))
                {
                    fs.CopyTo(OutputStream);
                }
                return true;
            }
            catch (Exception e)
            {
                Console.WriteLine("The following exception was thrown during reading from the file \"{0}\"", path);
                Console.WriteLine(e.Message);
                return false;
            }
        }

        //Delete the file at the given path
        private void RemoveFile(string path)
        {
            if (File.Exists(path))
            {
                try
                {
                    File.Delete(path);
                }
                catch (Exception e)
                {
                    Console.WriteLine("The following exception was thrown during deleting the file \"{0}\"", path);
                    Console.WriteLine(e.Message);
                }
            }
            else
            {
                Console.WriteLine("Can not delete file \"{0}\". File does not exist", path);
            }
        }

        static void Main()
        {
            string sectionDivider = "---------------------------------------- ";

            //Create an instance of the AvroSample class and invoke methods
            //illustrating different serialization approaches
            AvroSample Sample = new AvroSample();

            //Serialization using reflection to an Avro object container file
            Sample.SerializeDeserializeUsingObjectContainersReflection();

            Console.WriteLine(sectionDivider);
            Console.WriteLine("Press any key to exit.");
            Console.Read();
        }
    }
}

// The example is expected to display the following output:
// SERIALIZATION USING REFLECTION AND AVRO OBJECT CONTAINER FILES
//
// Serializing Sample Data Set...
// Saving serialized data to file...
// Reading data from file...
// Deserializing Sample Data Set...
// Comparing Initial and Deserialized Data Sets...
// For Pair 1 result of Data Set Identity Comparison is True
// For Pair 2 result of Data Set Identity Comparison is True
// ----------------------------------------
// Press any key to exit.
```
<h2> <a name="Scenario4"></a>Serialization using object container files and serialization with generic record</h2>

This example is similar to the scenario in the <a href="#Scenario2">second example</a>, where the schema is explicitly specified with JSON, except that here the schema is not assumed to be known to the reader that deserializes it.

The test data set is collected into a list of [**AvroRecord**](http://msdn.microsoft.com/en-us/library/microsoft.hadoop.avro.avrorecord.aspx) objects using an explicitly defined JSON schema and then stored in an object container file represented by the [**AvroContainer**](http://msdn.microsoft.com/en-us/library/microsoft.hadoop.avro.container.avrocontainer.aspx) class. A writer created from this container file is used to serialize the data, uncompressed, to a memory stream, which is then saved to a file. The [**Codec.Null**](http://msdn.microsoft.com/en-us/library/microsoft.hadoop.avro.container.codec.null.aspx) parameter used when creating the writer specifies that this data will not be compressed.

The data is then read from the file and deserialized into a collection of objects. This collection is compared to the initial list of Avro records to confirm that they are identical.
```csharp
namespace Microsoft.Hadoop.Avro.Sample
{
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using System.Runtime.Serialization;
    using Microsoft.Hadoop.Avro.Container;
    using Microsoft.Hadoop.Avro.Schema;

    //This class contains all methods demonstrating
    //the usage of the Microsoft .NET Library for Avro
    public class AvroSample
    {
        //Serialize and deserialize a sample data set using a generic record and Avro object container files.
        //The serialized data is not compressed.
        public void SerializeDeserializeUsingObjectContainersGenericRecord()
        {
            Console.WriteLine("SERIALIZATION USING GENERIC RECORD AND AVRO OBJECT CONTAINER FILES\n");

            //Path for the Avro object container file
            string path = "AvroSampleGenericRecordNullCodec.avro";

            Console.WriteLine("Defining the Schema and creating Sample Data Set...");

            //Define the schema in JSON
            const string Schema = @"{
                ""type"":""record"",
                ""name"":""Microsoft.Hadoop.Avro.Specifications.SensorData"",
                ""fields"":
                    [
                        {
                            ""name"":""Location"",
                            ""type"":
                                {
                                    ""type"":""record"",
                                    ""name"":""Microsoft.Hadoop.Avro.Specifications.Location"",
                                    ""fields"":
                                        [
                                            { ""name"":""Floor"", ""type"":""int"" },
                                            { ""name"":""Room"", ""type"":""int"" }
                                        ]
                                }
                        },
                        { ""name"":""Value"", ""type"":""bytes"" }
                    ]
            }";

            //Create a generic serializer based on the schema
            var serializer = AvroSerializer.CreateGeneric(Schema);
            var rootSchema = serializer.WriterSchema as RecordSchema;

            //Create generic records to represent the data
            var testData = new List<AvroRecord>();

            dynamic expected1 = new AvroRecord(rootSchema);
            dynamic location1 = new AvroRecord(rootSchema.GetField("Location").TypeSchema);
            location1.Floor = 1;
            location1.Room = 243;
            expected1.Location = location1;
            expected1.Value = new byte[] { 1, 2, 3, 4, 5 };
            testData.Add(expected1);

            dynamic expected2 = new AvroRecord(rootSchema);
            dynamic location2 = new AvroRecord(rootSchema.GetField("Location").TypeSchema);
            location2.Floor = 1;
            location2.Room = 244;
            expected2.Location = location2;
            expected2.Value = new byte[] { 6, 7, 8, 9 };
            testData.Add(expected2);

            //Serialize the data and save it to a file
            using (var buffer = new MemoryStream())
            {
                Console.WriteLine("Serializing Sample Data Set...");

                //Create a SequentialWriter instance that can serialize a sequence of AvroRecord objects to a stream
                //The data will not be compressed (Null compression codec)
                using (var writer = AvroContainer.CreateGenericWriter(Schema, buffer, Codec.Null))
                {
                    using (var streamWriter = new SequentialWriter<object>(writer, 24))
                    {
                        //Serialize the data to the stream using the sequential writer
                        testData.ForEach(streamWriter.Write);
                    }
                }

                Console.WriteLine("Saving serialized data to file...");

                //Save the stream to a file
                if (!WriteFile(buffer, path))
                {
                    Console.WriteLine("Error during file operation. Quitting method");
                    return;
                }
            }

            //Read and deserialize the data
            using (var buffer = new MemoryStream())
            {
                Console.WriteLine("Reading data from file...");

                //Read the data from the object container file
                if (!ReadFile(buffer, path))
                {
                    Console.WriteLine("Error during file operation. Quitting method");
                    return;
                }

                Console.WriteLine("Deserializing Sample Data Set...");

                //Prepare the stream for deserializing the data
                buffer.Seek(0, SeekOrigin.Begin);

                //Create a SequentialReader that will deserialize all serialized objects from the given stream
                //It allows iterating over the deserialized objects because it implements the IEnumerable<T> interface
                using (var reader = AvroContainer.CreateGenericReader(buffer))
                {
                    using (var streamReader = new SequentialReader<object>(reader))
                    {
                        var results = streamReader.Objects;

                        Console.WriteLine("Comparing Initial and Deserialized Data Sets...");

                        //Finally, verify the results
                        var pairs = testData.Zip(results, (serialized, deserialized) => new { expected = (dynamic)serialized, actual = (dynamic)deserialized });
                        int count = 1;
                        foreach (var pair in pairs)
                        {
                            bool isEqual = pair.expected.Location.Floor.Equals(pair.actual.Location.Floor);
                            isEqual = isEqual && pair.expected.Location.Room.Equals(pair.actual.Location.Room);
                            isEqual = isEqual && ((byte[])pair.expected.Value).SequenceEqual((byte[])pair.actual.Value);
                            Console.WriteLine("For Pair {0} result of Data Set Identity Comparison is {1}", count, isEqual.ToString());
                            count++;
                        }
                    }
                }
            }

            //Delete the file
            RemoveFile(path);
        }

        //
        //Helper methods
        //

        //Save a memory stream to a new file with the given path
        private bool WriteFile(MemoryStream InputStream, string path)
        {
            if (!File.Exists(path))
            {
                try
                {
                    using (FileStream fs = File.Create(path))
                    {
                        InputStream.Seek(0, SeekOrigin.Begin);
                        InputStream.CopyTo(fs);
                    }
                    return true;
                }
                catch (Exception e)
                {
                    Console.WriteLine("The following exception was thrown during creation and writing to the file \"{0}\"", path);
                    Console.WriteLine(e.Message);
                    return false;
                }
            }
            else
            {
                Console.WriteLine("Can not create file \"{0}\". File already exists", path);
                return false;
            }
        }

        //Read the content of the file at the given path into a memory stream
        private bool ReadFile(MemoryStream OutputStream, string path)
        {
            try
            {
                using (FileStream fs = File.Open(path, FileMode.Open))
                {
                    fs.CopyTo(OutputStream);
                }
                return true;
            }
            catch (Exception e)
            {
                Console.WriteLine("The following exception was thrown during reading from the file \"{0}\"", path);
                Console.WriteLine(e.Message);
                return false;
            }
        }

        //Delete the file at the given path
        private void RemoveFile(string path)
        {
            if (File.Exists(path))
            {
                try
                {
                    File.Delete(path);
                }
                catch (Exception e)
                {
                    Console.WriteLine("The following exception was thrown during deleting the file \"{0}\"", path);
                    Console.WriteLine(e.Message);
                }
            }
            else
            {
                Console.WriteLine("Can not delete file \"{0}\". File does not exist", path);
            }
        }

        static void Main()
        {
            string sectionDivider = "---------------------------------------- ";

            //Create an instance of the AvroSample class and invoke methods
            //illustrating different serialization approaches
            AvroSample Sample = new AvroSample();

            //Serialization using a generic record to an Avro object container file
            Sample.SerializeDeserializeUsingObjectContainersGenericRecord();

            Console.WriteLine(sectionDivider);
            Console.WriteLine("Press any key to exit.");
            Console.Read();
        }
    }
}
```
  598. }
  599. return true;
  600. }
  601. catch (Exception e)
  602. {
  603. Console.WriteLine("The following exception was thrown during reading from the file \"{0}\"", path);
  604. Console.WriteLine(e.Message);
  605. return false;
  606. }
  607. }
  608. //Deleting file using given path
  609. private void RemoveFile(string path)
  610. {
  611. if (File.Exists(path))
  612. {
  613. try
  614. {
  615. File.Delete(path);
  616. }
  617. catch (Exception e)
  618. {
  619. Console.WriteLine("The following exception was thrown during deleting the file \"{0}\"", path);
  620. Console.WriteLine(e.Message);
  621. }
  622. }
  623. else
  624. {
  625. Console.WriteLine("Can not delete file \"{0}\". File does not exist", path);
  626. }
  627. }
  628. static void Main()
  629. {
  630. string sectionDivider = "---------------------------------------- ";
  631. //Create an instance of AvroSample Class and invoke methods
  632. //illustrating different serializing approaches
  633. AvroSample Sample = new AvroSample();
  634. //Serialization using Generic Record to Avro Object Container File
  635. Sample.SerializeDeserializeUsingObjectContainersGenericRecord();
  636. Console.WriteLine(sectionDivider);
  637. Console.WriteLine("Press any key to exit.");
  638. Console.Read();
  639. }
  640. }
  641. }
  642. // The example is expected to display the following output:
  643. // SERIALIZATION USING GENERIC RECORD AND AVRO OBJECT CONTAINER FILES
  644. //
  645. // Defining the Schema and creating Sample Data Set...
  646. // Serializing Sample Data Set...
  647. // Saving serialized data to file...
  648. // Reading data from file...
  649. // Deserializing Sample Data Set...
  650. // Comparing Initial and Deserialized Data Sets...
  651. // For Pair 1 result of Data Set Identity Comparison is True
  652. // For Pair 2 result of Data Set Identity Comparison is True
  653. // ----------------------------------------
  654. // Press any key to exit.
<h2> <a name="Scenario5"></a>Serialization using object container files with a custom compression codec</h2>

The example below shows how to use a custom compression codec for Avro object container files. The [Avro Specification](http://avro.apache.org/docs/current/spec.html#Required+Codecs) allows the use of an optional compression codec (in addition to the default **Null** and **Deflate** codecs). This example does not implement a completely new codec such as Snappy (mentioned as a supported optional codec in the [Avro Specification](http://avro.apache.org/docs/current/spec.html#snappy)). Instead, it shows how to use the .NET Framework 4.5 implementation of the [**Deflate**][deflate-110] codec, which provides a better compression algorithm based on the [zlib](http://zlib.net/) compression library than the .NET Framework 4.0 default.
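Before the full listing, one detail worth noting: the custom codec keeps the name "deflate" because the Avro specification defines that codec as raw DEFLATE data (RFC 1951, with no zlib header or checksum), which is exactly the byte format that .NET's **DeflateStream** reads and writes. The short Python sketch below (not part of the original C# sample; the payload is a made-up stand-in for an Avro data block) demonstrates the same raw-deflate round trip with the standard `zlib` module:

```python
import zlib

# Hypothetical payload standing in for an Avro data block.
data = b"sensor-data-block" * 20

# wbits=-15 selects raw DEFLATE (RFC 1951): no zlib header or checksum.
# This matches what .NET's DeflateStream produces and what the Avro
# "deflate" codec stores inside an object container file.
compressor = zlib.compressobj(9, zlib.DEFLATED, -15)
compressed = compressor.compress(data) + compressor.flush()

# Any RFC 1951 decoder can recover the block, regardless of which
# language or framework wrote it.
restored = zlib.decompress(compressed, -15)
```

Because the byte format is unchanged, files written with the .NET Framework 4.5 implementation remain readable by any Avro implementation that supports the standard deflate codec.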
    //
    // This code needs to be compiled with the parameter Target Framework set as ".NET Framework 4.5"
    // to ensure the desired implementation of the Deflate compression algorithm is used
    // Ensure your C# Project is set up accordingly
    //
    namespace Microsoft.Hadoop.Avro.Sample
    {
        using System;
        using System.Collections.Generic;
        using System.Diagnostics;
        using System.IO;
        using System.IO.Compression;
        using System.Linq;
        using System.Runtime.Serialization;
        using Microsoft.Hadoop.Avro.Container;

        #region Defining objects for serialization
        //Sample class used in serialization samples
        [DataContract(Name = "SensorDataValue", Namespace = "Sensors")]
        internal class SensorData
        {
            [DataMember(Name = "Location")]
            public Location Position { get; set; }

            [DataMember(Name = "Value")]
            public byte[] Value { get; set; }
        }

        //Sample struct used in serialization samples
        [DataContract]
        internal struct Location
        {
            [DataMember]
            public int Floor { get; set; }

            [DataMember]
            public int Room { get; set; }
        }
        #endregion

        #region Defining custom codec based on .NET Framework V.4.5 Deflate
        //The Avro.NET Codec class contains two methods,
        //GetCompressedStreamOver(Stream uncompressed) and GetDecompressedStreamOver(Stream compressed),
        //which are the key ones for data compression.
        //To enable a custom codec, one needs to implement these methods for the required codec

        #region Defining Compression and Decompression Streams
        //DeflateStream (the class from the System.IO.Compression namespace that implements the Deflate algorithm)
        //can not be directly used for Avro because it does not support vital operations like Seek.
        //Thus one needs to implement two classes inherited from Stream
        //(one for the compressed and one for the decompressed stream)
        //that use Deflate compression and implement all required features
        internal sealed class CompressionStreamDeflate45 : Stream
        {
            private readonly Stream buffer;
            private DeflateStream compressionStream;

            public CompressionStreamDeflate45(Stream buffer)
            {
                Debug.Assert(buffer != null, "Buffer is not allowed to be null.");

                this.compressionStream = new DeflateStream(buffer, CompressionLevel.Fastest, true);
                this.buffer = buffer;
            }

            public override bool CanRead
            {
                get { return this.buffer.CanRead; }
            }

            public override bool CanSeek
            {
                get { return true; }
            }

            public override bool CanWrite
            {
                get { return this.buffer.CanWrite; }
            }

            public override void Flush()
            {
                this.compressionStream.Close();
            }

            public override long Length
            {
                get { return this.buffer.Length; }
            }

            public override long Position
            {
                get
                {
                    return this.buffer.Position;
                }

                set
                {
                    this.buffer.Position = value;
                }
            }

            public override int Read(byte[] buffer, int offset, int count)
            {
                return this.buffer.Read(buffer, offset, count);
            }

            public override long Seek(long offset, SeekOrigin origin)
            {
                return this.buffer.Seek(offset, origin);
            }

            public override void SetLength(long value)
            {
                throw new NotSupportedException();
            }

            public override void Write(byte[] buffer, int offset, int count)
            {
                this.compressionStream.Write(buffer, offset, count);
            }

            protected override void Dispose(bool disposed)
            {
                base.Dispose(disposed);
                if (disposed)
                {
                    this.compressionStream.Dispose();
                    this.compressionStream = null;
                }
            }
        }

        internal sealed class DecompressionStreamDeflate45 : Stream
        {
            private readonly DeflateStream decompressed;

            public DecompressionStreamDeflate45(Stream compressed)
            {
                this.decompressed = new DeflateStream(compressed, CompressionMode.Decompress, true);
            }

            public override bool CanRead
            {
                get { return true; }
            }

            public override bool CanSeek
            {
                get { return true; }
            }

            public override bool CanWrite
            {
                get { return false; }
            }

            public override void Flush()
            {
                this.decompressed.Close();
            }

            public override long Length
            {
                get { return this.decompressed.Length; }
            }

            public override long Position
            {
                get
                {
                    return this.decompressed.Position;
                }

                set
                {
                    throw new NotSupportedException();
                }
            }

            public override int Read(byte[] buffer, int offset, int count)
            {
                return this.decompressed.Read(buffer, offset, count);
            }

            public override long Seek(long offset, SeekOrigin origin)
            {
                throw new NotSupportedException();
            }

            public override void SetLength(long value)
            {
                throw new NotSupportedException();
            }

            public override void Write(byte[] buffer, int offset, int count)
            {
                throw new NotSupportedException();
            }

            protected override void Dispose(bool disposing)
            {
                base.Dispose(disposing);
                if (disposing)
                {
                    this.decompressed.Dispose();
                }
            }
        }
        #endregion

        #region Define Codec
        //Define the actual codec class containing the required methods for manipulating streams:
        //GetCompressedStreamOver(Stream uncompressed) and GetDecompressedStreamOver(Stream compressed)
        //The codec class uses the classes for compressed and decompressed streams defined above
        internal sealed class DeflateCodec45 : Codec
        {
            //We merely use a different IMPLEMENTATION of Deflate, so the CodecName remains "deflate"
            public static readonly string CodecName = "deflate";

            public DeflateCodec45()
                : base(CodecName)
            {
            }

            public override Stream GetCompressedStreamOver(Stream decompressed)
            {
                if (decompressed == null)
                {
                    throw new ArgumentNullException("decompressed");
                }
                return new CompressionStreamDeflate45(decompressed);
            }

            public override Stream GetDecompressedStreamOver(Stream compress
