
/SPARK/1.5.2/RELEASENOTES.1.5.2.md

https://gitlab.com/andrewmusselman/eco-release-metadata
<!---
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
-->
# Apache Spark 1.5.2 Release Notes

These release notes cover new developer and user-facing incompatibilities, important issues, features, and major improvements.

---
* [SPARK-11023](https://issues.apache.org/jira/browse/SPARK-11023) | *Major* | **Error initializing SparkContext. java.net.URISyntaxException**

Similar to SPARK-10326. [https://issues.apache.org/jira/browse/SPARK-10326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949470#comment-14949470]
C:\WINDOWS\system32\>pyspark --master yarn-client
Python 2.7.10 \|Anaconda 2.3.0 (64-bit)\| (default, Sep 15 2015, 14:26:14) [MSC v.1500 64 bit (AMD64)]
Type "copyright", "credits" or "license" for more information.
IPython 4.0.0 An enhanced Interactive Python.
? -\> Introduction and overview of IPython's features.
%quickref -\> Quick reference.
help -\> Python's own help system.
object? -\> Details about 'object', use 'object??' for extra details.
15/10/08 09:28:05 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
15/10/08 09:28:06 WARN : Your hostname, PC-509512 resolves to a loopback/non-reachable address: fe80:0:0:0:0:5efe:a5f:c318%net3, but we couldn't find any external IP address!
15/10/08 09:28:08 WARN BlockReaderLocal: The short-circuit local reads feature cannot be used because UNIX Domain sockets are not available on Windows.
15/10/08 09:28:08 ERROR SparkContext: Error initializing SparkContext.
java.net.URISyntaxException: Illegal character in opaque part at index 2: C:\spark\bin\..\python\lib\pyspark.zip
at java.net.URI$Parser.fail(Unknown Source)
at java.net.URI$Parser.checkChars(Unknown Source)
at java.net.URI$Parser.parse(Unknown Source)
at java.net.URI.\<init\>(Unknown Source)
at org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$7.apply(Client.scala:558)
at org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$7.apply(Client.scala:557)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.deploy.yarn.Client.setupLaunchEnv(Client.scala:557)
at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:628)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:119)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
at org.apache.spark.SparkContext.\<init\>(SparkContext.scala:523)
at org.apache.spark.api.java.JavaSparkContext.\<init\>(JavaSparkContext.scala:61)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
at py4j.Gateway.invoke(Gateway.java:214)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
at py4j.GatewayConnection.run(GatewayConnection.java:207)
at java.lang.Thread.run(Unknown Source)
15/10/08 09:28:08 ERROR Utils: Uncaught exception in thread Thread-2
java.lang.NullPointerException
at org.apache.spark.network.netty.NettyBlockTransferService.close(NettyBlockTransferService.scala:152)
at org.apache.spark.storage.BlockManager.stop(BlockManager.scala:1228)
at org.apache.spark.SparkEnv.stop(SparkEnv.scala:100)
at org.apache.spark.SparkContext$$anonfun$stop$12.apply$mcV$sp(SparkContext.scala:1749)
at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1185)
at org.apache.spark.SparkContext.stop(SparkContext.scala:1748)
at org.apache.spark.SparkContext.\<init\>(SparkContext.scala:593)
at org.apache.spark.api.java.JavaSparkContext.\<init\>(JavaSparkContext.scala:61)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
at py4j.Gateway.invoke(Gateway.java:214)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
at py4j.GatewayConnection.run(GatewayConnection.java:207)
at java.lang.Thread.run(Unknown Source)
---------------------------------------------------------------------------
Py4JJavaError Traceback (most recent call last)
C:\spark\bin\..\python\pyspark\shell.py in \<module\>()
41 SparkContext.setSystemProperty("spark.executor.uri", os.environ["SPARK\_EXECUTOR\_URI"])
42
---\> 43 sc = SparkContext(pyFiles=add\_files)
44 atexit.register(lambda: sc.stop())
45
C:\spark\python\pyspark\context.pyc in \_init\_(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway, jsc, profiler\_cls)
111 try:
112 self.\_do\_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,
--\> 113 conf, jsc, profiler\_cls)
114 except:
115 # If an error occurs, clean up in order to allow future SparkContext creation:
C:\spark\python\pyspark\context.pyc in \_do\_init(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, jsc, profiler\_cls)
168
169 # Create the Java SparkContext through Py4J
--\> 170 self.\_jsc = jsc or self.\_initialize\_context(self.\_conf.\_jconf)
171
172 # Create a single Accumulator in Java that we'll send all our updates through;
C:\spark\python\pyspark\context.pyc in \_initialize\_context(self, jconf)
222 Initialize SparkContext in function to allow subclass specific initialization
223 """
--\> 224 return self.\_jvm.JavaSparkContext(jconf)
225
226 @classmethod
C:\spark\python\lib\py4j-0.8.2.1-src.zip\py4j\java\_gateway.py in \_call\_(self, \*args)
699 answer = self.\_gateway\_client.send\_command(command)
700 return\_value = get\_return\_value(answer, self.\_gateway\_client, None,
--\> 701 self.\_fqn)
702
703 for temp\_arg in temp\_args:
C:\spark\python\lib\py4j-0.8.2.1-src.zip\py4j\protocol.py in get\_return\_value(answer, gateway\_client, target\_id, name)
298 raise Py4JJavaError(
299 'An error occurred while calling {0} {1} {2}.\n'.
--\> 300 format(target\_id, '.', name), value)
301 else:
302 raise Py4JError(
Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.net.URISyntaxException: Illegal character in opaque part at index 2: C:\spark\bin\..\python\lib\pyspark.zip
at java.net.URI$Parser.fail(Unknown Source)
at java.net.URI$Parser.checkChars(Unknown Source)
at java.net.URI$Parser.parse(Unknown Source)
at java.net.URI.\<init\>(Unknown Source)
at org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$7.apply(Client.scala:558)
at org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$7.apply(Client.scala:557)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.deploy.yarn.Client.setupLaunchEnv(Client.scala:557)
at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:628)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:119)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
at org.apache.spark.SparkContext.\<init\>(SparkContext.scala:523)
at org.apache.spark.api.java.JavaSparkContext.\<init\>(JavaSparkContext.scala:61)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
at py4j.Gateway.invoke(Gateway.java:214)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
at py4j.GatewayConnection.run(GatewayConnection.java:207)
at java.lang.Thread.run(Unknown Source)
In [1]:
Marcelo Vanzin added a comment - 10 hours ago
Ah, that's similar but not the same bug; it's a different part of the code that only affects pyspark. Could you file a separate bug for that? This is the code in question:
{code}
(pySparkArchives ++ pyArchives).foreach { path =\>
val uri = new URI(path)
{code}
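
The failure comes from handing the raw Windows path `C:\spark\bin\..\python\lib\pyspark.zip` straight to `java.net.URI`, which reads `C:` as a URI scheme and then rejects the backslash at index 2. As an illustration only (this is not the fix that went into Spark), the following plain-Python sketch, using Python 3's `pathlib`, shows the `file:` form of the same path that a URI parser would accept; the path is copied from the stack trace above:

```python
from pathlib import PureWindowsPath

# Path exactly as it appears in the stack trace above.
raw_path = r"C:\spark\bin\..\python\lib\pyspark.zip"

# java.net.URI parses "C:" as a scheme and then hits the illegal "\"
# at index 2, hence "Illegal character in opaque part at index 2".
# The same location expressed as a file: URI parses cleanly:
print(PureWindowsPath(raw_path).as_uri())
# file:///C:/spark/bin/../python/lib/pyspark.zip
```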
---

* [SPARK-11481](https://issues.apache.org/jira/browse/SPARK-11481) | *Major* | **orderBy with multiple columns in WindowSpec does not work properly**

When using multiple columns in the orderBy of a WindowSpec, the ordering seems to be applied only to the first column.

A possible workaround is to sort the DataFrame first and then apply the window spec over the sorted DataFrame, e.g. (a self-contained version of these snippets is sketched below):

THIS DOES NOT WORK:

window\_sum = Window.partitionBy('user\_unique\_id').orderBy('creation\_date', 'mib\_id', 'day').rowsBetween(-sys.maxsize, 0)
df = df.withColumn('user\_version', func.sum(df.group\_counter).over(window\_sum))

THIS WORKS WELL:

df = df.sort('user\_unique\_id', 'creation\_date', 'mib\_id', 'day')
window\_sum = Window.partitionBy('user\_unique\_id').orderBy('creation\_date', 'mib\_id', 'day').rowsBetween(-sys.maxsize, 0)
df = df.withColumn('user\_version', func.sum(df.group\_counter).over(window\_sum))

Also, can anybody confirm that this is a true workaround?
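
For readers who want to try the quoted workaround as written, here is a minimal, self-contained PySpark sketch of the same idea, assuming a Spark 1.x-style `HiveContext` (window functions required it before Spark 2.0). The application name, the sample rows, and the column values are invented purely for illustration; only the column names come from the report above.

```python
import sys

from pyspark import SparkContext
from pyspark.sql import HiveContext, functions as func
from pyspark.sql.window import Window

sc = SparkContext(appName="spark-11481-workaround")  # hypothetical app name
sqlContext = HiveContext(sc)  # window functions need HiveContext in Spark 1.x

# Toy data reusing the column names from the report (values are made up).
df = sqlContext.createDataFrame(
    [(1, "2015-01-01", 10, 1, 1),
     (1, "2015-01-01", 11, 2, 1),
     (1, "2015-01-02", 10, 1, 1)],
    ["user_unique_id", "creation_date", "mib_id", "day", "group_counter"])

# Window ordered by several columns, as in the report.
window_sum = (Window.partitionBy("user_unique_id")
              .orderBy("creation_date", "mib_id", "day")
              .rowsBetween(-sys.maxsize, 0))

# Reported workaround: sort the DataFrame first, then apply the window.
df = df.sort("user_unique_id", "creation_date", "mib_id", "day")
df = df.withColumn("user_version", func.sum(df.group_counter).over(window_sum))
df.show()
```

Dropping the `df.sort(...)` line gives the variant reported as broken, where only the first orderBy column appears to be honored.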