
/examples/export/plot_epochs_as_data_frame.py

http://github.com/mne-tools/mne-python
  1. """
  2. =================================
  3. Export epochs to Pandas DataFrame
  4. =================================
  5. In this example the pandas exporter will be used to produce a DataFrame
  6. object. After exploring some basic features a split-apply-combine
  7. work flow will be conducted to examine the latencies of the response
  8. maxima across epochs and conditions.
  9. Note. Equivalent methods are available for raw and evoked data objects.
  10. Short Pandas Primer
  11. -------------------
  12. Pandas Data Frames
  13. ~~~~~~~~~~~~~~~~~~
  14. A data frame can be thought of as a combination of matrix, list and dict:
  15. It knows about linear algebra and element-wise operations but is size mutable
  16. and allows for labeled access to its data. In addition, the pandas data frame
  17. class provides many useful methods for restructuring, reshaping and visualizing
  18. data. As most methods return data frame instances, operations can be chained
  19. with ease; this allows to write efficient one-liners. Technically a DataFrame
  20. can be seen as a high-level container for numpy arrays and hence switching
  21. back and forth between numpy arrays and DataFrames is very easy.
  22. Taken together, these features qualify data frames for inter operation with
  23. databases and for interactive data exploration / analysis.
  24. Additionally, pandas interfaces with the R statistical computing language that
  25. covers a huge amount of statistical functionality.
  26. Export Options
  27. ~~~~~~~~~~~~~~
  28. The pandas exporter comes with a few options worth being commented.
  29. Pandas DataFrame objects use a so called hierarchical index. This can be
  30. thought of as an array of unique tuples, in our case, representing the higher
  31. dimensional MEG data in a 2D data table. The column names are the channel names
  32. from the epoch object. The channels can be accessed like entries of a
  33. dictionary:
  34. df['MEG 2333']
  35. Epochs and time slices can be accessed with the .ix method:
  36. epochs_df.ix[(1, 2), 'MEG 2333']
  37. However, it is also possible to include this index as regular categorial data
  38. columns which yields a long table format typically used for repeated measure
  39. designs. To take control of this feature, on export, you can specify which
  40. of the three dimensions 'condition', 'epoch' and 'time' is passed to the Pandas
  41. index using the index parameter. Note that this decision is revertible any
  42. time, as demonstrated below.
  43. Similarly, for convenience, it is possible to scale the times, e.g. from
  44. seconds to milliseconds.
  45. Some Instance Methods
  46. ~~~~~~~~~~~~~~~~~~~~~
  47. Most numpy methods and many ufuncs can be found as instance methods, e.g.
  48. mean, median, var, std, mul, , max, argmax etc.
  49. Below an incomplete listing of additional useful data frame instance methods:
  50. apply : apply function to data.
  51. Any kind of custom function can be applied to the data. In combination with
  52. lambda this can be very useful.
  53. describe : quickly generate summary stats
  54. Very useful for exploring data.
  55. groupby : generate subgroups and initialize a 'split-apply-combine' operation.
  56. Creates a group object. Subsequently, methods like apply, agg, or transform
  57. can be used to manipulate the underlying data separately but
  58. simultaneously. Finally, reset_index can be used to combine the results
  59. back into a data frame.
  60. plot : wrapper around plt.plot
  61. However it comes with some special options. For examples see below.
  62. shape : shape attribute
  63. gets the dimensions of the data frame.
  64. values :
  65. return underlying numpy array.
  66. to_records :
  67. export data as numpy record array.
  68. to_dict :
  69. export data as dict of arrays.
  70. Reference
  71. ~~~~~~~~~
  72. More information and additional introductory materials can be found at the
  73. pandas doc sites: http://pandas.pydata.org/pandas-docs/stable/
  74. """
# Author: Denis Engemann <d.engemann@fz-juelich.de>
#
# License: BSD (3-clause)

print(__doc__)

import mne
import pylab as pl
import numpy as np
from mne.fiff import Raw
from mne.datasets import sample

# turn on interactive mode
pl.ion()

data_path = sample.data_path()
raw_fname = data_path + '/MEG/sample/sample_audvis_filt-0-40_raw.fif'
event_fname = data_path + '/MEG/sample/sample_audvis_filt-0-40_raw-eve.fif'
raw = Raw(raw_fname)

# For simplicity we will only consider the first 10 epochs
events = mne.read_events(event_fname)[:10]

# Add a bad channel
raw.info['bads'] += ['MEG 2443']
picks = mne.fiff.pick_types(raw.info, meg='grad', eeg=False, eog=True,
                            stim=False, exclude='bads')

tmin, tmax = -0.2, 0.5
baseline = (None, 0)
reject = dict(grad=4000e-13, eog=150e-6)
event_id = dict(auditory_l=1, auditory_r=2, visual_l=3, visual_r=4)

epochs = mne.Epochs(raw, events, event_id, tmin, tmax, proj=True, picks=picks,
                    baseline=baseline, preload=True, reject=reject)
###############################################################################
# Export DataFrame

# The following parameters scale the channel values and times to make them
# plotting-friendly. The info columns 'epoch' and 'time' will be used as
# hierarchical index, whereas the condition is treated as categorical data.
# Note that this is optional. By passing None you could also print out all
# nesting factors in a long table style commonly used for analyzing
# repeated measures designs.
index, scale_time, scalings = ['epoch', 'time'], 1e3, dict(grad=1e13)

df = epochs.as_data_frame(picks=None, scalings=scalings, scale_time=scale_time,
                          index=index)

# Create MEG channel selector and drop EOG channel.
meg_chs = [c for c in df.columns if 'MEG' in c]
df.pop('EOG 061')  # this works just like with a list.
###############################################################################
# Explore Pandas MultiIndex

# Pandas uses a MultiIndex, or hierarchical index, to handle higher
# dimensionality while at the same time representing the data in a flat
# 2D manner.
print(df.index.names, df.index.levels)

# Inspecting the index object reveals that 'epoch' and 'time' are used for
# subsetting data. We can take advantage of that by using the .ix
# attribute, where in this case the first position indexes the MultiIndex
# and the second the columns, that is, the channels.

# Plot some channels across the first three epochs.
xticks, sel = np.arange(3, 600, 120), meg_chs[:15]
df.ix[:3, sel].plot(xticks=xticks)
mne.viz.tight_layout()

# Slice from time 0 in epoch 1 to 500 ms after baseline end in epoch 3.
# Note that the second element of each tuple represents time in
# milliseconds from stimulus onset.
df.ix[(1, 0):(3, 500), sel].plot(xticks=xticks)
mne.viz.tight_layout()

# Note: For convenience the index was converted from floating point values
# to integer values. To restore the original values you can e.g. say
# df['time'] = np.tile(epochs.times, len(epochs))

# We now reset the index of the DataFrame to expose some Pandas pivoting
# functionality. To simplify the groupby operation we drop the indices to
# treat epoch and time as categorical factors.
df = df.reset_index()
# The ensuing DataFrame is then split into subsets reflecting a crossing
# between condition and trial number. The idea is that we can broadcast
# operations into each cell simultaneously.
factors = ['condition', 'epoch']
sel = factors + ['MEG 1332', 'MEG 1342']
grouped = df[sel].groupby(factors)

# To make the plot labels more readable let's edit the values of 'condition'.
df.condition = df.condition.apply(lambda name: name + ' ')

# Now we compare the mean response of two channels across conditions.
grouped.mean().plot(kind='bar', stacked=True, title='Mean MEG Response',
                    color=['steelblue', 'orange'])
mne.viz.tight_layout()

# We can accomplish even more complicated tasks in a few lines by calling
# the apply method and passing a function. Assume we wanted to know the
# time slice of the maximum response for each condition.
max_latency = grouped[sel[2]].apply(lambda x: df.time[x.argmax()])
print(max_latency)
pl.figure()
max_latency.plot(kind='barh', title='Latency of Maximum Response',
                 color=['steelblue'])
mne.viz.tight_layout()

# Finally, we will again remove the index to create a proper data table
# that can be used with statistical packages like statsmodels or R.
final_df = max_latency.reset_index()
final_df = final_df.rename(columns={0: sel[2]})  # the index is oblivious of names

# The index is now written into regular columns so it can be used as a factor.
print(final_df)

# To save as a csv file, uncomment the next line.
# final_df.to_csv('my_epochs.csv')

# Note. Data Frames can be easily concatenated, e.g., across subjects.
# E.g. say:
#
# import pandas as pd
# group = pd.concat([df_1, df_2])
# group['subject'] = np.r_[np.ones(len(df_1)), np.ones(len(df_2)) + 1]
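To make the concatenation sketch concrete, here is a minimal runnable version with toy stand-in frames. The df_1 and df_2 below are placeholders for per-subject exports, not the DataFrame built in this script, and ignore_index=True is added so the stacked frame gets a fresh row index:

```python
import numpy as np
import pandas as pd

# Toy stand-ins for two subjects' exported data frames (illustrative only).
df_1 = pd.DataFrame({'MEG 0001': np.zeros(3)})
df_2 = pd.DataFrame({'MEG 0001': np.ones(2)})

# Stack the frames and add a subject factor, as sketched above.
group = pd.concat([df_1, df_2], ignore_index=True)
group['subject'] = np.r_[np.ones(len(df_1)), np.ones(len(df_2)) + 1]
```

The resulting long table has one row per observation and a 'subject' column, ready for group-level analysis in statsmodels or R.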