regex: Python // Regex // Tags

Sunday, 28 June 2015

Python // Regex // Tags

I am trying to extract the text from between the tags in this HTML:

</br></td>, <td class="first">TEXT_1a<br>TEXT_1b
                                </br></td>, <td class="first">TEXT_2a<br>TEXT_2b
                                </br></td>, <td class="first">TEXT_3a<br>TEXT_3b
                                </br></td>, <td class="first">TEXT_4a<br>TEXT_4b
                                </br></td>, <td class="first">TEXT_5a<br>TEXT_5b
                                </br></td>, <td class="first">TEXT_6a<br>TEXT_6b

I used BeautifulSoup (BS4) to extract the text:

    text = first_td.renderContents()
    trimmed_text = text.strip()
    print trimmed_text

However, I only get the first piece of text after the <td> tag. I would like to extract all of the text inside the tags, preferably arranged in columns (an array). Since BeautifulSoup did not work for me, I thought regex might be the way to go. One minor thing: I am an absolute regex amateur...

Any ideas how to get the text out there?

Thank you in advance
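
For illustration, here is a minimal regex sketch of what is being asked for, using only the standard library. It assumes every cell looks exactly like the sample above (a <td class="first"> followed by two pieces of text separated by <br> and closed by </br>); the names html, CELL_RE and extract_rows are made up for this example, and the sample string is a shortened copy of the HTML in the question. It is written for Python 3, whereas the snippet above uses the Python 2 print statement.

```python
import re

# Shortened copy of the sample from the question (assumed structure).
html = '''</br></td>, <td class="first">TEXT_1a<br>TEXT_1b
                                </br></td>, <td class="first">TEXT_2a<br>TEXT_2b
                                </br></td>, <td class="first">TEXT_3a<br>TEXT_3b
                                </br></td>'''

# Group 1: text between <td class="first"> and <br>.
# Group 2: text between <br> and the closing </br>.
# re.DOTALL lets "." cross the line break before </br>;
# the surrounding \s* keeps that whitespace out of the captured groups.
CELL_RE = re.compile(
    r'<td class="first">\s*(.*?)\s*<br>\s*(.*?)\s*</br>',
    re.DOTALL,
)


def extract_rows(markup):
    """Return one [first_text, second_text] list per matching cell."""
    return [list(pair) for pair in CELL_RE.findall(markup)]


rows = extract_rows(html)

# Transpose the rows to get one tuple per column.
columns = list(zip(*rows))

print(rows)
print(columns)
```

Regexes like this break easily if the markup changes. If only the first cell comes back from BeautifulSoup, one possible cause (a guess, since the code that produced first_td is not shown) is that a single <td> was selected rather than all of them; BeautifulSoup's find_all returns every matching tag and may be the more robust route.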
