I am trying to extract the text from between the tags in the following HTML:
</br></td>, <td class="first">TEXT_1a<br>TEXT_1b
</br></td>, <td class="first">TEXT_2a<br>TEXT_2b
</br></td>, <td class="first">TEXT_3a<br>TEXT_3b
</br></td>, <td class="first">TEXT_4a<br>TEXT_4b
</br></td>, <td class="first">TEXT_5a<br>TEXT_5b
</br></td>, <td class="first">TEXT_6a<br>TEXT_6b
I used BeautifulSoup (BS4) to extract the text:
    text = first_td.renderContents()
    trimmed_text = text.strip()
    print trimmed_text
However, I only get the text from the first <td> tag. I would like to extract the text from all of the tags, preferably sorted into columns (an array). Since my BeautifulSoup attempt did not work, I thought regex might be the way to go. One minor thing: I am an absolute regex amateur...
Any ideas on how to get the text out of there?
Thank you in advance
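
To make the goal concrete, here is a minimal regex sketch of the kind of result I am after, using only Python's standard `re` module. It assumes every cell looks exactly like the sample above (a `<td class="first">` opening tag, two pieces of text separated by `<br>`, and a closing `</br>`). The names `html`, `CELL`, `rows`, `col_a` and `col_b` are made up for illustration, and the pattern would likely break on differently shaped HTML.

```python
import re

# Sample input copied from the question (first three cells only).
html = (
    '<td class="first">TEXT_1a<br>TEXT_1b\n'
    '</br></td>, <td class="first">TEXT_2a<br>TEXT_2b\n'
    '</br></td>, <td class="first">TEXT_3a<br>TEXT_3b\n'
    '</br></td>'
)

# Group 1: text before <br>; group 2: text between <br> and </br>.
# re.DOTALL lets "." cross the newline that precedes </br>.
CELL = re.compile(
    r'<td class="first">\s*(.*?)\s*<br\s*/?>\s*(.*?)\s*</br>',
    re.DOTALL,
)

# One (a, b) tuple per matched cell.
rows = CELL.findall(html)

# Split the pairs into two columns.
col_a = [a for a, _ in rows]
col_b = [b for _, b in rows]

print(rows)
print(col_a)
print(col_b)
```

Here `rows` holds one pair per cell, and `col_a` / `col_b` are the two columns. Regular expressions are generally fragile for HTML, so this is only meant to show the desired output shape, not a robust parser.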